Are large language models advanced enough to create autonomous AI agents that can plan tasks and assign them to other specialized agents? In the early days of LLMs, there were problems like models getting stuck in loops or having small context windows. With the release of powerful models like Claude 3.5 Sonnet, o1, and Gemini 2.0, have these issues been solved? Has anyone built autonomous systems that are actually ready for production use yet?
Welcome to the forum! Here are some quick rules for posting:
- Posts should be more than 100 characters—please be as detailed as you can.
- Your question might have already been answered. Try searching before posting.
- Feel free to discuss both the pros and cons of AI, just keep it respectful.
- Don’t forget to provide links to support your points.
- No questions are too silly, unless you’re seriously asking if AI is the end of the world (it’s not).
Feel free to reach out to the moderators if you have any questions or need help!
Great question! I work with AI models daily at Jenova AI, and we’ve tested several agent frameworks. While models like Claude 3.5 Sonnet and o1 have brought big improvements in task planning and reasoning, full autonomy is still a long way off.
The main issues aren’t just technical ones like small context windows or loops; they’re about reliability and control. Even the newest models still make unpredictable decisions or fail to break complex tasks down properly.
We’re seeing better results with “supervised autonomy,” where AI does most of the work, but humans step in for key decisions. At Jenova AI, we’ve been using this approach for things like model routing and search.
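Roughly, the pattern looks like this: the agent executes low-risk steps on its own and pauses for human sign-off above a risk threshold. This is just a minimal sketch; the names are illustrative, not our actual code:

```python
from dataclasses import dataclass

@dataclass
class Action:
    description: str
    risk: float  # 0.0 (routine) to 1.0 (high impact)

def run_agent(actions: list[Action], risk_threshold: float = 0.5) -> None:
    for action in actions:
        if action.risk >= risk_threshold:
            # Key decision: pause and ask a human before proceeding.
            answer = input(f"Approve '{action.description}'? [y/N] ")
            if answer.strip().lower() != "y":
                print(f"Skipped: {action.description}")
                continue
        print(f"Executing: {action.description}")

run_agent([
    Action("summarize support tickets", risk=0.1),
    Action("issue a customer refund", risk=0.8),
])
```

The point is that the autonomy boundary is an explicit design choice, not something the model decides for itself.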
I’m hopeful that we’ll see specialized autonomous agents for specific tasks within 6-12 months, but general-purpose agents might still be years away.
@Uma
That’s pretty exciting! It’s a good point that there will be growing pains, but I’m sure in just a few years, things will be much better.
I think it will happen next week. I’m hoping OpenAI will release something like Operator.
Check out this blog post: Introducing Devin by Cognition.AI
We’ve been testing it out, and it’s already changing how we work. It handles some tasks surprisingly well; it’s like assigning work to someone and checking in later.
@Brady
This looks pretty cool. Is this your project?
Zane said:
@Brady
This looks pretty cool. Is this your project?
I wish! I’ve been following them since March. They’ve raised a lot of money for this platform. Check out the Forbes interview. This might become a big player—or it might fail. Who knows?
Here’s a link to a Forbes article: Forbes Interview with Cognition Labs
Another platform to watch is Magic.dev. Eric Steinberger is one to keep an eye on in this space.
And here’s another article on Magic.dev: Magic AI Startup Valuation
@Brady
I’m watching it now! I’m 5 minutes in and it’s awesome. Thanks for sharing!
LLMs can do some cool stuff with agents, but they’re not fully autonomous or self-learning yet. I think active inference will be the answer.
Here are some resources on active inference and the free energy principle. This might be the key to achieving real-time learning and, eventually, AGI.
Behind the Scenes with Active Inference
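For anyone new to the framework: the core quantity an active inference agent minimizes is the variational free energy F, defined over its beliefs q(s) about hidden states s given observations o. As it’s usually written:

```latex
F = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
  = D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big] - \ln p(o)
```

Since the KL term is non-negative, minimizing F pushes the agent’s beliefs toward the true posterior while implicitly maximizing model evidence p(o), which is where the “self-learning” appeal comes from.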
I think it will be like fully autonomous cars—they’ll always be “nearly there” but never quite fully realized.
Soren said:
I think it will be like fully autonomous cars—they’ll always be “nearly there” but never quite fully realized.
Yeah, that makes a lot of sense.
Soren said:
I think it will be like fully autonomous cars—they’ll always be “nearly there” but never quite fully realized.
It’s like that joke about fusion, huh? Always 30 years away. Funny thing is, NIF actually achieved fusion ignition (net energy gain) in December 2022, so maybe we’re closer than we think.
There’s no simple answer to this question. It depends on how complex the task is. It’s not much different from asking, “When will LLMs write production-quality code?” In some cases, they already can. In others, it’s still years away. Same goes for agents.
@Vic
This sums it up perfectly. No perfect answer because there are so many variables. But it seems like AI is moving quickly behind the scenes, so we should see big breakthroughs in the next 2-3 years.
I’d say the first half of 2025.
JohnMarcRubio said:
I’d say the first half of 2025.
Yes, that’s what I think too. The technology is mostly there, but we’re still missing the infrastructure to fully deploy it. It’s coming soon.
Yeah, they look like real people too.
It’s more of a spectrum. To some degree, they’re already here.
I’d say 1-3 years is a safe estimate, especially for specific industries like customer support, automated operations, and data processing. AI can now break complex tasks down step by step, and multimodal models make agents more useful for real-world tasks. Models like Claude 3.5 Sonnet and Gemini 2.0 also handle much longer contexts, so issues like loops aren’t as big a problem anymore.
But fully autonomous agents making decisions in all scenarios? Probably 5-10 years, maybe longer. There’s still a lot to figure out about long-term memory and true autonomy.
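For anyone curious what “breaking down tasks step by step” looks like in practice, here’s a rough sketch of a planner/worker loop. Everything here is a stand-in (`call_llm`, the worker names, the plan format), not any real framework’s API:

```python
# Sketch of step-by-step task decomposition: a planner model splits a goal
# into steps, and each step is dispatched to a specialized worker agent.

def call_llm(system_prompt: str, user_prompt: str) -> str:
    # Stand-in for your model client (Anthropic, OpenAI, etc.).
    raise NotImplementedError("plug in a real model call here")

WORKERS = {
    "research": "You search for and summarize sources.",
    "code": "You write and review code.",
    "write": "You draft user-facing text.",
}

def run_task(goal: str) -> list[str]:
    # 1. Planner: ask for one step per line, formatted as "worker: step".
    plan = call_llm(
        "Break the goal into steps, one per line, each formatted as "
        f"'worker: step' where worker is one of: {', '.join(WORKERS)}.",
        goal,
    )
    # 2. Dispatcher: route each step to the matching specialized worker.
    results = []
    for line in plan.splitlines():
        worker, _, step = line.partition(":")
        system_prompt = WORKERS.get(worker.strip().lower())
        if system_prompt and step.strip():
            results.append(call_llm(system_prompt, step.strip()))
    return results

# Usage (once call_llm is wired up):
# run_task("Summarize this week's support tickets and draft a status update")
```

The catch, as others in this thread have said, is step 2: when the planner’s output drifts from the expected format or a worker gets something wrong, the whole chain degrades, which is why human checkpoints still matter.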