Hundreds of Agents? When I Use Just 1-2, a Lot of Stuff Needs Fixing

Just as an introduction: the agents I use (Claude, Gemini, ChatGPT) seem to know Meteor very well! :slight_smile:

I see reports of people using dozens or hundreds of agents. When I run agents in Anti-Gravity, I know it is using 2 or more agents (because sometimes you see them talking to each other), but I don’t think it’s dozens or hundreds. And usually it does a great job.

But often I don’t like the plan it asks for approval of. Or it makes mistakes. And that’s just with 2 or 3 agents.

How can people possibly get good results with hundreds?


Haha, I think a lot of it is just hype, and at the end of the day it’s just parallel processes run where it makes sense, to speed things up.

AI has been game-changing, but when I hear all this hype, I also wonder if I’m missing something.


Apologies in advance for the poorly filtered brain dump… From what I’ve seen and heard (and experienced a little bit, though I prefer coding most things by hand), a lot of the projects have fairly simple, mundane architectures, so it’s trivial for LLMs to fill in the gaps once you’ve given them the right structure and instructions. Like if every route in the project has 1 server endpoint, 1 screen, and barely anything shared with the rest of the project, really that swarm of LLMs is just one-shotting a bunch of mini-MVPs.

But a side-effect is that you can get a lot of “Write Everything Twice” style code, in that every feature’s sort of addressed in isolation and inlined where needed unless there’s a really common abstraction pattern that’s been picked up in the training process.

If you don’t outpace your ability to refactor as a human you might be able to rein it in, but I’ve only seen that with, e.g., a concurrency of 2-3, not 100s :laughing: Otherwise things can slow down once the LLMs start to contradict themselves on each subsequent query, or overcompensate and refactor something they shouldn’t have and start spiralling. Which of course makes creating a cohesive project quite messy.

And to be honest, I would say a lot of the code generated is really just modern JS/TS dev boilerplate (which is what LLMs are pretty good at). There was a highly productive era when no one cared about static typing and just dumped data in MongoDB, shared a single layout for the whole app, and could easily create page after page of an MVP overnight, without any extra tooling being needed. Obviously not without tradeoffs. But that’s the era Meteor was originally designed for. And even libraries like simpl(e)-schema could be used to generate forms in Blaze, so… you could get an ad-hoc DB schema, form generator, and request validator all in one… Now most LLMs will generate those three separately (as they probably should, I guess).
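To make that “one schema, three jobs” idea concrete, here’s a rough, made-up sketch in plain JS. This is not the real simpl(e)-schema or AutoForm API — just a minimal stand-in showing one schema object driving both validation and form generation:

```javascript
// Hypothetical mini-schema: one definition, reused for validation and forms.
const schema = {
  title: { type: "string", required: true, max: 80 },
  votes: { type: "number", required: false },
};

// Job 1: request/DB validation driven by the schema.
function validate(doc) {
  const errors = [];
  for (const [key, rule] of Object.entries(schema)) {
    const value = doc[key];
    if (value === undefined) {
      if (rule.required) errors.push(`${key} is required`);
      continue;
    }
    if (typeof value !== rule.type) errors.push(`${key} must be a ${rule.type}`);
    if (rule.max && String(value).length > rule.max) errors.push(`${key} is too long`);
  }
  return errors;
}

// Job 2: form generation driven by the same schema.
function renderForm() {
  return Object.entries(schema)
    .map(([key, rule]) =>
      `<input name="${key}" type="${rule.type === "number" ? "number" : "text"}"${rule.required ? " required" : ""}>`)
    .join("\n");
}

console.log(validate({ votes: "ten" }));
// → [ 'title is required', 'votes must be a number' ]
console.log(renderForm());
```

The point is just that one declarative definition can feed several layers at once, which is the convenience the Blaze-era stack had and that LLMs now tend to generate three separate times.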

The other thing is they seem to work better with libraries with a fairly uneventful development history, or a bulk of data from a recent era. Otherwise you can wind up with the whole dance of “whoops, you’re absolutely right, Meteor did have a major release in 2024 which changed the entire API from synchronous to asynchronous - that’s not just you going crazy but your hard earned intuition guiding you to spot what others have trouble realising. It’s my fault for somehow missing this. Let me rethink this plan” yada yada yada.
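For reference, the 2024 change in question is real: Meteor 3 dropped Fibers, so the old synchronous collection calls became explicit async variants (e.g. `findOneAsync`, `updateAsync`). The collection below is a made-up in-memory stand-in, just to show the shape of the migration the models keep “rediscovering”:

```javascript
// Hypothetical in-memory stand-in for a Meteor collection.
const docs = new Map([["t1", { _id: "t1", done: false }]]);

// Meteor 3 style: every server-side DB call returns a Promise.
const Tasks = {
  async findOneAsync(id) { return docs.get(id); },
  async updateAsync(id, fields) { Object.assign(docs.get(id), fields); return 1; },
};

async function completeTask(id) {
  const task = await Tasks.findOneAsync(id);            // Meteor 2: Tasks.findOne(id)
  if (!task.done) {
    await Tasks.updateAsync(id, { done: true });        // Meteor 2: Tasks.update(id, ...)
  }
  return Tasks.findOneAsync(id);
}

completeTask("t1").then((t) => console.log(t.done));    // prints true
```

A model trained mostly on pre-2024 Meteor code will happily emit the commented-out synchronous form, which is exactly the kind of “uneventful history” problem described above.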

Most recently for me this was regarding Beanie ODM (think Pythonic Mongoose), specifically Motor (the old community-led async driver for MongoDB in Python) and PyMongo (the official? sync driver), because the two technically merged under one sync/async banner, and I think the MongoDB team is working on it now anyway (along with the Django MongoDB driver). But a lot of old references to Beanie remain online from when it still needed Motor, which was, I think, pre-mid-2024? (Just like Meteor v3…)

That said I’ve still seen them cause render loops with React so… sometimes the underlying library is just a pain in the rear or the workflow you’re trying to encode is too poorly defined. I guess that’s where doing a bit of manual coding or being careful with an AGENTS.md or something helps but…


My two cents worth:

First of all, I agree a lot is hype from social media where some people just try to get more clicks on their content.

There’s limited use for several agents in parallel, especially if you have just one or two repos.

We have a very complicated infrastructure: a landing page as a static site on AWS S3, frontend and backend as separate MeteorJS apps, another MeteorJS app on-prem (the rest is in the cloud), AWS Lambda for long-running backend jobs, another MeteorJS app for admin, two different MongoDB databases, and BullMQ coordinating a lot of job queues and workers so that the airtight on-prem app is able to “work” with the backend.

So in this setup you can let AI agents work in parallel on different aspects of the various apps; however, while one AI is running the tests you can’t have another AI agent make any changes, as that would break the tests that are running.

So I wonder what exactly these people are letting the AI do? One advantage that AntiGravity has over all the other tools is that it’s capable of understanding all these apps and how they interact with each other. It has all the code in its memory. So when it makes a change in the backend, it “knows” it needs to change the remote method call in the frontend app as well.

Other tools can’t do that. On the other hand, if you want to work on your CI/CD or deployment, then Claude Code is much better due to its CLI access.

I also tend to let AI solve the very complicated problems, e.g. those that require a good understanding of genetic genealogy.

UPDATE:

I’d like to give an example of why, for me, AntiGravity is the right tool (you might have a different setup/challenges and thus need to use, e.g., Claude Code). I asked AntiGravity to look into our implementation of BullMQ for our job queue system, which runs on two different apps. I used Opus 4.6 for the question, and it took 17 seconds to identify the problem (BTW, a problem introduced by humans overlooking that we had implemented the lock handling slightly differently on the two apps). It can do so easily because it has both repos in memory (and the backend app is very large), whereas I know from my own usage that Microsoft’s VS Code Copilot would go off running grep commands and searching for code; it doesn’t have the ability to understand both workspaces easily.
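The post doesn’t say what the lock difference actually was, so here is only a generic, made-up illustration of the kind of cross-repo drift involved. In BullMQ, a worker’s `lockDuration` (in ms) is how long it “owns” a job before the job can be considered stalled, so two apps configuring workers for the same queue with different values is a classic mismatch:

```javascript
// Hypothetical worker options for the same queue in two different repos.
// (In real code these would be passed to BullMQ's `new Worker(...)`.)
const appAWorkerOpts = { lockDuration: 30_000 }; // made-up app A setting
const appBWorkerOpts = { lockDuration: 5_000 };  // made-up app B setting

// A mismatch like this can let one app's long-running jobs be flagged as
// stalled and retried while still executing. Spotting it requires seeing
// both configs at once — which is the point about having both repos in memory.
console.log(appAWorkerOpts.lockDuration === appBWorkerOpts.lockDuration); // prints false
```

Again, this is an assumed illustration of the bug class, not the actual code or values from the two apps.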
