Thoughts on code generation AI agents

2025-06-14

I tried Google Jules and Claude Code on a relatively large project and had each of them refactor parts of the code that touched many API surfaces. Here are my impressions of both.

Google Jules

Surprisingly good at understanding the task and coming up with a relatively solid implementation plan.
Very good at understanding the project structure and knowing exactly which files to modify for the task.

Negatives:

The feedback loop is very poor because it runs in its own sandbox environment. I can’t navigate or run the project there; I have to ask it to commit, then pull the changes myself before I can work with them.
Although it understands most of the structure and coding standards, it gets many third-party libraries wrong and invents nonexistent methods. It also has a hard time with internal packages that have complex dependencies and often uses them incorrectly.
It requires a lot of manual intervention, but it doesn’t flag where that intervention is needed. Sometimes I had to go through the full diff just to figure out what required attention.
The implementation was poor in some cases. It wrote too many comments, even for obvious code, and often left commented-out dead code behind for edge cases.
Some of these issues could be mitigated with better prompts, but there is no obvious way to supply persistent rules, as you can with Cursor’s rules or a CLAUDE.md file.

Claude Code

Excellent feedback loop. It runs locally, so I can quickly see and edit the changes. It’s also relatively fast.
Also quite good at understanding the task.
It provides clear instructions for manual intervention, and many tasks (like command execution) can be pre-approved in the configuration file.
Extra context can be provided via a CLAUDE.md file for things like rules, coding style, documentation instructions, etc.
Many of its implementations were very good, while some were just okay.
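
To illustrate the pre-approval point above, here is a minimal sketch of a project-level settings file, assuming it lives at .claude/settings.json and uses the permissions.allow / permissions.deny rule format as I understand it from the Claude Code documentation. The specific commands are hypothetical examples, not the ones from my project, and the exact rule syntax is worth checking against the current docs.

```json
{
  "permissions": {
    "allow": [
      "Bash(npm run lint)",
      "Bash(npm run test:*)",
      "Bash(git diff:*)"
    ],
    "deny": [
      "Bash(git push:*)"
    ]
  }
}
```

With something like this in place, the listed commands run without a confirmation prompt, while anything not listed still asks first.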

Negatives:

It runs for quite a while and often edits the same files multiple times.
It runs into errors frequently, which triggers a cycle of rethinking and re-editing.
It burns through tokens like crazy. Too expensive.