Thoughts on code generation AI agents
I tried Google Jules and Claude Code on a relatively large project, asking each to carry out a refactor that touched many API surfaces. Here are my impressions of both.
Google Jules
- Surprisingly good at understanding the task and coming up with a relatively solid implementation plan.
- Very good at understanding the project structure and knowing exactly which files to modify for the task.
Negatives:
- The feedback loop is very poor because it runs in its own sandbox environment. I can’t navigate or run the project there; I have to ask it to commit, then pull the changes locally before I can work with them.
- Although it understands most of the structure and coding standards, it gets many third-party libraries wrong and invents nonexistent methods. It also has a hard time with internal packages that have complex dependencies and often uses them incorrectly.
- It requires a lot of manual intervention, but it doesn’t make this need clear. Sometimes I had to go through the full diff just to figure out what required attention.
- The implementation was poor in some cases. It wrote too many comments, even for obvious code, and often left commented-out dead code behind for edge cases.
- Some of these issues could be improved with better prompts, but there is no obvious way to supply persistent rules, the way Cursor’s rules or a CLAUDE.md file do.
Claude Code
- Excellent feedback loop. It runs locally, so I can quickly see and edit the changes. It’s also relatively fast.
- Also quite good at understanding the task.
- It provides clear instructions for manual intervention, and many tasks (like command execution) can be pre-approved in the configuration file.
- Extra context can be provided via a CLAUDE.md file for things like rules, coding style, documentation instructions, etc.
- Many of its implementations were very good, while some were just okay.
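To make the pre-approval point above a bit more concrete, here is a rough sketch of generating such a configuration. It assumes the project-level settings file lives at .claude/settings.json and that pre-approved shell commands are listed under permissions.allow as "Bash(...)" rules; the helper names (build_settings, write_settings) and the example commands are hypothetical, so check the current Claude Code documentation before relying on the exact format.

```python
import json
from pathlib import Path


def build_settings(allowed_commands):
    """Return a settings dict that pre-approves the given shell commands."""
    # Assumption: each rule is written as "Bash(<command>)" under permissions.allow.
    return {"permissions": {"allow": [f"Bash({cmd})" for cmd in allowed_commands]}}


def write_settings(project_root, allowed_commands):
    """Write the settings to <project_root>/.claude/settings.json and return the path."""
    # Assumption: this is where the project-level settings file is read from.
    path = Path(project_root) / ".claude" / "settings.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(build_settings(allowed_commands), indent=2) + "\n")
    return path


if __name__ == "__main__":
    # Hypothetical commands; swap in whatever your project actually runs.
    print(json.dumps(build_settings(["npm run lint", "git status"]), indent=2))
```

Keeping the allow list short and specific seems like the safer default: anything not listed still prompts for approval, so only the routine commands get waved through.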
Negatives:
- It runs for quite a while and often edits the same files multiple times.
- It runs into errors frequently, which triggers a cycle of rethinking and re-editing.
- It burns through tokens like crazy. Too expensive.