Thoughts on code generation AI agents

I tried Google Jules and Claude Code on a relatively large project and had them refactor some aspects that touched many API surfaces. Here are my opinions on them.

Google Jules

  • Surprisingly good at understanding the task and coming up with a relatively solid implementation plan.
  • Very good at understanding the project structure and knowing exactly which files to modify for the task.
  • The feedback loop is very poor because it runs in its own sandbox environment. I can’t navigate or run the project there; I have to ask it to commit, then pull the changes locally before I can work with them.
  • Although it understands most of the structure and coding standards, it gets many third-party libraries wrong and invents nonexistent methods. It also has a hard time with internal packages that have complex dependencies and often uses them incorrectly.
  • It requires a lot of manual intervention, but it doesn’t flag where that intervention is needed. Sometimes I had to go through the full diff just to figure out what required attention.
  • The implementation was poor in some cases. It wrote too many comments, even for obvious code, and often left commented-out dead code behind for edge cases.
  • Some of these issues could probably be mitigated with better prompts, but there is no obvious way to include persistent rules, similar to Cursor’s rules or a CLAUDE.md file.

Claude Code

  • Excellent feedback loop. It runs locally, so I can quickly see and edit the changes. It’s also relatively fast.
  • Also quite good at understanding the task.
  • It provides clear instructions for manual intervention, and many tasks (like command execution) can be pre-approved in the configuration file.
  • Extra context can be provided via a CLAUDE.md file for things like rules, coding style, documentation instructions, etc.
  • Many of its implementations were very good, while some were just okay.
  • It runs for quite a while and often edits the same files multiple times.
  • It runs into errors frequently, which triggers a cycle of rethinking and re-editing.
  • It burns through tokens like crazy. Too expensive.
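
To illustrate the pre-approval point above, here is a minimal sketch of what a project-level settings file for Claude Code might look like, assuming it lives at .claude/settings.json and uses the permissions allow/deny lists. The specific commands are hypothetical examples, not the ones from my project, and the exact rule syntax should be checked against the current Claude Code documentation.

```json
{
  "permissions": {
    "allow": [
      "Bash(npm run lint)",
      "Bash(npm run test:*)"
    ],
    "deny": [
      "Bash(git push:*)"
    ]
  }
}
```

With something like this in place, the agent should be able to run the lint and test commands without asking each time, while anything on the deny list stays blocked.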