Show HN: Claude's Code – tracking the 19M+ commits generated by Claude on GitHub
www.claudescode.dev
I was curious about the amount of code on GitHub that is generated with Claude Code, and this is my attempt at finding that answer.
Spoiler alert: It's a lot - around 19M commits by my count
In a nutshell, it is a dashboard that presents some basic, and hopefully interesting, stats about commits signed by Claude Code in public GitHub repos.
Not all commits are signed (via the author field, or a commit "trailer"), and many repos are private, which means Claude's reach is probably wider than what you see here. But I think it's enough to see the spread and learn a bit about how it's used.
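For context, a "signed" commit here is one that carries Claude in the author field or a trailer roughly like this at the end of the commit message (exact wording varies between Claude Code versions):

    Co-Authored-By: Claude <noreply@anthropic.com>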
Technology-wise, it's a pretty basic Next.js app with Recharts for graphing and PostgreSQL for the DB. I started off using BigQuery because I estimated I would need the analytical scale, but I eventually pivoted to Postgres because the small writes and frequent reads for deduplication became too expensive.
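To give a feel for the write pattern that made BigQuery pricey: each worker run lands small batches of rows and has to skip commits that are already stored. A minimal sketch of what that dedup-on-write can look like in Postgres (table and column names here are illustrative, not the app's actual schema):

    import { Pool } from "pg";

    const pool = new Pool({ connectionString: process.env.DATABASE_URL });

    // Upsert a batch of commits, using the SHA as the natural key so that
    // re-reading the same search page is a cheap no-op instead of a duplicate row.
    async function saveCommits(commits: { sha: string; repo: string; authoredAt: string }[]) {
      const client = await pool.connect();
      try {
        await client.query("BEGIN");
        for (const c of commits) {
          await client.query(
            `INSERT INTO commits (sha, repo, authored_at)
             VALUES ($1, $2, $3)
             ON CONFLICT (sha) DO NOTHING`,
            [c.sha, c.repo, c.authoredAt]
          );
        }
        await client.query("COMMIT");
      } catch (err) {
        await client.query("ROLLBACK");
        throw err;
      } finally {
        client.release();
      }
    }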
The ingestion/backfill job is the more interesting part since I went from severely under-engineering it (start smol and all that) to ending up with a bare-bones, but capable, ETL pipeline.
The main challenge in reading the data was GitHub's rate limits, both on the search API and on the GraphQL API: search allows 30 req/min and GraphQL 5,000 req/hour, per access token. Because of that difference, and differences in response time, I split the work:
1. A batch of search workers writes basic commit info to a table, paging and splitting queries to find as many of the commits as we can.
2. Enrichment workers read from that table and fill in the info we can't see in the search results. Lines added/deleted and repo information are added this way. (A rough sketch of the search side follows below.)
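To make the split concrete, here is roughly what a search worker can look like, paced to the 30 req/min search limit. The query string and the handling of results are assumptions for illustration, not the exact code behind the site:

    // Sketch of a search worker: page through GitHub's commit search for
    // Claude-signed commits and hand the basics off for later enrichment.
    const TOKEN = process.env.GITHUB_TOKEN;
    const QUERY = '"Co-Authored-By: Claude"'; // assumed trailer text; the real job splits queries further

    async function searchPage(page: number) {
      const url = `https://api.github.com/search/commits?q=${encodeURIComponent(QUERY)}&per_page=100&page=${page}`;
      const res = await fetch(url, {
        headers: {
          Authorization: `Bearer ${TOKEN}`,
          Accept: "application/vnd.github+json",
        },
      });
      if (!res.ok) throw new Error(`search failed: ${res.status}`);
      return res.json();
    }

    async function run() {
      // GitHub search only exposes ~1,000 results per query, so the real job
      // also splits by date range; this sketch just pages a single query.
      for (let page = 1; page <= 10; page++) {
        const data = await searchPage(page);
        for (const item of data.items ?? []) {
          // In the real pipeline these rows go into the commits table for the
          // enrichment workers (GraphQL) to fill in lines added/deleted and repo info.
          console.log(item.sha, item.repository?.full_name, item.commit?.author?.date);
        }
        // Stay under the 30 req/min search limit: ~1 request every 2 seconds.
        await new Promise((r) => setTimeout(r, 2000));
      }
    }

    run().catch(console.error);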
There is currently a bit of lag in reading these commits, and the job is still pulling historical commits, which is why the most recent dates look a bit low on commits and why some repos don't have a language set yet.
I wouldn't say it's 100% done - I still want to improve the ingestion, and I think there is more I can extract from the data - but I have definitely enjoyed looking at what I have so far.
Let me know if you have an idea for what I can add to the dashboard, or can think of something else I should also be reading.
For some more info on my methodology and the evolution of the backfill job, head to the About page. :-)
This is great, thank you! I've been looking for something like this for a while now.
Interesting that whoever is developing BroadwayScore.com appears to be one of the biggest users of Claude Code, if I'm reading this correctly (25m lines created, 20m lines deleted when 'since launch' is toggled). Might give you an idea of how this is being used. https://github.com/thomaspryor/Broadwayscore
I'm glad you like it!
I have been enjoying looking into the projects that use it heavily. That one, for instance, was built entirely this year and the owner hasn't been active on GitHub before - again showing that agents are drawing in people who either didn't have the skills or didn't have the time to build out some of their ideas.
Another view I like keeping an eye on is projects with higher star counts - that often excludes the "pet projects" and gives you an idea of how larger teams or popular repos are applying it differently from the general "vibe coders".
Did you try out DuckDB?
I considered it, but when I started the project I was using a Supabase DB for the backend since their free tier is quite nice, and after the switch to and from BigQuery, PostgreSQL was the easier migration. I also thought Postgres might be better suited to the backfill/ingestion job due to the frequent row-level reads and writes. That said, I know DuckDB can use Postgres as its backend, and I might consider that if the current model starts to struggle.
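For what it's worth, here is roughly what that looks like with DuckDB's postgres extension, driven from the Node duckdb package - a sketch under the assumption of a plain connection string and an illustrative "commits" table, not something I have wired into this project:

    import duckdb from "duckdb";

    // In-memory DuckDB instance that attaches the existing Postgres database, so
    // analytical queries run through DuckDB while Postgres stays the store of record.
    const db = new duckdb.Database(":memory:");

    db.exec(
      `INSTALL postgres;
       LOAD postgres;
       ATTACH 'dbname=claudescode host=localhost user=app' AS pg (TYPE postgres);`,
      (err) => {
        if (err) throw err;
        // Example rollup over the attached Postgres tables
        // ("commits" here is an illustrative table name, not the app's schema).
        db.all(
          `SELECT repo, count(*) AS commit_count
             FROM pg.commits
            GROUP BY repo
            ORDER BY commit_count DESC
            LIMIT 10;`,
          (err, rows) => {
            if (err) throw err;
            console.table(rows);
          }
        );
      }
    );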
I have also seen some benchmarks that suggest the gap between DuckDB and Postgres isn't always so substantial: https://jsonbench.com/#eyJzeXN0ZW0iOnsiQ2xpY2tIb3VzZSI6dHJ1Z...
What makes performance on JSONBench relevant to this project? Are you storing and querying large JSON object blobs in the database?
That's a fair question; I shouldn't have posted that without context. I was initially considering storing JSON to speed up the ingestion, since the results arrive as JSON, which is how I found JSONBench. But, of course, parsing the JSON isn't the bottleneck for me - rate limits and response times are - so I didn't end up going that route.
What I mentioned first in my message was a much bigger driver - convenience. If the analytics become much more complex, I might revisit DuckDB or another OLAP solution.