Testing distributed systems with AI agents

I like the “claim-driven” framing.

For stateful systems, tests named after setup details often get weakened over time. Tests named after the claim they are trying to falsify are harder to water down.

The part I’d be most interested in is how well this works for business invariants like idempotent posting, no lost acknowledgements and recovery after partial failure.

shenli3514 23 hours ago

Idempotency is what bites me most in practice — I've been driving these against an unreleased database I work on. The main trap is using the op_id as the idempotency key rather than a business key the client reuses on retry. When they're the same thing, the checker is trivially true and the test passes without testing anything.
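To make the trap concrete, here is a minimal Python sketch. The names (`Attempt`, `duplicate_postings`, `exercises_retries`) and the ledger shape are assumptions for illustration, not taken from the skill or from any real checker. The point is that the duplicate check only means something if the workload actually reused a business key across attempts.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    op_id: str         # unique per attempt, minted by the harness (hypothetical field)
    business_key: str  # reused by the client when it retries (hypothetical field)
    amount: int


def duplicate_postings(ledger_keys):
    """Return the business keys that appear more than once in the final ledger."""
    counts = Counter(ledger_keys)
    return sorted(key for key, n in counts.items() if n > 1)


def exercises_retries(attempts):
    """True if at least one business key was attempted more than once.

    If this is False, the duplicate check cannot fail and is vacuous.
    """
    counts = Counter(a.business_key for a in attempts)
    return any(n > 1 for n in counts.values())
```

If the harness sets `business_key` equal to `op_id`, `exercises_retries` returns False for every run, which is a cheap way to flag a test that passes without testing anything.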
No-lost-ack is conceptually the same shape with a simpler property (every acked write shows up at the end), but it breaks the same way most checkers break — if the recorder treats timeouts as success or failure instead of "unknown," real lost writes silently disappear.
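A rough sketch of the three-way outcome idea, in Python. The `Outcome` enum, the history shape and the result keys are assumptions made up for this example. Timeouts are kept as `UNKNOWN`, so they are never counted as acked (which would produce false "lost" reports) and never counted as failed (which would make a write that actually landed look unexpected).

```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"            # the SUT acknowledged the write
    FAIL = "fail"        # the SUT definitely rejected the write
    UNKNOWN = "unknown"  # timeout or dropped connection: may or may not have applied


def check_no_lost_acks(history, final_read):
    """history: iterable of (value, Outcome); final_read: values seen at the end."""
    final = set(final_read)
    acked = {v for v, o in history if o is Outcome.OK}
    unknown = {v for v, o in history if o is Outcome.UNKNOWN}
    return {
        # Acked but missing at the end: the invariant is violated.
        "lost": sorted(acked - final),
        # Indeterminate writes that did land: allowed, but worth reporting.
        "recovered": sorted(unknown & final),
        # Present at the end although failed or never attempted.
        "unexpected": sorted(final - acked - unknown),
    }
```

The invariant holds when `lost` is empty. A non-empty `unexpected` is a separate finding.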
Recovery after partial failure is where the AI-agent angle gets shaky honestly. Quiescence is the hard part. Agents will declare a system "recovered" while compaction is still running in the background. The skill forces a three-part check (no in-flight ops, no pending background work, replicas converged) before the invariant runs. How reliably that holds up against a specific SUT, I'm still figuring out.
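One way the three-part gate could look, as a hedged Python sketch. The function name, the probe names and the "stable for N consecutive polls" rule are assumptions, not what the skill actually does. Each probe would have to be implemented against the specific SUT.

```python
import time


def wait_for_quiescence(probes, timeout_s=30.0, poll_s=0.5, stable_polls=3,
                        clock=time.monotonic, sleep=time.sleep):
    """Return True once every probe passes for `stable_polls` consecutive polls.

    probes: mapping of name -> zero-argument callable returning bool.
    Returns False if the deadline passes first.
    """
    deadline = clock() + timeout_s
    streak = 0
    while True:
        if all(probe() for probe in probes.values()):
            streak += 1
            if streak >= stable_polls:
                return True
        else:
            # Any failing probe resets the streak, so a brief lull
            # between background jobs does not count as recovered.
            streak = 0
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

Usage would be something like `wait_for_quiescence({"no_inflight_ops": ..., "no_background_work": ..., "replicas_converged": ...})`, with the invariant check running only if it returns True.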
cyanydeez 22 hours ago

I think all these scripts become poor where they're context based as opposed to actual guardrails; what we need is various silo'd protocols like a ssh protocol that keeps the harness producing work through the protocol rather than a bunch of loosely based bash scripts, etc. Plus, the harness needs to be outside the environment so it's not something you have to install ever on a remote system, whether it's a container, a vm, a ssh location. We shouldn't base everything around running bash without a secure tunnel into the location of interest.
The failure mode of these tools is self destructive in many cases.

aphyr 22 hours ago

Welp. Glad to see Li Shen's using the last fifteen years of my work to automate away my job. :-/

-- edit --

I've seen clients and some colleagues working on things like this, and I can't seem to put into words how disheartening it is. With the exception of some private analysis work, I've shared everything I've built, with everyone, for free. Papers like Elle took years to think through, implement, test, and write. That's free. High-quality checkers, Knossos, Jepsen itself, and the analyses I've put my life into: all public, all free. I put a lot of time into docs and support; essentially all unpaid. I teach classes and give conference talks to make these techniques broadly accessible because I want other engineers to be able to make high-quality systems.

At the same time, I've got a giant pile of debt from an old house that just won't quit throwing curveballs at me, and it's gonna be a few more decades before I can retire. The fact that my clients are willing to pay for this work is why I can invest so much time in R&D and give it all away. When I see someone roll in and just tell an LLM "Go use Jepsen and Elle and figure this out", it's like... well fuck. Is this even possible any more?

Thankfully, LLMs are still really bad at my job, but I don't know if, or how long, that will last. They also don't need to be good to be useful.

And if these LLM tools work, it's good, right? They find bugs, systems get safer. I want systems to be safer. On the other hand, I'm motivated to share what I do because I really want to help people. If it's just LLMs... it feels hollow. I think about this every time I've tried to work on open-source in the last few months. When I spend hours trying to figure out how to keep naming consistent, how to preserve compatibility over a decade, how to make complex code approachable through quality documentation... I have a person in mind. Someone I'll never meet, but they'll see that work, and their life will be a little easier, and maybe they'll smile. I've been talking with my therapist about it: how the work I used to do thinking about other human beings now feels purposeless. How the effort I put into making these tools and ideas accessible will inevitably cannibalize my own employment, because someone, somewhere, is going to tell an LLM "Hey, go do that", and I work in a very, very small niche. It feels like incipient depression.

Recently I've been thinking about taking Jepsen and its supporting libraries closed-source, and changing the way I write reports--instead of teaching people how to test and what to look for, just telling people the results. I don't want to do this. It's bad for everyone, but maybe it buys me a few years of runway. Enough to pay down some of the debt and figure out what I can do next with this body.

Fuck.

cyanydeez 21 hours ago

Unfortunately, according to the cults in San Fran, your only choice is to "git good" lest you become part of the permanent underclass.
In other words: fascism is coming and you either lick the boot or you get stomped.
- digitaltrees 18 hours ago
  
  Unfortunately history supports this broadly.
goosejuice 20 hours ago

I have no idea if it would be fun or a good return on investment for you, but I would happily pay for a digital version of your distsys class aimed at practitioners. Some kind of ebook, perhaps with accompanying whiteboard/lectures.
- cyberpunk 19 hours ago
  
  Yeah, I’d pay for this too. And a book. :)
- camyule 7 hours ago
  
  I’d also happily pay for a book covering the work you do aphyr.
cyberpunk 19 hours ago

Yeah, I feel ya man. It’s a weird time to be in tech when it often feels like the last years of our work are now instantly reproducible.
I’m not sure i wanna stay on the ride much longer, at least in a corp setting. I guess i don’t have much of a choice.
Thanks for Jepsen, though, it’s made a couple of my applications much better in ways I wouldn’t have managed without it; even if I have to relearn clojure every time I pick it up, and those applications resulted in real jobs and careers for a bunch of people. It’s not going to pay for your house, but it’s all I’ve got.
hugs.
digitaltrees 18 hours ago

There are a lot of great open source projects that have a paid infrastructure product that lets people pay to use the core tech. Perhaps you could productize your system. I would personally pay to have something set up and usable so I don’t have to. I think the pattern you describe is clearly valuable.
chickensong 15 hours ago

> And if these LLM tools work, it's good, right? They find bugs, systems get safer. I want systems to be safer. On the other hand, I'm motivated to share what I do because I really want to help people. If it's just LLMs... it feels hollow.
I get that you have a financial issue, but perhaps you don't need to be conflicted about open-sourcing your work as far as helping people goes? LLMs are tools for people. Code, research, standards, etc... are all means to an end. Maybe the agent operator doesn't read or understand your work, but the guy who built the agent skills likely did. Progress moves upward, while standing on the work of those who came before us.
LLMs have lowered the barrier to creating software and can hide a lot of source material, but your work is clearly having an impact here. If your goal is to help people make better software, that's still what's happening. The industry shift is happening regardless, so we might as well embrace the positives instead of focusing on the negatives IMHO.
Moving to a closed-source model for financial reasons is a totally separate issue IMO, and I wish you good luck and prosperity regardless of your decision.
squirrellous 14 hours ago

It’s shocking to me that you of all people should have financial issues. You are a legend in the community! By all means, take things closed source if it even just helps a little bit. As a profession we’ve all been hurt by over-sharing.
deterministic 11 hours ago

OK so let me get this straight:
1. You chose to give your work away for free.
2. You are complaining that you haven't made money from your work.
Is that a fair interpretation of your argument?
nvarsj 6 hours ago

This sucks to hear.
I honestly think the rise of LLMs will be the death of open source in the long run. Already, apparently, quality of OSS has dropped significantly since 2025 (so most models stop training on github after this).
I don't think a lot of OSS authors quite understand the extent to which models like claude/codex rely on their work. I'd bet money there are extensive curated tasks using your tooling for post-training. With 0 attribution or anything, these models are using your work wholesale to build sophisticated agents that can do your job.
Yeah it's depressing as hell. I guess it's the same thing for artists and musicians and writers.
P.S. I can sympathise with the old house issues! I bought a 1901 terraced property, it's an absolute money pit.
pjmlp 5 hours ago

I relate a ton to what you are going through.
I discuss a lot of stuff, but that is because I am a nerd at heart, and would rather play with technology, read papers, listen to podcasts and stuff, than watch depressing TV content.
However in the world of enterprise consulting a similar trend has been happening during the last 20 years.
First offshoring, then the rise of cloud based infra, serverless, SaaS and iPaaS, and now AI based orchestrations on top of iPaaS and serverless.
Meaning for the same kind of requirements, a team playing puzzle with those kinds of products can be reduced to one third of what used to be required about a decade ago.
Then what happens to the other two thirds that now don't have anything to do, and whose salary is used instead on those licenses?
mrothroc 43 minutes ago

I've been specializing in distributed systems for nearly 35 years. I've read your work, and it's shaped my thinking. When you say you have a person in mind when you write, I am that person. Thank you for what you've done.
I don't think this replaces you. The hard part of reliability is understanding the failure modes in the context of the business. No one has unlimited time or money, we always have to make tradeoffs. Only experienced humans have both the ability to interrogate the stakeholders and a vision broad enough to understand what to pursue versus what to give up.
Tools like this make the grind part of the job easier. They do not replace the holistic view you need to be able to confidently tell someone "worry about X, do not worry about Y".

jumploops 21 hours ago

Indirectly related, but has anyone else found repeatable success with pure markdown skills?

I’ve built a similar workflow (but for system design/execution) and it works surprisingly well with the frontier models.

The skill includes scripts to ensure the work was actually done/followed, but I’ve been testing it without the scripts and it does a decent job.

Yesterday in GPT-5.5 xhigh[0] however I noticed some hallucinations, where the model stated it had created files, when in fact it hadn’t.

A small hiccup like this is usually fine, as the model realizes the files don’t exist sometime later, but in this particular instance, it claimed the files were created and then just continued on.

tl;dr - I fell into the trap of trusting markdown-only workflows, just to be bitten by the models hallucinating steps.

[0]xhigh is on, but in this particular turn there was no reasoning presented, so it may have been a degradation of the LLM/harness.
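
For the "claimed the files were created" case, the guardrail script can be tiny. A minimal Python sketch, assuming the skill makes the model list the paths it claims to have written (the function name and that convention are hypothetical):

```python
from pathlib import Path


def missing_artifacts(claimed_paths, root):
    """Return the claimed paths that are absent or empty under `root`."""
    root = Path(root)
    missing = []
    for rel in claimed_paths:
        path = root / rel
        # A claimed file must exist as a regular file and be non-empty.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing
```

Running this after each step and refusing to continue on a non-empty result turns "the model said it did it" into something checked outside the model.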