Show HN: Boxes.dev: ditch localhost; run Claude Code and Codex in the cloud

94 points by nab 1 day ago

Hi HN, we’re Nick and Drew, and we’re building boxes.dev – the first cloud-only agentic dev environment (ADE) that gives every Codex and Claude Code agent its own cloud computer.

We’re two engineers who previously built Gem (co-founder/CTO and first hire), and we spent the last year coding almost exclusively using Codex and Claude Code. It’s been a huge change to how we code, and it’s been exhilarating seeing the models keep getting better – but we eventually realized that developing on localhost was holding us back:

- Git worktrees are clunky to set up and use for parallelizing work
- It’s 2026, but somehow everyone is still walking around with laptops cracked open or SSHing into mac minis in their garage so their agents don’t stop working.
- Mobile is treated like an afterthought even though coding is just texting now
- We started hitting resource constraints when multiple parallel agents test their own work by running the full app locally.

We tried different products, but couldn’t find any that solved all of our pain points – so we pivoted and decided to just build the ADE we wanted for ourselves.

Boxes.dev is a desktop and mobile app that lets you run Claude Code, Codex (using your subscription!), and the full dev environment for whatever you’re building, all on remote compute. It’s similar to Conductor or the Codex desktop app, except everything is in the cloud.

We use coding agents to scan your local dev setup and port it to the cloud. Then every Claude Code/Codex thread starts from a snapshot of the full setup, with its own filesystem and compute. No more git worktrees, no more cracked-open laptops, and your coding agents can actually test their work end-to-end because they can run your full app in isolation.

We’ve mirrored the Claude Code and Codex UX to feel natural to power users, and also have a fully-featured mobile app (no handoffs or remote control), plus scheduled automations and a Slack integration.

We’re obviously biased, but we’ve been building boxes.dev with boxes.dev for months and it’s honestly been a gamechanger. It’s hard to go back once you realize how much localhost has been limiting you; based on early feedback from beta testers, we’re increasingly sure that cloud is the future of agentic coding.

We’d love for you to experience it yourselves! Would appreciate any feedback – and happy to answer any questions on this thread.

cadamsdotcom 18 hours ago

Hello gents, some quick feedback.

I think when you say “ditch localhost” you’re telling me to ditch my fast, instant-response laptop which I own and can peg the CPU of 24/7 for $0, in favour of a tiny cloud VM that I rent forever.

Your infra to run agents and builds for me is compared in my mind to a shell script an agent wrote a year ago and I reviewed once, that fires up my dev server and a local psql (5-10mb ram) on a dynamically allocated port hashed off the name of my current worktree, which it does so it doesn’t clash with other parallel work.
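For readers wondering what a port "hashed off the name of my current worktree" could look like, here is a minimal sketch. The function name `port_for_worktree`, the port range, and the choice of SHA-256 are assumptions for illustration, not details of the script described above.

```python
import hashlib


def port_for_worktree(name: str, base: int = 20000, span: int = 10000) -> int:
    """Derive a stable port in [base, base + span) from a worktree name.

    The range and hash function are illustrative assumptions.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Use the first 4 bytes of the digest as an unsigned integer.
    offset = int.from_bytes(digest[:4], "big") % span
    return base + offset


if __name__ == "__main__":
    for worktree in ("main", "feature-login", "bugfix-123"):
        print(worktree, port_for_worktree(worktree))
```

The same name always maps to the same port, so restarts are predictable. Two different names can still collide, so a real script would probably check that the port is free before using it.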

When the internet slows or dies I rarely notice.

As a cost conscious person who likes it when letters appear as I type, I think I might not be your ICP.

Am I being an asshole? Maybe. Am I going out of my way to tell you what goes on in the minds of people like me when we see offerings like this? Also maybe.

alasano 18 hours ago

It's clearly not for you, you built yourself a version that works for you.
It's not for me either currently but that doesn't mean it's not useful.
Although there's a ton of other providers with their own more or less managed/opinionated flavor of agentic sandboxes.
We cataloged a bunch here
https://engine.build/lab/agent-sandboxes
I'll also add boxes.dev to that list.
But yeah overall your tone was asshole flavored lol. Maybe being a bit of a hater too.

iloveluce 1 day ago

Interesting. Given that OpenAI and Anthropic are steadily moving down the stack (e.g. remote execution, Codex desktop, Claude Code integrations), how do you think about defensibility? Do you expect the labs to eventually offer a cloud-native ADE themselves, and if so, what advantage do you think an independent platform retains?

Also, do you see Boxes supporting OpenCode and self-hosted/local models in the future? If the rented machines have enough RAM and GPU access, it seems like there could be an interesting path toward a model-agnostic platform rather than being tied to the frontier labs.

nab 1 day ago

A few angles to this. One is that coding just went through a massive change over the past year, that is not yet fully settled. Remember when everyone insisted on using IDEs and seeing the code with a chat sidebar? It's hard to argue you'll still be reading code a year from now. And even today, most people are still developing locally, which we're betting will shift to the cloud over the next few years.
I imagine other players will build cloud support in their own apps, but even now there's a lot of distraction for them. Everyone is trying to still support local execution, which looks really different from cloud. A lot of the labs are taking their coding-focused teams and throwing non-coding on their plates as well (the same app for non-engineers slinging google sheets).
We think getting the cloud experience right for software engineers (as well as companies, with their own hosting/development needs) is going to be really hard, and the problem needs a team fully focused on that. We also think that companies are rightly nervous about putting all their eggs in one basket -- their long term development environment should be harness and model agnostic.
RE OpenCode + self-hosted/local models: definitely. There's nothing holding us back from supporting these since we're just linux machines. But we wanted to start with the most popular harnesses first and go from there.
- shivekkhurana 1 day ago
  
  I have gotten into the habit of keeping the Codex app open on my laptop, and using the ChatGPT app on my phone as a remote. Maybe hosting is the way to go!
- gazebo2 1 day ago
  
  >It's hard to argue you'll still be reading code a year from now
  groan
- hasteg 1 day ago
  
  Maybe I'm in the minority but I still program with an IDE and a chat window in the side at work, as well as when I work on side projects. I do like to actually see the code that is getting produced.
- asdev 23 hours ago
  
  how can I short the we won't read code anymore bet?
phsource 21 hours ago

Personally, with our company on Cursor, I can see why model makers are not the best people to go all the way down the stack. Using the right model for the situation will continue to be important, and model makers, by design, do not want to give you the choice to run different models.
Right now, we use:
- Kimi K2.5 for easy fixes, asking about the code, various agentic commands (e.g., summarizing Loom videos for Slack messages)
- Opus 4.8, Sonnet, or Kimi for planning (we find GPT-5.5 to have too terse outputs for plans)
- Kimi K2.5, Composer 2.5, GPT-5.4 mini, etc. for faster implementation (i.e. we don't have to wait around for the slower tokens-per-second generation on Sonnet, etc.)
If we had to only use Opus, Sonnet, and Haiku, I'd definitely be looking to switch harnesses

cohix 1 day ago

I really like the pricing model and focus on not shafting people by auto-sleeping when an agent is done working.

I’ve been working on an [OSS TUI](https://github.com/prettysmartdev/awman) for managing agent execution and workflows in containers (local or remotely) and would love to collaborate if you’re interested.

kordlessagain 23 hours ago

Awman looks great - just installed on Windows and it built the image. I'm trying to figure out how to launch an agent...
FWIW, I'm working on Nemesis8: https://github.com/DeepBlueDynamics/nemesis8 if you want to team up. I'm kordless at gmail or kord at deepbluedynamics
- cohix 20 hours ago
  
  Would love to have some collaboration on the Windows side. Windows builds get published but get little to no manual testing, and are not “fully supported” as of now, but I would like them to be. If you can file issues and/or PRs I’d happily review.
  Will check out your project as well, looks similar to where I started with awman but it’s morphed since then.
nab 19 hours ago

Nice, love the idea of having containers that can work on either local or remote. We may end up reaching out once we start thinking about that.
And thanks re: pricing model. It's a start, and we still have a lot of optimizations to go there to make this as cheap as possible, but we think it's a good base to build upon to make agents as efficient as possible compute-wise.
- cohix 15 hours ago
  
  Please do, interested to add Boxes as a runtime for awman, then workflows could execute in the cloud transparently.
  
  lancetipton 6 hours ago
  
  Hey, I'm working on a project that does a similar thing, and I'd be interested in working together, if you're up for it? You can learn more about it at https://www.threadedstack.com The project is not live yet, but will be in the next month. Check it out, and if you're interested let me know. Thanks
pploug 9 hours ago

Uh, this looks very nice - reminds me of a TUI version of Canopy. If you are interested, we (Docker) have been working on a separate agent sandbox runtime called SBX, built around a MicroVM with a private Docker daemon inside. Maybe there's potential for a collaboration to add support for this runtime - feel free to ping me: per(dot)krogslund(at)docker .com

indigodaddy 1 day ago

I might use this if it supported any old cloud or VPS, and was at most $10/mo. The fact that you have decided that this platform should only live in your own custom cloud is unappealing to me.

Or, open source it and let us run it on our own VPS and keep your expensive cloud for those who want to pay. As it stands would never consider it.

nab 1 day ago

Thanks a ton for the feedback. Yeah, this is something we'll try to solve in the long term. One of the things that makes this work really smoothly for setup and speed is the ability to have a template box that you can instantly snapshot and fork (disk and RAM) to spin up new machines. There aren't many sandbox providers that do that well for running a full app and development environment, but I'm sure there will be more over time. And the per-second pricing means that you only pay when your agent is running.
You could use VPS, but spinning up and down boxes on inactivity takes a long time, and making changes to the template for new machines is less trivial there. If you're only paying for 1 VPS box, then you lose the "multiple independent machines" benefit, and I imagine things start to get more expensive even in the VPS world when you have 10 of them running at the same time (one per thread).
- indigodaddy 1 day ago
  
  Pretty sure you could accomplish this in a large physical server or even a huge resource VM (that has KVM passthrough) with some sort of microvm technology? Then that would obviate the need for "multiple cloud instance per coding thread", it would just be a microvm on the large server.
  Then again, I'm just the guy running his mouth, and you guys are the ones actually doing the work :)
  BTW, looks very polished and thought-through, I may have to still give it a try!
  
  dregitsky 1 day ago
  
  Nope you're exactly right - we're using microVMs today (Firecracker VMs via E2B) and running that same shape but on customer-owned machines is definitely one approach we're looking into.
  And thank you!
- dpark 16 hours ago
  
  Don’t bother listening to people who give you feedback that your product should be free. They aren’t going to buy your product no matter what you do.
aliclark 1 day ago

I'm building something like this that you can run in your own cloud!
https://flexenv.com/
It's nowhere near as advanced as boxes.dev but it's built on the premise of running on any cloud. Indeed I have it running on two different bare metal server providers and I'm about to add a third (Azure) as I'm using my day job as my first customer.
Can I grab your contact details and schedule a demo?

pickleglitch 1 day ago

You can pry localhost from my cold dead hands.

hasteg 23 hours ago

Lol ++. Although my local host for agent/codex stuff is a raspberry pi I connect to on LAN from my gaming/powerful desktop for sandboxing. However my use case seems to be the exact problem they are trying to solve! Might have to take a look into it at some point.
kordlessagain 23 hours ago

Exactly my thoughts when I built all this: https://deepbluedynamics.com
I do provide cloud support for some things like embeddings and crawling, but you can run it local if you want. The only thing closed source is the memory system, but it still runs local if you want it.
nab 23 hours ago

Hahaha, it was a cheap shot :P
The fun thing is that in some way it's a bit inaccurate. We auto port-forward ports from the remote machines to your localhost, so you can still just go to localhost:3000 or whatever, and it goes to whatever machine you have selected in the desktop app. We'll give you a browser in the mobile app too soon to hit "localhost" on mobile.
- pickleglitch 4 hours ago
  
  Yeah, no. I don't give a fuck about port forwarding to make a remote machine look like it's running on my localhost. I don't want to cede any more of my computing needs to the cloud than I absolutely have to. I like to own my hardware, not rent it.

amirhirsch 1 day ago

This looks very clean, great job!

If your CTO didn't spend the past year making an orchestration tool and a baby is he even qualified?

I have a vibe-coded orchestrator that I use to manage my claude and codex sessions across multiple machines, can also spin up sprites from fly.

https://github.com/tinkerer/propanes

warning: it is probably totally unsuitable for anyone else to use except for me

The main idea is a widget that you embed in your apps that lets you select elements, paste screenshots, and prompt what to change. This workflow is very productive for me. I would encourage everyone to add element selection to their orchestrators' prompt composers. If you watch the looms on the readme, note that my CLAUDE.MD calls me a Meat Computer and reminds me to hydrate.

I have a native tauri version that lets you select UI elements through the macos accessibility api too.

The session service uses tmux so you can open a native terminal via ssh and tmux attach. I add a ton of features that are in varying degrees of half-baked: the "brainstorm" mode allows you to do microphone transcription while interacting with the DOM and it will suggest tickets automatically. I've also been working on "bd2sdd" which is supposed to take your strings of user inputs and transform them into a spec, presumably because I also desired regressions. There are Wiggums (which aren't relevant anymore with /goal) and "FAFO swarms" (fan-out, aggregate, filter, optimize) which I use to reverse engineer other pieces of software, and PowWow for codex and claude to work together.

I stole the structured views and remote session control from my friend's Agent Portal project txcl.io which is more fully-baked and narrower scope than propanes.

The ticketing system / tmux / structured views has been slowly evolving into multi-agent chat with a primary "Chief of Staff." It integrated pretty nicely into Slack.

peterldowns 1 day ago

What kind of cpu/memory do the vms get? Is there a way to define the template that's used, so I can say to a new team member, log in to boxes.dev and all the repos and tools are already there for you? And where do you get the machines, can we bring our own? The orchestration layer and product experience tick all the boxes for me but where Codex, Claude, and Cursor have fallen down for me in the past is:

- slow and outdated vms

- horrible/no way to standardize environments for my team

- no way to bring our own compute to help resolve these issues ^

dregitsky 22 hours ago

> What kind of cpu/memory do the vms get?
Default is 4 vCPU / 8 GB memory but it's configurable at the team/project level (can go higher).
> Is there a way to define the template that's used, so I can say to a new team member, log in to boxes.dev and all the repos and tools are already there for you?
Yes we're moving in this direction! For the current public version each person sets up their box and then agent threads start on a snapshot of that box. But for companies, what you laid out is 100% the vision and coming soon. No more eng onboarding, and maybe even give non-technical folks a default dev environment where they can spawn agents and prototype.
> And where do you get the machines, can we bring our own?
Right now we're using MicroVMs with E2B as our infra provider, but for companies we're exploring how to support bringing your own. Happy to chat if interested!
- yodon 21 hours ago
  
  Don't Microsoft and others already offer this?

mklifelife 8 hours ago

I've been using Codex for a small SaaS project recently. Curious whether running everything in the cloud changed your development speed or mainly improved collaboration.

2001zhaozhao 23 hours ago

Really cool tool!

I am building a self-hosted tool (OpenClaw-like) to solve the same problem (running agents 24/7 and access from mobile), which I think is the main alternative approach to cloud tools. I'm glad that other people have recognized the problem.

We currently use worktrees btw. We have a port allocation system that sends ports to the agent automatically, which suffices for smoke testing web projects in parallel but requires some configuration. We've also found that asking agents to find a free port works as well. There's no way to get security-relevant isolation without a containerized system, but everything else can be worked around, and IMO more easily than the setup required to make a project ready for VM/container development.
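As an illustration of the "find a free port" approach, one common technique is to bind a socket to port 0 and let the operating system pick an unused port. This is a minimal sketch; the helper name `find_free_port` is hypothetical and is not taken from the tool described above.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port on the given host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 tells the OS to choose any free port.
        sock.bind((host, 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    print("dev server could listen on port", find_free_port())
```

The socket is closed before the port is returned, so another process could grab the port in between. For parallel smoke tests that is usually acceptable, but it is a race worth knowing about.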

nab 21 hours ago

Nice -- yeah I definitely think it's possible to get configuration figured out for worktrees, but it does require some setup. Glad you all are in a good place on that front.
RE: setup required to make a project ready for VM deployment, not sure how complex your app is, but we've found that coding agents do a pretty good job at finding your dependencies locally, installing them on the remote, and ensuring your app runs on the remote end. If you have a few minutes, try out our auto-setup. Most people haven't had to lift a finger to get their apps running in VMs.

sntran 19 hours ago

It would be nice if there is an extension for VSCode or its forks that let you monitor your agent *running inside* your local machine, or VSCode adds support for it. I want to run agents on the codes I have open, not pushing them to a cloud "box" to run agents on there. But I do like being able to monitor or pick up the next steps from my phone.

Last time I tried to let AI build such extension, it told me that VSCode did not expose extension API to monitor AI chat.

bruckie 20 hours ago

What's the security story? I would love to adopt cloud dev environments that are constrained enough that I can safely run agents in YOLO mode, but not so constrained that they are useless. I would want it to be safe enough to run 80 to 90% of typical development work without supervision, and then have an escape hatch that allows doing other things with human supervision.

edit: and if anyone knows of an existing service that has these properties, I'd love to know about it.

nab 20 hours ago

We're currently running Firecracker VMs in E2B, which provide separate kernel-level isolation. Over the long term, we're open to making it cloud/provider agnostic if you don't like that and want to run in your own cloud.
Right now, since these are just linux machines, agents only have access to what you give them. For most development workflows, this means you're putting development environment variables and keys there.
We're also considering having some sort of key storage construct that allows you to require human confirmation for access to certain other keys, but curious if you have any thoughts on what the ideal UX is.
You can of course just build your ideal solution on the template box (perhaps 2 factor authentication via AWS secrets manager to get access to certain keys that require human confirmation), and update your skills. Then all future threads/forks will have access to that setup.

ai_slop_hater 1 day ago

> ditch localhost; run Claude Code and Codex in the cloud

Why would I want this and not the other way around?

__natty__ 1 day ago

Maybe I’m naive but the longest single workflow I ran was maybe 15 minutes. How do you steer agents to run “overnight”? And what is the quality of such execution?

notrealyme123 1 day ago

Usually coding where the closed loop evaluation takes time.
E.g. code debugging
- nab 1 day ago
  
  This. Very few people are doing this right now (probably because it sucks having 5 copies of your app running in parallel on your laptop), but in the past few months models have gotten really good at testing your running app live. If you have an environment where you can run your full app and models can get at it via playwright and chromium, they can click around, take actions, and actually verify that their code works.
  With boxes.dev I've started pushing agents harder to run the full app and test their work end to end, and send me screenshots as proof. This takes time, sometimes up to 30-40 minutes, but is much more likely to be bug free at the end of the day.
FergusArgyll 1 day ago

In codex, if you use /goal it can go for a while. I've never seen overnight but > 1 hr is common
ai_slop_hater 1 day ago

I think they are just bullshitting.
smrtinsert 1 day ago

"build me a 10 million dollar MRR saas, make no mistakes"
dregitsky 1 day ago

To add to what @nab said, the longest ("overnight") runs are usually after going back and forth to build out a big multi-phase plan doc -- especially when each phase has an extensive manual test plan (agent runs the app in a browser, clicks through the workflow, watches logs, confirms behavior, etc).
These can go for many hours from all the manual testing and debugging. Quality really depends on how much you spec things out beforehand, and how you define the test plan / "success" gates. If the agent can't even run the app to test it then things can definitely go off the rails!
Bnjoroge 1 day ago

Works well for very well-defined tasks. If you have a really big feature like a front end migration, you can use /plan and /goal, which I think are in most harnesses. You can also use other tools that allow your agent to interact with other terminals (I use an ADE called orca that has an orca skill where an agent can spin up different sessions, which differ from subtasks because they share the context and you can choose the harness/model, unlike sub agents). It can also read from the terminal, use your browser or computer and take screenshots, and afterwards prepare a report or something.
alasano 18 hours ago

I'm building https://engine.build
It's meant for the implementation of well defined tasks/specs while orchestrating a review/fix/verify loop.
Every day I have implementations running for hours non stop, it's simply the time it takes to get a proper and well reviewed implementation with LLMs imo.
- yencabulator 2 hours ago
  
  Please stop spamming your project.

drnick1 23 hours ago

Why is this better than running Claude on my own home server? I can remotely monitor the agent with Termux from my phone.

layer8 23 hours ago

It’s home server as a service.
nab 23 hours ago

It's definitely possible to build something like this yourself, but there are a lot of little things we've done that we think add up to a much better UX:
- A dedicated app where you can scroll through your thread/chat history and start a new thread/fork/VM just by typing a new message, along with access to persistent terminals organized by thread/machine. Push notifications as well when your threads are done. Sort of doable via termux/tmux/ssh/etc.
- It takes a little while to get git worktrees set up well to have multiple threads running in parallel. You have to make sure each worktree starts your app on a different port, for example. But some folks are able to get it in a good place through some manual setup work.
- We started hitting resource limits running 5 full copies of our app on 1 laptop (so each agent can test its work separately), but again, if you have a beefy enough machine this might not be a problem.
- We auto-handle port forwarding for you on desktop (and on mobile soon too). Again, you can finagle something like this with tailscale, but it's a pain in the butt to manually track which thread maps to which port on the same machine. We have some magic where if you select a thread in the desktop app, we automatically remap localhost:3000 (or any other port running there) to that thread's machine, so you can just reload your browser locally to test.
These are a few examples. From building this ourselves, we're pretty convinced that you need some sort of UI to do remote development in a super clean way that feels like localhost. But if you're willing to put in the work, you can probably get relatively close yourself!
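To make the "select a thread and localhost:3000 follows it" idea concrete, here is a minimal sketch of the bookkeeping involved. It is not how boxes.dev implements port forwarding; the class `ForwardTable`, its methods, and the host addresses are all hypothetical, and the actual traffic forwarding is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ForwardTable:
    """Tracks which remote machine local ports should be forwarded to."""

    machines: Dict[str, str] = field(default_factory=dict)  # thread id -> host
    selected: Optional[str] = None

    def register(self, thread_id: str, host: str) -> None:
        """Record the remote host that backs a thread."""
        self.machines[thread_id] = host

    def select(self, thread_id: str) -> None:
        """Point all forwarded ports at the given thread's machine."""
        if thread_id not in self.machines:
            raise KeyError(f"unknown thread: {thread_id}")
        self.selected = thread_id

    def resolve(self, local_port: int) -> Tuple[str, int]:
        """Return the (host, port) that localhost:<local_port> should reach."""
        if self.selected is None:
            raise LookupError("no thread selected")
        return (self.machines[self.selected], local_port)


if __name__ == "__main__":
    table = ForwardTable()
    table.register("thread-a", "10.0.0.1")  # placeholder addresses
    table.register("thread-b", "10.0.0.2")
    table.select("thread-a")
    print(table.resolve(3000))
    table.select("thread-b")
    print(table.resolve(3000))
```

Keeping the local port fixed and swapping only the target host is what lets a browser tab on localhost:3000 be reloaded against a different thread without changing the URL.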

wmedrano 23 hours ago

Well, I wouldn't use this since I have my own box. In case it's useful:

- I run hermes on the box and it has some scheduled cron jobs.

- I gave it an account on a custom Git forge. It cannot commit without my direct permission, though it can blow the setup up in other ways lol.

- I interact by assigning it issues and talking through Discord.

dregitsky 22 hours ago

Nice! We love hearing about personal setups to solve these same problems. One difference between boxes.dev and your setup is that we spawn an exact copy of the main box for each agent thread, so it's totally isolated. But doing parallel agents on one box can definitely work too, it's just more work to configure a project for it.
Our bet is that a lot of people will want something prebuilt, and that the last-mile UX for making a good coding workspace (including code review, etc) is actually nontrivial, especially at companies.

servercobra 1 day ago

Nice, this looks exactly like what I've been looking for. I tried Fly.io Sprites and it _almost_ got me there, but I got annoyed logging into my CC every new feature. Unfortunately I wound up going all in on Cursor Cloud Agents, which overall has been decent.

dregitsky 22 hours ago

Thanks! We were also excited about Sprites when it launched but it didn't quite work for us either. And Cursor Cloud Agents is definitely pretty similar -- one area where we differ is that Cursor only uses their custom harness, and we liked using the actual Codex/CC harnesses directly (and wanted to benefit from any improvements big LLM cos are making to their models+harnesses)

maCDzP 23 hours ago

I run Claude Code on my VPS and do /rc to run from my mobile. It’s really handy.

pavelpilyak 1 day ago

How does this handle MCP credentials - both for stdio servers that read tokens from local config, and for HTTP ones where harness holds an OAuth token? Either way those secrets end up in your cloud? Curious what the security model is

nab 1 day ago

Right now the way you'd do this is you'd select the "Main box" or template VM in the UI, pull up a terminal tab, and authenticate whatever MCPs you care about. These are stored however the MCP is storing them (likely filesystem) on the VM. When you're done, you can "snapshot" the template VM and all future forks/new threads will start from that snapshot of filesystem + RAM.
We recommend you auth with only development credentials (or use something like 2 factor confirmation if you have more sensitive things you want to confirm before the agent accesses), but it's still early for us and we're continuing to refine this as we go. For companies, we're down to brainstorm how they'd like this to ideally work for them. And over the long term we'll support hosting this in your own cloud.
Curious if you have a take on how you'd like this to work from a UX standpoint.

Bnjoroge 1 day ago

What are “box-hours”? Regular hours just running in boxes? Do I get charged the same 1) when the agent is doing some external thing, say a web search that takes a while, and 2) when the agent isn't running (say, waiting for my input)?

dregitsky 1 day ago

It's just one hour of runtime. But we put the machines to sleep very quickly once the agent finishes its work, and then wake when you interact in the UI (e.g. terminal, filesystem, send the agent a followup). We're running on firecracker microVMs so can sleep/wake very quickly, which keeps things nice and responsive.
Re: web searches -- we're running a full linux kernel and the agent runs on the machine itself, so we can't sleep mid run. But conceptually, moving the agent off-box and sleeping during web searches etc would be interesting, but in our experience coding agents are running enough stuff on the machine itself (rg, bash, playwright, etc) that there wouldn't be much savings.
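As a rough illustration of the accounting described here, box-hours can be thought of as the total time a machine spends awake, divided by 3600 seconds. This is a sketch under that assumption only; the function `box_hours` and the interval format are hypothetical, and the service's actual metering rules may differ.

```python
from typing import Iterable, Tuple


def box_hours(awake_intervals: Iterable[Tuple[float, float]]) -> float:
    """Sum awake time, given (start, end) pairs in seconds, and return hours.

    Time spent asleep between intervals is not counted.
    """
    total_seconds = 0.0
    for start, end in awake_intervals:
        if end < start:
            raise ValueError("interval ends before it starts")
        total_seconds += end - start
    return total_seconds / 3600


if __name__ == "__main__":
    # Two 30-minute agent runs with a sleeping gap in between.
    print(box_hours([(0, 1800), (3600, 5400)]))
```

Under this model, time spent waiting for your input costs nothing once the machine has gone to sleep, while a slow web search in the middle of a run still counts because the machine stays awake.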

astrochicken 23 hours ago

Nice design. I love the added mobile app.

soco 1 day ago

It feels somehow weird to see a cloud tool usable only from Macs. Oh well.

Arcuru 1 day ago

It's even weirder that their long post doesn't mention it's Mac only.
nab 23 hours ago

Sorry about that. We should have made that more clear in the post but unfortunately HN doesn't let us edit it anymore. We're just 2 people right now and wanted to ship early. We want to support other platforms over the long term. We are cloud, but there is a local component for porting your local environment for the fast onboarding, so it requires some care. Are you on Windows?
- soco 20 hours ago
  
  I'm both Windows and Linux, so either would work for me.

imoreno 23 hours ago

Don't Anthropic and OpenAI both offer the same thing built in? What is the difference with this service?

gorgmah 23 hours ago

their product attempts to duplicate your local dev setup on a machine on the cloud, which means they copy your .env / local postgres db, local docker-compose stack etc. It worked quite well for me, I tried it just now (except for postgres + setting up git). I think the product is quite good but still needs a bit of polishing.
I'm a bit frustrated that they restrained EU users from downloading their app, but I guess they just want to avoid dealing with GDPR, which is fair for an early startup!
- nab 21 hours ago
  
  Oof, I don't think we paid super close attention to the country list when shipping our app. We'll fix this for you, but it might be a few days for things to make it out of review. Really appreciate you trying it out. It's still pretty early so things are still rough around the edges, but we'll be keeping an eye out on our logs for bugs, and feel free to reach out to feedback at boxes.dev if you notice any issues.