Chyzwar 1 day ago

When discussing LLM pricing, people are missing the plot. The subscription token price is 10x-40x cheaper than API pricing. Your 90$ Claude subscriptions give you close to $1000 to $4000 in equivalent API token pricing.
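As a rough sanity check on that multiple, here is the arithmetic using only the figures quoted above ($90 subscription, $1000 to $4000 of API-equivalent usage). The function name is made up for illustration; nothing here reflects actual provider billing.

```python
def api_equivalent_multiple(api_value_usd: float, subscription_usd: float) -> float:
    """How many times the subscription price its API-equivalent usage is worth."""
    return api_value_usd / subscription_usd

# Figures taken from the comment above; everything else is illustrative.
print(round(api_equivalent_multiple(1000, 90), 1))  # 11.1
print(round(api_equivalent_multiple(4000, 90), 1))  # 44.4
```

So the quoted dollar amounts work out to roughly 11x to 44x, in line with the 10x-40x claim.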

The second issue is that the quality of the model “operator” makes a massive difference in the outcomes. Highly skilled senior devs who know how to prompt and have high agency will outperform teams of people who lack motivation and foundational skills.

Lastly, there is a massive difference in capabilities, determinism, and error handling between 5T SOTA models like Opus and tiny distillations from DeepSeek that perform well only in benchmarks.

  • lelanthran 1 day ago

    > When discussing LLM pricing, people are missing the plot. [ ... snipped ...] Your 90$ Claude subscriptions give you close to $1000 to $4000 in equivalent API token pricing.

    And you think it is unreasonable to consider this unsustainable?

    • z2 1 day ago

      And the direction is definitely towards removing that subsidy really soon. We can see it with OpenAI's shift to API-equivalent pricing for enterprise customers last month. Anecdotally my company saw OpenAI credit usage grow 2x with stable use across the ChatGPT platform, which is pretty terrifying considering just 2% of the company uses Codex.

      For context, ChatGPT business subscriptions give you a fixed pool of credits to use, after which you get billed a la carte at inflated 1.75x rates vs API, or, if you don't want to pay, access to everything but the non-reasoning models is turned off for the month.

      We also tried Claude Enterprise, which was unusable as people blew through their monthly limits in a matter of hours.

    • wongarsu 1 day ago

      Depends on what their actual costs are. Either they are losing lots of money on subscriptions, or they make absolute bank on API pricing.

      Looking at the pricing of 1-2T models like Kimi or DeepSeek on the open market, I'm tempted to assume that inference costs are closer to subscription pricing than to API pricing.

      Especially considering that subscriptions a) distribute load over time via rate limits, and b) will include a lot of users who get only a fraction of the possible value, whether they are on a personal account where they hit the rate limit on the weekend but barely use it during the week, or are corporate users who were issued an account they rarely use. Subscription prices are usually set based on the average case, not the most extreme value a power user can get out of it.
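      The average-case point can be sketched as a weighted average. The user mix below is entirely hypothetical (no provider publishes these numbers); it only shows how a few heavy users and many light ones combine into a low average utilization.

```python
def average_utilization(segments: list[tuple[float, float]]) -> float:
    """Weighted average of (share_of_users, fraction_of_limit_used) pairs."""
    total_share = sum(share for share, _ in segments)
    return sum(share * used for share, used in segments) / total_share

# Hypothetical mix: 70% light users, 25% moderate users, 5% at the rate limit.
segments = [(0.70, 0.05), (0.25, 0.30), (0.05, 1.00)]
print(round(average_utilization(segments), 2))  # 0.16
```

      Under these made-up numbers the average subscriber uses about a sixth of what a maxed-out power user does, which is the quantity a flat price would need to cover.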

      • Forgeties79 1 day ago

        Considering not one company is in the black yet I don’t really know how we can say anyone is making bank, unless we want to count absurd levels of VC funding (now slowing down) I guess.

        • wongarsu 1 day ago

          I am conveniently not counting training costs (since they add no marginal costs, selling more tokens doesn't impact them), and hardware and DC costs only amortized

          Of course they do have to "make bank" in some way to offset the insane training costs. But whether they go for high prices or high volume, or offer some services as a loss leader to drive profits elsewhere is somewhat orthogonal to that

        • anthonypasq 1 day ago
          • Forgeties79 1 day ago

            Let’s see it first. And without omitting training/infrastructure costs at that. Until then my comment is still accurate.

            • anthonypasq 1 day ago

              its a private company, what exactly do you expect to 'see'?

              • Forgeties79 1 day ago

                Anthropic IPO's in less than 5 months and I guarantee you any company that officially is in the black will proudly shout it from the rooftops.

                • anthonypasq 23 hours ago

                  > Anthropic IPO's in less than 5 months

                  pure speculation. about as valuable as my linked wsj reporting i suppose. given thats the case, maybe you shouldnt claim so confidently that they are money incinerators.

                  • Forgeties79 22 hours ago

                    “pure speculation” is a bit unfair.

                    Back to the point: No one is profitable yet, which I think we both agree is accurate. If you are going to lean on “they will be soon” then it’s fair to say they’re going to IPO soon.

                    Ease off the gas. We’re just discussing a tech company.

          • dminik 23 hours ago

            Does Anthropic really expect to double their income without also doubling their expenses?

            • Forgeties79 20 hours ago

              I’ve been hearing that anthropic is on the verge of profitability for probably a year straight. Until all the companies agree to stop the training arms race I just don’t see how it’s in the cards

            • i2km 10 hours ago

              This is one of the things people miss. If they double their customers, of course they double their expenses. Unlike SW, the marginal cost here is still high

              • dminik 4 hours ago

                I mean, it's possible that with the new datacenter from SpaceX, they could onboard more users than it costs them to rent. That's fair. But I kind of doubt that.

                One thing that really stinks to me is that various AI boosters have been claiming insane profit margins (40%, 50%, ...), yet apparently Anthropic stands to (possibly) make $500M profit on $11B in expenses, that's clearly nowhere near 50%. Not to mention that they're not making profit on inference now.

                So where do people get this confidence to pull random numbers from?
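                For reference, the margin implied by the figures in this comment ($500M profit on $11B in expenses) can be worked out directly. This is a minimal sketch that assumes revenue is simply expenses plus profit; the function name is illustrative.

```python
def profit_margin(profit: float, expenses: float) -> float:
    """Profit as a fraction of revenue, taking revenue = expenses + profit."""
    revenue = expenses + profit
    return profit / revenue

# Figures quoted in the comment above.
margin = profit_margin(profit=0.5e9, expenses=11e9)
print(f"{margin:.1%}")  # 4.3%
```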

            • wongarsu 9 hours ago

              There we go back to the original question: are subscriptions profitable, API pricing wildly profitable, and they just lose all that money on fixed costs like model training; or do they actually barely make money on inference?

              That's why talking about the profitability of inference without accounting for model training is interesting, because that is the deciding factor in whether more customers would help getting them in the green

              • dminik 4 hours ago

                Without actual data I don't know. My gut feeling is that they overall lose money on subscriptions (and especially the free tier that accounts for 95% of all users). And make thin profit (~5%) on API pricing.

                But it's just that. A gut feeling.

      • runtime_terror 1 day ago

        > I'm tempted to assume that inference costs are closer to subscription pricing than to API pricing

        So just going on vibes?

        While some people don't like his content, Ed Zitron shows a lot of evidence for your assumption being very wrong.

        These companies are bleeding cash at ungodly rates. It's likely their API pricing is still subsidized if you look at their overall financial picture.

        Related, there's a good reason those API prices keep going up a lot every new version and it's not just because the models are better.

        • wongarsu 1 day ago

          Selling inference for more than inference costs is not incompatible with bleeding cash at ungodly rates. They do in fact pay ungodly amounts of cash for other things, like training, marketing, etc. Heck, you can bleed cash while being profitable (in the accounting sense)

          Also, API prices going up a lot every new version is more an OpenAI thing, and even there it's a recent trend: GPT 5.0 was a big price drop compared to 4.1, and 4.1 was cheaper than 4o, which itself got a price cut at some point and is cheaper than 4. Meanwhile Anthropic's API pricing stayed stable for many versions, then got slashed to a third with the 4.5 release and has stayed at that level since.

          • runtime_terror 22 hours ago

            But explain to me how these companies will recoup these costs outside of increasing inference pricing?

            Their business model is selling inference but the training and other costs have to be accounted for somehow. Unless I'm missing something obvious, inference costs must go up drastically if these companies are going to survive beyond the subsidy stage.

            • wongarsu 22 hours ago

              Sell more. The hope is that there is a huge addressable market that includes huge per-worker demand in almost all white collar work and lots of inference in people's private lives

              If that doesn't work, then yes, then prices will have to go up

              • 0xffff2 20 hours ago

                Both anecdotally for myself and from what I'm reading in the news, it seems just as likely that AI usage has already largely peaked.

                There was a lot of hype and exploration of capabilities, but models aren't evolving fast enough to keep that going, so I'm settling down into a familiarity with what an LLM can and can't do that means I am using them less overall than I was 6 months ago when I was throwing everything under the sun at it just to see what happened.

                Without either new model breakthroughs or dramatically _lower_ costs, I will be very surprised if the ultimate market doesn't end up within an order of magnitude of where it is today.

                • d1sxeyes 6 hours ago

                  > AI usage has already largely peaked.

                  I think this is minimally likely. While as individuals on the bleeding edge, we're perhaps using these tools less and less, and our echo chamber reinforces that, the penetration of AI into the normal corporate workplace is still very low - emails rewritten with ChatGPT, meeting notes summaries generated by default, etc. There are a million use cases for LLMs which are not yet built out. The tokenmaxxers will begin using AI less, but the penetration into the mass market will continue at a huge velocity.

                  • runtime_terror 2 hours ago

                    I agree that more uses will be found and that maybe we're not at the peak. But it also seems very clear a few players have been actively working to inflate usage numbers by margins that might take a while to replace with legitimate uses

                • runtime_terror 2 hours ago

                  Exactly. Like how Meta has a "blow our money on LLMs" leaderboard. Seems like a few companies are attempting to inflate hype enough so all the investors can exit without losing their heads.

                  Reminds me of the crypto hype but where the hype agents are some of the largest companies in the world.

              • runtime_terror 2 hours ago

                Yeah from my understanding they'll need to create a few trillion dollars more demand to break into profitability if we look at all the debt/obligations/contracts

            • HDThoreaun 12 hours ago

              Obviously they need more paying users. The entire game in tech is taking advantage of (comparatively) low marginal costs to pay off capex once you corner the market

              • runtime_terror 2 hours ago

                I do think that's at least part of the strategy. The problem is that we've never seen a single product category so hyped in history, literally trillions of dollars invested. To recoup that, some not so trivial miracles will need to happen.

                • HDThoreaun 1 hour ago

                  I think that within 5-10 years most white collar workers around the world will be paying for AI assistants. There are 1.2-1.3 billion such people to sell ai to, so getting more users doesnt really seem like a miracle to me. I do think convincing everyone to use expensive proprietary models instead of open ones hosted cheaply by third parties will be a minor miracle for the AI labs. Definitely not out of the question though.

  • stingraycharles 1 day ago

    Also, your local hardware is in no way capable of running the types of models that the cloud providers do, it’s just not economically feasible, and it never will be.

    • zozbot234 1 day ago

      It can run open-weight models that are roughly as capable. It's going to be slow unless you're using actual datacenter hardware, but they'll run.

      • colonCapitalDee 1 day ago

        "roughly" is doing a lot of heavy lifting there

        • adrian_b 1 day ago

          The difference between datacenter hardware and cheap personal hardware is not in what can be run and what cannot be run.

          Anything can also be run on a cheap computer.

          The difference is in speed. A cheap computer may run a big model up to a few orders of magnitude slower than datacenter hardware, depending on whether the LLM is small enough to fit in GPU memory, or it is small enough to fit in CPU memory or it is so big that it must spill on SSDs.

          Depending on the application, the tradeoff between run time and run cost may happen to favor using local hardware, despite a much slower speed.

          There are plenty of applications where running them for negligible cost as an overnight job can be preferable to obtaining faster results at a very high price, for instance scanning for bugs in a mature code base using a great number of different open-weights LLMs, which can achieve bug coverage similar to that of a single, but overpriced and unavailable, SOTA LLM, e.g. Mythos.

          • stingraycharles 19 hours ago

            > The difference between datacenter hardware and cheap personal hardware is not in what can be run and what cannot be run.

            You do realize that a model like Opus is (estimated to be) around 5T parameters, and uses around 5TB of GPU memory?

            These kinds of things are just impossible to run locally.
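            The 5T parameters to 5TB step implies roughly one byte per parameter (8-bit weights). A minimal sketch of that relationship, where the parameter count is the estimate quoted above and the precisions are assumptions for illustration:

```python
def weight_memory_tb(params: float, bytes_per_param: float) -> float:
    """Memory needed just for the weights, in terabytes (1 TB = 1e12 bytes)."""
    return params * bytes_per_param / 1e12

params = 5e12  # the "around 5T parameters" estimate from the comment
print(weight_memory_tb(params, 2))    # 16-bit weights -> 10.0
print(weight_memory_tb(params, 1))    # 8-bit weights  -> 5.0
print(weight_memory_tb(params, 0.5))  # 4-bit weights  -> 2.5
```

            This counts weights only; KV cache and activations would come on top.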

            • adrian_b 19 hours ago

              These kinds of things can certainly be run locally, even on a small mini-PC, like a NUC, or even on a laptop, with the weights stored on SSDs.

              Like I have said, the problem is not that they cannot be run, but that they may run more slowly than is acceptable for a given application. Depending on the model, the speeds reported for inference with weights stored on SSDs vary from one token every few seconds to at most a few tokens per second.

              Computers could solve relatively huge problems even in the early days of vacuum tube computers, when the main memories were measured in kilobytes, because at that time it was not expected that the data needed for problem solving must fit inside the main memory or even in the next tier of memory, with magnetic drums or magnetic disks, but the really big problems were solved by a great number of passes over data stored on magnetic tapes.

              An LLM whose inference could not be run on a small mini-PC would have to be one hundred times bigger than the biggest existing SOTA LLMs.

              Any LLM that exists today can be run on almost any PC, just extremely slowly in comparison with datacenter hardware.

              • dns_snek 10 hours ago

                When people say that you "can't do" something what they actually mean is that it's completely impractical (if not impossible).

                • zozbot234 7 hours ago

                  Whether something is "impractical" depends on your expectations. High-latency unattended inference is definitely viable, even though it doesn't align much with what's being run in hyperscale datacenters.

                  • dns_snek 6 hours ago

                    I'd like to meet the person who's been using a 1 token/second system as their primary LLM for at least a few weeks. Anyone?

                    I think 1 token/second is optimistic here - and even then it's over 11 days per million tokens.
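                    The "over 11 days" figure is straightforward to check. A minimal sketch, assuming a constant generation rate with no pauses:

```python
SECONDS_PER_DAY = 24 * 60 * 60

def days_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock days needed to emit num_tokens at a constant rate."""
    return num_tokens / tokens_per_second / SECONDS_PER_DAY

print(round(days_to_generate(1_000_000, 1.0), 2))  # 11.57
```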

    • bachmeier 1 day ago

      Very much dependent on the situation. For many business tasks, local hardware is good enough. But what a lot of folks overlook when saying these things is that (a) workers do more than run AI models on a piece of hardware, (b) significant computer hardware is already sitting idle outside normal work hours, when it can be running batch jobs, and (c) employees can share local hardware.

    • devmor 1 day ago

      > it never will be.

      Giving strong “640k is enough for anyone” vibes here.

      • 3form 21 hours ago

        640k statement was absolute, this one is comparative.

        Cloud should have more compute and efficiency than local. I wouldn't be 100% sure, as I don't know what I might not be seeing, but still.

        Whether that comparative advantage will matter, though, is a completely different question.

        • devmor 17 hours ago

          Gotcha, I think I misunderstood the statement as saying today’s cloud-required will never be local-capable.

    • cortesoft 1 day ago

      NEVER will be is a pretty big leap. Never is a long time.

    • adrian_b 1 day ago

      Depends on what you mean by "economically feasible".

      Even very cheap mini-PCs and laptops can run any of the models run by cloud providers, albeit at a much lower speed (i.e. with the weights stored on SSDs).

      Whether such a low speed is useful, depends on the application. For something like a coding assistant or bug scanning, an instant response is desirable, but certainly not necessary.

      • christina97 23 hours ago

        The SSD would wear out in days while the laptop generates two responses a day. This is like saying you could power your home with AA batteries, yes technically you could but in practice entirely infeasible.

        • jyounker 23 hours ago

          Weights are write-once data.

        • adrian_b 19 hours ago

          There is no wear on the SSDs, because the weights are just read, they are not written during inference.

          For model training, the requirements are very different, and the training of a big LLM cannot be done with home equipment. On the other hand, inference can be done on almost any PC, even for LLMs with thousands of billions of parameters, just very slowly.

          The only problem is that the inference becomes limited by the SSD reading throughput. Most of the cheap new personal computers available today can read simultaneously only 2 SSDs (if there are more they share a reading path), which are typically 1 PCIe 5.0 SSD and 1 PCIe 4.0 SSD. This has an upper throughput limit of 24 Gbyte/s, with 15 to 20 GB/s achievable in practice.

          Then the speed in token/s is limited by the amount of weights that must be read per inference cycle. The ratio between output tokens and the amount of weights that must be read can be improved by various methods, like batching multiple tasks or using speculative decoding.
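          A back-of-the-envelope version of this bound, as a sketch only. The x4 link bandwidths are rounded figures, the 40 GB of weights read per pass is a hypothetical value rather than a measurement of any real model, and batching is idealized as every sequence in the batch sharing a single read of the weights.

```python
# Rounded usable bandwidth of an x4 NVMe link, in GB/s.
PCIE5_X4_GB_S = 16.0
PCIE4_X4_GB_S = 8.0

def max_tokens_per_second(read_gb_per_s: float,
                          weights_gb_per_pass: float,
                          batch: int = 1) -> float:
    """Upper bound on tokens/s when each forward pass streams
    weights_gb_per_pass from SSD and produces `batch` tokens."""
    return batch * read_gb_per_s / weights_gb_per_pass

print(PCIE5_X4_GB_S + PCIE4_X4_GB_S)                  # 24.0 GB/s theoretical peak
print(max_tokens_per_second(20.0, 40.0))              # 0.5
print(max_tokens_per_second(20.0, 40.0, batch=4))     # 2.0
```

          With these assumed numbers a single stream tops out at one token every two seconds, and batching four tasks lifts the aggregate to two tokens per second.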

          • jurgenburgen 11 hours ago

            Does more RAM increase performance? This approach sounds like it could eventually be fast enough for local use as hardware and models improve.

            • zozbot234 10 hours ago

              Faster SSD access improves performance more than RAM does, at least until all of the model is being cached in RAM. So older and cheaper HEDT platforms with lots of PCIe lanes to attach storage to are best for this approach.

    • ajb 9 hours ago

      SanDisk has designed a flash equivalent to HBM, which has 1.6TB/s of bandwidth. I expect that it will be available initially to server manufacturers only, but once supply ramps up will be built into individual machines. At that point it will be practical to run local inference on much larger models. Of course, maybe the SOTA providers will find some way to use even larger ones, but it seems like the returns to scale aren't as much as they were.

  • simonw 1 day ago

    I learned today that the Anthropic "Enterprise" plan - the one big companies use because they need governance features and audit logs and all of that jazz - is billed at API token rates (plus $20/seat/month).

    So large companies are getting billed a lot more than those discount subscription plans.

    • alexriddle 1 day ago

      Anything over 150 seats means you need to pay at token rates plus the $20/user. My day job is operational (no coding at all) and I'm spending ~$300 a month on a few chats with Claude/Cowork a day over the course of a month.

      • stymaar 1 day ago

        I hope your company is keeping the input/response pair in case they need to break free at some point.

        • dd8601fn 1 day ago

          Wouldn’t people mostly just want any artifacts?

          • speed_spread 15 hours ago

            Like Slack history, LLM history can be used to build a searchable knowledge base. Questions are often more valuable than answers.

      • m_kos 1 day ago

        $300 is my employer's monthly cap on Claude Enterprise. It lasts me at most a week of moderate use. I would much rather get Codex Pro and Claude Pro or Max, which would cost ≤ $200. For $300, one could also add Gemini Ultra to the mix so I could have all three review each other's code, etc.

        Claude can be very good but enterprise pricing doesn't make sense to me.

        • lunar_mycroft 23 hours ago

          The $200 plan you're talking about is subsidized by Anthropic. They cannot afford to keep offering that to everyone indefinitely. Absolute best case scenario for current users is that they can continue to subsidize it as way to sell enterprise plans, but there's no way that they can keep offering it to everyone at those prices.

          • wahnfrieden 21 hours ago

            They can if it is a way to get individuals hooked on it to then introduce it at their workplaces, who pay enterprise rates.

            • lunar_mycroft 18 hours ago

              Right, they can do it to sell enterprise plans, but they can't offer said plans to those enterprise customers indefinitely. So if your employer wants to spend $200/month on tokens, you're going to get however many tokens $200 buys you each month, not the order of magnitude more you can get with a consumer subscription.

              • wahnfrieden 16 hours ago

                That’s what I’m saying. Enterprise customers don’t use the subscription plan

                • addedGone 5 hours ago

                  Except that they do, we do.

                  A lot of startups pile up an enormous number of accounts. Companies don't need the Enterprise Anthropic solution; they can just subscribe to many accounts and have their own staff KYC for each (1 codex, 1 claude, 1 google and so on).

                  • Bnjoroge 3 hours ago

                    I imagine it’s also really trivial to build some kind of local “enterprise” proxy that gives you the same visibility in usage as the anthropic dashboard would give you. I use one for aggregating all my subs.

                  • lunar_mycroft 1 hour ago

                    That will be clamped down on by Anthropic (and other providers) for the same reason they don't offer those plans to enterprise customers already.

                  • wahnfrieden 1 hour ago

                    Startups do but do you know large enterprises that do?

                  • ilikehurdles 1 hour ago

                    We definitely pay enterprise api costs. Only way to get google vertex integration, and Enterprise is too sensitive to let all of their data leave their moat.

          • goosejuice 19 hours ago

            > They cannot afford to keep offering that to everyone indefinitely.

            Common talking point. There's enough evidence for the counter argument that this is essentially misinformation. I have no idea why it's so often repeated with confidence.

            • cdata 19 hours ago

              > There's enough evidence for the counter argument that this is essentially misinformation.

              > No evidence is shared

              Help an open-minded critic out.

              • goosejuice 18 hours ago

                Brand new industry, massive capital, dropping inference costs, increasing availability of compute, cost centers / subsidized subscriptions are common in SaaS, heavy competition, no public information on actual utilization rates.

                How much is Waymo burning a year? 3B on 300M ARR? Anthropic is what 5B on 20B ARR? Waymo is 3x older. Why don't we hear such confident statements about how subsidized their rides are?

                It's one thing to speculate it's another to parade it as fact. Even if the S1 reveals an unprofitable business today, you can still only claim it's unlikely.

                • lmm 18 hours ago

                  > How much is Waymo burning a year? 3B on 300M ARR? Anthropic is what 5B on 20B ARR? Waymo is 3x older. Why don't we hear such confident statements about how subsidized their rides are?

                  We do. We hear it less often because no-one is talking about how Waymo changes how we all need to work or whatever, that's all.

                • lunar_mycroft 18 hours ago

                  Do people commonly argue Waymo isn't subsidizing rates?

                  Also, we do have some evidence for my position:

                  - We know that the consumer Claude plans provide _way_ more tokens than you could get if you were paying API prices. This is a huge part of why Anthropic's limits on other harnesses for subscription customers are such a big deal. So either their profit margin on API tokens is absurdly high, most consumer subscribers don't come anywhere near their rate limits, or they're losing money on the consumer subscriptions.

                  - It appears that complaints about people running into rate limits are common, which suggests the "consumers usually don't use much of their subscription" explanation is incorrect.

                  - We also know that Anthropic has just become profitable, almost certainly driven mostly by enterprise customers. This rules out the "they make a very high profit margin on the API" explanation, since if that was the case they'd likely have been profitable much earlier.

                  Taken together, I think the case that their consumer subscriptions lose them money on net is pretty strong, even though their enterprise subscriptions (and API pricing) does make them a profit.

                  • goosejuice 16 hours ago

                    > I think the case that their consumer subscriptions lose them money on net is pretty strong, even though their enterprise subscriptions (and API pricing) does make them a profit.

                    To be clear I'm not arguing against this position, just questioning the confidence with which people claim that the current consumer subs are not a sustainable offering and are merely temporary.

                    • fluidcruft 7 hours ago

                      Burning money is never sustainable. All you're actually saying is nobody can predict how long this particular bonfire will burn.

                      • goosejuice 3 hours ago

                        Again this is nonsense for the reasons I've already given. The costs aren't fixed.

        • ilikehurdles 20 hours ago

          That’s a shocking number. I don’t know how much my employer is billed, but based on the numbers reported by Claude code in its optional status bar, I’m often exceeding $300 in a day across sessions, when working on meatier tickets.

      • stavros 20 hours ago

        We deployed OpenWebUI with the Claude API the other day for employees. Someone sent ten messages (which appeared to just be reasonable day-to-day work), and we paid $200 for it. There were 44M input tokens, 100k output tokens, no cache hits at all. OpenWebUI reports 3M tokens used, Claude reports 44M, and I have no idea where the rest of the tokens went. This was all on a brand new API key, installed directly to the service, too.

        With this kind of opaque billing, how can I reasonably deploy any AI?

        • SyneRyder 7 hours ago

          No cache hits seems ominous, could this be an OpenWebUI issue? It also seems ominous that Anthropic models are basically nowhere on the OpenWebUI leaderboards.

          I'm only doing a cursory search, but it seems OpenWebUI doesn't support Anthropic caching, and they don't intend to? Other providers handle caching automatically (apparently?) but caching has to be specifically managed by the client with Anthropic. If that's correct that OpenWebUI doesn't support it, it would really send your costs spiralling, because you're being billed for all the tokens in the entire multi-turn conversation on every turn:

          https://github.com/open-webui/open-webui/issues/4887
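          To make the "billed for the entire conversation on every turn" effect concrete, here is a minimal sketch. It assumes no caching at all and that each turn resends the full history; the per-turn token counts are hypothetical and are not an attempt to reproduce the 44M figure above.

```python
def billed_input_tokens(new_tokens_per_turn: list[int]) -> int:
    """Total input tokens billed when every turn resends the whole
    conversation and nothing is served from a cache."""
    billed = 0
    history = 0
    for new_tokens in new_tokens_per_turn:
        history += new_tokens  # conversation grows by this turn's content
        billed += history      # the full history is sent and billed again
    return billed

# Hypothetical conversation: ten turns adding 1,000 tokens each
# (user message plus the previous reply, lumped together).
turns = [1_000] * 10
print(sum(turns))                  # 10000 tokens of actual content
print(billed_input_tokens(turns))  # 55000 tokens billed as input
```

          Billed input grows roughly with the square of the number of turns, so a client-side token count and the provider's count can diverge quickly in long conversations.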

          I have no experience with OpenWebUI though (honestly, first time I've heard of it). Just trying to be helpful. If I'm completely incorrect then apologies in advance for sending you down the wrong path.

          • stavros 4 hours ago

            Really? Huh, I've never heard of Anthropic caching needing to be specifically enabled. I'll look into that, thank you! Sounds like the culprit.

    • datadrivenangel 1 day ago

      I've heard that the $20/seat gets waived if you have large enough committed spend.

      • isoprophlex 1 day ago

        Would they even care at that scale, if the average employee spends $3000 every month because mgmt mandates slopmaxxing?

    • jgreid 1 day ago

      Governance and audit trail are incredibly valuable to large enterprise organizations, especially those working in regulated spaces. Companies will pay a premium if the security/privacy/compliance issues are handled effectively.

      • zaphirplane 16 hours ago

        What is the governance and audit trail on offer ?

    • zackify 1 day ago

      We are on it at my job. It saves money due to other parts of the org not using as many tokens.

      The real cost-effective way is giving a team $20 Cursor, $20-100 Claude, $20-200 Codex.

      I'm spending 1k on Claude enterprise easily and that's with trying to spread it on codex and cursor using pi.

    • pyreko 23 hours ago

      Yep, where I work I know people easily spending over a few thousand dollars a month.

    • htrp 22 hours ago

      > I learned today that the Anthropic "Enterprise" plan - the one big companies use because they need governance features and audit logs and all of that jazz - is billed at API token rates (plus $20/seat/month).

      Can large enterprises just not use the API ? I have audit logs and what seem to be enterprise features through my anthropic account (platform.claude.ai)

      • thewebguyd 21 hours ago

        The devs can, sure. The "enterprise plan" is more for that + giving Claude to all the non-technical employees for access to the chatbot + Cowork. Plus SSO and all that jazz.

      • simonw 21 hours ago

        They can do that, but I expect they see individual user accounts and enterprise account management and easy rollout of Claude.ai/Cowork/Claude Code/etc as worth an extra $20/month/person.

      • opsnooperfax 18 hours ago

        Your CISO is paying to not be responsible. That’s it. That’s always the reason.

      • acdha 16 hours ago

        Enterprises can, but then they have to show their auditors that this has been done in a way which is robust and can’t be bypassed, and they have to build the kind of reports people need to be convinced of that — nothing is ever “just” in enterprise IT.

        Longer term, you also have to be careful about building things around details which could change at any time. OpenAI and Anthropic have a ton of pressure to start banking huge profits and they very closely monitor customer activity. A time-honored strategy in this space is to shuffle the features enterprise customers depend on but which aren’t deal-breakers for most other customers into expensive enterprise plans. There’s possibly some counter pressure from companies like Google which have healthier finances but I wouldn’t count on that since they also have MBAs who’d be all too happy to invent pretexts to hike their prices to match.

  • cyanydeez 1 day ago

    Isn't the plot that it's like an infinite bikeshed but 10% of the bikesheds are actually trailer parks, and when you finally realize it's a trailer park and not a bikeshed you're down $10-100 because its token gen is faster than you can actually validate?

    Some might say the price wouldn't be great if you could actually process and validate it...

  • kelseyfrog 1 day ago

    > The quality of the model “operator” makes a massive difference in the outcomes.

    My hunch is that this is the source of much of the variability in outcomes upstream of HN commenters claiming extremes from "This model changes everything!" to "This [same] model is crap."

    We haven't operationalized what it means to "be good at prompting," nor developed proxies/heuristics/shibboleths for assessing prompting skill. There's community skepticism over whether prompting skill even exists. Besides, even if prompting skill is real, who wants to hear, "Actually you kinda suck at prompting"?

    • danielmarkbruce 1 day ago

      It's 100% this. Many people suck at prompting. It's likely that habits from search are ingrained. But in general some people are just so bad at it.

      • latexr 1 day ago

        According to Google, “there’s no wrong way to prompt”.

        https://www.youtube.com/watch?v=9bBfYX8X5aU&t=48s

        • knollimar 21 hours ago

          No wrong way to [consume thing I sell that you'll consume more of if you do it poorly]

        • djeastm 20 hours ago

          Ehhh, their incentive in their marketing is to get normal people to not be intimidated by the big bad AI.

          Power users are always going to have to take the messaging companies send out to the masses with a grain of salt.

      • jyounker 23 hours ago

        Prompting is just writing specification documents. A lot of people are very bad at this. I suppose that more to the point, a lot of people are just bad at writing.

        • danielmarkbruce 19 hours ago

          This is probably correct. Perhaps prompting just brings out the very worst in specification.

      • FireCrack 14 hours ago

        IDK if it's just me, but I also find Claude, whether it be the model or the harness, is a lot more "forgiving" of poor prompts than many of the open models

  • stymaar 1 day ago

    > Lastly, there is a massive difference in capabilities, determinism, and error handling between 5T SOTA models like Opus

    What's your source for Opus being a 5T model?

    > and tiny distillations from DeepSeek that perform well only in benchmarks.

    I don't think you know what you're talking about. Local models aren't “distillations from Deepseek”.

    And they don't perform well “only in benchmarks”, Qwen 3.6 is a very decent model (obviously it's not Opus, but it's also much faster and speed is a quality of its own).

    • gpugreg 1 day ago

      > What's your source for Opus being a 5T model?

      Elon Musk tweeted that Grok is 0.5T or 1/10th the size of Opus. https://xcancel.com/elonmusk/status/2042123561666855235#m

      While this source's reliability is certainly debatable, the size matches the results of this paper, in which researchers estimated the parameter count from model knowledge. https://01.me/research/ikp/

      • stymaar 1 day ago

        > While this source's reliability is certainly debatable

        Massive understatement. Nowadays it has become hard to find a single Musk statement that doesn't contain at least one lie.

        > the size matches the results of this paper, in which researchers estimated the parameter count from model knowledge. https://01.me/research/ikp/

        Thanks for the pointer. This estimation has Grok 6 times bigger than Musk claims it is, so maybe that's where the lie is.

        (I'm quite skeptical about that number though. It would be quite disappointing for US tech if their flagship models had to be that much larger than the Chinese ones for such a small edge in performance. Because I don't think US labs are incompetent, I'd bet that US flagships aren't more than 2-3 times bigger than Chinese flagships. Otherwise it really doesn't bode well.)

        • striking 1 day ago

          In tiny gray text right above the table is written "90% PI ≈ ±3.00× either side." Is GPT-5.5-Pro 3.4T or 30.8T in size, or somewhere in between? We just don't know.

      • UltraSane 13 hours ago

        Elon Musk has absolutely no credibility anymore. I'm more likely to believe the opposite of what he claims to be true.

        • kakacik 6 hours ago

          aka the russian strategy

      • orphea 8 hours ago

        > Elon Musk tweeted

        Come on. The Onion would be a more credible source.

      • fluidcruft 6 hours ago

        Musk has a lot of incentive to explain away how horrible Grok is relative to Opus.

        It's certainly a better sell that Grok sucks because it's small and Opus is impressive because it's large, than the alternative that Grok is also large and sucks which points to xAI incompetence and mismanagement.

        Particularly when you're trying to IPO a rocket company based on rosy forecasted valuations of Grok dominating the market.

    • Chyzwar 23 hours ago
      • stymaar 22 hours ago

        That's not what the paper says though:

            Claude Opus 4.6 Anthropic 68.0% ∼5.3T [1.8–15.6T]
            Claude Opus 4.7 Anthropic 66.4% ∼4.0T [1.4–12.0T]
            Claude Opus 4.5 Anthropic 65.2% ∼3.4T [1.1–10.0T]
            Claude Opus 4.1 Anthropic 64.9% ∼3.2T [1.1–9.5T]
            Claude Opus 4 Anthropic 59.7% ∼1.4T [478B–4.2T]
        

        According to their estimation, Opus is likely between 1T and 15T, which really doesn't tell you much that you couldn't have guessed otherwise. It doesn't say “Opus is a 5T model”.

        The fact that there's absolutely no consistency in the predicted size between models from the same lab should tell you all you need about the predictive power of this method (and they aren't really lying about their numbers, their confidence interval is huge enough to fit anything in it, but their prose is making very strong claims out of their statistical nothingburger).
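
        A quick way to see how wide those intervals are is to divide each upper bound by its lower bound. The sketch below hard-codes the five rows quoted above (the variable and function names are made up for illustration) and is only a rough sanity check, not anything taken from the paper itself.

```python
# Rows copied from the table quoted above: (model, point estimate, low, high),
# all in trillions of parameters. 478B is written as 0.478.
ROWS = [
    ("Claude Opus 4.6", 5.3, 1.8, 15.6),
    ("Claude Opus 4.7", 4.0, 1.4, 12.0),
    ("Claude Opus 4.5", 3.4, 1.1, 10.0),
    ("Claude Opus 4.1", 3.2, 1.1, 9.5),
    ("Claude Opus 4", 1.4, 0.478, 4.2),
]


def interval_spread(low, high):
    """Return how many times larger the upper bound is than the lower bound."""
    return high / low


spreads = {name: interval_spread(low, high) for name, _, low, high in ROWS}

for name, spread in spreads.items():
    print(f"{name}: upper bound is {spread:.1f}x the lower bound")
```

        Every row comes out with an upper bound somewhere between 8.5x and 9.1x the lower bound, which is what you would expect from an interval of roughly ±3× either side of the point estimate.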

        (somebody already posted this paper earlier, and I spent some time reading it, and this paper is really not that good even though there are a bunch of interesting ideas in it).

  • runtime_terror 1 day ago

    > The subscription token price is 10x-40x cheaper than API pricing

    This is a temporary phenomenon. Expect either drastic price increases or draconian throttling or both in the coming months.

    These companies are operating at huge losses and have hundreds of billions in liabilities and commitments. They need to turn on the money faucet sooner rather than later.

    • anthonypasq 1 day ago

      There's recent reporting that Anthropic will be profitable this quarter...

      edit: I see in other comments on this thread you think Ed Zitron is a reliable pundit so that explains everything.

      • runtime_terror 22 hours ago

        How will it be profitable, really?

        You can dismiss Ed (and me vicariously) but what's your compelling evidence to counter their extremely uphill battle towards profitability?

        Either way it will be very interesting to see their S1 when they try and IPO.

        If it's anything like SpaceX's then I suspect my post will age better than yours.

        • brookst 16 hours ago

          I sincerely doubt Anthropic’s IPO will say that their AI business is only 2% of their future revenue, and they’re bundling in totally unrelated, unprofitable things they expect to account for 98%.

          • runtime_terror 2 hours ago

            I'm not sure what you're talking about or referring to...

            I haven't heard anyone claim their S1 will show that but that it will show how poorly their revenue figures look against their costs.

            • brookst 1 hour ago

              SpaceX's IPO docs say that launch + Starlink will be 2% of their revenue opportunity, with enterprise AI being 98%.

    • alfiedotwtf 1 day ago

      Incentives matter…

      If prices keep going up, watch for companies to exit frontier models and go to local llama.cpp instances for 6-month-ago SOTA, with the flex of being housed within the office - no more privacy leakage, no more price gouging.

      To be honest, I’m not sure why a Y-Combinator backed company hasn’t come out yet flooding the market with highly capable OPAI (pronounced “Oh-pah” as in what Greeks shout as they drink shots), which stands for “On-Prem AI”.

      … yes, I just made up OPAI right now lol

      • overfeed 23 hours ago

        > I’m not sure why a Y-Combinator backed company hasn’t come out yet flooding the market with highly capable OPAI

        If we momentarily disregard the fact that YC itself owns billions of dollars worth of OpenAI shares[1], YC would plan to find demo-day investors willing to drive down the value of frontier labs. The coöpetition among VCs and the existing web of AI investments will mean no VC will be interested in investing in local AI...until after the frontier labs IPO.

        1. Thanks to the self-dea^w foresight of former YC president Sam Altman

      • runtime_terror 22 hours ago

        I do think many will move to lower cost models or self hosted over the next few years as prices balloon. And the privacy/control story is compelling.

        If we're able to see some big increases in hardware capabilities that can be self-hosted, that will be an accelerant.

        That said, most companies just want to pay a provider to delegate responsibility in exchange for cost and control.

      • nicoburns 22 hours ago

        > If prices keep going up, watch for companies to exit frontier models and go to local llama.cpp instances for 6-month-ago SOTA, with the flex of being housed within the office - no more privacy leakage, no more price gouging.

        That or just hiring people to do the work! I hear rumours that this is already starting to happen in some places (perhaps those that were a little overzealous with AI-hype driven layoffs).

        • alfiedotwtf 20 hours ago

          Fool me once… I think applying for jobs at a company that only within the last 12 months shed thousands of people “because of AI” should be seen as laughable, and employees collectively refusing to work there should be seen as the norm.

    • Npovview 1 day ago

      Even with increased prices, AI enables velocity both in development and bug fixing. Would companies want that? If prices are biting the company, I think companies will route all development and bug-fixing requests through a few superperformer developers with complete knowledge of the different components within the company (they will be the Queen Bees holding the company on their head). The rest of the company will be tasked with requirement gathering, specs cleaning, disambiguation and so on (worker bees).

      • DougN7 23 hours ago

        From what I understand, that is sort of how IBM Bob works - multiple models behind the scenes and they route the request to the model that will handle it best at the lowest price.

      • runtime_terror 22 hours ago

        So kinda like how stuff is now at a lot of big companies? I've worked at many different companies and almost always there are a few out-performers and a lot of people just doing enough not to get fired (no hate, power to them lol).

        We're already seeing companies slash their AI budgets. I expect that will increase till we hit more of an equilibrium.

        • Npovview 4 hours ago

          I think people will start measuring (features * time taken to implement) to tokens consumed ratio and then redistribute token budgets to developers. This will measure how effective/efficient people are with LLMs.
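
          A minimal sketch of what such a metric could look like, reading the formula literally. The developer names and numbers are invented for illustration, and splitting the budget in proportion to the score is an assumption, since the comment does not say how the redistribution would work.

```python
# Hypothetical per-developer numbers, invented purely for illustration.
developers = {
    "dev_a": {"features": 6, "days": 10, "tokens": 3_000_000},
    "dev_b": {"features": 4, "days": 10, "tokens": 1_000_000},
}


def score(stats):
    # Literal reading of the comment: (features * time taken) / tokens consumed.
    # Note that, read this way, a longer implementation time raises the score;
    # a real metric might divide by time instead.
    return stats["features"] * stats["days"] / stats["tokens"]


def redistribute(devs, total_budget):
    """Split total_budget across developers in proportion to their score.

    Proportional allocation is an assumption made for this sketch.
    """
    scores = {name: score(stats) for name, stats in devs.items()}
    total_score = sum(scores.values())
    return {name: total_budget * s / total_score for name, s in scores.items()}


budgets = redistribute(developers, total_budget=10_000_000)

for name, budget in budgets.items():
    print(f"{name}: {budget:,.0f} tokens")
```

          With these made-up inputs, dev_b ships fewer features but burns far fewer tokens, so dev_b ends up with the larger share of the budget.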

      • curt15 18 hours ago

        > Even with increased prices, AI enables velocity both in development and bug fixing.

        What about human understanding of the codebase that's essential to any project's long term health? Even "superperformer developers" eventually leave the company.

        • Npovview 4 hours ago

          Ask multiple AIs (if you can't trust one) to explain the project.

      • 0xbadcafebee 16 hours ago

        Most software development teams are pushing back on the deluge of bad changes from AI tools and are moving slower again to regain trust and stability. It is likely that future software development will not actually be higher velocity.

        • Npovview 16 hours ago

          Bad changes will be eliminated because better people are using the AI tools. They will reduce cost as well as slop.

          • 0xbadcafebee 6 hours ago

            The models are the same whether you're smart or dumb

          • kakacik 5 hours ago

            Yes and we will all hold hands together singing and bring world peace from now on forever. Back in reality, I can see some obstacles without even trying

            • Npovview 4 hours ago

              Do you want 60% of people to be employed in farming? I am being rhetorical because that is what your sarcasm implies. Today only 2% of people work in farming, yet they support so many people in America.

      • otabdeveloper4 7 hours ago

        > AI enables velocity both in development and bug fixing

        Or so they say. You'll have to trust those vibes blindly, because double-checking these claims apparently makes you an anti-science luddite.

    • 8note 12 hours ago

      the alternative is that api prices change to be more in line with deepseek's

      • runtime_terror 2 hours ago

        So then how do these companies return profits to investors?

  • xbmcuser 1 day ago

    It's not like the non-frontier models aren't improving. If someone can use DeepSeek to get 90% of the work done for $100 and then pay another $100 to Anthropic or OpenAI to complete it, I think they would rather do that than pay Anthropic or OpenAI $1000.

    • jonfromsf 3 hours ago

      Yes, for indie developers and small startups. Large corps won't want their code/email/etc. data being looked at by the Chinese government.

      • LUmBULtERA 2 hours ago

        For DeepSeek and other open-weight models, you can use non-Chinese hosted infrastructure that offers zero data retention and still save a whole lot of money. A large corp could even host their own DeepSeek v4.0 Flash model internally for some basic work.

  • try-working 18 hours ago

    DeepSeek and Xiaomi are so cheap there's no need to get a plan. Just use the API.

    • jason_s 14 hours ago

      something something something China something something intellectual property something something....

      • noman-land 13 hours ago

        You can just say the words instead of implying their meaning and letting everyone fill in the gaps themselves.

  • protocolture 17 hours ago

    >When discussing LLM pricing, people are missing the plot. The subscription token price is 10x-40x cheaper than API pricing. Your 90$ Claude subscriptions give you close to $1000 to $4000 in equivalent API token pricing.

    These are loss leaders that will not be maintained over the long term. Already we see moves to restrict their usage and redirect people back to API pricing.

  • otabdeveloper4 7 hours ago

    The subscription plans are the "first hit is free" plans. They're not gonna last and don't build anything serious based on them.

jyounker 22 hours ago

The problem with outsourcing, as opposed to remote developers, is that it takes a really good manager and tech lead to make it work.

My experience is that you have to write extremely detailed design documents and work specifications in order to get effective results. These generally have to be as detailed as most effective prompts.

Once you've written specs that detailed, why do you need outsourced developers and frontier models?

  • zdragnar 22 hours ago

    Having worked at a place with few or no design documents, few or no specifications and a team that was at least half outsourced or more, I can say that was not an effective place to work.

  • tomp 22 hours ago

    The entire business model of "outsourced" developers / shops is to overbill people - "we have 4 engineers working on your project" (and also on 5 other projects).

    Even if the engineers themselves are cooperative, their managers / business owners will resist close cooperation and enforce work at arm's length (e.g. 1x weekly calls).

    Ask me how I know. I once spent £300k (fortunately not my money) on an outsourced team of developers, and they delivered nothing at the end. Most of the time it was simply about aligning the work! We (me and my partner, we together had some idea of what we actually wanted) tried repeatedly to make sync-s more frequent, to better align the efforts, but their managers kept resisting. It's the "consulting" business model!

    For remote jobs, the incentives are reversed. You're literally a full-time employee, there's no management layers to impede communication, and (unless you're lazy or a fraud) you probably want to work on interesting problems and not be bored!

    • Tade0 22 hours ago

      > "we have 4 engineers working on your project" (and also on 5 other projects).

      Largest such scam[0] I've heard of was "we have 11 senior engineers working on this project" (actually three, two of whom were actually junior-to-mid-level).

      [0] Let's call a spade a spade.

    • tintor 22 hours ago

      "we have 4 engineers working on your project" (and also on 5 other projects)

      Does it matter how many engineers work on supplier's side?

      The supplier is tasked with delivering the project. It is up to them to figure out how many people they need, and to manage them.

      • jpalawaga 21 hours ago

        If your supplier promises you an iPhone 17 Pro, and then delivers you a Pixel 4a because 'that's all you need', you will be understandably miffed.

      • svachalek 21 hours ago

        Depends how billing works, but it's pretty rare to see terms that say we'll deliver exactly what you asked for, on a fixed price and fixed timeline, and take all the liability for failure. To some degree or another you are paying for time and effort, which are easy to misrepresent.

  • _puk 22 hours ago

    Exactly where my head is at.

    Not only do you need to spec everything to the right level of detail (at which point an LLM can likely make a good go of it), but a lot of the outsourced teams don't build in anywhere near the same way as those internally, and the difference in the level and speed of delivery is absolute.

    Not to mention with everything changing so quickly, why would I be spending time and money training up someone else's staff to be keeping up with the cutting edge?

  • janpeuker 22 hours ago

    Outsourcing usually gives you exactly what you pay for, arguably more transparently than other ways. It’s just that transparency (i.e. the price for quality) is sometimes not passed on from management / procurement taking that decision down to the team eventually having to work in a distributed fashion.

    I think that’s also where the assumptions of the original post are off - the difference between DeepSeek and a frontier model is not usually what low quality outsourcing can cover. So you probably end up paying a highly qualified outsourced engineer who may not be significantly cheaper (most outsourcing is not just due to cost but capacity and capability).

  • SoftTalker 22 hours ago

    > you have to write extremely detailed design documents

    Luckily LLMs can do that too.

  • munk-a 21 hours ago

    It's funny - the problem with outsourcing is the same problem as AI and it's all a huge callback to the early 2000s. Companies are astonished by just how much money can be saved without realizing the damage to their product. Some will have extremely fastidious oversight from strong product/project leads that will become the new generation of developers and some will buy the pitch and just fail when their software becomes unmaintainable.

    In ten years my prediction is that we have just as many developers as now building more products than they build now and AI is used for automation in isolated areas where it makes sense but most software development just happens at a higher level of abstraction where less text garbage is required to express the same concepts and the meat of code becomes even more focused on specifically encoding and highlighting the intricacies of the strange edge cases.

    I started my journey in software development working on a MUD that had been passed down through a dozen hands and was extremely dirty software. I can't see anyone wanting to try and pick through the ball of mud and spaghetti that'd result of letting AI build software without severe oversight and corrections.

    The core of software development has always been problem solving (or, more accurately, problem identification). As time has gone on we've gotten rid of more and more of the cruft to focus on that point. I suspect that trend line will continue and we'll evolve towards even leaner and more abstract languages to state problems and try and isolate the fiddly logical flow components, driver bits and math more and more into libraries and tools because for most daily work it is important but can be assumed to have been done by someone else better.

    • dogcomplex 17 hours ago

      I think this is a very reasonable middle-of-the-road AI take and our likely future. Just with the caveat that there's still a major threshold being hit here where we jump up to a new casual capabilities class where it becomes silly not to use AI for the majority of work, but there are still some high-intricacy problems which become much more load bearing than they ever were before and our new abstraction level doubles down on those.

      I would like to submit that the high-intricacy work congregates in Protocols themselves, and we start seeing the cycles of development and all the ways to direct AIs, programs, inter-person/inter-company interactions, etc etc all as types of protocol design - and studying those rules of interaction themselves becomes the new job of a programmer (systems architecture). What used to be hard rules and deterministic programs becomes soft self-governing tendencies and probabilistic behavior that can nonetheless be managed and bounded with the right system, but it's new and weird and more akin to management or herding cats than architecture. This is still very different from what most of us were working on before AI, but it's still familiar - especially to those who worked on internet protocols, or defensive UX design around users, physical engineering systems, or team management. Less programming languages, more - control theory, flows and throttles, quality control, design theory, etc. And clearly the field is still wide open as everyone seems to be experimenting with their own take on the AI orchestrator.

  • fridder 21 hours ago

    My problem has been just a lack of ownership. Unless it is a small focused outsourcing firm it is easier for the companies to just ship it, regardless of quality or maintainability. I have admittedly a small personal sample set though

treis 1 day ago

I think this misses the forest for the trees. Working with ChatGPT is eerily similar to working with offshore Indian devs back in my enterprise days. Productive if guided explicitly but if let run wild there's lots of WTF moments.

LLMs are likely to replace outsourced devs because your employees that know the context can use LLMs to do what offshore devs did before.

  • zwischenzug 1 day ago

    Certainly tracks with the number of outsourced teams begging for work on LinkedIn.

  • lumost 1 day ago

    How many of those wtf moments are simply from not “being in the room when it happened?” Most enterprise software is riddled with wtf moments demanded as one compromise or another.

    • xcskier56 1 day ago

      There's always wtf, why did we add this feature, but at least in my experience, once a week or so I run into something in this category. Me: "AI, please cleanup/refactor/improve this thing" AI: "Roger that! I deleted the file so now it's perfectly clean" ... insert W.T.F.

      • dboreham 1 day ago

        Never seen that once.

        • vorticalbox 20 hours ago

          Same, but I have seen it try to change my tests because it decided that its code was correct and my tests were incorrect.

          I saw its thinking tokens said something along the lines of “I have implemented it correctly but the test is failing. I’ll update the tests so the pipeline passes”

      • ofjcihen 1 day ago

        Yeah unfortunately you have to be careful with words like “clean up”.

        I’ve had it assume I meant the folder multiple times :/

    • karl_gluck 1 day ago

      At least some, but let me give an example.

      Request: “manual step X should not be part of the automated build script”

      Fulfilled as: build script is now split in two. X is still done as a manual step in between. Rather than prompting and waiting for it to be done, the documentation and scripts no longer mention X.

      Part poorly written requirements, part implementing under pressure, and part lack of engineering discipline.

      The main issue is catching stuff like this early enough to course-correct. Differences in time zone, language and cultural norms can make that a challenge, all of which LLMs have the advantage in.

  • spprashant 1 day ago

    "offshore Indian devs" are no slouches. They have access to the same GPT models and likely cost a tenth of the median US salary. Businesses are always looking to lower marginal cost. They will hire 1 software architect in US to write specs and 10 software developers in India to babysit 100 agents.

    • CuriouslyC 1 day ago

      This is short-sighted. The problem with offshore Indian devs is the communication friction/overhead. You're 9 hours offset, with people who have decent-but-not-great English skills and wildly different cultural priors. If the product people/decision makers are in the US, you're getting a ~50% savings to suffer all those issues, while the cost of tokens remains unchanged. That 50% savings doesn't look very impressive when you're taking a 20% productivity hit from comms friction and crossed wires, and 35% of your total cost is from tokens anyhow. Then it comes out to be a very marginal savings, at the cost of a VASTLY worse hiring experience and VERY high variance of outcomes.

      Offshore Indian devs make sense when you can have a large Indian division so you can amortize communication infrastructure/process management over a lot of heads, and you're building for international customers so you're not paying an English -> X tax inherently.
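
      The back-of-the-envelope numbers in the comment above (50% labor savings, 20% productivity hit, 35% token share) can be worked through in a few lines. This is only a sketch: the function and parameter names are made up, and it assumes the 35% token share is measured against the onshore total cost, which the comment does not specify.

```python
def relative_cost_per_output(labor_savings, productivity_hit, token_share):
    """Offshore cost per unit of output, relative to an onshore baseline of 1.0.

    Assumptions (not stated in the original comment):
    - token_share is the fraction of the *onshore* total cost spent on tokens;
    - token spend is identical in both setups;
    - onshore output is normalized to 1.0.
    """
    onshore_labor = 1.0 - token_share
    offshore_labor = onshore_labor * (1.0 - labor_savings)
    offshore_total = offshore_labor + token_share
    offshore_output = 1.0 - productivity_hit
    return offshore_total / offshore_output


ratio = relative_cost_per_output(
    labor_savings=0.50,
    productivity_hit=0.20,
    token_share=0.35,
)
print(f"offshore cost per unit of output: {ratio:.2f}x onshore")
```

      Under these assumptions the headline 50% saving shrinks to roughly 16% per unit of output, before accounting for the hiring experience or the variance in outcomes.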

    • mikeocool 1 day ago

      "They will hire 1 software architect in US to write specs and 10 software developers in India" is exactly what everyone said was going to happen in 2004 as software engineering outsourcing really started to gain traction. Thomas Friedman's The World Is Flat basically made the argument that software engineering in the US was going the way of manufacturing.

      And outsourcing certainly became a thing though not in the way everyone predicted. There are far more software engineers in the US today than there were in 2004.

    • gedy 1 day ago

      While people will do what they need for money, that is a miserable type of role and the quality of architect will suffer from that.

    • runtime_terror 1 day ago

      Obviously this is just anecdotal, but over my 20+ year career I've worked with a lot of outsourced teams in India and my experience has nearly always been that they require a frustratingly specific degree of direction to produce anything of quality.

      Just recently I asked a dev there for a POC of a feature with decent specificity and ended up with about 8k LOC of spaghetti. I re-wrote it later in a few hundred. This is about in-line with my career experience.

      I've had a few standout devs there but it does feel like a lot are putting in the bare minimum or are just working really far outside of their abilities.

      • spprashant 23 hours ago

        Companies are also pivoting away from mere outsourcing to setting up entire GCCs (global capability centers) there.

    • ern_ave 1 day ago

      > "offshore Indian devs" are no slouches.

      What evidence is there of the quality of Indian devs specifically?

      One signal I'd expect to see, for example, would be success in programming competitions. Here's the list of winners of the IOI competition [1] - India has won 3 times.

      Meanwhile, Turkey has won 4 times, Estonia has won 5 times, and Vietnam has won 22 times!

      Why should we suspect that there are more or better developers in India than in any of the countries that have produced more winners??

      [1] https://stats.ioinformatics.org/countries/?sort=medals_desc

      • blauditore 22 hours ago

        Programming competitions are not the same as real-world engineering, plus these countries have way more people trying to use these competitions as a gateway to good jobs. Also, many good engineers emigrate to higher-income countries given the chance, and almost none will immigrate to low-income regions. The consequence is some sort of brain drain.

        • ern_ave 3 hours ago

          > Programming competitions are not the same as real-world engineering

          That's true but irrelevant. Nothing is "the same" as anything else. My question was, what evidence is there that offshore Indian devs are of high quality? One expected signal would be... that they demonstrate their programming skill.

          > these countries have way more people trying to use these competitions as a gateway to good jobs

          That's ridiculous!! You're claiming that Turkey, with a population of under a hundred million, has "more people trying to use these competitions" than India, with a population of over a billion????

          > many good engineers emigrate to higher-income countries

          Okay, but the claim was, "offshore Indian devs are good" - that cohort (i.e. the ones INSIDE INDIA) excludes the cohort you're talking about (those who emigrate to higher-income countries).

          So, unless your point is, "yes, I agree with you, there is no evidence that offshore (still in India) devs are of high quality, and the reason is that the good ones emigrated" I think this statement is irrelevant.

    • opsnooperfax 16 hours ago

      Why do you need a mid-career software developer to babysit ChatGPT? Why don’t you just use an American intern who’s paid half of what an Indian developer is paid? You just can’t take the people who do your plumbing and get them to design your water treatment plant. If you want someone who really knows what they are talking about, and that’s what you need where an AI fails, then you are just going to have to pay what someone at that level asks.

      I work for a global corporation. We have offices in India. For the technical professionals I deal with, the wage differential is maybe 30-50% and is actually quite a bit less than the cost of living difference. My personal experience is that there is a tendency for them to massively inflate their qualifications and level of experience to a point that Americans would call fraud. The only kind of people who think this is a good idea are people like Larry Fink, and I would attribute his motives to greed and malice, probably in equal parts.

    • ifwinterco 7 hours ago

      I'm sure there's some good ones but most are bad.

      Not because all Indian devs are bad (this is of course absolutely not true), but most of the good ones are either no longer in India or working in India but for something more prestigious and interesting than an offshoring shop

      • pritambarhate 7 hours ago

        Which can be said of any country. Most of the good devs in the US are working for the best US companies and not for small companies with smaller budgets.

        • ifwinterco 7 hours ago

          Yes, but this is a strong argument against Indian offshoring purely for cost reasons if you're a half-decent company in North America or Europe that could attract decent local engineers if you tried

  • goosejuice 1 day ago

    There are developers outside of your country who are talented, speak your language competently, and are willing to work for less pay. There are plenty of reasons to believe that such devs will increase in numbers.

    • jujube3 21 hours ago

      Are they willing to work for less pay than Claude?

      • goosejuice 20 hours ago

        Who claimed they would? Cost of labor dwarfs LLM costs.

    • yieldcrv 20 hours ago

      yes, every ideas guy that is pumping out SaaS slop as a last-ditch effort to avoid a permanent underclass will get priced out of SOTA subscriptions and have to hire cheap offshore developers again. there are a lot of ideas guys. people with no capital and no skills, but ideas.

      but for OP's use case, people with some capital and many skills who need additional help, AI is solving a problem in a way that was not solvable before, while improving on coordination abilities and coordination velocity. Offshore developers do not come back into play here.

    • protocolture 17 hours ago

      I am sure they exist but they are never the ones that I have to work with.

    • opsnooperfax 17 hours ago

      Is that your experience from having hired one, or are you speculating?

      • zdragnar 7 hours ago

        My last job was a mix of on shore and off, and we had about a 40% success rate on the offshore compared to roughly 85% onshore in terms of people working out, but also a fairly small sample size.

        Latin American countries have become more popular for offshoring lately as you can get cheaper than US rates but still have the same or similar time zones.

freediddy 1 day ago

My friend is an exec at a US software company and they are preparing to lay off a few teams of programmers in their Eastern European locations and replace them with a small number of US programmers + AI. He said they are much more productive and produce new features much faster.

  • repeekad 1 day ago

    I think the article is right about outsourcing, but not to cheap offshored contractors: good experts will become more independent and be better able to support more clients with AI, meaning small and medium businesses won’t need as many internal engineers, finance, marketing, etc.

  • causal 23 hours ago

    This makes more sense to me. The bottleneck for me is becoming less "understanding code" and more "understanding users". Validating the latter is a task non-programmers can do.

  • piskov 17 hours ago

    How much time do you think will pass before that guy comes back to reality and lays off a bunch of agents? :-)

  • Nevermark 14 hours ago

    That's an interesting reverse in dynamic.

    Implication for manufacturing: Going robots first shouldn't aim at just re-localizing manufacturing, but aim higher. Become the new outsourced manufacturing destination.

davebren 25 minutes ago

Instead of programming something yourself now you'll be forced to program through the interface of someone that doesn't speak English putting what you say into an LLM. We've peaked.

CuriouslyC 22 hours ago

The future of American frontier AI isn't API calls, it's you taking your task to OAI/Anthropic like a consultant/external entity, then getting a product or whatever back, without ever seeing a large volume of intermediate work. This is inevitable because of the combination of distillation threat and proprietary harness development effort required to push performance at the bleeding edge.

OAI/Anthropic are 100% going to try to take everyone's jobs, and "own" labor. The Chinese are the good guys here.

  • illusive4080 22 hours ago

    No, because handing a project over the wall almost always ends in disaster. The requirements are never clear enough.

  • yandie 19 hours ago

    > it's you taking your task to OAI/Anthropic like a consultant/external entity

    Good luck with that. This reminds me of the inspiration behind declarative programming languages such as Prolog - you're supposed to declare the problem in such a way that the machine can solve it - rather than the imperative way where you tell the machine what to do. What they didn't realize is that the definition is harder than the solution itself.

ecshafer 1 day ago

I have really been trying to get local models to work. I have tried different harnesses, tooling, skills, prompts, etc. But when I compare claude code with anthropic models or codex with gpt 5.5, vs qwen, glm or gemma and the same harnesses, the frontier models come out massively ahead. I am at the point where I just don't see the point of the non-frontier models, they waste more time than they save.

  • bee_rider 1 day ago

    The hosted frontier models are massively subsidized, right? I think the point of local non-frontier models is just learning at this point, so you’ll be skilled if/when the market starts comparing the actual price of the two different models.

  • henry2023 1 day ago

    local models are 3 to 6 months behind SOTA models with the huge benefit of not needing to send all your IP to a shady third party.

    If inference cost comes down (as it has been for the last few years) you’ll be able to run today’s SOTA on your laptop by the end of the year.

    • ghrl 1 day ago

      I would say that is highly unlikely if by SOTA models you are not just referring to coding benchmarks but more general purpose ability and domain-specific knowledge. For example Kimi 2.6, which is comparable to Opus 4.6, is roughly 500+ GB in size, and I don't see how that would run on consumer hardware anytime soon. Besides, this is not just a question of technical feasibility; it is also not economically viable whatsoever. Why should consumer laptops be capable of running such models, when they would be massively underutilized most of the time and inference providers can produce the same results faster and cheaper?

      • henry2023 1 day ago

        Because privacy has perceived value.

      • sourcecodeplz 1 day ago

        It runs right now on 512gb RAM Macs and PCs.

        • Our_Benefactors 22 hours ago

          It runs like shit though in terms of tokens/second and still has a reduced context window. Vs a single claude prompt can easily get into 300k tokens without breaking a sweat.

          I want local AI to be a thing but the hardware isn’t here yet, because the only options are a Mac Studio or DGX machines strapped together. RAM prices need to crash before local AI has a chance at actually competing.

          • zozbot234 21 hours ago

            The more recent Chinese models are no longer heavily limited by context size. It can easily fit in RAM on a prosumer laptop. (You can also use swap space to extend that, since context is only written to once per inference, thus a relatively mild wear-and-tear concern.)

            • Our_Benefactors 17 hours ago

              Claude has 1M context window for the enterprise. 128k feels like a toy in comparison.

          • ATMLOTTOBEER 3 hours ago

            You’re right, and it feels like these people saying otherwise either don’t use these tools professionally (and therefore can’t tell a difference between local/cloud models) or literally just haven’t tried running local models

            As soon as I can buy hardware for less than 5k that runs an opus 4.6+/5.5 model locally I will do it instantly

    • lurking_swe 22 hours ago

      the bigger issue is context windows. HUGE difference there.

    • fg137 22 hours ago

      "shady third party"

      If Claude hosted on AWS bedrock is not considered trustworthy, I have some bad news for you.

      • henry2023 19 hours ago

        Anthropic illegally downloaded virtually all copyrighted material in the world to train their models. What makes you think they will have even a little consideration for your IP?

        • fg137 14 hours ago

          Let's say as a matter of fact that Anthropic seizes every opportunity to steal IP.

          How is that going to work on Bedrock, when they don't even manage the infrastructure?

  • koonsolo 23 hours ago

    I came to the same conclusion. For the amount that a query costs, using Opus all the time is the cheapest option.

    • myzek 9 hours ago

      For now. The prices for using those will increase massively at some point

  • eikenberry 22 hours ago

    The point is to not indenture yourself to a corporation whose motives do not align with your own.

    • illusive4080 22 hours ago

      Their motives are to make the best product to compete in a very competitive market.

      • eikenberry 21 hours ago

        Right.. and their definition of a "best product" is theirs, not mine and probably not yours.

        • illusive4080 21 hours ago

          …the best model for agentic coding, is the top goal right now.

          • eikenberry 20 hours ago

            Good example. AFAIK they are still focusing on doing this via larger models, which is a bad call. They are also focusing too hard on fully-agentic coding which, while useful to a limited extent, is not the best way to use AI for most non-trivial coding tasks.

      • kaeluka 11 hours ago

        They're trying to make the product sticky while they're still subsidizing the subscription price. The plan is to raise prices when you're addicted.

    • DonsDiscountGas 16 hours ago

      This argument applies to literally every business, and yet businesses buy from each other all the time

  • steve-atx-7600 12 hours ago

    Same. It pains me to hear quibbling about spending or reimbursing $200/month for codex or Claude plans. These are virtually inexhaustible for me as a software engineer and seem like a steal given the gains in efficiency.

    And who wants to screw around with harnesses or define agent orchestration when Claude/codex are good at this and getting better every month?

himata4113 19 hours ago

The more likely scenario is that the bottom will disappear while the top becomes more productive via frontier models.

The weaker a developer is, the more capable the AI needs to be. The entire premise of this article does not work because it assumes that weak developers with weaker AI are better than strong developers with near-autonomous AI. Weak developers with frontier AI already produce products that are worse than those of a capable developer paired with a weak (2-year-old) AI.

To clarify: strong developers 2 years ago could already leverage AI to produce high-quality products, whereas even with the latest and greatest AI weaker developers still struggle. Strong developers can now delegate more of the work to the stronger AI, increasing productivity further.

  • steve-atx-7600 12 hours ago

    I am so happy that I currently work at a job with mostly competent senior engineers for once in my life. The nightmares of contractors or overhired new grads without supervision would just be so much more devastating to an organization these days.

rldjbpin 4 hours ago

context: i work closely with the "proposed".

people dispute the 1k per seat number. i have seen cases where that is the amount billed, while the actual engineer might be paid even less.

the point we seem to forget is the local compute available to the offshore team. it might be highly optimistic to assume a capable machine for each dev which might cost orders of magnitude more than their monthly take home. i have seen way too many guys have simple office laptops from dell/lenovo/hp/etc. with dated ultrabook specs and at most 16gb system memory in Windows. just infeasible to assume that they could get away with local ai.

combined with network complications with some (besides corporate vpn, some may be working at times using a mobile hotspot), combined with corporate policy, it becomes a logistics nightmare to make this happen.

while equally infeasible long-term, the "per-seat" coding assistant subscription like github copilot is the most compatible to this scenario. it is an add-on to the existing software licensing anyways, and might be the most ergonomic in that sense. however, recent times have shown that it's been too cheap and subsidised to work anymore.
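
To put rough numbers on the 16gb point, here is a back-of-the-envelope sketch. It only counts model weights (parameters times bits per weight) and ignores KV cache and runtime overhead, so it is optimistic. The 4B and 27B sizes are the ones mentioned elsewhere in this thread; the flat 4 bits per weight and the 6 GB reserved for the OS, IDE and browser are assumptions for illustration, not measurements.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_ram(params_billions: float, bits_per_weight: float,
                system_ram_gb: float, reserved_gb: float = 6.0) -> bool:
    """True if the weights fit after reserving RAM for the OS and other apps.

    reserved_gb is an assumed figure (OS, IDE, browser, VPN client);
    KV cache and runtime overhead are ignored, so this is optimistic.
    """
    available_gb = system_ram_gb - reserved_gb
    return weight_memory_gb(params_billions, bits_per_weight) <= available_gb


if __name__ == "__main__":
    for size in (4, 27):
        need = weight_memory_gb(size, 4)
        ok = fits_in_ram(size, 4, 16)
        print(f"{size}B at 4 bits/weight: ~{need:.1f} GB, fits in 16 GB: {ok}")
```

Under these assumptions a 4B model fits comfortably on such a laptop while a 27B model does not, even before accounting for context.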

robeym 6 hours ago

I think quality experience and personal values are more important than engineer cost. I've seen so many shortcuts taken on outsourced work these past few years. AI also loves shortcuts. The combination of the two is not worth the cost savings.

If you value high quality work and pride in what you do, outsourced workers (who most often don't pay careful attention to their work, hence the cost) are not the solution. However, if you're just trying to get something done and don't care about it getting done right, what better way to do it than spending the least amount of money possible

illusive4080 22 hours ago

The author doesn’t address: A good engineer spends little comparative time coding versus other tasks for established projects. A good engineer understands the system end to end. Offshore developers are worse than Llama3.

zuzululu 1 day ago

I keep seeing this narrative involving Deepseek as an example of OSS LLMs but they are subsidizing a huge amount of tokens at cost and one can easily understand why they are doing it if one is not lazy and thinks critically.

It's still far too costly and ineffective to use local AI that can match what the frontier models offer, especially when the inference hardware is being heavily restricted due to geopolitical risks. I find claims about local LLMs somehow giving these frontier companies a run for their money especially doubtful in the long run.

Tokens are getting expensive because they are beginning to corner the market and will use that advantage to limit hardware distribution within and beyond the borders.

It's more likely that some workflows will see more local LLMs but those will never be the ones that require frontier model level or beat the price that a lighter smaller version of frontier model will offer to capture that tail end

  • sourcecodeplz 1 day ago

    Don't think so, from what i've heard deepseek isn't losing money on inference.

  • logicchains 1 day ago

    >they are subsidizing a huge amount of tokens at cost

    This is absolutely false, because other providers serving the Deepseek models on OpenRouter are also able to offer very low prices, and they don't have the money to subsidize anything.

    • zuzululu 20 hours ago

      That makes no sense....OpenRouter didn't create Deepseek

      • NortySpock 19 hours ago

        I don't think your counterpart is arguing that OpenRouter created DeepSeek. Rather I suspect their argument is that there are 13 providers listed on OpenRouter for DeepSeek v4 Pro that are competing on price. (That's the default balancing algorithm in OpenRouter, roughly: weighted towards the lowest price and was available in the last 30 seconds)

        If any providers are able to sustainably turn a profit, OpenRouter allows them to compete in an open market to process your tokens (or anyone else's tokens).

        Thus anyone subsidizing tokens bears the brunt of the compute load and gains not much more than name recognition and tokens to train on, but since switching to a different provider is a matter of changing one setting in the config panel (and can be set to auto-switch based on price), switching costs are very low. Providers of open models via OpenRouter have almost zero ability to lock-in users.

        So this claim that all 13 providers are selling subsidized inference is... a tough claim to swallow. Maybe some of them are, but all of them? I assume at least some providers want to show profitability, and are pricing their service accordingly.

        https://openrouter.ai/deepseek/deepseek-v4-pro/pricing

        https://openrouter.ai/docs/guides/routing/provider-selection

    • leonidasv 14 hours ago

      Sure, but they didn't spend on training the model. If DeepSeek is providing the model for the same price as third parties, then it's probably still losing money when you account for the training.

      • throwa356262 11 hours ago

        Deepseek bypasses CUDA and has a few other optimisations that neither llama.cpp nor vLLM supports.

        Furthermore, V4 pro was designed to run on 4 Huawei Ascend GPUs which are much cheaper than the nvidia setup others use, and deepseek probably also got some free hardware for their collab.

        Hence it is entirely possible their inference costs are significantly lower than other providers.

  • throwa356262 1 day ago

    Do you have a source for your first claim?

    My impression is that deepseek designed v4 specifically for cheap inference and they are not losing money even at a 75% lower price.

    • zuzululu 20 hours ago

      Do you? Did you audit deepseek?

jillesvangurp 1 day ago

I've been pretty happy sticking with codex 5.4 medium. I don't see a good case for switching to 5.5 at the cost of going through my token budget quicker.

There are misaligned incentives here between users just trying to get stuff done and AI companies competing on having the "smartest" model that passes benchmarks and continuously does some Nobel Peace Prize-winning stuff. It's mostly overkill for the more mundane stuff normal people actually do with them. It's nice to have the option when you need that. But defaulting to that is not economical and a bit unnecessary.

There's also a difference between smart models and bigger context windows. Most of the progress in the last year was simply the context windows getting big enough to fit all/most of the stuff needed to solve issues. Before then, you had to carefully manage the context to not run out of space and they wouldn't fit much more than small hobby projects.

With sub agents, the parent agent doesn't need to be a frontier model. It can delegate to smarter agents. And most stuff it delegates shouldn't need a frontier model. Wouldn't it be nice if it could decide on a case by case basis.

The walled gardens offered by OpenAI, Anthropic, and others currently default to one-size-fits-all "frontier" models. This is not sustainable. They should evolve to using smaller, effective models most of the time, escalating to larger ones either when the estimated complexity is high or when the small models fail. I'm guessing some open source based alternatives to these walled gardens are probably already heading in that direction.

The irony here is that with a walled garden, these companies are selling a premium experience. But in the current market that boils down to burning billions of investor cash to keep the GPUs going without much hope on profitability. Eventually surviving companies are going to have to compete on quality, cost and margins. The smart approach would be to dynamically adapt token and context window sizes instead of blindly defaulting everything to the best possible. Don't boil the oceans for a simple email summary or a simple web UI. That stuff already worked well enough with models even a few years ago.
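
The escalation idea above (start small, move up either on high estimated complexity or when the small model fails) can be sketched roughly as follows. Everything here is hypothetical: the `Tier` type, the stub models, the length-based complexity heuristic and the 0.7 threshold are placeholders for illustration, and the 1 vs 30 relative cost simply borrows the 30x price gap discussed in this thread. A real router would call actual model APIs and use a real success check such as tests passing.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Tier:
    name: str
    cost_per_call: float                  # hypothetical relative cost
    run: Callable[[str], Optional[str]]   # returns None when the model fails


def small_model(task: str) -> Optional[str]:
    # Stub: pretend the small model only copes with short, simple tasks.
    return f"small:{task}" if len(task) < 40 else None


def frontier_model(task: str) -> Optional[str]:
    # Stub: pretend the frontier model always succeeds.
    return f"frontier:{task}"


def estimate_complexity(task: str) -> float:
    # Toy heuristic: longer task descriptions count as more complex (0..1).
    return min(len(task) / 200, 1.0)


def route(task: str, tiers: List[Tier],
          threshold: float = 0.7) -> Tuple[Optional[str], Optional[str], float]:
    """Return (tier name, result, total cost), trying cheaper tiers first."""
    # Skip the cheapest tier up front when the task already looks complex.
    start = 1 if estimate_complexity(task) >= threshold and len(tiers) > 1 else 0
    spent = 0.0
    for tier in tiers[start:]:
        spent += tier.cost_per_call
        result = tier.run(task)
        if result is not None:
            return tier.name, result, spent
        # Otherwise fall through and escalate to the next tier.
    return None, None, spent


TIERS = [Tier("small", 1.0, small_model), Tier("frontier", 30.0, frontier_model)]

if __name__ == "__main__":
    for task in ("summarize this email", "x" * 60, "x" * 150):
        name, _, spent = route(task, TIERS)
        print(f"task length {len(task)}: handled by {name}, cost {spent}")
```

The design choice worth noting is that a failed cheap attempt still costs something, so the up-front complexity estimate only pays off when it is right often enough to avoid wasted small-model calls.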

  • prasoon2211 1 day ago

    I used to be on 5.4 high for most of my work. I have switched completely to 5.5 medium now. I would highly recommend trying it out

    - 5.5 is significantly more token efficient than 5.4 - the same task often takes a third of the tokens

    - because of this, it is also much faster to do the task

    - you get high "intelligence" per token even after accounting for token efficiency - 5.5 medium is just under 5.4 pro levels of intelligence (imo). It has found tricky bugs for me that all other models failed at

    So overall, ideally you will end up with a more intelligent, faster model for slightly cheaper.

    • dawnerd 1 day ago

      We trialed 5.5 and the same queries produced worse results. Not worth the cost increase. Even if there’s a token efficiency gain the higher cost wipes that out.

    • thisisembar 1 day ago

      This is embarrassing but I find 5.4-mini on Low covers a substantial part of my and my colleagues' work.

      Back when it became expensive I learned to live with it and I find my "AI skills" (mainly communication) have a substantial impact on the efficiency of the model. Not saying my work is difficult, it's not, but I find there is quite a bit of wiggle room. Smaller models can still perform useful work, but you have to do the heavy lifting yourself. It saves a ton of money.

      I used to burn through 75% of my tokens in an hour or two. Now I can work all day and hit maybe 50-60% if I use it heavily.

ZeroCool2u 1 day ago

A crucial factor tech industry folks tend to ignore is how much executives value predictable costs. Cloud migrations got away with this, but still had to argue fiercely, because 'the cloud' and its serverless tech had the potential to significantly decrease overall spend for unpredictable, bursty workloads.

The usual counter-argument is the operational burden, but human capital is also a relatively fixed cost. A dedicated team of 3-5 FTEs could probably handle inference ops for a F500 company.

Meanwhile, the capability delta is shrinking fast. We have more evidence that local open-source is viable with the release of DeepSeek v4, and the industry is only trending further in this direction. Especially as we rely more on test-time compute and task-specific harnesses rather than model size.

So, if you're an executive looking at a marginal but fixed operations cost, added flexibility, and a rapidly closing gap in capability, why wouldn't you just run open-source models on your own infrastructure to get those highly predictable costs? Plus, you decrease the risk of one of the frontier

  • bitmasher9 1 day ago

    Do you really want to buy the 3rd or 4th most intelligent AI?

    There’s so much uncertainty, it seems like the safe option is to give everyone a Claude or OpenAI subscription/api key until the frontier isn’t changing every six months.

domrdy 1 day ago

For sure true for specialized ones like MedGemma (healthcare). In my testing, the 27b model is at least on the same level as frontier, and in some cases outperforms them. 4B is insanely good too for some lighter workloads. Thanks G for working on this!

samtheprogram 1 day ago

$1100/m for an outsourced engineer… am I missing something? That’s far too low. Even juniors in South America tend to ask for at least double that number before factoring in the DeepSeek cost.

  • Shalomboy 1 day ago

    I thought the same thing. The author's reference point for LCOL developer seems a bit outdated. With what we pay our teammates in Colombia, the model pushed out to 22 months before crossover.

qudat 19 hours ago

I'm still thinking this through but I was arguing this position to colleagues to some shock: LLM's are a race-to-the-bottom and frontier models will not be able to afford to work on coding specific models (or coding features at all) in the very near future.

27B is already really good at coding-specific tasks. Fundamentally, there is little innovation on the core architecture: LLMs are all designed essentially the same, with minor differences in how they are trained. They are all feed-forward multi-headed attention models; it doesn't matter if it's a 4B model or a 1T model, that's just scale.

Further, the frontier models cannot afford to innovate: they have to scale as quickly as possible to "beat out" their competition. The frontier models fundamentally will not create the next "attention is all you need" monumental jump in AI.

Frontier companies are stuck on scale with zero capacity to innovate. You cannot point capitalism at "basic science research" and expect any ROI. This is a known reality. Innovation is much more indirect and a "random walk" style of knowledge acquisition.

Finally, these LLMs are quite literally designed with a human-in-the-loop, and we do not give ourselves enough credit for how well we ourselves tool-call. We are doing a lot of heavy lifting to make these models useful and you cannot simply remove us from the equation without also removing ourselves from the training pipeline.

  • lugu 18 hours ago

    There has never been more incentive to innovate in human history than today, and you think the best labs won't innovate. That is crazy. It is like anyone can do AI research. Of course there will be new architectures. We just discovered the steam engine and the combustion engine is coming.

jmull 1 day ago

> (Human + an almost frontier LLM) vs Frontier LLM

I'm curious, who/what is operating the frontier LLM in this scenario?

The rest of the article is equally incoherent.

regexorcist 1 day ago

I've been saying this for a couple months now since I got decent hardware and started using my local Qwen 3.6 exclusively. I have no doubt the future for individuals and medium-sized companies is local private AI.

  • bobim 22 hours ago

    Could you share some of your hardware details for Qwen 3.6? And are you using the dense or MoE variant?

    • regexorcist 20 hours ago

      Sure, I have a 64G MBP with an M1 Ultra. The best model for me by far has been the 35B A3B, in particular the 8Q_KL Unsloth variant. The dense model works but it's much slower, and I don't really see a difference in quality with a good harness.

      • koyote 17 hours ago

        What do you use as a harness?

        • bobim 7 hours ago

          This is also of interest!

        • regexorcist 5 hours ago

          I use oh-my-openagent. It does an incredible job at planning and executing by orchestrating subagents with different roles.

    • hypfer 20 hours ago

      Qwen3.6-27B-UD-Q4_K_XL can run at 45t/s with 131k q8 context on an RTX 4090.

      That is pretty usable. You could get 65t/s or more with MTP, but only if you drop the context size, which I would advise against.

      Results are better with 256k context and a larger quant, however, that's not going to fit on the 4090 you already had lying around for playing cyberpunk 2077.

      The MoE models make me rather unhappy. Idk. They feel braindead to me, but YMMV.

joegibbs 13 hours ago

Why would you ever offshore again now that we have LLMs? Offshored work was famous for its terrible quality and high prices, you'd just have to go back on everything, sit in a ton of useless meetings, make sure that you had very, very detailed design documents with every little piece accounted for.

Now you can put those detailed documents into the LLM and get a better result back in a couple of hours rather than weeks for a tenth or hundredth of the cost.

And the offshore devs are going to be using the LLMs themselves, why add another layer, level of bureaucracy, language barrier in between your requirements and the result?

  • steve-atx-7600 12 hours ago

    There are plenty of folks that either don’t have a sense of pride or ownership for the products they are associated with, or just do not have a deep sense of what it actually takes to ship quality products. You’d be surprised at how some less desirable places to practice software engineering are run.

mark_l_watson 1 day ago

Great article that reinforces my own opinion while cleverly adding low-cost human labor into the equation. Nice.

I spent a month comparing Gemini Ultra plan to using much lower cost DeepSeek v4 with open source coding harnesses and, spoiler alert: I was happier using the much cheaper and more environmentally friendly open models: https://marklwatson.substack.com/p/my-evaluation-of-ai-agent...

lmeyerov 1 day ago

Fwiw, the cost per answer, which is what ultimately matters, is going down. In a competitive market with oss and multiple frontier labs, it is hard to maintain a premium long-term.

The big question is how subsidies vs technology improvement will play out. As we saw with Uber, selling at a loss can happen for a very long time, and technology improves relentlessly.

For reference, we publish https://botsbench.com/ that shows time and cost per answer are going down while quality is going up.

digitaltrees 15 hours ago

This is a really interesting point. It does seem that in recent days, probably driven by the approaching IPOs, Anthropic and OpenAI have been testing increased prices and burning tokens to see just how much people are willing to spend, and in all honesty seem to be targeting the salary of an SF engineer as their ceiling. They seem to think that people will be willing to spend nearly as much on tokens as on a human engineer.

bob1029 22 hours ago

> The current closed source frontier models are more capable than the latest from DeepSeek. But is the capability difference enough to justify a 30x price difference?

We talk about capability like it's some kind of linear scale. I am not paying 30x for 30x performance. I am paying 30x so that my use case goes from "haha nope" to a signed contract with the client. Works 0% of the time => works 3% of the time is an infinite improvement in capability. That is what the premium is paying for.
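
One way to make this concrete is cost per working result rather than cost per attempt. The sketch below assumes you retry independently until something works, so the expected number of attempts is 1 / success rate. The 30x price ratio and the 0% vs 3% success rates are the figures from this comment; the retry model itself is an assumption for illustration, not how any vendor bills.

```python
import math


def expected_cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend to get one working result when retrying until success.

    Assumes independent attempts (a geometric distribution), so the expected
    number of attempts is 1 / success_rate.
    """
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    if success_rate == 0.0:
        return math.inf  # never works, so no budget is enough
    return cost_per_attempt / success_rate


if __name__ == "__main__":
    cheap = expected_cost_per_success(cost_per_attempt=1.0, success_rate=0.0)
    frontier = expected_cost_per_success(cost_per_attempt=30.0, success_rate=0.03)
    print(f"cheap model: {cheap}, frontier model: {frontier:.0f}")
```

With these inputs the cheap model's cost per success is unbounded while the 30x model's is finite, which is the "haha nope" to signed contract jump in numeric form.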

ocreat 8 hours ago

Why is the assumption always that the developers the work is outsourced to are less skilled than the local developers?

  • sceptic123 7 hours ago

    For me it is more that outsourced devs lack context and long term responsibility. Plus they are generally billing an hourly rate so their incentive structure is not directly aligned with the business that is outsourcing the work.

    • ocreat 36 minutes ago

      That won't change by giving them an LLM, that will just make it worse.

leonidasv 14 hours ago

We shouldn't take free open models for granted. They're a byproduct of the current AI craze, but the economics aren't on their side. It's not sustainable. Alibaba already stopped releasing the weights for their best models, for instance.

cautiouscat 1 day ago

The dark mode version of the site makes the tables unreadable.

  • the_arun 1 day ago

    Agreed, but same data is listed right below the table.

swader999 1 day ago

I'm finding sound judgment, common sense, technical depth and breadth, and a feel for the UX are skills that amplify Agentic coding. Deep knowledge of the problem domain and time with the customer (or SMEs or end users) are what build these. Outsourcing this will never work; you can't put someone 12 hours ahead of the timezone you're serving in front of the customer.

LetsGetTechnicl 22 hours ago

So all that hype for the AI revolution just for it to be... taking advantage of cheap overseas labor after all?

nyxtom 1 day ago

I've seen the $1000/mo engineer salary thrown around a bit and I'm not even sure where it comes from.

ianhxu 1 day ago

>frontier models are more capable than the latest from DeepSeek. But is the capability difference enough to justify a 30x price difference?

The contradiction here is that without frontier models, there'd be no foundation for models like DeepSeek to reference and catch up to. Is there an economic model that captures this kind of dynamic?

  • aftbit 1 day ago

    Free market competition? This is a pretty classic pattern. Leaders capture market with quality but run into trouble scaling, followers compete on price and availability. Given time, leaders eventually run out of upgrade runway and find themselves swallowed up by followers. Or alternatively, leaders think their lead is inevitable and miss a sea change or iterative upgrade path. Think IBM PCs before Compaq and other cheap clones ate their lunch.

  • bee_rider 1 day ago

    I guess they’d be hoping for very protective IP laws in that case.

  • throwa356262 1 day ago

    Hold on mate, do you realize that a significant number of recent major advances in AI came from deepseek?

youarenotyu 7 hours ago

current LLMs are too large or not efficient enough for local use; we need more breakthroughs on both the chip side and the model side

hmokiguess 1 day ago

I think the biggest pull is yet to come, legislation around sovereignty and the US Cloud Act is sort of a challenge for the US hyperscalers, these local models may have more than just a price advantage against frontier labs but also policy and lobbying.

rastrojero2000 1 day ago

It's particularly funny to me, but a minor point, that this post requires me to go through some kind of Cloudflare armed checkpoint to dare read about AI.

A bigger issue is that this thing calls AIs better coders than people, yet I have tried for the past 4 months to get any one of the several I looked into to consistently produce a simple event-bus-backed Java monorepo, with exactly zero success. Claude even repeatedly wanted to put my login logic at the actual event bus, for some reason.

What does "better coder" _exactly_ mean at this point?

rightlane 1 day ago

I disagree with every part of this.

Local LLMs are great and very useful but if you are claiming that their code quality is in the same ballpark as Claude Code or Codex with their best models I cannot consider you a serious person. I feel like this is analogous to the folks arguing that The Cloud is "someone else's computer." As if billions of dollars of spend gives these companies zero benefit over a Mac mini.

Regarding offshore, at least in my experience, better coding agent output is down to two factors. The first is subject matter expertise. Providing the right context to the coding agent based on the tech you are building for is beyond critical. That's the issue with the Vibe Coded slop projects. No expertise in a technology means no awareness of its gotchas. React is the most obvious example, because the LLM default is to useEffect endlessly.

The bigger issue is that by their very nature LLMs are very sensitive to quality prompting in English. I have seen offshore devs fail endlessly because they don't have the English skills to successfully prompt the machine. That has caused more work for my US-based devs, who either have to carefully tune the work ticket so it is basically a coding agent prompt or go through multi-day exercises to enforce better prompting.

A single US dev with Claude Code is orders of magnitude better than typical offshore. Adding local models into the mix would make offshore completely useless. I'm sure many companies will see ballooning AI bills and expensive onshore devs and be very tempted to go to TCS or similar. I hope so, because that will give startups plenty of easy targets to disrupt.

DonsDiscountGas 16 hours ago

The article assumes 5% monthly growth in token prices continuously. That seems aggressive.
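For a sense of scale, here is a minimal sketch of what that assumption compounds to, assuming the 5% is a simple month-over-month rate. The function and variable names are illustrative and do not come from the article.

```python
# Compound a constant monthly growth rate.
# Assumption: the article's 5% is applied month over month.
MONTHLY_GROWTH = 0.05


def price_multiplier(months: int, rate: float = MONTHLY_GROWTH) -> float:
    """Factor by which a price grows after `months` of compounding at `rate`."""
    return (1 + rate) ** months


if __name__ == "__main__":
    for months in (12, 24, 36):
        print(f"{months} months: {price_multiplier(months):.2f}x")
```

At that rate prices would grow by roughly 1.8x per year, which is what makes the assumption look aggressive.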

alansaber 1 day ago

Always has been. People pay for the (not so) marginal performance gains.

Our_Benefactors 22 hours ago

AI has well and finally killed the idea that outsourcing saves cost. Local AI, sure, but not outsourcing. Been there, done that, doesn’t work.

economistbob 1 day ago

Deliberately combining hallucinations with a smaller fund of localized knowledge with which to spot said hallucinations seems like a bad business decision.

the_arun 1 day ago

Premium services need to allow enterprises to self host the services to reduce cost of inference. Another advantage is data doesn't leave the VPNs.

NitpickLawyer 1 day ago

> But is the capability difference enough [..]

This is the (m/b)illion dollar question, isn't it? I think there's also a question of what do you think capability is exactly, and how the difference manifests itself.

On the one hand, when something becomes "good enough" that's a clear capability threshold. On the other hand, what's the limit of those capabilities, and equally as important, how does capability reflect on reliability?

We've seen "local models" lately improve on capabilities where they're "good enough" for some tasks. Reliability of solving those tasks is a bit harder to measure/benchmark/test. It'll get better as more people work with those models. But, something I've noticed in the past ~6 months is that the frontier models are gaining a lot in both the breadth of capabilities, as well as the reliability of solving those tasks that they're capable of solving. I think this is where scaling (both compute and data) is showing, and where having more compute is simply better (more parallel exploration, more training data output, more broad data, etc).

There's also the problem of benchmarking true capabilities. The popular ones are getting old, and aren't as reliable as they used to be (not even touching on the subject of benchmaxxing, just thinking about their saturation, even with honest intentions).

So the question then becomes what will users prefer? Do you get the best of the best, or the one that's good enough? There might be a market for both, honestly. Not everyone does SotA stuff. And a lot of what people used to do in a company is probably mundane enough that a "good enough" model with "good enough" reliability can probably handle (w/ some supervision ofc).

What I'm more interested in is if things like Taalas succeed and they get to provide local hardware that runs models "burned in silicon". That would be interesting, because speed and all the advantages of local models are a "quality" on their own. For example, right now I'd pay ~1k$ for an external hdd-sized block that can run a ~32B model that's popular right now, even knowing that it can only run that model. I have no idea if that's feasible or not, if it makes sense from a financial pov. But I'd buy one. And local inference on dedicated chips doesn't need to be "oss only". I'm sure oAI / etc would probably take the risk of licensing one of their -mini / -lite models provided that the risk of the weights leaking is small enough (and it probably is).

> This keeps a ceiling on how much or how fast the frontier labs can raise prices.

I generally agree, but from a different perspective. Up till now we've seen that the 3 labs influence each other's price points. When gpt5 came out at a radically smaller price, the others lowered them as well. Now with opus being SotA for coding, w/ 5.5 close behind, they've raised them back. Google seems to follow slowly. But there's hope that, being 3 top labs + 2 trailing (xAI & Meta), there'll be pressure once again. If any of those trailing labs manage to get to SotA again, the prices will drop once more. Some people say that open source also provides a pressure here, but I'm not yet convinced of this. There's still a question of who'll serve the models, at what scales, etc.

endofreach 21 hours ago

"I phrase can this words a way not make sense. seems! but point across still!"

AI can turn it into a pseudo-poem or a 4-page document. Or it can just fix the grammar. But it doesn't really change the point of the sentence, nor does it fix the actual issue with it.

Similar for code: there are codebases with lots of smells and really dirty parts that are still better than methodically clean ones that just don't "get to the point".

I am so sick of all the AI bloat. People were able to hide their incapability behind unnecessarily complex frameworks or obscuring it through "clean code" concepts. Now LLMs give those uninspired people the option to invest even less of what makes worthy software and hide it in more abstraction.

Just: AHA! (AI won't)

jqpabc123 1 day ago

The current closed source frontier models are more capable than the latest from DeepSeek. But is the capability difference enough to justify a 30x price difference?

"Frontier models" are caught in a financial dilemma of their own making --- they have spent such huge sums on development and as a result, they may have inadvertently priced themselves out of the market.

Energy costs are a huge factor for AI. He who has the lowest energy costs will likely be able to dictate market prices. And fossil fuels dependence doesn't look to be advantageous for AI.

  • burnte 1 day ago

    > "Frontier models" are caught in a financial dilemma of their own making --- they have spent such huge sums on development and as a result, they may have inadvertently priced themselves out of the market.

    I feel it'll wind up like the dotcom/fiber bubble. Way too much money poured into it, lots of expensive bankruptcies or write-offs, and a readjusted market sea level.

    • wongarsu 1 day ago

      Absolutely. We are in a phase of "free money" for AI. Just as with the dotcom bubble, that leads to 1) lots of experimentation, and 2) lots of infrastructure buildout (which includes AI model training). Once the money dries up, some infrastructure (including models) will turn out to be profitable, most won't. And some experiments will turn out to be successful, most won't. Lots of useful things will come out of that, both the failed and the successful attempts. Just as the dotcom boom paid real dividends 5-10 years later and laid the groundwork for the world we have today.

      • burnte 1 hour ago

        I agree. I think when companies go bankrupt due to bubbles it's healthy for the economy at large. Capital gets redistributed to other employees and other companies that can build something valuable, and bad business people leave the market.

  • GodelNumbering 1 day ago

    > lowest energy costs will likely be able to dictate market prices

    This is a good insight. I think everyone has seen that chart of China's electricity generation going parabolic vs the US. That, combined with cheaper yet equally good talent, means that at least in that segment the closed labs won't catch up anytime soon.

    • andsoitis 1 day ago

      > the closed labs won't catch up anytime soon

      Which closed labs won’t catch up to whom?

      • frank_nitti 1 day ago

        Not my comment, but I’d venture to guess they’re referring to the likes of DeepSeek et al, who are/will be able to host their top-tier inference infra more efficiently

        • seniorivn 1 day ago

          Right now the most likely outcome is that they are going to host on locally produced, much more power-hungry chips, and even if the lead in electricity production stays, it will be eaten by the inefficiency of the hardware.

          • CuriouslyC 1 day ago

            Unlikely. We have a big lead in terms of general computing devices, but China can leapfrog us with ASICs. They might still lag in the training space for a while but in terms of serving inference, USA is absolutely COOKED at the low-mid end.

      • GodelNumbering 1 day ago

        I should have expanded, but basically, as the OSS models become more and more capable of solving all day-to-day SWE coding needs, they will take a cut of frontier labs' revenue.

        Not to say that frontier labs won't make progress, but the bar for a sufficiently capable agent is all the OSS models need to meet to make this happen. I imagine a lot of hybrid setups where something like Opus is used only for planning/architecture, and anecdotally, the real token consuming part is implementation not architecture.

    • rgbrenner 1 day ago

      > China's electricity generation going parabolic

      Even if we all switch to Chinese models, the west isn't going to be running the model on Chinese servers... and the majority of costs are from inference.

      > cheaper yet equally good talent

      China has tech talent, but this isn't a 3rd world developing nation. Chinese AI researchers are getting paid $10M+ USD/year salaries.

      Also they're equally good, but somehow consistently behind?

      • CuriouslyC 1 day ago

        Training models is as much art as science at this point. There's no gap in scientific acumen at Chinese labs, but the US has more real world experience in the art of training large models, and the US has the capital allocation lead.

        • Npovview 1 day ago

          Yes but when the Heads of CCP make something their target they chase it with all their might. Read the recent news of the fact that Chinese AI researchers can't leave China. China is now going after the Diamond industry of India.

  • EGreg 1 day ago

    This sounds to me like the Bitcoin bros. Yes, the first-gen technology was very energy-heavy, but afterwards people (bitcoin maxis and people who held the bag) kept insisting that all new technology is “shitcoins” and that everyone should just buy bitcoin.

    Actually, platforms that serve many customers can bring down the costs tremendously through caching, and don’t need the AI credits as much: https://safebots.ai/costs.html

    • Hamuko 1 day ago

      Bitcoin is a poor analogue for much of anything since it's very much designed to be energy-heavy.

      • EGreg 1 day ago

        Oh, and neural networks doing a huge number of floating point operations per word is not energy-heavy?

        Training these neural networks every few months isn’t energy-heavy?

        Both Bitcoin and these large models weren’t “designed to be energy-heavy”. It was a consequence of first-gen design decisions to solve a specific problem. Then as time went on, costs went down and they became a huge outlier in terms of energy. The question is whether the bagholders (the AI companies that invested untold amounts into the initial training) will fight to keep people using their tech and fearmonger about everything else.

        • Groxx 1 day ago

          Bitcoin is pretty much explicitly designed to use as much electricity as the market will allow, without becoming any more useful. If you removed 99% of the miners from the current system, Bitcoin will still be exactly the same - it won't be any faster or slower, and the same number of transactions will flow through. The cost of electricity serves only as a lower bound on the expected value of a coin.

          Neural nets on the other hand generally show more capability as you add more compute power. There's a point where it's less valuable than the cost increase, so people don't do more than that, but it isn't constant value like Bitcoin.

          • EGreg 1 day ago

            It wouldn’t be exactly the same, because if you had all that mining capacity and 99% magically took a holiday, there is now enough mining power to take over the network anytime. It’s not secure.

            Same with AI. Now that the Mythos and other models are finding exploits in every code base and anyone can run them, you can’t afford anymore not to keep burning credits securing your code base. It’s like proof of work red queen theory. You have to run faster and faster just to stay in place. Great business model.

      • iwontberude 1 day ago

        Bitcoin is a good analog because the goal was to create durable trust. The energy utilization is just a means to an end of fairly distributing new tokens to members of the network. There are many other schemes they could use and have considered adopting. The energy use is not necessary, it’s sufficient.

  • pjmlp 1 day ago

    Energy costs and privacy.

    Currently the projects I am involved in require devs to use approaches like Ollama, Foundry Local and co if they happen to have good enough hardware, picking the best alternatives out of https://www.canirun.ai.

  • treis 1 day ago

    Historically the winners in software have a flywheel that turns faster with more users. With Facebook, the more of your friends were on it, the better the product was. Google tracked how long users were on pages to improve search.

    The frontier models are going to win that way. They won't feed your code back into the system but they will track which code you keep and what code gets a "try again claude".

    They're not going to lose on price. No consumer software ever has because ultimately it's not that expensive relative to salary and the marginal cost is 0.

    • throwfaraway4 1 day ago

      >They're not going to lose on price. No consumer software ever has

      Lists examples of software that are free to the users

      • Npovview 1 day ago

        I want AI to go the way of Linux. I hope we see that future.

    • aftbit 1 day ago

      The marginal cost of AI is not 0. That's one of the big differences between this and older SaaS software. Inference costs a lot of money. Even if you're looking at just capital depreciation, it's quite expensive. I suppose it's more accurate to say marginal cost is stepwise - adding 1 new user is 0 cost if and only if your existing inference hardware covers that user's usage. As soon as you need a new server, adding _that_ new user costs ~$20k/year (assuming a $100k server and 5-year depreciation).

      This is true for traditional SaaS too, but the number of concurrent users that could be served by one machine and the cost of the hardware were both at least an order of magnitude better.
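The stepwise cost described above can be sketched as follows. The $100k server price and 5-year depreciation come from the comment; the 50-users-per-server capacity is a purely hypothetical number chosen to make the step visible, and all names are illustrative.

```python
import math

SERVER_COST = 100_000     # dollars per server, from the comment
DEPRECIATION_YEARS = 5    # straight-line depreciation, from the comment
USERS_PER_SERVER = 50     # hypothetical capacity, NOT from the comment


def servers_needed(users: int, users_per_server: int = USERS_PER_SERVER) -> int:
    """Whole servers required to cover `users` concurrent users."""
    return math.ceil(users / users_per_server)


def annual_hardware_cost(users: int) -> float:
    """Yearly depreciation cost of the servers needed for `users`."""
    return servers_needed(users) * SERVER_COST / DEPRECIATION_YEARS


def marginal_cost(users: int) -> float:
    """Extra yearly hardware cost of adding one user to `users` existing users."""
    return annual_hardware_cost(users + 1) - annual_hardware_cost(users)


if __name__ == "__main__":
    print(marginal_cost(49))  # the 50th user still fits on the first server
    print(marginal_cost(50))  # the 51st user forces a second server
```

Under these assumptions the marginal cost is zero for most users and $20k/year for the one user who tips the fleet over capacity, so the average cost per user depends entirely on how many users one server can carry.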

      • jqpabc123 1 day ago

        > The marginal cost of AI is not 0.

        In other words, AI is not your daddy's software. Comparing AI with old school software markets simply does not compute.

      • treis 21 hours ago

        It's not literally 0 but for most casual users it is. Programmers are heavier users and the ones that have 10 agents going wild are even more so. In that case the cost is above 0 but economies of scale mean the effect is the same.

    • Npovview 1 day ago

      Exactly, the CC sessions flywheel is a treasure trove of data and they all know that. The reason we went to Stack Overflow was because there was data (upvotes/downvotes, comments, workarounds) discussed under the answers. That is a very high quality signal from the field.

  • Aboutplants 1 day ago

    I’ve been on this issue for a while now: models are not going to matter as much in the future. Pure energy cost will be the determining factor in who is most successful. The US just cannot build cheap energy the way China can, and at the scale that China will build it. 10 years from now it will be seen as the single source of advantage.

    • SpicyLemonZest 1 day ago

      If the cost of software development falls so precipitously that energy costs are a driving factor, that implies so many other changes that I don't know how we can trust any analysis of what would happen.

    • tpolm 1 day ago

      > The US just cannot build cheap energy

      Nuclear power anyone?

      • dboreham 1 day ago

        Cheap.

        • tpolm 1 day ago

          What is expensive in nuclear energy? Reason there is not more of nuclear reactors is not the cost, it is regulation. Regulation can be changed (it also seems to have been already, recently, IIRC - starting with the 2024 NRC law changes by the Biden admin and later by the Trump admin).

          • jqpabc123 22 hours ago

            > Reason there is not more of nuclear reactors is not the cost, it is regulation.

            The reason for regulation is that failure is not an option --- unless you're willing to accept the cost of making a big chunk of a state uninhabitable for a very long time.

            How much would failure cost? The Chernobyl exclusion zone is over 1000 square miles --- about half the size of Delaware. And it is expected to remain uninhabitable for the next 20,000 years.

            Also, the Russian Academy of Sciences estimates that up to 1 million people may suffer premature death as a result of radiation exposure and contamination from the event.

            In the long run, renewable energy is a lot cheaper.

    • dboreham 1 day ago

      Same as bitcoin then.

  • Aurornis 1 day ago

    > they may have inadvertently priced themselves out of the market.

    Last week we were all talking about how Anthropic has too much demand, how they had to rent a data center from a competitor, and how the limits they’ve put on their service to deal with the demand are making users angry.

    DeepSeek is cheap because they’re working hard to attract users.

    The open weights models released for free weren’t free to train. It’s a loss leader to get attention to try to sell you something in the future.

    The prices we pay for tokens right now are set by supply and demand, with some being sold at high premiums and others at a loss. Some models are given away for free after the companies spent money on researchers and compute.

    • aftbit 1 day ago

      Yes and no. Just take a look at the OpenRouter providers page:

      https://openrouter.ai/deepseek/deepseek-v4-pro/providers

      Deepseek v4 Pro is much cheaper when provided by Deepseek itself, likely as a combination of the loss leader strategy you mention and the desire to have more data flow through their pipeline for training. However, the same open weights model, provided by other providers, is somewhere in the $2-3/1M output-tokens range. Compare Opus 4.7 at $25/1M output-tokens.

      Unless you mean that releasing open weights models is the loss leader, in which case, you might be right but I hope you're wrong. We've seen some of this from Qwen at least - their latest model is closed only. I hope there's always someone willing to make this bet and release better and better open models.
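To put the quoted list prices side by side, here is a small sketch that computes the ratio between the Opus figure and the third-party range mentioned above. The prices are the ones quoted in this comment, not checked against any provider page, and the names are illustrative.

```python
# Prices in dollars per 1M output tokens, as quoted in the comment above.
OPUS_PRICE = 25.0
THIRD_PARTY_RANGE = (2.0, 3.0)  # same open-weights model on other providers


def price_ratio(frontier: float, open_weights: float) -> float:
    """How many times more expensive the frontier price is."""
    return frontier / open_weights


if __name__ == "__main__":
    for price in THIRD_PARTY_RANGE:
        ratio = price_ratio(OPUS_PRICE, price)
        print(f"${price:.2f}/1M vs ${OPUS_PRICE:.2f}/1M: {ratio:.1f}x")
```

On those numbers the gap against providers who did not pay for training is roughly 8x to 12.5x, which is smaller than the 30x figure that comes from comparing against DeepSeek's own subsidized pricing.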

      • Sebb767 1 day ago

        > I hope there's always someone willing to make this bet and release better and better open models.

        What would this bet be? Training is expensive and open weights mean that for hosting you compete on price with people that don't have this item on their bill.

        • aftbit 1 day ago

          "Attention is all you need" - the larger bet is that by releasing your models open-weight, you'll get more attention and mindshare than if you tried to jump in to compete with the major closed providers, and the value of that attention will outweigh the cost of the training run.

          So far, it's really only the Chinese labs (and FAIR or whatever Meta's project is called now) that are doing this. Oh yeah, and Google's Gemma.

          At the moment, this is all massively distorted by the prestige and investment money flowing into the space. None of the labs have to charge the real cost of inference let alone the marginal cost of training because they are instead lighting investment money on fire to cover that.

          One imagines (though I have not investigated in detail) that there's a degree of national prestige work going on too. The Chinese labs are trying to show that they can build better and more efficient models and are releasing open to undercut the US labs.

      • Aurornis 1 day ago

        > Unless you mean that releasing open weights models is the loss leader, in which case, you might be right but I hope you're wrong.

        This is specifically what I meant.

        DeepSeek’s official service is trying to recoup some of the training and engineering costs too.

        The other providers only have to recoup their hardware costs and the cost of a team to run it.

        Even though DeepSeek’s official service is more expensive per token, they’re running at a lower profit than the OpenRouter providers because they had to pay for the R&D.

        This is a deliberate choice. We already see it with Qwen splitting their releases between open weight and hosted only models. The open weights are a loss leader to get attention. Without them you’d almost never hear about their hosted models.

        • aftbit 23 hours ago

          Except DeepSeek's official service is _less_ expensive per token, which suggests they're underpricing it substantially as well to attempt to draw more attention / more data.

themafia 16 hours ago

What if.. and this is a big if.. but.. what if we just paid people what they're actually worth?

scotty79 22 hours ago

> Human + an almost frontier LLM

I tried this. My role as a human boiled down to recognizing when I need to switch to a frontier model for the last mile.

lowbloodsugar 1 day ago

If IT is a cost center, then a company has likely already outsourced (and if it's called IT it probably is). If you are a software development company that makes money from software, then a local team of SDEs using whatever AI they want is a competitive advantage vs a local team of SDEs trying to deal with an 11.5 hour gap to India. AI is coming for software developer jobs, and it's coming for: a/ the low skill ones and b/ the high skill ones where turn-around and iteration matters. I've worked with great engineers in India, but the time difference was brutal for our fast moving business.

crimsoneer 1 day ago

I think this is a compelling argument, but I see 2 issues:

1. I remain unconvinced LocalAI can work well for the majority of businesses. It looks vaguely comparable on benchmarks, but it tends to be fragile and a lot of management overhead in reality.

2. Similarly, while Deepseek is comparable to Opus/Codex on benchmarks, for agentic work at scale I definitely notice the difference. That's not to say it's not economical, just that I definitely miss the big boys when I swap.

I kind of wish this was true, because the UK would be in a great place to compete with the US. But somehow people are happy to pay 3x the salary for an engineer in SF.

  • GodelNumbering 1 day ago

    Fair points. I used to think that until some months ago but the latest generation of OSS models are surprisingly good. Plus maybe it is the way I work, but I find myself constantly overriding the decisions of frontier LLMs (because they start degenerating towards god objects and spaghettification) so most use I have gotten out of the AI agents is really their ability to code quickly and syntactically correctly.

    Also worth noting that it doesn't have to be full either-or. There can be a two-tier enterprise deployment that routes to a locally hosted vs a frontier model, and over time more and more use cases could get routed to the local LLM.

  • aftbit 1 day ago

    I wish Deepseek could read images. I've been having good luck guiding it around on personal projects, but anything that needs to render to a screen really needs to be looked at to see bugs.

  • hobofan 1 day ago

    > It looks vaguely comparable on benchmarks, but it tends to be fragile and a lot of management overhead in reality.

    I'm working on a self-hostable LLM (web) UI[0] that aims to provide UX comparably good to e.g. ChatGPT, and you are right that there is a decent amount of fragility involved, and more management overhead than most people would expect.

    However, we usually find that those issues arise a lot more in e.g. the harness (= our application), or in the prompt tuning that's required for each of the models, rather than in model quality itself. We have seen customers using self-hosted LLMs with similar user satisfaction across their organization to other customers that lean heavily on the latest GPT-5 models on Azure. Especially given that you have to do some level of tuning and setup anyways, you might as well invest it in "local"/self-hosted AI (if you can make the financials of the inference cost work out for you).

    I think it should also be noted that the inference providers on hyperscalers also tend to be quite fragile, each in their own way (e.g. Google with a horrible rate limit system or Azure with almost weekly intermittent 500-error incidents).

    [0]: https://github.com/EratoLab/erato

dyauspitr 1 day ago

Only if you don’t allow construction of local data centers

  • joe_mamba 1 day ago

    I can name one big country that won't disallow data centers.

  • rgbrenner 1 day ago

    US has over 10x the number of data centers as China; and produces 2x more energy per capita than China.

    • chrisweekly 1 day ago

      what about energy consumption per capita?

      • aftbit 1 day ago

        What about it? Energy production basically has to equal energy consumption in the medium term, so if the grandparent comment is correct, it is 2x per capita.

        Dunno how trustworthy this source is, but it says ~35 MWh/person in China and 77 MWh/person in USA.

        https://ourworldindata.org/grapher/per-capita-energy-use

mahmedalam 1 day ago

First fix your website's navbar and hero on mobile. They are broken, and it shows that you vibe coded slop!!!

Stevvo 1 day ago

I don't see local AI taking off. Memory costs make it impractical. Deepseek API pricing is not a suitable analogue because it's not local.