Folks have been saying “things are different now, the agents are now compounding success instead of error” for at least a year now, but I just don’t see it. I was lucky enough to receive a weeklong $50k per head AI training from the people saying these things, and one of their few helpful concrete recommendations was to constantly clear context all the time, to avoid things going off the rails.
However, I think finding security vulnerabilities is one use case where it doesn’t matter. Tokenmaxxing is absolutely effective for that. We as an industry are in the middle of adopting very expensive, complex continuous fuzzers.
Even modern frontier models benefit so hugely from careful context pruning, maintenance, and rewriting to erase mistakes that it's astonishing to me that there are no tools centered around it. The one tool that used to have such a feature, Zed and its retroactively-named Text Threads, has now stripped itself of it.
this! the back-and-forth chat interface where you can edit only your own messages, and only then to get a new response, is a terrible one, but I think favored by vendors because it helps them fight in vain against prompt injection. Custom harnesses and stuff are nice but incredibly time consuming to set up when all I want to do is like, see what the agent is reading, and editing out some irelevant nonsense side quest it went on, or trim some massive log file it read which filled 90% of its context. Theres no inherent reason behind some caching gains that these things must be strictly chronological - a response it gave me previously does not have to be part of the context now
"some caching gains" is a pretty huge understatement- snipping something out of the middle of the window requires rebuilding the entire context. Thats a shitload of tokens.
Afaik messing with the context also pretty reliably degrades performance still. The model responses reference things that no longer exist to it and it becomes more chaotic.
The real usefulness of parallel or sub-agents is not that they run at the same time, its that they isolate noisy or self-contained context away from the main window.
You can structure your context window to minimize the amount of editing you do further back. You usually only need to edit and correct the most recent response. It's little different from forking the conversation at an earlier point, and nobody warns about that being a sneaking footgun. There is still a prefix to cache.
I still feel that during agentic workload sometimes it would be nice to have the model identify it is veering off the main track, send out a "keep the cached states and tell me which they are" command to the inference server, do the side thing (such as handling an error that plopped up that has not that much to do with the main task) and return back to the cached state with just a comment tacked at the end to say "oh and btw I fixed DNS" instead of having the DNS debugging inside the context window now. Maybe other harnesses just steer the models more towards using subagents for such tasks and my pi is misconfigured. I can use the tree feature, but having insight into what's cached would be nice there.
Some companies only get to a "hello world" level with a new kind of tech via a 50k per head training. The organizations are setup in a way that people can't experiment or learn by themselves, it's really the only way.
Who delivers a 50K per head training? Who pays for such a thing?
How many people were in the class?
I had a small training company, shuttered during COVID, and I used to charge 5K per day, for a group of up to 12 people. 5 days training = 25K. This is double, wow.
I would love to get back into training but getting enough volume to live and support a family on has been a challenge.
It's true, but I see it happening. I’ve watched seniors with 30+ years of experience adopt them successfully without losing their classic rigor.
Personally, I get huge mileage out of LLMs, and yes, I care deeply about code quality, readability, and debuggability.
I've seen juniors absolutely rock with them.
And I've seen the exact opposite, where they just struggle to get good results.
In the end, I think the divide comes down to management experience. The people thriving are the ones who have led teams, especially teams of contractors, which is the best analogy for how you have to interact with an LLM.
Those folks know how to break down problems, provide the right context, and scope a task just enough to see the "contractor" succeed before letting them move forward.
On the other hand, individual contributors who are used to just grinding solo often struggle. They expect a one-shot miracle. They say, "Hey, my code is buggy, fix it." When the LLM inevitably hallucinates or steers them wrong, they give up. The results are completely different based on how you treat the tool.
They might just have a high quality of control and standards that it is hard to find that pattern with the LLMs.
I think fierce individual contributors are a lot more valuable in the era of llms as well. We as humans typically achieve better balance with new stuff when we allow backlash from new processes that start to trample on old ones without understanding AKA the Chester's fence.
I remember there being a lot of phrasing in the announcements that implied they didn't know Second Life existed. Kinda seemed like they thought it was a completely new idea.
The implication that tokenmaxxing was an intentional and thoughtfully considered approach rather than blind hype-following by an overpaid manager class who are too far removed from value to understand the downsides of LLMs is hysterical beyond belief.
Yeah, the rationalization after the fact is kind if absurd. IME, the reasoning underlying tokenmaxxing at the corporate level was "we need to leverage AI as much as possible as fast as possible because we're scared our competitors will find some leverage before us".
Definitely not some measured, long term, rational out of the gate.
Worse, tokenmaxxing has been pushed by the labs hoping to charge those tokens by the pound on their API prices eventually, even if temporarily hiding such costs behind "highly subsidized plans" or frequent bug-induced "reset buttons"
I would wager most if not all of the tokenmaxxing was done on enterprise API priced plans, not subscription plans. You can't actually token"max" if you are limited in the amount of tokens you can use per 5 hours.
Enterprise plans weren't usage-based until recently. Lots of enterprises were bait-and-switched by AI companies from flat per-seat fees, and now have to go on a token diet to rein-in budgets - which is the blog post's premise about tokenmaxxing being dead.
I really don't understand this take. If you're a carpentry shop that just bought power tools for the first time and you're worried that your employees are sticking with hand tools because that's what they know, then you look for sawdust.
The goal isn't to have people work at converting wood into sawdust, the point is that if you wanna see if the tools are working you wanna see proof they're actually being used.
I'm sure there were some people cargo-culting this stuff, but suggesting that the people who run FAANG don't understand the dangers of bad metrics is... interesting.
Why would a carpentry shop buy hundreds of thousands of dollars of power tools without consulting with their employees to see what they actually need to get their job done more effectively? The logic of buying the tools then forcing the employees to use them "or else" is completely backwards in any sane world.
(Of course, we've all had bosses that went to some marketing seminar and come back having been tricked^Wsold into buying some wizz-bang widget that we need to now integrate because of a sunk-cost fallacy, but I thought everyone was on the same page that this is not how normal procurement was supposed to work.)
> the point is that if you wanna see if the tools are working you wanna see proof they're actually being used.
That is way too charitable, people were being fired based on these metrics and people were absolutely talking about token burn as being a metric for productivity (do I really need to link the Jensen Huang quote?). That isn't an indication of this hysteria being based on "just trying to see if the tools work".
If you want to see if the tools work, why don't you just ask your employees? Like any normal employer would?
So do a workshop on power tools, measure their efficacy and the quality of the result, do some demonstration videos on power tools, get people to compare, seek feedback on their usage. Don't count electricity and sawdust, or you'll find people getting very good at expensively turning blocks of wood into sawdust.
Is the idea that most stubborn employees would adopt AI if their company made videos showing internal metrics that AI is better?
> Don’t count electricity and sawdust
I agree that it seems wasteful, but is there some better way to accomplish it at the scale of hundreds, or hundreds of thousands, etc? I'm personally doubtful that stubborn employees would switch even if a video provided internal metrics, videos, etc.
Nobody started out with that conclusions. The tests and experiments and workshops and consulting with select employees you propose, were all done 2-3 years ago. Results are in, but a chunk of population decides to ignore them and obstinately continue believing and claiming that it doesn't work, which is a conclusion they started out with.
What? The basic properties of water are scientifically defined as best we can. There is mathematical proof that 1+1=2. Do you think science starts from nuclear physics or differential equations?
We couldn't build any of this stuff (rockets, LLMs, heart medicine) if the foundation was ill defined.
I think it's the second time I run into you like this, Temporal. I wish HN had a way to classify you as an "AI booster" or equivalent.
> What? The basic properties of water are scientifically defined as best we can. There is mathematical proof that 1+1=2. Do you think science starts from nuclear physics or differential equations?
Yes. There's a lot of interesting things science has to say about water, very specific claims that took a lot of effort to discover, precisely formulate, and reproduce.
We're not talking about those. The whole LLM discussion on HN, as well as in the wider industry, is still stuck at the state where a large (or vocal) group of people refuses to believe water is wet. Yes, there is a similar group that tries to sell water as miracle cure, I'm not denying it - IMO both perspectives are dumb and entirely detached from obvious observational evidence that you can collect for ~free at home in 15 minutes. Example will follow.
There exist the equivalent of foundational, detailed studies on LLMs, at every level of rigor imaginable (with a caveat, it's hard to rigorously prove anything useful in software engineering; it's still largely opinion-driven field). But they're not part of the overall "AI hypers/haters" dynamics.
> I wish HN had a way to classify you as an "AI booster" or equivalent.
You can take any of the LLMs and have it vibecode you a user script in under 5 minutes, than you then can paste into Greasemonkey/Tampermonkey, and voilà, you have me labeled as "AI booster" or filtered out.
In fact, let me help you, I'll time it. I opened chatgpt.com in incognito (to emulate being a rando free user), and put the following prompt in:
> I need a user script I can paste into Tampermonkey on my Firefox that will clearly label user named TeMPOraL with robot emoji and some silly emoji, so I never forget when reading their HackerNews comments that they're an unapologetic AI booster.
This is the promised empirical example. It doesn't prove everything, but it proves something, and it took, end-to-end, a total of 1 minute to perform just now. You can collect many such examples over a single day by just trying. People who keep saying AI is useless and a fad and can't do anything useful, obviously never bother with even that.
FYI: I'm not an AI booster. I like AI, and I find it useful, but I'm not going out of my way to boost it. I just enjoy this topic, but more importantly - and I remain consistent in this - I point out bullshit that doesn't agree with obvious observable reality.
EDIT: try the example yourself, and post whether it works for you too - if it does, it's technically a peer-reviewed, replicated study, but I doubt it'll convince any of the naysayers of anything.
EDIT2: I have plenty of negative things to say about LLM capabilities and how irresponsibly people use them, and I do occasionally write about this (mostly at work, these days), but most HN threads on AI are not on this level - not anymore. They used to be more reasonable back in GPT-4 days.
They're not "3-4 trillion dollars in investments over 5 years" useful, nor "crammed into the throat of every employee on the planet, regardless of their actual job" useful.
The way they are pushed right now will lead to a very hard crash and probably lots of suffering.
Also, you need a more advanced prompt for Firefox on Android :-p
> They're not "3-4 trillion dollars in investments over 5 years" useful
Why not? They're a general-purpose technology, in the same category as "software" or "electricity".
> nor "crammed into the throat of every employee on the planet, regardless of their actual job" useful
They're potentially useful for anything that can be fed into computers (VLMs lifted the "that can be expressed as text" limitation, visual and audio tokens are not a separate category to text tokens anymore). That touches every single job people do in some aspects. Even though LLMs can't do physical work for people, they're still able to help with directing it and teaching it.
"Cramming into the throat of every employee on the planet" was already covered by many comments here, and the article itself - it's about forcing the obstinate holdouts to at least try.
> Also, you need a more advanced prompt for Firefox on Android :-p
No I don't; literally copy-pasted it to Tampermonkey on my Firefox on Android just now, and it works there out-of-the-box too.
Regarding LLMs, they are pushed too hard and too abusively by business people. Employees are being laid off and replaced with chatbots that don't do the job. Frustrating if support for McDonald's, risky if health insurance support. Also the financials don't make sense. AI companies are money pits. Money is ultimately production. We make X amount of stuff yearly, globally. We can't afford to through away 5% of X yearly on technologies that will probably have a proper return in 5 or 10 years. When we mis-allocate resources on scales like these, people die. Look at Communist centralized planning. For $3-4 trillion we could have solved a LOT of actual global problems.
LLMs are fine but they should have matured in the software dev domain for 2-3 more years and then non tech products would have followed.
At the risk of abusing the analogy further: many people aren't refusing to believe it's wet, they're observing that sulfuric acid is also "wet" and can look similar upon visual inspection, and there's a lot of harm coming along with the demonstrated capabilities, in addition to those capabilities themselves being fickle and inconsistent (not a desirable property for a good technology).
This isn't a problem of "doesn't know what AI can do"; yes, some people are misinformed, but you shouldn't dismiss all refusal to use AI as being misinformed. This is a problem of "knows what AI can do, and based on that informed position thinks it's terrible and should have careful guardrails around it".
As a matter of fact, Nature does regularly publish papers about wetting properties of water. In fact, it just published one last week, from Nature Physics:
Scientists find more or less everything very interesting, even (especially?) things that are supposedly self-evident. You can both make a big splash disproving self-evident things, and much can be learned from it.
Yes, "results are in". They're all over the map, about productivity, about stress and churn, about trust, about public sentiment, etc.
But sure, if you want to tell people their productivity will be measured by token usage, they will certainly respond to that incentive by setting your checkbook on fire while they work on a job search.
If a company wants to provide AI accounts for people, along with guidance for usage and non-usage, that might well make sense for some jobs. It certainly makes sense for some uses. If they start measuring token usage, that's even worse than when companies tried to measure lines of code written.
> If you want to see if the tools work, why don't you just ask your employees? Like any normal employer would?
because that would require actually admitting that employees are the people in an organisation who are responsible for the success of that organisation, rather than the people higher up the org chart.
And here we are. AI use mandates are a humiliation ritual, at least how I've seen them. Because it's not just a matter of making the employees use AI; public criticism or speaking about the drawbacks are also punished. It's get totally on board or get out; if you're not completely gung ho, despite the testimony of your lying eyes, maybe you don't have what it takes to work here, son. It's something they use as a shit test, just like the North Korean dogma that Kim Jong-il scored a perfect 18 holes-in-one every time he stepped on the course: are you willing to compromise your values, to the point of mouthing naked untruths, in total submission to the company's leadership?
Do you actually have a job? Do you talk to your coworkers?
This is an insane take. Plenty of people are critical of AI at my job despite a big push to use it. I find the comparison to NK distasteful, coming from someone who presumably is pretty well paid and can quit their job whenever they want.
If you're feeling humiliated... well, I don't think it's because your boss wants you to try AI.
Having been the guy to speak the uncomfortable truths at such meetings, I can tell you that does not end well for anyone. Expect to look for another job shortly afterwards.
Narcissists, non-violent sociopaths, and control freaks end up in managerial positions (often more likely than the general population). The pointy haired boss in Dilbert is a popular representation for a reason. We've all been subject to degrading and/or stupid management trends (see also: https://ibb.co/Kx46rqkg ), and while in the tech industry we had a golden age were the engineer was king, that's been chipped away even before AI became mainstream. Also, hyperbole is a thing. :-)
The logic of trusting employees who are worried that power tools will replace them to utilize power tools effectively is completely backwards in any sane world. People don’t like change, sometimes it needs to be forced on them.
> who are worried that power tools will replace them
maybe, just maybe, it would have been a better idea to engage with employees first rather than posting on linkedin about how everyone is going to lose their jobs.
cos it's the kinds of people trying to force this stuff on employees that are the ones who have been shouting about that from the rooftops.
If you take LinkedIn at face value everyone who uses the Internet is a sociopath who lives for no purpose beyond maximizing shareholder value.
Seriously, some of the most deranged things I've ever read were by relatively normal people trying to promote themselves on LinkedIn.
What people SAY does not matter nearly as much as what everyone KNOWS and it's pretty damn clear that AI is never going to be able to replace humans in complex domains. Every time a frontier lab announces a breakthrough it's pretty obvious that the setup was more complicated than "hey chat prove the Riemann hypothesis."
The world is gonna need skilled human beings to drive LLMs, no matter how desperately some people like to pretend otherwise.
Doubt. People brought in all kinds of web applications in the early Web 2.0 era because corporate IT was being too stingy (for a lot of reasons). People will find efficiencies on their job on their own. No need to denigrate them.
I don’t know, at my company at least tons of devs were holding out on ai usage until the token maxing stuff really started. It was beyond clear by that point that coding agents were a productivity multiplier.
A lot of people believe that. Not a lot of evidence on the table for it (it’s not agent developers’ fault; empirical studies are expensive and rarely live up to scrutiny). Not sure it’s worth forcing people unless you like malicious compliance.
Well here’s where you can level valid complaints against management I think. “Move fast and break things” doesn’t line up super well with “wait for empirical studies to back up your suspicions”
For sure. Just because the studies are incomplete or difficult doesn’t mean they’re useless. We still do unit testing and type systems continue to get more sophisticated and spread further because we believe they have an effect on quality and productivity regardless of the lack of evidence.
However it takes some taste in engineering and perhaps some mathematical sophistication to figure these things out. “Just use AI,” is not a very convincing argument either.
Yeah but if you can't attack the workers and make them hate their lives, are you even a good capitalist? Didn't Milton Friedman die for our bosses right to stomp on our faces in the pursuit of profit?
> Why would a carpentry shop buy hundreds of thousands of dollars of power tools without consulting with their employees to see what they actually need to get their job done more effectively?
Are you suggesting that changes to new production technologies are always driven bottom up by line workers? I'm guessing that historically that's rare.
Historically that rarely happens because industrial equipment is/was generally too expensive for the average worker to purchase on their own, plus workers usually have a budget of roughly 0 to buy extra tools, especially expensive ones.
But to give you an example, also roughly 0 companies made developers use Linux and still many developers choose it, so bottom up improvements happen in a decent chunk of cases. Nobody paid for PostgreSQL promotion. Or Python, etc.
> so bottom up improvements happen in a decent chunk of cases. Nobody paid for PostgreSQL promotion. Or Python, etc.
It does, but for better or worse, it's an anomaly. Even now, maybe nobody was paid for PostgreSQL or Python promotion, but modern OSS tools and programming languages usually have a business backing it. Linux, too, wasn't commercially promoted until it was; RedHat isn't exactly a charity after all.
Conversely, no one paid for initial AI promotion either - ChatGPT exploded organically after release, and for the first year or two, companies had a problem because a good chunk of their staff, including especially non-engineers, discovered just how useful it was and wanted to use it at work, casually violating every internal policy, bylaw and even regulatory policies about data sharing. The massive spend on promotion - including first-party spend - came later, but at that point it was already obvious ~everyone is going to be buying it.
I suppose bottom-up vs. top-down may be in part about how mature a technology and industry is.
> Why would a carpentry shop buy hundreds of thousands of dollars of power tools without consulting with their employees to see what they actually need to get their job done more effectively? The logic of buying the tools then forcing the employees to use them "or else" is completely backwards in any sane world.
For one, software tools are cheap, especially with OSS in the mix. You're buying one "tool" and paying for operational expenses that scale with total usage across all company.
But secondly, and more importantly, the "consulting" and discussing was done over the period of last 3 years, by ~1 year ago the high-level conclusions were pretty much locked in, the worthiness of the adoption was blindingly obvious at that point, so I can see why tokenmaxxing would be where this ended up, even though (here I disagree with the article a bit) the tools aren't at the "compounding correctness" stage just yet. It's really quite simple: the stick didn't work (telling people in increasingly direct ways to try using AI for stuff), so they tried the carrot.
$deity knows a good chunk of engineers will inadvertently fall for any trick that involves a scoreboard. That holds even when they're perfectly aware they're being tricked.
> If you want to see if the tools work, why don't you just ask your employees? Like any normal employer would?
Again, they did that, they've been doing it continuously over past 3 years. Some people are excited, some people don't care, but some - a population that's definitely overrepresented in HN comments - just stubbornly refuse to try. Now that the answers are in, and they speak in favor of AI, the companies are doing what "any normal employer would": trying to get the stubborn employers to do their job they way their bosses want them to.
(In fact, normal employers would be more eager to fire people who keep refusing top-down instructions - but it's also obvious this technology is experimental; the models and harnesses get more powerful faster than people can learn to use them - so carrots make more sense than sticks in this transition period. Stubborn people begrudgingly using those tools offer an entirely unique perspective and explore use cases and approaches that you won't get from excited adopters.)
You don't need peer reviewed studies to tell you water is wet.
Peer review is a technique to get evidence from data when SNR is low. It's not "science", it's just a technique. So is "throwing shit at a wall and seeing what sticks". Don't turn techniques into rituals, and science into religion.
Vibes are not evidence, neither is a curated demo. You need actual measured evidence that has an adversarial review to actually prove something without falling to confirmation bias.
I think GP is being sarcastic, and pointing out that
1. "heavy rock falls faster" is what common sense will tell you (I was literally told this by multiple laypeople just a few days ago when sightseeing atop a tall tower)
2. This is disproven by a trivial experiment that nobody thought worthy of trying for millenia
3. therefore we do need peer reviewed studies to confirm even "obvious" knowledge.
Also, note that GP's parent post about "water being wet" is quite the subject of contention in scientific and philosophical circles, so that wasn't the best example either.
Because people don't know what they want until they have and use it. Faster horses, etc. One can only really implement systemic change from the top down, as Moloch indicates.
Because Japanese hand tools are objectively less efficient than power tools in a carpentry shop. The guys that want to use hand tools can go work in a boutique that charges a premium for that level of craftsmanship. If you told them to use power tools, no amount of utility would convince them to use them, with most of their justification being psychological. Also, "It is difficult to get a man to understand something, when his salary depends on his not understanding it."
>If you want to see if the tools work, why don't you just ask your employees? Like any normal employer would?
I run a small business with two employees.
N=2 here, of course, but one of them will experiment with any new process you introduce (as well as plenty more that you don't!)
The other will keep doing what he's always been doing, even if it's frustrating and inefficient, unless you monitor him and force him to use the new process.
I could imagine most "normal employers" would understand that both type of person exists and, assuming you're getting good first impressions from group A, it's usually better off in the long run to shove the new process down group B's throat.
(This isn't to say that the "Group B" employee is less valuable or anything - he is more conscientious and reliable than anyone else we've ever hired - but just that different people need different management styles)
In my experience, your first dev will have four thousand ideas and experiments on the go, and leave an absolute mess in their wake.
And your second will be struggling to clean up that mess while also getting their own work done.
Of course, you expect the same level of work from both of them, but because person two has to do a bunch of person one's work as well as their own, person one ends up looking better and gets praised by management.
> it's usually better off in the long run to shove the new process down group B's throat.
> (…) the "Group B" employee (…) is more conscientious and reliable than anyone else we've ever hired
If employee B is proving themselves to be valuable and reliable, then you should trust them to make the best decisions for how they’re going to go about their work and support them. Leave the door open for them to try different things, but no one likes having processes shoved down their throats (your words). All you’re doing is making them unhappy and more likely to leave to go work for someone who’ll value them like they deserve.
Thank you for taking the time out of your day to explain the best way to manage someone you've never met, working in an organisation you know nothing about.
You may want to consider that your Group B employee may be conscientious and reliable because they use an apparently “frustrating and inefficient” process. Productive friction is a thing: processes which force you to slow down enough to put careful thought into what you’re doing and why. And if they’re stuck in a loop of doing frustrating work - you may well consider why they’re doing so much frustrating work. Maybe that can be resolved at the managerial level!
> Why would a carpentry shop buy hundreds of thousands of dollars of power tools without consulting with their employees to see what they actually need to get their job done more effectively?
I mean, the difference in the metaphor is that we have pretty fully understood carpentry for many hundreds of years. We still find it difficult to write even simple software to address all our needs, as is evidenced by the insane pay in our industry. Carpenters can suggest tools because they know what's out there. The same was not true about LLMs a year ago.
> That is way too charitable, people were being fired based on these metrics
People get fired for all kinds of reasons including no reason at all. Oftentimes leadership even lies about the real reasons for firing people because they don't sound good!
I'm gonna be blunt: if you're in software and you refuse to use AI for moral reasons, I think you should be fired. There's being principled and there's being obstinate and the difference between the two is how well you can convince people that you _have_ principles. Most LLM-hating people fall short on this point, because
> do I really need to link the Jensen Huang quote?
Sure! Link it again, we all know it's highly immoral when shovel salesmen try to make you want shovels.
> If you want to see if the tools work, why don't you just ask your employees? Like any normal employer would?
I do not like this HN take of "let's do this thing that works great in small companies and then just blindly pretend that it'll also work at the largest companies in the world!" No, this doesn't work at "normal companies" because you cannot "just ask" 30k+ employees what they want.
Employees, like EVERYONE ELSE, are resistant to change. If I, as CEO of a company, want to get my company to try Claude I have to measure tokens to see if it's getting used. That's it. There's no wave of delusion here.
> The logic of buying the tools then forcing the employees to use them "or else" is completely backwards in any sane world
People are stubborn. A lot of productivity improvements had to be almost forced upon farmers, for example. Even when early adopters demonstrated the benefits, a decent fraction of them just didn’t want to change.
This is just a variant of the argument ”people don’t know what’s good for them”. You’re very close to the actual answer, which is that the aforementioned ”manager class” is simply convinced that they understand reality better than those below them, which is quite frankly absurd considering the fact that managers very rarely do any of the ”real work” that these tools supposedly make redundant, and yet they still believe themselves to understand the potential better.
And not few of those 'productivity improvements' for farmers have had disastrous consequences, even though what I think you are referring to has been implemented with far greater discernment and empirical basis than the current AI revolution.
People are stubborn, but sometimes for good reason. Let the stubborn people hold on to their practices, if the innovators are right they will eventually fold anyway.
> not few of those 'productivity improvements' for farmers have had disastrous consequences
Sure. Many have not. I’m thinking of stuff like ox-drawn and then mechanized ploughs, four- versus three-crop rotation, et cetera. The point is there is pushback regardless of benefit and even after it’s been demonstrated. Plenty of people are fine being comfortable. Which is fine. But it also explains why companies and societies with a nudge feature do better.
> if the innovators are right they will eventually fold anyway
Again, sure. If it’s their land, it gets acquired. If it’s your land they’re tilling, you get a say.
I’m not saying all-nor even most—pushback is unfounded. Just that there are plenty of cases where it is, and the solution there is to push through the change.
>Why would a carpentry shop buy hundreds of thousands of dollars of power tools without consulting with their employees to see what they actually need to get their job done more effectively? T
What happens if employees say no power tools are needed and after a few months a competition shows up with power tools and hires a bunch of noobs and beating your production numbers and sales?
Your employees simply may leave the company and work for them and learn the new culture at this new competitor.
Is there any law which prevents people from moving between companies? No? Then the promoters of that company are going to do what they think is fit to keep them in business and stay competitive. Many times they'll be wrong, sometimes they'll be right.
To use your analogy again, it's kind of like the shop boss buying everyone a table saw and then saying "The best way to use the table saw is just experiment with it, it's the fastest and most accurate straight cut we can get - the future is table saw."
Yes, this is, in fact, how adoption of table saws and other such tools looked like, while they were still new tools. The basic form and function was established and its utility proven in both testing and early adopters, but as new kind of general tool on the market, every user from "early majority" was still writing the operating playbook for their specific shop conditions and kind of work they're doing.
So yes, it's a great analogy. We're right now well in the stage where bosses say, "evidence is in and conclusively shows this is useful for us, now the job is figuring out exactly how to work it into our particular business".
> figuring out exactly how to work it into our particular business
This is the most crucial bit. Neither ramming it down developers throats nor rejecting it wholesale is particularly productive. You need the conservative people onboard as well, to discover critical edges and failure modes. Including their criticism in the adoption process instead of bluntly banning it is the smarter move. Of course, there will be a few people who just don't play, they will fold eventually or be let go.
I see the point but, I'm not really sure the analogy holds up here. If i was in a cabinet shop and had to joint, plane and resaw and cross cut a pile of timber fresh from the saw mill for the next job I'd be very grateful for the jointer, the planer, the bandsaw and the table saw. I'd also be very grateful for the dust extraction.
In in total agreement with you though, forcing tools on employees is very dumb and is terrible leadership. Ask your people what they need to be optimally exceptional and go get them it. Then let them get on with it.
> In in total agreement with you though, forcing tools on employees is very dumb and is terrible leadership. Ask your people what they need to be optimally exceptional and go get them it. Then let them get on with it.
Some employees want AI tools, others don't. Standardizing SDLC workflows > each person does their own thing. So now you have to choose: do you require AI tool use that fit into a new SDLC? Or don't you?
I don't see why they have to be mutually exclusive. Assuming the vendor risk profile is acceptable to the infosec people and procurement are happy with the AI tools vendor relationship, then it's a tool inside the information security perimeter.
As long as there's evidence that work meets quality gates for any required customer audits, and your customers are happy and in the loop that AI is a thing that may or may not be used to produce the service, then those engineers that want it can have it and those that don't, don't.
Feels like a revision to an SDLC rather than a new one. Without seeing the SDLC it's hard to find common ground though. It really depends on how it's written and implemented and of course: culture. In the example we're working from sounds like the tools are being forced on people, and that's less infosec, SDLC and more unbearably bad leadership.
Not sure why you bring in "vendors" and "infosec". I'm simply talking about the situation of a software engineering team building something together needing to have similar workflows (supported by tooling/software) in order to work together effectively.
> Why would a carpentry shop buy hundreds of thousands of dollars of power tools without consulting with their employees to see what they actually need to get their job done more effectively? The logic of buying the tools then forcing the employees to use them "or else" is completely backwards in any sane world.
This is how it's gone down throughout history. It's why we remember the Luddites, textile workers who started smashing stocking frames and power looms because the machinery was introduced over their objections. The whole goal was to undercut the craftsmen's wages and bargaining power.
So, no, your expectation to be consulted was never going to happen and has not happened throughout history as industrialization has advanced.
> but suggesting that the people who run FAANG don't understand the dangers of bad metrics is... interesting
You're far too charitable. Understanding has nothing to do with it. Big companies are too far insulated from bad metrics. Middle managers get away with anything and everything because their decisions are too far removed from reality. And they're nowhere to be seen when the other shoe drops. And they'll just leave to a promotion elsewhere if they stay and results are bad.
Everything is far removed from reality in bigco. So you get a bunch of theater and house-playing with "data-driven" posters up on the wall. It's a show that everyone is aware of and seemingly we all still attend.
The level of trust in leadership is remarkable. There’s reasonable ways to have people try power tools. Have one team use power tools and another hand tools and see the outcome.
The mandate was literally “the more sawdust you create the more money you’ll make”. Nothing of value is learned by that mandate. Sure it’ll make people use power tools but it won’t cause anyone to learn how to use them to make furniture.
They might understand the danger of bad metric but that doesn’t mean they aren’t victims of them. If there was intentionality here it was lazy as hell at best.
> suggesting that the people who run FAANG don't understand the dangers of bad metrics is... interesting.
from my time in FAANG... that seems about correct. Probably the people at the absolute top don't want to just pointlessly burn tokens, but pass that down the chain and eventually the rumor mill turns that into "tokens are an input for your performance review" and people start running Wiggum loops to fix minor typos or linters or something—especially if you do it at a time when every company seems to be doing layoffs.
> If you're a carpentry shop that just bought power tools for the first time and you're worried that your employees are sticking with hand tools because that's what they know, then you look for sawdust.
Or count the fingers, I guess. It's all fun and games until someone looses AI.
Bad managers, in general, grab a metric and then unthinkingly optimize it. I’ve never worked for FAANG, but I’d be surprised if they didn’t have bad managers too.
> the people who run FAANG don't understand the dangers of bad metrics is... interesting
They don't. They want some metric to support what they want to do and don't care about good metrics at all.
I've spent the vast majority of my career in FAANGs and it's been the pattern everywhere.
Right now my org has a senior director who is constantly battering managers to tell their reports to fill out the weekly surveys.
Why are the employees not filling out the surveys? Because instead of the old once a year large survey with questions about various levels (including local teams where management cared about the numbers and I could see the actions they took) we now get a survey every week with questions that are meaningless and I have no answer for.
"How does team X deliver on its priorities"?
Team X has O(10K) peoples and a barely countable infinity of projects. Most of which I don't know about and most of which I'm not supposed to know about since things are compartmentalized. So I don't know what team X's priorities are, I don't know how they deliver on them, and I never will know. Asking me and my colleagues is a waste of time and money.
...but none of that matters because the directors want "data" and they want a dashboard showing that we're all giving them "data".
The switch away from hand to power tools was a while ago but not, like, ancient history. In the era with fairly widespread literacy and records. Did this sort of check for sawdust thing actually happen?
Or... If you are a carpentry shop owner, you should understand what exactly the power tools you acquired are good for, how and if they can actually be used by your employees for them to do their job.
This, obviously, presumes that the person managing this hypothetical carpentry shop knows what they are doing. It's almost laughable.
In truth the carpentry shop owner manages on vibes, has no idea what employees do and also doen't trust them, and tells employees he wants to see a lot of sawdust in the workshop floor.
Why would you look for sawdust? That's a waste product. You would motivate everyone to stop producing actual furniture, and just buy the biggest bits of wood to turn entirely into sawdust. Which is textbook Godhart's law.
This is what's happening here, you have people setting up two chatbots to churn useless tokens at each other, making only sawdust.
I contend that tokens per se are actually a waste product, or at least non-value add. The end user doesn't actually care how many tokens were used to make a thing. If you could get the same result with fewer tokens, that would be an improvement.
> If you're a carpentry shop that just bought power tools for the first time and you're worried that your employees are sticking with hand tools because that's what they know, then you look for sawdust.
But to make this work, you cannot tell your workers that you are looking for sawdust, because you just gave them tools that make sawdust very easily.
Eh, it’s less hysterical than the ever-pervasive belief among junior devs that they are the smartest people in the room and that all managers everywhere are dumb.
Though I understand that gets social validation from other people with no actual experience.
The retconning is absurd. Companies CxOs's exhibit herd-line behaviors all the time: hiring scrum masters, mandatory return to office, and now tokenmaxxing. FOMO is a sufficient explanation of the behavior, any trend gaining momentum that they fear will give their competition an edge, they will reflexively force adoption with no further reasoning needed. There is no cost to boarding hype trains that go nowhere.
Sorry to disappoint bud, I'm a manager of managers. Being able to look critically at your own tribe is one of the first skills you learn if you want to survive as someone telling other people how to do their jobs.
> That’s no longer true. We’ve entered a different regime, where spending more tokens generally results in better results. We call this “compounding correctness” — the more tokens you spend on getting a task correct, the more likely you’ll get a good outcome. We talked about this a bit at the last in person Agentics meetup:
Have we? Is it generally the case that the more tokens you spend, you better results you get? This take is so weird I suspect author somehow financially benefits from tokenmaxxing.
"Most teams haven’t yet figured out how to build their own Ramp Inspect or Stripe Minions (if that’s you, reach out — we can help!) but basically everyone is at least using cursor in the side bar."
? What is your point? That the OP is obviously finically motived to encourage tokenmaxing?
Here’s what they said, $$$ aside:
> That’s no longer true. We’ve entered a different regime, where spending more tokens generally results in better results. We call this “compounding correctness” — the more tokens you spend on getting a task correct
> Compounding correctness flips the calculus. If more token spend leads to better outcomes, then you’re going to want to spend a lot of time running tokens. Which sure as hell sounds like tokenmaxxing to me! The original incentives to tokenmax are gone, but eventually folks will realize that a new and more powerful incentive has take its place.
> There were ways to get loops to work, but it was hard. You had to think a lot about how to prompt the agent, which in turn required a pretty deep familiarity with how these things work.
> Now, though, it’s easy. Compounding correctness makes it easy
Go on, tell me I’m quoting the OP out of context.
It’s pretty clear this person believes in compounding correctness, while other, more serious people (1) are perhaps more skeptical.
..and Armins company owns pi. You can’t get much more all in on AI.
Compounding correctness sounds cool, but the real examples of people spending lots of tokens are not compounding correctness; they are wide parallel exploration; like Mythos. The OP is confused, and wrong; they’ve made some basic (flawed) assumptions, and based their entire reasoning on them.
Hi, author here. I'm probably somewhat financially benefiting from tokenmaxxing. I also just believe compounding correctness is right, based on my own experience using the tools (which is why I have structured my life to try and financially benefit from tokenmaxxing)
Tokenmaxxing was just a way to force employees to start leveraging AI in a meaningful way.
For companies that have measured performance based on token spend, they can now dial it back. Employees have learned to leverage AI for things they wouldn’t have prior. Now they know what’s possible and what’s not.
No one is stupid enough to always measure performance based on token spend and have unlimited budget. It was always a temporary thing to transition the employees to a new world.
Management felt like employees weren't leveraging AI fast enough. That's why in 2025, there were many mainstream articles about how CEOs were forcing their employees to use AI or get fired. Tokenmaxxing was just the other extreme. Companies will arrive at an equilibrium.
There's no need to overthink this.
Edit: One reply cited this X post as an example of why management needed to do this. Trying to change a company with hundreds/thousands/tens of thousands of employees is hard. You have to send one simple message at a time. https://x.com/danluu/status/1487228574608211969?lang=en
having heard the arguments made by some VP + C-levels throughout the Tokenmaxxing Tulip Mania, I think the interpretation that those mandates were made intentionally for "forcing employees to start leveraging AI in meaningful ways" is too charitable.
Most companies focused entirely on doing "what everyone else is doing" at best or "to see if Programmer Joe can be as productive as the entire team so we can fire the rest".
And many indeed fired employees in droves because they were "underperforming in token spend".
> Most companies focused entirely on doing "what everyone else is doing"
This is true of my current overlords. It slipped recently that the reason they went AI-nuts was that a competitor had announced going “AI first” and the market responded excitedly. Not because they thought it was a good idea: because the market got excited and they didn't want to get left behind.
This is quite a change as our market is financial services and I remember a time when we had to support decades old browsers (one large UK bank who I won't name here had IE6, and only IE6, on many of its user's machines until ~2017) and web servers because they refused to upgrade anything.
> "to see if Programmer Joe can be as productive as the entire team so we can fire the rest"
I'm not sure who Joe is in our outfit, but I'm certainly in the “the rest who are to be fired” set. I've been unhappy in dev & related for years so the AI revolution which I don't care for is where I'm consciously letting myself get left behind to find something else to do with my life. Haven't touched it. Was too late to claim one of the first tranche of Claude licences. And the second. Oops. Maybe I'll use AI in my next big adventure, or maybe my distaste for it all means I have a grand future waiting for me in the hospitality industry!
> Isn't it easier to get a job when you already have a job?
Yes. Or so I'm told, I've not needed to apply for a job for 26 years…
I have something possible available, though whether it still will be in five months (the earliest I'm likely to leave because of [reasons] and a two-month notice period) is a bit unknown. That five months might be ten as there are other major changes in the company (we were bought a while ago) from which the dust should have settled by Feb, and it makes sense to try to hold out that long to see if I'm still hating things with the same passion at that point.
Without that “something” there are less certain tech based options I could look at, and to be honest I really could do with a proper sabbatical style break. The mortgage is paid, I have savings, and no dependents other than the cats, so I have the luxury of considering that option. And if all else fails I've actually done the arithmetic and I can survive on minimum wage for an extended time if I need to, and hospitality work is something friends can get me into above the many others looking (that bit is less of a joke then people assume: it is seriously part of my plans D & E if I can't stick with A and B & C completely fall through).
> many indeed fired employees in droves because they were "underperforming in token spend"
1. Source for that 4-word quotation? I googled it, but it appears you are the only person who has ever said it?
2. Even if you made up the quote, source for the claim that "many" "fired employees in droves" for "underperforming in token spend"? (Again, even if the companies never used those words, I'm still interested in the source for the claim about many companies firing employees in droves for low token use.)
Some companies are badly managed (at least in some aspects). But it’s also true that some devs need to be pushed - sometimes forcibly - out of their comfort zone.
I’ve had multiple instances of taking months/years to get some devs to use a more sophisticated git client than GitHub Desktop (so they could properly do anything but the most trivial merges/rebases for example). Or to learn how to use the debugger instead of just printing/logging for debugging. For some of them getting them to seriously figure out how to better use AI required a bunch of repeated prodding.
Funnily enough a few years ago they enthusiastically jumped on copilot’s fancier autocomplete in VS Code, but getting them to really figure out how to get the most out of Claude Code required more pushing.
That's a very good point. Our company has been very thrifty with our AI spend, until a few months ago the average employee had ~$50 of supported spend and I was trying to be an AI leader in the company and figure out what was and was not possible, I had a $100/mo spend (Claude $100 service costs $108/mo).
We are now seeing that Claude Code can do a LOT of heavy lifting in our day-to-day work, but the bulk of our employees are stuck cost-maxing and literally cannot "imagine how you are running into your session limits". "I'm fine with the $20/mo account."
There's a case for the cost-maxing has hurt our company.
I'm in the boat of wondering how so many people run into session limits so often. I have never hit one, except once when Claude Design came out and I had fun generating a bunch of random things to see what it could do (not with the intent of actually using any of the generated designs/code, because it all sucked).
I'm using the $200/mo Claude, and I'll often hit the 5 hour limit, but last week was the first or second time I've hit the week limit. I run pretty much everything on "/effort max" because I've had good results, and I've had plenty of quota left usually, and I want to worry less about "am I getting the best results".
Fable, for the few days I had it, would eat through tokens pretty quickly, largely because it tended to work much more on its own. I could give it a task and after asking a few questions it would go off and work for 4-6 hours and be done.
I also run a lot of experiments. I'm trying to be a resource that the rest of my team can learn from as far as what works. For example: when one of the people from our parent company asked about automating payroll entry, I threw their documents and discussion at Claude to see what it'd build. That plus churning on their feedback was ~30 hours of API usage right there.
I'm currently experimenting with "loops", and using codex in those loops to provide feedback and review. That gives me fable-like autonamy (that 30 hours of API usage above), maybe even better. But it uses a lot of tokens. Loops is the bulk of why I got to the weekly limit last week.
Plus I'm having it build an experiment on what my ideal "agent mux" would look like. Herdr is really close, I found it after I started that experiment. Now I'm just letting it run when I have spare usage to see what it comes up with.
It really wasn't. It was a moronic move fueled by hype, implemented by the same type of incompetent business leaders who previously, to various extents, drank the blockchain and metaverse kool-aid.
There was demonstrably zero cost or consequence analysis, which is also why it was dialed back as soon as the (still) subsidized tokens became just slightly less subsidized, and the wise leaders realized they spent huge sums of money with no way of gauging ROI.
LLMs may have their use cases, but let's not make up free excuses for blithering idiots who, by any rights, should all be fired for cooking up money-burning policies that are textbook implementations of Goodhart's law.
You're naive, uninformed or turfing if you think companies are still not tokenmaxxing.
Also tokenmaxxing was never an intentional and smart strategy employed by companies like you say. It was a mix of fear of missing out, signaling to investors they were in on the hype and recouping investmenets in data centers
Your business will suffer greatly for your short-sightedness. But yeah, go imitate Uber, I am sure you will get just as big as they are this way. Everybody knows Uber's success comes from Apple Vision Pro making their developers oh so productive. You should go to the Apple store right now.
Your livelihood now depends on tokens remaining subsidized. How long do you think your engineers will continue to have the independent ability to maintain your codebase if the tokens got 20x more expensive?
Buy and sip that intelligence straight from the tap.
I never said go imitate Uber's strategy. I just challenged the person who claimed that these companies are only doing it to recuperate data center investments when Uber doesn't have any data center investments.
Let us also not think that management is any smarter than any of us and is playing 5D chess games we couldn't comprehend. Notably, games that they also could not articulate when they were making these decisions.
It’s so easy to comprehend that even I was able to do it. I don’t think it’s 5D chess. More like checkers. They have dumb the message down just to get developers to try.
I don't disagree. They talk amongst each other, get advice from expensive consultants, but often lack the knowledge that their on the ground workers have. That said, I still think this was done to get employees to adopt AI faster and see what is possible rather than a long-term incentive.
People in small teams with managers promoted from within could probably have had this in mind.
Big Corporate managers are much more likely to have felt the need to “do AI” from their VPs, who in turn got it from the executive team, who have probably been under fire to produce a coherent magical AI strategy that makes to company scale infinitely while reducing costs. In that environment it’s much more likely to be copy-and-pasted charts from Gartner and buzzwords overheard at conferences, combined with the hope that somebody somewhere will eventually turn it all into something that resembles forward movement.
This exactly. Gartner and McKinsey were paid by openAI and Anthropic to sell executive teams on AI usages 20xing profits while cutting spend 70%. And if they don’t immediately make the front end investment now they would be left behind by their competitors.
The presentations were convincing and many bought large upfront wholesale tokens then forced them on their employees.
When the savings didn’t materialize and neither did the profits. And the spend was no longer going down but up, many execs rather than admit they were duped began blaming workers at scale for not making up for their stupid decision.
This is the “get them addicted” part of the platform play.
The problem is unlike any successful drug this was expensive and low payoff.
the big tech companies needing to pump demand for compute.
Demand is already so large that OpenAI, Anthropic, Meta, Google could not fill it. Tokenmaxxing for these companies strictly to pump fake demand is just plain wrong. The inference demand for these companies internally must be a drop in a bucket in overall inference demand.
This reminds me of the popular opinion on HN for return to office mandates as executives wanting to recover their real estate investments.
Out of $13Bln of 2025 revenue, OpenAI received $867 million from one customer (less charitably, one bankroller), SoftBank. And $300 million from Microsoft[0]. That's more than a drop in the bucket, especially given that they're not the only players complicit in being both an investor and a customer.
Also are we sure it's all at arm's length? Barring a full audit, it's not possible to guarantee that there's no round-tripping or overstating of revenue. With Microsoft also being a provider for OpenAI, they could be creatively using set-off, or using SG&A, in order to overstate their revenue/gross margin/inference profit margin. I of course have no proof, extraordinary claims etc. etc. It's unlikely but we should at least debate the possibility. They have such a huge collective incentive to do it.
I remember a story on HN from a while back. The idea is that the larger the org, the simpler the message and the tool has to be to reach everyone. The comment author was saying that as a junior, his company implemented a "tokenmaxxing" scheme for A/B testing - more tests, better for performance review. He, back then, thought it was stupid. However, it got the desired outcome of everyone being familiar with what experiments are and how to run them.
> Tokenmaxxing was just a way to force employees to start leveraging AI in a meaningful way.
No, it was a sinister way to manufacture your consent to cause cognitive atrophy in your employees so that you lose your ability to independently operate your business.
You'll come to realize this once they begin charging you more and more for tokens but you will probably not blame yourself for it.
The whole tokenmaxxing thing started because Jensen Huang said insane things like having a single engineer spend 250k in tokens or he’d fire him; and that OpenClaw was basically AGI.
> No one is stupid enough to always measure performance based on token spend and have unlimited budget.
Yes the people forcing these mandates absolutely are this stupid because that’s what people like Jensen Huang, Peter Steinberger and Boris Cherney were touting. Seriously have you ever actually talked to an average C-Level about AI? They are absolutely cooked.
How many “average C-levels” have you talked to? What, specifically, do you think that actually means? Do you think the average CMO and CTO are identical, and have identical profiles in this case?
Or are you just blathering about things you’ve never experienced because you met the “CEO” of a five person company once? I find grand proclamations by people who speak in TikTok absolutely laughable memeing.
You don't need to meet these C-levels personally. They spread their insane hot AI takes everywhere and go out of their way to market this crap at conferences, etc.
Did you read those hot takes personally, or only had them reported on by your favorite YouTube/TikTok pundits? Hint: YouTubers and Tiktokers are in business, and that business is entertainment, and they're about as truthful as a fortune teller.
(Difference being, one of these groups is just lying about objective reality that's trivial to independently verify, the other one are just unlicensed therapists with thousand years old rituals).
> like having a single engineer spend 250k in tokens or he’d fire him;
That’s not quite what was said there, he’s budgeting half a devs salary as token spend in a podcast and that if he had a 500k engineer who spent 5k on it at the end of the year he’d go ape shit.
Now you can say that’s wild, and sure, but this is not a standard c suite exec talking it’s the ceo of Nvidia.
Even ignoring other hiring costs this is essentially an argument that Nvidia top engineers should get more than a 50% performance improvement with extremely heavy AI usage. To me, that doesn’t seem like such an enormous statement. For the head of a multi trillion dollar company entirely driven by AI sales arguing it gives a useful benefit to engineers isn’t that odd and betting on a 50% improvement within Nvidia seems kinda normal.
> Seriously have you ever actually talked to an average C-Level about AI?
Yes. Single digit percentage improvements over time would normally excite them, the idea of cappable cost performance improvements that last which your devs actually want to experiment with and a cultural and customer expectation that you’re doing this is pretty enticing. Particularly during a time when tokens were heavily subsidised - isn’t that the perfect time to do it? Now that has ended there’s a huge focus on roi.
I'm a +26 on my post so far so it seems like there are a lot of people who agree with me but most replies disagree with me. I suppose this is the nature of online forums - that those who disagree will take the time to reply but those who agree rarely do.
FWIW I agree with you, but it doesn't add much to the conversation to leave a comment saying so.
I also agree with the comment you're replying to as well - the vitriol and anger, along with the "this is just another blockchain bubble" type relies is really interesting. It's so surprising to see the variety of (negative) replies and beliefs people have, along with the general distaste/distrust for management. I guess it's also largely a sign of the times since a lot of ICs probably have a ton of anxiety about their career.
It’s about power and leverage. Software engineers were seen as “gods” in a tech company. Even the crappy ones. Over the last year, really 6 months, they lost great deal of that. Now they’re seen as costs, rather than assets.
This is especially true for the devs who take the code more seriously than the business that employs them. The technical PM who knows a bit of design are suddenly the kings of the company.
> I suppose this is the nature of online forums - that those who disagree will take the time to reply but those who agree rarely do.
Why would those who agree “take the time to reply”? To say what? “This”? “Agreed”? “This guy knows it”? Those comments don’t add anything of value. When you agree, it only makes sense to reply if you have something to say which wasn’t covered by the original argument.
I agree, but for a completely different reason. A lot of executives simply chase trends. This was another trend they copied from each other. No reason to imagine they carefully studied the issue.
might be the first time I've seen this reasonable and obviously correct interpretation of the last 6-12 months so directly and unapologetically stated. bravo
HN opinions are usually divided into individual contributor vs management battles. Usually the IC opinion is majority because most people here are likely ICs.
At the IC level, people don't sense the impending urgency for the overall business. They usually sense the urgency for themselves first. AI has completely changed the software industry in 6 months. We went from having AI write some code and copy/pasting to having AI write 99% of the code in 6 months. SaaS went from nice UX and CRUD code logic being a moat to these being nearly free.
Big software companies have to adapt to this new world or they will be outcompeted by smaller, newer, nimbler companies. That's what management is thinking. For ICs, they're usually thinking about their own jobs first.
No. While what you’re saying makes sense, that’s not the logic behind the token max mentality. It’s simply lazy ineffective leaders who are bad at their jobs and don’t make rational decisions. They really did think spending more is somehow going to make their business better.
An interesting side effect of this spreading across social media is that even companies without token leaderboards were having problems with needless tokenmaxxing.
When everyone was reading about token leaderboards on all of their social media channels (include social news sites like Reddit and Hacker News) it created token anxiety even at companies that didn’t want a leaderboard. Programmers were afraid that their managers would be secretly ranking them based on token usage and they needed to pump up those numbers to avoid layoffs.
Once teams implemented token budgets in response it creates an ugly situation where a few people feel the need to use as many tokens as they can at the beginning of the budget window to stay ahead.
It’s really frustrating to have this phenomenon leak into a company that was never encouraging or looking for high token use.
The smart move would have been to get lower level managers to assign specific employees to experiment with applying LLMs to their processes and report back. Then incorpoate the findings into their processes.
Instead there was FOMO mass hysteria. Now there is a backlash. And a lot of time and money wasted.
Its not _just_ that. Orgs aren't remotely sensible at measuring anything that isn't counted in dollars.
employees who are on the ai bandwagon are there for the free management attention.
Management is cooked because the damn market is hard, money is tight and they can't afford to fight the top down love and $$$ thrown at AI.
If you zoom out, all the real money spent on energy to keep AI alive isn't going to be held in nvidia stock for too long. it will burst, but its stupid to time it.
> Orgs aren't remotely sensible at measuring anything that isn't counted in dollars.
A sensible organization machinery will move to optimize the metrics that make money. Often times figuring out said machinery takes iterations. Some of them are idiotic (ref: tokenmaxxing) but they are generally directionally correct.
> Management felt like employees weren't leveraging AI fast enough.
If my productivity is in line with their expectations, I don’t understand why management cares what tools I’m using to do it. No employer ever told me to use emacs instead of vi, even though I’m 10x more productive in one vs the other. So why all of a sudden does management need to micromanage my tools?
> why all of a sudden does management need to micromanage my tools?
Because doing so increases the value of their stock options. They might privately think it's as dumb as you do, but apparently the stock market disagrees.
Imagine you had a direct report. They were doing just fine, slightly better than a typical report. Then you found out they were writing all their code in notepad - no linting, no automated tests or live updates, no refactoring tools, no highlighting or any code search. They didn’t have any cross code searches and didn’t have any documentation. When they hit a problem, they’d churn away at it and never reach for docs, google or so.
Still, their performance is in line with what you’d expect from someone in their position.
Would getting them to try emacs, vi, linters, etc be micromanaging them? Do you think they’d perform better with them? They are performing in line with expectations for the role, so why bother with something you think would make them more efficient?
I’ve made this obviously over the top, and can hear already replies from other bemoaning my comparison while missing the point — tools do matter and if you genuinely believe that a developer could be more efficient working in a different way it makes sense to not only want them to try it but to actively fund that change. Hell, this is literally what we argue for in training! Spend money to make someone better at their job!
If you think AI tools make you worse or don’t and can’t help, then that’s one thing. But it makes sense for management if they think it might to spend money on it and to get you to try.
Not only this, but wasn’t everyone here shouting about how tokens were subsidised and it couldn’t last? If so, wasn’t the first half of this year a really excellent cheap time to do the maxxing?
It's funny, because editor choice is also an analogy I use, to argue for the exact opposite conclusion.
Your hypothetical developer wouldn't be using notepad because they're unaware of other editors, they'd be using it because they evaluated other editors and concluded that, for whatever reason, they would be worse for them. I'd be fascinated to hear why they came to that conclusion, but I'm not going to tell them they're wrong if they're performing acceptably, aren't constantly breaking CI because the linter rejects their code, etc. Everyone is different, and I'm not narcissistic enough to think the fact that I would be way less productive without my modal editor, LSP, linter, terminal multiplexer, etc. justifies forcing everyone else has to adopt my exact setup.
Did they see productivity gains that they're now calibrating for? Why have these productivity gains not been reflected externally in any measurable way?
At my company, this was the explicitly stated and shared goal from management.
"We can't know all the parts of our business that AI can do a good job automating [because it's so new] but we also don't want to be the last to know and outcompeted along the way. Please throw AI at random parts of your job [and we're tracking this] so we can generate feedback from employees on where to invest in additional automation"
My company has since provided a ton of high-value little AI workflows, alongside a handful that didn't pan out. AI-assisted software development is a major change overall, but the general business-process updates from AI are a net-positive to me.
It's simpler. Management felt like employees weren't leveraging AI fast enough. They chose to measure "AI leveraging" in the easiest way they could: how many tokens each employee was using. Goodhart's Law ("When a measure becomes a target, it ceases to be a good measure") immediately triggered.
This is obviously wrong. Management has never cared about how engineers do their job. There's never been a push for any other productivity boosting technology: better languages, better editors, automated refactorings, paid code intelligence tools, etc. But suddenly AI comes along and the CEO says "all developers need to write code entirely with LLMs".
This is absolute nonsense. Management in many places cares enormously about productivity. What’s been a bit different here is a huge claim of improvement other companies are seeing (so you’re going to be left behind), alongside some developers going off and doing this anyway sending proprietary code hither and thither, alongside some devs railing against the very concept of using it. It’s also a wildly powerful tool and how to use it hasn’t been as clear (where does it provide value, where does it not, what can and can’t it do) so experimentation is really important.
They really don’t IMO. Hell most of the companies pushing these tools don’t even agree what LLMs are for or are capable of. Too many people are trying to use it to cut too many corners on their work (making more work for everyone else) or are using it to attempt things they don’t know how to do, which means they are incapable of vetting the results, (vibe coding anyone?) which means more instances of the first case or even getting hurt.
Were there cheaper and humane ways to get more employees to use AI? (yes). Did many people JUST burn tokens (goodbart law)? (Yes). Would people revert back to the mean? (I think yes). Do many professionals hate AI because of this push? (Yes). Was the org net productive?
I wish there was an independent body truly assessing the impact of big tech decisions and running counterfactuals. Instead of accepting nice stories like this as a given.
Interesting take - upvoted you. I'm not convinced it's been the optimal management strategy, but you're succinctly explaining what they have done, not what they should have done, and in that sense you have a good point.
Still leaves huge questions about ROI ($26tln of TAM, anyone???) and doesn't quell the concerns brought forward by AI detractors though.
I would say tokenmaxxing = spending without limits or care about results (and assuming results). The term as it is right now, at least.
When it comes to "using tokens overall", then open source models change the equation and in that case, we will enter a phase of 'maximizing AI usage...but for near-zero marginal increase in costs with increased usage'. But even then, the convo would shift to platform engineering...which would then ask 'what value are we getting out of this?'
OR - cloud model economics change over time and we use cloud models as happily and cost effectively as we do cloud storage now. But hard to say when that comes.
Brute forcing positive outcomes by spending more tokens until a happy path manifests does not solve the underlying comprehension (and liability) problem.
I fear a world where critical software is stood up with
increasingly non-human governed abstraction because it [seems like it] works.
Software engineers as the review terminal in a conveyor of business-led code mass production... coming to a company near you?
You're right, but you'd be lucky if a real human actually reviews any code. At my company, merging a PR still requires 2 humans to press "Approve" but I've been instructed that I don't need to read the PR, I only need to click "Approve". This is what 30 years of SWE experience is being used for now.
> Just repeating the same prompt until you get the desired result?
Not necessarily the desired result, but until it's 'done', where the LLM itself is the judge on if the is the case according to the given criteria (often just an updated todo-list). One of those extremely simple 'harnesses' (if you can even call it that) was even named the 'Ralph Wiggum Loop' [1] to allude to the braindead-but-persistent tokenmaxxing it results in.
What I have been doing seems a bit different to what's described, but I always make sure to define how to know the task is done so the agent doesn't quit early. Usually this means telling it to to run the tests and type checks to ensure it runs without errors.
Otherwise they often do a first pass looks good enough but it doesn't actually work.
Or if you were ever working with an approximation / search / optimization (really they're the same thing) algorithm that iteratively converges on a solution...
This seems to happen with most big tech adoption in the first few years. The big data boom in the early 2010's had execs just buying up spark clusters and data lakes before they even had a clear analytical use case or governance.
>I’ve basically never heard a business leader say that they were going to set a bunch of money on fire because it made them feel good.
Really? ~4 years ago our CEO hired a consultant to fly out several times to do team building exercises. We can't afford to do our 3-year server refresh cycle, but the consultant was no problem to pay.
We just recently had branding consultants come in and also spent thousands of dollars (AWS charges) on rebranding all our photos. We operate in a captive market, if you want to operate in our market you are required to subscribe to our service, and if you aren't in our market you can't subscribe. Branding at the end of the day drives 0 sales.
Heck, reminds me of the time a company I was working with hired a new CTO and one of the first things he did was as "server renaming scheme" using obscure (to the US-centric staff) city names from around the world (database servers are Swiss city names, web servers are Denmark, storage is Finland). We went from cattle naming to pet naming, for a CTO that lasted ~6 months.
In my experience company leadership is not quite as thrifty as this article likes to think they are.
I'm also taken aback with how naive folks are about companies, they really seem to have bought the whole "capitalism is efficient" maxim hook, line, and sinker.
I really struggle to imagine how anyone in a corporate environment has managed to never run into obvious examples of waste like you describe (overpaid consultants and mandatory budgets are classic examples). Office Space came out 27 years ago and has a plotline making fun of overpaid "efficiency consultants" whose only job is to tell management to fire people.
The precondition for that is competition. If some company has idiot managers that waste resources on idiotic things, they're supposed to be wiped out by the companies that are actually smart.
Capitalism requires constant evolutionary pressure and a sort of government directed corporation level eugenics program to constantly apply that pressure in order to function properly. Without that, it's just distributed fascism.
i think tokenmaxxing mostly comes from cloud pricing. once you're paying by the token, you naturally start caring about token counts. with local inference i barely think about it anymore.
The issue is the companies doing it could spend billions on tokens and they have. I for one know that there are multiple Big Tech Fortune 500 companies that have burnt over 1B in tokens in a single quarter.
In my current company nobody forces you to use more tokens, but you're encouraged to write a 300 lines markdown skill.md which takes 8 minutes and costs 5 bucks to execute. That, instead of writing a 200 lines bash script doing all the same thing, but in a deterministic fashion, completing in under 5 seconds and costing 0 if you're not careful with rounding.
It's AI usage mandates now, but rather than focusing on how the current hot topic has ripped through the business world, often without benefit nor repercussions at leadership, I'd prefer to analyze the higher pattern. We've recently experienced such ripples as the metaverse, blockchain/nft/web3, 'the cloud' (and a minor wave of cloud gaming). There was even a teacup buzz of 'apis', oddly disconnected from the semantic web.
Why do such fever dreams occur at all? Are they getting more prevalent? More damaging? Do they jepaordize the global economy? Should they be regulated in some fashion?
I can't prove my case, but I think it's a symptom of media manipulation/consolidation, the 'fiduciary duty' delusion, and that shareholders can hold the puppet strings tighter than they used to. More and more, they place their sillytown bets and expect the plebs to dance to them.
The dominance of finance capital over industrial capital reaching its absurd conclusion. NFT mania was only possible because we don't make anything here, no one has a serious plan to reshore and start making things here again, and we can't indefinitely maintain control of production we've exported to the 3rd world indefinitely. So you might as well play these symbolic games and increase your slice of the pie while the music is still playing.
The thing that most disconcerts me isn't the runtime pruning, it's the cold loading. Months ago, I added a few skills and MCPs to test them, partly in the frenzy of free shopping, but then I forgot about half of them.
So after I got tired of choosing by hand, and therefore also a bit blindly, I created a small tool that runs locally and analyzes conversations to tell you which skills, MCPs, or other things are always unused.
347 items never used · ~19354 dead tokens/session · ~$25.49/month A lot of ECC that I never used but always loaded.
If anyone's interested, I've put it on GitHub, thousandflowers/skillreaper.
This is more likely the junior camper version of "not everything that counts can be counted, and not everything that can be counted counts."
In the early days of LLMs, we saw the classic hype-driven bi-modality of opinions. Folks were in the "fake news, fad" camp, or they were in the "omg, take over the world" camp.
Those of us closer to the space, with the awareness to know that there was some truth (and a lot of misjudgment) to go around, were in the middle of nowhere. When I co-wrote some driver code with Chat GPT, other engineers (and even one of our directors) told me to keep it quiet. At the same time I had directors and VPs asking me how we could accelerate adoption. For a while, I had access to a cheat code just because I had the audacity to not ask for permission. Folks were sure I would get in trouble for spending thousands per month in LLM operation, but a handful came along for the ride, burning tokens like firewood and learning along the way.
Tokenmaxxing is probably coming from at least a few things:
1. A course-correction for the practiced frugality that kept folks from jumping in and just learning at the ragged edge.
2. A willful and deliberate recognition that the best innovations in the later phases of a disruptive introduction often come from sparks of ideation in concentrations of activity. In other words, we don't know where good is, and we need to find it. (Charitable interpretation from the article)
3. Recognition that, even if they don't know why, leaders and product owners will get punished for not jumping in and, because of bullets 1 and 2, won't get punished for trying and missing. Even if they have no idea what they're doing, they're going to fake it until they make it (or slide into another job).
This last set is where the pain lives. An organization with healthy and increasing AI tool
usage will see elevated token counts, but so too will one using LLMs to rewrite wikipedia articles without the letter "m" to keep token counts high. These are pathological behaviors brought on by conflated metrics.
We had discussions about this in the early LLM days, where my old team was looking to ship new capabilities for older products. There was a lengthy VP-level discussion about getting to "80% usage" of the new system vs the old. Because the new system was a superset of the old, I eventually said "we can do that immediately, but it's a cost goal, where we're just aiming to make our business more expensive to operate, rather than a value goal for our users". We didn't adopt the target, but folks were understandably frustrated that they didn't have a straightforward way to measure and report progress.
Tokenmaxxing is, inevitably, a conflated goal, but it's what we have right now. Take advantage of the moment, learn, build, and keep an eye on levers for efficiency.
Beyond getting momentum going for a cmpany, Tokenmaxxing is lighting money on fire.
The idea of tokenmaxxing reaches different companies in different waves, so it will be discovered in waves and outgrown in waves in companies and industries in their own cycle.
In the long run, tokenmaxxing is like drunken sailor spending. Scaling is almost always about a large component of efficiency, and lighting money on fire in the street can only last so long.
Your comment implies no ROI on spent tokens. I get a lot more work done tokenmaxxing so the cost is negligible to me but YMMV. Of course there's no point in tokenmaxxing if you don't have enough work available to scale beyond yourself, or you're unable to use AI to do so.
I predict startups will continue to tokenmaxx while 40,000+ person companies will become a little more conservative.
Folks have been saying “things are different now, the agents are now compounding success instead of error” for at least a year now, but I just don’t see it. I was lucky enough to receive a weeklong $50k per head AI training from the people saying these things, and one of their few helpful concrete recommendations was to constantly clear context all the time, to avoid things going off the rails.
However, I think finding security vulnerabilities is one use case where it doesn’t matter. Tokenmaxxing is absolutely effective for that. We as an industry are in the middle of adopting very expensive, complex continuous fuzzers.
> I was lucky enough to receive a weeklong $50k per head AI training
wow! That sounds like an unbelievable grift. Who were they such that anyone could possibly think that's a worthwhile investment?
Isn't that like a day worth of tokens?
10 days
Even modern frontier models benefit so hugely from careful context pruning, maintenance, and rewriting to erase mistakes that it's astonishing to me that there are no tools centered around it. The one tool that used to have such a feature, Zed and its retroactively-named Text Threads, has now stripped itself of it.
this! the back-and-forth chat interface where you can edit only your own messages, and only then to get a new response, is a terrible one, but I think favored by vendors because it helps them fight in vain against prompt injection. Custom harnesses and stuff are nice but incredibly time consuming to set up when all I want to do is like, see what the agent is reading, and editing out some irelevant nonsense side quest it went on, or trim some massive log file it read which filled 90% of its context. Theres no inherent reason behind some caching gains that these things must be strictly chronological - a response it gave me previously does not have to be part of the context now
No reason other than you're giving user's a cost footgun? That's a pretty good reason.
Local models benefit disproportionately from this kind of pruning and have no such footgun.
"some caching gains" is a pretty huge understatement- snipping something out of the middle of the window requires rebuilding the entire context. Thats a shitload of tokens.
Afaik messing with the context also pretty reliably degrades performance still. The model responses reference things that no longer exist to it and it becomes more chaotic.
The real usefulness of parallel or sub-agents is not that they run at the same time, its that they isolate noisy or self-contained context away from the main window.
You can structure your context window to minimize the amount of editing you do further back. You usually only need to edit and correct the most recent response. It's little different from forking the conversation at an earlier point, and nobody warns about that being a sneaking footgun. There is still a prefix to cache.
I still feel that during agentic workload sometimes it would be nice to have the model identify it is veering off the main track, send out a "keep the cached states and tell me which they are" command to the inference server, do the side thing (such as handling an error that plopped up that has not that much to do with the main task) and return back to the cached state with just a comment tacked at the end to say "oh and btw I fixed DNS" instead of having the DNS debugging inside the context window now. Maybe other harnesses just steer the models more towards using subagents for such tasks and my pi is misconfigured. I can use the tree feature, but having insight into what's cached would be nice there.
This is why I prefer Pi over all other agent harnesses. It has a tree view of each conversation and it's easy to move between branches.
50k per head training and the largest takeaway was to clear context.. that is the "hello world" of using agents, insane.
Have you tried turning it off, and then turning it back on again?
Some companies only get to a "hello world" level with a new kind of tech via a 50k per head training. The organizations are setup in a way that people can't experiment or learn by themselves, it's really the only way.
$50k a head is cheap compared to the productivity gains, probably can push it to $75k
Really. How?
Have you measured them beyond loc?
Who delivers a 50K per head training? Who pays for such a thing?
How many people were in the class?
I had a small training company, shuttered during COVID, and I used to charge 5K per day, for a group of up to 12 people. 5 days training = 25K. This is double, wow.
I would love to get back into training but getting enough volume to live and support a family on has been a challenge.
It's true, but I see it happening. I’ve watched seniors with 30+ years of experience adopt them successfully without losing their classic rigor.
Personally, I get huge mileage out of LLMs, and yes, I care deeply about code quality, readability, and debuggability.
I've seen juniors absolutely rock with them.
And I've seen the exact opposite, where they just struggle to get good results.
In the end, I think the divide comes down to management experience. The people thriving are the ones who have led teams, especially teams of contractors, which is the best analogy for how you have to interact with an LLM.
Those folks know how to break down problems, provide the right context, and scope a task just enough to see the "contractor" succeed before letting them move forward.
On the other hand, individual contributors who are used to just grinding solo often struggle. They expect a one-shot miracle. They say, "Hey, my code is buggy, fix it." When the LLM inevitably hallucinates or steers them wrong, they give up. The results are completely different based on how you treat the tool.
They might just have a high quality of control and standards that it is hard to find that pattern with the LLMs.
I think fierce individual contributors are a lot more valuable in the era of llms as well. We as humans typically achieve better balance with new stuff when we allow backlash from new processes that start to trample on old ones without understanding AKA the Chester's fence.
Anyways, more of a ramble than my two cents.
> Like, imagine if some serious business leader, like, idk, Mark Zuckerberg, decided to announce that Meta was going to burn money.
Like ... pivoting to the "metaverse" and changing the company name to show he's serious.
It's astounding to me that they looked at Second Life and really thought that was the future.
I could easily see that being the case, which is why I was so against it.
I think they're just a few decades too early.
Similar to cloud gaming, although that's much closer on the horizon.
I remember there being a lot of phrasing in the announcements that implied they didn't know Second Life existed. Kinda seemed like they thought it was a completely new idea.
The implication that tokenmaxxing was an intentional and thoughtfully considered approach rather than blind hype-following by an overpaid manager class who are too far removed from value to understand the downsides of LLMs is hysterical beyond belief.
Yeah, the rationalization after the fact is kind if absurd. IME, the reasoning underlying tokenmaxxing at the corporate level was "we need to leverage AI as much as possible as fast as possible because we're scared our competitors will find some leverage before us".
Definitely not some measured, long term, rational out of the gate.
Worse, tokenmaxxing has been pushed by the labs hoping to charge those tokens by the pound on their API prices eventually, even if temporarily hiding such costs behind "highly subsidized plans" or frequent bug-induced "reset buttons"
I would wager most if not all of the tokenmaxxing was done on enterprise API priced plans, not subscription plans. You can't actually token"max" if you are limited in the amount of tokens you can use per 5 hours.
Enterprise plans weren't usage-based until recently. Lots of enterprises were bait-and-switched by AI companies from flat per-seat fees, and now have to go on a token diet to rein-in budgets - which is the blog post's premise about tokenmaxxing being dead.
I really don't understand this take. If you're a carpentry shop that just bought power tools for the first time and you're worried that your employees are sticking with hand tools because that's what they know, then you look for sawdust.
The goal isn't to have people work at converting wood into sawdust, the point is that if you wanna see if the tools are working you wanna see proof they're actually being used.
I'm sure there were some people cargo-culting this stuff, but suggesting that the people who run FAANG don't understand the dangers of bad metrics is... interesting.
Why would a carpentry shop buy hundreds of thousands of dollars of power tools without consulting with their employees to see what they actually need to get their job done more effectively? The logic of buying the tools then forcing the employees to use them "or else" is completely backwards in any sane world.
(Of course, we've all had bosses that went to some marketing seminar and come back having been tricked^Wsold into buying some wizz-bang widget that we need to now integrate because of a sunk-cost fallacy, but I thought everyone was on the same page that this is not how normal procurement was supposed to work.)
> the point is that if you wanna see if the tools are working you wanna see proof they're actually being used.
That is way too charitable, people were being fired based on these metrics and people were absolutely talking about token burn as being a metric for productivity (do I really need to link the Jensen Huang quote?). That isn't an indication of this hysteria being based on "just trying to see if the tools work".
If you want to see if the tools work, why don't you just ask your employees? Like any normal employer would?
Because those power tools had just been invented and no one had experience with them.
Though in theory power tools are faster than hand tools.
So do a workshop on power tools, measure their efficacy and the quality of the result, do some demonstration videos on power tools, get people to compare, seek feedback on their usage. Don't count electricity and sawdust, or you'll find people getting very good at expensively turning blocks of wood into sawdust.
Is the idea that most stubborn employees would adopt AI if their company made videos showing internal metrics that AI is better?
> Don’t count electricity and sawdust
I agree that it seems wasteful, but is there some better way to accomplish it at the scale of hundreds, or hundreds of thousands, etc? I'm personally doubtful that stubborn employees would switch even if a video provided internal metrics, videos, etc.
Perhaps don't start out with the conclusion that it's obviously better and anyone rejecting it is just "stubborn".
Nobody started out with that conclusions. The tests and experiments and workshops and consulting with select employees you propose, were all done 2-3 years ago. Results are in, but a chunk of population decides to ignore them and obstinately continue believing and claiming that it doesn't work, which is a conclusion they started out with.
> Results are in
Awesome, can you please share those results? Surely they would be all over Nature, Science, IEEE, etc.
Nature doesn't publish papers about water being wet either.
source: just trust me bro
Source: it takes less effort to test this yourself than to write comments about it on Hacker News.
We did, that's why we're so skeptical of your claim. The burden of proof here is on you, not on us.
What? The basic properties of water are scientifically defined as best we can. There is mathematical proof that 1+1=2. Do you think science starts from nuclear physics or differential equations?
We couldn't build any of this stuff (rockets, LLMs, heart medicine) if the foundation was ill defined.
I think it's the second time I run into you like this, Temporal. I wish HN had a way to classify you as an "AI booster" or equivalent.
> What? The basic properties of water are scientifically defined as best we can. There is mathematical proof that 1+1=2. Do you think science starts from nuclear physics or differential equations?
Yes. There's a lot of interesting things science has to say about water, very specific claims that took a lot of effort to discover, precisely formulate, and reproduce.
We're not talking about those. The whole LLM discussion on HN, as well as in the wider industry, is still stuck at the state where a large (or vocal) group of people refuses to believe water is wet. Yes, there is a similar group that tries to sell water as miracle cure, I'm not denying it - IMO both perspectives are dumb and entirely detached from obvious observational evidence that you can collect for ~free at home in 15 minutes. Example will follow.
There exist the equivalent of foundational, detailed studies on LLMs, at every level of rigor imaginable (with a caveat, it's hard to rigorously prove anything useful in software engineering; it's still largely opinion-driven field). But they're not part of the overall "AI hypers/haters" dynamics.
> I wish HN had a way to classify you as an "AI booster" or equivalent.
You can take any of the LLMs and have it vibecode you a user script in under 5 minutes, than you then can paste into Greasemonkey/Tampermonkey, and voilà, you have me labeled as "AI booster" or filtered out.
In fact, let me help you, I'll time it. I opened chatgpt.com in incognito (to emulate being a rando free user), and put the following prompt in:
> I need a user script I can paste into Tampermonkey on my Firefox that will clearly label user named TeMPOraL with robot emoji and some silly emoji, so I never forget when reading their HackerNews comments that they're an unapologetic AI booster.
Got back this script in under 10 seconds: https://pastebin.com/akEchvHd. Tested it, works out of the box.
This is the promised empirical example. It doesn't prove everything, but it proves something, and it took, end-to-end, a total of 1 minute to perform just now. You can collect many such examples over a single day by just trying. People who keep saying AI is useless and a fad and can't do anything useful, obviously never bother with even that.
FYI: I'm not an AI booster. I like AI, and I find it useful, but I'm not going out of my way to boost it. I just enjoy this topic, but more importantly - and I remain consistent in this - I point out bullshit that doesn't agree with obvious observable reality.
EDIT: try the example yourself, and post whether it works for you too - if it does, it's technically a peer-reviewed, replicated study, but I doubt it'll convince any of the naysayers of anything.
EDIT2: I have plenty of negative things to say about LLM capabilities and how irresponsibly people use them, and I do occasionally write about this (mostly at work, these days), but most HN threads on AI are not on this level - not anymore. They used to be more reasonable back in GPT-4 days.
LLMs are useful.
They're not "3-4 trillion dollars in investments over 5 years" useful, nor "crammed into the throat of every employee on the planet, regardless of their actual job" useful.
The way they are pushed right now will lead to a very hard crash and probably lots of suffering. Also, you need a more advanced prompt for Firefox on Android :-p
> They're not "3-4 trillion dollars in investments over 5 years" useful
Why not? They're a general-purpose technology, in the same category as "software" or "electricity".
> nor "crammed into the throat of every employee on the planet, regardless of their actual job" useful
They're potentially useful for anything that can be fed into computers (VLMs lifted the "that can be expressed as text" limitation, visual and audio tokens are not a separate category to text tokens anymore). That touches every single job people do in some aspects. Even though LLMs can't do physical work for people, they're still able to help with directing it and teaching it.
"Cramming into the throat of every employee on the planet" was already covered by many comments here, and the article itself - it's about forcing the obstinate holdouts to at least try.
> Also, you need a more advanced prompt for Firefox on Android :-p
No I don't; literally copy-pasted it to Tampermonkey on my Firefox on Android just now, and it works there out-of-the-box too.
LOL, it did work :-)
Regarding LLMs, they are pushed too hard and too abusively by business people. Employees are being laid off and replaced with chatbots that don't do the job. Frustrating if support for McDonald's, risky if health insurance support. Also the financials don't make sense. AI companies are money pits. Money is ultimately production. We make X amount of stuff yearly, globally. We can't afford to through away 5% of X yearly on technologies that will probably have a proper return in 5 or 10 years. When we mis-allocate resources on scales like these, people die. Look at Communist centralized planning. For $3-4 trillion we could have solved a LOT of actual global problems.
LLMs are fine but they should have matured in the software dev domain for 2-3 more years and then non tech products would have followed.
> refuses to believe water is wet
At the risk of abusing the analogy further: many people aren't refusing to believe it's wet, they're observing that sulfuric acid is also "wet" and can look similar upon visual inspection, and there's a lot of harm coming along with the demonstrated capabilities, in addition to those capabilities themselves being fickle and inconsistent (not a desirable property for a good technology).
This isn't a problem of "doesn't know what AI can do"; yes, some people are misinformed, but you shouldn't dismiss all refusal to use AI as being misinformed. This is a problem of "knows what AI can do, and based on that informed position thinks it's terrible and should have careful guardrails around it".
As a matter of fact, Nature does regularly publish papers about wetting properties of water. In fact, it just published one last week, from Nature Physics:
https://www.nature.com/articles/s41567-026-03299-z
Scientists find more or less everything very interesting, even (especially?) things that are supposedly self-evident. You can both make a big splash disproving self-evident things, and much can be learned from it.
Results like these?
https://www.faros.ai/blog/ai-acceleration-whiplash-takeaways
Or these?
https://www.forbes.com/councils/forbestechcouncil/2026/03/16...
Or these?
https://poll.qu.edu/poll-release?releaseid=3955
Yes, "results are in". They're all over the map, about productivity, about stress and churn, about trust, about public sentiment, etc.
But sure, if you want to tell people their productivity will be measured by token usage, they will certainly respond to that incentive by setting your checkbook on fire while they work on a job search.
If a company wants to provide AI accounts for people, along with guidance for usage and non-usage, that might well make sense for some jobs. It certainly makes sense for some uses. If they start measuring token usage, that's even worse than when companies tried to measure lines of code written.
The same way you monitor your staff's work in general? Do they not have goals and deadlines and some way to discuss their progress with their manager?
> If you want to see if the tools work, why don't you just ask your employees? Like any normal employer would?
because that would require actually admitting that employees are the people in an organisation who are responsible for the success of that organisation, rather than the people higher up the org chart.
And here we are. AI use mandates are a humiliation ritual, at least how I've seen them. Because it's not just a matter of making the employees use AI; public criticism or speaking about the drawbacks are also punished. It's get totally on board or get out; if you're not completely gung ho, despite the testimony of your lying eyes, maybe you don't have what it takes to work here, son. It's something they use as a shit test, just like the North Korean dogma that Kim Jong-il scored a perfect 18 holes-in-one every time he stepped on the course: are you willing to compromise your values, to the point of mouthing naked untruths, in total submission to the company's leadership?
Do you actually have a job? Do you talk to your coworkers?
This is an insane take. Plenty of people are critical of AI at my job despite a big push to use it. I find the comparison to NK distasteful, coming from someone who presumably is pretty well paid and can quit their job whenever they want.
If you're feeling humiliated... well, I don't think it's because your boss wants you to try AI.
I had seen exact same dynamic as he describes. So yeah, he is speaking to people and coworkers. That is how he knows.
> Plenty of people are critical of AI at my job despite a big push to use it.
Are they critical to you and your 10 people team, aka a small circle or are they critical in the all hands Q&A in front of 500 employees?
Having been the guy to speak the uncomfortable truths at such meetings, I can tell you that does not end well for anyone. Expect to look for another job shortly afterwards.
Narcissists, non-violent sociopaths, and control freaks end up in managerial positions (often more likely than the general population). The pointy haired boss in Dilbert is a popular representation for a reason. We've all been subject to degrading and/or stupid management trends (see also: https://ibb.co/Kx46rqkg ), and while in the tech industry we had a golden age were the engineer was king, that's been chipped away even before AI became mainstream. Also, hyperbole is a thing. :-)
The logic of trusting employees who are worried that power tools will replace them to utilize power tools effectively is completely backwards in any sane world. People don’t like change, sometimes it needs to be forced on them.
> who are worried that power tools will replace them
maybe, just maybe, it would have been a better idea to engage with employees first rather than posting on linkedin about how everyone is going to lose their jobs.
cos it's the kinds of people trying to force this stuff on employees that are the ones who have been shouting about that from the rooftops.
If you take LinkedIn at face value everyone who uses the Internet is a sociopath who lives for no purpose beyond maximizing shareholder value.
Seriously, some of the most deranged things I've ever read were by relatively normal people trying to promote themselves on LinkedIn.
What people SAY does not matter nearly as much as what everyone KNOWS and it's pretty damn clear that AI is never going to be able to replace humans in complex domains. Every time a frontier lab announces a breakthrough it's pretty obvious that the setup was more complicated than "hey chat prove the Riemann hypothesis."
The world is gonna need skilled human beings to drive LLMs, no matter how desperately some people like to pretend otherwise.
Doubt. People brought in all kinds of web applications in the early Web 2.0 era because corporate IT was being too stingy (for a lot of reasons). People will find efficiencies on their job on their own. No need to denigrate them.
I don’t know, at my company at least tons of devs were holding out on ai usage until the token maxing stuff really started. It was beyond clear by that point that coding agents were a productivity multiplier.
A lot of people believe that. Not a lot of evidence on the table for it (it’s not agent developers’ fault; empirical studies are expensive and rarely live up to scrutiny). Not sure it’s worth forcing people unless you like malicious compliance.
Well here’s where you can level valid complaints against management I think. “Move fast and break things” doesn’t line up super well with “wait for empirical studies to back up your suspicions”
For sure. Just because the studies are incomplete or difficult doesn’t mean they’re useless. We still do unit testing and type systems continue to get more sophisticated and spread further because we believe they have an effect on quality and productivity regardless of the lack of evidence.
However it takes some taste in engineering and perhaps some mathematical sophistication to figure these things out. “Just use AI,” is not a very convincing argument either.
It’ll take time to sort out, I wager.
“Beyond clear” I wouldn’t say that confidently. Even now I’m not sure I agree with that, especially looking at it long term.
Yeah but if you can't attack the workers and make them hate their lives, are you even a good capitalist? Didn't Milton Friedman die for our bosses right to stomp on our faces in the pursuit of profit?
Alienation is inherent to the system.
> Why would a carpentry shop buy hundreds of thousands of dollars of power tools without consulting with their employees to see what they actually need to get their job done more effectively?
Are you suggesting that changes to new production technologies are always driven bottom up by line workers? I'm guessing that historically that's rare.
Historically that rarely happens because industrial equipment is/was generally too expensive for the average worker to purchase on their own, plus workers usually have a budget of roughly 0 to buy extra tools, especially expensive ones.
But to give you an example, also roughly 0 companies made developers use Linux and still many developers choose it, so bottom up improvements happen in a decent chunk of cases. Nobody paid for PostgreSQL promotion. Or Python, etc.
> so bottom up improvements happen in a decent chunk of cases. Nobody paid for PostgreSQL promotion. Or Python, etc.
It does, but for better or worse, it's an anomaly. Even now, maybe nobody was paid for PostgreSQL or Python promotion, but modern OSS tools and programming languages usually have a business backing it. Linux, too, wasn't commercially promoted until it was; RedHat isn't exactly a charity after all.
Conversely, no one paid for initial AI promotion either - ChatGPT exploded organically after release, and for the first year or two, companies had a problem because a good chunk of their staff, including especially non-engineers, discovered just how useful it was and wanted to use it at work, casually violating every internal policy, bylaw and even regulatory policies about data sharing. The massive spend on promotion - including first-party spend - came later, but at that point it was already obvious ~everyone is going to be buying it.
I suppose bottom-up vs. top-down may be in part about how mature a technology and industry is.
> Why would a carpentry shop buy hundreds of thousands of dollars of power tools without consulting with their employees to see what they actually need to get their job done more effectively? The logic of buying the tools then forcing the employees to use them "or else" is completely backwards in any sane world.
For one, software tools are cheap, especially with OSS in the mix. You're buying one "tool" and paying for operational expenses that scale with total usage across all company.
But secondly, and more importantly, the "consulting" and discussing was done over the period of last 3 years, by ~1 year ago the high-level conclusions were pretty much locked in, the worthiness of the adoption was blindingly obvious at that point, so I can see why tokenmaxxing would be where this ended up, even though (here I disagree with the article a bit) the tools aren't at the "compounding correctness" stage just yet. It's really quite simple: the stick didn't work (telling people in increasingly direct ways to try using AI for stuff), so they tried the carrot.
$deity knows a good chunk of engineers will inadvertently fall for any trick that involves a scoreboard. That holds even when they're perfectly aware they're being tricked.
> If you want to see if the tools work, why don't you just ask your employees? Like any normal employer would?
Again, they did that, they've been doing it continuously over past 3 years. Some people are excited, some people don't care, but some - a population that's definitely overrepresented in HN comments - just stubbornly refuse to try. Now that the answers are in, and they speak in favor of AI, the companies are doing what "any normal employer would": trying to get the stubborn employers to do their job they way their bosses want them to.
(In fact, normal employers would be more eager to fire people who keep refusing top-down instructions - but it's also obvious this technology is experimental; the models and harnesses get more powerful faster than people can learn to use them - so carrots make more sense than sticks in this transition period. Stubborn people begrudgingly using those tools offer an entirely unique perspective and explore use cases and approaches that you won't get from excited adopters.)
> the worthiness of the adoption was blindingly obvious at that point
Everything is so "blindingly obvious" yet nobody can point to ANY serious peer reviewed studies that prove it.
I'm patient, I'll wait.
You don't need peer reviewed studies to tell you water is wet.
Peer review is a technique to get evidence from data when SNR is low. It's not "science", it's just a technique. So is "throwing shit at a wall and seeing what sticks". Don't turn techniques into rituals, and science into religion.
Vibes are not evidence, neither is a curated demo. You need actual measured evidence that has an adversarial review to actually prove something without falling to confirmation bias.
> You don't need peer reviewed studies to tell you water is wet.
You don't need a peer reviewed study to tell you that a heavy rock will fall faster than a light rock.
Which is why we have peer review even for obvious things.
> You don't need a peer reviewed study to tell you that a heavy rock will fall faster than a light rock.
Either I don't understand gravity, or you might want to pick a different analogy...
Aristotle didn't understand gravity like you do, see https://en.wikipedia.org/wiki/Galileo%27s_Leaning_Tower_of_P....
I think GP is being sarcastic, and pointing out that
1. "heavy rock falls faster" is what common sense will tell you (I was literally told this by multiple laypeople just a few days ago when sightseeing atop a tall tower)
2. This is disproven by a trivial experiment that nobody thought worthy of trying for millenia
3. therefore we do need peer reviewed studies to confirm even "obvious" knowledge.
Also, note that GP's parent post about "water being wet" is quite the subject of contention in scientific and philosophical circles, so that wasn't the best example either.
> You don't need peer reviewed studies to tell you water is wet.
I'm afraid I have bad news for you...
I guess you mean that physics fact/joke that water isn't wet? It makes things wet.
Because people don't know what they want until they have and use it. Faster horses, etc. One can only really implement systemic change from the top down, as Moloch indicates.
Because Japanese hand tools are objectively less efficient than power tools in a carpentry shop. The guys that want to use hand tools can go work in a boutique that charges a premium for that level of craftsmanship. If you told them to use power tools, no amount of utility would convince them to use them, with most of their justification being psychological. Also, "It is difficult to get a man to understand something, when his salary depends on his not understanding it."
>If you want to see if the tools work, why don't you just ask your employees? Like any normal employer would?
I run a small business with two employees.
N=2 here, of course, but one of them will experiment with any new process you introduce (as well as plenty more that you don't!)
The other will keep doing what he's always been doing, even if it's frustrating and inefficient, unless you monitor him and force him to use the new process.
I could imagine most "normal employers" would understand that both type of person exists and, assuming you're getting good first impressions from group A, it's usually better off in the long run to shove the new process down group B's throat.
(This isn't to say that the "Group B" employee is less valuable or anything - he is more conscientious and reliable than anyone else we've ever hired - but just that different people need different management styles)
In my experience, your first dev will have four thousand ideas and experiments on the go, and leave an absolute mess in their wake.
And your second will be struggling to clean up that mess while also getting their own work done.
Of course, you expect the same level of work from both of them, but because person two has to do a bunch of person one's work as well as their own, person one ends up looking better and gets praised by management.
I'm totally not bitter at all.
That feels backwards.
> it's usually better off in the long run to shove the new process down group B's throat.
> (…) the "Group B" employee (…) is more conscientious and reliable than anyone else we've ever hired
If employee B is proving themselves to be valuable and reliable, then you should trust them to make the best decisions for how they’re going to go about their work and support them. Leave the door open for them to try different things, but no one likes having processes shoved down their throats (your words). All you’re doing is making them unhappy and more likely to leave to go work for someone who’ll value them like they deserve.
Thank you for taking the time out of your day to explain the best way to manage someone you've never met, working in an organisation you know nothing about.
Related reading (Exploration–exploitation dilemma): https://en.wikipedia.org/wiki/Exploration%E2%80%93exploitati...
You may want to consider that your Group B employee may be conscientious and reliable because they use an apparently “frustrating and inefficient” process. Productive friction is a thing: processes which force you to slow down enough to put careful thought into what you’re doing and why. And if they’re stuck in a loop of doing frustrating work - you may well consider why they’re doing so much frustrating work. Maybe that can be resolved at the managerial level!
> Why would a carpentry shop buy hundreds of thousands of dollars of power tools without consulting with their employees to see what they actually need to get their job done more effectively?
I mean, the difference in the metaphor is that we have pretty fully understood carpentry for many hundreds of years. We still find it difficult to write even simple software to address all our needs, as is evidenced by the insane pay in our industry. Carpenters can suggest tools because they know what's out there. The same was not true about LLMs a year ago.
> That is way too charitable, people were being fired based on these metrics
People get fired for all kinds of reasons including no reason at all. Oftentimes leadership even lies about the real reasons for firing people because they don't sound good!
I'm gonna be blunt: if you're in software and you refuse to use AI for moral reasons, I think you should be fired. There's being principled and there's being obstinate and the difference between the two is how well you can convince people that you _have_ principles. Most LLM-hating people fall short on this point, because
> do I really need to link the Jensen Huang quote?
Sure! Link it again, we all know it's highly immoral when shovel salesmen try to make you want shovels.
> If you want to see if the tools work, why don't you just ask your employees? Like any normal employer would?
I do not like this HN take of "let's do this thing that works great in small companies and then just blindly pretend that it'll also work at the largest companies in the world!" No, this doesn't work at "normal companies" because you cannot "just ask" 30k+ employees what they want.
Employees, like EVERYONE ELSE, are resistant to change. If I, as CEO of a company, want to get my company to try Claude I have to measure tokens to see if it's getting used. That's it. There's no wave of delusion here.
Have you considered using a more scientific metric, like the number of bugs being closed or the number of typewriter ribbons being used up?
> The logic of buying the tools then forcing the employees to use them "or else" is completely backwards in any sane world
People are stubborn. A lot of productivity improvements had to be almost forced upon farmers, for example. Even when early adopters demonstrated the benefits, a decent fraction of them just didn’t want to change.
> People are stubborn.
This is just a variant of the argument ”people don’t know what’s good for them”. You’re very close to the actual answer, which is that the aforementioned ”manager class” is simply convinced that they understand reality better than those below them, which is quite frankly absurd considering the fact that managers very rarely do any of the ”real work” that these tools supposedly make redundant, and yet they still believe themselves to understand the potential better.
Maybe multiple things can be true
Like when doctors insisted they didn't need to wash their hands (https://en.wikipedia.org/wiki/Ignaz_Semmelweis#Conflict_with...)
or "science advances one funeral at a time" (https://en.wikipedia.org/wiki/Planck%27s_principle)
And not few of those 'productivity improvements' for farmers have had disastrous consequences, even though what I think you are referring to has been implemented with far greater discernment and empirical basis than the current AI revolution.
People are stubborn, but sometimes for good reason. Let the stubborn people hold on to their practices, if the innovators are right they will eventually fold anyway.
> not few of those 'productivity improvements' for farmers have had disastrous consequences
Sure. Many have not. I’m thinking of stuff like ox-drawn and then mechanized ploughs, four- versus three-crop rotation, et cetera. The point is there is pushback regardless of benefit and even after it’s been demonstrated. Plenty of people are fine being comfortable. Which is fine. But it also explains why companies and societies with a nudge feature do better.
> if the innovators are right they will eventually fold anyway
Again, sure. If it’s their land, it gets acquired. If it’s your land they’re tilling, you get a say.
I’m not saying all-nor even most—pushback is unfounded. Just that there are plenty of cases where it is, and the solution there is to push through the change.
>Why would a carpentry shop buy hundreds of thousands of dollars of power tools without consulting with their employees to see what they actually need to get their job done more effectively? T
What happens if employees say no power tools are needed and after a few months a competition shows up with power tools and hires a bunch of noobs and beating your production numbers and sales?
Your employees simply may leave the company and work for them and learn the new culture at this new competitor.
Is there any law which prevents people from moving between companies? No? Then the promoters of that company are going to do what they think is fit to keep them in business and stay competitive. Many times they'll be wrong, sometimes they'll be right.
> beating your production numbers and sales?
That did not happened.
Many, many hand tool shops were outcompeted by power tool users.
To use your analogy again, it's kind of like the shop boss buying everyone a table saw and then saying "The best way to use the table saw is just experiment with it, it's the fastest and most accurate straight cut we can get - the future is table saw."
Yes, this is, in fact, how adoption of table saws and other such tools looked like, while they were still new tools. The basic form and function was established and its utility proven in both testing and early adopters, but as new kind of general tool on the market, every user from "early majority" was still writing the operating playbook for their specific shop conditions and kind of work they're doing.
So yes, it's a great analogy. We're right now well in the stage where bosses say, "evidence is in and conclusively shows this is useful for us, now the job is figuring out exactly how to work it into our particular business".
> figuring out exactly how to work it into our particular business
This is the most crucial bit. Neither ramming it down developers throats nor rejecting it wholesale is particularly productive. You need the conservative people onboard as well, to discover critical edges and failure modes. Including their criticism in the adoption process instead of bluntly banning it is the smarter move. Of course, there will be a few people who just don't play, they will fold eventually or be let go.
I see the point but, I'm not really sure the analogy holds up here. If i was in a cabinet shop and had to joint, plane and resaw and cross cut a pile of timber fresh from the saw mill for the next job I'd be very grateful for the jointer, the planer, the bandsaw and the table saw. I'd also be very grateful for the dust extraction.
In in total agreement with you though, forcing tools on employees is very dumb and is terrible leadership. Ask your people what they need to be optimally exceptional and go get them it. Then let them get on with it.
> In in total agreement with you though, forcing tools on employees is very dumb and is terrible leadership. Ask your people what they need to be optimally exceptional and go get them it. Then let them get on with it.
Some employees want AI tools, others don't. Standardizing SDLC workflows > each person does their own thing. So now you have to choose: do you require AI tool use that fit into a new SDLC? Or don't you?
I don't see why they have to be mutually exclusive. Assuming the vendor risk profile is acceptable to the infosec people and procurement are happy with the AI tools vendor relationship, then it's a tool inside the information security perimeter.
As long as there's evidence that work meets quality gates for any required customer audits, and your customers are happy and in the loop that AI is a thing that may or may not be used to produce the service, then those engineers that want it can have it and those that don't, don't.
Feels like a revision to an SDLC rather than a new one. Without seeing the SDLC it's hard to find common ground though. It really depends on how it's written and implemented and of course: culture. In the example we're working from sounds like the tools are being forced on people, and that's less infosec, SDLC and more unbearably bad leadership.
Not sure why you bring in "vendors" and "infosec". I'm simply talking about the situation of a software engineering team building something together needing to have similar workflows (supported by tooling/software) in order to work together effectively.
> Why would a carpentry shop buy hundreds of thousands of dollars of power tools without consulting with their employees to see what they actually need to get their job done more effectively? The logic of buying the tools then forcing the employees to use them "or else" is completely backwards in any sane world.
This is how it's gone down throughout history. It's why we remember the Luddites, textile workers who started smashing stocking frames and power looms because the machinery was introduced over their objections. The whole goal was to undercut the craftsmen's wages and bargaining power.
So, no, your expectation to be consulted was never going to happen and has not happened throughout history as industrialization has advanced.
> but suggesting that the people who run FAANG don't understand the dangers of bad metrics is... interesting
You're far too charitable. Understanding has nothing to do with it. Big companies are too far insulated from bad metrics. Middle managers get away with anything and everything because their decisions are too far removed from reality. And they're nowhere to be seen when the other shoe drops. And they'll just leave to a promotion elsewhere if they stay and results are bad.
Everything is far removed from reality in bigco. So you get a bunch of theater and house-playing with "data-driven" posters up on the wall. It's a show that everyone is aware of and seemingly we all still attend.
The level of trust in leadership is remarkable. There’s reasonable ways to have people try power tools. Have one team use power tools and another hand tools and see the outcome.
The mandate was literally “the more sawdust you create the more money you’ll make”. Nothing of value is learned by that mandate. Sure it’ll make people use power tools but it won’t cause anyone to learn how to use them to make furniture.
They might understand the danger of bad metric but that doesn’t mean they aren’t victims of them. If there was intentionality here it was lazy as hell at best.
> suggesting that the people who run FAANG don't understand the dangers of bad metrics is... interesting.
from my time in FAANG... that seems about correct. Probably the people at the absolute top don't want to just pointlessly burn tokens, but pass that down the chain and eventually the rumor mill turns that into "tokens are an input for your performance review" and people start running Wiggum loops to fix minor typos or linters or something—especially if you do it at a time when every company seems to be doing layoffs.
> If you're a carpentry shop that just bought power tools for the first time and you're worried that your employees are sticking with hand tools because that's what they know, then you look for sawdust.
Or count the fingers, I guess. It's all fun and games until someone looses AI.
Bad managers, in general, grab a metric and then unthinkingly optimize it. I’ve never worked for FAANG, but I’d be surprised if they didn’t have bad managers too.
> the people who run FAANG don't understand the dangers of bad metrics is... interesting
They don't. They want some metric to support what they want to do and don't care about good metrics at all.
I've spent the vast majority of my career in FAANGs and it's been the pattern everywhere.
Right now my org has a senior director who is constantly battering managers to tell their reports to fill out the weekly surveys.
Why are the employees not filling out the surveys? Because instead of the old once a year large survey with questions about various levels (including local teams where management cared about the numbers and I could see the actions they took) we now get a survey every week with questions that are meaningless and I have no answer for.
"How does team X deliver on its priorities"?
Team X has O(10K) peoples and a barely countable infinity of projects. Most of which I don't know about and most of which I'm not supposed to know about since things are compartmentalized. So I don't know what team X's priorities are, I don't know how they deliver on them, and I never will know. Asking me and my colleagues is a waste of time and money.
...but none of that matters because the directors want "data" and they want a dashboard showing that we're all giving them "data".
Looking for sawdust is a far cry from having a leaderboard of who turned the most wood and electricity into dumpsters full of sawdust
The switch away from hand to power tools was a while ago but not, like, ancient history. In the era with fairly widespread literacy and records. Did this sort of check for sawdust thing actually happen?
I worked at FAANG. If anything, people are not nearly skeptical enough about how dumb it is with all this going on.
People are (in this analogy) building sawdust farms there.
They didn’t measure sawdust accumulation. They measured the electricity bill.
Or... If you are a carpentry shop owner, you should understand what exactly the power tools you acquired are good for, how and if they can actually be used by your employees for them to do their job.
This, obviously, presumes that the person managing this hypothetical carpentry shop knows what they are doing. It's almost laughable.
In truth the carpentry shop owner manages on vibes, has no idea what employees do and also doen't trust them, and tells employees he wants to see a lot of sawdust in the workshop floor.
Why would you look for sawdust? That's a waste product. You would motivate everyone to stop producing actual furniture, and just buy the biggest bits of wood to turn entirely into sawdust. Which is textbook Godhart's law.
This is what's happening here, you have people setting up two chatbots to churn useless tokens at each other, making only sawdust.
I contend that tokens per se are actually a waste product, or at least non-value add. The end user doesn't actually care how many tokens were used to make a thing. If you could get the same result with fewer tokens, that would be an improvement.
> If you're a carpentry shop that just bought power tools for the first time and you're worried that your employees are sticking with hand tools because that's what they know, then you look for sawdust.
But to make this work, you cannot tell your workers that you are looking for sawdust, because you just gave them tools that make sawdust very easily.
Eh, it’s less hysterical than the ever-pervasive belief among junior devs that they are the smartest people in the room and that all managers everywhere are dumb.
Though I understand that gets social validation from other people with no actual experience.
The retconning is absurd. Companies CxOs's exhibit herd-line behaviors all the time: hiring scrum masters, mandatory return to office, and now tokenmaxxing. FOMO is a sufficient explanation of the behavior, any trend gaining momentum that they fear will give their competition an edge, they will reflexively force adoption with no further reasoning needed. There is no cost to boarding hype trains that go nowhere.
> overpaid manager class
Ugh. Tell me you're early in career without telling me. Sophomoric take.
Sorry to disappoint bud, I'm a manager of managers. Being able to look critically at your own tribe is one of the first skills you learn if you want to survive as someone telling other people how to do their jobs.
> That’s no longer true. We’ve entered a different regime, where spending more tokens generally results in better results. We call this “compounding correctness” — the more tokens you spend on getting a task correct, the more likely you’ll get a good outcome. We talked about this a bit at the last in person Agentics meetup:
Have we? Is it generally the case that the more tokens you spend, you better results you get? This take is so weird I suspect author somehow financially benefits from tokenmaxxing.
> This take is so weird I suspect author somehow financially benefits from tokenmaxxing.
They might own a chunk of NVDA.
it's a huge oversimplification imo. it reminds me of someone worshipping LOC: more = better
Did you read the article?
"Most teams haven’t yet figured out how to build their own Ramp Inspect or Stripe Minions (if that’s you, reach out — we can help!) but basically everyone is at least using cursor in the side bar."
? What is your point? That the OP is obviously finically motived to encourage tokenmaxing?
Here’s what they said, $$$ aside:
> That’s no longer true. We’ve entered a different regime, where spending more tokens generally results in better results. We call this “compounding correctness” — the more tokens you spend on getting a task correct
> Compounding correctness flips the calculus. If more token spend leads to better outcomes, then you’re going to want to spend a lot of time running tokens. Which sure as hell sounds like tokenmaxxing to me! The original incentives to tokenmax are gone, but eventually folks will realize that a new and more powerful incentive has take its place.
> There were ways to get loops to work, but it was hard. You had to think a lot about how to prompt the agent, which in turn required a pretty deep familiarity with how these things work.
> Now, though, it’s easy. Compounding correctness makes it easy
Go on, tell me I’m quoting the OP out of context.
It’s pretty clear this person believes in compounding correctness, while other, more serious people (1) are perhaps more skeptical.
..and Armins company owns pi. You can’t get much more all in on AI.
Compounding correctness sounds cool, but the real examples of people spending lots of tokens are not compounding correctness; they are wide parallel exploration; like Mythos. The OP is confused, and wrong; they’ve made some basic (flawed) assumptions, and based their entire reasoning on them.
…and are selling AI things. How surprising.
[1] - https://lucumr.pocoo.org/2026/6/23/the-coming-loop/
"I suspect author somehow financially benefits from tokenmaxxing"
Yes, because they are selling AI services. The article is an ad.
Looks like it based on Anthropic’s own multi-agent orchestration research:
https://www.anthropic.com/engineering/multi-agent-research-s...
Their findings suggest multi-agent systems result in better performance attributed mostly to token usage (80% of variance).
Hi, author here. I'm probably somewhat financially benefiting from tokenmaxxing. I also just believe compounding correctness is right, based on my own experience using the tools (which is why I have structured my life to try and financially benefit from tokenmaxxing)
How do you structure your life such that you financially benefit from tokenmaxxing?
starting a company that benefits from tokenmaxxing, mostly
Tokenmaxxing was just a way to force employees to start leveraging AI in a meaningful way.
For companies that have measured performance based on token spend, they can now dial it back. Employees have learned to leverage AI for things they wouldn’t have prior. Now they know what’s possible and what’s not.
No one is stupid enough to always measure performance based on token spend and have unlimited budget. It was always a temporary thing to transition the employees to a new world.
Management felt like employees weren't leveraging AI fast enough. That's why in 2025, there were many mainstream articles about how CEOs were forcing their employees to use AI or get fired. Tokenmaxxing was just the other extreme. Companies will arrive at an equilibrium.
There's no need to overthink this.
Edit: One reply cited this X post as an example of why management needed to do this. Trying to change a company with hundreds/thousands/tens of thousands of employees is hard. You have to send one simple message at a time. https://x.com/danluu/status/1487228574608211969?lang=en
The problem is that managers have no idea how this is supposed to help either, and just get told from above to use AI.
having heard the arguments made by some VP + C-levels throughout the Tokenmaxxing Tulip Mania, I think the interpretation that those mandates were made intentionally for "forcing employees to start leveraging AI in meaningful ways" is too charitable.
Most companies focused entirely on doing "what everyone else is doing" at best or "to see if Programmer Joe can be as productive as the entire team so we can fire the rest".
And many indeed fired employees in droves because they were "underperforming in token spend".
> Most companies focused entirely on doing "what everyone else is doing"
This is true of my current overlords. It slipped recently that the reason they went AI-nuts was that a competitor had announced going “AI first” and the market responded excitedly. Not because they thought it was a good idea: because the market got excited and they didn't want to get left behind.
This is quite a change as our market is financial services and I remember a time when we had to support decades old browsers (one large UK bank who I won't name here had IE6, and only IE6, on many of its user's machines until ~2017) and web servers because they refused to upgrade anything.
> "to see if Programmer Joe can be as productive as the entire team so we can fire the rest"
I'm not sure who Joe is in our outfit, but I'm certainly in the “the rest who are to be fired” set. I've been unhappy in dev & related for years so the AI revolution which I don't care for is where I'm consciously letting myself get left behind to find something else to do with my life. Haven't touched it. Was too late to claim one of the first tranche of Claude licences. And the second. Oops. Maybe I'll use AI in my next big adventure, or maybe my distaste for it all means I have a grand future waiting for me in the hospitality industry!
Isn't it easier to get a job when you already have a job?
There's some jobs I'd love to do, but I can't face the bullshit of tertiary education again.
Without some sort of ticket, job choices become more limited?
> Isn't it easier to get a job when you already have a job?
Yes. Or so I'm told, I've not needed to apply for a job for 26 years…
I have something possible available, though whether it still will be in five months (the earliest I'm likely to leave because of [reasons] and a two-month notice period) is a bit unknown. That five months might be ten as there are other major changes in the company (we were bought a while ago) from which the dust should have settled by Feb, and it makes sense to try to hold out that long to see if I'm still hating things with the same passion at that point.
Without that “something” there are less certain tech based options I could look at, and to be honest I really could do with a proper sabbatical style break. The mortgage is paid, I have savings, and no dependents other than the cats, so I have the luxury of considering that option. And if all else fails I've actually done the arithmetic and I can survive on minimum wage for an extended time if I need to, and hospitality work is something friends can get me into above the many others looking (that bit is less of a joke then people assume: it is seriously part of my plans D & E if I can't stick with A and B & C completely fall through).
> many indeed fired employees in droves because they were "underperforming in token spend"
1. Source for that 4-word quotation? I googled it, but it appears you are the only person who has ever said it?
2. Even if you made up the quote, source for the claim that "many" "fired employees in droves" for "underperforming in token spend"? (Again, even if the companies never used those words, I'm still interested in the source for the claim about many companies firing employees in droves for low token use.)
Some companies are badly managed (at least in some aspects). But it’s also true that some devs need to be pushed - sometimes forcibly - out of their comfort zone.
I’ve had multiple instances of taking months/years to get some devs to use a more sophisticated git client than GitHub Desktop (so they could properly do anything but the most trivial merges/rebases for example). Or to learn how to use the debugger instead of just printing/logging for debugging. For some of them getting them to seriously figure out how to better use AI required a bunch of repeated prodding.
Funnily enough a few years ago they enthusiastically jumped on copilot’s fancier autocomplete in VS Code, but getting them to really figure out how to get the most out of Claude Code required more pushing.
That's a very good point. Our company has been very thrifty with our AI spend, until a few months ago the average employee had ~$50 of supported spend and I was trying to be an AI leader in the company and figure out what was and was not possible, I had a $100/mo spend (Claude $100 service costs $108/mo).
We are now seeing that Claude Code can do a LOT of heavy lifting in our day-to-day work, but the bulk of our employees are stuck cost-maxing and literally cannot "imagine how you are running into your session limits". "I'm fine with the $20/mo account."
There's a case for the cost-maxing has hurt our company.
I'm in the boat of wondering how so many people run into session limits so often. I have never hit one, except once when Claude Design came out and I had fun generating a bunch of random things to see what it could do (not with the intent of actually using any of the generated designs/code, because it all sucked).
I'm using the $200/mo Claude, and I'll often hit the 5 hour limit, but last week was the first or second time I've hit the week limit. I run pretty much everything on "/effort max" because I've had good results, and I've had plenty of quota left usually, and I want to worry less about "am I getting the best results".
Fable, for the few days I had it, would eat through tokens pretty quickly, largely because it tended to work much more on its own. I could give it a task and after asking a few questions it would go off and work for 4-6 hours and be done.
I also run a lot of experiments. I'm trying to be a resource that the rest of my team can learn from as far as what works. For example: when one of the people from our parent company asked about automating payroll entry, I threw their documents and discussion at Claude to see what it'd build. That plus churning on their feedback was ~30 hours of API usage right there.
I'm currently experimenting with "loops", and using codex in those loops to provide feedback and review. That gives me fable-like autonamy (that 30 hours of API usage above), maybe even better. But it uses a lot of tokens. Loops is the bulk of why I got to the weekly limit last week.
Plus I'm having it build an experiment on what my ideal "agent mux" would look like. Herdr is really close, I found it after I started that experiment. Now I'm just letting it run when I have spare usage to see what it comes up with.
It really wasn't. It was a moronic move fueled by hype, implemented by the same type of incompetent business leaders who previously, to various extents, drank the blockchain and metaverse kool-aid.
There was demonstrably zero cost or consequence analysis, which is also why it was dialed back as soon as the (still) subsidized tokens became just slightly less subsidized, and the wise leaders realized they spent huge sums of money with no way of gauging ROI.
LLMs may have their use cases, but let's not make up free excuses for blithering idiots who, by any rights, should all be fired for cooking up money-burning policies that are textbook implementations of Goodhart's law.
Anyway, just needed to get that off my chest.
You're naive, uninformed or turfing if you think companies are still not tokenmaxxing.
Also tokenmaxxing was never an intentional and smart strategy employed by companies like you say. It was a mix of fear of missing out, signaling to investors they were in on the hype and recouping investmenets in data centers
Yes, and Uber was trying to recuperate what investments in data centers?
Come on now. Let's not think that we are all smarter than management at these companies.
Your business will suffer greatly for your short-sightedness. But yeah, go imitate Uber, I am sure you will get just as big as they are this way. Everybody knows Uber's success comes from Apple Vision Pro making their developers oh so productive. You should go to the Apple store right now.
Your livelihood now depends on tokens remaining subsidized. How long do you think your engineers will continue to have the independent ability to maintain your codebase if the tokens got 20x more expensive?
Buy and sip that intelligence straight from the tap.
I never said go imitate Uber's strategy. I just challenged the person who claimed that these companies are only doing it to recuperate data center investments when Uber doesn't have any data center investments.
> Let's not think that we are all smarter than management at these companies.
Outside of a few well run companies, it's hard not to feel like the average IC is smarter than their leadership.
Let us also not think that management is any smarter than any of us and is playing 5D chess games we couldn't comprehend. Notably, games that they also could not articulate when they were making these decisions.
It’s so easy to comprehend that even I was able to do it. I don’t think it’s 5D chess. More like checkers. They have dumb the message down just to get developers to try.
CEOs are just as, if not moreso, susceptibility to fomo than everyone else!
I don't disagree. They talk amongst each other, get advice from expensive consultants, but often lack the knowledge that their on the ground workers have. That said, I still think this was done to get employees to adopt AI faster and see what is possible rather than a long-term incentive.
People in small teams with managers promoted from within could probably have had this in mind.
Big Corporate managers are much more likely to have felt the need to “do AI” from their VPs, who in turn got it from the executive team, who have probably been under fire to produce a coherent magical AI strategy that makes to company scale infinitely while reducing costs. In that environment it’s much more likely to be copy-and-pasted charts from Gartner and buzzwords overheard at conferences, combined with the hope that somebody somewhere will eventually turn it all into something that resembles forward movement.
This exactly. Gartner and McKinsey were paid by openAI and Anthropic to sell executive teams on AI usages 20xing profits while cutting spend 70%. And if they don’t immediately make the front end investment now they would be left behind by their competitors.
The presentations were convincing and many bought large upfront wholesale tokens then forced them on their employees.
When the savings didn’t materialize and neither did the profits. And the spend was no longer going down but up, many execs rather than admit they were duped began blaming workers at scale for not making up for their stupid decision.
This is the “get them addicted” part of the platform play.
The problem is unlike any successful drug this was expensive and low payoff.
Do you have a source for this?
> Tokenmaxxing was just a way to force employees to start leveraging AI in a meaningful way.
> It was always a temporary thing to transition the employees to a new world.
Trying to understand your justification for rejecting Hanlon’s razor.
Yes, my own company's decision and logic.
An insane re-writing of the last year of bullshit insanity. Good one.
Yeah there's no way that was the reason. I judge it to be a combination of FOMO and the big tech companies needing to pump demand for compute.
Demand is already so large that OpenAI, Anthropic, Meta, Google could not fill it. Tokenmaxxing for these companies strictly to pump fake demand is just plain wrong. The inference demand for these companies internally must be a drop in a bucket in overall inference demand.
This reminds me of the popular opinion on HN for return to office mandates as executives wanting to recover their real estate investments.
Out of $13Bln of 2025 revenue, OpenAI received $867 million from one customer (less charitably, one bankroller), SoftBank. And $300 million from Microsoft[0]. That's more than a drop in the bucket, especially given that they're not the only players complicit in being both an investor and a customer.
Also are we sure it's all at arm's length? Barring a full audit, it's not possible to guarantee that there's no round-tripping or overstating of revenue. With Microsoft also being a provider for OpenAI, they could be creatively using set-off, or using SG&A, in order to overstate their revenue/gross margin/inference profit margin. I of course have no proof, extraordinary claims etc. etc. It's unlikely but we should at least debate the possibility. They have such a huge collective incentive to do it.
[0]: https://www.wheresyoured.at/exclusive-openai-financials/ (no affiliation with the website owner, who has a unique bias in this)
Are we not all getting timeout issues from Claude Code and Codex frequently due to too much demand?
I remember a story on HN from a while back. The idea is that the larger the org, the simpler the message and the tool has to be to reach everyone. The comment author was saying that as a junior, his company implemented a "tokenmaxxing" scheme for A/B testing - more tests, better for performance review. He, back then, thought it was stupid. However, it got the desired outcome of everyone being familiar with what experiments are and how to run them.
Dan Luu at MS: https://x.com/danluu/status/1487228574608211969?lang=en
This is exactly it.
> Tokenmaxxing was just a way to force employees to start leveraging AI in a meaningful way.
No, it was a sinister way to manufacture your consent to cause cognitive atrophy in your employees so that you lose your ability to independently operate your business.
You'll come to realize this once they begin charging you more and more for tokens but you will probably not blame yourself for it.
> to cause cognitive atrophy in your employees so that you lose your ability to independently operate your business
The argument is tokenmaxxing was put in place by companies, with the goal of causing their employees to lose knowledge?
This feels like it’s on the level of claiming conspiracy theories are invented by tin foil companies to sell more hats.
This is an insane level of cope.
The whole tokenmaxxing thing started because Jensen Huang said insane things like having a single engineer spend 250k in tokens or he’d fire him; and that OpenClaw was basically AGI.
> No one is stupid enough to always measure performance based on token spend and have unlimited budget.
Yes the people forcing these mandates absolutely are this stupid because that’s what people like Jensen Huang, Peter Steinberger and Boris Cherney were touting. Seriously have you ever actually talked to an average C-Level about AI? They are absolutely cooked.
You’re the one that’s overthinking it.
How many “average C-levels” have you talked to? What, specifically, do you think that actually means? Do you think the average CMO and CTO are identical, and have identical profiles in this case?
Or are you just blathering about things you’ve never experienced because you met the “CEO” of a five person company once? I find grand proclamations by people who speak in TikTok absolutely laughable memeing.
You don't need to meet these C-levels personally. They spread their insane hot AI takes everywhere and go out of their way to market this crap at conferences, etc.
Did you read those hot takes personally, or only had them reported on by your favorite YouTube/TikTok pundits? Hint: YouTubers and Tiktokers are in business, and that business is entertainment, and they're about as truthful as a fortune teller.
(Difference being, one of these groups is just lying about objective reality that's trivial to independently verify, the other one are just unlicensed therapists with thousand years old rituals).
> like having a single engineer spend 250k in tokens or he’d fire him;
That’s not quite what was said there, he’s budgeting half a devs salary as token spend in a podcast and that if he had a 500k engineer who spent 5k on it at the end of the year he’d go ape shit.
Now you can say that’s wild, and sure, but this is not a standard c suite exec talking it’s the ceo of Nvidia.
Even ignoring other hiring costs this is essentially an argument that Nvidia top engineers should get more than a 50% performance improvement with extremely heavy AI usage. To me, that doesn’t seem like such an enormous statement. For the head of a multi trillion dollar company entirely driven by AI sales arguing it gives a useful benefit to engineers isn’t that odd and betting on a 50% improvement within Nvidia seems kinda normal.
> Seriously have you ever actually talked to an average C-Level about AI?
Yes. Single digit percentage improvements over time would normally excite them, the idea of cappable cost performance improvements that last which your devs actually want to experiment with and a cultural and customer expectation that you’re doing this is pretty enticing. Particularly during a time when tokens were heavily subsidised - isn’t that the perfect time to do it? Now that has ended there’s a huge focus on roi.
> Tokenmaxxing was just a way to force employees to start leveraging AI in a meaningful way.
Of course not. That is not what it achieved or could possibly achieve.
> Management felt like employees weren't leveraging AI fast enough.
I agree it was about their irrational feelings.
Independent of everything else, very interesting to see how polarized the comments are here
I'm a +26 on my post so far so it seems like there are a lot of people who agree with me but most replies disagree with me. I suppose this is the nature of online forums - that those who disagree will take the time to reply but those who agree rarely do.
FWIW I agree with you, but it doesn't add much to the conversation to leave a comment saying so.
I also agree with the comment you're replying to as well - the vitriol and anger, along with the "this is just another blockchain bubble" type relies is really interesting. It's so surprising to see the variety of (negative) replies and beliefs people have, along with the general distaste/distrust for management. I guess it's also largely a sign of the times since a lot of ICs probably have a ton of anxiety about their career.
It’s about power and leverage. Software engineers were seen as “gods” in a tech company. Even the crappy ones. Over the last year, really 6 months, they lost great deal of that. Now they’re seen as costs, rather than assets.
This is especially true for the devs who take the code more seriously than the business that employs them. The technical PM who knows a bit of design are suddenly the kings of the company.
> I suppose this is the nature of online forums - that those who disagree will take the time to reply but those who agree rarely do.
Why would those who agree “take the time to reply”? To say what? “This”? “Agreed”? “This guy knows it”? Those comments don’t add anything of value. When you agree, it only makes sense to reply if you have something to say which wasn’t covered by the original argument.
I don't disagree with you.
I'm just pointing out that there are equally, if not more, people who agree with me than what the replies seem to suggest.
> There's no need to overthink this.
I agree, but for a completely different reason. A lot of executives simply chase trends. This was another trend they copied from each other. No reason to imagine they carefully studied the issue.
might be the first time I've seen this reasonable and obviously correct interpretation of the last 6-12 months so directly and unapologetically stated. bravo
HN opinions are usually divided into individual contributor vs management battles. Usually the IC opinion is majority because most people here are likely ICs.
At the IC level, people don't sense the impending urgency for the overall business. They usually sense the urgency for themselves first. AI has completely changed the software industry in 6 months. We went from having AI write some code and copy/pasting to having AI write 99% of the code in 6 months. SaaS went from nice UX and CRUD code logic being a moat to these being nearly free.
Big software companies have to adapt to this new world or they will be outcompeted by smaller, newer, nimbler companies. That's what management is thinking. For ICs, they're usually thinking about their own jobs first.
It does not seem obviously correct to me.
You’re post rationalizating
No. While what you’re saying makes sense, that’s not the logic behind the token max mentality. It’s simply lazy ineffective leaders who are bad at their jobs and don’t make rational decisions. They really did think spending more is somehow going to make their business better.
An interesting side effect of this spreading across social media is that even companies without token leaderboards were having problems with needless tokenmaxxing.
When everyone was reading about token leaderboards on all of their social media channels (include social news sites like Reddit and Hacker News) it created token anxiety even at companies that didn’t want a leaderboard. Programmers were afraid that their managers would be secretly ranking them based on token usage and they needed to pump up those numbers to avoid layoffs.
Once teams implemented token budgets in response it creates an ugly situation where a few people feel the need to use as many tokens as they can at the beginning of the budget window to stay ahead.
It’s really frustrating to have this phenomenon leak into a company that was never encouraging or looking for high token use.
The smart move would have been to get lower level managers to assign specific employees to experiment with applying LLMs to their processes and report back. Then incorpoate the findings into their processes.
Instead there was FOMO mass hysteria. Now there is a backlash. And a lot of time and money wasted.
Letting everybody freely experiment for a while is much more effective than appointing somebody to do just that.
Freely experiment sure. But you're not doing an effective experiment if you tell people they'll be graded on how many tokens they use.
I agree with you, but I was answering parent who suggested appointing specific people.
Its not _just_ that. Orgs aren't remotely sensible at measuring anything that isn't counted in dollars.
employees who are on the ai bandwagon are there for the free management attention.
Management is cooked because the damn market is hard, money is tight and they can't afford to fight the top down love and $$$ thrown at AI.
If you zoom out, all the real money spent on energy to keep AI alive isn't going to be held in nvidia stock for too long. it will burst, but its stupid to time it.
> Orgs aren't remotely sensible at measuring anything that isn't counted in dollars.
A sensible organization machinery will move to optimize the metrics that make money. Often times figuring out said machinery takes iterations. Some of them are idiotic (ref: tokenmaxxing) but they are generally directionally correct.
No one is stupid enough to always measure performance based on token spend and have unlimited budget.
Accenture was.
Thanks for posting the tweet, it was a very interesting read. A bit amusing knowing what's up with MS and Azure these days, but that's not the point!
It seems really absurd that anyone would encourage or even force employees to burn more money to see if maybe something works.
Why? This is literally what experimentation and prototyping is!
You spend money for a potential benefit. In this case it’s also a one off cost to find things that can save money over time.
One thing is to invest R&D money with a strategy and another to force people to burn as much money as possible to see what happens.
Did not read the twitter thread but I think it is a mix of some companies with above strategy and most others just cargo culting
> Management felt like employees weren't leveraging AI fast enough.
If my productivity is in line with their expectations, I don’t understand why management cares what tools I’m using to do it. No employer ever told me to use emacs instead of vi, even though I’m 10x more productive in one vs the other. So why all of a sudden does management need to micromanage my tools?
It's FOMO all the way down.
Your productivity isn’t in line with their expectations. Maybe your immediate manager but not the executives. That’s why they are doing it.
Their expectations aren't based in reality so I'm not sure why anyone should care
Edit: I mean besides the obvious of "because they will fire you if you don't care"
But idk. They're aiming to fire me eventually and have AI do 100% of my job so meh. Fire me now instead of later.
Because they're reading blog posts and listening to podcasts that increase their expectations of what your output should be.
> why all of a sudden does management need to micromanage my tools?
Because doing so increases the value of their stock options. They might privately think it's as dumb as you do, but apparently the stock market disagrees.
Expectations shift, and tools do matter.
Imagine you had a direct report. They were doing just fine, slightly better than a typical report. Then you found out they were writing all their code in notepad - no linting, no automated tests or live updates, no refactoring tools, no highlighting or any code search. They didn’t have any cross code searches and didn’t have any documentation. When they hit a problem, they’d churn away at it and never reach for docs, google or so.
Still, their performance is in line with what you’d expect from someone in their position.
Would getting them to try emacs, vi, linters, etc be micromanaging them? Do you think they’d perform better with them? They are performing in line with expectations for the role, so why bother with something you think would make them more efficient?
I’ve made this obviously over the top, and can hear already replies from other bemoaning my comparison while missing the point — tools do matter and if you genuinely believe that a developer could be more efficient working in a different way it makes sense to not only want them to try it but to actively fund that change. Hell, this is literally what we argue for in training! Spend money to make someone better at their job!
If you think AI tools make you worse or don’t and can’t help, then that’s one thing. But it makes sense for management if they think it might to spend money on it and to get you to try.
Not only this, but wasn’t everyone here shouting about how tokens were subsidised and it couldn’t last? If so, wasn’t the first half of this year a really excellent cheap time to do the maxxing?
It's funny, because editor choice is also an analogy I use, to argue for the exact opposite conclusion.
Your hypothetical developer wouldn't be using notepad because they're unaware of other editors, they'd be using it because they evaluated other editors and concluded that, for whatever reason, they would be worse for them. I'd be fascinated to hear why they came to that conclusion, but I'm not going to tell them they're wrong if they're performing acceptably, aren't constantly breaking CI because the linter rejects their code, etc. Everyone is different, and I'm not narcissistic enough to think the fact that I would be way less productive without my modal editor, LSP, linter, terminal multiplexer, etc. justifies forcing everyone else has to adopt my exact setup.
Did they see productivity gains that they're now calibrating for? Why have these productivity gains not been reflected externally in any measurable way?
This is probably the most charitable explanation humanly possible.
Surely for this specific example of managerial stupidity it just is, but I mean more generally, it's a beautiful posting.
I aspire to have this much misplaced belief in any humans at all, let alone CEOs.
At my company, this was the explicitly stated and shared goal from management.
"We can't know all the parts of our business that AI can do a good job automating [because it's so new] but we also don't want to be the last to know and outcompeted along the way. Please throw AI at random parts of your job [and we're tracking this] so we can generate feedback from employees on where to invest in additional automation"
My company has since provided a ton of high-value little AI workflows, alongside a handful that didn't pan out. AI-assisted software development is a major change overall, but the general business-process updates from AI are a net-positive to me.
I doubt the author of the article even believes it. The article is an ad for their services which just happen to be "helping" companies use AI.
lines of code produced. similar dumb metric.
So this is the narrative now? Come on.
It's simpler. Management felt like employees weren't leveraging AI fast enough. They chose to measure "AI leveraging" in the easiest way they could: how many tokens each employee was using. Goodhart's Law ("When a measure becomes a target, it ceases to be a good measure") immediately triggered.
This is obviously wrong. Management has never cared about how engineers do their job. There's never been a push for any other productivity boosting technology: better languages, better editors, automated refactorings, paid code intelligence tools, etc. But suddenly AI comes along and the CEO says "all developers need to write code entirely with LLMs".
This is absolute nonsense. Management in many places cares enormously about productivity. What’s been a bit different here is a huge claim of improvement other companies are seeing (so you’re going to be left behind), alongside some developers going off and doing this anyway sending proprietary code hither and thither, alongside some devs railing against the very concept of using it. It’s also a wildly powerful tool and how to use it hasn’t been as clear (where does it provide value, where does it not, what can and can’t it do) so experimentation is really important.
I did not say that management doesn't care about productivity. To be clear though, I did mean to say upper management.
> Now they know what’s possible and what’s not.
They really don’t IMO. Hell most of the companies pushing these tools don’t even agree what LLMs are for or are capable of. Too many people are trying to use it to cut too many corners on their work (making more work for everyone else) or are using it to attempt things they don’t know how to do, which means they are incapable of vetting the results, (vibe coding anyone?) which means more instances of the first case or even getting hurt.
Were there cheaper and humane ways to get more employees to use AI? (yes). Did many people JUST burn tokens (goodbart law)? (Yes). Would people revert back to the mean? (I think yes). Do many professionals hate AI because of this push? (Yes). Was the org net productive?
I wish there was an independent body truly assessing the impact of big tech decisions and running counterfactuals. Instead of accepting nice stories like this as a given.
Using Claude, I recently tried to do something similar for the Covid hiring spree: https://claude.ai/public/artifacts/21bba86a-ad5d-439c-861d-0...
Interesting take - upvoted you. I'm not convinced it's been the optimal management strategy, but you're succinctly explaining what they have done, not what they should have done, and in that sense you have a good point.
Still leaves huge questions about ROI ($26tln of TAM, anyone???) and doesn't quell the concerns brought forward by AI detractors though.
I was one of those mentioning the death of tokenmaxxing (https://www.ibm.com/think/insights/tokenmaxxing-dead-long-li...)
I would say tokenmaxxing = spending without limits or care about results (and assuming results). The term as it is right now, at least.
When it comes to "using tokens overall", then open source models change the equation and in that case, we will enter a phase of 'maximizing AI usage...but for near-zero marginal increase in costs with increased usage'. But even then, the convo would shift to platform engineering...which would then ask 'what value are we getting out of this?'
OR - cloud model economics change over time and we use cloud models as happily and cost effectively as we do cloud storage now. But hard to say when that comes.
Open to thoughts, though.
This is like hell, if hell was being stuck on a really poorly-maintained uncomfortable rollercoaster forever.
Better title more in line with the content of the article would have been: The reports of tokenmaxxing’s death are greatly exaggerated.
Pet peeve of mine is nonsensical usage of the x is dead, long live x.
The long live x is a lazy meme that draws attention that posters can use to skip thinking of an actual appropriate title.
that is a better title! Added it as a subheader
Brute forcing positive outcomes by spending more tokens until a happy path manifests does not solve the underlying comprehension (and liability) problem.
I fear a world where critical software is stood up with increasingly non-human governed abstraction because it [seems like it] works.
Software engineers as the review terminal in a conveyor of business-led code mass production... coming to a company near you?
You're right, but you'd be lucky if a real human actually reviews any code. At my company, merging a PR still requires 2 humans to press "Approve" but I've been instructed that I don't need to read the PR, I only need to click "Approve". This is what 30 years of SWE experience is being used for now.
What is meant by a "loop" here? Just repeating the same prompt until you get the desired result? Are subsequent repetitions too close to each other?
> Just repeating the same prompt until you get the desired result?
Not necessarily the desired result, but until it's 'done', where the LLM itself is the judge on if the is the case according to the given criteria (often just an updated todo-list). One of those extremely simple 'harnesses' (if you can even call it that) was even named the 'Ralph Wiggum Loop' [1] to allude to the braindead-but-persistent tokenmaxxing it results in.
[1] https://awesomeclaude.ai/ralph-wiggum
What I have been doing seems a bit different to what's described, but I always make sure to define how to know the task is done so the agent doesn't quit early. Usually this means telling it to to run the tests and type checks to ensure it runs without errors.
Otherwise they often do a first pass looks good enough but it doesn't actually work.
Loop "engineering" has now become a thing now apparently (a la prompt "engineering") https://github.com/topics/loop-engineering
> Just repeating the same prompt ...
If you were tokenmaxxing you would understand.
Or if you were ever working with an approximation / search / optimization (really they're the same thing) algorithm that iteratively converges on a solution...
This seems to happen with most big tech adoption in the first few years. The big data boom in the early 2010's had execs just buying up spark clusters and data lakes before they even had a clear analytical use case or governance.
Studies have proved that you'd have been better off with fartmaxxing.
I have no doubt those studies are run by some smart fellers.
>I’ve basically never heard a business leader say that they were going to set a bunch of money on fire because it made them feel good.
Really? ~4 years ago our CEO hired a consultant to fly out several times to do team building exercises. We can't afford to do our 3-year server refresh cycle, but the consultant was no problem to pay.
We just recently had branding consultants come in and also spent thousands of dollars (AWS charges) on rebranding all our photos. We operate in a captive market, if you want to operate in our market you are required to subscribe to our service, and if you aren't in our market you can't subscribe. Branding at the end of the day drives 0 sales.
Heck, reminds me of the time a company I was working with hired a new CTO and one of the first things he did was as "server renaming scheme" using obscure (to the US-centric staff) city names from around the world (database servers are Swiss city names, web servers are Denmark, storage is Finland). We went from cattle naming to pet naming, for a CTO that lasted ~6 months.
In my experience company leadership is not quite as thrifty as this article likes to think they are.
To be fair leaders usually don't say that, they say a whole lot of nothing that means "We're gonna set money on fire because it makes me feel good."
Or more accurately, "Because this is good for my career."
> database servers are Swiss city names, web servers are Denmark, storage is Finland
consider me officially triggered
why name your servers db-us-east-2 and web-de-stuttgart-3 when they could be called grindelwald and silkeborg?
I'm also taken aback with how naive folks are about companies, they really seem to have bought the whole "capitalism is efficient" maxim hook, line, and sinker.
I really struggle to imagine how anyone in a corporate environment has managed to never run into obvious examples of waste like you describe (overpaid consultants and mandatory budgets are classic examples). Office Space came out 27 years ago and has a plotline making fun of overpaid "efficiency consultants" whose only job is to tell management to fire people.
Narratives are the most ungodly effective thing known to mankind, is the issue.
> "capitalism is efficient"
The precondition for that is competition. If some company has idiot managers that waste resources on idiotic things, they're supposed to be wiped out by the companies that are actually smart.
Capitalism requires constant evolutionary pressure and a sort of government directed corporation level eugenics program to constantly apply that pressure in order to function properly. Without that, it's just distributed fascism.
i think tokenmaxxing mostly comes from cloud pricing. once you're paying by the token, you naturally start caring about token counts. with local inference i barely think about it anymore.
This is little more than an ad for their services.
Tokenmaxxing was never a thing to begin with. Just because a few companies did it doesn't mean it was a widespread phenomenon.
Agreed. There is way too much noise made out of this from a handful of companies.
The issue is the companies doing it could spend billions on tokens and they have. I for one know that there are multiple Big Tech Fortune 500 companies that have burnt over 1B in tokens in a single quarter.
This is purely for coding and analogues.
> Tokenmaxxing was never a thing to begin with.
Anecdote, I thought so too until the company I work just instated this where you have spend from 35-60K within 6 months. Insanity
Maxxing is just a catchy and imprecise name.
In my current company nobody forces you to use more tokens, but you're encouraged to write a 300 lines markdown skill.md which takes 8 minutes and costs 5 bucks to execute. That, instead of writing a 200 lines bash script doing all the same thing, but in a deterministic fashion, completing in under 5 seconds and costing 0 if you're not careful with rounding.
At least it's being used. There are many examples of tech over-adoption, like building out capacity for 1M concurrent users, only to see 50.
Or like Meta spending $90 billion on "the metaverse" only to see 300,000 users at its peak.
That comes out to spending $300,000 per user.
Without a doubt this too will be overbuilt. At least we’ll have cheap second hand DC gpus in however many years.
It's AI usage mandates now, but rather than focusing on how the current hot topic has ripped through the business world, often without benefit nor repercussions at leadership, I'd prefer to analyze the higher pattern. We've recently experienced such ripples as the metaverse, blockchain/nft/web3, 'the cloud' (and a minor wave of cloud gaming). There was even a teacup buzz of 'apis', oddly disconnected from the semantic web.
Why do such fever dreams occur at all? Are they getting more prevalent? More damaging? Do they jepaordize the global economy? Should they be regulated in some fashion?
I can't prove my case, but I think it's a symptom of media manipulation/consolidation, the 'fiduciary duty' delusion, and that shareholders can hold the puppet strings tighter than they used to. More and more, they place their sillytown bets and expect the plebs to dance to them.
The dominance of finance capital over industrial capital reaching its absurd conclusion. NFT mania was only possible because we don't make anything here, no one has a serious plan to reshore and start making things here again, and we can't indefinitely maintain control of production we've exported to the 3rd world indefinitely. So you might as well play these symbolic games and increase your slice of the pie while the music is still playing.
The thing that most disconcerts me isn't the runtime pruning, it's the cold loading. Months ago, I added a few skills and MCPs to test them, partly in the frenzy of free shopping, but then I forgot about half of them.
So after I got tired of choosing by hand, and therefore also a bit blindly, I created a small tool that runs locally and analyzes conversations to tell you which skills, MCPs, or other things are always unused.
347 items never used · ~19354 dead tokens/session · ~$25.49/month A lot of ECC that I never used but always loaded.
If anyone's interested, I've put it on GitHub, thousandflowers/skillreaper.
> Compounding correctness flips the calculus. If more token spend leads to better outcomes
Citation needed
cute easter egg on the story points showcase :)
Funny, now it's the management saying "Go be a bohemian, experiment, spend freely." and the employee saying, "Hold on, where's my ROI?"
If you are really engineering, you would really be tokenoptimizing for most quality per token.
“Thing is dead, long live thing” is dead, long live “thing is dead, long live thing.”
I do abuse this title format, guilty as charged
Phoenixing considered harmful
Would that be pheonixmaxxing or pheonixxing these days?
‘“Thing is dead, long live thing” is all you need’ considered harmful
I don’t think people who write these headlines understand that “long live the king” used to refer to the next king. Where is the next tokenmaxxing?
(its in the article, which predicts that there will be another round of tokenmaxxing with different underlying incentives)
It's actually used properly here.
This is more likely the junior camper version of "not everything that counts can be counted, and not everything that can be counted counts."
In the early days of LLMs, we saw the classic hype-driven bi-modality of opinions. Folks were in the "fake news, fad" camp, or they were in the "omg, take over the world" camp.
Those of us closer to the space, with the awareness to know that there was some truth (and a lot of misjudgment) to go around, were in the middle of nowhere. When I co-wrote some driver code with Chat GPT, other engineers (and even one of our directors) told me to keep it quiet. At the same time I had directors and VPs asking me how we could accelerate adoption. For a while, I had access to a cheat code just because I had the audacity to not ask for permission. Folks were sure I would get in trouble for spending thousands per month in LLM operation, but a handful came along for the ride, burning tokens like firewood and learning along the way.
Tokenmaxxing is probably coming from at least a few things:
1. A course-correction for the practiced frugality that kept folks from jumping in and just learning at the ragged edge.
2. A willful and deliberate recognition that the best innovations in the later phases of a disruptive introduction often come from sparks of ideation in concentrations of activity. In other words, we don't know where good is, and we need to find it. (Charitable interpretation from the article)
3. Recognition that, even if they don't know why, leaders and product owners will get punished for not jumping in and, because of bullets 1 and 2, won't get punished for trying and missing. Even if they have no idea what they're doing, they're going to fake it until they make it (or slide into another job).
This last set is where the pain lives. An organization with healthy and increasing AI tool usage will see elevated token counts, but so too will one using LLMs to rewrite wikipedia articles without the letter "m" to keep token counts high. These are pathological behaviors brought on by conflated metrics.
We had discussions about this in the early LLM days, where my old team was looking to ship new capabilities for older products. There was a lengthy VP-level discussion about getting to "80% usage" of the new system vs the old. Because the new system was a superset of the old, I eventually said "we can do that immediately, but it's a cost goal, where we're just aiming to make our business more expensive to operate, rather than a value goal for our users". We didn't adopt the target, but folks were understandably frustrated that they didn't have a straightforward way to measure and report progress.
Tokenmaxxing is, inevitably, a conflated goal, but it's what we have right now. Take advantage of the moment, learn, build, and keep an eye on levers for efficiency.
Beyond getting momentum going for a cmpany, Tokenmaxxing is lighting money on fire.
The idea of tokenmaxxing reaches different companies in different waves, so it will be discovered in waves and outgrown in waves in companies and industries in their own cycle.
In the long run, tokenmaxxing is like drunken sailor spending. Scaling is almost always about a large component of efficiency, and lighting money on fire in the street can only last so long.
Your comment implies no ROI on spent tokens. I get a lot more work done tokenmaxxing so the cost is negligible to me but YMMV. Of course there's no point in tokenmaxxing if you don't have enough work available to scale beyond yourself, or you're unable to use AI to do so.
I predict startups will continue to tokenmaxx while 40,000+ person companies will become a little more conservative.