This isn't the first time this has happened, either. I do not understand how these consultancies - who sell these "reports" for six or seven digit sums - continue to mess this up. It should be excruciatingly embarrassing for them.
I guess nobody ever got fired for paying KPMG and friends for an expensive report that supported their priors.
The problem is that there are a lot of people running around who believe the polite fictions we tell ourselves about review processes. It's very hard to explain why it doesn't work to have someone manually clean up a sloppy AI draft without discussing the fact, which many people find unacceptable, that manual review can't catch all errors.
These six-figure reports are produced by underpaid kids in their twenties working 18 hours a day.
The purpose of paying for these reports is for executives to have someone else to blame when their idea doesn't work. It has nothing to do with the correctness of the content.
This is absolutely correct in my experience. It's solely finger pointing insurance.
> These six-figure reports are produced by underpaid kids in their twenties working 18 hours a day.
That's accurate, for the first draft. Similar to big legal firms: subsequent versions are signed off and passed up (and, if revisions are requested, down) the hierarchy, each stratum with its own billing rate(s).
Which makes me wonder when the hallucinations got added.
Not where I used to work. Any "sign off" was some director making sure the letterhead looked right.
> Not where I used to work
It can't have been at any of the big 4, because partners aren't skipping 4+ org-chart layers to look at draft documents written by early-career associates. I have no experience with body shops - if that's where you were.
I would have to disagree. The report in question sounds more like thought leadership drivel than a report commissioned by a client with a scope attached.
The purpose of most reports is absolutely to provide assurance to decision makers or management, and oftentimes we disagree with management or provide a view that might not be favorable. That just reflects the realities of what we have identified or tested.
As I said, this seems like thought leadership drivel, which, even as someone who has worked in Big 4, I think is pretty average.
KPMG et al. are used as political ammo to push through internal changes. Those in power rely on consultancies to underpin their decisions (painful redundancies, firings, etc.). Acknowledging that the arguments for these painful decisions were hallucinated will lead to many problems for powerful people, so for now it's best to just try and sweep it all under the rug.
Isn't that exactly how the EU is trying to regulate AI? Like, don't use it for firing people, credit scores, etc.
> I guess nobody ever got fired for paying KPMG and friends for an expensive report that supported their priors.
I mean, basically. KPMG is a regulatory checkmark in some industries.
> Professional services firm KPMG has pulled a report titled, “Redefining excellence in the age of agentic AI,”
Well they were true to their word about demonstrating a new and increasingly relevant definition of "excellence."
Reminds me of this earlier Deloitte incident: https://fortune.com/2025/10/07/deloitte-ai-australia-governm...
Gartner is going to have to pull a loooot of reports over the years
AI hallucinations could only make Gartner more reliable.
Go, GPTZero!
[dupe] https://news.ycombinator.com/item?id=48515733
The Register article is better.
Every once in a while, someone utters a truly unique statement.
The crazy thing is the level of effort to say, "have a sub agent validate all references and figures" is so low. I'm paraphrasing, but you don't need much more than that. It would have prevented 99% of the face palms.
I use this regularly for my personal financial research system. Even flagship models make mistakes, though currently the issue is usually the model using a figure from an older report. Cross-checking reduces that dramatically.
Eh... without going into too many details, having seen some face palms at work, I realized that the anecdotes may be closer to a pattern than I would like to believe, which prompted me to start making basic how-tos available company-wide.
I kinda get it: without experience and trying, how are they to know (unless they are already 'into it')? After all, corporate training is laughable at best.
Don't be so sure they didn't. They can go back and forth hallucinating with each other.
This is where the absolutism of letting agents do 100% of the work fails. You get adversarial agents pulling all references into a table; they might miss some, so run this a few times.
Then have another set of agents, with skills like web browsing (to verify that links actually exist, maybe that references and abstracts actually match, etc.), and have one engineer (or agent) write a small script to help with this (just make sure you test it a bit).
So your work is not verified until your references table is 90% green checkmarks, maybe with uncertainty figures.
A human can then verify the ones with under 90% certainty.
This alone gets you a long way there. It does not cost the millions they're being paid.
It's quite interesting that these companies marketed themselves as the best of the best in excellence, accepting no mistakes. I can imagine the countless keynotes and books about this. Or the sales pitches.
Has always been a lie, they just understood how to hide it. Today they don't, and it's embarrassing.
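For what it's worth, the bookkeeping side of the reference-table idea above is tiny. Here is a rough Python sketch, assuming some upstream checker (agent or script, not shown) has already assigned each reference a confidence score between 0 and 1. The names `ReferenceCheck`, `merge_runs`, `triage` and `is_verified` are made up for illustration, and the 90% cutoffs are simply the figures mentioned in the comment, not a recommendation.

```python
from dataclasses import dataclass


@dataclass
class ReferenceCheck:
    citation: str
    confidence: float  # 0.0-1.0; hypothetical score from an automated checker


def merge_runs(runs):
    """Union the citations found across several extraction runs.

    Any single run may miss references, so results are merged.
    Citations are deduplicated case-insensitively, ignoring extra
    whitespace, and the first-seen spelling and order are kept.
    """
    seen = {}
    for run in runs:
        for citation in run:
            key = " ".join(citation.lower().split())
            seen.setdefault(key, citation.strip())
    return list(seen.values())


def triage(checks, threshold=0.90):
    """Split checks into (auto_passed, needs_human) by confidence."""
    passed = [c for c in checks if c.confidence >= threshold]
    needs_human = [c for c in checks if c.confidence < threshold]
    return passed, needs_human


def is_verified(checks, threshold=0.90, required_share=0.90):
    """True when at least required_share of references auto-passed."""
    if not checks:
        return False  # an empty table proves nothing
    passed, _ = triage(checks, threshold)
    return len(passed) / len(checks) >= required_share
```

Everything `triage` returns in the second list goes to a human. The hard part is still producing a trustworthy confidence score, which this sketch deliberately leaves out.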
> A human can then verify the ones with under 90% certainty.
How about the author actually reads the finished report a couple of times and checks all the references?
It really is the lowest bar - even lower maybe than running a spell check.
> How about the author actually reads the finished report a couple of times and checks all the references?
But then you wouldn't be embracing the new agentic ways of working!
The hallucinations here (https://gptzero.me/news/investigations-kpmg/) would have passed a cursory reference check. It's easy to see when it's laid out in a table that "BNP Paribas. AI Integration: Transforming Financial Journeys. The Banking Scene, 2025." is a false citation, because the title doesn't quite match and it wrongly attributes BNP Paribas authorship to an article written about BNP Paribas by some random Belgian guy doing business as "The Banking Scene". It'd be a lot harder to see when you're skimming through browser tab 9 of 45 and see all the key words match up.
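To illustrate why a field-by-field table catches this where keyword skimming doesn't, here is a minimal Python sketch that compares a cited entry against metadata taken from the actual source. It assumes you already have both records in hand. The `Citation` class, the `compare` function and the 0.8 near-match cutoff are assumptions made for the example, and any values you feed it here would be placeholders, not the real citation from the report.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Citation:
    author: str
    title: str
    publisher: str


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def compare(cited, actual, near_match=0.8):
    """Return a list of field-level problems; an empty list means all fields match."""
    problems = []
    if normalize(cited.author) != normalize(actual.author):
        problems.append("author mismatch")
    ratio = SequenceMatcher(
        None, normalize(cited.title), normalize(actual.title)
    ).ratio()
    if ratio < 1.0:
        # A high but imperfect ratio is the dangerous case: it looks right at a glance.
        kind = "title near-match" if ratio >= near_match else "title mismatch"
        problems.append(kind)
    if normalize(cited.publisher) != normalize(actual.publisher):
        problems.append("publisher mismatch")
    return problems
```

The point of reporting "title near-match" separately is that those are exactly the entries a tired reviewer waves through.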
I'm not talking about a reference check by someone other than the author. You wouldn't put a reference in that you hadn't read in the first place, since you couldn't formulate the text that relates to the reference?
Ed: thanks for the link - I hadn't seen that yet.
How about the author actually, y'know
authors
the report?
Thinking that such a prompt will cause the report to be factual is the root issue. No, it won't. No, it is not enough.
This hype cycle is unique in that the tech writes its own hype.
The number of pro-AI AI bots on chat forums is insane.
KPMG got called out only now for bullshit and hallucinations?