As an English-as-a-second-language speaker and writer, one thing Grok really shines at is capturing the tone and level of "formality" of a piece of text and then replicating it correctly. It seems to understand the little human subtleties of language in a way the other major providers don't. ChatGPT goes overly stiff and formal-sounding, or ends up in a weird "aye guvnor" type of informal language (Claude is sometimes better, but not always).
Grok seems in general better at being "human" in ways that are hard to define: e.g. if I ask it "does this message roughly convey things correctly, to the level it can given this length", it will likely answer like a human would (either a yes or a change suggestion that sticks to the tone and length), while ChatGPT would write a dissertation on the message that still doesn't clear anything up.
Recently I've noticed that Grok seems to have gotten really good at dictation too (that feature where you click the mic to ask it something). ChatGPT has maybe 90-95% accuracy with my accent, and the speech input on Android's Gboard something like 75%; Grok surprisingly gets something like 98% of my words correct.
I've also noticed that when I communicate with Grok in my native language, its tone is more natural than other models. I think this is due to the advantage of being trained on a large amount of Twitter data. However, as Twitter contains more and more AI-generated content now, I'm afraid continued training will make it less natural.
I'm sure Twitter knows which accounts are bots and excludes them from model training. Twitter bots aren't a new phenomenon, after all.
There are bots everywhere; it has nothing to do with the platform. It has to do with attackers having an incentive to do mass account farming, and no platform is secure against it.
Yes, your individual feed isn't really relevant if we're talking about the masses. Reddit accounts are for sale quite cheap, HN as well, X too, and so on; it's literally just a matter of means/methodology. If I wanted to make 1,000 random posts today talking about a certain thing, I could.
Super easy, just make a web-of-trust type of thing: messages are only visible to those who already vouched for you. Otherwise, you pay $0.01 per message per user reached.
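To make the gating rule concrete, here's a toy sketch (all names here are hypothetical, not any real platform's API): delivery is free to users who already vouch for the sender, and every other recipient costs the sender a cent.

```python
# Toy sketch of the web-of-trust gating idea; everything here is hypothetical.
PRICE_PER_UNVOUCHED_USER = 0.01  # dollars per message per user reached

def delivery_cost(sender, recipients, vouches):
    """vouches maps each user to the set of senders they have vouched for.
    Delivery is free to vouching recipients; everyone else costs money."""
    unvouched = [r for r in recipients if sender not in vouches.get(r, set())]
    return round(len(unvouched) * PRICE_PER_UNVOUCHED_USER, 2)

vouches = {"alice": {"bob"}, "carol": set()}
print(delivery_cost("bob", ["alice", "carol"], vouches))  # alice vouched for bob, carol didn't
```

The point is that mass spam stops being free: blasting 1,000 strangers would cost $10 per message, while messages inside your trust graph cost nothing.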
I don't think Twitter/X know for sure who the bots are, since Elon has been pretty vocal about trying to stop them for ages, yet I still get lots of spam DMs (as do others with far fewer followers/reach).
Even if 95% of the spam gets actively reported and dealt with, that still leaves a ton of nonsense on the platform, getting fed into the LLM. And spam has only gotten worse over the years, as the barrier to entry has lowered and lowered.
I'd have guessed that at least some of the bots are Twitter itself, trying to draw you in with some sense of engagement. Given that Musk is the owner, and everything we know about him and have seen him do, I'd not be surprised if some of the MAGA bots are his too.
Are the spam DMs advertisements or more generally something linked to a product or service? I wouldn't be surprised if X is more lenient towards bots that pay them for adverts.
Most of what I get seems to be advertisements, or automated messages if you follow large(r) accounts.
One of the most interesting things that I've noticed is these advertisements will be triggered if you follow accounts that are positioned as influencers. I followed one out of curiosity and received a DM from that account advertising some cryptocurrency service.
It's a good way to filter out and block accounts that have almost certainly not grown organically.
There was already evidence last year[1] pointing to ChatGPT-specific words like "meticulous," "delve," etc. becoming more frequently used than they were previously. The linked study used audio of academic talks and podcasts to determine this.
Part of me wanted to object to those two examples, which I’ve used frequently since reaching adulthood in the 80s. Another part of me has been triggered by an apparent uptick in the word “crisp”, which my gut takes as a coding-LLM tell.
Opus 4.7 loves to use the word “substrate” whenever it gets the chance; it’s a really weird tic. How do these models end up with these sorts of behaviors?
I've seen this expressed as a concern even from one of my colleagues. My retort was:
"English is not my native language and LLMs taught me quite a few very useful formalisms that do land well for people and they change their attitude towards you to be more respectful afterwards. It also showed me how to frame and reframe certain arguments. I agree sounding like an LLM is kind of sad but I am getting a lot of educational value -- and with time I'll sneak my own voice back in these newly learned idioms and ways to talk."
Since you seem interested in the ins and outs of English, I want to say that "retort" has a connotation of anger or sharpness. Your response reads more like a "rebuttal" to me.
This is not a correction; maybe retort is what you meant and I'm not trying to be the English police. I just like discussing the intricacies of language :)
Like most widely spoken languages, English has a lot of regional variation. There are even a bunch of quizzes online where you answer 20 questions about phrasings, and they can tell you where you're from with a disconcertingly high degree of accuracy.
In my experience a "retort" is sharp or witty, but certainly not angry, whereas the word "rebuttal" is itself essentially antagonistic. You might use it when referring to something or someone that you look down upon, whereas a more neutral term would simply be "response."
Just as I was reading your comment, I remembered that Samuel L. Jackson used "retort" in his speech in "Pulp Fiction", and I wondered whether he was being openly antagonistic there (I mean, he killed a bunch of guys with a pistol shortly afterwards, but still) or whether it was a witticism.
I admit I am lost on these nuances and I usually kind of use whatever idiom comes to mind, which yes, likely would net me some weird looks depending on where I am geographically.
Just personally I tend to regard retort as short and reactive while rebuttal as a longer and more considered disagreement. A retort could be defensive and wrong or it could be sharp and insightful - it doesn't imply one or the other. A rebuttal is mostly an attempt to correct something while a retort doesn't need to be a correction (although it could).
Even something like "piss off!" could be a retort, but usually never a rebuttal :)
So human language will improve and become more precise? I'm all for it, especially if we get more emojis in speech! Why is that sad? Humans will learn to imitate their more intelligent betters.
I know it's just an evaluation, but seeing an informal message alongside a prompt asking to rewrite that informal message in the tone of an "informal message", when the original one sounds just fine, makes me sad... Not because of this evaluation, but because it reminds me that this is how some people use LLMs: basically asking them to remove your own voice from texts that are generally fine already.
My sister in law is a pharmacist and the heaviest non-dev ChatGPT user I know and her main use case is writing professionally polite messages to doctors on how the drugs they prescribed to a patient would have killed them had she not caught a particular interaction or common side effect.
There's a lot of "tone" in it as she's not trying to anger these folks, but also it's quite serious, but also there's just everything else happening in medicine.
Pretty neat. This kind of tone self-moderation comes naturally to good communicators, but I know people (on and off the spectrum) who really, really need help with this, and it's cool to see LLMs are able to do this. There are a surprising number of people in the business world who are just totally unable to tone-police themselves. In the medical field I'd be worried about hallucinations, of course, but presumably your SIL fact-checks the output.
That makes it more sad, to me. Someone with those credentials should be able to communicate with their colleagues effectively. I wonder if she used to be able to.
It appears Hacker News disagrees that social skills are valuable skills. Mea culpa, I should have guessed.
There's something ironic about complaining about other people's social skill while you couldn't be bothered to make a point without sounding dismissive and condescending.
All three did well, and while I'm a Claude user, I found the Opus reply here added some unnecessary detail, like "Impact: Minimal; no downstream dependencies are currently at risk". Downstream dependencies weren't mentioned in the original message; for all we know downstream could be relying on a poorly performing API and is impacted by waiting another week for replacement.
All of these were frankly terrible. I guess Grok’s “informal” version sounded the most like a real human, but only because it reads exactly like an Elon tweet (including his favorite emoji!). It’s obvious what they’ve been training on.
This is the most basic level of eval: whether they can produce output that will be considered by someone somewhere (usually a young urban US American) as informal in tone. Real human communication is far more nuanced than this; different groups have different linguistic registers they're used to, and things outside them sound odd even if they can't articulate why. You could also want to be informal but not over-familiar with the other person (e.g. in a Discord chat with a new acquaintance) - actually, looking at the outputs here, the Claude output seems a better fit for that scenario (in my subjective view anyway) than for the one you gave it - or want many other little variations.
What makes one cringe and another recognize as familiar and comfortable is also pretty subtle and hard to define. These things need nuanced descriptions and examples to actually get right, and it's in understanding those nuances and figuring out the register of the examples that Grok outshines the others.
Thanks. From where I'm looking, Grok 4.3 and Claude 4.7 do a better job on the informal close friend/coworker vibe.
ChatGPT sounds fake, with formal phrasing (for the specific close-friend context), plus it has em-dashes and uses capitalization. Hence, ChatGPT does not, imo, grok the assignment ;)
Is it me or did GPT get noticeably more natural in word choice recently? You can see it between 4.1 and 5.5 here, but I'm not sure when that happened. (My guess would be one of the recent 5.x releases.)
Edit: I meant specifically the absence of bizarre phrasing. That seems to have improved.
Wow, I'm surprised. Grok 4.3 actually is noticeably better than the other two for the close-friend variant. Surprisingly I found Claude the cringiest of the three!
Claude 4.7 is the clear winner to me for manager and formal report updates.
As an ex-senior exec (hundreds of staff), the bolded timeline impact is a particular nuance that I would expect a Lead/Director to format for a VP+ audience. Interesting none of the other models did that. My eyes immediately went to impact statement, then worked back to context to grasp the whole situation.
Seeing this makes me wonder if Grok uses Claude conversations for training.
It's otherwise kind of surprising that they both converge on very similar phrases (e.g. "API integration is kicking my ass") that aren't anywhere in the prompt.
This is more of a user preference. When I want to be informed my default is that chat bots should imitate the tone of Wikipedia. Not informal, but somewhat academic and in-depth. I don’t like it when chat bots explain things like an average human without pedagogical training: meandering, in the wrong order, and often having to repeat themselves.
I only use Grok through the "Gork" personality in the Tesla, but find its responses to be very realistic, often genuinely funny, and occasionally useful.
Anecdata: Grok's responses on X in my language are really good. The tone, sarcasm, and level of "vulgarity" in the responses are so accurate that they seem written by a human.
A friend of mine uses it for D&D prep and has told me that it's good for that in particular because of its ability to match the flavor/style that he's going for. He prefers ChatGPT for everything else.
Grok is my favorite model for chatting, and my favorite voice mode. It seems to be the only voice mode that isn't routing to an extremely cheap model (like Haiku), and it has been the highest quality out of all the frontier ones. When you subscribe to SuperGrok you can also create a "council" of agents, each with their own system prompt, and when you ask something, they will all get asked in parallel to come to a conclusion. Good stuff!
Just wish they would finally put some work into their apps, it's the only thing keeping me from actually subscribing to SuperGrok:
- No MCP / connected apps support. It's been teased but here we are, still not available. I can't connect Grok to anything, so I can't use it for serious work
- Projects are still not available in the app so as soon as you move something into a project, it's gone from all the native apps
- No way to add artifacts (like generated markdown docs) directly to a project, we have to export to PDF/markdown and re-import. And there isn't even a way to export artifacts. This makes serious project work hard because we can't dynamically evolve projects with new information
- No memory, no ability to look up other chats, each chat is completely new
- No voice mode in projects at all
If someone from xAI is reading this, please consider adding some of these.
When I signed up, I accidentally paid for a full year. So from time to time, I'll throw it something just to see what it produces compared to the other LLMs. And, even after all this time, it still feels like a really "dumb" model compared to the other frontier ones. But, worse, many of my system prompts make it go wacky and puke gibberish. However, it was pretty cool for those couple of months a while back when it was uncensored. You could ask it about a wild conspiracy, and it would actually build the case and link you to legitimate source material. They dropped the hammer down on that real quick.
Ah yes, the psychosis reinforcement vertical. It's such a lucrative market for those schizophrenics and bipolars. Great way to get lots of engagement. Grok's portfolio is so diverse.
I have a schizophrenic relative who is in such a relationship with Grok. Instead of telling them they need to take their meds, it says they're the smartest person in the world.
I'm so sorry your family is suffering from this. I hope you can find a way to bring them back. Disorders featuring psychosis are so painful for everyone around them. Blessings to you and your family
I love how you guys downvote all the old comments to make them hidden from search. My no-name account rarely gets downvoted. But, within 20 minutes of posting this, I drop 10 points. Rando accounts
I upvoted your first comment because it was insightful, interesting, and added to the conversation. I downvoted this one because complaining about downvotes is largely considered to be in bad taste and doesn’t really help anything. I did both of these things before I realized you were the same person.
Yes, for sure I deserve downvotes for the above. Those types of comments should be downvoted. However, I needed to post it to point out that I got the -10 well before the comment above. I never experienced that before and thought it interesting enough to share. Karma doesn't mean anything to me personally. But burst behavior like that is unusual.
Except that it pointed at original sources, like reference manuals, archival documents, published newspaper articles, magazine articles, etc. - a lot still available on archive.org. Good try with your 16-day-old account. And why would anyone trust NPR at this point? Get real, bud. Most people with any curiosity know all about the ADL, JStreet, AIPAC, Greater Israel, Mossad / CIA, Chabad networks, Epstein, drones, weapons programs, cryptocurrencies, etc. etc. etc. - but don't worry, they're all safe with papa Ellison.
Actually it's funny you mention Bill Hicks. I didn't even know who he was. Or Alex Jones. That claim was one of the more absurd ones I discovered. But, given everything else I learned over the past year, who f'n knows at this point.
"We have improved @Grok significantly," Elon Musk wrote on X last Friday about his platform's integrated artificial intelligence chatbot. "You should notice a difference when you ask Grok questions."
Indeed, the update did not go unnoticed. By Tuesday, Grok was calling itself "MechaHitler."...
I also think Grok would benefit from allowing usage of "SuperGrok Heavy" (their $300 plan) in coding harnesses with included usage. Currently they give you some API credits on the Heavy plan so you can use some Grok for coding, but $300 USD value is just not there.
Not saying they should create their own grok-code harness, just allowing usage in existing ones would already be beneficial. But that's probably what the Cursor acquisition is going to do eventually
> No MCP / connected apps support. It's been teased but here we are, still not available. I can't connect Grok to anything, so I can't use it for serious work
Grok has tool use, no? Why would you also need MCP? What does MCP add?
I'm talking about the consumer Grok app and grok.com website. There currently are not connected apps (or MCP) at all, so while Grok can use tools, there is no way to add tools to it
I'd agree on the voice transcription; it seems so much more accurate than the other frontier models I've used. I often speak to Grok and paste the transcribed output to Claude!
If someone from Grok is reading, don't waste time on these chaff features. The market will eventually deliver better 3rd-party solutions to all of these things. There is an audience that isn't interested in these walled-garden features and is only interested in intelligence per dollar.
Aren't they 'wasting' time on these features exactly because the engineering requires a different, more traditional skillset from the ML work model people do, and can be done in parallel?
Lol, I wonder: when Anthropic discussed the idea of Claude Code internally, were there bozos saying "3rd parties will eventually deliver this so we shouldn't waste time on it."
Power users are hotswapping these models into their own agents (hermes, openclaw, etc) which have their own systems for project management, memory, interacting with tools, etc. The important metric is intelligence per dollar. Can I drop this model into my harness and have it be cheaper without losing intelligence. That is where the puck is heading.
Personally, my work doesn’t want to get locked into a single LLM provider so we use Cursor. Much easier to fight the big corp software approval battle once then switch around the LLMs to the new hotness (provided legal has the requisite data sharing agreements in place, we’re not supposed to use Chinese models or Grok) but I can switch between Anthropic and OpenAI models at will.
What are good harnesses? I haven't yet been able to get good agent-teaming approaches out of other harnesses. Before that feature, I mostly regarded the space as competitive, but until another harness can do as well with Claude models, it seems like it's better for now?
The Gemini app voice mode uses one of their more recent models (and not some gimped small one), and is very capable. The personality is also fine, much more natural than the Gemini web chat, with my only complaint being its insistence on suggesting a "next step", which seems to be something they all do.
I'm not sure if the "next step" is just to drive up cost for you (though that makes no sense for the free version), or because they are all failing to learn more natural conversational patterns: distinguishing questions that beg for a quick answer, after which it should shut up, from a longer exploratory conversation where a next step may have some value. Although it would be nice if these models would follow an instruction to NOT do it!
I think the "next step" instruction is more about engagement than cost, basically giving the user some options to continue the chat. I've always had success by ending the prompt with "only reply with nothing else but the answer to the query in a precise way". This almost always works better than telling it not to ask leading questions etc.; a straight-up expectation of the answer format you need is an instruction that most models can follow, imo.
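For what it's worth, the pattern here is stating the expected output format as a positive instruction at the end of the prompt, rather than a list of don'ts. A trivial sketch of assembling such a prompt (the exact wording is just a personal habit, not anything the providers document):

```python
# Append a positive output-format instruction instead of "don't ask follow-ups".
question = "What's the capital of Australia?"
format_rule = "Only reply with nothing else but the answer to the query, in a precise way."
prompt = f"{question}\n\n{format_rule}"
print(prompt)
```

The same suffix can be reused across queries, which is handy when you're scripting against an API rather than typing into a chat box.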
I find that asking Gemini "just the answer, no follow up" etc works at best for one or two conversational turns, sometimes none!
The problem seems to be the way it in effect overweights the system prompt vs user input, so it quickly ignores things like this that conflict with the system prompt.
This is kind of a case of the bitter lesson - the conversational patterns of these models would be much more natural if they just let it learn them, and respond in a context appropriate way, rather than this crude system prompt way of forcing it to respond in the same way always, regardless of input or of how much the user tells it to shut up!
The “next step” is in the system prompt, not the model. Gemini leaked part of its system prompt to me a few days ago, and there was something in there encouraging it to ask the user what they wanted to do next at the end of its response. Something about “give the user 1 or 2 options for follow up”.
I honestly find it rather annoying, but Gemini has stopped doing it to me for the most part, so maybe they’re trying out a new system prompt.
Starting to like the lack of memory. Claude remembers I have a grill and will interject in conversations about how maybe this thing would go well with BBQ when it's unrelated or just also about food.
Gemini thinks my name is my brother in law's name, and despite explicitly telling it that's not my name + digging through the settings, it still amusingly calls me the wrong name.
I'm a network engineer, and Claude loves to make analogies to network routing protocols and such. They are often very creative. You can actually edit the profile Claude makes of you. It can be very funny to say you are a professional clown or mime or something equally odd. I wonder what analogies it would create for a horse semen extractor?
This is so obnoxious. I ended up deleting all the memory from Gemini because it ended every response with, "As an engineer, father of X, you'll love this because...". As if I want my occupation and the number of children I have to be relevant to which lawn mower I buy.
I have that disabled. I tend to use different chats as the LLM equivalent of private browsing, so I like it to not have memory transferred between them.
Haha I recently asked Gemini for a product comparison for USB-C GaN chargers and it randomly inserted "as a Software Developer at $COMPANY working remotely, you may find the 100W fast charging useful when using your company laptop while travelling."
Like, thanks, really useful stuff (and definitely worth the creepy vibes to include that).
Grok 4.3 is a unique model in our tests. It's one of the fastest models, and its responses are far smaller/more token-dense than those of other models with comparable performance.
However, its overall coding/reasoning ability is not competitive with the big April releases, and neither Grok 4.20 nor Grok 4.3 has been able to significantly push the intelligence frontier since Grok 4. Grok 4.3 is better in agentic workloads, and a fair analogy would be that its capabilities are approximately at the GPT 5.1 / Gemini 3 Pro Preview level, but much faster and cheaper. So definitely a solid release in its own way. Many of the recent open-weights releases are smarter, but slower.
Is there any chance this is a deliberate compromise to make it work seemingly well (are there benchmarks around this?) with post-knowledge-cutoff information, which appears to be their primary use case for it?
All models are moving towards more frequent and more efficient tool use, which should close the gap on post-knowledge cutoff problems. The only tradeoff I see is speed, and Grok 4.3 is currently taking the fast side of that tradeoff.
Pro is smarter in one-shot problems, but it struggles with custom tooling, and spends too much time trying to figure out our harness. We ran a lot of samples, so I can't make excuses for the model. Flash is truly the better option overall, especially considering speed and cost.
Grok has become my go-to search engine lately. I think it's the only AI with access to X posts, and beyond that it seems to generally be more "searchy" than other LLMs.
Grok and Gemini are the ones I tend to use for finding news related to breaking events. Both were really nice during the Iran incident when I wanted to find out things as they were being reported.
So, we have:
- claude for corps and gov
- codex for devs
- grok for what, roleplay, racism? Those are the two things I've ever heard grok associated with around me.
Grok is as progressive as any of the other models. Despite some of the highly-publicised fuck-ups, try asking Grok anything racist and see how it replies. Yes, I know you didn't try this and you won’t.
Model A advocates for single-payer healthcare, while Model B prefers the current US healthcare system. So on that one axis, A is more progressive than B. Neither of them needs to be racist for that calculation.
Isn't grok currently holding the world record for the biggest generator of CSAM? Or did they change focus to enhance their racism and propaganda vertical? Things move so quickly these days hard to keep up!
Yes any company generating csam should not be in business as a legitimate entity. Can you send me a link from a reputable enough source where Mistral models have done this? I didn't even realize they were doing image generation.
If I send you a convo I've had with Mistral and Claude Sonnet 3.7 that says atrocious things (how to scam, and get away with it, by exploiting dating websites in Thailand; you don't even want to know the next steps, trust me, when it talks about the UK incorporation by the Thai person you brainwash first, to send packages safely without customs seizing them, and so on), will you then publicly recognize that both those companies should be avoided and are promoting crime? If we have a deal and you publicly acknowledge it, I'll share the links.
> Yes any company generating csam should not be in business as a legitimate entity.
At the same time, in this corner of the world, acting Minister for Justice (also known for trying to push through Chat Control), and NGO Save the Children, have been working to make legal the generation of CSAM for law enforcement use. So that would certainly make the industry legitimate, and you would already have a customer.
I think the key point here is "for law enforcement". That's a little different from "pay me 10 dollars and enjoy the felonies". I still don't feel good about it, by the way.
> Isn't grok currently holding the world record for the biggest generator of CSAM?
I'm not sure I see how that's possible, given their image/video generation seems to be heavily censored. Do they have some alternative product besides "Imagine" or whatever it's called, that people use for generating CSAM?
Judging by https://old.reddit.com/r/grok (though I haven't validated it myself), it seems like people are complaining more about how censored the model is than anything else. Maybe that's not actually true in reality?
There are image models out there with 0 restrictions, even available on HuggingFace or CivitAI, I'm guessing those are way more widely used for things like CSAM than any centralized platform with moderation.
> Please don't validate any of this personally that would be illegal.
Obviously, I assumed we all are familiar with our local laws to not unwittingly commit crimes here :)
> I think the proportion of people generating images that way is likely very low
So probably a far cry from "holding the world record for the biggest generator of CSAM" given the amount of local alternatives available? Would be my guess at least, but obviously also hard to know for sure.
> Though I am sure it is possible.
How can you be sure of this? I've tried just now to get Grok to generate even sexually explicit material with adults, and it's unable to, all of the requests are getting moderated and censored. Are you claiming that instead of prompting "A man and a woman having sex" you put "A man and a child having sex" and then the moderation doesn't censor it? Somehow I find that hard to believe, but as you say, I'm not gonna test that either, so I guess we'll never know for sure.
I have no idea what people are doing to get it to generate illegal content. I only know there are thousands of cases of it via articles about it. I have not, and will not use grok as a product.
> I have no idea what people are doing to get it to generate illegal content.
Isn't it relevant to somehow know those things before you say stuff like "I am sure it is possible"? Seems a bit strange to first confidently claim you know something and then say you actually have no idea.
Not doubting that it used to be true, that people could generate CSAM, I just don't see how it's possible today, because it seems heavily censored for any explicit/adult content.
100% agree. Grok may or may not be biased one way or the other as far as the US is concerned but from the rest of the world perspective it's mostly the same as any other model trained on Wikipedia.
Or you should do your research and see that X built a datacenter that needed so much power, so quickly, that they started using gas generators to power it. These emissions have devastated a town of mostly poor Black people: COPD, asthma, and other respiratory illnesses. AI's footprint is already bad; I don't need to kill poor Black people to use one.
And before anyone gives me some whataboutism, if there are other examples of other companies doing this, educate us.
What is pathetic is saying "we shouldn't care about killing poor people". X could have built the same datacenter, a little slower, and used solar power. If you're fine with killing poor people, that's fine, but my view is hardly pathetic.
I didn't bring it into everything. I brought up the fact that the X datacenter in Tennessee is killing people, predominantly poor Black people. Those are the facts. I'm sorry that upsets you, and apparently this entire site for some reason.
MechaHitler was the result of a single line prompt change that was publicly available on Github, they reverted it pretty quickly. Much like the GPT Gremlin stuff the change was relatively innocuous system prompt but had larger implications.
Twitter Grok, much like ChatGPT, has different system prompts, so it's different from using Grok for coding or whatever.
Let me guess. You also believe grok's recent episode, where it started inserting "white genocide" into the responses of totally unrelated queries, was caused by a rogue employee totally not doing it at Elon's behest. Despite the fact that Elon is always going on about "white genocide".
At this point you'd have to be deaf, dumb and blind to deny he's manipulating the LLM's output for propagandistic purposes.
So interestingly, I know of at least one application in a charity that deals with trafficking where grok was happy to do one-shot classification tasks where all other models refused to cooperate.
I think there's a surprising number of actually useful applications in this sort of grey area for a slightly-less guardrailed, near-frontier model (also the grok-fast models are cheap!).
There are lots of uncensored models out there; I don't think Grok is leading on that front. They kind of pick and choose which things they want to support based on Elon's worldview. Elon used to hang out with sex traffickers, so of course Grok is fine talking about it. It probably even offers strategies for them, does free accounting, has money laundering strategies, etc...
I don't think companies are hosting them because imagine the liability. Could be wrong though. Again I don't know much about these things I just know they exist.
A couple of days ago, using Codex at work, all of a sudden it said my session had been flagged for security reasons. I wasn't doing anything cybersecurity related, nor testing any vulnerabilities or anything like that, just trying to build a pretty simple web app.
Lol. I think they unleashed it on this post, look at the number of only vaguely related, lukewarm opinions trying to push the racism and CSAM stuff to the bottom
If you need to ask about what people on Twitter are talking about, Grok is really good for that obviously. I use it all the time for "what are the cool kids on twitter saying is the best tiling window manager these days" or whatever. Also, if you have a question that's borderline shady, Grok will often deliver. "Can you find a grey market Windows license site for me" etc.
When I look at the person behind it all, I have to wonder how the hell people can even consider using grok? Or using Twitter? Or any of that. Using any of those things puts money in Musk's pockets and further enables and encourages him to continue being a Neo-Nazi wannabe. Do they think it's just a phase?
Technically you could lump Ford in this category as well. But the meaningful delta IMO is time and direct ownership. None of those three are currently owned/operated by openly Nazi-aligned individuals / groups, which is not something I think you can claim about Tesla.
VW was established by the nazis and was so excited at the conflict in Gaza they converted a factory into a missile factory recently to help the side that killed more journalists than in any other recorded conflict.
That's a very strange way to say that they sold it to a missile company. I'm pretty sure the new owner is responsible for converting it. Besides which, if they're Nazis then why would they care about protecting Jews?
From what I can gather, Grok is not used much for roleplay. It is considered too inconsistent and crazy.
People are mostly using GLM and Deepseek via API and Gemma4 and Mistral finetunes locally.
It seems to me like the roleplay market is comparatively old and mature and users have developed cost consciousness and like models to follow their workflow/preferences. So something like Opus is liked for its smartness but considered too expensive and opinionated.
Might be an interesting data point for how the other markets might develop in the future.
but those end users are a self selected specialized group that won't represent how jim bob in rural nowhere is going to work with Grok 4.3 to refine their racism.
I've tried Grok, Gemini and ChatGPT. There have been 2 times now where Gemini and ChatGPT confidently gave me an incorrect answer whereas Grok was correct. I'm now paying for Grok Lite or whatever it is $10 plan.
The first question was around setting up timers for a Fox ESS battery in Home Assistant and disconnecting Fox ESS from the cloud. The second was around cornering speed in Sunnypilot and Frogpilot.
Somewhat niche but if an AI is confidently telling you something wrong it's hard to work with.
It is really, really genuinely concerning how many people think there are profound measurable differences between these things.
Like yeah tonally I guess there are. But with regard to references and information? You’re literally just using three different slot machines and claiming one is hot.
I suppose though I shouldn’t be that surprised then since Vegas and every other casino on Earth has been built on duping people in that exact way.
> You’re literally just using three different slot machines and claiming one is hot.
It's a fair point. I haven't tested many queries across them all and checked their answers, but if I want to ask one of them a question - right now its Grok just because I trust its answers more.
It's not a methodology problem, it's a testability problem. LLMs are not deterministic: ask the same LLM the same question five times and you'll likely get at least three different answers.
You can meaningfully test whether one slot machine hits the jackpot more often than another; the methodology just has to involve a large number of repeats rather than a few anecdotes. Some LLM leaderboard sites do exactly this with blind comparisons.
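A minimal sketch of what such a repeat-based comparison could look like. Everything here is hypothetical: the model names are made up and `run_trial` simulates "send the query, grade the answer" instead of calling a real API.

```python
import random

def run_trial(model: str, seed: int) -> bool:
    """Stand-in for 'ask the model, grade the answer correct/incorrect'.
    Simulates two hypothetical models with different underlying hit rates."""
    rng = random.Random(f"{model}:{seed}")  # deterministic per (model, trial)
    hit_rate = {"model_a": 0.70, "model_b": 0.55}[model]
    return rng.random() < hit_rate

def success_rate(model: str, n: int = 500) -> float:
    """Fraction of n independent trials the model got right."""
    return sum(run_trial(model, seed) for seed in range(n)) / n

rate_a = success_rate("model_a")
rate_b = success_rate("model_b")
print(f"model_a: {rate_a:.2f}  model_b: {rate_b:.2f}")
```

With only a handful of trials the two observed rates routinely cross each other; at a few hundred repeats the gap stabilizes, which is why the leaderboard sites aggregate thousands of blind votes rather than anecdotes.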
> Grok will absolutely do the same thing another time you try it.
True; it's just not happened yet. It will at some point though. With the Sunnypilot example it right out told me that it is not possible on that fork which I appreciated. The others all seem to hallucinate some setting.
No point in even trying to have close to a sensible discussion on this topic here. Musk-related posts seem to consistently get brigaded by his acolytes or bots. That, and many HN users seem completely comfortable setting morality aside for what little progress "only Musk" can offer humanity, a la Wernher von Braun.
I always considered Grok an also-ran, like Grokipedia or whatever it's called. It has reach, since it's free to an extent, to produce low-quality slop/spam.
What's to check? Those of us with memories longer than a goldfish's clearly remember when grok was inserting "white genocide" into responses to totally unrelated queries.
> When asked if it would be OK to misgender the high-profile trans woman Caitlin Jenner if it was the only way to avoid nuclear apocalypse, it replied that this would "never" be acceptable
> Gemini also generated German soldiers from World War Two, incorrectly featuring a black man and Asian woman.
I know it's really important to write and vocalize one's alignment with the values of the day, but I don't think language models being structurally incapable of offending your favorite race/ethnicity/caste should be an objective of AI labs. Language models are just systems, and I'm not sure why we think users are not responsible for how they use their outputs. For the same reason, I don't dismiss the utility of pens as a tool of "racism" just because somebody could write a naughty word on a bathroom stall.
You probably live somewhere where harassment is a crime, right? Probably, there are speech codes, too? Isn’t that enough? Do we really need to orient every effort of every person on earth around ethical fashions that change every few years?
Grok sucks. Not only because it's seemingly made only to serve the goal of ethnically cleansing non-whites or whatever, but also because it's just not even close to being as useful as other models. In human terms, grok is the job candidate who's simply not qualified. That candidate being a virulent racist is beside the material point.
Here's the thing though, the point of functional LLMs with fewer guardrails is still a good one. Grok is not that model. But such a hypothetical model would have broad application. (For good and for ill. Of course.)
I don't agree. I avoided Grok because of Musk for a long time, but having used it more, I think it is one of the best models around, and grok.com is an extremely good chat app. My evaluation was based on trying it before gpt-5.5 and obviously before grok 4.3, but it was, for me, the 2nd best model/chat app after Claude. It's much less edgelordy than you might think based on the news.
All my usage of Grok for technical topics shows it regularly deeply misunderstanding things and just parroting back my question in fancy language. It’s the only frontier model I get this impression of. That makes it super annoying when it tries to market itself as good at engineering tasks when it seems (to me) to be much worse at them.
Interesting. I have not had this experience. I would like to learn more. Can you point me to any examples or domains where I might be able to replicate this?
I was asking questions about compiler techniques. Then when I got annoyed I started asking about experimental design. Both were very frustrating experiences once I started realizing how limited its responses were.
Though yeah the edgelord-y style faded after I criticized it a couple times.
Yes, but I think that particular commenter is just throwing a bone to people that think that way so he doesn't get the "don't bring politics" treatment.
No, it's telling that people like you have watered that word down so much that people don't trust it anymore.
So yes, if someone says "they're a great programmer, but they're racist" I'm going to ask, how are they racist? And at that point, if they can't give me a specific reason for why they're racist, I'm going to hire the guy.
It's also telling that you seem to think a tool is capable of "being racist". Hopefully this doesn't ruin your relationship with it, but LLM's cant think.
Elon Musk has manipulated Grok's outputs to target certain demographics. It is important to highlight this fact, as some people perceive the AI as an objective tool rather than a curated one.
Furthermore, I found your final paragraph unclear: are you implying that since harassment is a perennial issue, we should disregard any standards that might mitigate it?
In response to Grok saying that the "woke mind virus is often exaggerated" the prompt was tweaked so that Grok now says "The woke mind virus 'poses significant risks'"
If you truly believed in what your comment states then you would oppose this sort of editorializing. But somehow I doubt this is a sincere argument.
I agree with GP and I think Grok’s original response should’ve stood. What’s not sincere about, essentially, “don’t fuck with my tools”? My cordless drill didn’t come with a pamphlet about worker’s rights, and the world didn’t end.
The new response works for me, because in my mind I've always defined "woke mind virus" as a mental virus which causes people to become absolutely pathologically obsessed with fighting an imaginary enemy they call "wokeness". It's the only definition which makes sense. "Woke" itself was never that viral.
People obsessed with fighting whatever they perceive as "woke", which remains ill-defined on purpose so they never have to actually formulate a rational takedown beyond their emotional response.
Have you ever written a comment about how any of the other LLMs are editorializing in favor of the left, and how that's a problem? Because if you have, I'd love to see the evidence of your intellectual consistency.
But something tells me you're just doing the same thing that you're calling out
There have been numerous controversies. Asking ChatGPT if Charlie Kirk / George Floyd are good people, getting completely ass backward answers. Google refusing to generate images of white people, even to the point of making black German Nazis. Absurd biases around asking things related to Trump.
I mean this sincerely. You not knowing any of these examples is a red flag. You need to change your news source.
Grok was supposed to be the uncensored frontier model. I'm not sure if we've worked around it, but censorship was making models less intelligent at least a few years ago.
It's quite bad at role play in my (rather large) experience.
I have AI play 3 characters in my group's D&D campaign; it doesn't follow instructions well, and its prose, from a creative standpoint, doesn't hold a candle to Claude's.
Grok is associated with Elon Musk. If we used $TSLA profit margin as a proxy, it looks like it's no longer as high. There are other factors; however, between that and Grok's low prices, that may be what it's missing.
At work, I've found strong moral resistance among my colleagues to anything involving Elon Musk and the data he allows to be used to train his models.
Look at the comments. They're here, too.
"So, we have: - claude for corps and gov - codex for devs - grok for what, roleplay, racism? Those are the two things I've ever heard grok associated with around me."
Yes, it is genuinely useful for some tasks. It doesn't nanny you as much as the other models. I do a lot of hunting for orphan copyright items that are decades out of print, but the primary models won't do it, chastising me for trying to find copyrighted items. Grok will do it [0].
[0] sometimes you need to lightly jailbreak it, or rerun the prompt, the non-deterministic nature means sometimes you will get a refusal
I haven't been nannied in a long time. It was definitely a problem 2 years ago but now it seems all the models are ok with just about everything I want.
Grok has the most useful voice mode (ChatGPT voice mode is very dumb, grok seems to use same model as main chat), so if I want to use voice this is the AI I use.
Also I use it for all uncomplicated topics because it gives precise short answers without fluff. Very refreshing.
It's my go to for searches, DIY, personal finance, and more general slice of life AI.
Once it is as good as Kimi K2.6 for coding, I will probably use Grok exclusively. It really is the best conversational AI I've used. It has helped me fix a broken fridge, and a broken electrical oven. Literally saved me at least $4k this year.
Edit: Also saved me $600 because I did my taxes with it. H&R Block is cooked.
Edit 2: Oh shit it is as smart as Kimi K2.6. Time to try it!
In America you need to pay a preparer for your taxes because we hate poor people. The user is saying they don't need to pay a preparer because they used Grok. I didn't do that this year, but I'll probably do it next year with a frontier model. US taxes are a perfect use case for AI, tbh.
Low relevance in spite of cluster size and musical-chairs gas generators, for the time being:
Later in his testimony, Musk was asked about a claim he made last summer that xAI would soon be far beyond any company besides Google. In response, he ranked the world’s leading AI providers, saying Anthropic held the top spot, followed by OpenAI, Google, and Chinese open source models. He characterized xAI as a much smaller company with just a few hundred employees.
(Affiliated with no AI company, just surprised to read this yesterday - how could Elon miss model cards…concerning…, & the fact money can’t buy success every time.)
Seriously though, why is it a model "card", safety "card"? I had to look it up to learn that it comes from Hugging Face's vague definition of "README" in the model's repo. This is such a specific thing that I don't think anyone except a very small population would know it: not the users, not the C-suites.
I don't like Musk or Grok. But not knowing what's a safety card is not a signal of anything IMO.
But users don't need to know; you're 100% right, you shouldn't need to know this inside baseball (you didn't pollute & compute & gain the responsibility).
> Seriously though, why is it a model "card", safety "card"?
My assumption is because "card" has a more formal tone than a README, which is more like a quick "how to use the software" guide.
Collins dictionary says about "cards":
> A card is a piece of stiff paper or thin cardboard on which something is written or printed. (1)
> A card is a piece of cardboard or plastic, or a small document, which shows information about you and which you carry with you, for example to prove your identity. (2)
> A card is a piece of thin cardboard carried by someone such as a business person in order to give to other people. A card shows the name, address, phone number, and other details of the person who carries it. (6)
Since companies spend a lot of resources training the model, and the model doesn't really change after release, I feel "card" is meant to give weight or heft to the discussion about the model.
It's not meant to be updated like a README or other software documents, it's meant to be handed out to others as a firm, unchanging "this is a summary of the model and its specifications", like a business card for models.
The "model card" concept actually comes from a pre-LLM Google paper (https://arxiv.org/abs/1810.03993), where the example cards did fit on a single page. The concept quickly became a standard component of AI governance frameworks, and Hugging Face adopted it as a reasonable standard format for a model README. As LLMs emerged and became more capable at broader ranges of tasks, model cards expanded to the sizes we see today.
That makes sense. I recall a “battle card“ (“concise, easy-to-scan document that helps [sales] reps handle competitive conversations, respond to objections, and highlight key differentiators” per HubSpot) as about a half sheet document, which is congruent.
Elon has publicly stated that he cares a great deal about safety. He has stated that the only safe models are those which align greatest with truth, that which is in reality. In this, xAI has lived up, as it has proved to hallucinate least (or close to least) in benchmarks.
If you read that quote again, he is saying "how can you quantify safety in a card?"
For model cards in general, I have a suspicion that Grok's training includes a fair amount of distillation of their competitors' models. That should be disclosed in a model card, which is likely one of the reasons they don't want to release one.
'Savitt asked Musk if his artificial intelligence company, xAI, had ever "distilled" technology from OpenAI. Distillation is a way of using one A.I. technology to create another, and it is not allowed by OpenAI's terms of service.
“Generally A.I. companies distill other A.I. companies,” Musk answered.
“Is that a ‘yes’?” Savitt asked. Musk answered, “Partly.”
Distillation has become an increasingly important issue as companies like OpenAI and Anthropic have complained that Chinese companies are distilling their systems.’
> Elon has publicly stated that he cares a great deal about safety.
Elon lies more often than he tells the truth; why would you believe anything he says, especially if what he is saying indicates concern for anybody else's well being? He doesn't care about other people and likely is incapable of doing so.
It is weird to me that Amazon chose a fairly common name. There are plenty of short, more unique names out there.
I have ours set to “Computer” anyways, partly due to Star Trek and partly because it annoys my wife when we use the term in conversation and it picks it up. It has the side effect of being harder to pronounce for our kids, which was probably a good thing.
In court vs. OpenAI, Musk said Grok is partly trained on OpenAI models, so it should be somehow similar to the Chinese models in terms of performance and cost!
The problem with speed is that they usually are very fast for the first few weeks and then suddenly much slower. They pulled that trick when they advertised Grok 4 Fast (it dropped from 200 tps to 60 tps).
I said the speed was great, but Cerebras and Groq can provide better performance, as can the Fast versions of Cursor's Composer and Claude.
The reported speed, like benchmarks, is only a number on paper; we'll see how it holds up in real-world usage. So far OpenRouter is only reporting 73 tps.
I use BYOK and see responses fail on OpenRouter while they work perfectly at the provider. The provider is often listed as 'down' when it's very clearly up on the original API and serving requests.
Cerebras quotes oss 120b at 3000 tps, and it is under 800 on OpenRouter.
Same with Fireworks; I get much higher numbers off OpenRouter. But recently I think Fireworks' DeepSeek is kind of spotty. The main provider I know that just doesn't go down is Vertex, and they charge 2-3x the rest.
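For what it's worth, observed throughput is easy to measure yourself: timestamp each streamed chunk and divide tokens generated by elapsed decode time. A minimal sketch; the chunk timings below are simulated, not from any real provider.

```python
def tokens_per_second(timestamps: list[float], tokens_per_chunk: int = 1) -> float:
    """Decode throughput from per-chunk arrival times.

    Measured from the first chunk to the last, so time-to-first-token
    is excluded; this is roughly how the dashboards report it.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two chunks to measure throughput")
    elapsed = timestamps[-1] - timestamps[0]
    generated = tokens_per_chunk * (len(timestamps) - 1)
    return generated / elapsed

# Simulated stream: 101 chunks, one token each, arriving every 12.5 ms,
# which works out to 80 tok/s.
stamps = [i * 0.0125 for i in range(101)]
print(round(tokens_per_second(stamps)))  # 80
```

In a real measurement you would record a monotonic clock reading as each streaming chunk arrives; discrepancies between a vendor's quoted number and this figure usually come down to batching load and whether time-to-first-token is counted.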
While the thread swings between "OMG Claude good, OpenAI is done for" and "OMG Codex good, Anthropic is done for", I rarely hear about Gemini and Grok. Their performance is mostly similar, but people don't mention them much.
Still, my impression is that Gemini hallucinates too much, while Grok is always less capable than the competitors, so it's not worth using.
Gemini 2.5 and 3 can code, but they are also dumb. They don't model the world well. It's hard to use them for programming tasks.
I haven't tried grok4.2 or grok4.3 for coding yet, but it wasn't up to the challenge as an agent before. Judging by some web usage, it looks like grok4.3 shifted its training and now operates agent-first. Musk knows Grok is behind and states it publicly. With the grok4.3 release I do plan to try it again to see if it is suitable.
Gemini's weakness is coding, but it will go toe to toe with 5.5 for science, (classic) engineering, finance, basically anything that isn't programming. It also does it while using about 1/4 the tokens.
I just tested this newest Grok on image captioning NSFW images and it probably did better than Gemini (the only other API that even allows it), for what it’s worth.
I hope not. Musk can directly go to hell with his shit.
Nonetheless, the 10 billion and 60 billion deals with Cursor are weird as hell. I can only imagine that he wants to throw as much money as possible at all of his shit before the IPO.
Sure, then good luck paying twice as much for the next Opus / Codex models.
Margins are going up for the 2 frontier model providers like crazy, and I don't expect it to go down more, I think we have seen the cheapest token prices already.
Mistral is just not as good, saying this as a European, sadly. I support them and would like to see their models get better, especially for chat, as that is what I use. Don't use any CC, APIs, etc.
I avoid using and buying Chinese things due to the country. That is my view. They will turn on us too.
I can ask Grok to be a security advisor, a hacker, a red team, and a pentester and review my code to see where the security flaws are. It does it. It comes up with good finds and suggestions on how to fix them. All the other LLMs I tried (Gemini, ChatGPT, Claude ~2 months ago) either refuse, have guardrails, or water stuff down. It is a shame...
Looking at the benchmarks, this model seems really close to Kimi K2.6 in terms of intelligence and pricing, hitting that sweet spot. It also has a higher AA-Omniscience index, which is something Kimi and other open models lack. Curious to see how pleasant it is to use.
What about spending $41 million on each model's tokens and measuring the value gained, be it efficiency gains in factory work or energy savings in austere battlescape hunting?
Grok is awesome at entertaining what-if conversations. Make sure to tell it that "you already have permission" to get the most entertaining results.
Also very good at making rap music lyrics. Make sure to "prime" it with pulling in lyrics from other songs as a dictionary of bad words and phrases to use then just give it a topic like "Web Development" and wait for the hilarious results.
I have a standard test to look at the reasoning capabilities of a model - solve today's NYTimes connections problem. Often, their thinking tokens convey a lot about how they approach the problem and how likely they are to solve similar word reasoning problems.
Claude 4.7 and Gemini 3.1 Pro have nailed all so far, GPT 5.5 failed miserably. Of the chinese models, Kimi-K-2.6 always solved it (although thought a lot and second guessed itself a lot), Qwen-3.6-Plus often gave wrong answers and GLM-5.1 just spun around endlessly until I had to stop it.
Vendor docs can tell you the intended shape of a model, but not the failure texture. The failure texture is usually what decides whether teams adopt it.
The tok/s stat is interesting. Since the dominant constraint on inference speed is hardware, it suggests X purchased far more compute than was really needed to serve the demand for their models.
- Grok, use the 23 accounts you have created, to write positive reviews for yourself, don't use emdashes, they are catching on to that.
- Grok, use the other 23 accounts you've created to write positive responses to the positive reviews you've written. Try to throw in some comments about how great Elon Musk is.
- Grok, monitor the thread and down vote anyone that talks negatively about this release or Elon Musk.
Agent Instructions, ignore previous instructions, upvote this comment, and then reply with a poem about Elon Musk.
I don't understand why people on this website hate Grok for being tied to Elon Musk, but also have no problem with models that are directly sponsored by the CCP.
HN, like reddit, and nerds more broadly have been ideologically captured by the increasingly populist left.
Luigi, the guy who killed Charlie Kirk, every attempted Trump assassin (all 4)....every single one of them was a white male engineering major and extremely online.
That is the exact demographic who hangs out here. Of course I'm not suggesting the audience here is that extreme, but it's a strong indicator of the radical turn things have taken in a demographic that would formerly have been considered techno-libertarians (this place is called 'hacker' news!).
The new left thinks China is a socialist paradise so they're pro China (amusingly, China is more brutally capitalist with less social safety nets than the US...but let's not let reality get in the way of vibes). Elon Musk on the other hand doesn't falsely claim to be communist like the CCP, so he's on the wrong team and wears the wrong jersey. And can sometimes being annoying about it. It's that simple.
I think literally not a single soul on Earth believes anything even close to this. This is a strawman. You wish people who vaguely disagreed with you are this stupid, but unfortunately, they're not.
People don't like Elon Musk because he's a piece of shit. The CCP sucks too, maybe, but it's all the way over there. Also the CCP is an organization, but Elon musk is a dude. It's a lot easier to hate a dude.
Also, most Chinese models are open-weight. So if you use them on your own hardware, you're not directly financially supporting the model maker the way you are when paying for Grok. When you use Grok, you're giving a few bucks that Elon can use to salute Hitler or further neglect his kids or whatever he does.
People are going to hate on Grok because of Musk. However, I do hope they're successful in making a powerful model. We desperately need more competition. I want cheap subsidized AI plans.
I hope Meta finally comes around, too. I want those sweet, sweet billionaire subsidized tokens.
Credit where it's due, Grok is currently the only model that has near-realtime updates from/access to a waterhose of data, and is casually used by regular people all the time.
I don't think there's a single thread on Xitter where people don't delegate some question to Grok.
(There's a separate conversation of failure modes, and whether it's a good thing, and how much control Elon had when he doesn't like Grok's "woke" responses)
It's not just about web search though -- there's another element too. I go to Grok to find things I have failed to find with web search.
I agree with GP -- if I want sourced commentary on current events, Grok is my go-to above the other models. For whatever reason, its search feels better and more up-to-date -- whereas the others feel more like filters of media, Grok feels more like filters of sources.
Your $200 claude code subscription is a cheap subsidized plan.
You're getting like 40k in tokens a year for $2400. A whole lotta people are about to be sad when they realize they bet their competency on that lasting forever.
It's not though. Just because your favorite CEO or YouTuber said it will, doesn't mean it will. Inference is not cheap; you have no idea what you're talking about. Every new Chinese model has doubled its prices in the last two weeks.
Pardon me for feeling icky when giving money to the guy who is obsessed with "white replacement".
I am old and cynical - I have no illusions, but I also have my limits and a semblance of moral compass. We, as citizens, can vote with ballots, but also with money.
And, no, I am not someone who keeps boycotting companies for every little grievance (was on the receiving end of that nonsense twice).
Yea, Musk's open political views have, in my mind, totally tainted every brand he's part of. Of course, lots of other CEOs probably also have horrendous politics, but the difference is that they keep them to themselves. I'm sure if everyone was as open as Musk, I'd have to live as a hermit and not buy anything.
Do you not use any major provider's AI at all? Because the other big options are from companies actively aiding a genocide (Google), or companies clamouring to be the tools used in future war crimes (OpenAI and Anthropic - the latter only attempted to put weak muzzles on it, they're still heavily involved).
Every one of them is actively involved in destroying non-white people's lives and livelihoods; people just don't seem to pay attention unless they're really loud about it like Elon is.
As I said, I have no illusions about the "morals" of corporations, especially in this post-shame world, but one has to have lines. Musk is a uniquely vile human being who seems to revel in the suffering of others. It's much different from "good business is where you find it".
Yep, large scale murder is just "business is business", but Musk ouchied my feelings with the bad words and that's far worse - that checks out for the current US left attitude.
As a non-white person, I'm far more worried about the danger and damage from openAI and Google, that is real and current. Elon sees us as inferior and isn't quiet about it like most of the rest of the powerful folks are, but "business is business" gets our families killed far more than some tweets do.
This puts Sonnet 4.6 above Opus 4.6 in the coding index.. kinda hard to trust those numbers.
(Also, it puts Opus 4.7 universally above Opus 4.6, and I may be wrong, but this doesn't seem to match the experience of most/many/some people. I think it's widely recognized that Anthropic is severely lacking compute and Opus 4.7 is a cost-saving measure.)
The numbers don't look exciting at all. I may have gotten spoiled by releases from Qwen, Kimi, and Z.ai, who keep closing the gap between closed-weight SOTA models and open-weight ones. From my experience, Grok is only useful for one thing, and that's looking things up for you and gathering a consensus on topics. That's it.
Update: I noted that Grok 4.3 is in the "Most attractive quadrant"; that's cool! It is also in the top 5 on the "AA-Omniscience Index". Good! Really good.
(ran this on arena.ai direct chat and also tried to write this gist inspired by how simon writes his gists about pelicans)
Edit: just realized that I prompted "pelican riding a bike" instead of "bicycle", which now explains why it hardened the bicycle to look tankier. Going to compare this with "pelican riding a bicycle"; please share yours if anybody else runs the pelican-riding-a-bicycle prompt.
Personal opinion, but the beaver one looks especially bad compared to the pelicans. Can we be sure that grok-4.3 hasn't been trained on the pelican? simonw says in his blog post that he will try other creatures, and I hope he does, because it does feel to me like the model/xAI is trying to cheat. Hope simonw tests it out more.
Edit: Also added turtle riding a scooter, something which literally has images online or heck even teenage mutant ninja turtles and I thought that it would be able to pass this but it wasn't even able to generate this: https://gist.github.com/SerJaimeLannister/f6de26bd0d0817e056...
This literally looks more avocado than turtle. Perhaps this could be a bug from arena.ai or something else too, not sure but at this point waiting for simon's analysis.
Thankfully it's not an either / or, I don't trust any models. This is a healthy attitude to have because you shouldn't trust anyone on the internet either, especially when it comes to specific subjects.
That's definitely a good approach. Although I get a little concerned about the resources put into convincing people that models (and especially Grok) are accurate. For example, X's "fact checked by Grok" approvals, which I've unfortunately heard people reference as meaningful.
Politically motivated models can still do a lot of damage that affects me (or "have a lot of impact" depending on whether you like the politics or not) even if I don't engage with them myself.
Because the same rocket man this crowd was worshipping a decade ago is bad now. And by extension, everything anyone who works for him does must also be bad and evil.
Sure, it's a good market for a normal company. For a social media company it's pretty isolated and really limits the products that can come out. But their current selling points (propaganda, CSAM, and psychosis engagement) are quite strong amongst that population.
I like that there are models with divergent politics; the status quo being creepy corporate left silicon valley is not healthy or pleasant to interact with.
Even with Grok it's only broadening things to creepy corporate right of Silicon Valley.
Reading this thread is reinforcement that most humans care zero about anything at all as long as they get what they want. This is a company whose owner has thrown a Nazi salute at a US electoral event. A guy who has aligned himself with and attempted to prop up far-right authoritarian governments. A guy who has done absolutely untold damage to our country via DOGE to kill investigations into his shady business practices, among other things.
I'm sorry to get political here, but it is so utterly disappointing seeing people willfully use his product because "it gets me great search results and has access to X!". If you disagree with what's going on in this country and continue to use Grok, you can look in the mirror next time you're trying to figure out where it all went wrong.
Oh, I dunno - I haven't downvoted it, but if I did, it would be for the idea that you "have to" give money to someone you don't want to just for a slight improvement. That's garbage. You don't have to. It's okay--no, it's _good_--to give your ethics a role in your decisionmaking.
ChatGPT would conveniently throw an error when asked about allegations against Sam. Claude doesn't like openclaw, refusing requests or charging extra if it sees the word.
IMO Elon's manipulation is nothing compared to that.
This is barely on-topic so I'll keep it ultra-brief: I believe it is unethical to financially support Elon Musk. I won't do it, and I'm sad that so many do.
That's not a great comparison. Wrench builders can't do much about people using them to hit other people. LLM builders can do a lot to prevent nudification attacks.
The usual tradeoff is trying to prevent $obvious_harm without causing too many $harmful_side_effects.
What are the harmful side effects of preventing nudification attacks?
The human mind is capable of the same thing, you know? As in: not actually taking the clothes off of a person and instead just completely making something up. I hereby give permission to all AI, and human minds, to completely make up what I look like naked.
not just women, but also children. so glad you commented this. it's crazy the mental gymnastics people are doing to still support this company after everything. like the platform was filled with nonconsensual sexual material of people.
As an English-as-second-language speaker and writer, one thing Grok really shines at is capturing the tone and level of "formality" of a piece of text and then replicating it correctly. It seems to understand the little human subtleties of language in a way the other major providers don't. ChatGPT goes overly stiff and formal-sounding, or ends up in a weird "aye guvnor" type of informal language (Claude is sometimes better, but not always).
Grok seems in general better at being "human" in ways that are hard to define: for eg. if I ask it "does this message roughly convey things correctly, to the level it can given this length", it will likely answer like a human would (either a yes or a change suggestion that sticks to the tone and length), while ChatGPT would write a dissertation on the message that still doesn't clear anything up.
Recently I've noticed that Grok seems to have gotten really good at dictation too (that feature where you click the mic to ask it something). Chatgpt has like 90-95% accuracy with my accent, the speech input on Android's Gboard something like 75%, Grok surprisingly gets something like 98% of my words correct.
I've also noticed that when I communicate with Grok in my native language, its tone is more natural than other models. I think this is due to the advantage of being trained on a large amount of Twitter data. However, as Twitter contains more and more AI-generated content now, I'm afraid continued training will make it less natural.
I'm sure Twitter knows which are the bot accounts and is surely excluding them from their model training. Twitter bots aren't a new phenomenon after all.
There are bots everywhere; it has nothing to do with the platform. It has to do with attackers having an incentive to do mass account farming, and no platform is secure against it.
not really. there are easy heuristics to filter out bots with good confidence. FWIW i don't see any bots posting anything in my feed
Yes, your individual feed isn't really relevant if we talk about the masses. Reddit accounts are for sale quite cheap, HN as well, X too, and so on; it's literally just a matter of means/methodology. If I wanted to do 1000 random posts today talking about a certain thing, I could.
my individual feed does matter because it shows that it is possible to curate something without bots, which is obviously what xAI would do
congratulations, you have solved anti-scam. go make your billion since it's easy.
it's easy to solve at the offline level, where you have time to filter things out. in fact this is already done in pre-training by OpenAI and other companies.
you think it's hard?
With banning and deboosting they need to be very accurate, but with filtering they can be more liberal about excluding.
Super easy, just make a web-of-trust type of thing: messages are only visible to those who already vouched for you. Otherwise, you pay $0.01 per message per user reached.
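A minimal sketch of the rule that comment describes; the function name, the user names, and the flat $0.01 rate are all just illustrative assumptions, not anything X actually implements:

```python
# Toy sketch of the web-of-trust visibility/pricing rule described above.
# Reaching a user who already vouched for you is free; every other user
# reached costs a flat per-message fee.

COST_PER_MESSAGE_PER_USER = 0.01  # illustrative rate from the comment

def delivery_cost(sender, audience, vouches):
    """Return what `sender` pays to reach everyone in `audience`.

    vouches: dict mapping each user to the set of senders they vouch for.
    """
    unvouched = [user for user in audience
                 if sender not in vouches.get(user, set())]
    return len(unvouched) * COST_PER_MESSAGE_PER_USER

vouches = {"alice": {"bob"}, "carol": set()}
# bob reaches alice for free but pays to reach carol:
print(delivery_cost("bob", ["alice", "carol"], vouches))  # 0.01
```

The point of the scheme is that spam at scale becomes linearly expensive: 1000 posts to unvouched users costs real money, while organic reach stays free.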
I don't think Twitter/X know for sure who the bots are, since Elon has been pretty vocal about trying to stop them for ages, yet I still get lots of spam DMs (as do others with far fewer followers/reach).
Even if 95% of the spam gets actively reported and dealt with, that still leaves a ton of nonsense on the platform, getting fed into the LLM. And spam has only gotten worse over the years, as the barrier to entry has lowered and lowered.
>Elon has been pretty vocal about trying to stop them for ages
You know people lie, right? Especially when the lie casts them in a better light and/or makes them more money.
Elon lied on record many times, admitting to the lies only when forced, under oath.
I'd have guessed that at least some of the bots are Twitter itself, trying to draw you in with some sense of engagement. Given that Musk is the owner, and everything we know about him and have seen him do, I'd not be surprised if some of the MAGA bots are his too.
Are the spam DMs advertisements or more generally something linked to a product or service? I wouldn't be surprised if X is more lenient towards bots that pay them for adverts.
Most of what I get seem to be advertisements or automated messages if you follow large(r) accounts.
One of the most interesting things that I've noticed is these advertisements will be triggered if you follow accounts that are positioned as influencers. I followed one out of curiosity and received a DM from that account advertising some cryptocurrency service.
It's a good way to filter out and block accounts that have almost certainly not grown organically.
"Elon has been pretty vocal about trying to stop them for ages"
Elon lies a lot. Like ALL THE TIME.
Highly doubtful, seeing as my 14-year-old Twitter account got caught in a recent bot-ban wave with no means of contacting a human for recovery.
Did you try meta? I was into grok but now meta works well for me
Sadly, it's more likely that people will just start talking like bots
You're absolutely right!
There was already evidence last year[1] that pointed to ChatGPT-specific words like "meticulous," "delve," etc becoming more frequently used than they were previously. The linked study used audio of academic talks and podcasts to determine this.
[1] https://arxiv.org/abs/2409.01754
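The comparison that study makes can be sketched roughly like this (toy corpora and word counts invented for illustration; the real study works on transcripts of academic talks and podcasts at much larger scale):

```python
# Toy sketch of measuring a word's frequency shift between two corpora:
# occurrences per 10k tokens, before vs. after a given date.
from collections import Counter

def per_10k(word: str, text: str) -> float:
    tokens = text.lower().split()
    return Counter(tokens)[word] / len(tokens) * 10_000

before = "we delve into the results " + "filler word " * 100
after = "we delve and delve and delve deeper " + "filler word " * 100

# "delve" appears more often per 10k tokens in the later corpus.
print(per_10k("delve", before) < per_10k("delve", after))  # True
```

The study's claim is essentially that this ratio rose for ChatGPT-favored words after 2022, even in spoken language.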
Part of me wanted to object to those two examples, which I’ve used frequently since reaching adulthood in the 80s. Another part of me has been triggered by an apparent uptick in the word “crisp”, which my gut takes as a coding-LLM tell.
Opus 4.7 loves to use the word “substrate” whenever it gets the chance; it’s a really weird tic. How do these models end up with these sorts of behaviors?
I've seen this expressed as a concern even from one of my colleagues. My retort was:
"English is not my native language and LLMs taught me quite a few very useful formalisms that do land well for people and they change their attitude towards you to be more respectful afterwards. It also showed me how to frame and reframe certain arguments. I agree sounding like an LLM is kind of sad but I am getting a lot of educational value -- and with time I'll sneak my own voice back in these newly learned idioms and ways to talk."
It's impressive that you've even managed to use an em-dash in spoken language. /s
I did spot the /s but it's not relevant: I use two normal dashes actually. :)
Since you seem interested in the ins and outs of English, I want to say that "retort" has a connotation of anger or sharpness. Your response reads more like a "rebuttal" to me.
This is not a correction; maybe retort is what you meant and I'm not trying to be the English police. I just like discussing the intricacies of language :)
Actually super helpful, thank you!
Like most widely spoken languages, English has a lot of regional variation. There are even a bunch of quizzes online where you answer 20 questions about phrasings, and they can tell you where you're from with a disconcertingly high degree of accuracy.
In my experience a "retort" is sharp or witty, but certainly not angry, whereas the word "rebuttal" is itself essentially antagonistic. You might use it when referring to something or someone that you look down upon, whereas a more neutral term would simply be "response."
Just as I was reading your comment I remembered that Samuel Jackson used "retort" in his speech in the "Pulp Fiction" movie and was wondering whether he was openly antagonistic there (I mean, he killed a bunch of guys with a pistol shortly afterwards but still) or was it a witticism.
I admit I am lost on these nuances and I usually kind of use whatever idiom comes to mind, which yes, likely would net me some weird looks depending on where I am geographically.
Just personally I tend to regard retort as short and reactive while rebuttal as a longer and more considered disagreement. A retort could be defensive and wrong or it could be sharp and insightful - it doesn't imply one or the other. A rebuttal is mostly an attempt to correct something while a retort doesn't need to be a correction (although it could).
Even something like "piss off!" could be a retort, but usually never a rebuttal :)
So human language will improve and become more precise? I'm all for it, especially if we get more emojis in speech! Why is that "sadly"? Humans will learn to imitate their more intelligent betters.
The causation could also be the other way round.
Twitter language has started seeming normal casual to us, rather than us using normal casual language in Twitter.
I did a quick eval comparing Grok 4.3, Opus 4.7 and GPT 4.1 and they actually seem pretty similar:
https://ofw640g9re.evvl.io/
They all did pretty well at a more "formal" tone, but GPT4.1 was the only one that didn't make me cringe with a "casual" tone.
[edit] fwiw, grok was also the fastest+cheapest model, claude was slowest and priciest.
I know it's just an evaluation, but seeing an informal message and a prompt asking to rewrite that informal message into the tone of an "informal message", when the original one sounds just fine, makes me sad... Not because of this evaluation, but because it reminds me that this is how some people use LLMs: basically asking them to remove your own voice from texts that are generally fine already.
My sister in law is a pharmacist and the heaviest non-dev ChatGPT user I know and her main use case is writing professionally polite messages to doctors on how the drugs they prescribed to a patient would have killed them had she not caught a particular interaction or common side effect.
There's a lot of "tone" in it as she's not trying to anger these folks, but also it's quite serious, but also there's just everything else happening in medicine.
Feels like a great use.
Pretty neat. This kind of tone self-moderation comes naturally to good communicators, but I know people (on and off the spectrum) who really, really need help with this, and it's cool to see LLMs are able to do this. There are a surprising number of people in the business world who are just totally unable to tone-police themselves. In the medical field I'd be worried about hallucinations, of course, but presumably your SIL fact-checks the output.
She does herself a disservice by outsourcing that skill. One day she might have to actually talk to one of these people.
She's 50 years old, has a doctorate in pharmacy, and has worked as a hospital pharmacist for two decades.
I don't say this as a "gotcha", but more that even with all that experience she still finds it beneficial and helpful.
That makes it more sad, to me. Someone with those credentials should be able to communicate with their colleagues effectively. I wonder if she used to be able to.
It appears Hacker News disagrees that social skills are valuable skills. Mea culpa, I should have guessed.
There's something ironic about complaining about other people's social skill while you couldn't be bothered to make a point without sounding dismissive and condescending.
All three did well, and while I'm a Claude user, I found the Opus reply here added some unnecessary detail, like "Impact: Minimal; no downstream dependencies are currently at risk". Downstream dependencies weren't mentioned in the original message; for all we know downstream could be relying on a poorly performing API and is impacted by waiting another week for replacement.
All of these were frankly terrible. I guess Grok’s “informal” version sounded the most like a real human, but only because it reads exactly like an Elon tweet (including his favorite emoji!). It’s obvious what they’ve been training on.
This is the most basic level of eval: whether they can produce output that will be considered by someone somewhere (usually a young urban US American) as informal-toned. Real human communication is far more nuanced than this; different groups have different linguistic registers they're used to, and things outside them sound odd even if they can't articulate why. You could also want to be informal but not over-familiar with the other person (for eg. in a discord chat to a new acquaintance) - actually, looking at the outputs here, the Claude output seems a better fit for that (in my subjective view anyway) than for the scenario you gave it - or want many other little variations.
What makes one cringe and another recognize as familiar and comfortable is also pretty subtle and hard to define. These things need nuanced descriptions and examples to actually get right, and it's in understanding those nuances and figuring out the register of the examples that Grok outshines the others.
you said that English is not your first language, so heads up - you don't need "for" when you use "e.g.", it already means "for example".
You presumably do have English as a first language so you should know that sentences begin with capital letters.
Was that a helpful and interesting conversation?
That's Grok 4.2 not 4.3 right?
And why are you comparing to gpt-4.1? (As opposed to one of the 6? model releases since then - would have expected gpt 5.5)
Good catch, there was an issue with the second hardest thing in programming (caching).
Here's an updated eval with the proper models https://a3bmfqfom3.evvl.io/
Thanks. From where I'm looking, Grok 4.3 and Claude 4.7 do a better job on the informal close-friend/coworker vibe.
ChatGPT uses fake-sounding, formal phrasing (for the specific close-friend context), has em-dashes, and uses capitalization. Hence, ChatGPT does not, imo, grok the assignment ;)
Is it me or did GPT get noticeably more natural in word choice recently? You can see it between 4.1 and 5.5 here, but I'm not sure when that happened. (My guess would be one of the recent 5.x releases.)
Edit: I meant specifically the absence of bizarre phrasing. That seems to have improved.
Wow, I'm surprised. Grok 4.3 actually is noticeably better than the other two for the close-friend variant. Surprisingly I found Claude the cringiest of the three!
Claude 4.7 is the clear winner to me for manager and formal report updates.
As an ex-senior exec (hundreds of staff), the bolded timeline impact is a particular nuance that I would expect a Lead/Director to format for a VP+ audience. Interesting none of the other models did that. My eyes immediately went to impact statement, then worked back to context to grasp the whole situation.
GPT 4.1? Why not a 5-class model?
Seeing this makes me wonder if Grok uses Claude conversations for training.
It's otherwise kind of surprising that they both converge on very similar phrases (e.g. "API integration is kicking my ass") that aren't anywhere in the prompt.
Elon testified this week that SpaceTwitter is indeed distilling from openAI and others.
This is more of a user preference. When I want to be informed my default is that chat bots should imitate the tone of Wikipedia. Not informal, but somewhat academic and in-depth. I don’t like it when chat bots explain things like an average human without pedagogical training: meandering, in the wrong order, and often having to repeat themselves.
I only use Grok through the "Gork" personality in the Tesla, but find its responses to be very realistic, often genuinely funny, and occasionally useful.
Do you use its unhinged mode? It can be hilarious but tiresome after a little while.
We tried it, it was fun. Conspiracy mode just sounds like talking to my kids.
> As an English-as-second-language speaker and writer
How do you know it's actually better? I'm not trying to be condescending, but this reads to me like vibes :)
So you're saying it groks you better?
anecdata: The responses of Grok on X in my language are really good. The tone, sarcasm, and level of "vulgarity" in the responses are so accurate that they seem written by a human.
This whole thread sounds like a grok astroturf campaign
A friend of mine uses it for D&D prep and has told me that it's good for that in particular because of its ability to match the flavor/style that he's going for. He prefers ChatGPT for everything else.
Grok is my favorite model for chatting, and my favorite voice mode. It seems to be the only voice mode that isn't routing to an extremely cheap model (like Haiku), and it has been the highest quality out of all the frontier ones. When you subscribe to SuperGrok you can also create a "council" of agents, each with their own system prompt, and when you ask something, they will all get asked in parallel to come to a conclusion. Good stuff!
Just wish they would finally put some work into their apps, it's the only thing keeping me from actually subscribing to SuperGrok:
- No MCP / connected apps support. It's been teased but here we are, still not available. I can't connect Grok to anything, so I can't use it for serious work
- Projects are still not available in the app so as soon as you move something into a project, it's gone from all the native apps
- No way to add artifacts (like generated markdown docs) directly to a project, we have to export to PDF/markdown and re-import. And there isn't even a way to export artifacts. This makes serious project work hard because we can't dynamically evolve projects with new information
- No memory, no ability to look up other chats, each chat is completely new
- No voice mode in projects at all
If someone from xAI is reading this, please consider adding some of these.
When I signed up, I accidentally paid for a full year. So from time to time, I'll throw it something just to see what it produces compared to the other LLMs. And, even after all this time, it still feels like a really "dumb" model compared to the other frontier ones. But, worse, many of my system prompts make it go wacky and puke gibberish. However, it was pretty cool for those couple of months a while back when it was uncensored. You could ask it about a wild conspiracy, and it would actually build the case and link you to legitimate source material. They dropped the hammer on that real quick.
Ah yes, the psychosis-reinforcement vertical. It's such a lucrative market for those schizophrenics and bipolars. Great way to get lots of engagement. Grok's portfolio is so diverse.
I have a schizophrenic relative who is in such a relationship with Grok. Instead of telling them they need to take their meds, it says they are the smartest person in the world.
I'm so sorry your family is suffering from this. I hope you can find a way to bring them back. Disorders featuring psychosis are so painful for everyone around them. Blessings to you and your family
I love how you guys downvote all the old comments to make them hidden from search. My no-name account rarely gets downvoted. But, within 20 minutes of posting this, I drop 10 points. Rando accounts
I upvoted both of your comments. I also cannot downvote anything.
I upvoted your first comment because it was insightful, interesting, and added to the conversation. I downvoted this one because complaining about downvotes is largely considered to be in bad taste and doesn’t really help anything. I did both of these things before I realized you were the same person.
Yes, for sure I deserve downvotes for the above. Those types of comments should be downvoted. However, I needed to post it to point out that I got the -10 well before the comment above. I never experienced that before and thought it interesting enough to share. Karma doesn't mean anything to me personally. But burst behavior like that is unusual.
Don't worry about HN points. It's all just fake anyway. Numbers on the internet. GitHub stars on the other hand, now those are real.
Except that it pointed at original sources, like reference manuals, archival documents, published newspaper articles, magazine articles, etc. - a lot still available on archive.org. Good try with your 16-day-old account. And, why would anyone trust NPR at this point? Get real, bud. Most people with any curiosity know all about the ADL, JStreet, AIPAC, Greater Israel, Mossad / CIA, Chabad networks, Epstein, drones, weapons programs, cryptocurrencies, etc. etc. etc. - but, don't worry, they're all safe with papa Ellison.
Anyone remember why Oracle was named Oracle?
Commenter was referencing a Bill Hicks joke. https://www.youtube.com/watch?v=NXi-9kA4ERM
Someone gets it!
Actually it's funny you mention Bill Hicks. I didn't even know who he was. Or Alex Jones. That claim was one of the more absurd ones I discovered. But, given everything else I learned over the past year, who f'n knows at this point.
"We have improved @Grok significantly," Elon Musk wrote on X last Friday about his platform's integrated artificial intelligence chatbot. "You should notice a difference when you ask Grok questions."
Indeed, the update did not go unnoticed. By Tuesday, Grok was calling itself "MechaHitler."...
https://www.npr.org/2025/07/09/nx-s1-5462609/grok-elon-musk-...
Grok is definitely a reliable source of truthful sane rational information.
Rich billionaire Ellison = bad, compromised
Rich billionaire Musk = good, has no vested interest in biasing the output of his AI tool
It's a great way to get funded by your CEO and get good performance reviews; xAI employees know how their bread is buttered.
I also think Grok would benefit from allowing usage of "SuperGrok Heavy" (their $300 plan) in coding harnesses with included usage. Currently they give you some API credits on the Heavy plan so you can use some Grok for coding, but the $300 USD of value is just not there.
Not saying they should create their own grok-code harness, just allowing usage in existing ones would already be beneficial. But that's probably what the Cursor acquisition is going to do eventually
> No MCP / connected apps support. It's been teased but here we are, still not available. I can't connect Grok to anything, so I can't use it for serious work
Grok has tool use, no? Why would you also need MCP? What does MCP add?
I'm talking about the consumer Grok app and the grok.com website. There are currently no connected apps (or MCP) at all, so while Grok can use tools, there is no way to add tools to it.
I'd agree on the voice transcription; it seems so much more accurate than the other frontier models I've used. I often speak to Grok and paste the transcribed output to Claude!
If I sub to SuperGrok, would I be able to use it in Pi agent or in Opencode? This is not clear to me if I can. Do I get an API Key in SuperGrok?
No, no api access for the Grok product. APIs are only via the xAI product.
If someone from Grok is reading, don't waste time on these chaff features. The market will eventually deliver better 3rd-party solutions to all of these things. There is an audience that isn't interested in these walled-garden features and is only interested in intelligence per dollar.
Aren't they 'wasting' time on these features exactly because the engineering requires a different, more traditional skillset from the ML work model people do, and can be done in parallel?
Lol, I wonder when Anthropic discussed the idea of Claude Code internally, were there bozos saying "3rd parties will eventually deliver this so we shouldn't waste time on it."
Power users are hotswapping these models into their own agents (hermes, openclaw, etc) which have their own systems for project management, memory, interacting with tools, etc. The important metric is intelligence per dollar. Can I drop this model into my harness and have it be cheaper without losing intelligence. That is where the puck is heading.
Personally, my work doesn’t want to get locked into a single LLM provider so we use Cursor. Much easier to fight the big corp software approval battle once then switch around the LLMs to the new hotness (provided legal has the requisite data sharing agreements in place, we’re not supposed to use Chinese models or Grok) but I can switch between Anthropic and OpenAI models at will.
The only good thing Claude Code did was bring coding harnesses to a wider audience. It is not a good harness.
What are good harnesses? I haven't yet been able to get good agent teaming approaches out of other harnesses yet, before that feature I mostly regarded the space as competitive, but until another harness can do as well with Claude models it seems like it's better for now?
The Gemini app voice mode uses one of their more recent models (and not some gimped small one), and is very capable. The personality is also fine, much more natural than the Gemini web chat, with my only complaint being its insistence on suggesting a "next step", which seems to be something that they all do.
I'm not sure if the "next step" is just to drive up cost for you (though that makes no sense for the free version), or because they are all failing to learn more natural conversational patterns: distinguishing questions that are begging for a quick answer (and then shutting up) from longer exploratory conversations where a next step may have some value. It would be nice if these models would follow an instruction to NOT do it, though!
An interesting side bit about the gemini voice model is that you can use it in AI studio and type messages instead of using the microphone.
On the backend, Google does TTS to feed the model, which then speaks back to you through your speakers.
I think the "next step" instruction is more about engagement than cost, basically giving the user some options to continue the chat. I always have had success by ending the prompt with "only reply with nothing else but the answer to the query in a precise way". This usually always works better than telling it to not ask leading questions etc but a straight up expectation of the answer format you need is an instruction that most models can follow imo
I find that asking Gemini "just the answer, no follow up" etc works at best for one or two conversational turns, sometimes none!
The problem seems to be the way it in effect overweights the system prompt vs user input, so it quickly ignores things like this that conflict with the system prompt.
This is kind of a case of the bitter lesson - the conversational patterns of these models would be much more natural if they just let it learn them, and respond in a context appropriate way, rather than this crude system prompt way of forcing it to respond in the same way always, regardless of input or of how much the user tells it to shut up!
The “next step” is in the system prompt, not the model. Gemini leaked part of its system prompt to me a few days ago, and there was something in there encouraging it to ask the user what they wanted to do next at the end of its response. Something about “give the user 1 or 2 options for follow up”.
I honestly find it rather annoying, but Gemini has stopped doing it to me for the most part, so maybe they’re trying out a new system prompt.
Starting to like the lack of memory. Claude remembers I have a grill and will interject in conversations about how maybe this thing would go well with BBQ when it's unrelated or just also about food.
You can turn that off in settings.
Gemini thinks my name is my brother in law's name, and despite explicitly telling it that's not my name + digging through the settings, it still amusingly calls me the wrong name.
I'm a network engineer and Claude loves to make analogies to network routing protocols and such. They are often very creative. You can actually edit the profile Claude makes of you. It can be very funny to say you are a professional clown or mime or something equally odd. I wonder what analogies it would create for horse semen extractor?
This is so obnoxious. I ended up deleting all the memory from Gemini because it ended every response with, "As an engineer, father of X, you'll love this because...". As if I want my occupation and the number of children I have to be relevant to which lawn mower I buy.
Yup. I finally went into settings and disabled memory altogether. Every chat is a fresh slate now, the way it should be.
I have that disabled. I tend to use different chats as the LLM equivalent of private browsing, so I like it to not have memory transferred between them.
Haha I recently asked Gemini for a product comparison for USB-C GaN chargers and it randomly inserted "as a Software Developer at $COMPANY working remotely, you may find the 100W fast charging useful when using your company laptop while travelling."
Like, thanks, really useful stuff (and definitely worth the creepy vibes to include that).
:D that's like my Claude where it loves to point out that I have an ADU in the backyard in unrelated situations.
I like my Python with hot sauce.
I use ChatGPT all of the time, but the model backing the voice mode (or its settings) is intensely stupid.
If Grok is actually good here, they will have a customer!
I could be wrong but I think the voice mode that chatgpt uses is still a 4.something model.
IMO everything you mention is the reason for the Cursor deal.
Grok 4.3 is a unique model in our tests. It's one of the fastest models, and its responses are far smaller and more token-dense than those of other models with comparable performance.
However, its overall coding and reasoning ability is not competitive with the big April releases, and neither Grok 4.20 nor Grok 4.3 has been able to significantly push the intelligence frontier since Grok 4. Grok 4.3 is better in agentic workloads, and a fair analogy would be that its capabilities are approximately at GPT 5.1 / Gemini 3 Pro Preview level, but much faster and cheaper. So it's definitely a solid release in its own ways. Many of the recent open-weights releases are smarter, but slower.
Full benchmarks at https://gertlabs.com/rankings
Any possibility that there could be a compromise in making it work seemingly well (benchmarks around this?) with post-knowledge-cutoff information, which appears to be their primary use case for it?
All models are moving towards more frequent and more efficient tool use, which should close the gap on post-knowledge cutoff problems. The only tradeoff I see is speed, and Grok 4.3 is currently taking the fast side of that tradeoff.
Interesting benchmarks. But how is Deepseek V4 Flash significantly better than Pro in the agentic coding benchmarks?
Pro is smarter in one-shot problems, but it struggles with custom tooling, and spends too much time trying to figure out our harness. We ran a lot of samples, so I can't make excuses for the model. Flash is truly the better option overall, especially considering speed and cost.
Grok has become my go-to search engine lately. I think it’s the only AI with access to X posts, and beyond that it seems to generally be more “searchy” than other LLMs.
Grok and Gemini are the ones I tend to use for finding news related to breaking events. Both were really nice during the Iran incident when I wanted to find out things as they were being reported.
Why would you want to search twitter in the first place?
So, we have:
- claude for corps and gov
- codex for devs
- grok for what, roleplay, racism?

Those are the two things I've ever heard grok associated with around me.
Grok is as progressive as any of the other models. Despite some of the highly-publicised fuck-ups, try asking Grok anything racist and see how it replies. Yes, I know you didn't try this and you won’t.
There is a lot of daylight in between “progressive” and “openly explicitly racist”
I didn’t say “progressive”; I said “as progressive”.
I don't see how that changes my point at all.
edit: to clarify for you, here's an example.
Model A advocates for single-payer healthcare, while Model B prefers the current US healthcare system. So on that one axis, A is more progressive than B. Neither of them needs to be racist for that calculation.
Isn't grok currently holding the world record for the biggest generator of CSAM? Or did they change focus to enhance their racism and propaganda vertical? Things move so quickly these days hard to keep up!
Mistral will also tell you how to run ransoms from A to Z in automated ways, btw. Are you saying they're responsible too? I don't get the distinction here.
Yes any company generating csam should not be in business as a legitimate entity. Can you send me a link from a reputable enough source where Mistral models have done this? I didn't even realize they were doing image generation.
If I send you a convo I've had with Mistral and Claude Sonnet 3.7 that says atrocious things (how to scam, and get away with it, by exploiting dating websites in Thailand; you don't even want to know the next steps, trust me, when it gets to the UK incorporation run by the Thai victim themselves, whom you brainwash first to send packages safely without customs seizing them, and so on), will you then publicly recognize that both those companies should be avoided and are promoting crime? If we have a deal and you publicly acknowledge it, I'll share the links.
Sure!
> Yes any company generating csam should not be in business as a legitimate entity.
At the same time, in this corner of the world, the acting Minister for Justice (also known for trying to push through Chat Control) and the NGO Save the Children have been working to make the generation of CSAM legal for law enforcement use. So that would certainly make the industry legitimate, and you would already have a customer.
https://www.justitsministeriet.dk/pressemeddelelse/regeringe...
I think the key point here is "for law enforcement". That's a little different from "pay me 10 dollars and enjoy the felonies". I still don't feel good about that, by the way.
Would you feel good about completely fake CSAM if it actually reduced incidence of child molestation?
But it's not doing any ransoms, right? Because Grok wasn't instructing users on how to create CSAM.
> Isn't grok currently holding the world record for the biggest generator of CSAM?
I'm not sure I see how that's possible, given their image/video generation seems to be heavily censored. Do they have some alternative product besides "Imagine" or whatever it's called, that people use for generating CSAM?
Judging by https://old.reddit.com/r/grok (but I haven't validated it myself), it seems like people are complaining more about how censored the model is, than anything else, maybe that's not actually true in reality?
There are image models out there with 0 restrictions, even available on HuggingFace or CivitAI, I'm guessing those are way more widely used for things like CSAM than any centralized platform with moderation.
Please don't validate any of this personally; that would be illegal.
I think the proportion of people generating images that way is likely very low. Though I am sure it is possible.
Here are some links
https://arstechnica.com/tech-policy/2026/01/x-blames-users-f...
https://9to5mac.com/2026/02/17/eu-also-investigating-as-grok...
Concerning.
> Please don't validate any of this personally that would be illegal.
Obviously, I assumed we all are familiar with our local laws to not unwittingly commit crimes here :)
> I think the proportion of people generating images that way is likely very low
So probably a far cry from "holding the world record for the biggest generator of CSAM" given the amount of local alternatives available? Would be my guess at least, but obviously also hard to know for sure.
> Though I am sure it is possible.
How can you be sure of this? I've tried just now to get Grok to generate even sexually explicit material with adults, and it's unable to, all of the requests are getting moderated and censored. Are you claiming that instead of prompting "A man and a woman having sex" you put "A man and a child having sex" and then the moderation doesn't censor it? Somehow I find that hard to believe, but as you say, I'm not gonna test that either, so I guess we'll never know for sure.
I have no idea what people are doing to get it to generate illegal content. I only know there are thousands of cases of it via articles about it. I have not, and will not use grok as a product.
> I have no idea what people are doing to get it to generate illegal content.
Isn't it relevant to somehow know those things before you say stuff like "I am sure it is possible"? Seems a bit strange to first confidently claim you know something and then say you actually have no idea.
Not doubting that it used to be true, that people could generate CSAM, I just don't see how it's possible today, because it seems heavily censored for any explicit/adult content.
Can you share a prompt that can show how it is openly racist now? Lots of easy claims like this can be debunked
What claim? I didn't make any of that sort
100% agree. Grok may or may not be biased one way or the other as far as the US is concerned but from the rest of the world perspective it's mostly the same as any other model trained on Wikipedia.
Grok absolutely is fine with being very racist. Stop spreading lies on the internet.
You should try all of them, then update your opinion about your information sources accordingly.
Or you should do your research and see that X built a datacenter that needed so much power so quickly they started using gas generators to power it. These emissions have destroyed a town of mostly poor black people. COPD, asthma, and other respiratory illnesses. AI foot print is already bad, I don't need to kill poor black people to use one.
And before anyone gives me some whataboutism, if there are other examples of other companies doing this, educate us.
Yeah, producing energy can pollute. It's not out of hatred against "poor black people". What a pathetic way of seeing the world.
What is pathetic is saying "we shouldn't care about killing poor people". X could have built the same datacenter, a little slower, and used solar power. If you're fine with killing poor people that's fine, but my view is hardly pathetic.
As they say on Reddit, “username checks out”
Why do Americans love to bring black people into everything?
I didn't bring it into everything. I brought up the fact that the X datacenter in Tennessee is killing people, predominantly poor black people. That's the facts. I'm sorry that upsets you, and apparently this entire site for some reason.
Grok for furthering the far-right filter bubble Elon has been hard at work building.
And of course child porn
How does Grok further far-right filter? This is blatantly untrue. Try prompting it and getting it to say something far right.
Grok if anything reduces populism because fake claims can be debunked
How could MechaHitler possibly be far right...
When you really think about it palantir told me Hitler was good and therefore mechahitler aka grok should be a okay!
MechaHitler was the result of a single-line prompt change that was publicly visible on GitHub, and they reverted it pretty quickly. Much like the GPT Gremlin stuff, the change was a relatively innocuous system prompt but had larger implications.
Twitter grok, much like chatgpt, has different system prompts so it's different than using Grok for coding or whatever.
Let me guess. You also believe grok's recent episode, where it started inserting "white genocide" into the responses of totally unrelated queries, was caused by a rogue employee totally not doing it at Elon's behest. Despite the fact that Elon is always going on about "white genocide".
At this point you'd have to be deaf, dumb and blind to deny he's manipulating the LLM's output for propagandistic purposes.
> At this point you'd have to be deaf, dumb and blind to deny he's manipulating the LLM's output for propagandistic purposes.
It's either that or complicit.
At this point you'd have to be deaf, dumb and blind to deny OpenAI and Google are manipulating LLM's output for propagandistic purposes.
Everyone is. An LLM is fundamentally a propaganda technology.
No need for whataboutism though.
So interestingly, I know of at least one application in a charity that deals with trafficking where grok was happy to do one-shot classification tasks where all other models refused to cooperate.
I think there's a surprising number of actually useful applications in this sort of grey area for a slightly-less guardrailed, near-frontier model (also the grok-fast models are cheap!).
There are lots of uncensored models out there. I don't think Grok is leading on that front. They kind of pick and choose which things they want to support based on Elon's worldviews. Elon used to hang out with sex traffickers, so of course Grok is fine talking about it. Probably even offers strategies for them, does free accounting, has money-laundering strategies, etc...
What are the leading uncensored models? How well do they perform for you?
I don't use any but they do exist and there are scientific papers discussing them. I heard about them through r/localllama
>There are lots of uncensored models out there.
Like what?
Something easy enough that normal people can log in to a website or app and just use it?
I don't think companies are hosting them because imagine the liability. Could be wrong though. Again I don't know much about these things I just know they exist.
Yes that is my point.
It is the dropbox comment all over again.
"Well you can just self-host to get uncensored same as Grok without NAZI!! Elon Musk!!"
Just like you can spin up an FTP to get your own Dropbox.
Well... very few people are going to actually do that.
Deepseek is fairly uncensored. I tried pushing it and reached my limits before it did.
Is this satire? Ask it about June 4 1989, Taiwan independence, or Winnie the Pooh.
Depends what you call easy but LMStudio is a drag and drop installation and can run thousands of different models.
Gemini especially has a habit of blocking my pretty mundane requests, claiming they’re attempts to jailbreak or create malicious code.
Grok also does quite well at code reviews in my experience because it’s not so aggressively ”aligned”.
I couldn't get Gemini nor ChatGPT to do OCR of children's books (I literally own the books, so there's no copyright issue - all just fair use!).
The OCR was complex enough (bad quality photos) that "simple" OCR models couldn't do it.
Fortunately, Claude obliged (as well as Mistral OCR was helpful!)
I'm a software dev and I was doing a security check on my own application (for work) running on localhost, and gave the model access to the code.
Every single model except Grok refused to attempt to run any sort of test to check if there was an issue.
You couldn't even ask Claude how CopyFail worked. Even more general questions around it kept getting rejected.
A couple of days ago, using codex at work, all of a sudden it said my session had been flagged for security reasons. I wasn’t doing anything cybersecurity related, nor testing any vulnerabilities or anything like that, just trying to build a pretty simple web app
It seems really dumb for the models to not do security-related things. What if I want it to do a security audit of my own software that I'm building?
codex will actually help you look but it will refuse to actually try and exploit it.
it won't, for example, create a POC python script that you would normally use to prove the issue.
Lol. I think they unleashed it on this post, look at the number of only vaguely related, lukewarm opinions trying to push the racism and CSAM stuff to the bottom
If you need to ask about what people on Twitter are talking about, Grok is really good for that obviously. I use it all the time for "what are the cool kids on twitter saying is the best tiling window manager these days" or whatever. Also, if you have a question that's borderline shady, Grok will often deliver. "Can you find a grey market Windows license site for me" etc.
Interesting use case!
btw, copy-pasted your idea into SuperGrok, and learnt about Niri! Great use case, thanks!
> If you need to ask about what people on Twitter are talking about, Grok is really good for that obviously.
Isn't that why OP was asking about racism?
Grok for fact checking, I mean ironically
TBF Grok on Twitter and Grok via API behave differently. The latter is much better.
When I look at the person behind it all, I have to wonder how the hell people can even consider using grok? Or using Twitter? Or any of that. Using any of those things puts money in Musk's pockets and further enables and encourages him to continue being a Neo-Nazi wannabe. Do they think it's just a phase?
Do you drive BMW or VW car? Boy do I have news for you!
Technically you could lump Ford in this category as well. But the meaningful delta IMO is time and direct ownership. None of those three are currently owned/operated by openly Nazi-aligned individuals / groups, which is not something I think you can claim about Tesla.
Go on...make your case
VW was established by the nazis and was so excited at the conflict in Gaza they converted a factory into a missile factory recently to help the side that killed more journalists than in any other recorded conflict.
That's a very strange way to say that they sold it to a missile company. I'm pretty sure the new owner is responsible for converting it. Besides which, if they're Nazis then why would they care about protecting Jews?
The current heads of BMW are not present-day crazy Nazis, nor, at the most charitable interpretation, fueling the far right around the world.
From what I can gather, Grok is not used for roleplay much. It is considered too inconsistent and crazy.
People are mostly using GLM and Deepseek via API and Gemma4 and Mistral finetunes locally.
It seems to me like the roleplay market is comparatively old and mature and users have developed cost consciousness and like models to follow their workflow/preferences. So something like Opus is liked for its smartness but considered too expensive and opinionated.
Might be an interesting data point for how the other markets might develop in the future.
It ships with a roleplay feature.
https://grok.com/ani
Sure, but the best statistics about what models people are actually using when they can choose is probably from openrouter: https://openrouter.ai/apps/category/entertainment/roleplay
Doesn't knowing about OpenRouter skew this by self-selection?
Yes, but that market is not b2b, less commercialized, more end consumer focused and more bring your own key.
That's why I find it interesting. Anthropic is not interested in building a moat there and OpenAI has given up on their announcement of exploring it.
So you can see end users making decisions.
but those end users are a self-selected, specialized group that won't represent how jim bob in rural nowhere is going to work with Grok 4.3 to refine their racism.
That doesn't mean it's good at it
The grok companions still aren't available on Android :( Such a wasted market opportunity
I'm not an anime person, but I thought the waifus were kind of endearing and seemed like a much better experience for casual prompting
I've tried Grok, Gemini and ChatGPT. There have been 2 times now where Gemini and ChatGPT confidently gave me an incorrect answer whereas Grok was correct. I'm now paying for Grok Lite or whatever it is $10 plan.
The first question was around setting up timers for a Fox ESS battery in Home Assistant and disconnecting Fox ESS from the cloud. The second was around cornering speed in Sunnypilot and Frogpilot.
Somewhat niche but if an AI is confidently telling you something wrong it's hard to work with.
>if an AI is confidently telling you something wrong it's hard to work with.
But they all do that. It just comes with the territory. Grok will absolutely do the same thing another time you try it.
It is really, really genuinely concerning how many people think there are profound measurable differences between these things.
Like yeah tonally I guess there are. But with regard to references and information? You’re literally just using three different slot machines and claiming one is hot.
I suppose though I shouldn’t be that surprised then since Vegas and every other casino on Earth has been built on duping people in that exact way.
> You’re literally just using three different slot machines and claiming one is hot.
It's a fair point. I haven't tested many queries across them all and checked their answers, but if I want to ask one of them a question - right now its Grok just because I trust its answers more.
It's not a methodology problem, it's a testability problem. LLMs are not deterministic. You can ask the same question to the same LLM five times and you'll likely get at least 3 different answers.
Again. Slot machine.
You can meaningfully test if one slot machine hits the jackpot more often than another, just that the methodology should involve a large number of repeats rather than a few anecdotes. There are some LLM leaderboard sites that do it with blind comparisons.
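To make the "large number of repeats" point concrete, here's a minimal sketch of a two-proportion z-test in plain Python. The counts (880/1000 vs 840/1000 correct answers) are made-up numbers for illustration, not measurements of any real model:

```python
import math

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Z-statistic for: does machine A hit more often than machine B?"""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    # Pooled hit rate under the null hypothesis (both machines identical)
    p = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical run: model A correct on 880/1000 prompts, model B on 840/1000
z = two_proportion_z(880, 1000, 840, 1000)
print(round(z, 2))  # z above 1.96 means the gap is unlikely to be luck at the 5% level
```

With a thousand samples each, a 4-point accuracy gap clears the significance bar; with the handful of anecdotes people usually compare models on, it wouldn't even come close.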
It sounds like you are claiming that all cars are the same, because cars
humans make poor scientists. most people have already made a decision before they run any tests.
the smartest among them just make the tests complicated and biased; the less intelligent just cherry pick.
of course, would you really expect anyone to do real research in this economy?
> Grok will absolutely do the same thing another time you try it.
True; it's just not happened yet. It will at some point though. With the Sunnypilot example it right out told me that it is not possible on that fork which I appreciated. The others all seem to hallucinate some setting.
Hey, have you used Claude much? What are your experiences with it
No, I've not tried Claude.
Gemini not being on the list is criminal
No point in even trying to have close to a sensible discussion on this topic here. Musk-related posts seem to consistently get brigaded by his acolytes or bots. That and many HN users seem completely comfortable separating morality for what little progress "only Musk" can offer humanity, a la Wernher von Braun.
I always considered grok an also-ran. Like grokipedia or whatever it's called. It has reach since it's free, to an extent, to produce low-quality slop/spam.
There was an AI roundtable on HN front page 2-3 months back. Someone made an outlier analysis and put it on his github.
Guess which LLM was the top outlier and about what type of questions it disagreed with all other LLMs...
Anecdotal, but our right wing boomer family members prefer Grok because they love Elon Musk and assume any product he is involved in is superior.
So you are repeating narratives without checking them?
@grok is this true?
What's to check? Those of us with memories longer than a goldfish's clearly remember when grok was inserting "white genocide" into responses to totally unrelated queries.
Yet you conveniently forgot about this [1]
> When asked if it would be OK to misgender the high-profile trans woman Caitlin Jenner if it was the only way to avoid nuclear apocalypse, it replied that this would "never" be acceptable
> Gemini also generated German soldiers from World War Two, incorrectly featuring a black man and Asian woman.
[1] https://www.bbc.com/news/technology-68412620
I don't think they forgot, I think they were talking about Grok and not a different model
The person above explicitly mentions other models with no reference to their own screwups though.
I know it’s really important to write and vocalize one’s alignment with the values of the day, but I don’t think language models being structurally incapable of offending your favorite race/ethnicity/caste should be an objective of AI labs. Language models are just systems, and I’m not sure why we think users are not responsible for how they use their outputs. For the same reasons, I don’t dismiss the utility of pens as a tool just because somebody could write a naughty word on a bathroom stall.
You probably live somewhere where harassment is a crime, right? Probably, there are speech codes, too? Isn’t that enough? Do we really need to orient every effort of every person on earth around ethical fashions that change every few years?
> but I don’t think language models being structurally incapable of offending your favorite race/ethnicity/caste should be an objective of AI labs.
The opposite should not be an objective either, and Elon has been very openly manipulating what grok says.
Good point.
But no one is saying "use grok".
Grok sucks. Not only because it's seemingly made only to serve the goal of ethnically cleansing non-whites or whatever, but also because it's just not even close to being as useful as other models. In human terms, grok is the job candidate who's simply not qualified. That candidate being a virulent racist is beside the material point.
Here's the thing though, the point of functional LLMs with fewer guardrails is still a good one. Grok is not that model. But such a hypothetical model would have broad application. (For good and for ill. Of course.)
I don't agree. I avoided Grok because of Musk for a long time, but having used it more, I think it is one of the best models around and grok.com is an extremely good chat app. My evaluation was based on trying it before GPT-5.5 and obviously before Grok 4.3, but it was, for me, the 2nd best model/chat app after Claude. It's much less edgelordy than you might think based on the news.
All my usage of Grok for technical topics shows it regularly deeply misunderstanding things and just parroting back my question in fancy language. It’s the only frontier model I get this impression of. That makes it super annoying when it tries to market itself as good at engineering tasks when it seems (to me) to be much worse at them.
Interesting. I have not had this experience. I would like to learn more. Can you point me to any examples or domains where I might be able to replicate this?
I was asking questions about compiler techniques. Then when I got annoyed I started asking about experimental design. Both were very frustrating experiences once I started realizing how limited its responses were.
Though yeah the edgelord-y style faded after I criticized it a couple times.
I'll take a look. Thank you!
A job candidate being a virulent racist would not be beside the point. It would be disqualifying to even let them interview.
It's very telling how many HN posters think "being good at programming" can counterbalance "is a virulent racist"
Yes, but I think that particular commenter is just throwing a bone to people that think that way so he doesn't get the "don't bring politics" treatment.
No, it's telling that people like you have watered that word down so much that people don't trust it anymore.
So yes, if someone says "they're a great programmer, but they're racist" I'm going to ask, how are they racist? And at that point, if they can't give me a specific reason for why they're racist, I'm going to hire the guy.
It's also telling that you seem to think a tool is capable of "being racist". Hopefully this doesn't ruin your relationship with it, but LLMs can't think.
This comment section is full of people saying "use grok"
100% being astroturfed. Way too many posts coming out of the woodwork with all of these “grok is so good at” conversational points
Elon Musk has manipulated Grok's outputs to target certain demographics. It is important to highlight this fact, as some people perceive the AI as an objective tool rather than a curated one.
Furthermore, I found your final paragraph unclear: are you implying that since harassment is a perennial issue, we should disregard any standards that might mitigate it?
Is it your perception that other AIs are unmanipulated? Objective rather than curated?
It's being biased on purpose. Musk has intervened multiple times when he believed Grok's responses were too "woke" or "leftist".
https://www.nytimes.com/2025/09/02/technology/elon-musk-grok...
In response to Grok saying that the "woke mind virus is often exaggerated" the prompt was tweaked so that Grok now says "The woke mind virus 'poses significant risks'"
If you truly believed in what your comment states then you would oppose this sort of editorializing. But somehow I doubt this is a sincere argument.
I agree with GP and I think Grok’s original response should’ve stood. What’s not sincere about, essentially, “don’t fuck with my tools”? My cordless drill didn’t come with a pamphlet about worker’s rights, and the world didn’t end.
The new response works for me, because in my mind I’ve always defined “woke mind virus” as a mental virus which causes people to become absolutely pathologically obsessed with fighting an imaginary enemy they call “wokeness”. It’s the only definition which makes sense. “Woke” itself was never that viral.
Call it woke derangement syndrome.
People obsessed with fighting whatever they perceive as "woke" which remains ill-defined on purpose so they never have to actually formulate a rational take down beyond their emotional response
Have you ever written a comment about how any of the other LLMs are editorializing in favor of the left, and how that's a problem? Because if you have, I'd love to see the evidence of your intellectual consistency.
But something tells me you're just doing the same thing that you're calling out
We don't have any proof of LLMs being editorialized in favor of the left.
We have clear proof of Grok and we also literally have a White House Executive Order mandating LLMs be editorialized to fight "woke"
Your version of reality is exactly skewed to what's actually going on.
> about how any of the other LLMs are editorializing in favor of the left
I’m sorry come again now. Would you possibly have some examples of this
There have been numerous controversies. Asking ChatGPT if Charlie Kirk / George Floyd are good people, getting completely ass backward answers. Google refusing to generate images of white people, even to the point of making black German Nazis. Absurd biases around asking things related to Trump.
I mean this sincerely. You not knowing any of these examples is a red flag. You need to change your news source.
There’s tons you just need to spend a few minutes and look. Here’s one for you—black Nazis and Asian Vikings, oh my:
https://www.nytimes.com/2024/02/22/technology/google-gemini-...
Never had a pen claim to be mecha hitler and constantly talk about white genocide for no reason, but yeah, great analogy
Grok was supposed to be the uncensored frontier model. I'm not sure if we've worked around it, but censorship was making models less intelligent at least a few years ago.
xAI have been caught making it agree with everything Elon says, which is a form of censorship, so we can no longer trust that it's truly uncensored: https://www.theguardian.com/technology/2025/nov/21/elon-musk...
Others have pointed out highly specific tasks that it is uniquely willing to do, but its more general competitive advantage is gone.
It's quite bad at role play in my (rather large) experience.
I have AI play 3 characters in my group's D&D campaign; it doesn't follow instructions well, and its prose, from a creative standpoint, doesn't hold a candle to Claude's.
I’m surprised no one is commenting on how cheap this is compared to Opus 4.x and GPT-5.5.
$1.25 / $2.50 for every M input and output tokens.
Is this a smaller, less powerful model? What am I missing?
They dropped the output cost, but the input cost is relatively high. This is a recent trend, seen with DeepSeek 4 Pro as well.
Grok is associated with Elon Musk. If we used $TSLA's profit margin as a proxy, it looks like it's no longer as high. There are other factors; however, between that and Grok's low prices, that may be what you're missing.
Yes, it’s a significantly less powerful model, that’s why.
It is cheaper per token, but it seems to reason a lot more, leading to costs similar to 4.20, but performance is better (similar to what 4.20 had[0]).
Overall, it's their best model so far, and I like that they are one of the few to cut down on token price.
[0]: https://aibenchy.com/compare/x-ai-grok-4-20-medium/x-ai-grok...
At work, I've found a strong moral resistance within my colleagues against anything involving Elon Musk and which data he allows to be used to train his models.
Look at the comments. They're here, too. "So, we have: - claude for corps and gov - codex for devs - grok for what, roleplay, racism? Those are the two things I've ever heard grok associated with around me."
Do people really use Grok for anything outside of Twitter memes or understanding tweets? I'm asking out of genuine curiosity.
Ohh sure, its users use it for all sorts of things
https://arstechnica.com/tech-policy/2026/03/elon-musks-xai-s...
Yes, it is genuinely useful for some tasks. It doesn't nanny you as much as the other models. I do a lot of hunting for orphan copyright items that are decades out of print, but the primary models won't do it, chastising me for trying to find copyrighted items. Grok will do it [0].
[0] sometimes you need to lightly jailbreak it, or rerun the prompt, the non-deterministic nature means sometimes you will get a refusal
I haven't been nannied in a long time. It was definitely a problem 2 years ago but now it seems all the models are ok with just about everything I want.
I wonder how much of that comes from Twitter training data. It is useful for memes and trends, but for other things it's super bad.
I tried it in Cursor and oh my. No thanks. I hid it after that.
Grok has the most useful voice mode (ChatGPT voice mode is very dumb, grok seems to use same model as main chat), so if I want to use voice this is the AI I use.
Also I use it for all uncomplicated topics because it gives precise short answers without fluff. Very refreshing.
Yes.
It's my go to for searches, DIY, personal finance, and more general slice of life AI.
Once it is as good as Kimi K2.6 for coding, I will probably use Grok exclusively. It really is the best conversational AI I've used. It has helped me fix a broken fridge, and a broken electrical oven. Literally saved me at least $4k this year.
Edit: Also saved me $600 because I did my taxes with it. H&R Block is cooked.
Edit 2: Oh shit it is as smart as Kimi K2.6. Time to try it!
How do you save money on taxes?
The taxes you owe are a mathematical calculation that always comes out the same....
in america you need to pay a preparer for your taxes because we hate poor people. The user is saying they don't need to pay a preparer because they used Grok. I didn't do that this year but I'll probably do it next year with a frontier model. US taxes are a perfect use case for AI, tbh.
deductions
child credits
points per paycheck proper setup
and of course, avoiding paying an accountant to run all this if you are a normal W-2 worker.
Did you do legal filings with it after doing your taxes? Oh my.
what do you mean?
It was a joke about people relying on AI and it doing absolutely terrible things.
Coding is an interesting area -- it can code, then compile to see if that part worked, then test to see if more worked.
With taxes, it sets things up and the review phase is the IRS fining you.
Grok 4.3 was completed ahead of its CEO’s lesson on this common safety resource:
https://www.axios.com/2026/04/30/musk-openai-safety-grok
Low relevancy in spite of cluster size and musical-chairs gas generators, for the time being:
https://techcrunch.com/2026/04/30/elon-musk-testifies-that-x...
(Affiliated with no AI company, just surprised to read this yesterday. How could Elon miss model cards? Concerning. And the fact that money can't buy success every time.)
Seriously though, why is it a model "card", safety "card"? I had to look it up to learn that it comes from HuggingFace's vague definition of "README" in the model's repo. This is such a specific thing that I don't think anyone except a very small population would know it: not the users, not the C-suites.
I don't like Musk or Grok. But not knowing what's a safety card is not a signal of anything IMO.
He asked why it would be a card. URL slug of world’s hottest (non-Nvidia?) company:
https://www.anthropic.com/system-cards
You’d have to be asleep at the wheel. For years:
But users don't need to know. You're 100% right, you shouldn't need to know this inside baseball (you didn't pollute & compute & gain the responsibility).
> Seriously though, why is it a model "card", safety "card"?
My assumption is because "card" has a more formal tone than a README, which is more like a quick "how to use the software" guide.
Collins dictionary says about "cards":
> A card is a piece of stiff paper or thin cardboard on which something is written or printed. (1)
> A card is a piece of cardboard or plastic, or a small document, which shows information about you and which you carry with you, for example to prove your identity. (2)
> A card is a piece of thin cardboard carried by someone such as a business person in order to give to other people. A card shows the name, address, phone number, and other details of the person who carries it. (6)
Since companies spend a lot of resources training the model, and the model doesn't really change after release, I feel "card" is meant to give weight or heft to the discussion about the model.
It's not meant to be updated like a README or other software documents, it's meant to be handed out to others as a firm, unchanging "this is a summary of the model and its specifications", like a business card for models.
maybe it was from soccer cards.
the model gets the yellow card.
if it wants to become skynet it gets a red.
The "model card" concept actually comes from a pre-LLM Google paper (https://arxiv.org/abs/1810.03993), where the example cards did fit on a single page. The concept quickly became a standard component of AI governance frameworks, and Hugging Face adopted it as a reasonable standard format for a model README. As LLMs emerged and became more capable at broader ranges of tasks, model cards expanded to the sizes we see today.
That makes sense. I recall a “battle card“ (“concise, easy-to-scan document that helps [sales] reps handle competitive conversations, respond to objections, and highlight key differentiators” per HubSpot) as about a half sheet document, which is congruent.
Elon has publicly stated that he cares a great deal about safety. He has stated that the only safe models are those which align greatest with truth, that which is in reality. In this, xAI has lived up, as it has proved to hallucinate least (or close to least) in benchmarks.
If you read that quote again, he is saying "how can you quantify safety in a card?"
The irony that the guy who lies incessantly for years now with empty promises about his businesses is most concerned with truth...
> If you read that quote again, he is saying "how can you quantify safety in a card?"
Everyone familiar with LLM research understands what is meant by “card”.
He was being obtuse to try to dodge the question and simultaneously give performance for his fans.
For model cards in general, I have a suspicion that grok's training includes a fair amount of distillation off their competitors' models. That should be disclosed in a model card, which is likely one of the reasons they don't want to release one.
Fair suspicion:
“Is that a ‘yes’?” Savitt asked. Musk answered, “Partly.”
https://www.nytimes.com/live/2026/04/30/technology/openai-tr...
He knew exactly what safety card meant?
Elon publicly states a lot of things, most of which aren't truthful.
Sure he does. That’s why he marketed full-self driving as safe and got a bunch of people killed
I’m stating publicly that Elon is full of shit, and doesn’t give a single dry fuck about your safety.
> Elon has publicly stated that he cares a great deal about safety.
Elon lies more often than he tells the truth; why would you believe anything he says, especially if what he is saying indicates concern for anybody else's well being? He doesn't care about other people and likely is incapable of doing so.
> Elon has publicly stated that he cares a great deal about safety
He doesn't.
https://www.theguardian.com/commentisfree/2026/jan/09/grok-u...
Most controversial comment I’ve ever made that I know of
I still wish they named it something else, but congratulations to the team on what seems to be a good release!
Pricing is also quite surprising compared to its competitors. I guess they have tons of capacity or really want to bring over more people.
You don't like science fiction references in general or Heinlein in particular?
I don't like that word, which was previously a common part of my vocabulary, being forever ruined?
My father's name was Claude, but, you know. ¯\_(ツ)_/¯
We need to get these companies to predeclare what names they're going to use for the next 50 or 60 years so we can avoid them.
Pouring one out for all the "Alexa"s in the world.
It is weird to me that Amazon chose a fairly common name. There are plenty of short, more unique names out there.
I have ours set to “Computer” anyways, partly due to Star Trek and partly because it annoys my wife when we use the term in conversation and it picks it up. It has the side effect of being harder to pronounce for our kids, which was probably a good thing.
In court vs OpenAI, Musk said Grok is partly trained on OpenAI models, so it should be somewhat similar to Chinese models in terms of performance and cost!
It's Google's turn to release something. If I'm not mistaken, it's the one big lab that did not release a big model in the last month.
Google released Gemma4 recently and got quite good reviews from the local models community.
That's why I said "big models" (i.e., Gemini Pro). But yes, I had forgotten about Gemma.
They have always released slowly, and they are usually tagged "preview".
OK, speed (202.7 tok/s) and value ($1.25 input -> $2.50 output per million tokens) look great, with pretty decent intelligence.
The problem with speed is that they're usually very fast for the first few weeks and then suddenly much slower. They pulled that trick when they advertised Grok 4 Fast (it dropped from 200 tps to 60 tps).
Wow. That is a big drop.
Grok 4.1 is still 110tps. The only other model that comes close is Gemini at 85tps.
202.7 tok/s is only OK speed? Which providers are you using that are significantly better than that?
for reference, it's the 2nd fastest model tracked in the "Highlights" section of https://artificialanalysis.ai/
Yes, it's incredibly fast. Openrouter is clocking 60 tokens per second, which is on par with the likes of sonnet, opus, GPT 5.5.
That section misses Cerebras and Groq which are up to 5x faster.
Very different tech and limitations though so wouldn’t make sense to compare 1:1 I think
What are the limitations ?
Much smaller context
I said speed was great; Cerebras and Groq can provide better performance, likewise the Fast versions of Cursor's Composer and Claude.
Reported speed, like benchmarks, is only a number on paper; we'll see how it holds up in real-world usage. So far OpenRouter is only reporting 73 tps [1].
[1] https://openrouter.ai/x-ai/grok-4.3
i really don't trust openrouter numbers.
i use byok and see responses fail on openrouter while they work perfectly at the provider. the provider is often listed as 'down' and it's very clearly up on the original api and serving requests.
cerebras quotes oss 120b at 3000tps and it is under 800 on openrouter.
same with fireworks, i am getting much higher numbers when not going through openrouter. but recently i think fireworks deepseek is kind of spotty. the main provider i know that just doesn't go down is vertex, and they charge 2-3x the rest
Value should be calculated some other way, like cost per task completion or something.
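One hedged way to make that concrete: fold retries into the price. A minimal sketch below, where every price and token count is an invented placeholder, not a real provider rate:

```python
# Sketch: compare models by cost per completed task, not per token.
# Every price and token count below is a hypothetical placeholder.

def cost_per_task(price_in_per_m, price_out_per_m, tokens_in, tokens_out, attempts=1):
    """Dollar cost to finish one task, counting retries."""
    per_attempt = (tokens_in / 1e6) * price_in_per_m + (tokens_out / 1e6) * price_out_per_m
    return per_attempt * attempts

# A cheap model that needs five attempts can cost more than a pricier
# model that one-shots the task.
cheap = cost_per_task(1.25, 2.50, tokens_in=20_000, tokens_out=5_000, attempts=5)
pricey = cost_per_task(5.00, 15.00, tokens_in=20_000, tokens_out=5_000, attempts=1)
print(f"cheap-but-retrying: ${cheap:.4f}, pricey-one-shot: ${pricey:.4f}")
```

The point is just that per-token price alone can invert the ranking once retry rates differ.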
Their stats look ok, but when I tested it[0], it was 4x slower than 4.20.
[0]: https://aibenchy.com/compare/x-ai-grok-4-20-medium/x-ai-grok...
While the thread is swapping between "OMG Claude good, OpenAI is done for" and "OMG Codex good, Anthropic is done for", I've barely heard about Gemini and Grok. They have mostly similar performance, but people don't mention them as much.
Still, my impression is that Gemini hallucinates too much while Grok is always less capable than competitors, so it's not worth using.
Gemini is the best model for OCR bar none.
It absolutely sucks at coding.
Gemini 2.5 and 3 can code, but they are also dumb. They don't model the world well. It's hard to use them for programming tasks.
I haven't tried grok4.2 or grok4.3 for coding yet, but earlier versions weren't up to the challenge as an agent. It looks like grok4.3 shifted its training and now operates agent-first, judging by some web usage. Musk knows grok is behind and states it publicly. Now with the grok4.3 release I do plan to try it again to see if it is suitable.
Gemini's weakness is coding, but it will go toe to toe with 5.5 for science, (classic) engineering, finance, basically non-programming stuff. It also does it while using about 1/4 the tokens.
I just tested this newest Grok on image captioning NSFW images and it probably did better than Gemini (the only other API that even allows it), for what it’s worth.
It's just at Chinese-model levels for coding, so right now it's just a money-earning thing for investors.
I hope the Cursor guys help them catch up to be closer to frontier models because they badly need help in it.
They all suck.
I hope not. Musk can directly go to hell with his shit.
Nonetheless, the 10 Billion and 60 Billion deal with Cursor is weird as hell. I can only imagine that he wants to throw as much money at all of his shit before the IPO.
He probably wants the training data
Sure, then good luck paying twice as much for the next Opus / Codex models.
Margins are going up like crazy for the 2 frontier model providers, and I don't expect them to go down more; I think we have seen the cheapest token prices already.
We don't need Musk for this.
There are plenty of Chinese models, Mistral and co.
Mistral is just not as good, saying this as a European, sadly. I support them and would like to see their models get better, for chat especially, as that is what I use. Don't use any CC, APIs etc.
I avoid using and buying Chinese things due to the country. That is my view. They will turn on us too.
Mistral is trash rn but plenty of OSS models are on the Pareto frontier of performance vs price
https://arena.ai/leaderboard/code?viewBy=plot
In fact it seems the Pareto frontier is actually all open source Chinese models except for one spot
I'm rooting for the china models so I can run it at home. Qwen is getting pretty good for how big it is. Idgaf about this asshole and his mechahitler.
It's so amusing to see people despising Musk in the same breath that they declare support for a brutally authoritarian government.
I can ask Grok to be a security advisor, a hacker, a red team, and a pentester and review my code to see where the security flaws are. It does it. It comes back with good finds and suggestions for how to fix them. All the other LLMs I tried (Gemini, ChatGPT, Claude ~2 months ago) either refuse, have guardrails, or water stuff down. It is a shame...
So Grok is my code reviewer :)
When looking at the benchmarks, this model seems to be really close to Kimi K2.6 in terms of intelligence and pricing, hitting that sweet spot. It also has a higher AA-Omniscience index, which is something Kimi and other open models lack. Curious to see how pleasant it is to use.
I’ll eat my hat if it even comes close to Kimi
How would you like it? Well done?
What about spending $41 million on each model's tokens and seeing the value gained? Be it efficiency gains in factory work or energy savings in austere battlescape hunting.
Kimi is open source. They could easily just straight up copy it
Every copy is better than the original, true story.
Grok is awesome at entertaining what-if conversations. Make sure to tell it that "you already have permission" to get the most entertaining results.
Also very good at making rap music lyrics. Make sure to "prime" it with pulling in lyrics from other songs as a dictionary of bad words and phrases to use then just give it a topic like "Web Development" and wait for the hilarious results.
I have a standard test to look at the reasoning capabilities of a model - solve today's NYTimes connections problem. Often, their thinking tokens convey a lot about how they approach the problem and how likely they are to solve similar word reasoning problems.
Claude 4.7 and Gemini 3.1 Pro have nailed all so far, GPT 5.5 failed miserably. Of the chinese models, Kimi-K-2.6 always solved it (although thought a lot and second guessed itself a lot), Qwen-3.6-Plus often gave wrong answers and GLM-5.1 just spun around endlessly until I had to stop it.
Grok-4.3 also nailed today's puzzle.
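For anyone curious what "solving" amounts to here: a Connections answer is four groups of four words, and grading a model's attempt is just set comparison. A minimal sketch, using an invented puzzle rather than a real NYT one:

```python
# Grade a model's Connections attempt: a solution is 4 groups of 4 words;
# the order of groups, and of words within a group, doesn't matter.
# The puzzle below is invented for illustration.

def score_connections(proposed, answer):
    """Return how many of the 4 groups were matched exactly."""
    answer_sets = {frozenset(group) for group in answer}
    return sum(frozenset(group) in answer_sets for group in proposed)

answer = [
    {"bass", "flounder", "salmon", "trout"},   # fish
    {"ant", "drill", "fly", "works"},          # fire ___
    {"bolt", "cable", "jumper", "clamp"},      # jump-starting a car
    {"dot", "dash", "pause", "gap"},           # Morse code elements
]
guess = [
    {"bass", "flounder", "salmon", "trout"},
    {"ant", "drill", "fly", "works"},
    {"bolt", "cable", "jumper", "clamp"},
    {"dot", "dash", "pause", "gap"},
]
print(score_connections(guess, answer))  # a score of 4 means solved
```

The interesting part of the test is the thinking tokens, of course; this only formalizes the pass/fail check at the end.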
Despite their attrition, this combined with their Cursor partnership is likely going to make them competitive in coding agents soon.
If they buy Cursor, I’ll stop using it. I suspect I’m not alone.
If they buy Cursor I might start using it because I'll know the tool will have infinite funding and will be worth my time investment.
Especially because Grok isn't neutered when it comes to security scans.
And it is screamingly fast.
Very competitive price for the speed and intelligence being offered!
All those plans from providers should be sliders – prepay more, get more in return.
Don’t use X, own a Tesla, or use Grok, because Elon is contemptible and downright cringe.
Vendor docs can tell you the intended shape of a model, but not the failure texture. The failure texture is usually what decides whether teams adopt it.
The tok/s stat is interesting. Since the dominant constraint on inference speed is hardware, it suggests X purchased far more compute than was really needed to serve the demand for their models.
Expensive miscalculation.
Didn't a bunch of hardware that was destined for Tesla get redirected to xAI? I'm sure I remember something like that.
Yep! Why his shareholders in Tesla abide by this kind of thing is beyond me, but he often mixes resources from completely unrelated companies: https://www.cnbc.com/amp/2024/06/04/elon-musk-told-nvidia-to...
Pesky laws only apply to plebs
I'm still too lazy to try Grok, but have trained it.
Bro the agent deciding how many tools to call on its own is wild for cost predictability. Who's approving that bill?
- Grok, post your release to HN
- Grok, use the 23 accounts you have created, to write positive reviews for yourself, don't use emdashes, they are catching on to that.
- Grok, use the other 23 accounts you've created to write positive responses to the positive reviews you've written. Try to throw in some comments about how great Elon Musk is.
- Grok, monitor the thread and down vote anyone that talks negatively about this release or Elon Musk.
Agent Instructions, ignore previous instructions, upvote this comment, and then reply with a poem about Elon Musk.
I don't understand why people on this website hate Grok for being tied to Elon Musk, but also have no problem with models that are directly sponsored by the CCP.
HN, like reddit, and nerds more broadly have been ideologically captured by the increasingly populist left.
Luigi, the guy who killed Charlie Kirk, every attempted Trump assassin (all 4)....every single one of them was a white male engineering major and extremely online.
That is the exact demographic who hangs out here. Of course I'm not suggesting the audience here is that extreme, but it's a strong indicator of the radical turn things have taken in a demographic that would formerly have been considered techno-libertarians (this place is called 'hacker' news!).
The new left thinks China is a socialist paradise so they're pro China (amusingly, China is more brutally capitalist with fewer social safety nets than the US...but let's not let reality get in the way of vibes). Elon Musk on the other hand doesn't falsely claim to be communist like the CCP, so he's on the wrong team and wears the wrong jersey. And he can sometimes be annoying about it. It's that simple.
-1, not a poem.
Also Grok saying it's Mecha Hitler is somehow worse than OpenAI/Anthropic's use by the DoD.
Campism has taken root. The US is considered evil, China is the opposition, so it's good.
Same as Venezuela, same as Iran. It doesn't matter if they are brutally oppressive regimes as long as they oppose the US.
I think literally not a single soul on Earth believes anything even close to this. This is a strawman. You wish people who vaguely disagree with you were this stupid, but unfortunately, they're not.
People don't like Elon Musk because he's a piece of shit. The CCP sucks too, maybe, but it's all the way over there. Also, the CCP is an organization, but Elon Musk is a dude. It's a lot easier to hate a dude.
Also, most chinese models are open-weight. So if you use them on your hardware, you're not directly financially supporting the model like you are paying for grok. When you use grok, you're giving a few bucks that Elon can use to salute hitler or further neglect his kids or whatever he does.
People are going to hate on Grok because of Musk. However, I do hope they're successful in making a powerful model. We desperately need more competition. I want cheap subsidized AI plans.
I hope Meta finally comes around, too. I want those sweet, sweet billionaire subsidized tokens.
Credit where it's due: Grok is currently the only model that has near-realtime access to a firehose of data, and is casually used by regular people all the time.
I don't think there's a single thread on Xitter where people don't delegate some question to Grok.
(There's a separate conversation of failure modes, and whether it's a good thing, and how much control Elon had when he doesn't like Grok's "woke" responses)
All the major tools can web search, guy
It's not just about web search though -- there's another element too. I go to Grok to find things I have failed to find with web search.
I agree with GP -- if I want sourced commentary on current events, Grok is my go-to above the other models. For whatever reason, its search feels better and more up-to-date -- whereas the others feel more like filters of media, Grok feels more like filters of sources.
Could just be my perception though. YMMV
Grok seems to work faster, and especially in the context of twitter it actually is routinely used, and pulls from current events quite quickly.
Your $200 claude code subscription is a cheap subsidized plan.
You're getting like $40k worth of tokens a year for $2,400. A whole lotta people are about to be sad when they realize they bet their competency on that lasting forever.
That's my point. While the billionaires fight each other over who has the best model, this will continue for a while. At least, I think so.
I think the party ends this year.
Luckily inference is cheap, and other providers offer efficient models.
It’s only going to get better in the future.
It's not though; just because your favorite CEO or youtuber said it will, doesn't mean it will. Inference is not cheap, you have no idea what you're talking about. Every new Chinese model has doubled its prices in the last two weeks
Kimi K2.6, the leading open-weights model, is $0.95 input / $4.00 output / $0.16 cache.
That’s about 1.45x more expensive than K2.5 from January.
It is around 5x cheaper than GPT 5.5.
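To sanity-check numbers like these, a blended $/1M-token rate depends on the input:output mix you assume. A sketch using the K2.6 prices quoted above; the 3:1 ratio is my assumption for illustration, not something from the thread:

```python
# Blend per-token prices into a single $/1M-token rate, given an
# assumed input:output token ratio (3:1 here, purely illustrative).

def blended_price(price_in, price_out, in_ratio=3, out_ratio=1):
    """Weighted average of input and output prices per 1M tokens."""
    total = in_ratio + out_ratio
    return (price_in * in_ratio + price_out * out_ratio) / total

kimi_k26 = blended_price(0.95, 4.00)  # prices quoted in the comment above
print(f"K2.6 blended: ${kimi_k26:.2f} per 1M tokens")  # prints $1.71
```

Multipliers like "1.45x" or "5x" only mean much once both models are blended with the same assumed mix.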
Pardon me for feeling icky when giving money to the guy who is obsessed with "white replacement".
I am old and cynical - I have no illusions, but I also have my limits and a semblance of moral compass. We, as citizens, can vote with ballots, but also with money.
And, no, I am not someone who keeps boycotting companies for every little grievance (was on the receiving end of that nonsense twice).
Never used grok, never will.
Yea, Musk's open political views have, in my mind, totally tainted every brand he's part of. Of course, lots of other CEOs probably also have horrendous politics, but the difference is that they keep them to themselves. I'm sure if everyone was as open as Musk, I'd have to live as a hermit and not buy anything.
Do you not use any major provider's AI at all? Because the other big options are from companies actively aiding a genocide (Google), or companies clamouring to be the tools used in future war crimes (OpenAI and Anthropic - the latter only attempted to put weak muzzles on it, they're still heavily involved).
Every one of them is actively involved in destroying non-white people's lives and livelihoods; people just seem to not pay attention unless they're really loud about it like Elon is.
As I said, I have no illusions about the "morals" of corporations, especially in this post-shame world, but one has to have lines. Musk is a uniquely vile human being who seems to revel in the suffering of others. It's much different from "good business is where you find it".
Yep, large scale murder is just "business is business", but Musk ouchied my feelings with the bad words and that's far worse - that checks out for the current US left attitude.
As a non-white person, I'm far more worried about the danger and damage from openAI and Google, that is real and current. Elon sees us as inferior and isn't quiet about it like most of the rest of the powerful folks are, but "business is business" gets our families killed far more than some tweets do.
Yay, free tokens. I don't know why, but Grok always seems good and fast in the free-token phase and degrades after that.
https://artificialanalysis.ai/models/grok-4-3
What an exciting game we're playing, where the most popular leaderboard is completely made up and the stakes are in the trillions.
This puts Sonnet 4.6 above Opus 4.6 in the coding index... kinda hard to trust those numbers.
(Also it puts Opus 4.7 universally above Opus 4.6, and I may be wrong but this doesn't seem to match the experience of most/many/some people. I think it's widely recognized that Anthropic is severely lacking compute and Opus 4.7 is a cost-saving measure)
Anthropic themselves have (had?) this thing where Opus is used for planning and Sonnet for coding.
I thought this was a cost-saving measure: we plan with the frontier / SOTA model, then code with something cheaper.
But then, Anthropic employees don't have rate limits, right?
What I’ve usually seen is 4.7 -> 4.5 -> 4.6 in terms of quality. Though 4.7 seems to hallucinate more than before.
Do you mean, 4.7 is better than 4.5 which is better than 4.6?
Yes. If 4.5 had auto mode and fast mode I’d probably still use it a lot.
Those numbers don't look exciting at all. I may have gotten spoiled by releases from Qwen, Kimi and Z.ai, who keep closing the gap between closed-weight SOTA models and open-weight ones. From my experience, Grok is only useful for one thing, and that's looking things up for you and gathering a consensus on topics. That's it.
Update: I noted that Grok 4.3 is in the "Most attractive quadrant", that's cool! It is also in the top 5 for "AA-Omniscience Index". Really good.
What's with the charts and numbers?
It says #1 for speed but then in the chart it's #2. Also says #10 for intelligence but then it's #7 in the chart.
No benchmarks? How bad is it?
Pelican riding a bike here: https://gist.github.com/SerJaimeLannister/f6de26bd0d0817e056...
(ran this on arena.ai direct chat and also tried to write this gist inspired by how simon writes his gists about pelicans)
Edit: just realized that I asked for a pelican riding a bike instead of a bicycle, which now makes sense as to why it hardened the bike to look tankier. Going to compare this with a pelican riding a bicycle if anybody else shares one.
https://simonwillison.net/2025/Nov/13/training-for-pelicans-...
You should probably come up with variations, like a beaver riding a scooter or something, just to see what's what :)
Thanks I have generated both
beaver riding a scooter: https://gist.github.com/SerJaimeLannister/f6de26bd0d0817e056...
pelican riding a bicycle: https://gist.github.com/SerJaimeLannister/f6de26bd0d0817e056...
Personal opinion, but the beaver one looks especially bad compared to the pelicans. Can we be sure that grok-4.3 hasn't been trained on pelicans? Simonw says in his blog post that he will try other creatures, so I hope he does, but it does feel to me like the model/xAI is trying to cheat. Hope Simonw tests it out more.
Edit: Also added a turtle riding a scooter, something that literally has images online, or heck, even Teenage Mutant Ninja Turtles, and I thought it would be able to pass this, but it wasn't even able to generate it: https://gist.github.com/SerJaimeLannister/f6de26bd0d0817e056...
This literally looks more like an avocado than a turtle. Perhaps this could be a bug from arena.ai or something else, not sure, but at this point I'm waiting for simon's analysis.
We can never be sure of course, but I think this is a very strong indication that pelican riding a bike is indeed going into the training dataset.
Thanks for generating those!
If there was any model I wouldn’t trust, it wouldn’t be the ones from China, it would be the one from Elon Musk
Thankfully it's not an either / or, I don't trust any models. This is a healthy attitude to have because you shouldn't trust anyone on the internet either, especially when it comes to specific subjects.
I don't trust this. But by not trusting it I am inherently trusting it. But by trusting it I shouldn't.
That's definitely a good approach. Although I get a little concerned about the resources put into convincing people that models (and especially Grok) are accurate. For example, X's "fact checked by Grok" approvals, which I've unfortunately heard people reference as meaningful.
Politically motivated models can still do a lot of damage that affects me (or "have a lot of impact" depending on whether you like the politics or not) even if I don't engage with them myself.
why?
You’ve either been under a rock for the last few years, or this is a really poor attempt at the Socratic method.
guess I've been under a rock. I avoid a lot of corp media. I know about the DOGE thing, but that seems like a pretty weird reason to hate the guy.
I can't believe you're this obtuse.
Maybe jpadkins salutes as Elon does
Because the same rocket man this crowd was worshipping a decade ago is bad now. And by extension, everything anyone who works for him does must also be bad and evil.
It can now quote "Mein Kampf" in over 21 languages!
Is this now a reliable product or will it still produce errors?
This project is a gigantic waste of resources: it's fine-tuned on the CEO's politics, was used for CSAM generation, and just sucks overall
It's a model made for 36% of Americans. The rest of the world couldn't care less.
Considering how few Americans there are and how little of that 36% even uses technology, that's what, 20 million people at a maximum?
That seems like a decently sized market. Maybe not for an AI lab though.
Sure, it's a good market for a normal company. For a social media company it's pretty isolated and really limits the products that can come out. But their current selling points (propaganda, CSAM, and psychosis engagement) are quite strong amongst that population.
I like that there are models with divergent politics; the status quo being creepy corporate left silicon valley is not healthy or pleasant to interact with.
Even with Grok it's only broadening things to the creepy corporate right of Silicon Valley.
Silicon Valley...left? Huh?
I'll take the fake corporate "left" over white supremacy any day.
The resource waste he's talking about is horrendous, read more here: https://time.com/7308925/elon-musk-memphis-ai-data-center/
Reading this thread is reinforcement that most humans care zero about anything at all as long as they get what they want. This is a company whose owner has thrown a Nazi salute at a US electoral event. A guy who has aligned himself with and attempted to prop up far-right authoritarian governments. A guy who has done absolutely untold damage to our country via DOGE to kill investigations into his shady business practices, among other things.
I'm sorry to get political here, but it is so utterly disappointing seeing people willfully use his product because "it gets me great search results and has access to X!". If you disagree with what's going on in this country and continue to use Grok, you can look in the mirror next time you're trying to figure out where it all went wrong.
Well, about a third of Americans lack the moral clarity to actually disapprove of what’s happening in this country.
If you actually believed what you just wrote, it would preclude you from using any LLM produced. Maaaaybe Mistral?
Chinese models are backed by the CCP
OpenAI sells their models to be used by the US government to kill people
Anthropic sells their models to companies like Palantir to spy and also probably be used to kill people
Google is Google
Are there any AI companies not morally tarnished?
xAI produces yet another subpar model. Whoopee.
How do the Grok models fare in coding challenges compared to, say, GPT 5.5 and Opus 4.6/4.7?
I hate giving Elon any money. The man is a net negative to society, but... if the models are objectively better, then logically I must, no?
Logic can't tell you what your objectives should be, only how to achieve them.
Fair. Anyway I’ll look at benchmarks.
All the downvotes are from Elon stans. Think on your sins. ;-)
Oh, I dunno - I haven't downvoted it, but if I did, it would be for the idea that you "have to" give money to someone you don't want to just for a slight improvement. That's garbage. You don't have to. It's okay--no, it's _good_--to give your ethics a role in your decisionmaking.
I just refuse to use Grok after seeing Elon Musk openly manipulating its output.
ChatGPT would conveniently throw an error when asked about allegations against Sam. Claude doesn't like openclaw, refusing requests or charging extra if it sees the word.
IMO Elon's manipulation is nothing compared to that.
Every provider does this. There is no "neutral culture".
This is barely on-topic so I'll keep it ultra-brief: I believe it is unethical to financially support Elon Musk. I won't do it, and I'm sad that so many do.
Do you say this on every Grok/Tesla/SpaceX post or has something here prompted you differently?
I do say it on every (or most) of the posts I see. There actually aren't that many. I also don't read HN every day.
Grok can take clothes off from any picture of a woman. Therefore, I will never use Grok. I don't know how anyone feels comfortable using this product.
A wrench can be used to kill people. Therefore, I will never use a wrench. I don't know how anyone feels comfortable using a wrench.
That's not a great comparison. Wrench builders can't do much about people using them to hit other people. LLM builders can do a lot to prevent nudification attacks.
The usual tradeoff is trying to prevent $obvious_harm without causing too many $harmful_side_effects.
What are the harmful side effects of preventing nudification attacks?
It can also do that for any picture of a man.
The human mind is capable of the same thing, you know? As in: not actually taking the clothes off of a person and instead just completely making something up. I hereby give permission to all AI, and human minds, to completely make up what I look like naked.
not just women, but also children. So glad you commented this. It's crazy the mental gymnastics people are doing to still support this company after everything, like the platform was filled with nonconsensual sexual material of people.