sonzohan 23 hours ago

I also recently decided to buy a datacenter GPU and slap it into a system. Some notes from my experience that the author doesn't mention in their article:

Decommissioned NVIDIA V100s and AMD MI50s are fairly cheap, $200 for 16gb and $400-500 for 32gb, for local experimentation. They are also very old. There's an enthusiast community keeping these two cards alive and working with current platforms and models.

Nitpick, but the V100 doesn't support bfloat16. The performance hit is not a big deal if you're fiddling with local models, but the card is on its way out in terms of hardware features.

The MI50 does support bf16, but the current edition of AMD ROCm does not support the card. Vulkan support is good and the MI50 works with most major platforms (llama.cpp, vllm, etc.), but it's not without some pain points like manual recompilation. Fortunately the open source community has already paid most of your way.

The cooling requirements for these cards cannot be overstated. A consumer grade GPU may throttle if in a small case without additional fans, but if given the same treatment a datacenter GPU will overheat itself idling. You will need to buy, at least, a bunch of decent 120mm fans to prevent this or invest in some water cooling.

I ultimately went with an AMD MI100 32GB ($950). I'm an AMD fan, current ROCm editions support it, and it was low-fuss to get things working. I'm debating getting a second so I can try out bigger models like qwen3-coder-next.

  • Silagi 22 hours ago

    Did you consider the R9700 or B70 when you went for the MI100? If so, what made you choose the MI100?

    I've been playing with picking up a card in this class but haven't been able to justify it when running the Qwen3.6 MOE model on a 6800xt is tolerable for the type of projects I've been willing to point local AI at.

    • sonzohan 21 hours ago

      I looked at those, the Arc 1100, the w6800, MI50, MI60, v100, v620, and basically anything with 32gb of RAM:

      1. I wanted an AMD card.

      2. I have an RTX 3090 that's been fun to play with, but I want to get back to using it for gaming.

      3. I was looking for between 30-60 tokens/second in terms of performance on the beefier models I want to run. Looking at stock Qwen3 32B the benchmarks reported about 41 tokens/second for MI100. w6800 was 18, MI50 & MI60 could do 60s but had a lot of compromises/special things to achieve that.

      4. I used FitMyLLM for some spec-based comparisons (https://www.fitmyllm.com/). The MI100 is roughly double the performance on Qwen 3.5 35B A3B Q5_K_M to the R9700 (462 token/s prefill vs 239 tokens/s, 217 tokens/s vs 118 token/s for inference)

      5. I was willing to throw up to $1k at a GPU; I really wanted to throw closer to $650.

      To be honest, if money were no object I would've sprung for a MI210. I also considered the MI250 as they showed up for $1250-1400 with a whopping 128GB, but the PCIE converters for that form factor don't have working AMD drivers yet.

      • rft 21 hours ago

        > The MI100 is roughly double the performance on Qwen 3.5 35B A3B Q5_K_M to the R9700 (462 token/s prefill vs 239 tokens/s, 217 tokens/s vs 118 token/s for inference)

        Those prefill numbers look really low to me. I can run nearly that same model (qwen 3.6) at q4km with q6 cache on a single 3090 and get 2.3k-4.4k prefill and 100-170 generation. Just based on raw numbers I would expect the R9700 to land around 70-90 generation (about 2/3 of memory bandwidth of a 3090) and at least the same or higher prefill (nearly 3x FP16 TOPS on the R9700). That means the numbers really don't add up. Is the benchmark done with some special settings, e.g. parallel requests or with very low prompt length?

        • sonzohan 20 hours ago

          Numbers are from https://www.fitmyllm.com/ so they're not a real hardware benchmark just what you're expected to get. YMMV.

          • rft 20 hours ago

            Ah, ok. I took a look at the 3090 numbers and they list 400 tok/s prefill, so if I normalize my expectations to that base line the numbers you posted do make sense. I haven't dug deep into that site's methodology, but their estimates seem way off. Especially since they don't take into account cache quant when deciding whether or not you can run a model. Overall I found that website a bit confusing, but maybe the UX just didn't click with me.

  • doubled112 21 hours ago

    > if given the same treatment a datacenter GPU will overheat itself idling

    I have a friend who has learned this through several server grade cards over the years.

    Yes your Intel 10G NIC was cheap. No you cannot just stick it in your desktop. It is expecting server level airflow, probably with a cold intake side.

    He printed a fan mount, slapped it on, and they’ve been happy together since.

    • cthalupa 20 hours ago

      I've had multiple X520-DAs in desktop cases for years without doing anything special for cooling. Hell, one of them was in a fully watercooled system that had very little in-case airflow.

      • p_l 19 hours ago

        The -DA might have been critical in them doing so well ;-) -TA2 might have heated somewhat mightily (only recently lower thermal power twisted-pair gear started showing up)

    • gerdesj 19 hours ago

      Show your friend QSFP and QSFP+ ...

  • kethinov 21 hours ago

    qwen3-coder-next runs fine on my consumer grade nvidia 4070. Performance is not spectacular, but it's only a little bit slower than a properly-fit model.

    • sonzohan 21 hours ago

      What are your settings and tokens/second? Even with 2 GPUs (MI100, RX 6600 XT 8GB) and 32GB of RAM it was running at a snail's pace for me.

      I didn't try a sched_spread with a 3090 and the MI100 which would provide 56GB ram

      • kethinov 16 hours ago

        It's not speedy. I get 1-3 tokens per second.

        The machine:

        CPU: 24 × AMD Ryzen 9 9900X 12-Core Processor

        RAM: 128gb

        GPU: NVIDIA GeForce RTX 4060 Ti 16gb (I typo'd the GPU above)

        (This is via Ollama on Ubuntu.)

        But 1-3 tokens per second is much faster than a lot of other high end models I've tried, so I was pretty pleased with it. Obviously other models run much faster on this hardware though.

  • overfeed 20 hours ago

    > You will need to buy, at least, a bunch of decent 120mm fans to prevent this or invest in some water cooling

    There's a cottage industry of 3D-printed fan-shrouds for data center GPUs - 120mm are often the sweet spot for quietness and practicality. The shroud snugly fits the GPU's intake, so it gets all the airflow from the attached fan(s), whose speed curves can be tied to GPU temperature.

    • sonzohan 19 hours ago

      Shoutout to the makers and hackers supplying those!

mickeyp 1 day ago

Impressive work. But the problem is not the 30 tok/s which is fine for agentic coding and chat.

It's prefill; slow prefill kills agentic workloads dead.

If you have 100,000 tokens at ~150tok/s per the OP, you're looking at:

    You have: 100000 / (150/s)

    You want: hms

     11 min + 6.6666667 sec

Which is quite a wait indeed.
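
For anyone who wants to plug in their own numbers, here is the same arithmetic as a small Python sketch. The function name `prefill_wait` is made up for illustration; the 150 tok/s figure is the one quoted from the OP above.

```python
def prefill_wait(prompt_tokens: int, prefill_tok_per_s: float) -> tuple[int, float]:
    """Return (minutes, seconds) needed to prefill a prompt at a given rate."""
    total_seconds = prompt_tokens / prefill_tok_per_s
    minutes, seconds = divmod(total_seconds, 60)
    return int(minutes), seconds


minutes, seconds = prefill_wait(100_000, 150)
print(f"{minutes} min + {seconds:.1f} sec")  # 11 min + 6.7 sec
```

This only counts a single cold prefill; anything a prompt cache can reuse would not be paid for again.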

  • Aurornis 1 day ago

    Most people won’t be dumping 100K tokens into it at once, but I agree that all of the prefill time that adds up during a session becomes a lot to account for.

    This is also a problem for all of the Mac local LLMs. Macs are a great way to get a lot of high bandwidth memory, but their compute is very far behind current gen dedicated GPUs. Some of the expensive Mac Studio setups allow you to run very large models with usable tokens/s, but you can be waiting a long time for it to get to the point of generating those tokens.

    • Tepix 23 hours ago

      When you're using OpenCode it's easy to reach 100,000 tokens after a while.

    • pyrolistical 20 hours ago

      If the prefix cache is working properly, 100k doesn’t prefill more than once

  • HarHarVeryFunny 23 hours ago

    I wonder if this could be usefully mitigated with a combination of prompt (prefix) caching and an agent that let you control what the prompt prefix consisted of. The goal would be to incur that slow prefill once to build the prompt cache, then have subsequent prompts consist of mostly this fixed prefix plus specific instructions.

    For a language like C++ where modules are split into definition (.h) and implementation (.cpp) parts, one choice of prefix would be all the header files for the project (which aren't likely to change much).

    More generally the idea would be to have an agent that had cached-prefix reuse as its primary context management goal.

    Another possibility, to support caching of files that have since changed, would be for the agent to build the context as a fixed prefix reflecting some or all of the codebase in its start-of-session state, then append any changes to that, with appropriate prompting to only use the latest definition of a function.

    e.g.

    Say file A initially contains functions X, Y and Z, then the prompt prefix is built to include X Y Z. If the user then modifies Y -> Y', then just add that to the context, so that the cached prefix is unchanged, giving X Y Z Y'.
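
    A rough Python sketch of that append-only layout, just to make the idea concrete. Everything here (`build_context`, `shared_prefix_len`, the sample function bodies) is hypothetical, and it compares characters rather than real tokens; an actual inference server would match on token IDs.

    ```python
    def build_context(snapshot: list[str], changes: list[str]) -> str:
        """Start-of-session snapshot first, later edits appended after it."""
        return "\n".join(snapshot + changes)


    def shared_prefix_len(old: str, new: str) -> int:
        """Length of the common prefix, i.e. the part a prefix cache could reuse."""
        n = 0
        for a, b in zip(old, new):
            if a != b:
                break
            n += 1
        return n


    snapshot = ["def X(): ...", "def Y(): ...", "def Z(): ..."]
    before = build_context(snapshot, [])
    after = build_context(snapshot, ["def Y(): ...  # Y', latest version wins"])

    # The whole original context is still a prefix of the new one,
    # so only the appended Y' would need to be prefilled.
    assert shared_prefix_len(before, after) == len(before)
    ```

    Rewriting Y in place instead would cut the reusable prefix off at the first changed character, which is the case the append-only layout is trying to avoid.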

  • pastage 23 hours ago

    A quick search says that this is a standard feature: you cache the prefill and load it at PCIe bandwidth, so it should be about 0.2s

  • anigbrowl 16 hours ago

    Can't you structure things like loading a codebase or priming with reference material to happen overnight or during meal breaks etc? I guess it's frustrating if you want to switch to a project and have the LLM begin co-working with you immediately, but even the best human collaborator would require a long period to get up to speed before being able to make meaningful contributions.

Teknomadix 1 day ago

Tesla V100 SXM2 16GB is NOT DGX class as the author writes. It's HGX class. The V100 comes in two classes, SXM2 and SXM4, the latter coming with a Max of 80gb on board memory. Typically these are installed 8×A100 80GB SXM4 on an HGX riser, and what that gives you is NVSwitch fabric and 640GB of pooled HBM2e (on package stacked memory /w ~2 TB/s of memory bandwidth). 2u standard rack footprint too.

  • legitronics 1 day ago

    I have no idea what you are trying to say.

    V100 came as sxm2 and sxm3. And it was 16 and 32gb.

    HGX is DGX with extra toppings.

    • _zoltan_ 22 hours ago

      No. DGX is when it's Nvidia's design, HGX when it's a 3rd party design.

  • _zoltan_ 22 hours ago

    What on earth are you talking about? Your comment makes no sense.

    The V100 and A100 are different generations altogether.

    The V100 does not have 2TB/s.

abejfehr 1 day ago

Based on the title I was really hoping to see how this was used for gaming, but they just ran an LLM on it

  • axpy906 1 day ago

    Same. With no new NVIDIA gaming GPUs this year, seems like an interesting problem to solve.

  • mschuster91 1 day ago

    I don't think that is even possible, every piece of silicon on that chip that is required to do gaming is ripped out in favor of more compute cores.

  • darkwater 1 day ago

    They said in the beginning that it doesn't even have a video out, so you cannot do gaming.

    • yjftsjthsd-h 1 day ago

      I thought you could run games by rendering on one GPU and outputting on another? Usually comes up with dual iGPU/dGPU setups, but could work here

    • toast0 1 day ago

      I've seen things where you have multiple video cards and can use one gpu to render to a framebuffer which is transferred to the other video card to output. I'm sure it adds latency, and it's probably unsupported... But no output doesn't mean can't do gaming... It just means gaming will be iffy.

      There's some virtualized desktop server stuff too. Run a bunch of desktop sessions on a beefy computer and send a video stream to desktop players. With the right codec settings, the latency is probably ok for many games.

    • hakfoo 23 hours ago

      I'm actually surprised there hasn't been a dedicated effort to support display offload to, say, the CPU's iGPU.

      I'm sure manufacturers would love saving a dollar per card, and OEMs would appreciate eliminating the support calls from "I just bought a new $2500 gaming PC and no video" because they plugged the monitor into the iGPU instead of dGPU.

      • NortySpock 20 hours ago

        I have occasionally wondered the same thing.

        Thinking about it more, on my setup I have a DVI port on the motherboard that I would be happy to use with a DVI cable, but I instead need to buy a DisplayPort <-> DVI converter cable to plug directly into my video card...

        Yeah, seems like an obvious thing for some motherboard providers to want to provide.

      • p_l 19 hours ago

        This is exactly what "Optimus" and "hybrid graphics" is, the issue is that you need to configure that - laptops will provide information to OS "hey, this card has no video output" or "hey, there's an output MUX connected to output X on iGPU and output Y on dGPU", and drivers pick that up and know they have to setup transferring frames between the two or trigger the mux etc.

        nVidia has also used the datacenter cards to run GeForce Now, at least for some lines of the cards, plus some of them come with license (or you can buy it extra?) for nVidia GRID that provides more flexibility for multi-instancing etc to run in virtual desktop

    • lightedman 23 hours ago

      Never a problem. RemoteFX does (did) everything you'd want. Make your OS, log in remotely through an accelerated client. The real problem is Microsoft did something around Windows Server 2008 R2 that killed performance (literally halved it) for RemoteFX. You're only now reobtaining the virtualized video performance we used to have back in 2008.

matja 1 day ago

The AMD MI250X GPUs are also interesting - 128GB of HBM2E at 3TB/s, sometimes you see them second-hand for under $1k, the catch obviously is that it needs an OAM socket. Never seen an easy way to hook them up to a regular mainboard.

  • Teknomadix 1 day ago

    These are interesting, and offer beefy throughput. No point in adapting to a PCI lane though, stuck behind the slot-bus bottleneck.

  • Gracana 1 day ago

    An additional complication is that MI250Xes are two GPUs in one package, so you need to connect the first and last x16 SERDES groups to the host, otherwise you'll only see one GPU (or it won't work at all, idk).

    Also, the cheap HPE pulls on eBay need some proprietary HPE magic to work, and I have yet to see anyone figure that out.

  • plagiarist 1 day ago

    Ahh luckily this OAM socket will prevent me from spending money.

  • sonzohan 21 hours ago

    This person has built a converter for the OAM socket, but it is only confirmed working with NVIDIA cards at the moment (https://www.reddit.com/r/NVIDIA_SXM2PCIE/comments/1d076cn/oa...)

    It fits an MI250X, and the system sees it, but the drivers don't work. They tested an HPE MI250X. There's a rumor on the thread that there are two kinds of MI250X: Ones from HPEs and everyone else's. The HPEs require a special firmware, the normal ones do not. However, the majority of the MI250Xs on the secondhand market are HPE so caveat emptor.

mondainx 1 day ago

Great write-up, I've often considered these DC cards for a project and now you've convinced me to pick one up; you describe the price of the unit against what one spends on tokens and that does it for me.

  • tymscar 23 hours ago

    That’s why I did it. I think it’s important to put things like that into perspective

lucamark 1 day ago

Congrats! Most people won’t want to debug drivers, kernels, ACPI, adapters, and fan headers. But for those who do, the capability-per-pound is absurd.

neutrinobro 17 hours ago

Very nice to see this older hardware getting repurposed. I have been running 2x Tesla V100s in a dual-core supermicro X10DRU-i server. With qwen3.6-27B-mtp I get about 35-40tok/s for inference for moderate context sizes (<128k), and have run long running agent tasks on it which consume 100s of millions of tokens (>$100s if I had to pay claude API costs). However, the main purpose that I have for these cards is scientific compute; the FP64 performance (7+ TFLOPS!) is fantastic given their age, and not something you can get on even the latest consumer grade cards since Nvidia nerfed their performance after Kepler. The server lives in the basement though...it is freaking loud!

bob1029 1 day ago

> And yes, if you want the absolute best, Opus 4.8 exists. It also costs more per 20 minutes of heavy use than I paid for this entire GPU and adapter setup combined. But the gap is shockingly small.

I don't think this is a fair characterization of the situation. I use frontier models via API pre-paid tokens every single day, and I can barely rack up $100 per month. The fact that we figured out how to burn double this in 20 minutes is impressive, but I don't think it reflects the reality that many are experiencing right now. There are some exceptionally gluttonous approaches to harnessing LLMs that I think are serving as convenient straw men in these discussions.

Paying for the API will almost always be more economical than self-hosting equivalent infrastructure. I am not against self-hosting, but the article suggests a primarily economic motivation for this effort. If you are consuming fewer than 10^9 tokens per month, I really don't think it's worth your time to try and compete with the hyperscalers. Most of the money is to be found in the integration of this technology with existing businesses.

  • vidarh 1 day ago

    I use hosted providers myself, but I can churn through $100 worth of tokens in half a day even with cheap models like Deepseek easily. If someone's use is as light as yours, then sure - grab a subscription and you'll save far more. For higher use it will come down to how cheap your electricity is whether it is worth offloading at least some of it (for me it's not, FWIW)

    • iJohnDoe 1 day ago

      Could you share a bit about what you’re working on or what type of projects require that much usage? Is it hobby, production, revenue generating?

      • vidarh 1 day ago

        A mix. I have hobby projects that churn through that much when I don't need the tokens for other things. I also have projects for clients that easily consume those levels. As well as a stealth-ish potential startup. Currently I'm at 4 different subscriptions + more than I'd like in spend via OpenRouter...

        What multiplies it very quickly is when you start feeding them with test suites and "Ralph loops" that run until the test suites pass, or complex chains with lots of sub-agents being triggered.

        If you're sitting there watching everything, it will be hard to burn all that much even if you're running multiple things in parallel.

        • codebolt 21 hours ago

          I'm skeptical of letting agents run free like this. Even Opus makes decisions I don't always agree with. And I quickly lose my mental model of how the code is evolving.

          I get more enjoyment and better results when the coding process is me and the agent working through a plan, at each step sparring over what to do next and how. Then I also catch the bad decisions before they manifest in the code.

    • solenoid0937 21 hours ago

      Same, very surprised when people on HN are shocked by high token burn - it's really not hard if you've figured out how to use LLMs!

  • oceanplexian 1 day ago

    Claude is something like $35 per million tokens. If I was using API pricing I could trivially spend $100 in a single hour long coding session, with /fast turned on in about 10 minutes. Not sure how you guys are using it.

    • foolfoolz 1 day ago

      coding is the easy part of using claude

    • MattRix 1 day ago

      Opus is normally $5 per mtok, no idea why anyone would use /fast if they were at all concerned about price. ($5 is still pricy though tbh)

      • krzyk 1 day ago

        Opus is $5 per mtok of input tokens, but $25 for output.

        • MattRix 2 hours ago

          Yes, but input is usually what people are talking about since that is the vast majority of token usage.

  • KronisLV 21 hours ago

    > I use frontier models via API pre-paid tokens every single day, and I can barely rack up $100 per month.

    According to ccusage (https://github.com/ryoppippi/ccusage) if I didn’t have the 100 USD Max subscription, I’d have to pay Anthropic around 4173 USD for the month of May.

      Input     │ Output     │ Cache Create │ Cache Read    │ Total Tokens  │ Cost (USD)
      1,948,016 │ 19,435,081 │ 103,626,350  │ 6,244,194,278 │ 6,369,203,725 │ $4173.09
    

    Edit: pulled the latest numbers, not using Fast mode at all, but still Opus for most tasks.

    Nothing too egregious with my usage patterns, typically Claude Code just churning tasks in 1-2 projects at a time, sometimes while I’m asleep - and I hit around 60-80% of the weekly caps most of the time.
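
    For a sense of where those tokens go, a quick Python check on the ccusage row above. The variable names are mine; the figures are copied straight from that output.

    ```python
    usage = {
        "input": 1_948_016,
        "output": 19_435_081,
        "cache_create": 103_626_350,
        "cache_read": 6_244_194_278,
    }
    total_tokens = sum(usage.values())  # matches the reported 6,369,203,725
    cost_usd = 4173.09

    # Share of all tokens that were served from the prompt cache.
    cache_read_share = usage["cache_read"] / total_tokens
    # Average price actually paid across every token type.
    blended_usd_per_mtok = cost_usd / (total_tokens / 1_000_000)

    print(f"cache reads: {cache_read_share:.1%} of all tokens")
    print(f"blended cost: ${blended_usd_per_mtok:.2f} per million tokens")
    ```

    That works out to roughly 98% cache reads and about $0.66 per million tokens blended, so the headline per-million prices only tell part of the story for this kind of usage.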

    • bloudermilk 21 hours ago

      How do you orchestrate this? I’m on max and would love to be hitting my caps when I’m not actively working a project

      • KronisLV 21 hours ago

        In my case: the Claude Code desktop app makes having a bunch of parallel sessions easy, at least compared to when I had just a bunch of terminal windows open https://claude.com/download can also couple that with Remote Control https://code.claude.com/docs/en/remote-control

        Previously I still had the issue of it occasionally stopping let's say after Stage 2/7 is done in some plan and asking me to continue, though I was asleep. The options there were either looping it (like RALPH loop), or more recently they also released their dynamic workflows alongside Opus 4.8: https://claude.com/blog/introducing-dynamic-workflows-in-cla... and now I just use that.

        So essentially you come up with a plan and just ask it to create a dynamic workflow for you, and it's gonna go through everything step by step, sometimes parallelizing (as it normally would with sub-agents) as necessary. Can also use worktrees if needed.

        Here's an example of the UI: https://imgur.com/a/4Gr3Z2T (note that I'm using DeepSeek there for a small local utility, with a tool I'm using for managing various providers with Claude Code, but works the same with subscription)

        I looked at the stuff Cline was doing with their Kanban boards too, but in the end realized that I don't really need those (for now) and that Claude Code is enough.

segmondy 1 day ago

The most interesting and perhaps useful for most would be how they control the fan. If you are thinking of doing this, you really want to get those fans under control, they are loud. For anyone thinking of these, v100s idle super high! 25-35watt with nothing loaded and easily 50w when a model is loaded.

omarqureshi 1 day ago

Could probably avoid the crazy fan with a waterblock - I've seen a whole kit, v100 + PCIE adapter + block for £235. Yes, you'll have to pay for pump, radiators and radiator fans, but that should really quieten it down

suralind 21 hours ago

This is great! I've been trying to get into local models for a while as I share the sentiment that local models will eventually be so good that there won't be a need to use frontier models for most coding tasks (perhaps that's already true today?).

I have zero experience building computers - where would I even start? I mean, aside from the things already well documented and mentioned in the blog post.

  • tymscar 17 hours ago

    Building computers is very easy. I would suggest watching a YouTube video to get the general gist, and then once you buy the parts, just Google for whatever doesn’t go well.

    I built my first Pentium 4 one when I was like six, so I’m sure someone much older that’s into tech can do it without an issue.

    There are also tons of Discord communities that are willing to help you live if you encounter any issues.

jonhohle 1 day ago

I was just looking into this and was worried about the fan setup. Interesting that he was able to solve it with good results.

In case anyone is interested, I’m using PCIE passthrough on a FreeBSD host to a Linux guest with an older Pascal card. It’s worked great and I’ve been thinking about putting a nicer card in there. The SXM route seems great, but I’ve been burned (almost literally because of the heat) by DC components before.

j4k0bfr 5 hours ago

Thank you for writing this! And sharing the NixOS config. I've been looking for an excuse to get into Nix for ages, but this might finally do it.

P.S. Man the AI writing complaints on this thread are quite upsetting. OP is handling it a lot better than I would, lol.

mettamage 23 hours ago

> The way it works is that a vision encoder (similar to what ChatGPT and Claude use) takes image pixels and translates them into the LLM’s token embedding space. The model does not “see” the image the way a human does. Instead, the vision encoder compresses the image into a sequence of vectors that live in the same mathematical space as text tokens. The LLM then processes those vectors as if they were just another sequence of tokens.

Could you also do this for music and specifically sound synthesis? It would be awesome to vibe synthesize sounds and then see the VSTi parameters surrounding it.

  • Ey7NFZ3P0nzAe 9 hours ago

    I don't think so. Cramming new senses into the latent space of the model is one thing, but having a model output tokens that can be detokenized into sound is completely different and requires a very different type of data.

rbanffy 18 hours ago

This lack of driver support for hardware that’s still functional and available in reasonable quantities is annoying.

I was very close to buying a retired POWER7+ server with an ungodly amount of memory, but decided being unable to run a modern Linux kernel would be more work than I wanted to have. Modern kernels need POWER8 and above.

whoamii 1 day ago

The real question: did your local LLM write this post?

  • 20wenty 1 day ago

    There are many tells aren't there? There was clearly hard human work and experimentation here, but it's a shame the OP let AI do chunks of the writing. Once you see it, it's much harder to take the post seriously.

    • iugtmkbdfil834 1 day ago

      I disagree. Not everyone has a good writing style. In those instances I think it is fair to default to llm recommendation. We may be allergic to it, but we saw one formulaic response too many ( though admittedly it does raise a question of whether HN was the intended audience for it ).

      In any event, not all of us have a unique writing style worth preserving just like not all of us can write clear and clean code. Just saying.

      • unshavedyak 1 day ago

        I really wish it was more common to use AI for augmenting than authoring. Eg i find coding with LLMs neat when you primarily "talk" to it through code, by filling out structs, funcs, fields, etc - where it would use your changes as the template and then to work to effectively autocomplete the gaps. The more you iteratively write the less it fills in, but also the less it deviates from your intent, design, etc.

        I feel like writing could use a similar harness, where it attempts to minimally reword the authors sentences, perhaps just tweaking grammar, spelling, etc. In the coding example i think the human code would be near unchangeable, the LLM would pivot around it - but in the writing example i think the human writing would have to be more mutable. I imagine it would be a configurable setting.

        I've not really seen a system which focuses on this human<->LLM loop, but it feels interesting to me.

        • iugtmkbdfil834 1 day ago

          In a sense, there is a clear market for it ( people want 'authentic' experience ). I can kinda understand it. I want pure linux experience without systemd, but I recognize that in the current ecosystem, it comes at a cost.

          So the language harness makes sense to me, but corps are already cracking down on token use ( and such a harness would likely only add to the cost ). The other question is whether the people who could benefit from it would even recognize it as a problem though.

          • yjftsjthsd-h 1 day ago

            > I want pure linux experience without systemd, but I recognize that in the current ecosystem, it comes at a cost.

            Running Alpine/Gentoo/Devuan isn't that expensive. (I'm assuming the cost is time/effort when I say this; let me know if there's another relevant metric)

            • iugtmkbdfil834 22 hours ago

              No, you are right on point. I think I reached the same level of 'troubleshooting fatigue' my buddy did ( but he does that for a living, which adds another layer to this ). At certain point, I just want stuff to work. And right now at least, systemd provides least amount of annoyance in terms of time spent chasing issues on home machines.

              FWIW, I tried Void and Devuan, but that may have been too early for me then. Naturally, now that stuff mostly works, I am debating whether I can make that attempt again;p

      • gsquaredxc 1 day ago

        It’s not about preserving a unique writing style. When I see LLM writing my brain automatically discards the content of the writing. To me, seeing LLM writing is equivalent to going to a high-end restaurant and getting served on generic paper plates. Sure, the food looks perfectly fine and there is, in theory, nothing wrong with a paper plate. Once you see that paper plate, however, you will question how nice that establishment actually is, because a lack of care for the plates undermines the quality of the food. You automatically categorize all establishments that serve on paper plates in a specific category, one that might make you concerned if you will get food poisoning that night. LLM writing is exactly the same way for me. I don’t know if this LLM-assisted piece of text is actually a Michelin three star establishment or has had several health violations in the last year. However, I didn’t pay for it, so putting in effort to determine if it’s LLM-assisted writing from an expert or just LLM slop that isn’t from the purported author at all isn’t worth the time.

        I’m much more willing to read typos and bad writing than LLM writing. If I want to read the LLM rewritten version, I can run an LLM over the original writing myself. I have not yet found it true that anyone is better at prompting than anyone else in a way that suggests that I wouldn’t get substantially the same results myself. Thus, I don’t think providing the version that has passed through the telephone game is accomplishing something that couldn’t be done by readers later. I have spent the vast majority of my life reading the original writing styles of people and didn’t have an issue then. I’m not convinced a problem I had was solved when we started post-processing writing with an LLM.

        • lukeschlather 23 hours ago

          I skim a lot. I skimmed this article and appreciated the author documenting their process. I am indifferent to LLM or human writing for technical content. I suspect I skimmed most of the LLM parts, but judging writing quality was not why I read this post, I read it because I was curious about how useful the GPU is, and if I could replicate the author's work. Some carefully written prose wouldn't have helped me do that any better. The prose in this article did the job.

          • iugtmkbdfil834 23 hours ago

            This is mostly how I feel about it. If anything, the weird llm jitters served almost like punctuation markers. Still, I get why it riles some people up.

    • xp84 1 day ago

      (TL;DR Can we just judge written works by their actual content?)

      I’m really in the “who gives a shit” camp on something like this. A lot of people probably have an LLM punch up a blog post. It is good at turning bullet points and notes into prose, fixing run-ons, etc. Maybe I’m naive but I trust that the kind of person who posts a clearly noncommercial post like this on HN gives a crap enough that they read the final draft and confirmed it isn’t inaccurate.

      This pearl-clutching about the mere use of AI regardless of how responsible or appropriate the use is, seems like a professor in 1985 throwing an essay back in a student’s face as “this was obviously printed from a computer and not typewritten like a PROPER essay! I can tell just by looking at it!”

    • tymscar 23 hours ago

      Not at all, no. I had this chat before about how I am one of those unlucky few that loved the way LLMs write nowadays since the mid-2000s.

      Slowly but surely, I had to remove my beloved lists, emojis (though LLMs do less of that now, maybe I can incorporate them back), and emdashes.

    • lowbloodsugar 22 hours ago

      Oh then please stop reading. There are many of us who are really good at solving complex problems and also really bad at communicating them. Your attitude is just the latest bastion of bigotry. So do feel free to self-select out of useful knowledge and experience.

    • Gormo 3 hours ago

      > There are many tells aren't there?

      There are no usable tells that apply generally in the first place. Pretty much all of the hyped-up memes in circulation about how to detect LLM output are highly unreliable.

      LLMs write the way they do because they are trained on common patterns of human writing. All of the tropes people point to in LLM output are there precisely because they've already been in widespread use for some time.

jmyeet 1 day ago

Some context:

- In 2017, the v100 was a ~$10,000 GPU. I believe there was a PCI-e version but this is probably so cheap because SXM2 is going to be harder to use;

- A 5090 has 1800GB/s of internal memory bandwidth (compared to 900GB/s in the 9 year old GPU). Of course a 5090 is substantially more expensive;

- A 5090 has ~21k CUDA cores vs ~5k;

- The current $10k NVidia GPU is the RTX 6000 Pro w/ 96GB of VRAM. It has slightly more CUDA cores but it is otherwise pretty much just a 5090. This is unsurprising. NVidia uses VRAM for market segmentation.
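
For anyone who wants the ratios behind the list above, here is a rough sketch using only the approximate figures quoted in it. The `decode_ceiling_tokens_per_s` helper and the 15 GB weight size are assumptions added for illustration: it treats token generation as purely memory-bound (every weight byte read once per token), which gives an upper bound rather than a benchmark.

```python
# Approximate figures as quoted in the list above.
V100 = {"bandwidth_gb_s": 900, "cuda_cores": 5_000}
RTX_5090 = {"bandwidth_gb_s": 1800, "cuda_cores": 21_000}

# Hypothetical size of the model weights in GB (an assumption, not from the thread).
HYPOTHETICAL_WEIGHTS_GB = 15


def ratio(new: dict, old: dict, key: str) -> float:
    """How many times larger `new` is than `old` for the given spec."""
    return new[key] / old[key]


def decode_ceiling_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on tokens/s if every weight byte is read once per token."""
    return bandwidth_gb_s / weights_gb


bandwidth_ratio = ratio(RTX_5090, V100, "bandwidth_gb_s")
core_ratio = ratio(RTX_5090, V100, "cuda_cores")

print(f"bandwidth: {bandwidth_ratio:.1f}x, CUDA cores: {core_ratio:.1f}x")
for name, gpu in (("V100", V100), ("5090", RTX_5090)):
    ceiling = decode_ceiling_tokens_per_s(gpu["bandwidth_gb_s"], HYPOTHETICAL_WEIGHTS_GB)
    print(f"{name}: at most ~{ceiling:.0f} tokens/s on {HYPOTHETICAL_WEIGHTS_GB} GB of weights")
```

Under that memory-bound assumption the newer card is only about 2x ahead on generation speed even though it has roughly 4x the cores, which is part of why the old card still looks attractive for local inference.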

Consider this: in 5-10 years, the trillions spent on AI data centers will likewise be sold for scrap most likely. That's how short the runway is for OpenAI and Anthropic to recover that investment.

Anyway, I'm kind of impressed the author managed to get this all to work. I don't think it even would've occurred to me that someone had made an SXM2 adapter, particularly because it's not even used anymore. Like props to whoever did that.

  • b112 1 day ago

    I bet 3 years, but otherwise agree.

  • echelon 1 day ago

    > Consider this: in 5-10 years, the trillions spent on AI data centers will likewise be sold for scrap most likely. That's how short the runway is for OpenAI and Anthropic to recover that investment.

    Even more interesting: it'll devalue all of SaaS and the entire US tech sector.

    We might have just shot our most valuable non-AI tech products in the foot.

    • wholinator2 1 day ago

      How so? I understand that flooding the market with physical goods will reduce prices and thus profits. But how would that also reduce the nonphysical SAAS stuff?

      • mschuster91 1 day ago

        > But how would that also reduce the nonphysical SAAS stuff?

        The resulting economic crash will affect everyone, we're (IMHO) looking towards a dotcom-bust level wipeout. And many SaaS and other companies run asset-lean (i.e. they have no server hardware because that's all cloud, no real estate because it's all either wework or conventionally rented), margin-lean (the VC business model requires that, as the basic recipe is to achieve market domination by burning cash) and cash-lean (often enough, it's less than a quarter of expenses on the bank accounts).

        All that "lean-ness" looks great on an investor's quarterly release sheet: no massive amounts of wealth tied up in assets and no cash sitting around on bank accounts that could be released towards investors as dividends or, if it comes from third parties, costs the company interest... but it prevents resiliency against crises.

    • mschuster91 1 day ago

      > We might have just shot our most valuable non-AI tech products in the foot.

      Counterpoint: the fiber buildout during the dotcom boom. That crashed the economy pretty hard when the bubble burst, but we are still benefitting from all the dark fiber that was arranged for and built out back in that era. A lot of today's ISPs were able to grab up that fiber after the bust for cents on the dollar.

      Assume that OpenAI and Anthropic go bust, which at least one of them likely will, and possibly a fair few of the datacenters that are under construction will also collapse. Someone will be able to snatch these physical assets again for cents on the dollar and run open-weight models on them or train new ones.

      The problem isn't (and no, this is not an AI tell, everything I write here got typed on a 2022 M2 MBA by hand) the assets, they will be put up for productive usage, just as with any other large bankruptcy or bubble in history. The problem is the "IOU" that is being passed from one hand to the next like a hot potato. Assuming a recovery of, maybe, 20% after the collapse, at 1.6 trillion dollars of assets under management by some kind of private investment/debt we're looking at about 1.3 trillion dollars in valuation that is going to be wiped out.

      And given that a lot of the investment market is actually backed by pension funds... this is going to be a bloodbath. Not only will there be a lot of people laid off in addition to the layoffs we already saw "due to AI", but when the pension funds and thus their payouts collapse? We'll see retirees flooding the employment markets who just try to make a living, rendering the situation for everyone else even worse. Flipping burgers used to be a gig for students, these days students compete with people of all ages desperate to survive - and thus desperate to undercut others in wages.

      Another problem will be the capacity buildout in the semiconductor industry. It's already heading toward an oligopoly after numerous boom-bust cycles: you only have two and a half GPU chip vendors (NV, AMD, Intel), two general-purpose CPU vendors (Intel and AMD - I exclude Apple because they do not sell their CPUs to any third party and ARM because 99% of non-Apple ARM chips do not go towards servers, desktops and laptops), three RAM manufacturers (Samsung, SKhynix, Micron) and two and a half physical chip manufacturers (TSMC, Samsung, Intel). When the AI bubble bursts, it will be one hell of an effort to prevent at least one actor from going bankrupt.

      [1] https://prospect.org/2025/11/19/ai-bubble-bigger-than-you-th...

      • hakfoo 22 hours ago

        You're expecting that there's going to be a supply collapse only, but there's a real risk the collapse hits both supply and demand.

        A lot of the current AI business is FOMO and vanity metrics. Nobody really wants to acknowledge the support tickets where the first three responses are the customer cursing because they didn't appreciate being handed off to a chatbot, or the reworks, or the compliance/policy/privacy concerns, or the internal friction and brand damage it's causing.

        Right now, a lot of that is being dazzled away by how "cheap" the alternative is, since it's built on an unsustainable cost base. It's like someone opened a "restaurant" where the food was actually supplied by making a bazillion new DoorDash accounts to claim promotional credits and having them drop the food at the "kitchen". During the initial phase, the customers will forgive that the burger was cold because it was $1.79.

        Once the funny money runs out and services start shuttering or pricing for actual profitability, people are going to ask about actual quality and return on investment. There will be a demand rollback.

        Even if you can do it cheaper with an open-model running on fire-sale hardware, we probably don't need 500 "chatbot listens and transcribes your meeting" services that weren't that much better than dictation software running locally on a Pentium III. We probably don't need AI-powered support experiences that manage to be worse than actually keyword-searching your company's Confluence. We probably don't need to be spinning up coding agents to spend 15 minutes discombobulating and bibblewabbling and re-reading 82 billion tokens of context before making a two-line change that an actual developer with learned experience in the code would make in 15 seconds.

        • mschuster91 17 hours ago

          > You're expecting that there's going to be a supply collapse only, but there's a real risk the collapse hits both supply and demand.

          That was what I alluded to in the last paragraph. Semiconductor industry and everything associated with it will get screwed hard.

          But in case you mean a demand collapse from the entire economy because even more people get laid off... yes, agreed. Dotcom bubble bust, here we come, full steam ahead.

          > We probably don't need AI-powered support experiences that manage to be worse than actually keyword-searching your company's Confluence.

          I'd pay good money for an AI that could actually ingest Confluence. In literally every organization that does not have a dedicated team to manage it on all aspects, it inevitably devolves into a tire fire. Unfortunately there's no easy way (yet?) to "after-train" a model - what I'd envision here is a nightly batch job that adds another layer to the AI model from all the information in the Confluence so searches don't incur a giant cost for the AI agent to process everything.

          > We probably don't need to be spinning up coding agents to spend 15 minutes discombobulating and bibblewabbling and re-reading 82 billion tokens of context before making a two-line change that an actual developer with learned experience in the code would make in 15 seconds.

          Oh we do. The stonk markets don't like it when companies employ people. People need office space, they need associated services (say, IT, fruit baskets and other amenities), they need wages, and in everywhere but the US you can't just go and fire them on a whim. The fewer people an organization has, the better the company looks on an investor relations press release. That is why the large AI organizations are investing untold billions of dollars... the race to be the first one that can fully replace a class of human employment. Say an SWE makes 130k/y on average - fire 100 of the 150 you have, that's 13 million dollars. That can buy you a looooot of tokens or hardware.

binyu 20 hours ago

The V100 and the 4090 are based on vastly different architectures: the former uses the older Volta while the latter uses Ada. Last I checked you cannot meaningfully combine them. The 3090 is better than the V100, just get two 3090s and an NVLink.

  • tymscar 17 hours ago

    Well I did in fact meaningfully combine them without an issue, that was the whole point of the blogpost.

    • binyu 1 hour ago

      Yes but it creates a bottleneck that negates the benefit of using multiple cards that way. Look into it. Cheers

      • tymscar 1 hour ago

        Well it doesn’t matter because the bottleneck here is actually quite small for me. The issue is vram. If anything the bottleneck is my 4080.

        • binyu 1 hour ago

          Gotcha, I am not saying your setup is inherently wrong or useless. I am glad it works for your use cases. Godspeed

          • tymscar 1 hour ago

            I think it's a very fair thing you have flagged!

  • cthalupa 16 hours ago

    You can split tensors across an AMD GPU and Nvidia GPU - different architectures are not an issue. People run LLMs across some pretty crazy setups.

    • binyu 1 hour ago

      It depends, but you cannot directly mix, for example, Ampere with Ada because Ampere lacks native FP8 support.

rbanffy 18 hours ago

This lack of driver support for hardware that’s still functional and available in reasonable quantities is annoying.

I was very close to buying a retired POWER7+ server with an ungodly amount of memory, but decided being unable to run a modern Linux kernel would be more work than I wanted to have. Modern kernels need POWER8 and above.

OTOH, if these chips were fully supported, they wouldn’t hit the second hand market at the prices they do.

ewy1 1 day ago

despite gaming being used in the title, it is not mentioned in the article, but i'm curious how this performs.

i've run some multi vendor frankenstein setups before and sometimes it even works, so i'm curious to hear your experience with it.

Ey7NFZ3P0nzAe 10 hours ago

While we're at it, has anyone any tips on turning a consumer RTX 3090 into a blower type format for putting inside a server rack? Any kind of workaround like this.

00dazzle 1 day ago

That's the same price per VRAM GB as an arc pro B70

  • tymscar 23 hours ago

    But with miles better support, that's why I went this route. CUDA is hard to beat

mg794613 19 hours ago

Very interesting, but slight note on the fact that the writer sort of forgets to calculate in the price of the 4080 it works together with.

He spent 200 to upgrade his existing setup, great nonetheless. But not "32gb vram for only 200 bucks"

  • tymscar 17 hours ago

    I did mention that in my case I had the 4080, but you do not need the 4080 whatsoever. You can run the same model with less context on a single £200 V100 or with the same amount of context on two of these.

peibye 22 hours ago

All that work just to write an ai blog post. This is a cool topic but I just can’t deal with the aiisms.

drumhead 22 hours ago

The first dual core processor I built a machine with was an Opteron. It was a nice piece of hardware.

viseyth 1 day ago

Volta (and Pascal, which I'm using) should still be supported with driver 580 as long as you don't use the open modules, and you can use up to cuda 12.9 and cudnn 9.10.2. No need to limit yourself to an old kernel.

  • markus92 1 day ago

    It is. We still run quite a few of them in prod and with 580 drivers they run just fine. Very useful GPUs still.

KnuthIsGod 1 day ago

AI written posts will kill HN.

  • tymscar 23 hours ago

    AI didn't edit a single word of this post.

QuantumNoodle 19 hours ago

This is very cool. Can anyone point me to any resources for setups that can run decent inferencing at home?

axpy906 1 day ago

Wow. V100. That brings back memories. Way to go.

melonpan7 22 hours ago

Great value for money if you have the time for tinkering and getting the compatibility to work.

  • DANmode 22 hours ago

    and it only gets easier and more defined from here.

  • tymscar 17 hours ago

    If you use NixOS, you don’t need to do any of that anymore. A single command in the CLI, and you have everything I do.

casey2 1 day ago

Some resell group is going to have to make this easier. The sheer amount of these cards otherwise heading towards the landfill is staggering. That is if Big Tech don't destroy them to prevent model weights from leaking.

  • eric__cartman 1 day ago

    How would destroying the GPUs prevent the model weights from leaking? By the time you get your hands on them the memory is powered off for a long enough time that a cold-boot style attack is impossible.

    • sethops1 1 day ago

      Would you bet your trillion dollar company on that? Or would you smash up the garbage [to you] memory chips to be sure.

      • marcosdumay 1 day ago

        It's volatile memory, not flash.

      • tymscar 16 hours ago

        I would bet my trillion-dollar company on it because I understand how RAM works.

  • Alifatisk 1 day ago

    > The sheer amount of these cards otherwise heading towards the landfill is staggering.

    The thought of throwing away working cards sounds so bizarre to me. I can't believe companies would dispose of them in the landfill like that; they are at least worth giving away for reuse.

    • wookmaster 1 day ago

      There’s a long history of corporations doing evil things to ensure their business model succeeds

  • Gracana 1 day ago

    Things like this have started to show up on eBay: https://www.ebay.com/itm/198383386991

      2X NVIDIA Tesla V100 32GB NVLink Water Cooled X99 E5-2686v4 AI Workstation PC
    
      Item                              Quantity
      Intel Xeon E5-2686 v4 CPU           1
      2U CPU Cooler                       1
      Jingyue X99 Motherboard             1
      DDR3 Memory                         32GB
      SSD                                 480GB
      AMD Radeon R5 240 4K Display Card   1
      NVIDIA Tesla V100 32GB SXM2 GPU     2
      NVLink SXM2 Dual-GPU Baseboard      1
      Corsair Water Cooling System        2
      850W Bronze Power Supply            1
      Dual-GPU 300G NVLink SXM2 Baseboard 1
      8654 Data Cable                     2
      8654 to PCIe Adapter Card           1

    • segmondy 1 day ago

      terrible deal

      • Gracana 1 day ago

        Yeah. Not linking as an endorsement -- I do think it's cool, but it's not worth it for that price.

  • xioxox 1 day ago

    Isn't this the same thing with 32 GB already on a PCIe socket?

    https://www.ebay.com/itm/166850431555

    • segmondy 1 day ago

      kinda, they put that on a PCIe socket, but it's passive. Meaning no fan. If you try inference on that it overheats in 1 minute unless you have it inside a server case.

  • iugtmkbdfil834 1 day ago

    I genuinely hope that is the case. The market is absolutely bananas now. I actually now own devices that went up in 'value' since purchase. This is not normal ( and a little scary ). This, on the other hand, is an invitation to properly recycle otherwise unwanted hardware.

wg0 1 day ago

Wait a few years, everyone will be able to put one at half the price.

jeffrallen 22 hours ago

Super interesting. I use data center GPUs at work, but I didn't know anything about this stuff.

I also use Qwen 3.7 27b at work and I agree with the author: it is perfectly capable of the jobs I give it.

recursivegirth 1 day ago

> The compute is still real. The VRAM is still real. And the memory bandwidth is where it gets genuinely surprising.

Had to stop there. Annoying. I can't stand AI use for writing. It makes any otherwise great article feel so disingenuous.

  • m0rde 1 day ago

    What a difficult world you must live in these days

    • peddling-brink 1 day ago

      While I don’t disagree with their sentiment, I’m far more annoyed with it than the AI writing.

      • m0rde 1 day ago

        Yeah. I get that many HN comments are just complaints (heck mine was too and just as negative and shaming). But how bad of a day must you be having to try to shame someone about how they choose to write up an experience they thought was neat. Whatever, free speech and all that. Hope OC's day gets better.

        • gsquaredxc 1 day ago

          It doesn’t read like shaming to me. It’s, in the grand scheme of HN comments, definitely on the more constructive side of the criticism. Maybe it could have been reworded, but I think the author of the post could very easily find it actionable in the future. I too had to stop reading the article at that point, so I think if the author wants more people to read, my advice for them is to just write like themselves. We’ve entered the start of a new Instagram filter age where many people feel they need to have LLMs reword their writing presumably for the same reasons as the original filter age. I share OC’s sentiment of pushing against the recent trend of implicitly shaming people for their individualistic writing styles.

      • qingcharles 1 day ago

        Every single HN post has the same comment now.

        • rafram 1 day ago

          Only because so many of the articles posted on HN now are AI-written, and badly, too. A lot of tech people are so impressed with LLMs’ capabilities in code that they fail to recognize how bad they are at writing enjoyable prose. And it feels like a chore to write out a whole blog post by hand when the machine could do it for you! But the result we get is so, so much worse and more annoying.

          • qingcharles 1 day ago

            I dislike AI prose too, the cadence of it really rubs me the wrong way, but, that said we've had a lot of great, informative articles lately, written with AI help, where you just have to grit your teeth and get through them to get the underlying knowledge.

            I don't think that commenting on every article is going to make the posters suddenly decide to go back and rewrite it by hand. Some of them probably don't even speak English natively. The comments are getting more tiresome than the AI prose at this point.

            Hopefully in a year or so the LLM output won't be so janky and obvious, so this might just be a phase everyone has to pull through.

  • fouc 1 day ago

    That line was the exact moment I also realized the post was AI written. I kept reading though, but I am left constantly guessing at which key details might be pure hallucinations.

    • SubiculumCode 1 day ago

      Honestly, the default styles are pretty bad. I use Claude in my scientific writing in a very specific way. 1. I write a paragraph. 2. I put Claude into concise style mode. I then ask Claude to revise for clarity.

      I can write competently, but my natural direction is towards emotional rhythmic flow that can convey emotion/passion...but which for scientific writing, can get in the way of clear clean communication. So, I write what I mean, and Claude straightens it out...and these days (i.e. not last year), it doesn't lose my meaning that often. And since I wrote it first, these AI-isms appear less frequently, and if they do, I revise them away.

    • tymscar 23 hours ago

      FYI, not a single line was AI written. If there is a hallucination, it’s fully mushy brain sourced.

      • fouc 7 hours ago

        Sorry for the false positive! It's interesting that multiple people thought that line was AI generated.

        I think for me it was mainly the superlative "genuinely surprising" that made me wonder.

  • tymscar 23 hours ago

    Agree. But I have not used AI in the slightest.

    Some of us just write like that. AIs had to learn it from somewhere.

gtirloni 1 day ago

> The compute is still real. The VRAM is still real. And the memory bandwidth is where it gets genuinely surprising.

sigh

  • krackers 20 hours ago

    Usually that cloying pattern is reserved for "emotional" contexts to validate the user ("your struggles are real [despite others thinking it's in your head]").

    Here it doesn't even make sense, of course the VRAM is real. Is it going to tell me that my keyboard is real next?

    I wonder if this was generated with the local model, this seems to be a case where it memorized the style but not the meaning and intent.

lelanthran 1 day ago

> The compute is still real. The VRAM is still real. And the memory bandwidth is where it gets genuinely surprising.

Because humans write exactly like this /s

  • postalrat 1 day ago

    Where do you think llms learned to write that way?

    • jlund-molfese 1 day ago

      You can also look at past posts by the same author (before LLM usage proliferated) if you’re curious.

      The project is still very cool, but it’s a little less enjoyable to read when everything sounds the same. It would be just as annoying for people to manually write in a corporate/marketing style, because humanity is what makes the small web interesting.

      https://blog.tymscar.com/posts/privategithubcicd/

      • iugtmkbdfil834 1 day ago

        This. Setting aside the llm issue, it is dealing with hardware in ways that -- one would think -- would be celebrated on HN of all places. But we focus on presentation.

      • tymscar 23 hours ago

        I’m glad I started this blog before the AI wave so I can prove to people I’m just weird at writing.

        It grinds my gears how so many people just talk about my writing style instead of the content.

        • lelanthran 23 hours ago

          > I’m glad I started this blog before the AI wave so I can prove to people I’m just weird at writing.

          Your previous blog posts didn't trigger any LLM detector (go on - check for yourself).

          • tymscar 23 hours ago

            Neither does this one. I replied in another thread. It comes out as 0% and the one from 2021 comes out as 8%. LLM detectors are all BS

            • lelanthran 22 hours ago

              GPTzero says 100% AI generated for specific paragraphs that I chose (such as `Multi-token prediction`). If you remove all the code listings, tables, etc and just paste the prose into these tools, it drops to 87% AI generated.

              None of the 3x older blogs of yours that I tried went above 5% AI generated.

              Maybe you're spending so much time with the LLM that you are talking like it; in which case, take an old blog and a recent blog, give the prose from them both to your favourite LLM and ask it if the same author wrote both. I just did that on ChatGPT and on Gemini, and both found that it is extremely unlikely that the same author wrote both.

              Look, if all the SOTA LLMs agree that your recent blogs sound generated, you can't blame the reader, can you?

              • tymscar 22 hours ago

                GPTzero is a joke.

                It thinks this is AI: “I bought a datacenter GPU that doesn’t even have a normal PCIe connector, stuck it in my gaming PC with an adapter, and now I have 32GB of VRAM across two GPUs running a 27 billion parameter model at 32 tokens per second.”

                There’s nothing AI about that. Not all SOTA LLMs agree, hell, none of them do. The same exact example I sent here gives me 0% in some, 10% in others, 100% in GPTzero.

                • lelanthran 21 hours ago

                  > Not all SOTA LLMs agree, hell, none of them do.

                  The ones I checked all agree: your recent writing is not the same author as your writing from 3 years ago...

                  You can check this yourself if you don't believe; make of that, what you will.

        • fouc 7 hours ago

          What's interesting about the older post is that all the sentences are long, compared to the current datacenter GPU post which contains lots of short sentences.

          But yeah, probably feels sucky to have your style analyzed for AI writing. FWIW, the datacenter GPU post was great! I went to look at the ebay postings.

          • tymscar 6 hours ago

            Thanks!

            I did get feedback on all sorts of things over the years.

            One of them was to do with sentence lengths.

    • lelanthran 1 day ago

      > Where do you think llms learned to write that way?

      Not from individual human content, that's for sure - maybe MLM marketing copy? Sleazy 4AM ads?

      I mean, every time this response comes up, I keep asking the person to point at something written prior to 2022 that gets 80%+ on the LLM detectors, and yet no one can find anything.

      Maybe you, postalrat, can find something written in this style that was published prior to 2022.

      • hattmall 1 day ago

        It's a function of the LLM "thought process"! It's not really modeled after human speech. It is in short segments but not long form, same reason you see the same rather odd nuances in LLM generated code.

        If the way you thought was to run a bunch of if statements, generate content, then feed that content back to get a "score" of what seems the most plausible, run the if statements again, and adjust / merge responses, then you would write similarly. The recognizable cadence of LLM generated content is pretty clearly the result of a lot of if statements being fused together.

      • tymscar 23 hours ago

        I have written the blog post. I know empirically that I have used 0% AI while writing it. I also know LLM detectors are total BS and they don't really work. I have tried a couple on this exact blog post, and QuillBot, for example, gave me 0% AI detected on it.

        I have then used a blog post of mine from 2021. QuillBot gave me 8%...

        The King James version of the Bible came out at almost 100% AI generated a while ago. It was the HN front page.

        Stop thinking that if someone writes in a way that is fun or looks like what you would think an AI writes, then it is AI generated. Loads of the time it is, but sometimes it's not, and it really hurts those like me.

        • lelanthran 22 hours ago

          > I have tried a couple on this exact blog post, and QuillBot, for example, gave me 0% AI detected on it.

          Don't use QuillBot; not sure why, but their model is reluctant to classify anything as AI generated. I ran into this when proof-reading a student's PhD - ChatGPT, Gemini, Claude (and others) all agreed it was AI generated, but QuillBot said it wasn't.

        • tapete1 8 hours ago

          So you are the AI?

          I mean, seriously, which human says "the compute"?

    • tgv 1 day ago

      Because their custom training data contains an emphasis on such verbiage. It doesn't come from the God-knows-how-many TB of web content the model is pre-trained on. There, such phrasing is only a drop in the sea. But the "yes, you're right" phrases, the em dash, etc., come from the later stage, for which content is created according to some (probably overprecise) guidelines.

      • rafram 1 day ago

        Right. The overuse of "genuinely" most of all. Seems like they put Claude through a few good rounds of training to always answer questions about its consciousness, thoughts, etc., with something about how it's "genuinely unsure," and as a result, the model learned to use "genuinely" as an intensifier in all sorts of inappropriate contexts.

        • iugtmkbdfil834 1 day ago

          Oi, I personally use adverbs everywhere. Genuinely, kids these days.

      • anigbrowl 17 hours ago

        It's a very specific style of condescending journalism that US media has been nurturing and recycling for decades now. I was going to write this whole comment as a parody of it, starting with some literary hook like 'Call it Ouroboros syndrome:' but I can't bring myself to add to the pile.

        I have not done the textual and statistical analysis to verify this, but I feel like it's something you could trace back to east coast journalism schools and publishers mediated via television, which long predates mass adoption of AI. Think how many news articles you've read with titles like 'Anatomy of a murder' or 'Inside the meeting that changed everything.' The hooky, slightly pompous tone is something you can find back as far as the 1960s or 1970s; browse through old issues of Reader's Digest and you'll find tons of it. When I say it's mediated through television, I'm talking about both the dramatic and heavily conclusory style of fictional prosecutors and narrators, and the extremely shallow style of TV news reports (often transcribed to the web) which are only one or two sentences per paragraph. And this is before we consider the stylistic impact of ad copywriting on communication in general.

        And there's something else.

        The one sentence paragraph interjection, designed to refocus your attention in a surprising new direction after two paragraphs of stuff you already know. 'I never thought I'd end up here,' said Sally Nocontext, hooking you in for another paragraph or two where you try to figure out who this woman is, where she ended up, and what it has to do with the article you are already halfway through reading. After all, I've come this far, the reader thought. I might as well see it through to the end.

        And that's just what publishers wanted.

        One sentence can also validate a truism that the reader already suspects, flattering their beliefs in their own analytical powers....

        ...well you get the idea. When I'm using LLMs for any sort of extended session, I find myself reaching for the same few prompts to break it of such clicheed expression; I'm especially averse to the habit of adding zippy-sounding nicknames to complex or potentially dull concepts. I don't have a favorite starting prompt, but I generally find that asking for 'a concise, academic tone' does wonders to de-fluff its output. Remember, it defaults toward being as widely accessible as possible, and much journalism is aimed at consumers with only a high school education and maybe middle-school reading comprehension, math ability, and appetite for depth over sensation.

    • krackers 20 hours ago

      This particular case seems to be an LLM trying to blindly apply the saccharine therapeutic pattern "your frustration is real" to a context where it doesn't make sense. No one is debating or questioning the fact that the card has 16 gigs of vram.

      The point of "X is real" in a therapeutic context is to make the person feel seen and acknowledged, that his struggles are real to him and really do weigh on his mind, even if it is technically "all in his head".

  • bossyTeacher 1 day ago

    X is Y. Z is Y. And Alpha is genuinely Beta.

    Classic LLM writing style.

  • driverdan 1 day ago

    There's interesting stuff in this writeup but it sure seems like most of it was written by an LLM.

  • bitwize 1 day ago

    You know what the sad bit is? Humans do write exactly like that. That's not even particularly egregious StalkedIn marketroid speak.

knollimar 1 day ago

A little bit of local copium but neat read.

Isn't a raspi with 16gb of RAM $300 now?

  • matja 1 day ago

    The latest Raspberry Pi 5 has one 32-bit channel (2x 16-bit subchannels) of LPDDR4X-4267 SDRAM giving 17.1GB/s of bandwidth, 52x less than this GPU. Never mind lacking the CUDA and Tensor cores, so the FP16 performance is 102x less (307 GFLOPS vs 31.4 TFLOPS). So for £200, there's absolutely no comparison for this specific use-case.
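
    The ratios above can be checked with a rough sketch like the one below. The Pi 5 figures and the two FP16 numbers come from the comment; the V100's 900 GB/s memory bandwidth is not stated in the comment and is assumed here from NVIDIA's published spec. The function and variable names are made up for illustration.

```python
# Back-of-the-envelope check of the bandwidth and FP16 ratios quoted above.

def peak_bandwidth_gb_s(bus_width_bits: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer times transfer rate."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s / 1000  # MB/s -> GB/s

# Raspberry Pi 5: one 32-bit LPDDR4X channel at 4267 MT/s (from the comment).
pi5_bw_gb_s = peak_bandwidth_gb_s(32, 4267)

# ASSUMPTION: V100 HBM2 bandwidth of 900 GB/s, not stated in the comment.
V100_BW_GB_S = 900.0

# FP16 throughput figures as quoted in the comment.
PI5_FP16_GFLOPS = 307.0
V100_FP16_GFLOPS = 31.4 * 1000  # 31.4 TFLOPS

bandwidth_ratio = V100_BW_GB_S / pi5_bw_gb_s
fp16_ratio = V100_FP16_GFLOPS / PI5_FP16_GFLOPS

print(f"Pi 5 bandwidth: {pi5_bw_gb_s:.1f} GB/s")
print(f"Bandwidth gap:  {bandwidth_ratio:.1f}x")
print(f"FP16 gap:       {fp16_ratio:.1f}x")
```

    These are theoretical peaks on both sides, so real workloads will land lower on each device; the point is only that the quoted 52x and 102x gaps follow from the stated numbers.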

    • knollimar 1 day ago

      Yeah that's what I'm saying. How is it so cheap????

      • feisuzhu 1 day ago

        V100 GPUs are e-waste.

  • thejj100100 1 day ago

    I don't understand what point you're trying to make here? Are you talking about the price of RAM?