| Svelte Hacker News

points by xtracto 1 week ago

This, and distributed LLM inference. We are at a point where no single person can set up a rig to run a SOTA model; it is just too expensive.

So we must build and adopt frameworks that allow individuals to share resources to run SOTA models in a distributed manner. That way they will also be non-censorable by governments.

Also, the only way to prevent that one entity weaponizes it, is by giving EVERYONE access to it.

pizzao 1 week ago

I wonder if there is a way small local LLMs can complement each other so that the sum total yields a much more performant LLM

killerstorm 1 week ago

Perhaps some radical MoE where you download _exactly_ the components you need as you need them. Currently MoE is usually switched on a per-token, per-layer basis, so you need all weights locally. But e.g. Apple made one which pre-selects all experts based on prompt embedding. That might be further scaled up - e.g. predict exactly what's needed.
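
A rough sketch of what prompt-level pre-selection could look like, as opposed to per-token routing. This is not Apple's method or any real library's API: `select_experts`, `experts_to_fetch`, and the idea of one "key" vector per expert are all assumptions made up for illustration. The point is only that the routing decision happens once per prompt, so a client would know up front which expert weights to download.

```python
from typing import Dict, List, Sequence, Set


def select_experts(prompt_embedding: Sequence[float],
                   expert_keys: Dict[str, Sequence[float]],
                   k: int) -> List[str]:
    """Score every expert once against the prompt embedding; keep the top k.

    Hypothetical scheme: each expert is represented by a single key vector
    and scored by a plain dot product with the prompt embedding.
    """
    def dot(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    scores = {name: dot(prompt_embedding, key) for name, key in expert_keys.items()}
    # Highest score first, then truncate to k experts.
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


def experts_to_fetch(selected: List[str], cached: Set[str]) -> List[str]:
    """Only the selected experts that are not already on disk need downloading."""
    return [name for name in selected if name not in cached]


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, purely illustrative.
    keys = {"code": [1.0, 0.0], "math": [0.0, 1.0], "chat": [0.5, 0.5]}
    chosen = select_experts([0.9, 0.1], keys, k=2)
    print(chosen, experts_to_fetch(chosen, cached={"chat"}))
```

With per-token, per-layer routing, the selection would have to be re-run inside the decode loop, which is why all weights need to be local in that setup.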
- salter2 1 week ago
  
  Perhaps something similar to speculative decoding.
  Speculating Experts Accelerates Inference for Mixture-of-Experts: https://arxiv.org/abs/2603.19289
- eblanshey 1 week ago
  
  I don't understand why no labs create dedicated models per industry/expert. E.g. physics, electronics, chemistry, etc. Each model would be much smaller and better suited to running locally. Everyone is trying to cram everything into a single model.
Flere-Imsaho 1 week ago

Sort of like how ants in a colony produce a working "society" that no individual ant could muster.

mythern 1 week ago

I propose the TokenTorrent protocol. Hoist yer pirate, erhm, clanker flags!

taylorhou 1 week ago

I built Teale.com and open-sourced it. My domain contribution to society. It powers fully distributed inference on Mac, Windows, Linux, Android, iOS, hell, even HarmonyOS.

Open-source/open-weight models will get better and better and eventually we will have mythos level running on smartphone/eyeglass hardware.

It is currently stupidly tedious to match supply with demand, though, because a physical machine like a 16 GB RAM MacBook doesn't truly have 16 GB available, let alone matching models and all of their settings (KV cache, context limit, temperature, etc.) to demand.

Would appreciate any help, cus we need AI inference by the people, for the people.
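
To make the "16 GB isn't really 16 GB" problem concrete, here is a back-of-the-envelope fit check. It is a minimal sketch and not Teale's actual matching logic: the function names, the `headroom` factor, and the model dimensions in the example are assumptions chosen for illustration. It uses the standard estimate that the KV cache stores one key and one value vector per layer, per KV head, per context position. Temperature is left out because it doesn't affect memory.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Memory for the model weights at a given quantization level."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """KV cache size for one sequence; the factor 2 covers keys plus values."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def fits(free_bytes: float, n_params: float, bits_per_weight: float,
         n_layers: int, n_kv_heads: int, head_dim: int, context_len: int,
         headroom: float = 0.9) -> bool:
    """True if weights + KV cache fit in the memory that is actually free.

    `headroom` (assumed 0.9 here) keeps a safety margin for activations
    and whatever else the OS decides to do.
    """
    needed = weight_bytes(n_params, bits_per_weight) + kv_cache_bytes(
        n_layers, n_kv_heads, head_dim, context_len)
    return needed <= free_bytes * headroom


if __name__ == "__main__":
    # Hypothetical 8B-parameter model, 4-bit weights, 8192-token context.
    model = dict(n_params=8e9, bits_per_weight=4, n_layers=32,
                 n_kv_heads=8, head_dim=128, context_len=8192)
    print(fits(10e9, **model))  # 10 GB actually free
    print(fits(5e9, **model))   # only 5 GB actually free
```

The input that matters is `free_bytes`, which has to be measured on the device at request time rather than read off the spec sheet.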

WASDx 1 week ago

> distributed LLM inference

This seems extremely inefficient considering data transfer between model layers if the model is distributed. I found this project called Petals that claims up to 4 tok/s for a 180B model, although its repository hasn't been updated in two years.

https://petals.dev/

stymaar 1 week ago

For token generation, yes: because current-gen LLMs are autoregressive you need to add the inter-node latency for every single token.
For prompt processing it would work though, and it could for diffusion LLMs as well.
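
The asymmetry above can be put into a toy latency model. This is a simplified sketch with made-up numbers: it assumes the model is split into a chain of nodes, that each token pays the network latency once per hop during generation, and that the whole prompt crosses each hop as a single batch during prompt processing. It ignores bandwidth, batching across users, and the fact that prefill compute per token is usually cheaper than decode.

```python
def decode_tokens_per_second(compute_s_per_token: float, hops: int,
                             latency_s: float) -> float:
    """Autoregressive decoding: token t+1 cannot start until token t has
    crossed every hop, so the latency is paid once per hop for every token."""
    return 1.0 / (compute_s_per_token + hops * latency_s)


def prefill_seconds(prompt_tokens: int, compute_s_per_token: float,
                    hops: int, latency_s: float) -> float:
    """Prompt processing: all prompt tokens travel together, so the
    per-hop latency is paid once for the whole prompt."""
    return prompt_tokens * compute_s_per_token + hops * latency_s


if __name__ == "__main__":
    # Illustrative assumptions: 8 hops, 30 ms per hop, 20 ms compute per token.
    print(decode_tokens_per_second(0.02, hops=8, latency_s=0.03))
    # 1000-token prompt at an assumed 1 ms compute per token.
    print(prefill_seconds(1000, 0.001, hops=8, latency_s=0.03))
```

With these assumed numbers, generation tops out at roughly 3.8 tok/s even though the compute alone would allow 50 tok/s, while the same 0.24 s of network latency is only a small part of the 1.24 s spent on the 1000-token prompt.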

theptip 1 week ago

> The only way to prevent that one entity weaponizes it, is by giving EVERYONE access to it

There is a middle way; the policy space also includes government regulating both access and monopoly.

I’m opposed to monopolies of this tech, but I hope the risks of giving everyone jailbroken AGI/ASI are clear.

As a toy example you could imagine a Universal Basic AI where government subcontracts to (n_quorum) labs, everyone gets a token budget, but operating the APIs comes with the safety controls.

If everyone does get to run their own jailbroken AGI, then the only stable societal norm I see is A LOT of surveillance to make sure nobody is building CBRNE threats. This doesn’t seem like a clear win from a civil liberty perspective, though I could see the argument.

airstrike 1 week ago

We have nothing anywhere near AGI/ASI so you're good for another 25 years, my friend
- iugtmkbdfil834 1 week ago
  
  That is exactly what ASI wants you to think. foil hat off

hypefi 1 week ago

yes, it also complements the geohot idea behind the tinybox

bilsbie 1 week ago

What is that? I can’t seem to figure out what the use case is vs buying off the shelf?
I think it’s a great project but the communication isn’t clear to me.
- Avicebron 1 week ago
  
  https://tinygrad.org/#tinybox
  I'm not sure exactly why you would buy through them vs rolling your own if you could afford the equivalent hardware.
  I'm a firm supporter of local inference though so good on them for doing something
  
  indigodaddy 1 week ago
  
  Lol I get nervous when I see a list of products with full specs but no prices
  
  matternous 1 week ago
  
  They have prices. Click on the links in the shipping row.
  
  indigodaddy 1 week ago
  
  And..... I was right to be nervous