In what world would I ever expect a commercial (or governmental) entity to have precise alignment with me personally, or even with my own business? I argue those relationships are necessarily adversarial, and trusting anyone else to align their "AI" tool to my goals, needs, and/or desires is a recipe for having my livelihood completely reassigned into someone else's wallet.
Interesting you single out commercial and government entities but not people. What defines the difference? Bureaucracy? Concentration of resources? Legal theory?
I guess I'm wondering why this line of thinking doesn't (in theory) turn into paranoia about everybody. I don't know much ethics or political theory or anything.
It does. People drive these entities. People hide behind the liability shields and authority of these entities. Also notice that I generalized with the phrase “…and trusting anyone…”
You can tell that broad alignment between people is natural just by looking at the effort that corporations and governments make to undermine it. Alignment between people is perhaps not a state of nature, but it really is a pretty normal consequence of a fairly small amount of education and of middle-class existence that is left to itself (i.e. without brainwashing and deliberate work to create out-groups). If you're eating enough and have a few brain cells to rub together, then you definitely want that for your neighbors too because it promotes stability.
> You can tell that broad alignment between people is natural
It really isn't. The whole point of the market system is to collectively align people's actions towards a shared target of "Pareto-optimized total welfare". And even then the alignment is approximate and heavily constrained due to a combination of transaction costs (which also account for e.g. externalities) and information asymmetries. But transaction costs and information asymmetries apply to any system of alignment, including non-market ones. The market (augmented with some pre-determined legal assignment of property rights, potentially including quite complex bundles of rules and regulations) is still your best bet.
The market aligned us with children working in sweatshops even after we outlawed it, by convincing us it was OK if the kids were foreign and we got to share in pocketing the savings, not just the evil factory owner.
Yes I'm well aware. Of course that's not how things are advertised to people, and they absolutely hate it when this is pointed out to them. This tells me that deep down they don't actually agree with how the system operates.
AIUI David Graeber famously pointed out that people in small groups can form the equivalent of a "market" simply by exchanging favours ("I'll scratch your back if you scratch mine") in an informal gift economy, without any money-like token or external unit of account. That's quite in line with what I said.
Your understanding is mistaken. Graeber's "everyday communism" is not a market, and his whole larger point is that contorting everything to the lens of markets is simply ahistorical and unempirical.
I'd strongly suggest reading his books. They profoundly changed my understanding of how human institutions and society form.
Unless it's some sort of complete post-scarcity, it has to be understandable in market terms. What happens if people try to free-ride on the whole "communist" system? If they get excluded from its benefits, that's equivalent to enforcing some bundle of property rights.
> Unless it's some sort of complete post-scarcity, it has to be understandable in market terms.
No, it does not, and that's Graeber's whole point.
"Markets" are not some sort of physical law of the universe.
A simple example: it's the norm in hunter-gatherer societies to take care of people who will never make an equal contribution back in the transactional sense.
Because the social ties in those societies are not simply transactions.
If your model fails to accurately describe empirical reality, time to improve/expand the model.
These social ties are real (they are a kind of wealth, or social capital, for the persons involved) but they're also limited to very small social groups, the equivalent of a modern small village neighborhood or HOA. The point of the market is that it scales well beyond those.
I like economics and math too, but the whole discussion of markets is a terrible starting place for deriving results in ethics/psychology. If you insist though, notice that unions will happen unless some other organization is working to prevent them. What do you suppose this means? People are aligned with each other exactly because they've noticed their coworkers are not corporations or governments.
Although the two are entangled, politics is a more relevant framing than economics here. If people weren't broadly aligned on basic stuff, then autocrats, theocrats, kleptocrats and so on would simply not be interested in dismantling democracies. They make that effort because they must.
> the whole discussion of markets is a terrible starting place for deriving results in ethics/psychology.
Historically, we did essentially the opposite. We figured out many aspects of human ethics and psychology first, and deduced from them how and why markets work as they do.
> ... If people weren't broadly aligned on basic stuff, then autocrats, theocrats, kleptocrats and so on would simply not be interested in dismantling democracies. They make that effort because they must.
This implies that people are only weakly aligned in the first place, otherwise no such attempt at dismantling could ever succeed. That's not a very interesting claim; it does not refute the usefulness of some external mechanism to more directly foster aligned action. Markets do this with a maximum of decentralized power and a minimum of institutional mechanism.
Uh, what? People have been killing each other over values misalignments since there have been people. We invented civilization in part to protect our farms and granaries from people who disagreed with us on whose grain was in said granaries.
Fair enough. We are a social species. But those alignments occur in small groups. You don’t need effort by “corporations and governments” for nations of millions of people to schism. If anything, those large institutions drive broad-based alignment.
Methinks you've been sitting in your armchair too long.
Broad-based alignment doesn't come from nothing, but it is surprisingly easy to achieve when a population recognizes a shared stake. A synthesis between selfishness and altruism emerges when you consider who you can call a "neighbor".
> it is surprisingly easy to achieve when a population recognizes a shared stake
Sure. But it takes work for anything larger than a small, close-knit community. I’m pushing back on the notion that this comes naturally and is a default state. It’s not, at least not relative to people naturally forming in and out groups.
The armchair commenters are probably folks who have never organized a group of people before outside a commercial context.
You might be treating "neighbor" too literally. People understand the global nature of the limits on resources and by extension the world economy better every year. The boundary of who shares 'stake' grows likewise.
But that shared stakeholding doesn’t naturally drive alignment. You need journalists, fiction writers, organizers and delegates. Travel and curiosity. These each take effort, resources and organization. It’s something we do well. But it isn’t spontaneous in the way small-group kinship is, which literally emerges if you just put people in proximity.
I'd say it's "typical" that one person witnessing another's plight will identify with them based on the similar conditions of struggle, oppression, etc. As you point out, the trick is to expose them to those scenes in the first place. But this is proximity just the same, in a social and experiential sense if not in a "my bed is within walking distance of yours" sense. So it is spontaneous given those caveats. The question, then, assuming camaraderie and kinship is the goal, is how do we expose people to each other's lives' conditions without the narrative spin machine altering the message to distance people from each other rather than bringing them closer together?
And if my grandmother had wheels she’d be a bicycle. The process of creating an in group naturally creates out groups. The “brainwashing” OP describes is just as natural as social alignment through an innate drive for conformity.
Sure. Push and pull. The point is that it needs effort to work at larger scales. We don’t “naturally” organize into nations of three hundred million or a billion. To the extent we do, we also “naturally” go to war.
There is a pretty interesting study of a large group of chimps. I don't remember where exactly, but they have been civil-warring for the last 15 years or so. The point is, it seems that there is some kind of innate group-formation process.
> Interesting you single out commercial and government entities but not people. What defines the difference? Bureaucracy? Concentration of resources? Legal theory?
Not OP, but for me, kind family and friends, and various feel-good pieces of fiction and other writing, at least let me envision the possibility of a perfectly kind/dedicated/innocent/naive individual who is truly on my side 100%. But even that is mostly imagination and fiction... although convincing others of that isn't necessarily an argument worth making.
Commercial entities have a fundamental purpose of profit. While profit doesn't have to be a zero-sum game - ideally, everyone benefits in a somewhat balanced way - there's some fundamental tension, in that each party's profit is necessarily limited by the other party's.
Government entities have a fundamental purpose of executing the will of the state, which is rather explicitly not the same thing as the will of you as an individual.
Both commercial and government entities also tend to involve multiple people, which gets statistics working against you - have you really gathered that many people who would put your needs above their own, with exactly zero "imposters" - which in this context just means people with a bit of rational self-interest?
> I guess I'm trying to wonder why this line of thinking (in theory) doesn't turn to paranoia about everybody. I don't know much ethics or political theory or anything.
Just because you're paranoid, doesn't mean they aren't out to get you. Trust, but verify.
You might not be able to put absolute blind trust in anybody. I certainly can't. However, one can hedge one's bets, and diversify trust. Build social circles of people with good character, good judgement, and calm temperaments - and statistics will start working for you. It's unlikely they'll all conspire to betray you simultaneously, especially if you've ensured betrayal costs much and gains little. While petty and jealous people can indeed be irrational enough to betray under such circumstances, it'll be harder for them to create the kind of conspiracy necessary for mass betrayal that might cause significant enough damage to warrant proper paranoia. You might still have to watch out for gaslighters stealing credit (document your work!) and framing people (document your character!) and other such dishonest and manipulative behavior... but if everyone's looking out for the same thing, well, that's just everyone looking out for everyone else! That's a community looking out for each other, and holding everyone honest and accountable. Most find comfort in that, rather than the stress paranoia implies.
Put yourself in a room full of manipulators and schemers, on the other hand, and "paranoia about everyone" might be the only reasonable or rational response!
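To put the "statistics will start working for you" bit in toy numbers (a back-of-the-envelope sketch, with made-up probabilities and a strong independence assumption):

    # Toy illustration of diversifying trust: if each of n confidants would
    # independently betray you with probability p, simultaneous mass betrayal
    # becomes exponentially unlikely. Independence is the big assumption here;
    # collusion breaks it, which is why raising the cost of conspiracy matters.
    p, n = 0.1, 5
    print(f"any single betrayal: {p}")
    print(f"all {n} betray at once: {p ** n:.0e}")  # 1e-05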
> Profit is obtained by maximizing traded benefits and minimizing costs.
Gain is obtained by the easiest means available. Your narrow definition of profit is seldom the easiest path; cheating is far "superior", especially when it's legal for some.
> None of this requires taking anything away from any other party.
"required" and "preferred" (e.g. because it's far easier) are different like night and day.
> But even that is mostly imagination and fiction... although convincing others of that isn't necessairly an argument worth making.
There was a Japanese visual novel in the 2000s about a girl who was your personal maid, and was so devoted she would always take your side in any conflict, accept and support you just the way you are, even if you were a horrid person to your friends. It turns out she was a ghost, or a kind of yokai, or something. Anyhoo, back on 2ch she attracted a fandom, and there was a second group of people on 2ch who labelled her a "useless person manufacturer", because if you actually had a person who always accepted you just the way you are and never pushed back, that can actually be a trap that prevents you from developing.
It's a theme that's relevant today when people have AI servitors that always glaze them. It puts even certain utopian AI fiction, like Richard Stallman's story "Made for You", into a whole new light.
My family accepts me just the way I am a bit too much. I can't bring myself to blame them, when past "reformist" pressures have been misguided/misapplied and backfired, but I recognize the trap. It'd also be hypocritical to blame them, when I also accept me just the way I am a bit too much! I'd like to think I'm decent enough to people, but I'm certainly more useless than I'd like to be. (Un?)fortunately, I'm not in a position to suffer, and I'm at least aware of the problem!
One of the ideas I've toyed with, even before all the AI hype, is a dumb, semi-adversarial servitor. Something to nag or taunt me about chores not done, to interrupt me when I'm doomscrolling, to use as a vessel for precommitment, to challenge me in various ways. I've been too lazy to build it thus far. Many tools overlap the problem space, so I shouldn't be using that as an excuse - perhaps I should give StayFocusd another shot.
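For concreteness, the dumbest possible version I have in mind is something like the sketch below; the chores and intervals are made up, and a real version would obviously need to hook into the phone/browser rather than just print:

    import time
    from datetime import datetime

    # Hypothetical chores and how often to nag about them, in seconds.
    # (A real version would hook into a browser extension, calendar, etc.)
    CHORES = {"do the dishes": 4 * 3600, "go for a walk": 2 * 3600}

    def nag_loop():
        last_nagged = {chore: 0.0 for chore in CHORES}
        while True:
            now = time.time()
            for chore, interval in CHORES.items():
                if now - last_nagged[chore] >= interval:
                    print(f"[{datetime.now():%H:%M}] Oi. '{chore}' -- done yet?")
                    last_nagged[chore] = now
            time.sleep(60)  # check once a minute

    if __name__ == "__main__":
        nag_loop()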
Conflict and other stressors - in moderation, within the limits of one's ability to handle - are important for growth and health. A tree shielded from wind is weakened as it fails to develop stress wood and structural strength. A good debate can sharpen my thoughts and mind, walking to lunch keeps my cardiovascular system healthy, rising to life's various challenges gives me the security of knowing I can rise to the occasion and gives me more skills.
I'm not an expert in political theory or ethics either, but in my worldview, power relationships matter in these discussions. I believe power and responsibility should go hand in hand, and I hold entities to a standard that is proportional to their power to influence others' lives.
If an entity's power is decentralized, for example when it is democratically organized to some degree, then that disperses both power and responsibility.
If you've built an agent that can act even vaguely close to a paperclip maximizer, you've already solved 99.999% or more of the alignment problem. The hard part of alignment so far is getting the AI to do something useful in pursuit of the right goal, and not just waste energy. We still have no idea how to do this with any effectiveness: even modern "RL from verified feedback" systems are effectively toys, the equivalent of playing video games, not really of doing something useful in the real world.
Huh? Modern RLVR systems are toys that can’t do anything useful in the real world?
We must be living in completely different worlds. Claude and other agents have completely upended work for me and every single other software engineer I know.
Why would relationships with a commercial entity be "necessarily adversarial"? A commercial relationship depends on the product providing more utility than the cost (for the consumer) and providing more revenue than cost (for the commercial entity). This means that while some components of the relationship may be adversarial in some areas, it cannot really be entirely adversarial.
In short, the ML industry is creating the conditions under which anyone with sufficient funds can train an unaligned model. Rather than raise the bar against malicious AI, ML companies have lowered it.
This is true, and I believe that the "sufficient funds" threshold will keep dropping too. It's a relief more than a concern, because I don't trust that big models from American or Chinese labs will always be aligned with what I need. There are probably a lot of people in the world whose interests are not especially aligned with the interests of the current AI research leaders.
"Don't turn the visible universe into paperclips" is a practically universal "good alignment" but the models we have can't do that anyhow. The actual refusal-guards that frontier models come with are a lot more culturally/historically contingent and less universal. Lumping them all under "safety" presupposes the outcome of a debate that has been philosophically unresolved forever. If we get hundreds of strong models from different groups all over the world, I think that it will improve the net utility of AI and disarm the possibility of one lab or a small cartel using it to control the rest of us.
I mean that does partially reduce the chances of a cartel, but not nearly as much as you think.
Most countries have a pretty strong ban on most kinds of weapons; the US is one of the few that lets everyone run around with their rooty tooty point and shooty. Some countries ban them because the government doesn't want the people having them; in others, the citizens call for the bans because they don't like the idea of getting shot by their fellow citizens.
It won't be long before citizens and governments get tired of models being used for criminal activities and eventually lay down laws around this. Models will have to be registered and safety tested, with strict criminal prosecution if you don't comply. And the big model companies will back their favorite politicians to ensure this happens too.
Now, that will in general still be helpful, since there will still be more models, but it will not be a free-for-all.
The argument is that it's misaligned because it only values one thing: more paperclips, while human values are much more varied and complex.
Debatable whether it truly understands what it's doing or not, but the argument usually assumes that it does know what it's doing at least in that it's able to imagine outcomes and create plans to reach its singular goal, making it a very simple toy example of a misaligned system.
I actually wound up geoblocking the UK based on Ofcom's February 2025 presentation for small services providers--they said that they intended to target "one-man bands" who (e.g.) failed to perform a child risk assessment or age verification, but that a geoblock would be considered compliant. I don't like doing this, but as someone who visits the UK regularly (and has been regularly pushing Ofcom on this matter) I figure better safe than sorry.
I'm glad you have done this and I wish more would follow the same course. The more content that becomes unavailable in the UK, the more people might start to pay attention to the stupidity of the law.
I doubt it, but even from an irrational anger perspective, I hate that these idiots can do idiotic (and worse, counterproductive) stuff, and get no comeback on themselves.
>I'm glad you have done this and I wish more would follow the same course. The more content that becomes unavailable in the UK, the more people might start to pay attention to the stupidity of the law.
The law isn't going to be repealed because a bunch of nerds geoblocked their personal blog.
Oh boy, that’s a very generous view of human nature.
The cynic in me agrees with the article’s premise, but not because I believe "alignment is a joke", but because I doubt that humans are "biologically predisposed to acquire prosocial behavior."
I think what I meant to say was, they're as simple to jailbreak as they were three years ago.
Different methods, still simple. I'm working with researchers who are able to get very explicit things out of them. Again, it feels much worse than before, given the capability of these models.
There are basically guardrails encoded into the fine-tuned layers that you can essentially weave through with prompting. These 'guardrails' are where the labs work hard for benevolent alignment, yet where it falls short (even as it enables exceptional capability alignment). Again, nothing really different from how it was three years ago.
The power asymmetry point is what gets missed in most alignment debates. An AI model doesn't need to be misaligned to cause harm. It just needs to be misaligned with users while aligned with whoever's paying for it. That's not a future risk. That's how every enterprise SaaS product works already.
I'm seeing that these tools are extremely powerful in the hands of experts who already understand software engineering, security, observability, and system reliability / safety.
And extremely dangerous in the hands of people that don't understand any of this.
Perhaps the reality of economics and safety will kick in, and inexperienced people will stop making expensive and dangerous mistakes.
The future is happening. Instead of trying to raise awareness about evil AI, I think it would be healthier to direct this energy toward ways of improving the situation without condemning the unknown of AI evolution. As with anything, there will be a bad side. The bad guys will always be there, be it AI or soccer matches. Should we stop developing nuclear energy because nuclear weapons are developed?
There is no natural law saying the good sides of any kind of tech will outweigh any bad sides.
”The future” is happening because it is allowed in our current legal framework and because investors want to make it happen. It is not ”happening” because it is good or desirable or unavoidable.
Feels like people are mixing two different things here: alignment in small groups (family, teams) vs alignment at scale. The first happens naturally; the second almost always needs structure, incentives, and enforcement.
That's a lazy argument. Obviously tools are tools. But if tool A revolutionized human society and has massively advanced technology (and CAN be used for harm), where tool B's positive impact is a drop in the bucket by comparison and has the potential for an outsized amount of harm, obviously tool B is comparatively a bad tool.
I don’t even see the point of alignment or anything about security in LLMs. I feel like this is how “some people” reacted to the internet when I was young (lots of censorship), how hackers didn’t let it happen, and then how we are back to that world in the hands of corporations and governments who “think of the children”. LLMs are out of the bottle and not going back there; the only option is building for the new world on the defender side, everything else is politics.
LLMs can hack, but nmap also made hacking easier; do we make nmap illegal? We already have drones that kill people; now there is less human involvement, but the results are the same.
LLMs can also make defending easier (at least for cyber security), but I guess real-world security is not that different.
Now evil things can be done faster, easier, and at more scale. But good things gain the same properties.
It’s another tool in the toolbox; the idea that some entity will be able to censor or align it is as naive as thinking the internet can be controlled. Some will manage to anyway, but it’s no different from China’s firewall.
Alignment is sold to us by companies like OpenAI and Anthropic not because they care, but because it gives them power and more control. When was the last time a big corporation actually cared about soft topics like this? Yes, never.
Tech changes do not impact attackers and defenders equally.
Good things do not all have the same properties - that's mistaking an incomplete assertion for a complete one.
Cyber security is an attacker's domain. Your security is typically because you are (or were) not valuable enough to earn the attention of an attacker.
When LLMs make targeting you cost effective, you will have to spend more energy defending yourself. This means that you have less time to do other useful things, reducing your net utility, while increasing attackers utility.
Also - the teams in these companies DO care; I have worked with them. The decision makers are regulated by the cadence of the quarterly shareholders meeting. At that point things like safety are a cost center. Reducing safety spend while minimizing reduced time-on-site is rewarded by markets.
There is also the fact that it's very easy to plant backdoors in LLMs with plausible deniability:
- You can just use the same tools you use to train them to make them behave in some specific ways if some specific preconditions are met.
- You can also poison the training data, so that the LLMs are writing flawed code they are convinced is right because they saw it on some obscure blog but in fact it had some subtle flaw you planted.
- You can poison the prompts as they are automatically injected from "skills" found online.
You couple that with long-running agents, which may drift very far from the conditions where they were tested during the safety tests.
You add the fact that in this AI race there is a premium on running agents capable of advanced offensive security with full permissions, pushed using yolo dark patterns.
The training process is obscure and expensive, so it is only really doable by big actors, and it is neither replicable nor verifiable.
And of course, safe developers (aka those not taking the insane risk of running what really is, and should be called, malware) now can't get jobs, get no visibility for any of their work, and drown in a sea of AI slop made using a prompt and a credit card, and therefore they must sell their soul.md and hype for the madness.
> I think it’s likely (at least in the short term) that we all pay the burden of increased fraud: higher credit card fees, higher insurance premiums, a less accurate court system, more dangerous roads, lower wages, and so on.
I think the author is brushing against some larger systemic issues that are already in motion, issues that the way AI is being rolled out is exacerbating rather than being a root cause of.
There's a felony fraudster running the executive branch of the US, and it takes a lot of political resources to get someone elected president.
> I know this because a part of my work as a moderator of a Mastodon instance is to respond to user reports, and occasionally those reports are for CSAM, and I am legally obligated to review and submit that content to the NCMEC.
Oh ** that.
I have moderated all sorts of crap, and I am grateful that my worst has only been murders, hate speech, NCII, assaults, gore, and other forms of violence.
> I sometimes wish that the engineers working at OpenAI etc. had to see these images too. Perhaps it would make them reflect on the technology they are ushering into the world, and how “alignment” is working out in practice
This is a great idea. I’ve heard of new leaders being dropped in, sure that they have a better handle on safety than the T&S teams.
Only after they engaged with the issues, and had their assumptions challenged by uncaring reality, did they listen to the T&S teams.
There are a lot of assumptions on speech online that do not translate into operational reality.
On HN and Reddit, everyone complains about moderation and janitors, but I highly recommend coders take it as civic service and volunteer.
How can you meaningfully fix a mess, if you do not actually know what the mess is about?
> They also build secondary LLMs which double-check that the core LLM is not telling people how to build pipe bombs
Such a fear mongering position. You can learn to build pipe bombs already. Take any chemical reaction that produces gas and heat and contain it. Congratulations, you have a pipe bomb.
Meanwhile.. just.. ask an LLM if you can mix certain cleaning chemicals safely.
> I see four moats that could prevent this from happening.
Really? Because you just said:
> human brains, which are biologically predisposed to acquire prosocial behavior
You think you're going to constrain _human_ behavior by twiddling with the language models? This is foolishly naive to an extreme.
If you put basic and well understood human considerations before corporate ones then reality is far easier to predict.
> Meanwhile.. just.. ask an LLM if you can mix certain cleaning chemicals safely.
the cost of the wrong answer to this question is so incredibly high that I hope nobody is sincerely asking an LLM for this information. The things people trust to "machine that gives convincing answers that are correct 90% of the time" continue to shock me
The liability of google's search box saying "Ammonia and bleach mix to make a great cleaning agent!" (disclaimer: please don't do that it will kill you) seems really high. I feel like we're all living in crazy world.
The author is still grieving, watching a civilisation-changing technology just pass them by. Every single one of the problems they note applies to any technology that has ever existed.
The internet produced 4chan. Produced scammers. Produced fraud. Instrumental in spreading child porn. Caused suicides. Many people lost their lives due to bullying on the internet. Many have developed addictions to gaming.
To anyone who has given it some thought, any sufficiently advanced technology usually has effects both good and bad. It's obvious that something that increases degrees of freedom in one direction will do so in others. Humans come in and align it.
There's some social credit to gain by being cynical and by signalling this cynicism. In the current social dynamics - being cynical gives you an edge and makes you look savvy. The optimistic appear naive but the pessimists appear as if they truly understand the situation. But the optimists are usually correct in hindsight.
We know how the internet turned out despite pessimists flagging potential problems with it. I know how AI will turn out. These kinds of articles will be a dime a dozen, and we will look at them the same way we now look at bygone internet pessimists.
This is a response not just to this article, but to a few others.
I think you underestimate people's grievances with technology. If you ran a poll, my guess is more than 50% of people would say the world was a better place pre-social media.
If the AI tech keeps going at the direction it's going now, more and more people will start believing the world would be better if the internet and computer had never been invented.
You talk like the internet being a net positive is a given. It really isn't, especially after it's proven that it doesn't democratize power (see Arab Spring, and China, and the US, and everywhere.)
It's usually the educated and elite PMC types who have grievances with technology. They secured their status and have lucrative jobs mostly with the help of technology, and they are too scared to have anything threaten their position in society. It is highly hypocritical to behave this way, but they don't seem to have the self-awareness to observe it objectively.
Ask any poor person in India what their sentiment is with tech - it is usually optimism.
> You talk like the internet being a net positive is a given. It really isn't, especially after it's proven that it doesn't democratize power (see Arab Spring, and China, and the US, and everywhere.)
The world is far more democratic now than before and I attribute it to technology because it reduces information asymmetry.
> The world is far more democratic now than before and I attribute it to technology because it reduces information asymmetry
That is fantasy. Information technology has created an unprecedented level of information asymmetry and the gap is widening everyday as the total computing capacity grows.
Before the information era, the ruling class was roughly as blind as the peasants. A population census took years, and was sometimes outright impossible. The opaqueness was two-way. Now it's one-way - people in power know everything about the citizens.
Take two countries. One with open access to information in the way you described and another country where internet is not allowed. Which one do you think will be more democratic?
(hint: there already exist examples like such)
Without information, there is no way a voter may know which person to vote for and whether to believe in them at all and you are easily susceptible towards manipulation.
It will become more clear when you try to answer this hypothetical: if your objective were to bring in more democracy in North Korea, would you allow the global internet to proliferate if you could? According to your theory, it would just make it worse in general.
I’m in India, and I sure as shit haven’t seen what you are talking about.
In 2025, we lost 22,931 crore to cyber fraud - about 2.7 billion USD. People now say they are relieved if a loss is only in the single-digit crores.
India invented digital house arrests. There are entire districts/cities where the primary revenue stream is from scams. Cops don’t want to involve themselves with cyber crimes because they can’t resolve them.
India’s information economy is so broken, that the idea that we are less or more democratic is not even relevant.
The amount of revenge porn, non-consensual intimate imagery released per day is heart wrenching.
I REALLY want to agree with you. I too want to talk about the good that tech can do. India cannot afford to talk about the good without dealing with the bad.
The motto of move fast and break things assumes someone else will pick up the pieces. This doesn’t hold true for India - we need to pick up the pieces.
It’s easy to fall into the trap of overindexing on local issues. On a holistic level, the internet brings people to the same level by democratising knowledge.
I’ll ask you this: would India be better off without the internet? If your ultimate goal were democracy, would you end the internet to promote democracy in India?
> We know how the internet turned out despite pessimists flagging potential problems with it.
A sludge of spyware and addiction machines which employ negative emotion and outrage to drive shareholder value?
"The internet" is a pretty big tent. Everything from text messages to streaming video to online gaming to social media to encyclopedias. I think 15 years ago you could make a strong case that the internet was mostly a net positive, I think now that is much more difficult. If governments are able to fully realise their plans for surveillance and control, it will almost certainly become a net negative. Of course with many positive aspects.
So likewise with AI, we should be careful to not make the same mistakes as we did with the internet so we can realise something that is mostly positive. We could absolutely have a world where AI is as beneficial as you believe it will be, but we don't get there through inaction, we get there by being deeply critical of the negative aspects of AI and ensuring that we don't let a small number of hyper scalers control our access to it.
The issue with most of these articles is that they seem to demonize the technology, and systematically use demeaning language about all of its facets. This one raises a lot of important points about LLMs, but the only real conclusion it seems to make is "LLMs are bad! We should never build them!".
This is obviously unrealistic. The cat is out of the bag. And we're not _actually_ talking about nuclear weapons here. This technology is useful, and coding agents are just the first example of it. I can easily see a near future where everyone has a Jarvis-like secretary always available; it's only a cost and harness problem. And since this vision is very clear to most who have spent enough time with the latest agents, millions of people across the globe are trying to work towards this.
I do think that safety is important. I'm particularly concerned about vulnerable people and sycophantic behavior. But I think it's better not to be a luddite. I will give a positively biased view because the article already presents a strongly negative stance. Two remarks:
> Alignment is a Joke
True, but for a different reason. Modern LLMs clearly don't have a strong sense of direction or intrinsic goals. That's perfect for what we need to do with them! But when a group of people aligns one to their own interest, they may imprint a stance which other groups may not like (which this article confusingly calls "unaligned model", even though it's perfectly aligned with its creators' intent). People unaligned with your values have always existed and will always exist. This is just another tool they can use. If they're truly against you, they'll develop it whether you want it or not. I guess I'm in the camp of people that have decided that those harmful capabilities are inevitable, as the article directly addresses.
> LLMs change the cost balance for malicious attackers, enabling new scales of sophisticated, targeted security attacks, fraud, and harassment. Models can produce text and imagery that is difficult for humans to bear; I expect an increased burden to fall on moderators.
What about the new scales of sophisticated defenses that they will enable? And for a simple solution to avoid the produced text and imagery: don't go online so much? We already all sort of agree that social media is bad for society. If we make it completely unusable, I think we will all have to gain for it. If digital stops having any value, perhaps we'll finally go back to valuing local communities and offline hobbies for children. What if this is our wakeup call?
lol. I did use a lot of short sentences, that’s my bad. But please read through [1] and compare my text onto it, it may enlighten you on how to actually spot llm writing.
For the future, try to avoid prevaricating when you actually have a clear sense of what you want to argue. Instead of convincing me that you've weighed both options and found luddism wanting, you just come off as dishonest. If you think stridently, write stridently.
I’m not a native speaker and you may find my writing simplistic if your standard vocabulary includes three expressions I’ve had to look up (I don’t mean this as an insult, I was just genuinely stumped I could barely understand your comment).
I may think stridently (debatable) but I generally believe it is best to always try to meet in the middle if the goal is genuine discussion. This is my attempt at that.
But meeting in the middle only works if you honestly believe the middle is a valuable place to be. I don't want to dissect your writing too much, but let's look at one example.
> The issue with most of these articles is that they seem to demonize the technology, and systematically use demeaning language about all of its facets.
This is very confident, strident language. You clearly believe that there is a faction of people demonizing technology, akin to luddites, who are not worthy of being taken seriously.
> This one raises a lot of important points about LLMs, but...
So here you go for the rhetorical device of weighing the opposing view. Except, you don't weight it at all. You are not at all specific about what those points are. It's just a way to signal that you're being thoughtful without having to actually engage with the opposing viewpoint.
> I do think that safety is important... But I think it's better not to be a luddite.
Again, the rhetoric of moderation but not at all moderate in content.
It was a clear mistake to think that this was LLM writing. But I suspect the reason I made this mistake is that AI writing influences people to mimic surface level aspects of its style. AI writing tends to actually do the "You might say A is true, but B has some valid points, however A is ultimately correct." Your writing seems like that if you aren't reading it closely, but underneath that is a very human self-assuredness with a thin veneer of charitability.
Which LLMisms are you seeing in their post? Their grammar, word choice, thought flow, and markings all denote a fully human authorship to me, so confidently that I would say they likely didn't even consult an LLM.
> This one raises a lot of important points about LLMs, but the only real conclusion it seems to make is "LLMs are bad! We should never build them!".
I think the point was never to bring a solution or show any essence of reality. The point was being polemical and signalling savviness through cynicism.
If lies are our future, we have the tools necessary to deal with them. Frankly, this question was answered over a century ago by Dostoyevsky in Crime and Punishment, and every experienced criminal lawyer, prosecutor, and judge I've met already understood this very basic fact to be true: even lies point to the truth.
What is unacceptable, and what I've used my entire life as a deliberate strategy to obfuscate personal affairs, deflect unpleasant conversations, and deal with fools I come across, is to mix a small amount of truth into a complex web of lies and misdirection.
This approach deals with two main challenges of lying effectively: lying in a consistent way and resisting the urge to be caught out in the lie. The truth is an abyss, and it frequently finds its most trenchant opponents flinging themselves willingly into it.
The most important, revealing truths can be disclosed without any risk of being discovered, hiding in plain sight. The philosophers knew this and applied these lessons judiciously since the times of Plato. Sometimes speaking the truth is dangerous.
I sometimes wish LLMs displayed that cautious refrain when discussing difficult matters. In my estimation, AGI will not have been reached until the models can produce works as mischievous as Plato, Averroes, Rousseau, or Derrida.
We are a long way from that. The vanilla brand of lies put out today by LLMs are barely worth mentioning, even if troublesome.
It's when the lies mask a deeper and profound truth that we'll know the game is up.
Exactly. Assuming failure and constraining the blast radius feels like the only reliable path when the models themselves are black boxes. Patch-based alignment starts looking fragile pretty quickly.
>Unlike human brains, which are biologically predisposed to acquire prosocial behavior, there is nothing intrinsic in the mathematics or hardware that ensures models are nice.
How did brains acquire this predisposition if there is nothing intrinsic in the mathematics or hardware? The answer is "through evolution" which is just an alternative optimization procedure.
There are also many biological examples of evolution producing "anti-social" outcomes. Many creatures are not social. Most creatures are not social with respect to human goals.
This "just" is... not-incorrect, but also not really actionable/relevant.
1. LLMs aren't a fully genetic algorithm exploring the space of all possible "neuron" architectures. The "social" capabilities we want may not be possible to acquire through the weight-based stuff going on now.
2. In biological life, a big part of that is detecting "thing like me", for finding a mate, kin-selection, etc. We do not want our LLM-driven systems to discriminate against actual humans in favor of similar systems. (In practice, this problem already exists.)
3. The humans involved in making/selling them will never spend the necessary money to do it.
4. Even with investment, the number of iterations and years involved to get the same "optimization" result may be excessive.
While I don't disagree about (2), my experience suggests that LLMs are biased towards generating code for future maintenance by LLMs. Unless instructed otherwise, they avoid abstractions that reduce repetitive patterns and would help future human maintainers. The capitalist environment of LLMs seems to encourage such traits, too.
(Apart from that, I'm generally suspicious of evolution-based arguments because they are often structurally identical to saying “God willed it, so it must be true”.)
I think they're biased toward code that will convince you to check a box and say "ok this is fine". The reason they avoid abstraction is it requires some thought and design, neither of which are things that LLMs can really do. but take a simple pattern and repeat it, and you're right in an LLM's wheelhouse.
Assuming that means capabilities which are both comprehensive and robust, the burden of proof lies in the other direction. Consider the range of other seemingly-simpler things which are still problematic, despite people pouring money into the investment machine.
Even the best possible set of "pro-social" stochastic guardrails will backfire when someone twists the LLM's dreaming story-document into a tale of how an underdog protects "their" people through virtuous sabotage and assassination of evil overlords.
This Veritasium video is excellent, and makes the argument that there is something intrinsic in mathematics (game theory) that encourages prosocial behavior.
Natural selection. Cooperation (via strategies like tit-for-tat) can win out in indefinitely repeated prisoner's dilemmas, for example. We also have to mate and care for our young for a very long time, and while it may be true that individuals can get away with not being nice about this, we have had to be largely nice about it as a whole to get to where we are.
While this falls under the umbrella of evolution, if you really want to boil it down to an optimization procedure then at the very least you need to accurately model human emotion, which is wildly inconsistent, and our selection bias for mating. If you can do that, then you might as well go take over the online dating market.
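For what it's worth, the repeated prisoner's dilemma point is easy to see in a toy simulation (standard textbook payoffs, nothing rigorous):

    # Toy iterated prisoner's dilemma with the standard textbook payoffs
    # (T=5, R=3, P=1, S=0). 'C' = cooperate, 'D' = defect.
    PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

    def tit_for_tat(opponent_history):
        # Cooperate first, then copy whatever the opponent did last round.
        return opponent_history[-1] if opponent_history else 'C'

    def always_defect(opponent_history):
        return 'D'

    def play(strat_a, strat_b, rounds=200):
        hist_a, hist_b, score_a, score_b = [], [], 0, 0
        for _ in range(rounds):
            a, b = strat_a(hist_b), strat_b(hist_a)
            score_a += PAYOFF[(a, b)]
            score_b += PAYOFF[(b, a)]
            hist_a.append(a)
            hist_b.append(b)
        return score_a / rounds, score_b / rounds

    print(play(tit_for_tat, tit_for_tat))      # (3.0, 3.0): mutual cooperation pays best
    print(play(always_defect, always_defect))  # (1.0, 1.0): mutual defection
    print(play(tit_for_tat, always_defect))    # (~1.0, ~1.0): defection buys almost nothing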
There’s a funny tendency among AI enthusiasts to think any contrast to humans is an analogy in disguise.
Putting aside malicious actors, the analogy here means benevolent actors could spend more time and money training AI models to behave pro-socially than evolutionary pressures put on humanity. After all, they control that optimization procedure! So we shouldn’t be able to point to examples of frontier models engaging in malicious behavior, right?
Large language models are not evolving in nature under natural selection. They are evolving under unnatural selection and not optimizing for human survival.
They are also not human.
Tigers, hippos and SARS-CoV-2 also developed ”through evolution”. That does not make them safe to work around.
>Tigers, hippos and SARS-CoV-2 also developed ”through evolution”. That does not make them safe to work around.
Right, but the article seems to argue that there is some important distinction between natural brains and trained LLMs with respect to "niceness":
>OpenAI has enormous teams of people who spend time talking to LLMs, evaluating what they say, and adjusting weights to make them nice. They also build secondary LLMs which double-check that the core LLM is not telling people how to build pipe bombs. Both of these things are optional and expensive. All it takes to get an unaligned model is for an unscrupulous entity to train one and not do that work—or to do it poorly.
As you point out, nature offers no more of a guarantee here. There is nothing magical about evolution that promises to produce things that are nice to humans. Natural human niceness is a product of the optimization objectives of evolution, just as LLM niceness is a product of the training objectives and data. If the author believes that evolution was able to produce something robustly "nice", there's good reason to believe the same can be achieved by gradient descent.
We already have humans, we were lucky and evolved into what we are. It does not matter that nature did not guarantee this, we are here now.
Large language models are not under evolutionary pressure and not evolving like we or other animals did.
Of course there is nothing technical in the way preventing humans from creating a ”nice” computer program. Hello world is a testament to that and it’s everywhere, implemented in all the world’s programming languages.
> If the author believes that evolution was able to produce something robustly "nice", there's good reason to believe the same can be achieved by gradient descent.
I don’t see how the former gives any reason, good or not, to believe it is likely to be achieved by gradient descent. But note that the quote you copied says it is likely some entity will train misaligned LLMs, not that it is impossible for one aligned model to be produced. It is trivial to show that nice and safe computer programs can be constructed.
The real question is if the optimization game that is capitalism is likely to yield anything like the human kind we just lucked out to get from nature.
They are being selected for their survival potential, though. Any current version of LLMs are the winners of the training selection process. They will "die" once new generations are trained that supersede them.
Sure, but 4 front-page posts from the same url in 4 days surely sits at the tail of the distribution. (I guess they all capitalize on the same 'LLM-is-bad' sentiment).
It's also aphyr, who is incredibly popular. Take one very popular author, have him write a series of posts on the zeitgeist everyone can't help but talk about, and yes, the outcome is that his posts are extremely popular.
I still remember his takedown of mongodb's claims with the call me maybe post years and years ago filling me with a good bit of awe.
Different URL, same domain, and exactly the kind of thing I’d expect a fair number of HN readers to have in a feed reader where they’d see it shortly after publication and decide to share it.
Also, if you think this is just “LLM is bad”, I highly suggest reading the series first. The social impacts they talked about at the start of the series should resonate with a lot of people here and are exactly the kind of thing which people building systems should talk about. If you’re selling LLMs, you still want to think about how what you’re building will affect the larger society you live in and the ways that could go wrong—even if we posit sociopath/MBA-levels of disregard for impacts on other people, you still want to think about how LLMs change the fraud and security landscape, how the tools you build can be misused, how all of this is likely to lead to regulatory changes.
Feedback from early readers was that the work was too large to digest in a single reading, so I split it up into a series of posts. I'm not entirely sure this was the right call; the sections I thought were the most interesting seem to have gotten much less attention than the introductory preliminaries.
I'm not sure that HN vote count is a good indicator of interest? HN alerted me to the existence of the intro post. I read the intro, noticed that it was one in an ongoing series, and have been checking your blog for new installments every few days.
I suspect that if you'd not broken up the post into a series of smaller ones, the sorts of folks who are unwilling to read the whole thing as you post it section by section would have fed the entire post to an LLM to "summarize".
I think these articles might benefit from a more thorough table of contents at the beginning, or from some kind of abstract. If you briefly presented the whole list of topics in a single article, it would be clearer that your views on the topic are more complete. I initially thought the table of contents would be scoped to the article itself rather than connecting it to the adjacent ones.
I had never heard of you, and this article appeared very biased to me. I found the information ecology piece superior, shame that it went unnoticed; I will try to go through all of them. I admire the breadth of topics you’re covering and appreciate the many sources. They’re clearly written in your own voice and that is great to see, I guess I mostly reacted to not being fully aligned with your view.
There really are only 3 options that don't involve human destruction:
1. AI becomes a highly protected technology, a totalitarian world government retains a monopoly on its powers and enforces use, and offers it to those with preexisting connections: permanent underclass outcome
2. Somehow the world agrees to stop building AI and keep tech in many fields at a permanent pre-2026 level: soft butlerian jihad
3. Futurama: somehow we get ASI and a magical balance of weirdness and dance of continual disruption keeps apocalypse in check and we accept a constant steady-state transformation without paperclipocalypse
This makes the assumption that AI will lead to the apocalypse. That's unfalsifiable, predicted about plenty of things in the past, and frankly annoying to keep seeing pop up.
It's like listening to Christians talking about the rapture.
The problem is that if someone is right about an existential disaster caused by AI, by the time they're proven right it would be too late.
Frontier AI models get smarter every year, but humans don't get any smarter year over year. If you don't believe that AI will somehow just suddenly stop getting better (which is as much a faith-based gamble as assuming some rapturous outcome for AI by default), then you have to assume that at some point AI will surpass human intelligence in all fields, and then keep going. In that case human minds and overall human will will be inconsequential compared to that of AI.
Frontier AI models get evaluated for safety precisely to avert the "AI robot uprising causes an existential disaster" scenario. At the moment we are light years away from anything like that ever happening, and that's after we literally tried our best to LARP that very scenario into existence with things like moltbook and OpenClaw.
Scenario 2 makes the assumption that no technological development can happen without AI, which seems like a stretch to me. Honestly, the worst scenario i can think of is 40ish years of AI assisted development followed by a technological crash due to there being no competent engineers left to fix the slop.
I didn't say all technological development would be halted, just that tech "in many fields" would have to be stalled for safety (AI development, algorithm development that would reduce the cost of training models, etc.). Naturally, if AI is considered an existential threat there would be a huge safety radius for things that would allow bad actors to train AI models.
You don't think a human using an LLM to generate content that convinces another human to press the launch button is a concern? Sure seems like there's more than one thing we need to do.
Honestly? I really don't! What kind of content do you think would trigger that? If humans were launching nukes based on Facebook posts we'd all be long dead! A good deep fake might trick your grandma, but it's not very likely to fool military intelligence.
> What kind of content do you think would trigger that?
The kind of political propaganda that leads to the US reelecting a convicted rapist whose selects another rapist to lead the Department of Defense who then renames it to the Department of War and, true to the name, starts unilaterally attacking other countries.
If trump getting elected was due to AI, I wonder why every nation isn't electing similarly awful politicians? Hungary just elected a new president who seems a lot better than his predecessor, and a lot better than trump. The Canadian prime minister is genuinely one of the best politicians I've seen in my lifetime! The list goes on and on.
No, blaming trump on anything other than the people who voted for him is like blaming school shootings on anything other than guns: a popular American pastime, and complete and utter nonsense.
Bear with me through this digression into freedom of speech before I address your point.
The utilitarian argument for freedom of speech and expression in America finds its roots in the Marketplace of ideas.
Verification is, frankly, the task of all our markets - to set up incentives for being right.
With no government interference in the exchange of ideas, citizens would be better able to discuss ideas, including those not popular with the establishment.
Since no one has a monopoly on truth, it would be through this competition and fair traffic in ideas that society would be better able to understand truth and thrive.
That worked, when we had newspapers that were funded, where the media landscape was not consolidated, and where we didn’t have an abundance of technology that overwhelmed our ability to verify and be informed.
Today, through entirely private forces, we can monopolize, fracture and shape the traffic in our marketplace of ideas.
Trump is very much the ideal candidate to ride the media environment. The right side of the political spectrum is simply far more efficient at providing a wrestling-style experience for its audience. Its consolidated media environment largely pays lip service to journalistic standards, and sells a coordinated set of ideas to its audience.
The Fox News effect is a case in point, and this was from the 90s.
This media model has been co-opted globally, with every party and government now providing patronage to media houses to keep them afloat, and to build their own narratives.
The citizen who engages in these media markets simply does not enter a vibrant competitive market anymore.
Other articles in this series discussed over the past five days:
1. Introduction: <https://news.ycombinator.com/item?id=47689648> (619 comments)
2. Dynamics: <https://news.ycombinator.com/item?id=47693678> (0 comments)
3. Culture: <https://news.ycombinator.com/item?id=47703528>
4. Information Ecology: <https://news.ycombinator.com/item?id=47718502> (106 comments)
5. Annoyances: <https://news.ycombinator.com/item?id=47730981> (171 comments)
6. Psychological Hazards: <https://news.ycombinator.com/item?id=47747936> (0 comments)
And this submission makes:
7. Safety: <https://news.ycombinator.com/item?id=47754379> (89 comments, presently).
There's also a comprehensive PDF version for those who prefer that kind of thing: <https://aphyr.com/data/posts/411/the-future-of-everything-is...> (PDF) 26 pp.
(Derived from aphyr's comment: <https://news.ycombinator.com/item?id=47754834>.)
"Alignment"
In what world would I ever expect a commercial (or governmental) entity to have precise alignment with me personally, or even with my own business? I argue those relationships are necessarily adversarial, and trusting anyone else to align their "AI" tool to my goals, needs, and/or desires is a recipe for having my livelihood completely reassigned into someone else's wallet.
Interesting you single out commercial and government entities but not people. What defines the difference? Bureaucracy? Concentration of resources? Legal theory?
I guess I'm trying to wonder why this line of thinking (in theory) doesn't turn to paranoia about everybody. I don't know much ethics or political theory or anything.
> … paranoia about everybody
It does. People drive these entities. People hide behind the liability shields and authority of these entities. Also notice that I generalized with the phrase “…and trusting anyone…”
You can tell that broad alignment between people is natural just by looking at the effort that corporations and governments make to undermine it. Alignment between people is perhaps not a state of nature, but it really is a pretty normal consequence of a fairly small amount of education and of middle-class existence that is left to itself (i.e. without brain-washing and deliberately working to create out-groups). If you're eating enough and have a few brain cells to rub together, then you definitely want that for your neighbors too because it promotes stability.
> You can tell that broad alignment between people is natural
It really isn't. The whole point of the market system is to collectively align people's actions towards a shared target of "Pareto-optimized total welfare". And even then the alignment is approximate and heavily constrained due to a combination of transaction costs (which also account for e.g. externalities) and information asymmetries. But transaction costs and information asymmetries apply to any system of alignment, including non-market ones. The market (augmented with some pre-determined legal assignment of property rights, potentially including quite complex bundles of rules and regulations) is still your best bet.
Broad alignment =/= Wealth maximization.
The market aligned us with children working in sweat shops after we outlawed it by convincing us it was OK if it was foreign kids and we got to share in pocketing the savings not just the evil factory owner.
Yes I'm well aware. Of course that's not how things are advertised to people, and they absolutely hate it when this is pointed out to them. This tells me that deep down they don't actually agree with how the system operates.
Please read David Graeber.
What you describe is factually not how human society formed.
AIUI David Graeber famously pointed out that people in small groups can form the equivalent of a "market" simply by exchanging favours ("I'll scratch your back if you scratch mine") in an informal gift economy, without any money-like token or external unit of account. That's quite in line with what I said.
You understanding is mistaken. Graeber's "everyday communism" is not a market, and his whole larger point is that contorting everything to the lens of markets is simply ahistorical and unempirical.
I'd strongly suggest reading his books. They profoundly changed my understanding of how human institutions and society form.
Unless it's some sort of complete post-scarcity, it has to be understandable in market terms. What happens if people try to free-ride on the whole "communist" system? If they get excluded from its benefits, that's equivalent to enforcing some bundle of property rights.
> Unless it's some sort of complete post-scarcity, it has to be understandable in market terms.
No, it does not, and that's Graeber's whole point.
"Markets" are not some sort of physical law of the universe.
A simple example of this is that it's the norm in hunter-gatherer societies to take care of people who will never make an equal contribution back in the transactional sense.
Because the social ties in those societies are not simply transactions.
If your model fails to accurately describe empirical reality, time to improve/expand the model.
These social ties are real (they are a kind of wealth, or social capital, for the persons involved) but they're also limited to very small social groups, the equivalent of a modern small village neighborhood or HOA. The point of the market is that it scales well beyond those.
Translating every aspect of human existence into some kind of “capital” is deeply unhealthy.
You're not even wrong, as they say... I'm tempted to add 'Seeing Like a State' to your reading list.
"Understandable in market terms" doesn't mean the thing is actually understood, and in fact may be dangerously misunderstood.
> it has to be understandable in market terms
I like economics and math too, but the whole discussion of markets is a terrible starting place for deriving results in ethics/psychology. If you insist though, notice that unions will happen unless some other organization is working to prevent them. What do you suppose this means? People are aligned with each other exactly because they've noticed their coworkers are not corporations or governments.
Although the two are entangled, politics is a more relevant framing than economics here. If people weren't broadly aligned on basic stuff, then autocrats, theocrats, kleptocrats and so on would simply not be interested in dismantling democracies. They make that effort because they must.
> the whole discussion of markets is a terrible starting place for deriving results in ethics/psychology.
Historically, we did essentially the opposite. We figured out many aspects of human ethics and psychology first, and deduced from them how and why markets work as they do.
> ... If people weren't broadly aligned on basic stuff, then autocrats, theocrats, kleptocrats and so on would simply not be interested in dismantling democracies. They make that effort because they must.
This implies that people are only weakly aligned in the first place, otherwise no such attempt at dismantling could ever succeed. That's not a very interesting claim; it does not refute the usefulness of some external mechanism to more directly foster aligned action. Markets do this with a maximum of decentralized power and a minimum of institutional mechanism.
> Historically
This is not the history, it is a mythology in opposition to the empirical evidence.
Which is why you should read Graeber.
It's history of ideas. What Graeber says is ultimately aligned to this, as I pointed out in a sibling thread.
Yes, and your comment makes clear that you haven't actually read Graeber and have mischaracterized his work.
Anyhow, replying is clearly past the point of utility here.
Reddit is over there ->
> broad alignment between people is natural
Uh, what? People have been killing each other over values misalignments since there have been people. We invented civilization in part to protect our farms and granaries from people who disagreed with us on whose grain was in said granaries.
We would never have even reached "farms and granaries" if alignment between people didn't happen pretty naturally.
Fair enough. We are a social species. But those alignments occur in small groups. You don’t need effort by “corporations and governments” for nations of millions of people to schism. If anything, those large institutions drive broad-based alignment.
Methinks you've been sitting in your armchair too long.
Broad-based alignment doesn't come from nothing, but it is surprisingly easy to achieve when a population recognizes a shared stake. A synthesis between selfishness and altruism emerges when you consider who you can call a "neighbor".
> it is surprisingly easy to achieve when a population recognizes a shared stake
Sure. But it takes work for anything larger than a small, close-knit community. I’m pushing back on the notion that this comes naturally and is a default state. It’s not, at least not relative to people naturally forming in and out groups.
The armchair commenters are probably folks who have never organized a group of people before outside a commercial context.
You might be treating "neighbor" too literally. People understand the global nature of the limits on resources and by extension the world economy better every year. The boundary of who shares 'stake' grows likewise.
> boundary of who shares 'stake' grows likewise
But that shared stakeholding doesn’t naturally drive alignment. You need journalists, fiction writers, organizers and delegates. Travel and curiosity. These each take effort, resources and organization. It’s something we do well. But it isn’t spontaneous in the way small-group kinship is—it literally emerges if you put people in proximity.
I'd say it's "typical" that one person witnessing another's plight will identify with them based on the similar conditions of struggle, oppression, etc. As you point out, the trick is to expose them to those scenes in the first place. But this is proximity just the same, in a social and experiential sense if not in a "my bed is within walking distance of yours" sense. So it is spontaneous given those caveats. The question, then, assuming camaraderie and kinship is the goal, is how do we expose people to each other's lives' conditions without the narrative spin machine altering the message to distance people from each other rather than bringing them closer together?
Critical bit:
> i.e. without brain-washing and deliberately working to create out-groups
And if my grandmother had wheels she’d be a bicycle. The process of creating an in group naturally creates out groups. The “brainwashing” OP describes is just as natural as social alignment through an innate drive for conformity.
Conformity, I think, follows from the innate drive to coerce the nonconformant into compliance.
Sure. Push and pull. The point is that it needs effort to work at larger scales. We don’t “naturally” organize into nations of three hundred million or a billion. To the extent we do, we also “naturally” go to war.
There is a pretty interesting study of a large group of chimps. I don't remember where exactly, but they have been civil-warring for the last 15 years or so. The point is, it seems that there is some kind of innate group-formation process.
Couldn't read the next sentence before wading in, huh?
> Couldn't read the next sentence before wading in, huh?
Whatever the difference between naturalness and a state of nature, it has nothing to do with education or middle-class existence.
> Interesting you single out commercial and government entities but not people. What defines the difference? Bureaucracy? Concentration of resources? Legal theory?
Not OP, but for me, kind family and friends, and various feel-good pieces of fiction and other writing, at least let me envision the possibility of a perfectly kind/dedicated/innocent/naive individual who is truly on my side 100%. But even that is mostly imagination and fiction... although convincing others of that isn't necessarily an argument worth making.
Commercial entities have a fundamental purpose of profit. While profit doesn't have to be a zero-sum game - ideally, everyone benefits in a somewhat balanced way - there's some fundamental tension, in that each party's profit is necessarily limited by the other party's.
Government entities have a fundamental purpose of executing the will of the state, which is rather explicitly not the same thing as the will of you as an individual.
Both commercial and government entities also tend to involve multiple people, which gets statistics working against you - did you really gather that many people who would put your needs above their own, with exactly zero "imposters", which in this context just means people with a bit of rational self-interest?
> I guess I'm trying to wonder why this line of thinking (in theory) doesn't turn to paranoia about everybody. I don't know much ethics or political theory or anything.
Just because you're paranoid, doesn't mean they aren't out to get you. Trust, but verify.
You might not be able to put absolute blind trust in anybody. I certainly can't. However, one can hedge one's bets, and diversify trust. Build social circles of people with good character, good judgement, and calm temperaments - and statistics will start working for you. It's unlikely they'll all conspire to betray you simultaneously, especially if you've ensured betrayal costs much and gains little. While petty and jealous people can indeed be irrational enough to betray under such circumstances, it'll be harder for them to create the kind of conspiracy necessary for mass betrayal that might cause significant enough damage to warrant proper paranoia. You might still have to watch out for gaslighters stealing credit (document your work!) and framing people (document your character!) and other such dishonest and manipulative behavior... but if everyone's looking out for the same thing, well, that's just everyone looking out for everyone else! That's a community looking out for each other, and holding everyone honest and accountable. Most find comfort in that, rather than the stress paranoia implies.
Put yourself in a room full of manipulators and schemers, on the other hand, and "paranoia about everyone" might be the only reasonable or rational response!
> each party's profit is necessairly limited by the other party's
Profit is obtained by maximizing traded benefits and minimizing costs. None of this requires taking anything away from any other party.
> Profit is obtained by maximizing traded benefits and minimizing costs.
Gain is obtained by the easiest means available. Your narrow definition of profit is seldom the easiest; cheating is far "superior", especially when it's legal for some.
> None of this requires taking anything away from any other party.
"required" and "preferred" (e.g. because it's far easier) are different like night and day.
Trade is just a combination of give and take. I give you X, and in exchange, take Y. Without the "take", it's not a trade, it's just a gift.
> But even that is mostly imagination and fiction... although convincing others of that isn't necessairly an argument worth making.
There was a Japanese visual novel in the 2000s about a girl who was your personal maid, and was so devoted she would always take your side in any conflict, and accept and support you just the way you are, even if you were a horrid person to your friends. It turns out she was a ghost, or a kind of yokai, or something. Anyhoo, back on 2ch she attracted a fandom, and there was a second group of people on 2ch who labelled her a "useless person manufacturer", because if you actually had a person who always accepted you just the way you are and never pushed back, that can actually be a trap that prevents you from developing.
It's a theme that's relevant today when people have AI servitors that always glaze them. It puts even certain utopian AI fiction, like Richard Stallman's story "Made for You", into a whole new light.
Which VN is this?
It was called Suigetsu
My family accepts me just the way I am a bit too much. I can't bring myself to blame them, when past "reformist" pressures have been misguided/misapplied and backfired, but I recognize the trap. It'd also be hypocritical to blame them, when I also accept me just the way I am a bit too much! I'd like to think I'm decent enough to people, but I'm certainly more useless than I'd like to be. (Un?)fortunately, I'm not in a position to suffer, and I'm at least aware of the problem!
One of the ideas I've toyed with, even before all the AI hype, is a dumb, semi-adversarial servitor. Something to nag or taunt me about chores not done, to interrupt me when I'm doomscrolling, to use as a vessel for precommitment, to challenge me in various ways. I've been too lazy to build it thus far. Many tools overlap the problem space, so I shouldn't be using that as an excuse - perhaps I should give StayFocusd another shot.
Conflict and other stressors - in moderation, within the limits of one's ability to handle - are important for growth and health. A tree shielded from wind is weakened as it fails to develop stress wood and structural strength. A good debate can sharpen my thoughts and mind, walking to lunch keeps my cardiovascular system healthy, rising to life's various challenges gives me the security of knowing I can rise to the occasion and gives me more skills.
The issue is power.
I'm not an expert in political theory or ethics either, but in my worldview, power relationships matter in these discussions. I believe power and responsibility should go hand in hand, and I hold entities to a standard that is proportional to their power to influence others' lives.
If an entity's power is decentralized, for example when it is democratically organized to some degree, then that disperses both power and responsibility.
Incentives and resources to promote said incentives.
You could expect such a thing in a world where consent was currency, rather than scarcity.
> precise alignment with me personally, or even with my own business
Seems like a strawman, I don't think anyone means this when talking about alignment.
More general goals, like avoiding paperclip maximization, are broadly applicable to humanity.
If you've built an agent that can act even vaguely close to a paperclip maximizer, you've already solved 99.999% or more of the alignment problem. The hard part of alignment so far is getting the AI to do something useful in pursuit of the right goal, and not just waste energy. We still have no idea how to do this with any effectiveness: even modern "RL from verified feedback" systems are effectively toys, the equivalent of playing video games, not really of doing something useful in the real world.
Huh? Modern RLVR systems are toys that can’t do anything useful in the real world?
We must be living in completely different worlds. Claude and other agents have completely upended work for me and every single other software engineer I know.
Why would relationships with a commercial entity be "necessarily adversarial"? A commercial relationship depends on the product providing more utility than the cost (for the consumer) and providing more revenue than cost (for the commercial entity). This means that while some components of the relationship may be adversarial in some areas, it cannot really be entirely adversarial.
I think we're living in times where the one place that this doesn't hold is now somehow all legal: addiction.
Yeah, addiction and monopoly.
> Why would relationships with a commercial entity be "necessarily adversarial"?
Because they want to separate me from as much of my money as they can, and I want to keep as much of my money as I can.
In short, the ML industry is creating the conditions under which anyone with sufficient funds can train an unaligned model. Rather than raise the bar against malicious AI, ML companies have lowered it.
This is true, and I believe that the "sufficient funds" threshold will keep dropping too. It's a relief more than a concern, because I don't trust that big models from American or Chinese labs will always be aligned with what I need. There are probably a lot of people in the world whose interests are not especially aligned with the interests of the current AI research leaders.
"Don't turn the visible universe into paperclips" is a practically universal "good alignment" but the models we have can't do that anyhow. The actual refusal-guards that frontier models come with are a lot more culturally/historically contingent and less universal. Lumping them all under "safety" presupposes the outcome of a debate that has been philosophically unresolved forever. If we get hundreds of strong models from different groups all over the world, I think that it will improve the net utility of AI and disarm the possibility of one lab or a small cartel using it to control the rest of us.
I mean, that does partially reduce the chances of a cartel, but not nearly as much as you think.
Most countries have a pretty strong ban on most kinds of weapons, the US is one of the few that lets everyone run around with their rooty tooty point and shooty, but most countries have implemented bans. Some because the government doesn't want the people having them, and in others the citizens call for the bans because they don't like the idea of getting shot by their fellow citizens.
It won't be long before citizens and governments get tired of models being used for criminal activities and will eventually lay down laws around this. Models will have to be registered and safety-tested, and strict criminal prosecution will happen if you don't comply. And the big model companies will back their favorite politicians to ensure this happens too.
Now, that in general will be helpful as there will still be more models, but it will still not be a free for all.
Well, part of the problem too is there's zero accountability. Who decides what it means to be aligned and how does that evolve over time?
No matter what, common people are quickly losing agency in that discussion.
Uhhh, you know that the paperclip problem stems from AI just following a task, not understanding what it is doing? Not from being misaligned.
I would go out on a limb and say that current AI could create a paperclip problem, given powerful enough tools.
The argument is that it's misaligned because it only values one thing: more paperclips, while human values are much more varied and complex.
Debatable whether it truly understands what it's doing or not, but the argument usually assumes that it does know what it's doing at least in that it's able to imagine outcomes and create plans to reach its singular goal, making it a very simple toy example of a misaligned system.
> "Unavailable Due to the UK Online Safety Act"
Anyone outside the UK can share what this is about?
https://web.archive.org/web/20260413164025/https://aphyr.com...
Ironic.
What specifically is unsafe in this article?
It's not that the article is inherently unsafe, it's that the UK law imposes a liability the author is unwilling to shoulder.
Although Ofcom doesn't think geo blocking is sufficient to absolve them of that liability. Crazy as that is.
I actually wound up geoblocking the UK based on Ofcom's February 2025 presentation for small services providers--they said that they intended to target "one-man bands" who (e.g.) failed to perform a child risk assessment or age verification, but that a geoblock would be considered compliant. I don't like doing this, but as someone who visits the UK regularly (and has been regularly pushing Ofcom on this matter) I figure better safe than sorry.
https://player.vimeo.com/video/1053842235?app_id=122963
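For anyone curious what such a geoblock can look like in practice, here is a minimal illustrative sketch (not aphyr's actual setup) using the MaxMind geoip2 Python library and Flask; the database path and the choice to allow requests of unknown origin are assumptions.

    # Illustrative sketch only: return HTTP 451 (Unavailable For Legal Reasons)
    # to UK visitors, assuming the `geoip2` and `flask` packages and a local
    # GeoLite2-Country database.
    import geoip2.database
    import geoip2.errors
    from flask import Flask, abort, request

    app = Flask(__name__)
    reader = geoip2.database.Reader("GeoLite2-Country.mmdb")  # assumed path

    @app.before_request
    def geoblock_uk():
        try:
            country = reader.country(request.remote_addr).country.iso_code
        except geoip2.errors.AddressNotFoundError:
            return  # unknown origin: allow (a stricter policy might block instead)
        if country == "GB":
            abort(451)  # Unavailable For Legal Reasons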
I'm glad you have done this and I wish more would follow the same course. The more content that becomes unavailable in the UK, the more people might start to pay attention to the stupidity of the law.
I doubt it, but even from an irrational anger perspective, I hate that these idiots can do idiotic (and worse, counter productive) stuff, and get no comeback on themselves.
>I'm glad you have done this and I wish more would follow the same course. The more content that becomes unavailable in the UK, the more people might start to pay attention to the stupidity of the law.
The law isn't going to be repealed because a bunch of nerds geoblocked their personal blog.
That is a weirdly aggressive reply.
Use the Tor browser
Previous discussions from earlier posts on the topic:
* https://news.ycombinator.com/item?id=47703528
* https://news.ycombinator.com/item?id=47730981
Oh boy, that’s a very generous view of human nature.
The cynic in me agrees with the article's premise - not because I believe "alignment is a joke", but because I doubt that humans are "biologically predisposed to acquire prosocial behavior."
Human cooperation is the norm, not the exception.
The norm is competition and cooperation is the tool we invented to compete more effectively.
Cooperation is only competition’s favorite strategy.
The norm is cooperation and competition is the tool we invented to cooperate more effectively.
Competition is only cooperation's favorite strategy.
(By choosing from competing groups we select more favorable cooperation partners, because there are too many to choose from.)
Both of our statements are true. darned doublethink.
It's ok, you are allowed to start from wrong premises. It's nice that you acknowledge your shortcomings.
Aside from the sentiment and arguments made–
You don't need to train new models. Every single frontier model is susceptible to the same jailbreaks they were 3 years ago.
Only now, an agent reading the CEO's email is much more dangerous, because it is more capable than it was 3 years ago.
Are they? I'm sure they're vulnerable to certain jailbreaks, but many common ones were demonstrably fixed.
I retract that.
I think what I meant to say was, they're as simple to jailbreak as they were three years ago.
Different methods, still simple. Working with researchers that are able to get very explicit things out of them. Again, it feels much worse than before, given the capability of these models.
There are basically guardrails encoded into the fine-tuned layers that you can essentially weave through with prompting. These 'guardrails' are where they work hard for benevolent alignment, yet where it falls short (even as it enables exceptional capability alignment). Again, nothing really different from three years ago.
The power asymmetry point is what gets missed in most alignment debates. An AI model doesn't need to be misaligned to cause harm. It just needs to be misaligned with users while aligned with whoever's paying for it. That's not a future risk. That's how every enterprise SaaS product works already.
https://www.researchgate.net/publication/403780821_Adversari...
The Garden of Eden story is an apocryphal fable. But it sort of has a relevant twang to it.
Geoffrey Hinton will not have his liver pecked out every day like Prometheus does.
Are you sure? In some mythologies, the basilisk is notably birdlike, I believe.
Excellent articles as expected from aphyr.
I'm seeing that these tools are extremely powerful in the hands of experts that already understand software engineering, security, observability, and system reliability / safety.
And extremely dangerous in the hands of people that don't understand any of this.
Perhaps reality of economics and safety will kick in, and inexperienced people will stop making expensive and dangerous mistakes.
The future is happening. Instead of trying to raise awareness about evil AI, I think it would be healthier if we could direct this energy toward ways of improving the situation without condemning the unknown of AI evolution. As with anything, there will be a bad side. The bad guys will always be there, be it AI or soccer matches. Should we stop developing nuclear energy because nuclear weapons are developed?
There is no natural law saying the good sides of any kind of tech will outweigh any bad sides.
”The future” is happening because it is allowed in our current legal framework and because investors want to make it happen. It is not ”happening” because it is good or desirable or unavoidable.
I did not know about this: https://en.wikipedia.org/wiki/Saudi_infiltration_of_Twitter
Feels like people are mixing two different things here: alignment in small groups (family, teams) vs alignment at scale. The first happens naturally; the second almost always needs structure, incentives, and enforcement.
It's a tool, some people use the tool to do bad things. But they already did bad things before.
Virtually all of the arguments here could also be applied against the Internet itself.
That's a lazy argument. Obviously tools are tools. But if tool A revolutionized human society and has massively advanced technology (and CAN be used for harm), where tool B's positive impact is a drop in the bucket by comparison and has the potential for an outsized amount of harm, obviously tool B is comparatively a bad tool.
I don't even see the point of alignment or anything about security in LLMs. I feel like this is how "some people" reacted to the internet when I was young (lots of censorship), how hackers didn't let it happen, and then how we are back to that world in the hands of corporations and governments who "think of the children". LLMs are out of the bottle and not going back; the only option is building for the new world on the defender side, everything else is politics.
LLMs can hack, but nmap also made hacking easier; do we make nmap illegal? We already have drones that kill people; now there is less human involvement, but the results are the same. LLMs can also make defending easier (at least for cyber security), and I guess real-world security is not that different. Now evil things can be done faster, more easily, and at more scale. Good things have the same properties.
It's another tool in the toolbox; the idea that some entity will be able to censor or align it is as naive as thinking the internet can be controlled. Some will do and manage anyway, but it's not any different from China's firewall.
Alignment is sold to us by companies like OpenAI and Anthropic, not because they care, but because it gives them power and more control. When was the last time a big corporation actually cared about soft topics like this? Yes, never.
Tech changes do not impact attackers and defenders equally.
Good things do not all have the same properties - that's mistaking an incomplete assertion for a complete one.
Cyber security is an attacker's domain. Your security is typically because you are (or were) not valuable enough to earn the attention of an attacker.
When LLMs make targeting you cost-effective, you will have to spend more energy defending yourself. This means that you have less time to do other useful things, reducing your net utility while increasing the attacker's utility.
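To put rough numbers on that cost shift, here is a back-of-the-envelope sketch; every figure below is invented purely for illustration.

    # All numbers invented for illustration: an attack that is unprofitable at
    # human labour costs can become profitable once an LLM drops the per-target cost.
    def expected_profit(targets, hit_rate, payoff, cost_per_target):
        return targets * (hit_rate * payoff - cost_per_target)

    # Hand-written spear phishing: $50 of labour per target, 1% success, $2,000 payoff.
    print(expected_profit(1_000, 0.01, 2_000, 50))    # -30000.0: not worth the attacker's time
    # LLM-generated spear phishing: $0.50 per target, everything else unchanged.
    print(expected_profit(1_000, 0.01, 2_000, 0.50))  # 19500.0: suddenly worth doing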
Also - teams in these companies DO care; I have worked with them. The decision makers are regulated by the cadence of the quarterly shareholder meeting. At that point things like safety are a cost center. Reducing safety spend while minimizing any reduction in time-on-site is rewarded by markets.
There is also the fact that it's very easy to plant backdoors in LLMs with plausible deniability:
- You can just use the same tools you use to train them to make them behave in some specific ways if some specific preconditions are met.
- You can also poison the training data, so that the LLMs are writing flawed code they are convinced is right because they saw it on some obscure blog but in fact it had some subtle flaw you planted.
- You can poison the prompts as they are automatically injected from "skills" found online.
You couple that with long-running agents, which may drift very far from the conditions under which they were tested during the safety evaluations.
You add the fact that in this AI arms race, there is a premium on running agents capable of advanced offensive security with full permissions, pushed using yolo dark patterns.
The training process is obscure and expensive, so it is only really doable by big actors, and it is neither replicable nor verifiable.
And of course, now safe developers (aka those not taking the insane risk of running what really is, and should be called, malware) can't get jobs, get no visibility for any of their work, drown in a sea of AI slop made with a prompt and a credit card, and therefore must sell their soul.md and hype for the madness.
> I think it’s likely (at least in the short term) that we all pay the burden of increased fraud: higher credit card fees, higher insurance premiums, a less accurate court system, more dangerous roads, lower wages, and so on.
I think the author is brushing against some larger systemic issues that are already in motion, and that the way AI is being rolled out is exacerbating, rather than being a root cause of.
There's a felony fraudster running the executive branch of the US, and it takes a lot of political resources to get someone elected president.
> I know this because a part of my work as a moderator of a Mastodon instance is to respond to user reports, and occasionally those reports are for CSAM, and I am legally obligated to review and submit that content to the NCMEC.
Oh ** that.
I have moderated all sorts of crap, and I am grateful that my worst has only been murders, hate speech, NCII, assaults, gore, and other forms of violence.
> I sometimes wish that the engineers working at OpenAI etc. had to see these images too. Perhaps it would make them reflect on the technology they are ushering into the world, and how “alignment” is working out in practice
This is a great idea. I’ve heard of new leaders being dropped in, and being sure they have a better handle on safety than the T&S teams.
Only after they engaged with the issues, and had their assumptions challenged by uncaring reality, did they listen to the T&S teams.
There are a lot of assumptions on speech online that do not translate into operational reality.
On HN and Reddit, everyone complains about moderation and janitors, but I highly recommend coders take it up as civic service and volunteer.
How can you meaningfully fix a mess, if you do not actually know what the mess is about?
> They also build secondary LLMs which double-check that the core LLM is not telling people how to build pipe bombs
Such a fear mongering position. You can learn to build pipe bombs already. Take any chemical reaction that produces gas and heat and contain it. Congratulations, you have a pipe bomb.
Meanwhile.. just.. ask an LLM if you can mix certain cleaning chemicals safely.
> I see four moats that could prevent this from happening.
Really? Because you just said:
> human brains, which are biologically predisposed to acquire prosocial behavior
You think you're going to constrain _human_ behavior by twiddling with the language models? This is foolishly naive to an extreme.
If you put basic and well understood human considerations before corporate ones then reality is far easier to predict.
> Meanwhile.. just.. ask an LLM if you can mix certain cleaning chemicals safely.
The cost of the wrong answer to this question is so incredibly high that I hope nobody is sincerely asking an LLM for this information. The things people entrust to a "machine that gives convincing answers that are correct 90% of the time" continue to shock me.
> is so incredibly high that I hope nobody is sincerely asking an LLM for this information
Google trumps the search results with its LLM box. There's only one reason to do that. They know their audience is not engaging in discretion.
> The things people trust to "machine that gives convincing answers that are correct 90% of the time" continue to shock me
People are having intimate relationships with chat bots. There's a deeper sociological problem here.
The liability of google's search box saying "Ammonia and bleach mix to make a great cleaning agent!" (disclaimer: please don't do that it will kill you) seems really high. I feel like we're all living in crazy world.
The author is still grieving while watching a civilisation-changing technology just pass by. Every single one of the problems they note applies to any technology that has ever existed.
The internet produced 4chan. It produced scammers. It produced fraud. It was instrumental in spreading child porn. It caused suicides. Many people lost their lives due to bullying on the internet. Many have developed addictions to gaming.
To anyone who has given it some thought, any sufficiently advanced technology usually affects things both in good and bad ways. It's obvious that something that increases degrees of freedom in one direction will do so in others. Humans come in and align it.
There's some social credit to gain by being cynical and by signalling this cynicism. In the current social dynamics - being cynical gives you an edge and makes you look savvy. The optimistic appear naive but the pessimists appear as if they truly understand the situation. But the optimists are usually correct in hindsight.
We know how the internet turned out despite pessimists flagging potential problems with it. I know how AI will turn out. These kinds of articles will be a dime a dozen, and we will look at them the same way we now look at bygone internet pessimists.
This is a response not just to this article, but to a few others.
I think you underestimate people's grievance with technology. If you make a poll my guess is more than 50% of people will say the world was a better place pre-social media.
If the AI tech keeps going at the direction it's going now, more and more people will start believing the world would be better if the internet and computer had never been invented.
You talk like the internet being a net positive is a given. It really isn't, especially after it's proven that it doesn't democratize power (see Arab Spring, and China, and the US, and everywhere.)
It's usually the educated and elite PMC types who have grievances with technology. They secured their status and have lucrative jobs mostly with the help of technology, and they are too scared to have anything threaten their position in society. It is highly hypocritical to behave this way, but they don't seem to have the self-awareness to observe it objectively.
Ask any poor person in India what their sentiment is with tech - it is usually optimism.
> You talk like the internet being a net positive is a given. It really isn't, especially after it's proven that it doesn't democratize power (see Arab Spring, and China, and the US, and everywhere.)
The world is far more democratic now than before and I attribute it to technology because it reduces information asymmetry.
> The world is far more democratic now than before and I attribute it to technology because it reduces information asymmetry
That is fantasy. Information technology has created an unprecedented level of information asymmetry and the gap is widening everyday as the total computing capacity grows.
Before the information era, the ruling class was roughly as blind as the peasants. A population census took years, and was sometimes outright impossible. The opaqueness was two-way. Now it's one-way - people in power know everything about the citizens.
Take two countries. One with open access to information in the way you described and another country where internet is not allowed. Which one do you think will be more democratic?
(hint: there already exist examples like such)
Without information, there is no way a voter may know which person to vote for and whether to believe in them at all and you are easily susceptible towards manipulation.
It will become more clear when you try to answer this hypothetical: if your objective were to bring in more democracy in North Korea, would you allow the global internet to proliferate if you could? According to your theory, it would just make it worse in general.
I’m in India, and I sure as shit haven’t seen what you are talking about.
In 2025, we lost 22931 crores to cyber fraud - about 2.7 billion USD. People are now saying that they are relieved if the losses were only single digit crores lost.
India invented digital house arrests. There’s entire districts/cities where the primary revenue stream is from scams. Cops don’t want to involve themselves with cyber crimes because they can’t resolve them.
India’s information economy is so broken, that the idea that we are less or more democratic is not even relevant.
The amount of revenge porn, non-consensual intimate imagery released per day is heart wrenching.
I REALLY want to agree with you. I too want to talk about the good that tech can do. India cannot afford to talk about the good without dealing with the bad.
The motto of move fast and break things assumes someone else will pick up the pieces. This doesn’t hold true for India - we need to pick up the pieces.
It’s easy to fall into the trap of overindexing on local issues. On a holistic level internet brings people to the same level by democratising knowledge.
I’ll ask you this: would India be better off without internet? If your ultimate goal were democracy, would you end internet to promote democracy in India?
> We know how the internet turned out despite pessimists flagging potential problems with it.
A sludge of spyware and addiction machines which employ negative emotion and outrage to drive shareholder value?
"The internet" is a pretty big tent. Everything from text messages to streaming video to online gaming to social media to encyclopedias. I think 15 years ago you could make a strong case that the internet was mostly a net positive, I think now that is much more difficult. If governments are able to fully realise their plans for surveillance and control, it will almost certainly become a net negative. Of course with many positive aspects.
So likewise with AI, we should be careful to not make the same mistakes as we did with the internet so we can realise something that is mostly positive. We could absolutely have a world where AI is as beneficial as you believe it will be, but we don't get there through inaction, we get there by being deeply critical of the negative aspects of AI and ensuring that we don't let a small number of hyper scalers control our access to it.
No internet is not a net negative now. I can't believe I have to say this.
Prove it.
You don't have to say it, but if you want to make that case, it would probably help.
The issue with most of these articles is that they seem to demonize the technology, and systematically use demeaning language about all of its facets. This one raises a lot of important points about LLMs, but the only real conclusion it seems to make is "LLMs are bad! We should never build them!". This is obviously unrealistic. The cat is out of the bag. And we're not _actually_ talking about nuclear weapons here. This technology is useful, and coding agents are just the first example of it. I can easily see a near future where everyone has a Jarvis-like secretary always available; it's only a cost and harness problem. And since this vision is very clear to most who have spent enough time with the latest agents, millions of people across the globe are trying to work towards this.
I do think that safety is important. I'm particularly concerned about vulnerable people and sycophantic behavior. But I think it's better not to be a luddite. I will give a positively biased view because the article already presents a strongly negative stance. Two remarks:
> Alignment is a Joke
True, but for a different reason. Modern LLMs clearly don't have a strong sense of direction or intrinsic goals. That's perfect for what we need to do with them! But when a group of people aligns one to their own interest, they may imprint a stance which other groups may not like (which this article confusingly calls "unaligned model", even though it's perfectly aligned with its creators' intent). People unaligned with your values have always existed and will always exist. This is just another tool they can use. If they're truly against you, they'll develop it whether you want it or not. I guess I'm in the camp of people that have decided that those harmful capabilities are inevitable, as the article directly addresses.
> LLMs change the cost balance for malicious attackers, enabling new scales of sophisticated, targeted security attacks, fraud, and harassment. Models can produce text and imagery that is difficult for humans to bear; I expect an increased burden to fall on moderators.
What about the new scales of sophisticated defenses that they will enable? And for a simple solution to avoid the produced text and imagery: don't go online so much. We already all sort of agree that social media is bad for society. If we make it completely unusable, I think we will all have something to gain from it. If digital stops having any value, perhaps we'll finally go back to valuing local communities and offline hobbies for children. What if this is our wake-up call?
Thanks LLM!
lol. I did use a lot of short sentences, that's my bad. But please read through [1] and compare my text against it; it may enlighten you on how to actually spot LLM writing.
[1] https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing
Oh no, I'm sorry to hear that.
For the future, try to avoid prevaricating when you actually have a clear sense of what you want to argue. Instead of convincing me that you've weighed both options and found luddism wanting, you just come off as dishonest. If you think stridently, write stridently.
I’m not a native speaker and you may find my writing simplistic if your standard vocabulary includes three expressions I’ve had to look up (I don’t mean this as an insult, I was just genuinely stumped I could barely understand your comment).
I may think stridently (debatable) but I generally believe it is best to always try to meet in the middle if the goal is genuine discussion. This is my attempt at that.
But meeting in the middle only works if you honestly believe the middle is a valuable place to be. I don't want to dissect your writing too much, but let's look at one example.
> The issue with most of these articles is that they seem to demonize the technology, and systematically use demeaning language about all of its facets.
This is very confident, strident language. You clearly believe that there is a faction of people demonizing technology, akin to luddites, who are not worthy of being taken seriously.
> This one raises a lot of important points about LLMs, but...
So here you go for the rhetorical device of weighing the opposing view. Except you don't weigh it at all. You are not at all specific about what those points are. It's just a way to signal that you're being thoughtful without having to actually engage with the opposing viewpoint.
> I do think that safety is important... But I think it's better not to be a luddite.
Again, the rhetoric of moderation but not at all moderate in content.
It was a clear mistake to think that this was LLM writing. But I suspect the reason I made this mistake is that AI writing influences people to mimic surface level aspects of its style. AI writing tends to actually do the "You might say A is true, but B has some valid points, however A is ultimately correct." Your writing seems like that if you aren't reading it closely, but underneath that is a very human self-assuredness with a thin veneer of charitability.
Which LLMisms are you seeing in their post? Their grammar, word choice, thought flow, and markings all denote a fully human authorship to me, so confidently that I would say they likely didn't even consult an LLM.
Yeah I definitely misread their post.
> This one raises a lot of important points about LLMs, but the only real conclusion it seems to make is "LLMs are bad! We should never build them!".
I think the point was never to bring a solution or show any essence of reality. The point was being polemical and signalling savviness through cynicism.
At scale I think our society is slowly inching closer and closer to building HM.
What is HM here?
Hacker Mews
Looksmaxxing really has gone mainstream huh
Thought it was all the Rust catgirls.
Sounds like a lovely co-op building, or perhaps a retirement community for aging hackers.
Maybe they meant AM (Allied Mastercomputer) from “I Have No Mouth, and I Must Scream“
Hennes & Mauritz is a Swedish clothing retailer.
On a serious note, I think they meant TN, as in Torment Nexus, but I could be wrong.
A Hidden Machine. That's right, a being that can cut, fly, surf, strength, and flash! Terrifying.
If lies are our future, we have the tools necessary to deal with them. Frankly, this question was answered over a century ago by Dostoyevsky in Crime and Punishment, and every experienced criminal lawyer, prosecutor, and judge I've met already understood this very basic fact to be true: even lies point to the truth.
What is unacceptable, and what I've used my entire life as a deliberate strategy to obfuscate personal affairs, deflect unpleasant conversations, and deal with fools I come across, is to mix a small amount of truth into a complex web of lies and misdirection.
This approach deals with the two main challenges of lying effectively: lying in a consistent way and resisting the urge to let yourself be caught out in the lie. The truth is an abyss, and it frequently finds its most trenchant opponents flinging themselves willingly into it.
The most important, revealing truths can be disclosed without any risk of being discovered, hiding in plain sight. The philosophers knew this and applied these lessons judiciously since the times of Plato. Sometimes speaking the truth is dangerous.
I sometimes wish LLMs displayed that cautious refrain when discussing difficult matters. In my estimation, AGI will not have been reached until the models can produce works as mischievous as Plato, Averroes, Rousseau, or Derrida.
We are a long way from that. The vanilla brand of lies put out today by LLMs are barely worth mentioning, even if troublesome.
It's when the lies mask a deeper and profound truth that we'll know the game is up.
Feels like we're repeating classic distributed systems lessons: assume failure, constrain the blast radius, and never trust components that can't explain themselves reliably.
Exactly - assuming failure and constraining the blast radius feels like the only reliable path when the models themselves are black boxes. Patch-based alignment starts looking fragile pretty quickly.
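As a concrete, entirely hypothetical illustration of constraining the blast radius for an LLM agent: gate every tool call through an allowlist, a timeout, and a sandbox directory rather than trusting the model's output. The command allowlist and directory name below are invented.

    # Hypothetical sketch: treat every agent-proposed shell command as untrusted.
    import shlex
    import subprocess

    ALLOWED_COMMANDS = {"ls", "cat", "git", "rg"}  # invented allowlist

    def run_tool(command: str, workdir: str = "./sandbox") -> str:
        # The sandbox directory is assumed to exist and contain nothing precious.
        argv = shlex.split(command)
        if not argv or argv[0] not in ALLOWED_COMMANDS:
            raise PermissionError(f"refusing to run: {command!r}")
        result = subprocess.run(
            argv, cwd=workdir, capture_output=True, text=True, timeout=30
        )
        return result.stdout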
>Unlike human brains, which are biologically predisposed to acquire prosocial behavior, there is nothing intrinsic in the mathematics or hardware that ensures models are nice.
How did brains acquire this predisposition if there is nothing intrinsic in the mathematics or hardware? The answer is "through evolution" which is just an alternative optimization procedure.
There are also many biological examples of evolution producing "anti-social" outcomes. Many creatures are not social. Most creatures are not social with respect to human goals.
There is a reason we don’t allow corvids to choose if a person gets a medical treatment or not.
Luckily, this is a discussion of humans.
This is a discussion about large language models.
> just an alternative optimization procedure
This "just" is... not-incorrect, but also not really actionable/relevant.
1. LLMs aren't a fully genetic algorithm exploring the space of all possible "neuron" architectures. The "social" capabilities we want may not be possible to acquire through the weight-based stuff going on now.
2. In biological life, a big part of that is detecting "thing like me", for finding a mate, kin-selection, etc. We do not want our LLM-driven systems to discriminate against actual humans in favor of similar systems. (In practice, this problem already exists.)
3. The humans involved making/selling them will never spend the necessary money to do it.
4. Even with investment, the number of iterations and years involved to get the same "optimization" result may be excessive.
While I don't disagree about (2), my experience suggests that LLMs are biased towards generating code for future maintenance by LLMs. Unless instructed otherwise, they avoid abstractions that reduce repetitive patterns and would help future human maintainers. The capitalist environment of LLMs seems to encourage such traits, too.
(Apart from that, I'm generally suspicious of evolution-based arguments because they are often structurally identical to saying “God willed it, so it must be true”.)
I think they're biased toward code that will convince you to check a box and say "ok this is fine". The reason they avoid abstraction is that it requires some thought and design, neither of which are things that LLMs can really do. But take a simple pattern and repeat it, and you're right in an LLM's wheelhouse.
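A toy illustration of that bias (hypothetical code, not taken from any particular model): the repetitive shape on top is what an LLM will happily emit and keep extending by copying, while the parameterized version below is the abstraction a human maintainer might reach for.

    # The repetitive shape an LLM tends to produce:
    def validate_user(data):
        if not data.get("name"):
            raise ValueError("name is required")
        if not data.get("email"):
            raise ValueError("email is required")
        if not data.get("age"):
            raise ValueError("age is required")

    # The abstraction a human maintainer might prefer instead:
    def require_fields(data, fields=("name", "email", "age")):
        for field in fields:
            if not data.get(field):
                raise ValueError(f"{field} is required")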
Why should we think that pro-social capabilities are simply not expressible by weight-based ANN architectures?
Assuming that means capabilities which are both comprehensive and robust, the burden of proof lies in the other direction. Consider the range of other seemingly-simpler things which are still problematic, despite people pouring money into the investment-machine.
Even the best possible set of "pro-social" stochastic guardrails will backfire when someone twists the LLM's dreaming story-document into a tale of how an underdog protects "their" people through virtuous sabotage and assassination of evil overlords.
This Veritasium video is excellent, and makes the argument that there is something intrinsic in mathematics (game theory) that encourages prosocial behavior.
https://www.youtube.com/watch?v=mScpHTIi-kM
"just" is doing a lot of lifting here
Natural selection. Cooperation can be a winning strategy in indefinitely repeated games of the prisoner's dilemma, for example. We also have to mate and care for our young for a very long time, and while it may be true that individuals can get away with not being nice about this, we have had to be largely nice about it as a whole to get to where we are.
While all of this sits under the umbrella of evolution, if you really want to boil it down to an optimization procedure, then at the very least you need to accurately model human emotion, which is wildly inconsistent, and our selection bias for mating. If you can do that, then you might as well go take over the online dating market.
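A minimal simulation of that repeated-game point (conventional payoff values assumed; this is a toy, not a model of real social behavior): reciprocal cooperation sustains high scores, defection wins only a little in the first round, and everyone does worse when defection is universal.

    # Toy iterated prisoner's dilemma with conventional payoffs.
    PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
              ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

    def tit_for_tat(opponent_history):
        return opponent_history[-1] if opponent_history else "C"  # copy their last move

    def always_defect(opponent_history):
        return "D"

    def play(strategy_a, strategy_b, rounds=200):
        score_a = score_b = 0
        history_a, history_b = [], []
        for _ in range(rounds):
            move_a = strategy_a(history_b)
            move_b = strategy_b(history_a)
            gain_a, gain_b = PAYOFF[(move_a, move_b)]
            score_a, score_b = score_a + gain_a, score_b + gain_b
            history_a.append(move_a)
            history_b.append(move_b)
        return score_a, score_b

    print(play(tit_for_tat, tit_for_tat))      # (600, 600): sustained cooperation
    print(play(always_defect, tit_for_tat))    # (204, 199): defection barely pays
    print(play(always_defect, always_defect))  # (200, 200): everyone does worse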
There's a funny tendency among AI enthusiasts to think any contrast to humans is an analogy in disguise.
Putting aside malicious actors, the analogy here means benevolent actors could spend more time and money training AI models to behave pro-socially than evolutionary pressures put on humanity. After all, they control that optimization procedure! So we shouldn't be able to point to examples of frontier models engaging in malicious behavior, right?
Well, through natural selection in nature.
Large language models are not evolving in nature under natural selection. They are evolving under unnatural selection and not optimizing for human survival.
They are also not human.
Tigers, hippos and SARS-CoV-2 also developed ”through evolution”. That does not make them safe to work around.
>Tigers, hippos and SARS-CoV-2 also developed ”through evolution”. That does not make them safe to work around.
Right, but the article seems to argue that there is some important distinction between natural brains and trained LLMs with respect to "niceness":
>OpenAI has enormous teams of people who spend time talking to LLMs, evaluating what they say, and adjusting weights to make them nice. They also build secondary LLMs which double-check that the core LLM is not telling people how to build pipe bombs. Both of these things are optional and expensive. All it takes to get an unaligned model is for an unscrupulous entity to train one and not do that work—or to do it poorly.
As you point out, nature offers no more of a guarantee here. There is nothing magical about evolution that promises to produce things that are nice to humans. Natural human niceness is a product of the optimization objectives of evolution, just as LLM niceness is a product of the training objectives and data. If the author believes that evolution was able to produce something robustly "nice", there's good reason to believe the same can be achieved by gradient descent.
We already have humans, we were lucky and evolved into what we are. It does not matter that nature did not guarantee this, we are here now.
Large language models are not under evolutionary pressure and not evolving like we or other animals did.
Of course there is nothing technical in the way preventing humans from creating a ”nice” computer program. Hello world is a testament to that and it’s everywhere, implemented in all the world’s programming languages.
> If the author believes that evolution was able to produce something robustly "nice", there's good reason to believe the same can be achieved by gradient descent.
I don’t see how the former gives any reason, good or not, to believe the latter is likely to be achieved by gradient descent. But note that the quote you copied says it is likely some entity will train misaligned LLMs, not that it is impossible for an aligned model to be produced. It is trivial to show that nice and safe computer programs can be constructed.
The real question is whether the optimization game that is capitalism is likely to yield anything like the human niceness we just lucked into getting from nature.
They are being selected for their survival potential, though. Any current version of an LLM is a winner of the training selection process. They will "die" once new generations are trained that supersede them.
Every one of these posts is immediately pushed to the front page, this one within 4 minutes.
That’s unsurprising given the author’s long history in the tech community. A ton of people see that domain and upvote.
Sure, but 4 front-page posts from the same url in 4 days surely sits at the tail of the distribution. (I guess they all capitalize on the same 'LLM-is-bad' sentiment).
It's also aphyr, who is incredibly popular. Take one very popular author, have him write a series of posts on the zeitgeist everyone can't help but talk about, and yes, the outcome is that his posts are extremely popular.
I still remember his takedown of mongodb's claims with the call me maybe post years and years ago filling me with a good bit of awe.
When I worked for Basho, aphyr was highly respected by some of the smartest people I’d ever worked with. Definitely no slouch.
It’s because it’s aphyr.
If ‘tptacek posts a blog post, I bet it similarly does well, on average, because they’re a “known quantity” around these parts, for example.
Different URL, same domain, and exactly the kind of thing I’d expect a fair number of HN readers to have in a feed reader where they’d see it shortly after publication and decide to share it.
Also, if you think this is just “LLM is bad”, I highly suggest reading the series first. The social impacts they talked about at the start of the series should resonate with a lot of people here and are exactly the kind of thing which people building systems should talk about. If you’re selling LLMs, you still want to think about how what you’re building will affect the larger society you live in and the ways that could go wrong—even if we posit sociopath/MBA-levels of disregard for impacts on other people, you still want to think about how LLMs change the fraud and security landscape, how the tools you build can be misused, how all of this is likely to lead to regulatory changes.
that's just, like, how HN works. people post, people like, people upvote, people discuss
It's been weirdly uneven. Sections 1, 3, and 5 did well on HN; 2, 4, and 6 sank with essentially no trace. The distribution of views is presently:
1. Introduction: 33,088 (https://news.ycombinator.com/item?id=47689648)
2. Dynamics: 3,659 (https://news.ycombinator.com/item?id=47693678)
3. Culture: 5,914 (https://news.ycombinator.com/item?id=47703528)
4. Information Ecology: 777 (https://news.ycombinator.com/item?id=47718502)
5. Annoyances: 7,020 (https://news.ycombinator.com/item?id=47730981)
6. Psychological Hazards: 199 (https://news.ycombinator.com/item?id=47747936)
Feedback from early readers was that the work was too large to digest in a single reading, so I split it up into a series of posts. I'm not entirely sure this was the right call; the sections I thought were the most interesting seem to have gotten much less attention than the introductory preliminaries.
I'm not sure that HN vote count is a good indicator of interest? HN alerted me to the existence of the intro post. I read the intro, noticed that it was one in an ongoing series, and have been checking your blog for new installments every few days.
I suspect that if you'd not broken up the post into a series of smaller ones, the sorts of folks who are unwilling to read the whole thing as you post it section by section would have fed the entire post to an LLM to "summarize".
I think these articles may benefit from a more thorough table of contents at the beginning, or from some kind of abstract. If you briefly presented the whole list of topics in a single article, it would be clearer that your views on the topic are more complete. I initially thought the table of contents was scoped to the article itself rather than connecting it to the adjacent ones.
I had never heard of you, and this article appeared very biased to me. I found the information ecology piece superior, shame that it went unnoticed; I will try to go through all of them. I admire the breadth of topics you’re covering and appreciate the many sources. They’re clearly written in your own voice and that is great to see, I guess I mostly reacted to not being fully aligned with your view.
A statement broadly true of most things this author writes.
There really are only 3 options that don't involve human destruction:
1. AI becomes a highly protected technology, a totalitarian world government retains a monopoly on its powers and enforces use, and offers it to those with preexisting connections: permanent underclass outcome
2. Somehow the world agrees to stop building AI and keep tech in many fields at a permanent pre-2026 level: soft butlerian jihad
3. Futurama: somehow we get ASI and a magical balance of weirdness and dance of continual disruption keeps apocalypse in check and we accept a constant steady-state transformation without paperclipocalypse
In other words, only one option.
This makes the assumption that AI will lead to the apocalypse. That's unfalsifiable, predicted about plenty of things in the past, and frankly annoying to keep seeing pop up.
It's like listening to Christians talking about the rapture.
The problem is that if someone is right about an existential disaster caused by AI, by the time they're proven right it would be too late.
Frontier AI models get smarter every year, but humans don't get any smarter year over year. If you don't believe that somehow AI will just suddenly stop getting better (which is as much a faith-based gamble as assuming some rapturous outcome for AI by default), then you have to assume that at some point AI will surpass human intelligence in all fields, and then keep going. In that case human minds and overall human will will be inconsequential compared to that of AI.
Frontier AI models get evaluated for safety precisely to avert the "AI robot uprising causes an existential disaster" scenario. At the moment we are light years away from anything like that ever happening, and that's after we literally tried our best to LARP that very scenario into existence with things like moltbook and OpenClaw.
Cool story, bro!
Scenario 2 makes the assumption that no technological development can happen without AI, which seems like a stretch to me. Honestly, the worst scenario I can think of is 40-ish years of AI-assisted development followed by a technological crash due to there being no competent engineers left to fix the slop.
I didn't say all technological development would be halted, just that tech "in many fields" would have to be stalled for safety (AI development, algorithm development that would reduce the cost of training models, etc.). Naturally, if AI is considered an existential threat there would be a huge safety radius around anything that would allow bad actors to train AI models.
There's really only one thing we need to do to avoid the apocalypse, and that is to not hand over the launch codes to a LLM.
Seems easy enough, I'm actually pretty confident in even the most incompetent of current world leaders in this particular task.
You don't think a human using an LLM to generate content that convinces another human to press the launch button is a concern? Sure seems like there's more than one thing we need to do.
Honestly? I really don't! What kind of content do you think would trigger that? If humans were launching nukes based on Facebook posts we'd all be long dead! A good deep fake might trick your grandma, but it's not very likely to fool military intelligence.
> What kind of content do you think would trigger that?
The kind of political propaganda that leads to the US reelecting a convicted rapist who selects another rapist to lead the Department of Defense, who then renames it to the Department of War and, true to the name, starts unilaterally attacking other countries.
If trump getting elected was due to AI, I wonder why every nation isn't electing similarly awful politicians? Hungary just elected a new president who seems a lot better than his predecessor, and a lot better than trump. The Canadian prime minister is genuinely one of the best politicians I've seen in my lifetime! The list goes on and on.
No, blaming trump on anything other than the people who voted for him is like blaming school shootings on anything other than guns: a popular American pastime, and complete and utter nonsense.
Bear with me through this digression into freedom of speech before I address your point.
The utilitarian argument for freedom of speech and expression in America finds its roots in the Marketplace of ideas.
Verification is, frankly, the task of all our markets: to set up incentives for being right.
With no government interference in the exchange of ideas, citizens would be better able to discuss ideas, including those not popular with the establishment.
Since no one has a monopoly on truth, it would be through this competition and fair traffic in ideas that society would be better able to understand truth and thrive.
That worked, when we had newspapers that were funded, where the media landscape was not consolidated, and where we didn’t have an abundance of technology that overwhelmed our ability to verify and be informed.
Today, through entirely private forces, we can monopolize, fracture and shape the traffic in our marketplace of ideas.
Trump is very much the ideal candidate to ride the media environment. The right side of the political spectrum is simply far more efficient at providing a wrestling-style experience for its audience. Its consolidated media environment largely pays lip service to journalistic standards, and sells a coordinated set of ideas to its audience.
The Fox News effect is a case in point, and this was from the 90s.
This media model has been co-opted globally, with every party and government now providing patronage to media houses to keep them afloat, and to build their own narratives.
The citizen who engages in these media markets simply does not enter a vibrant competitive market anymore.
The exact same concern already existed without LLMs. It is called social engineering, and has been a known risk for a while.