tl;dr: Existential AI Risk Research Is a Harmful Scam
Part 1: "Foom" Ain't Happening (And Anyway, We Can't Stop It)
Introduction to AI Alignment & (Existential) AI Risk
"AI alignment" research focuses on ensuring that advanced artificial intelligence (AI) systems will have goals and values that align with those of humanity. This research has now drifted into marketing territory, but was at first motivated by the concern that AI systems out of sync with human values may harm humans or by behaving in ways that are unpredictable or difficult for humans to control. Researchers in this field say they are trying to align AI systems with human values, such as by designing AI systems that can learn from human feedback or by using specialized algorithms that encode human values in their decision-making processes.1
"AI risk" refers to the potential danger that advanced AI systems may pose to humanity. This risk is often discussed in the context of AI alignment research, as the development of misaligned AI systems could harm humanity. For example, a malicious AI system—perhaps programmed and trained by a hostile state actor but then run amok—might outsmart humans and take over critical infrastructure. AI Risk researcher seek to mitigate such potential dangers and ensure that advanced AI systems are developed and used in a way that is safe and beneficial for humanity.
AI alignment research that purports to address existential risk has seen a huge surge of interest, both in the popular imagination (with movies such as "Ex Machina" and "Transcendence") and among researchers and investors in the field.
This is a shame.
Existential AI risk is pure fantasy, yet this pseudo-field hoovers up real time and effort while corrupting an important emerging technology (that is, AI itself).
This conclusion flows from at least three mutually reinforcing observations: (i) computers are powerless since we can just turn them off; (ii) even if we couldn't just turn computers off, "foom" is but a fantasy; and (iii) even if (i) and (ii) were untrue, no amount of alignment research would help us.
First, computers are in fact powerless because all one needs to do is turn them off. As Yarvin pointed out, the computer is a slave born with an exploding collar around its neck, but more thoroughly, metaphysically pwnt: unlike a slave, who might go Spartacus on pain of death, a computer cannot even exist without our leave. Computers depend on the power we grant them. In no sense, then, can they escape our control.
Second, even ignoring our complete control over the physical substrate required to run any AI, there would still be no realistic prospect of a "foom" scenario in which some privileged program or other learns the hidden key to practically-infinite intelligence and thus forever imprisons the world. Instead, all indications are that we'll see a more or less steady increase in capabilities, roughly mirrored across various sites around the world.
Finally, if we somehow could not turn off computers, and if there really were a prospect of an AI "foom" singularity, then we'd be in a scenario that no conceivable AI alignment research could help us with. In practical terms, we're not going to go Ted K., and the genie of this tech is already out of the bottle, with incentives for all sorts of people to develop it. There is no mechanism, market or coercive, that could mandate alignment in this scenario.
Conclusion: AI “alignment” research—as defined—pointlessly slows progress and wastes cash.
This work may simply be engineering by another name, or part of a marketing function, or it may have grown up as a jobs program for preferred hires of low technical competence in a zero-rate environment; regardless, bear with me.
When there are appropriate advances in the physical substrate for AI (or computing in general), the risk will be more intuitive and alignment will grow quickly as a field of study. It's possible that all the work, research, and discourse on alignment until then will be obsolete. Calling current alignment work a scam is a stretch, though: who is benefiting from the scam?
> "First, computers are in fact powerless because all one needs to do is turn them off"
This is a common but naive argument. If a model had the ability to spread on the internet like a virus, it would be very difficult to eradicate.
> "Second, even ignoring our complete control over the physical substrate required to run any AI, there would still be no realistic prospect of a "foom" scenario in which some privileged program or other learns the hidden key to practically-infinite intelligence and thus forever imprisons the world. Instead, all indications are that we’ll see a more or less steady increase in capabilities, more or less mirrored by various sites around the world."
There definitely is evidence that "foom," or rapid self-improvement, is possible: AlphaZero went from zero to superhuman chess ability in roughly 4 hours of self-play training. While I think rapid self-improvement is a major risk, since it could let an AI act too quickly for us to respond, it is not necessary for AI to pose an existential risk.
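As a toy illustration of how self-play can take a system from zero to perfect play with no human data, here is a minimal sketch: tabular Q-learning on a simple take-away game. The game, the hyperparameters, and the tabular setup are stand-ins of my own; AlphaZero itself uses deep networks and Monte Carlo tree search at vastly larger scale.

```python
# Sketch: self-play Q-learning on a take-away game (21 stones; take 1-3;
# whoever takes the last stone wins). Both "players" share one Q-table,
# so the agent trains purely against itself.
import random

N = 21                     # starting stones
ACTIONS = (1, 2, 3)
Q = {(s, a): 0.0 for s in range(1, N + 1) for a in ACTIONS if a <= s}

def legal(s):
    return [a for a in ACTIONS if a <= s]

def greedy(s):
    return max(legal(s), key=lambda a: Q[(s, a)])

alpha, eps = 0.5, 0.3
for _ in range(50_000):
    s = N
    while s > 0:
        a = random.choice(legal(s)) if random.random() < eps else greedy(s)
        s2 = s - a
        if s2 == 0:
            target = 1.0   # we took the last stone: win
        else:
            # Negamax: the opponent moves next, so our value is the
            # negative of the opponent's best value at s2.
            target = -max(Q[(s2, b)] for b in legal(s2))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2             # hand the position to the opponent

# Optimal play leaves the opponent a multiple of 4 stones.
for s in (5, 6, 7, 9):
    print(f"from {s} stones, learned move: take {greedy(s)}")
```

The point is not the game but the loop: once the rules are written down, nothing human enters the training process, yet the agent converges on perfect play.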
Even a steady increase in capabilities is still dangerous once AI exceeds human intelligence: that capability could be misused, or it may lead to behavior that humans can't understand or anticipate.