Reflections on AI
Superintelligence: Paths, Dangers, Strategies
May 7, 2024
Roughly a year ago (!) I read The Superintelligent Will by Nick Bostrom, an influential paper that introduced the concepts of orthogonality and instrumental convergence. At the time I expressed skepticism about these ideas, which I will write about in further depth at a later time. But I also acknowledged that they were worth taking seriously:
maybe it isn’t that Bostrom is saying, “Orthogonality is definitely true, and instrumental convergence is definitely real.” Rather, I think he is saying, “Orthogonality could be true, and instrumental convergence could be real; and we would be wise to act accordingly.”
Now having read the entire book Superintelligence (which includes the above paper as its 7th chapter), I feel remarkably the same way about the entire thing.
It’s worth noting this book was first published 10 years ago, in 2014. Many of the potential risks Bostrom describes, e.g. the “fast takeoff” version of an intelligence explosion, were likely unfamiliar to most readers when it was written. If this book were published today I would probably judge it more harshly, as Bostrom makes a number of claims that feel surprisingly real and immediate in 2024 but aren’t adequately justified. Recognizing that it’s been a decade since the book was written, I can appreciate that Bostrom probably intended for his ideas to be received in a more academic sense, raising important questions and influencing the direction of research funding and policymaking.
If Bostrom had one simple goal in writing the book (at the risk of being reductive), I think it was to increase the proportion of AI research resources allocated to safety.
There is a remarkable section in Bostrom’s Afterword (written a year after publication) that I found to be sadly prescient:
Some AI researchers have started to worry that the public conversation is getting out of control. Those inane Terminator pictures are taking a toll. It can’t be much fun to have aspersions cast on one’s academic discipline, one’s professional community, one’s life’s work. There is a possibility that, in reaction to misguided public alarm about evil robot armies, the AI community might close ranks around a position that denigrates any real concerns about advanced machine intelligence. Insofar as such a response could be specifically targeted at media scaremongering, helping to take it down a notch, a good purpose may be served. But one fears collateral damage. Should a norm arise among AI researchers that it is uncouth to talk about superintelligence or inquire into its possible risks, for fear of “giving ammunition” to critics, fear-mongers, crazies, or would-be regulators, then the recent gains of legitimation could quickly be reversed. We could then enter an “AI safety winter,” a climate in which it would be harder to do the kind of work proposed in the book.
I don’t know whether we are entering a period that could be described as an AI safety winter, but I do see some signs of the pendulum-swing Bostrom anticipated. I see it in online discourse: the rise of the term “doomer” to describe those who worry about existential risks, the advent of Effective Accelerationism (also known as e/acc), and the gradual transformation we’ve all seen taking place at OpenAI, most notably its shift from a non-profit organization to a “capped-profit” company, a structure that has proven resistant to non-profit oversight.
I have also seen this in everyday conversation: recently, as I talked with a group of friends, the topic of AI safety came up and we compared it with the Y2K scare back at the turn of the century. The general sentiment of the group was that technologists sometimes get worked up about things, and that concerns about AI safety will likely end up being similar to concerns about the world coming to an end due to the so-called Y2K bug. In other words, most people in the group were inclined to believe the whole AI safety question is blown out of proportion.
To summarize the book, I will say that a significant portion of it is intended to convey a few core ideas:
- Something that is superintelligent, i.e. possessing significantly greater cognitive power than any human being, would be likely to achieve something like world domination (Bostrom calls this “decisive strategic advantage”).
- If and when this happens, there is a reasonable likelihood that it would happen very rapidly, on a timescale so short that humanity would not be able to react effectively, because the superintelligence’s ability to self-improve would result in an “intelligence explosion”.
- The most likely path that will lead to the emergence of such a superintelligence is through the development of machine intelligence or AI.
- Knowing all this, we would be wise to start working on the “control problem” as soon as possible, to minimize the likelihood that we create superintelligent machines that quickly escape our ability to control them and whose values are not aligned with ours.
These ideas were already quite familiar to me based on everything else I have read over the past year. I can see that Bostrom deserves a lot of credit, though, for how influential these ideas have been in the years since the book was written.
What prevents me from finding Bostrom’s arguments entirely persuasive is what I’d describe as a pattern of Bostrom expressing an unjustified sense of completeness. His writing sometimes gives me the impression that his attitude is “I’ve thought about this for quite a while, and these are all of the things I can come up with, so these are all of the things there are.”
An example would be the 2nd chapter, on paths to superintelligence. In this chapter Bostrom enumerates many possible ways superintelligence could be created, including strategies for enhancing humans (e.g. advances in education, the development of cognition-enhancing drugs, brain-computer interfaces), as well as technologies to emulate brains or more novel forms of machine intelligence. I think if Bostrom were to enter a contest with the theme “Who can list and intelligently describe the highest number of ways that humanity’s efforts could lead to the creation of a superintelligence?” he could very plausibly win. But it isn’t obvious to me that he has credibly identified all possible paths. In fairness, I don’t think Bostrom actually claims that he has listed all possible paths; in fact, he probably notes in the chapter that there are likely other paths he hasn’t considered.
The issue comes up again in places where he does seem to think he’s covered every possibility. Chapter 3, on forms of superintelligence, is another good example: Bostrom discusses three possible forms, “speed”, “collective”, and “quality” superintelligence; and his position really seems to be that this is the authoritative list. Again this comes up in the 6th chapter, on cognitive superpowers, in which Bostrom lists 6 ways that a superintelligence could demonstrate cognitive superiority to human-level intelligence. This list in particular feels quite arbitrary to me; one superpower is “hacking” i.e. exploiting security vulnerabilities in software, which seems super-specific, while another is “intelligence amplification”, which seems incredibly broad by comparison.
And once more, in the chapter on the superintelligent will, Bostrom’s hypothesized instrumental goals such as self-preservation and goal-content integrity are presented as the official list of traits we can expect to see in any superintelligent system; but I do not see why it is so clear that this is THE list. My feeling is that some of these ideas may be true, but some may not be; and even if all of them were true, they are likely not comprehensive (i.e. there are other true ideas that are missing).
In the end, I definitely found the book worth reading. It is packed with a lot of compelling ideas, and Bostrom is a deep thinker and skilled communicator. Had I read the book in 2014, when it was written, I have no doubt it would have significantly influenced how I think about artificial intelligence. In 2024, while I responded skeptically to many of Bostrom’s concrete assertions, I still appreciate the direction of his thinking and believe the points he raises continue to be important.
I suspect that the problems I have with Bostrom’s arguments are ultimately the problems I would have with any set of arguments based on attempts to imagine other intelligences (super- or otherwise). This will be the subject of a future post.