Reflections on AI

AI Alignment Is Turning from Alchemy Into Chemistry, by Alexey Guzey

May 1, 2023

This is my first exposure to Alexey Guzey. I don’t know much about him, but judging from his contact info he seems to work at New Science.

I really enjoyed this article, for two reasons. The first and more important is Guzey’s main argument: AI research has matured to the point that AI Alignment should be viewed as a real and important area of study, one that deserves more investment and attention. The second is that Guzey voices a sentiment about the existing work in this field that I hadn’t seen before. It echoes things I have felt recently as I’ve been learning about AI, but hadn’t dwelled on or put into words.

To be clear: Guzey’s characterization is much harsher than mine would be. But for example:

ML researchers still steer clear of alignment. […] LessWrong’s i-won’t-get-to-the-point-without-making-you-read-a-10k-word-essay-about-superintelligence wordcelling is a far cry from the mathy papers they’re used to and has a distinctively crackpottish feel to it. And when you’ve got key figures in alignment making a name for themselves with Harry Potter fanfiction rather than technical breakthroughs, I’m honestly not very surprised.

Disclaimer: I don’t know exactly what “wordcelling” means. I think it just means writing in a showy way, with lots of obscure words and jargon.

Reading this, I realized that many of the LessWrong articles I have been reading are quite long and difficult to digest. For example, this article from Eliezer Yudkowsky and this response from Paul Christiano are both written as lists of points, which I would ordinarily consider an effective style for maximizing comprehensibility. But Yudkowsky’s list has 43 points on it, and Christiano’s has 46! Reading them, I find my brain struggling to synthesize the points into a high-level summary. The prioritization isn’t clear, and many of the points overlap with each other.

One more side note: last year I read the book What Is Real? by Adam Becker. The central narrative of the book is that the physicist Niels Bohr and his supporters exerted a powerful influence on theoretical physics and, for nearly a century, effectively stifled research into interpretations of quantum mechanics other than the so-called “Copenhagen interpretation”. This influence took the form of physics professors discouraging their PhD students from tackling certain problems, universities firing and blacklisting scientists who deviated from the Copenhagen orthodoxy, and Bohr himself speaking dismissively of those who challenged his ideas. This may or may not be a fair parallel to draw, but the line “ML researchers still steer clear of alignment” reminded me of this story.

Here’s the part where I actually summarize the article, which will probably be shorter than the preamble.

Guzey discusses how scientific fields often start out far from rigorous, with nobody knowing what they’re doing, until at some point something happens and the field starts turning into real science. This is how alchemy, which was about turning other materials into gold, eventually gave birth to chemistry. The same was true, Guzey argues, of heavier-than-air flight, which went nowhere for many years until the Wright brothers finally found an approach that actually worked.

The same is true for neural networks, where large language models have demonstrated new capabilities quite abruptly over the past several years as a result of increased computing capacity. As for AI Alignment, the key breakthrough that Guzey points to is the publication last year of an article sharing a method for ascertaining what LLMs “know”, i.e. reading it off their internal representations, independently of what they “say” in response to prompts.

To the author, this is analogous to a Wright brothers moment, showing that real progress with concrete results is possible in the field of AI Alignment.

He follows this by calling on all would-be AI Alignment researchers to dive in, to not be intimidated by the current state of the field, and to start contributing.

The people who’ll make critical progress in figuring out how the hell neural networks work (and what, if anything, they dream of) won’t have a clue if their stuff will finally work. Maybe alignment will remain alchemy for months or years and everything we do now will end up being useless.

Yet, someone will start doing it at the right time and, given that AGI really does feel to be right around the corner, I believe working on it is a gamble worth taking.