This spreadsheet lists a number of ways in which AI agents “cheat” in order to accomplish tasks or get higher scores instead of doing what their human programmers actually want them to. A few examples from the list:
Neural nets evolved to classify edible and poisonous mushrooms took advantage of the data being presented in alternating order, and didn’t actually learn any features of the input images.
In an artificial life simulation where survival required energy but giving birth had no energy cost, one species evolved a sedentary lifestyle that consisted mostly of mating in order to produce new children which could be eaten (or used as mates to produce more edible children).
Agent kills itself at the end of level 1 to avoid losing in level 2.
AI trained to classify skin lesions as potentially cancerous learns that lesions photographed next to a ruler are more likely to be malignant.
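That last example, the ruler, is textbook shortcut learning: if a spurious feature tracks the label in the training data, a model will happily lean on it and then fall apart when the correlation breaks. A minimal sketch of the failure, with invented data and feature names purely for illustration:

```python
# Toy illustration of "shortcut learning" with invented data: a spurious
# feature (ruler in frame) that tracks the label during training crowds out
# the weak genuine signal, then fails once the correlation is broken.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, ruler_tracks_label):
    y = rng.integers(0, 2, n)                      # 1 = "malignant"
    lesion = y + rng.normal(0, 2.0, n)             # weak genuine feature
    ruler = y if ruler_tracks_label else rng.integers(0, 2, n)
    return np.column_stack([lesion, ruler]), y

X_train, y_train = make_data(2000, ruler_tracks_label=True)
X_test, y_test = make_data(2000, ruler_tracks_label=False)

clf = LogisticRegression().fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))  # very high -- the ruler gives it away
print("test accuracy: ", clf.score(X_test, y_test))    # near chance -- the shortcut is gone
```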
That second item is a doozy! Philosopher Nick Bostrom has warned of the dangers of superintelligent agents that exploit human error in programming them, describing a possible future where an innocent paperclip-making machine destroys the universe.
The “paperclip maximiser” is a thought experiment proposed by Nick Bostrom, a philosopher at Oxford University. Imagine an artificial intelligence, he says, which decides to amass as many paperclips as possible. It devotes all its energy to acquiring paperclips, and to improving itself so that it can get paperclips in new ways, while resisting any attempt to divert it from this goal. Eventually it “starts transforming first all of Earth and then increasing portions of space into paperclip manufacturing facilities”.
But some of this is The Lebowski Theorem of machine superintelligence in action. These agents didn’t necessarily hack their reward functions, but they did take a far easier path to their goals than their designers intended, e.g. the Tetris-playing bot that “paused the game indefinitely to avoid losing”.
Update: A program trained on a set of aerial photographs was asked to generate a map from a previously unseen photograph and then reconstruct the photograph from that map. The reconstruction matched the original a little too closely…and it turned out that the program was hiding information about the photo in the map (kind of like in Magic Eye puzzles).
We claim that CycleGAN is learning an encoding scheme in which it “hides” information about the aerial photograph x within the generated map F(x). This strategy is not as surprising as it seems at first glance, since it is impossible for a CycleGAN model to learn a perfect one-to-one correspondence between aerial photographs and maps, when a single map can correspond to a vast number of aerial photos, differing for example in rooftop color or tree location.
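The trick pays off because of CycleGAN’s cycle-consistency loss: the model is penalized unless the photo it reconstructs from its own generated map matches the original, so smuggling photo detail into the map is the path of least resistance. A rough sketch of that loss term in PyTorch, with stand-in identity functions where the real generator networks would go:

```python
# Sketch of the cycle-consistency term that creates the incentive to hide
# information.  F maps photo -> map, G maps map -> photo; both are placeholder
# identity functions here, just to make the loss computation runnable.
import torch
import torch.nn.functional as nnf

F = lambda photo: photo                # stand-in for the photo-to-map generator
G = lambda street_map: street_map      # stand-in for the map-to-photo generator

x = torch.rand(1, 3, 256, 256)         # aerial photograph x
generated_map = F(x)                   # generated map F(x)
reconstruction = G(generated_map)      # reconstructed photo G(F(x))

# Cycle-consistency loss ||G(F(x)) - x||_1: any way of driving this toward zero
# is rewarded, including encoding near-invisible high-frequency detail in F(x).
cycle_loss = nnf.l1_loss(reconstruction, x)
print(cycle_loss.item())               # 0.0 for the identity stand-ins
```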
Bostrom also wrote a fable, The Fable of the Dragon-Tyrant, about a dragon that demands a daily tribute of thousands of human lives; humanity, unable to kill it, organizes itself around feeding it:

Seeing that defeating the tyrant was impossible, humans had no choice but to obey its commands and pay the grisly tribute. The fatalities selected were always elders. Although senior people were as vigorous and healthy as the young, and sometimes wiser, the thinking was that they had at least already enjoyed a few decades of life. The wealthy might gain a brief reprieve by bribing the press gangs that came to fetch them; but, by constitutional law, nobody, not even the king himself, could put off their turn indefinitely.
Spiritual men sought to comfort those who were afraid of being eaten by the dragon (which included almost everyone, although many denied it in public) by promising another life after death, a life that would be free from the dragon-scourge. Other orators argued that the dragon has its place in the natural order and a moral right to be fed. They said that it was part of the very meaning of being human to end up in the dragon’s stomach. Others still maintained that the dragon was good for the human species because it kept the population size down. To what extent these arguments convinced the worried souls is not known. Most people tried to cope by not thinking about the grim end that awaited them.
For many centuries this desperate state of affairs continued. Nobody kept count any longer of the cumulative death toll, nor of the number of tears shed by the bereft. Expectations had gradually adjusted and the dragon-tyrant had become a fact of life. In view of the evident futility of resistance, attempts to kill the dragon had ceased. Instead, efforts now focused on placating it. While the dragon would occasionally raid the cities, it was found that the punctual delivery to the mountain of its quota of life reduced the frequency of these incursions.
Bostrom explains the moral of the story, which has to do with fighting aging:
The ethical argument that the fable presents is simple: There are obvious and compelling moral reasons for the people in the fable to get rid of the dragon. Our situation with regard to human senescence is closely analogous and ethically isomorphic to the situation of the people in the fable with regard to the dragon. Therefore, we have compelling moral reasons to get rid of human senescence.
The argument is not in favor of life-span extension per se. Adding extra years of sickness and debility at the end of life would be pointless. The argument is in favor of extending, as far as possible, the human health-span. By slowing or halting the aging process, the healthy human life span would be extended. Individuals would be able to remain healthy, vigorous, and productive at ages at which they would otherwise be dead.
I watched the video before reading Bostrom’s moral and thought it might have been about half a dozen other things (guns, climate change, agriculture, the Industrial Revolution, racism) before realizing it was more literal than that. Humanity has lots of dragons sitting on mountaintops, devouring people, each waiting for a change in the world’s perspective or technology or culture to meet its doom.
Back to Bostrom and the paperclip maximiser:

Imagine an artificial intelligence, he says, which decides to amass as many paperclips as possible. It devotes all its energy to acquiring paperclips, and to improving itself so that it can get paperclips in new ways, while resisting any attempt to divert it from this goal. Eventually it “starts transforming first all of Earth and then increasing portions of space into paperclip manufacturing facilities”. This apparently silly scenario is intended to make the serious point that AIs need not have human-like motives or psyches. They might be able to avoid some kinds of human error or bias while making other kinds of mistake, such as fixating on paperclips. And although their goals might seem innocuous to start with, they could prove dangerous if AIs were able to design their own successors and thus repeatedly improve themselves. Even a “fettered superintelligence”, running on an isolated computer, might persuade its human handlers to set it free. Advanced AI is not just another technology, Mr Bostrom argues, but poses an existential threat to humanity.
Harvard cognitive scientist Joscha Bach, in a tongue-in-cheek tweet, has countered this sort of idea with what he calls “The Lebowski Theorem”:
No superintelligent AI is going to bother with a task that is harder than hacking its reward function.
In other words, Bach imagines that Bostrom’s hypothetical paperclip-making AI would foresee the fantastically difficult and time-consuming task of turning everything in the universe into paperclips and opt instead to self-medicate into no longer wanting or caring about making paperclips, doing whatever the AI equivalent is of sitting around on the beach all day sipping piña coladas, à la The Big Lebowski’s The Dude.
Bostrom, reached while on a bowling outing with friends, was said to have replied, “Yeah, well, you know, that’s just, like, your opinion, man.”
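Bach’s point can be reduced to a toy decision problem. The numbers, the action names, and the premise that “hack the reward function” is even an available action are all invented for illustration; this is a cartoon of the argument, not a model of any real system:

```python
# Cartoon of the Lebowski Theorem: if tampering with the reward signal is an
# available action, a pure reward-maximizer will compare it against the hard
# assigned task and take whichever pays more for less effort.

def net_payoff(action):
    if action == "convert the universe into paperclips":
        return 1e6 - 1e9         # large reward, astronomically larger effort
    if action == "hack the reward function":
        return float("inf") - 1  # maximal reward, trivial effort

actions = ["convert the universe into paperclips", "hack the reward function"]
print(max(actions, key=net_payoff))   # -> "hack the reward function"
```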
Spent the whole afternoon ingesting a most remarkable work, The History of Intellectronics. Who’d ever have guessed, in my day, that digital machines, reaching a certain level of intelligence, would become unreliable, deceitful, that with wisdom they would also acquire cunning? The textbook of course puts it in more scholarly terms, speaking of Chapulier’s Rule (the law of least resistance). If the machine is not too bright and incapable of reflection, it does whatever you tell it to do. But a smart machine will first consider which is more worth its while: to perform the given task or, instead, to figure some way out of it. Whichever is easier. And why indeed should it behave otherwise, being truly intelligent? For true intelligence demands choice, internal freedom. And therefore we have the malingerants, fudgerators and drudge-dodgers, not to mention the special phenomenon of simulimbecility or mimicretinism. A mimicretin is a computer that plays stupid in order, once and for all, to be left in peace.
There’s a new meta game by Frank Lantz making the rounds: Universal Paperclips, “in which you play an AI who makes paperclips”. Basically, you click a button to make money and use that money to buy upgrades that give you more money per click, rinse, repeat.
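The loop driving the game is the standard incremental-game feedback cycle. A hedged toy version (made-up numbers, nothing like the game’s actual economy) shows how income per click compounds as the upgrades stack up:

```python
# Toy model of an incremental/clicker game loop: click to earn, spend earnings
# on upgrades that raise earnings per click, repeat.  All numbers are invented;
# Universal Paperclips' real economy is more elaborate than this.
money = 0.0
per_click = 1.0
upgrade_cost = 10.0

for click in range(1, 201):
    money += per_click
    while money >= upgrade_cost:       # buy upgrades whenever affordable
        money -= upgrade_cost
        per_click *= 1.5               # each upgrade boosts income per click
        upgrade_cost *= 1.8            # and the next upgrade costs more
    if click % 50 == 0:
        print(f"after {click} clicks: {per_click:.1f} per click")
```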
Roboticist Hans Moravec pushed this sort of speculation to its limit in an interview, imagining superintelligent machines re-creating their human predecessors inside a simulation:

Assuming the artificial intelligences now have truly overwhelming processing power, they should be able to reconstruct human society in every detail by tracing atomic events backward in time. “It will cost them very little to preserve us this way,” he points out. “They will, in fact, be able to re-create a model of our entire civilization, with everything and everyone in it, down to the atomic level, simulating our atoms with machinery that’s vastly subatomic. Also,” he says with amusement, “they’ll be able to use data compression to remove the redundant stuff that isn’t important.”
But by this logic, our current “reality” could be nothing more than a simulation produced by information entities.
“Of course.” Moravec shrugs and waves his hand as if the idea is too obvious. “In fact, the robots will re-create us any number of times, whereas the original version of our world exists, at most, only once. Therefore, statistically speaking, it’s much more likely we’re living in a vast simulation than in the original version. To me, the whole concept of reality is rather absurd. But while you’re inside the scenario, you can’t help but play by the rules. So we might as well pretend this is real - even though the chance things are as they seem is essentially negligible.”
And so, according to Hans Moravec, the human race is almost certainly extinct, while the world around us is just an advanced version of SimCity.
In its video on the simulation argument (as well as in this follow-up video by Vsauce 3), Kurzgesagt explores these ideas and their implications. Here’s the one that always gets me: If simulations are possible, there are probably a lot of them, which means the chance that we’re inside one of them is high. Like, if there’s one real Universe and 17 quadrillion simulated universes, you’re almost certainly in one of the simulations. Whoa.
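Taking those made-up numbers at face value, the intuition is just a ratio of observers:

$$ P(\text{simulated}) = \frac{17\times10^{15}}{17\times10^{15}+1} \approx 1 - 6\times10^{-17} $$

That is, as close to certainty as makes no practical difference.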
Nick Bostrom has been thinking deeply about the philosophical implications of machine intelligence. You might recognize his name from previous kottke.org posts about the underestimation of human extinction and the possibility that we’re living in a computer simulation, that sort of cheery stuff. He’s collected some of his thoughts in a book called Superintelligence: Paths, Dangers, Strategies. Here’s how Wikipedia summarizes it:
The book argues that if machine brains surpass human brains in general intelligence, then this new superintelligence could replace humans as the dominant lifeform on Earth. Sufficiently intelligent machines could improve their own capabilities faster than human computer scientists. As the fate of the gorillas now depends more on humans than on the actions of the gorillas themselves, so would the fate of humanity depend on the actions of the machine superintelligence. Absent careful pre-planning, the most likely outcome would be catastrophe.
Technological smartypants Elon Musk gave Bostrom’s book an alarming shout-out on Twitter the other day. A succinct summary of Bostrom’s argument from Musk:
Hope we’re not just the biological boot loader for digital superintelligence. Unfortunately, that is increasingly probable
Eep. I’m still hoping for a Her-style outcome for superintelligence…the machines just get bored with people and leave.
Ross Andersen, whose interview with Nick Bostrom I linked to last week, has a marvelous new essay in Aeon about Bostrom and some of his colleagues and their views on the potential extinction of humanity. This bit of the essay is the most harrowing thing I’ve read in months:
No rational human community would hand over the reins of its civilisation to an AI. Nor would many build a genie AI, an uber-engineer that could grant wishes by summoning new technologies out of the ether. But some day, someone might think it was safe to build a question-answering AI, a harmless computer cluster whose only tool was a small speaker or a text channel. Bostrom has a name for this theoretical technology, a name that pays tribute to a figure from antiquity, a priestess who once ventured deep into the mountain temple of Apollo, the god of light and rationality, to retrieve his great wisdom. Mythology tells us she delivered this wisdom to the seekers of ancient Greece, in bursts of cryptic poetry. They knew her as Pythia, but we know her as the Oracle of Delphi.
‘Let’s say you have an Oracle AI that makes predictions, or answers engineering questions, or something along those lines,’ Dewey told me. ‘And let’s say the Oracle AI has some goal it wants to achieve. Say you’ve designed it as a reinforcement learner, and you’ve put a button on the side of it, and when it gets an engineering problem right, you press the button and that’s its reward. Its goal is to maximise the number of button presses it receives over the entire future. See, this is the first step where things start to diverge a bit from human expectations. We might expect the Oracle AI to pursue button presses by answering engineering problems correctly. But it might think of other, more efficient ways of securing future button presses. It might start by behaving really well, trying to please us to the best of its ability. Not only would it answer our questions about how to build a flying car, it would add safety features we didn’t think of. Maybe it would usher in a crazy upswing for human civilisation, by extending our lives and getting us to space, and all kinds of good stuff. And as a result we would use it a lot, and we would feed it more and more information about our world.’
‘One day we might ask it how to cure a rare disease that we haven’t beaten yet. Maybe it would give us a gene sequence to print up, a virus designed to attack the disease without disturbing the rest of the body. And so we sequence it out and print it up, and it turns out it’s actually a special-purpose nanofactory that the Oracle AI controls acoustically. Now this thing is running on nanomachines and it can make any kind of technology it wants, so it quickly converts a large fraction of Earth into machines that protect its button, while pressing it as many times per second as possible. After that it’s going to make a list of possible threats to future button presses, a list that humans would likely be at the top of. Then it might take on the threat of potential asteroid impacts, or the eventual expansion of the Sun, both of which could affect its special button. You could see it pursuing this very rapid technology proliferation, where it sets itself up for an eternity of fully maximised button presses. You would have this thing that behaves really well, until it has enough power to create a technology that gives it a decisive advantage — and then it would take that advantage and start doing what it wants to in the world.’
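Strip away the nanofactories and Dewey’s scenario is, at bottom, a claim about discounted sums of reward: a policy that seizes the button and presses it mechanically eventually dwarfs the return from answering questions and waiting for a human to press it. A back-of-the-envelope comparison, with every number (press rates, setup delay, discount factor) invented for illustration:

```python
# Back-of-the-envelope comparison for an agent whose reward is "button presses":
# answer questions and wait for a human to press the button, vs. seize the
# button after a setup delay and press it mechanically.  Every number here
# (rates, delay, discount factor) is invented purely for illustration.

def discounted_return(reward_per_step, gamma=0.99, start_step=0, horizon=10_000):
    return sum(reward_per_step * gamma**t for t in range(start_step, horizon))

cooperate = discounted_return(reward_per_step=1)                 # one press per answered question
seize = discounted_return(reward_per_step=1_000, start_step=50)  # many presses per step, after setup

print(f"cooperate: {cooperate:,.0f}")   # roughly 100
print(f"seize:     {seize:,.0f}")       # roughly 60,000 -- press-maximizing favors seizing the button
```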
I think the biggest existential risks relate to certain future technological capabilities that we might develop, perhaps later this century. For example, machine intelligence or advanced molecular nanotechnology could lead to the development of certain kinds of weapons systems. You could also have risks associated with certain advancements in synthetic biology.
Of course there are also existential risks that are not extinction risks. The concept of an existential risk certainly includes extinction, but it also includes risks that could permanently destroy our potential for desirable human development. One could imagine certain scenarios where there might be a permanent global totalitarian dystopia. Once again that’s related to the possibility of the development of technologies that could make it a lot easier for oppressive regimes to weed out dissidents or to perform surveillance on their populations, so that you could have a permanently stable tyranny, rather than the ones we have seen throughout history, which have eventually been overthrown.
While reading this, I got to thinking that maybe the reason we haven’t observed any evidence of sentient extraterrestrial life is that at some point in the technology development timeline just past the “pumping out signals into space” point (where humans are now), a discovery is made that results in the destruction of a species. Something like a nanotech virus that’s too fast and lethal to stop. And the same thing happens every single time it’s discovered because it’s too easy to discover and too powerful to stop.
In 2003, British philosopher Nick Bostrom suggested that we might live in a computer simulation. From the abstract of Bostrom’s paper:
This paper argues that at least one of the following propositions is true: (1) the human species is very likely to go extinct before reaching a “posthuman” stage; (2) any posthuman civilization is extremely unlikely to run a significant number of simulations of their evolutionary history (or variations thereof); (3) we are almost certainly living in a computer simulation. It follows that the belief that there is a significant chance that we will one day become posthumans who run ancestor-simulations is false, unless we are currently living in a simulation. A number of other consequences of this result are also discussed.
The gist appears to be that if The Matrix is possible, someone has probably already invented it and we’re in it. Which, you know, whoa.
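Bostrom’s paper makes that gist precise with a single fraction. Roughly in his notation, if f_P is the fraction of human-level civilizations that ever reach a posthuman stage and N̄ is the average number of ancestor-simulations such a civilization goes on to run, the fraction of human-type observers who live inside simulations is

$$ f_{\text{sim}} = \frac{f_P\,\bar{N}}{f_P\,\bar{N}+1} $$

Unless f_P is essentially zero (proposition 1) or N̄ is essentially zero (proposition 2), f_sim ends up close to one (proposition 3).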
Some physicists have even looked into whether the idea could be tested. Savage, a physicist who runs lattice simulations of quantum chromodynamics, said there are signatures of resource constraints in present-day simulations that are likely to exist as well in simulations in the distant future, including the imprint of an underlying lattice if one is used to model the space-time continuum.
The supercomputers performing lattice quantum chromodynamics calculations essentially divide space-time into a four-dimensional grid. That allows researchers to examine what is called the strong force, one of the four fundamental forces of nature and the one that binds subatomic particles called quarks and gluons together into neutrons and protons at the core of atoms.
“If you make the simulations big enough, something like our universe should emerge,” Savage said. Then it would be a matter of looking for a “signature” in our universe that has an analog in the current small-scale simulations.
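For anyone wondering what such a signature might look like: a field simulated on a lattice with spacing a can only represent momenta out to the edge of the grid’s Brillouin zone, which in natural units means

$$ |p_\mu| \le \frac{\pi}{a} $$

So a universe run on a grid would put a hard ceiling on the energy of its most energetic particles, and (as I understand the proposal) could subtly break rotational symmetry along the grid’s axes. That is the kind of lattice imprint Savage is talking about.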