
kottke.org posts about ChatGPT

The Octopus Test for Large Language Model AIs

posted by Jason Kottke   Mar 02, 2023

In 2020, before the current crop of large language models (LLMs) like ChatGPT and Bing, Emily Bender and Alexander Koller wrote a paper on their limitations called Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data. In the paper, Bender and Koller describe an “octopus test” as a way of thinking about what LLMs are capable of and what they aren’t. A recent profile of Bender by Elizabeth Weil for New York magazine (which is worth reading in its entirety) summarizes the octopus test thusly:

Say that A and B, both fluent speakers of English, are independently stranded on two uninhabited islands. They soon discover that previous visitors to these islands have left behind telegraphs and that they can communicate with each other via an underwater cable. A and B start happily typing messages to each other.

Meanwhile, O, a hyperintelligent deep-sea octopus who is unable to visit or observe the two islands, discovers a way to tap into the underwater cable and listen in on A and B’s conversations. O knows nothing about English initially but is very good at detecting statistical patterns. Over time, O learns to predict with great accuracy how B will respond to each of A’s utterances.

Soon, the octopus enters the conversation and starts impersonating B and replying to A. This ruse works for a while, and A believes that O communicates as both she and B do — with meaning and intent. Then one day A calls out: “I’m being attacked by an angry bear. Help me figure out how to defend myself. I’ve got some sticks.” The octopus, impersonating B, fails to help. How could it succeed? The octopus has no referents, no idea what bears or sticks are. No way to give relevant instructions, like to go grab some coconuts and rope and build a catapult. A is in trouble and feels duped. The octopus is exposed as a fraud.

The paper’s official title is “Climbing Towards NLU: On Meaning, Form, and Understanding in the Age of Data.” NLU stands for “natural-language understanding.” How should we interpret the natural-sounding (i.e., humanlike) words that come out of LLMs? The models are built on statistics. They work by looking for patterns in huge troves of text and then using those patterns to guess what the next word in a string of words should be. They’re great at mimicry and bad at facts. Why? LLMs, like the octopus, have no access to real-world, embodied referents. This makes LLMs beguiling, amoral, and the Platonic ideal of the bullshitter, as philosopher Harry Frankfurt, author of On Bullshit, defined the term. Bullshitters, Frankfurt argued, are worse than liars. They don’t care whether something is true or false. They care only about rhetorical power — if a listener or reader is persuaded.

The point here is to caution against treating these AIs as if they are people. Bing isn’t in love with anyone; it’s just free-associating from an (admittedly huge) part of the internet.
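
To make the profile’s point about statistics concrete, here is a minimal, hypothetical sketch of “guess what the next word should be” as pure pattern counting: a toy bigram model. Everything in it (the corpus, the names) is invented for illustration; real LLMs use neural networks trained on vastly more text, but like the octopus, the toy model only ever sees form, never referents.

```python
# A toy bigram model: it predicts the next word purely from counted
# word-pair frequencies, with no idea what any of the words refer to.
from collections import Counter, defaultdict

corpus = (
    "the octopus taps the cable and the octopus learns to predict "
    "how b will respond to each of a's messages"
).split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def guess_next(prev):
    """Return the most frequent continuation seen after `prev`, if any."""
    counts = following.get(prev)
    return counts.most_common(1)[0][0] if counts else None

print(guess_next("the"))   # "octopus": a pattern, not an understanding
print(guess_next("bear"))  # None: no referents, so no help with bears
```

Feed such a system enough text and the continuations get impressively fluent, but the bear problem never goes away: the fluency comes from patterns in form, not from knowing what a stick or a bear is.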

This isn’t an exact analogue, but I have a car that can drive itself under certain circumstances (not Tesla’s FSD), and when I turn self-drive on, it feels like I’m handing control of my car to a very precocious 4-year-old. Most of the time, this incredible child pilots the car really well, better than I can, really — it maintains speed, lane position, and following distance very precisely — so much so that you want to trust it as you would a licensed adult driver. But when it actually has to make a tough decision or think, it will either give up control or do something stupid or dangerous. You can’t ever forget that the self-driver is like a 4-year-old kid mimicking the act of driving and isn’t capable of thinking like a human when it needs to. Forget that and you can die. (This has the odd and (IMO) under-appreciated effect, when self-drive is engaged, of shifting your role from operating the car to babysitting the thing operating the car. Doing a thing yourself and watching something else do it so you can take over when it screws up are two very different tasks, and until more people realize that, it’s going to keep causing unnecessary accidents.)

Ted Chiang: “ChatGPT Is a Blurry JPEG of the Web”

posted by Jason Kottke   Feb 09, 2023

This is a fantastic piece by writer Ted Chiang about large-language models like ChatGPT. He likens them to lossy compression algorithms:

What I’ve described sounds a lot like ChatGPT, or most any other large-language model. Think of ChatGPT as a blurry jpeg of all the text on the Web. It retains much of the information on the Web, in the same way that a jpeg retains much of the information of a higher-resolution image, but, if you’re looking for an exact sequence of bits, you won’t find it; all you will ever get is an approximation. But, because the approximation is presented in the form of grammatical text, which ChatGPT excels at creating, it’s usually acceptable. You’re still looking at a blurry jpeg, but the blurriness occurs in a way that doesn’t make the picture as a whole look less sharp.

Reframing the technology in that way turns out to be useful in thinking through some of its possibilities and limitations:

There is very little information available about OpenAI’s forthcoming successor to ChatGPT, GPT-4. But I’m going to make a prediction: when assembling the vast amount of text used to train GPT-4, the people at OpenAI will have made every effort to exclude material generated by ChatGPT or any other large-language model. If this turns out to be the case, it will serve as unintentional confirmation that the analogy between large-language models and lossy compression is useful. Repeatedly resaving a jpeg creates more compression artifacts, because more information is lost every time. It’s the digital equivalent of repeatedly making photocopies of photocopies in the old days. The image quality only gets worse.

Indeed, a useful criterion for gauging a large-language model’s quality might be the willingness of a company to use the text that it generates as training material for a new model. If the output of ChatGPT isn’t good enough for GPT-4, we might take that as an indicator that it’s not good enough for us, either.
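
Chiang’s photocopy-of-a-photocopy point is easy to reproduce with an actual JPEG. The sketch below is a rough, hypothetical illustration rather than anything from his article: it assumes the Pillow imaging library is installed and that a placeholder file named photo.jpg exists, then re-encodes the image at the same lossy quality setting ten times so you can compare the first and last generations.

```python
# Generation loss demo: re-encode a JPEG repeatedly and let the
# compression artifacts accumulate. The filenames and quality setting
# are placeholder choices for illustration.
from PIL import Image

img = Image.open("photo.jpg").convert("RGB")
img.save("generation_01.jpg", format="JPEG", quality=50)

for generation in range(2, 11):
    # Decode the previous lossy copy and re-encode it lossily again;
    # each round throws away a little more information.
    img = Image.open(f"generation_{generation - 1:02d}.jpg").convert("RGB")
    img.save(f"generation_{generation:02d}.jpg", format="JPEG", quality=50)

# Compare generation_01.jpg with generation_10.jpg: the digital
# equivalent of photocopying a photocopy.
```

That loop is what the excerpt is pointing at with training data: if model-generated text goes back into the next model’s training set, you’re re-encoding an approximation of an approximation.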

Chiang has previously spoken about how “most fears about A.I. are best understood as fears about capitalism”.

I tend to think that most fears about A.I. are best understood as fears about capitalism. And I think that this is actually true of most fears of technology, too. Most of our fears or anxieties about technology are best understood as fears or anxiety about how capitalism will use technology against us. And technology and capitalism have been so closely intertwined that it’s hard to distinguish the two.

Let’s think about it this way. How much would we fear any technology, whether A.I. or some other technology, how much would you fear it if we lived in a world that was a lot like Denmark or if the entire world was run sort of on the principles of one of the Scandinavian countries? There’s universal health care. Everyone has child care, free college maybe. And maybe there’s some version of universal basic income there.

Now if the entire world operates according to — is run on those principles, how much do you worry about a new technology then? I think much, much less than we do now.

See also Why Computers Won’t Make Themselves Smarter. (via @irwin)