Their codas could be orders of magnitude more ancient than Sanskrit. We don’t know how much meaning they convey, but we do know that they’ll be very difficult to decode. Project CETI’s scientists will need to observe the whales for years and achieve fundamental breakthroughs in AI. But if they’re successful, humans may be able to initiate a conversation with whales.
This would be a first-contact scenario involving two species that have lived side by side for ages. I wanted to imagine how it could unfold. I reached out to marine biologists, field scientists who specialize in whales, paleontologists, professors of animal-rights law, linguists, and philosophers. Assume that Project CETI works, I told them. Assume that we are able to communicate something of substance to the sperm whale civilization. What should we say?
One of the worries about whale/human communication is the potential harm a conversation might cause.
Cesar Rodriguez-Garavito, a law professor at NYU who is advising Project CETI, told me that whatever we say, we must avoid harming the whales, and that we shouldn’t be too confident about our ability to predict the harms that a conversation could cause.
The sperm whales may not want to talk. They, like us, can be standoffish even toward members of their own species, and we are much more distant relations. Epochs have passed since our last common ancestor roamed the Earth. In the interim, we have pursued radically different, even alien, lifeways.
When Mount Vesuvius erupted in 79 AD, the intense heat of the pyroclastic flows carbonized bundles of scrolls. The parching took place over an extremely short period of time, in a room deprived of oxygen, fusing the scrolls into compact and highly fragile blocks, which were then preserved by layers of cement-like rock.
Using high-resolution CT scans of the scrolls, machine learning, and computer vision techniques, the team was able to read the text inside one of the scrolls without actually unrolling it. I am stunned by how much text they were able to recover from these blackened documents — take a look at this image:
There was one submission that stood out clearly from the rest. Working independently, each member of our team of papyrologists recovered more text from this submission than any other. Remarkably, the entry achieved the criteria we set when announcing the Vesuvius Challenge in March: 4 passages of 140 characters each, with at least 85% of characters recoverable. This was not a given: most of us on the organizing team assigned a less than 30% probability of success when we announced these criteria! And in addition, the submission includes another 11 (!) columns of text, more than 2000 characters total.
If you’re interested, it’s fascinating to read through the whole thing to see just how little they were working with compared to how much they were able to recover. And the best part is, all the contest submissions are open source, so researchers will be able to build on each other’s successes. (via waxy.org)
In June 2021 (pre The Bear), New Yorker cartoonist Zoe Si coached Ayo Edebiri through the process of drawing a New Yorker cartoon. The catch: neither of them could see the other’s work in progress. Super entertaining.
I don’t know about you, but Si’s initial description of the cartoon reminded me of an LLM prompt:
So the cartoon is two people in their apartment. One person has dug a hole in the floor, and he is standing in the hole and his head’s poking out. And the other person is kneeling on the floor beside the hole, kind of like looking at him in a concerned manner. There’ll be like a couch in the background just to signify that they’re in a house.
Just for funsies, I asked ChatGPT to generate a New Yorker-style cartoon using that prompt. Here’s what it came up with:
Oh boy. And then I asked it for a funny caption and it hit me with: “I said I wanted more ‘open space’ in the living room, not an ‘open pit’!” Oof. ChatGPT, don’t quit your day job!
Over the weekend, I listened to this podcast conversation between the psychologist & philosopher Alison Gopnik and writer Ted Chiang about using children’s learning as a model for developing AI systems. Around the 23-minute mark, Gopnik observes that care relationships (child care, elder care, etc.) are extremely important to people but are nearly invisible in economics. And then Chiang replies:
One of the ways that conventional economics sort of ignores care is that for every employee that you hire, there was an incredible amount of labor that went into that employee. That’s a person! And how do you make a person? Well, for one thing, you need several hundred thousand hours of effort to make a person. And every employee that any company hires is the product of hundreds of thousands of hours of effort. Which, companies… they don’t have to pay for that!
They are reaping the benefits of an incredible amount of labor. And if you imagine, in some weird kind of theoretical sense, if you had to actually pay for the raising of everyone that you would eventually employ, what would that look like?
It’s an interesting conversation throughout — recommended!
Labyrinth and its many variants generally consist of a box topped with a flat wooden plane that tilts along its x and y axes using external control knobs. Atop the board is a maze featuring numerous holes. The goal is to move a marble or a metal ball from start to finish without it falling into one of those holes. It can be a… frustrating game, to say the least. But with ample practice and patience, players can generally learn to steady their controls enough to steer their marble through to safety in a relatively short timespan.
CyberRunner, in contrast, reportedly mastered the dexterity required to complete the game in barely 5 hours. Not only that, but researchers claim it can now complete the maze in just under 14.5 seconds — over 6 percent faster than the existing human record.
CyberRunner was capable of solving the maze even faster, but researchers had to stop it from taking shortcuts it found in the maze. (via clive thompson)
The four members of the Beatles, assisted by machine learning technology, come together one last time to record a song, working off of a demo tape recorded by John Lennon in the 70s.
The long mythologised John Lennon demo was first worked on in February 1995 by Paul, George and Ringo as part of The Beatles Anthology project but it remained unfinished, partly because of the impossible technological challenges involved in working with the vocal John had recorded on tape in the 1970s. For years it looked like the song could never be completed.
But in 2022 there was a stroke of serendipity. A software system developed by Peter Jackson and his team, used throughout the production of the documentary series Get Back, finally opened the way for the uncoupling of John’s vocal from his piano part. As a result, the original recording could be brought to life and worked on anew with contributions from all four Beatles.
Ok, this is a little bit bonkers: HeyGen’s Video Translate tool will convert videos of people speaking into videos of them speaking one of several different languages (incl. English, Spanish, Hindi, and French) with matching mouth movements. Check out their brief demo of Marques Brownlee speaking Spanish & Tim Cook speaking Hindi or this video of a YouTuber trying it out:
The results are definitely in the category of “indistinguishable from magic”.
Photographs have always been an imperfect reproduction of real life — see the story of Dorothea Lange’s Migrant Mother or Ansel Adams’ extensive dark room work — but the seemingly boundless alterations offered by current & future AI editing tools will allow almost anyone to turn their photos (or should I say “photos”) into whatever they wish. In this video, Evan Puschak briefly explores what AI-altered photos might do to our memories.
I was surprised he didn’t mention the theory that when a past experience is remembered, that memory is altered in the human brain — that is, the “very act of remembering can change our memories”. I think I first heard about this on Radiolab more than 16 years ago. So maybe looking at photos extensively altered by AI could extensively alter those same memories in our brains, actually making us unable to recall anything even remotely close to what “really” happened. Fun!
But also, one could imagine this as a powerful way to treat PTSD, etc. Or to brainwash someone! Or an entire populace… Here’s Hannah Arendt on constantly being lied to:
If everybody always lies to you, the consequence is not that you believe the lies, but rather that nobody believes anything any longer. This is because lies, by their very nature, have to be changed, and a lying government has constantly to rewrite its own history. On the receiving end you get not only one lie — a lie which you could go on for the rest of your days — but you get a great number of lies, depending on how the political wind blows. And a people that no longer can believe anything cannot make up its mind. It is deprived not only of its capacity to act but also of its capacity to think and to judge. And with such a people you can then do what you please.
This is the incredible and interesting and dangerous thing about the combination of our current technology, the internet, and mass media: “a lying government” is no longer necessary — we’re doing it to ourselves and anyone with sufficient motivation will be able to take advantage of people without the capacity to think and judge.
P.S. I lol’d too hard at his deadpan description of “the late Thanos”. RIP, big fella.
Trillo demonstrated the process to me during a Zoom call; in seconds, it was possible to render, for example, a tracking shot of a woman crying alone in a softly lit restaurant. His prompt included a hash of S.E.O.-esque terms meant to goad the machine into creating a particularly cinematic aesthetic: “Moody lighting, iconic, visually stunning, immersive, impactful.” Trillo was enthralled by the process: “The speed in which I could operate was unlike anything I had experienced.” He continued, “It felt like being able to fly in a dream.” The A.I. tool was “co-directing” alongside him: “It’s making a lot of decisions I didn’t.”
I know, I know. Too much Wes Anderson. Too much AI. But there is something in my brain, a chemical imbalance perhaps, and I can’t help but find this reimagining of the Lord of the Rings in Anderson’s signature style funny and charming. Sorry but not sorry.
So, I would like to propose another metaphor for the risks of artificial intelligence. I suggest that we think about A.I. as a management-consulting firm, along the lines of McKinsey & Company. Firms like McKinsey are hired for a wide variety of reasons, and A.I. systems are used for many reasons, too. But the similarities between McKinsey — a consulting firm that works with ninety per cent of the Fortune 100 — and A.I. are also clear. Social-media companies use machine learning to keep users glued to their feeds. In a similar way, Purdue Pharma used McKinsey to figure out how to “turbocharge” sales of OxyContin during the opioid epidemic. Just as A.I. promises to offer managers a cheap replacement for human workers, so McKinsey and similar firms helped normalize the practice of mass layoffs as a way of increasing stock prices and executive compensation, contributing to the destruction of the middle class in America.
A former McKinsey employee has described the company as “capital’s willing executioners”: if you want something done but don’t want to get your hands dirty, McKinsey will do it for you. That escape from accountability is one of the most valuable services that management consultancies provide. Bosses have certain goals, but don’t want to be blamed for doing what’s necessary to achieve those goals; by hiring consultants, management can say that they were just following independent, expert advice. Even in its current rudimentary form, A.I. has become a way for a company to evade responsibility by saying that it’s just doing what “the algorithm” says, even though it was the company that commissioned the algorithm in the first place.
No matter which side you come down on in the debate about using AI tools like Stable Diffusion and Midjourney to create digital art, this video of an experienced digital artist explaining how he uses AI in his workflow is worth a watch. I thought this comment was particularly interesting:
I see the overall process as a joint effort with the AI. I’ve been a traditional artist for 2 decades, painting on canvas. And in the last five years I’ve been doing a lot of digital art. So from that part of myself, I don’t feel threatened at all.
I feel this is an opportunity. An opportunity for many new talented people to jump on a new branch of art that is completely different from the one that we have already in digital art and just open up new ways of being creative.
I’m not going to make a habit of posting AI generated video and photography here (mainly because most of it is not that interesting) but Pepperoni Hug Spot is just too perfect a name for a pizza place to pass up. And it’s got Too Many Cooks vibes.
[Yesterday I spent all day answering reader questions for the inaugural Kottke.org Ask Me Anything. One of them asked my opinion of the current crop of AI tools and I thought it was worth reprinting the whole thing here. -j]
Q: I would love to know your thoughts on AI, and specifically the ones that threaten us writers. I know you’ve touched on it in the past, but it seems like ChatGPT and the like really exploded while you were on sabbatical. Like, you left and the world was one way, and when you returned, it was very different. —Gregor
A: I got several questions about AI and I haven’t written anything about my experience with it on the site, so here we go. Let’s start with two facts:
1. ChatGPT moved me to tears.
2. I built this AMA site with the assistance of ChatGPT. (Or was it the other way around?)
Ok, the first thing. Last month, my son skied at a competition out in Montana. He’d (somewhat inexplicably) struggled earlier in the season at comps, which was tough for him to go through and for us as parents to watch. How much do we let him figure out on his own vs. how much support/guidance do we give him? This Montana comp was his last chance to get out there and show his skills. I was here in VT, so I texted him my usual “Good luck! Stomp it!” message the morning of the comp. But I happened to be futzing around with ChatGPT at the time (the GPT-3.5 model) and thought, you know, let’s punch this up a little bit. So I asked ChatGPT to write a good luck poem for a skier competing at a freeski competition at Big Sky.
In response, it wrote a perfectly serviceable 12-line poem in three stanzas that was on topic, made narrative sense, and rhymed. And when I read the last line, I burst into tears. So does that make ChatGPT a soulful poet of rare ability? No. I’ve thought a lot about this and here’s what I think is going on: I was primed for an emotional response (because my son was struggling with something really important to him, because I was feeling anxious for him, because he was doing something potentially dangerous, because I haven’t seen him too much this winter) and ChatGPT used the language and methods of thousands of years of writing to deliver something a) about someone I love, and b) in the form of a poem (which is often an emotionally charged form) — both of which I had explicitly asked for. When you’re really in your feelings, even the worst movie or the cheesiest song can resonate with you and move you — just the tiniest bit of narrative and sentiment can send you over the edge. ChatGPT didn’t really make me cry…I did.
But still. Even so. It felt a little magical when it happened.
I’ve also been using ChatGPT for some other programming projects — we whipped the Quick Links into better shape (it can write Movable Type templating code…really!) and set up direct posting of the site’s links to Facebook via the API rather than through Zapier (saving me $20/mo in the process). It has really turbo-charged my ability to get shit done around here and has me thinking about all sorts of possibilities.
I keep using the word “we” here because coding with ChatGPT — and this is where it starts to feel weird in an uncanny valley sort of way — feels like a genuine creative collaboration. It feels like there is a “someone” on the other side of that chat, a something that’s really capable but also needs a lot of hand-holding. Just. Like. Me. There’s a back and forth. We both screw up and take turns correcting each other’s mistakes. I ask it please and tell it thank you. ChatGPT lies to me; I gently and non-judgmentally guide it in a more constructive direction (as you would with a toddler). It is the fucking craziest weirdest thing and I don’t really know how to think about it.
There have only been a few occasions in my life when I’ve used or seen some new technology that felt like magic. The first time I wrote & ran a simple BASIC program on a computer. The first time I used the web. The first time using a laptop with wifi. The first time using an iPhone. Programming with ChatGPT over the past few weeks has felt like magic in the same way. While working on these projects with ChatGPT, I can’t wait to get out of bed in the morning to pick up where we left off last night (likely too late last night), a feeling I honestly have not consistently felt about work in a long time. I feel giddy. I feel POWERFUL.
That powerful feeling makes me uneasy. We shouldn’t feel so suddenly powerful without pausing to interrogate where that power comes from, who ultimately wields it, and who it will benefit and harm. The issues around these tools are complex & far-reaching and I’m still struggling to figure out what to think about it all. I’m persuaded by arguments that these tools offer an almost unprecedented opportunity for “helping humans be creative and express themselves” and that machine/human collaboration can deepen our understanding and appreciation of the world around us (as has happened with chess and go). I’m also persuaded by Ted Chiang’s assertion that our fears of AI are actually about capitalism — and we’ve got a lot to fear from capitalism when it comes to these tools, particularly given the present dysfunction of US politics. There is just so much potential power here and many people out there don’t feel uneasy about wielding it — and they will do what they want without regard for the rest of us. That’s pretty scary.
Powerful, weird, scary, uncanny, giddy — how the hell do we collectively navigate all that?
(Note: ChatGPT didn’t write any of this, nor has it written anything else on kottke.org. I used it once while writing a post a few weeks ago, basically as a smart thesaurus to suggest adjectives related to a topic. I’ll let you know if/when that changes — I expect it will not for quite some time, if ever. Even in the age of Ikea, there are still plenty of handcrafted furniture makers around and in the same way, I suspect the future availability of cheap good-enough AI writing/curation will likely increase the demand and value for human-produced goods.)
In a piece about how the pace of improvement in the current crop of AI products is vastly outstripping the ability of society to react/respond to it, Ezra Klein uses this cracker of a phrase/concept: “the difficulty of living in exponential time”.
I find myself thinking back to the early days of Covid. There were weeks when it was clear that lockdowns were coming, that the world was tilting into crisis, and yet normalcy reigned, and you sounded like a loon telling your family to stock up on toilet paper. There was the difficulty of living in exponential time, the impossible task of speeding policy and social change to match the rate of viral replication. I suspect that some of the political and social damage we still carry from the pandemic reflects that impossible acceleration. There is a natural pace to human deliberation. A lot breaks when we are denied the luxury of time.
But that is the kind of moment I believe we are in now. We do not have the luxury of moving this slowly in response, at least not if the technology is going to move this fast.
Covid, AI, and even climate change (e.g. the effects we are seeing after 250 years of escalating carbon emissions)…they are all moving too fast for society to make complete sense of them. And it’s causing problems and creating opportunities for schemers, connivers, and confidence tricksters to wreak havoc.
It’s capitalism that wants to reduce costs and reduce costs by laying people off. It’s not that like all technology suddenly becomes benign in this world. But it’s like, in a world where we have really strong social safety nets, then you could maybe actually evaluate sort of the pros and cons of technology as a technology, as opposed to seeing it through how capitalism is going to use it against us.
I agree with Ferguson that these AI image generators are, outside the capitalist context, useful and good for helping humans be creative and express themselves. Tools like Midjourney, DALL-E, and Stable Diffusion allow anyone to collaborate with every previous human artist that has ever existed, all at once. Like, just think about how powerful this is: normal people who have ideas but lack technical skills can now create imagery. Is it art? Perhaps not in most cases, but some of it will be. If the goal is to get more people to be able to more easily express and exercise their creativity, these image generators fulfill that in a big way. But that’s really scary — power always is.
In 2020, before the current crop of large language models (LLM) like ChatGPT and Bing, Emily Bender and Alexander Koller wrote a paper on their limitations called Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data. In the paper, Bender and Koller describe an “octopus test” as a way of thinking about what LLMs are capable of and what they aren’t. A recent profile of Bender by Elizabeth Weil for New York magazine (which is worth reading in its entirety) summarizes the octopus test thusly:
Say that A and B, both fluent speakers of English, are independently stranded on two uninhabited islands. They soon discover that previous visitors to these islands have left behind telegraphs and that they can communicate with each other via an underwater cable. A and B start happily typing messages to each other.
Meanwhile, O, a hyperintelligent deep-sea octopus who is unable to visit or observe the two islands, discovers a way to tap into the underwater cable and listen in on A and B’s conversations. O knows nothing about English initially but is very good at detecting statistical patterns. Over time, O learns to predict with great accuracy how B will respond to each of A’s utterances.
Soon, the octopus enters the conversation and starts impersonating B and replying to A. This ruse works for a while, and A believes that O communicates as both she and B do — with meaning and intent. Then one day A calls out: “I’m being attacked by an angry bear. Help me figure out how to defend myself. I’ve got some sticks.” The octopus, impersonating B, fails to help. How could it succeed? The octopus has no referents, no idea what bears or sticks are. No way to give relevant instructions, like to go grab some coconuts and rope and build a catapult. A is in trouble and feels duped. The octopus is exposed as a fraud.
The paper’s official title is “Climbing Towards NLU: On Meaning, Form, and Understanding in the Age of Data.” NLU stands for “natural-language understanding.” How should we interpret the natural-sounding (i.e., humanlike) words that come out of LLMs? The models are built on statistics. They work by looking for patterns in huge troves of text and then using those patterns to guess what the next word in a string of words should be. They’re great at mimicry and bad at facts. Why? LLMs, like the octopus, have no access to real-world, embodied referents. This makes LLMs beguiling, amoral, and the Platonic ideal of the bullshitter, as philosopher Harry Frankfurt, author of On Bullshit, defined the term. Bullshitters, Frankfurt argued, are worse than liars. They don’t care whether something is true or false. They care only about rhetorical power — if a listener or reader is persuaded.
The point here is to caution against treating these AIs as if they are people. Bing isn’t in love with anyone; it’s just free-associating from an (admittedly huge) part of the internet.
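To make the “statistical patterns” idea concrete, here’s a minimal sketch of next-word guessing — a bigram model doing, in miniature, what the octopus does. (The toy corpus and function names here are my own inventions, not anything from Bender & Koller’s paper.)

```python
from collections import Counter, defaultdict

# A toy corpus standing in for the "huge troves of text" an LLM trains on.
corpus = (
    "the octopus taps the cable and the octopus learns the patterns "
    "the islanders type messages and the octopus predicts the replies"
).split()

# Count which words follow each word, and how often.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def next_word(word):
    """Guess the statistically most likely next word -- pure pattern
    matching, with no idea what any of the words refer to."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(next_word("the"))  # → octopus ("octopus" follows "the" most often)
```

Scale the corpus up by a few hundred billion words and add a much fancier pattern-matcher, and the guesses get eerily fluent — but the model still has no more idea what a “bear” or a “stick” is than the octopus does.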
This isn’t an exact analogue, but I have a car that can drive itself under certain circumstances (not Tesla’s FSD) and when I turn self-drive on, it feels like I’m giving control of my car to a very precocious 4-year-old. Most of the time, this incredible child pilots the car really well, better than I can really — it keeps speed, lane positioning, and distance to forward traffic very precisely — so much so that you want to trust it as you would a licensed adult driver. But when it actually has to do something that requires making a tough decision or thinking, it will either give up control or do something stupid or dangerous. You can’t ever forget the self-driver is like a 4-year-old kid mimicking the act of driving and isn’t capable of thinking like a human when it needs to. You forget that and you can die. (This has the odd and (IMO) under-appreciated effect, when self-drive is engaged, of shifting your role from operator of the car to babysitting the operator of the car. Doing a thing and watching something else do a thing so you can take over when they screw up are two very different things and I think that until more people realize that, it’s going to keep causing unnecessary accidents.)
What I’ve described sounds a lot like ChatGPT, or most any other large-language model. Think of ChatGPT as a blurry jpeg of all the text on the Web. It retains much of the information on the Web, in the same way that a jpeg retains much of the information of a higher-resolution image, but, if you’re looking for an exact sequence of bits, you won’t find it; all you will ever get is an approximation. But, because the approximation is presented in the form of grammatical text, which ChatGPT excels at creating, it’s usually acceptable. You’re still looking at a blurry jpeg, but the blurriness occurs in a way that doesn’t make the picture as a whole look less sharp.
Reframing the technology in that way turns out to be useful in thinking through some of its possibilities and limitations:
There is very little information available about OpenAI’s forthcoming successor to ChatGPT, GPT-4. But I’m going to make a prediction: when assembling the vast amount of text used to train GPT-4, the people at OpenAI will have made every effort to exclude material generated by ChatGPT or any other large-language model. If this turns out to be the case, it will serve as unintentional confirmation that the analogy between large-language models and lossy compression is useful. Repeatedly resaving a jpeg creates more compression artifacts, because more information is lost every time. It’s the digital equivalent of repeatedly making photocopies of photocopies in the old days. The image quality only gets worse.
Indeed, a useful criterion for gauging a large-language model’s quality might be the willingness of a company to use the text that it generates as training material for a new model. If the output of ChatGPT isn’t good enough for GPT-4, we might take that as an indicator that it’s not good enough for us, either.
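Chiang’s photocopy-of-a-photocopy point is easy to simulate. In this toy sketch (entirely my own — repeated smoothing stands in for jpeg recompression), each lossy pass discards a little more detail, so every “resave” drifts further from the original:

```python
# A jagged "original image": values jump around a lot.
signal = [float((19 * i) % 100) for i in range(200)]

def lossy_pass(samples):
    """A crude lossy step: replace each sample with the average of itself
    and its neighbor (wrapping around), blurring away fine detail."""
    return [(a + b) / 2 for a, b in zip(samples, samples[1:] + samples[:1])]

def error(a, b):
    """Mean absolute difference between two signals."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

generation = signal
errors = []
for _ in range(5):  # "resave" five times
    generation = lossy_pass(generation)
    errors.append(error(signal, generation))

# Detail blurred away in one pass is never recovered by the next,
# so the error relative to the original grows with each generation.
print([round(e, 1) for e in errors])
```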
I tend to think that most fears about A.I. are best understood as fears about capitalism. And I think that this is actually true of most fears of technology, too. Most of our fears or anxieties about technology are best understood as fears or anxiety about how capitalism will use technology against us. And technology and capitalism have been so closely intertwined that it’s hard to distinguish the two.
Let’s think about it this way. How much would we fear any technology, whether A.I. or some other technology, how much would you fear it if we lived in a world that was a lot like Denmark or if the entire world was run sort of on the principles of one of the Scandinavian countries? There’s universal health care. Everyone has child care, free college maybe. And maybe there’s some version of universal basic income there.
Now if the entire world operates according to — is run on those principles, how much do you worry about a new technology then? I think much, much less than we do now.
Just about everything on the web is on TikTok, and going viral there too, so it shouldn’t be a surprise that people who’ve been laid off are there too, trying to figure out what it all means.
Part of me is cynical about this. You mean that as people, we’re so poorly defined without our jobs that our only recourse is to grind out some content about it? But on the other side of the coin, making content is what human beings do. Other animals use tools, but do they make content? Apart from some birds, probably not.
My favorite TikTok layoff video is by Atif Memon, a cloud engineer who offers a clear-eyed appraisal of her situation:
“At the company offsite, we celebrated our company tripling its revenue in a year. A month later, we are so poor! Who robbed us?”
“Even if ChatGPT can take away our jobs, they’ll have to get in line behind geopolitics and pandemic and shareholders and investors. I lost my job because the investors of the company were not sure it will become 400x in the coming year. ‘How will we go to Mars?’ Someone else lost their job because the investors thought ‘Hmm, if this other company can lay off 12k people and still work as usual, shouldn’t we also try?’”
“Artificial intelligence can never overtake human paranoia and human curiosity. AI can only do what human beings have been doing. Only humans can do what no human has done before.”
A lot to chew on in four minutes.
Update: Apparently this is not native to TikTok, but was posted to YouTube by a comedian, Aiyyo Shraddha. It really is a perfect TikTok story! The TikTok video is a ripoff of her original.
Google Research has released a new generative AI tool called MusicLM. MusicLM can generate new musical compositions from text prompts, either describing the music to be played (e.g., "The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds, like cymbal crashes or drum rolls") or more emotional and evocative ("Made early in his career, Matisse's Dance, 1910, shows a group of red dancers caught in a collective moment of innocent freedom and joy, holding hands as they whirl around in space. Simple and direct, the painting speaks volumes about our deep-rooted, primal human desire for connection, movement, rhythm and music").
As the last example suggests, since music can be generated from just about any text, anything that can be translated/captioned/captured in text, from poetry to paintings, can be turned into music.
It may seem strange that so many AI tools are coming to fruition in public all at once, but at Ars Technica, investor Haomiao Huang argues that once the basic AI toolkit reached a certain level of sophistication, a confluence of new products taking advantage of those research breakthroughs was inevitable:
To sum up, the breakthrough with generative image models is a combination of two AI advances. First, there's deep learning's ability to learn a "language" for representing images via latent representations. Second, models can use the "translation" ability of transformers via a foundation model to shift between the world of text and the world of images (via that latent representation).
This is a powerful technique that goes far beyond images. As long as there's a way to represent something with a structure that looks a bit like a language, together with the data sets to train on, transformers can learn the rules and then translate between languages. Github's Copilot has learned to translate between English and various programming languages, and Google's Alphafold can translate between the language of DNA and protein sequences. Other companies and researchers are working on things like training AIs to generate automations to do simple tasks on a computer, like creating a spreadsheet. Each of these is just an ordered sequence.
The other thing that’s different about the new wave of AI advances, Huang says, is that they’re not especially dependent on huge computing power at the edge. So AI is rapidly becoming much more ubiquitous than it’s been… even if MusicLM’s sample set of tunes still crashes my web browser.
Neural Radiance Fields (NeRF) is a relatively new technique that generates well-lit, complex 3D views from 2D images. If you’ve seen behind-the-scenes looks at how image/motion capture is traditionally done, you know how time-consuming and resource-intensive it can be. As this video from Corridor Crew shows, NeRF changes the image capture game significantly. The ease with which they play around with the technology to produce professional-looking effects in very little time is pretty mind-blowing. (via waxy)
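For the curious, the core trick behind NeRF's 2D-to-3D magic is volume rendering along camera rays. The sketch below shows just that compositing step, with made-up sample densities standing in for what the actual method would query from a trained neural network; it is an illustration of the idea, not the real pipeline.

```python
import math

def render_weights(densities, deltas):
    """Alpha-composite samples along a camera ray.

    densities: volume density at each sample point along the ray
    deltas: distance between consecutive samples
    Returns each sample's contribution weight to the final pixel
    (weights sum to at most 1).
    """
    weights = []
    transmittance = 1.0  # fraction of light still unoccluded so far
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha  # light absorbed before later samples
    return weights

# A ray passing through empty space, then hitting a dense surface:
w = render_weights([0.0, 0.0, 5.0, 5.0], [0.5, 0.5, 0.5, 0.5])
# nearly all of the pixel's color comes from the first dense sample
```

Because this weighting is differentiable, the network's predicted densities can be trained directly against ordinary 2D photos — which is why NeRF needs nothing fancier than images as input.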
I’m still trying to wrap my mind around it all. There seems to be a correlation between how Alejandro’s work was absorbed and referred to by subsequent filmmakers and how his work was ingested and metabolized by computer programming. But these two things are not the same. I want to say that influence is not the same thing as algorithm. But looking at these images, how can I be sure?
It’s hard to find many shortcomings in the software. It can’t render text. And like many painters and sculptors throughout history, it has trouble getting hands right. I’m nitpicking here. The model contains multitudes. It has scanned the collected works of thousands upon thousands of photographers, painters and cinematographers. It has a deep library of styles and a facility with all kinds of image-making techniques at its digital fingertips. The technology is jaw-dropping. And it concerns me greatly.
Using AI image processing software, Hidreley Diao creates photorealistic portraits of familiar cartoon characters. The one of Moe from The Simpsons is kind of amazing — he’s got the look of a long-time character actor who’s developed so much depth over the years that he starts getting bigger roles and everyone’s like, this guy is actually kind of enigmatic and attractive and fantastic.
With nearly instant reaction times, superhuman button tapping frequency, and an inability to fatigue, an AI called StackRabbit can play Tetris better than any human player. But how much better? Well, it can play all the way to the end of the game, which…did you know Tetris ended? I didn’t. But before that happens, it plays flawlessly through hundreds of levels while the game itself is throwing up weirdo color schemes and scores from random places in its memory — the game’s creators didn’t imagine anyone or anything would get anywhere close to these levels. Also, I got surprisingly anxious watching this — it was just so fast with so much constant peril! (via waxy)
There is a moment at the end of the film’s second act when the artist David Choe, a friend of Bourdain’s, is reading aloud an e-mail Bourdain had sent him: “Dude, this is a crazy thing to ask, but I’m curious,” Choe begins reading, and then the voice fades into Bourdain’s own: “…and my life is sort of shit now. You are successful, and I am successful, and I’m wondering: Are you happy?” I asked Neville how on earth he’d found an audio recording of Bourdain reading his own e-mail. Throughout the film, Neville and his team used stitched-together clips of Bourdain’s narration pulled from TV, radio, podcasts, and audiobooks. “But there were three quotes there I wanted his voice for that there were no recordings of,” Neville explained. So he got in touch with a software company, gave it about a dozen hours of recordings, and, he said, “I created an A.I. model of his voice.” In a world of computer simulations and deepfakes, a dead man’s voice speaking his own words of despair is hardly the most dystopian application of the technology. But the seamlessness of the effect is eerie. “If you watch the film, other than that line you mentioned, you probably don’t know what the other lines are that were spoken by the A.I., and you’re not going to know,” Neville said. “We can have a documentary-ethics panel about it later.”
Per this GQ story, Neville got permission from Bourdain’s estate:
We fed more than ten hours of Tony’s voice into an AI model. The bigger the quantity, the better the result. We worked with four companies before settling on the best. We also had to figure out the best tone of Tony’s voice: His speaking voice versus his “narrator” voice, which itself changed dramatically over the years. The narrator voice got very performative and sing-songy in the No Reservations years. I checked, you know, with his widow and his literary executor, just to make sure people were cool with that. And they were like, Tony would have been cool with that. I wasn’t putting words into his mouth. I was just trying to make them come alive.
As a post hoc ethics panel of one, I’m gonna say this doesn’t appeal to me, but I bet this sort of thing becomes common practice in the years to come, much like Errol Morris’s use of reenactment in The Thin Blue Line. A longer and more nuanced treatment of the issue can be found in Justin Hendrix’s interview of Sam Gregory, who is an “expert on synthetic media and ethics”.
There’s a set of norms that people are grappling with in regard to this statement from the director of the Bourdain documentary. They’re asking questions around consent, right? Who consents to someone taking your voice and using it? In this case, the voiceover of a private email. And what if that was something that, if the person was alive, they might not have wanted. You’ve seen that commentary online, and people saying, “This is the last thing Anthony Bourdain would have wanted for someone to do this with his voice.” So the consent issue is one of the things that is bubbling here. The second is a disclosure issue, which is, when do you know that something’s been manipulated? And again, here in this example, the director is saying, I didn’t tell people that I had created this voice saying the words and I perhaps would have not told people unless it had come up in the interview. So these are bubbling away here, these issues of consent and disclosure.