
This is an audio transcript of the Tech Tonic podcast: ‘Can AI help us speak to animals? Part one’

John Thornhill
Listen (chirping birds). What can you hear? Breathe in and listen more closely. If we listen closely to the sounds of the natural world, we can hear a lot more than we first realise. But human hearing is limited, and outside the range of our ears, the world can be a noisy place. If we could expand our hearing into the lower ranges, to what's called infrasound, we might hear icebergs splitting halfway across the world, even the rhythmic pulsing of the Earth's crust as waves crash across its continental shelves. And here's something else you'd pick up below the human hearing range: the sound of species communicating — elephants, tigers and peacocks interacting in the infrasound. Then in the ultrasound, there's chatter on coral reefs. Corn plants are clicking (clicking), and mice and beetles (animal sounds) are emitting sound waves at frequencies too high for the human ear. Well, as microphones have got better, this world of sound is opening up to us. But we're not just hearing things we could never hear before. Scientists are also using the latest AI to process and make sense of the sounds of the animal and plant world. And some now believe we could one day understand what they're saying, that in fact, we might be on the brink of a Google Translate for the non-human world.

[MUSIC PLAYING] 

This is Tech Tonic from the Financial Times. I’m John Thornhill, the FT’s innovation editor. Generative AI like ChatGPT has been opening up all kinds of possibilities. It can compose music in the style of Mozart. It can copy our voices, write essays and simple stories. And it can instantly translate human languages. So could this same technology also enable us to speak to animals? Could it teach us one day to speak whale? In this episode, I’ll speak to the people trying to make that a reality.

(Waves crashing)

Shane Gero
I’ve known these whales for longer than all of my kids. So when I say I’m at home in Ottawa with my human family, it’s to distinguish the fact that, you know, I spend a lot of time off Dominica with the whale families.

John Thornhill
That's Shane Gero. He's a marine biologist and the founder of a sperm whale project off the coast of Dominica, a small island in the Caribbean. He's so familiar with the whale families there that he's given them names based on the way they look, names like Pinchy, Fingers, Drop, Can Opener and Double Bend.

Shane Gero
The sperm whale is this amazing, awe-inspiring animal of extremes. It has the largest brain that has maybe existed ever in the universe, but also sperm whales have been a part of human life for a really long time as well. You know, we hunted them and used their oil to power our economy before fossil fuels did. You know, they're the biblical Leviathan. They're Melville's Moby-Dick. So these animals have sort of been awe-inspiring and a part of human culture for a long, long time.

John Thornhill
And when was the first one you saw in real life?

Shane Gero
The first sperm whale I saw in real life? It would have been late 2003, sort of early 2004. So at that point, I'm 23 or 24 years old. But that was really my first opportunity to do field work on the high seas on a sailboat. That sort of stereotypical picture that we have of whale biologists, you know, following whales around on a boat with a team of researchers.

John Thornhill
Shane studies all different aspects of sperm whale behaviour. But one thing he has spent his career trying to decipher is their communication. Now, when we think about the sounds that whales make (humpback's warbling sound), we might think of the low warbling sounds of whale song, like this humpback recording. But actually the sounds sperm whales make are completely different (sperm whale's clicking sound). The sperm whale makes this clicking sound by pushing air through the space in its big hollow head. It makes these sounds for two reasons. The first is for echolocation, finding its way around the depths of the ocean. And the second is for communication. Shane spends a lot of time listening to the families he studies at the surface of the water. He goes out on a boat and throws a hydrophone, an underwater microphone, into the water to get recordings of these patterns of clicks, known as codas, that the whales use to communicate.

Shane Gero
Sperm whale codas are very much more like hands clapping. And those families talk to each other in this sort of Morse code-like sequence of clicks with specific rhythm and tempo.

John Thornhill
Researchers have already learned a lot about some of the information that these codas contain, like the fact that sperm whales even have dialects.

Shane Gero
So animals that spoke the same patterns would spend time together and animals that spoke different codas would not. And so there’s really this social divide between what became called vocal clans that these animals seem to label by using specific codas.

John Thornhill
Off the coast of Dominica, the Caribbean island where Shane works, there are two main distinct sperm whale clans. There’s what researchers call the Eastern Caribbean clan.

Shane Gero
So that's the set of families that makes this identity coda, which is called the 1+1+3 (clicking sound). And that's unique to the Caribbean. And it actually takes the calves about two or three years to learn to make it right. They actually babble and make a bunch of errors before learning to make these patterns in a specific way. But there are also other families that live in the Caribbean and that pass through Dominica's waters but are less common. And we tend to find them more off the southern islands like Martinique and Saint Lucia. And we call that clan the EC2, and they'll never make that 1+1+3 coda, but instead they make a different, longer, slower coda. We call it the 5R because it has five regularly spaced clicks (clicking sound). And that allows us to very reliably, even while listening to the whales in the wild, figure out which clan that family belongs to simply by hearing one or the other of those two codas. But those samples of one of their coda types don't really give you a good picture of what sperm whale conversations sound like (faster, clicking sound following a rhythm).

John Thornhill
Shane recorded these off the coast of Dominica.

Shane Gero
When families come together and socialise where their bodies are touching and they’re rolling around and they often will sort of open their mouths and run their teeth along each other’s bodies, you know, they make lots and lots of codas. You know, it’s not rude as a sperm whale to talk at the same time. In fact, we think, you know, the ability to know which coda you’re going to say next, and for me to say it at the same time, is one way that they sort of reinforce their social bonds.

John Thornhill
But even though Shane is sure that the sperm whales are communicating with each other, he can’t make sense of what it is that they are saying.

Shane Gero
I remember specifically this story from the early years of the Dominica Sperm Whale Project where we were sitting there on a very small boat with a simple hydrophone thrown overboard a few miles offshore of Dominica, and there were two young calves from Unit D — Double Bend and Drop — and they were talking and talking and talking for what must be, you know, 40 minutes, in the same way that, you know, maybe at a family dinner the two little kids run off into their bedroom and mess about. You know, all of the females were down feeding and here were two, you know, beings having fun with one another. And that's one of those moments that really sort of propelled me to say, OK, well, we need to figure out: what is so important to them? What do they need to share with one another by making these sounds?

John Thornhill
So here's the challenge: Shane is trying to figure out what the whales are saying, but that's much easier said than done. How can you understand what a creature might be saying when the context of its life is so different to that of humans? Well, this is where the AI comes in. (Bustling room) It's 9am on a weekday morning and about 40 well-dressed people are helping themselves to a vegan breakfast bar: chopped figs, mini blueberry pancakes, fruit compote and croissants. Everyone's here to hear Aza Raskin talk about how he's using artificial intelligence. (Applause) Aza is a small, softly spoken man from San Francisco. He's a tech guy through and through. In fact, his dad is actually one of the initiators of the Mac computer. Aza's latest venture is called Earth Species Project. Its aim is to use artificial intelligence to understand what animals are saying.

Aza Raskin
So Earth Species Project, quite simply, our goal is to talk to animals. Actually, I’ll put it a different way, it’s to learn how to listen to animals.

John Thornhill
The day after the talk, Aza dropped by the FT studio in London to tell me how the latest evolution of AI could help him achieve that. Great. Persis, you happy with the sound volumes and all that? Wonderful. So welcome. Do you pronounce Ahza or Ayza?

Aza Raskin
Ayza. Yeah. Right.

John Thornhill
Aza got the idea for Earth Species Project when he was listening to the radio. A scientist came on. She had been researching Gelada monkeys in Ethiopia.

Aza Raskin
And they have, according to her, one of the richest vocabularies of any primate except, of course, humans. But she was out there with a hand recorder and transcribing what they’re saying. And the thought was, could we use machine learning and microphone arrays to understand a language we’ve never understood before?

John Thornhill
Well, actually, machine learning is now at a stage where people think it could achieve this. And the secret ingredient? It's something called a transformer. That's the T in ChatGPT. And Aza says transformers have found a hidden underlying structure that underpins all communication.

Aza Raskin
So imagine English as a galaxy where every star is a word. Stars that have similar meanings, words that mean similar things, are near each other. And then words that share a semantic relationship share a geometric relationship. So king is to man as queen is to woman. So king is the same distance and direction from man in this shape as queen is from woman. So you just take king minus man, that gives you the distance and direction that represents regalityness. You add that to boy and it will equal little prince. You add that to girl, it'll equal princess.
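To make the arithmetic Aza is describing concrete, here is a minimal sketch in Python. The four-dimensional vectors below are toy numbers invented for illustration, not real learned embeddings, and nothing here comes from Earth Species Project's own tools; in practice the vectors would be learned from large text corpora by a model such as word2vec or a transformer.

```python
# Toy illustration of "king minus man plus woman is near queen".
# The numbers are made up for this sketch; real embeddings are learned.
import numpy as np

embeddings = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.1, 0.8, 0.0]),
    "man":   np.array([0.1, 0.9, 0.1, 0.2]),
    "woman": np.array([0.1, 0.1, 0.9, 0.2]),
}

def cosine(a, b):
    # Similarity of direction between two vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "king minus man" isolates a direction (the "regality" Aza describes);
# adding it to "woman" should land near "queen".
target = embeddings["king"] - embeddings["man"] + embeddings["woman"]
nearest = max(embeddings, key=lambda w: cosine(embeddings[w], target))
print(nearest)  # -> "queen" with these toy vectors
```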

John Thornhill
These relationships go on and on until a whole language, English in this case, is represented in a multi-dimensional shape. And then if you take another of these shapes and map it on to the first, then you can start translating.

Aza Raskin
You build a different shape that represents a different language, say, German, and even though they have different cosmologies and they have different ways of seeing the world and verbs are gendered, you can still rotate one shape on top of the other and blur your eyes. And while there are words in one that don't appear in the other, the overall shapes are the same. And the point that means dog ends up in the same spot in both. And that works not just in English and German, but Aramaic and Urdu and Finnish, which is a weird language, Turkish. Like every known human language seems to fit in kind of this universal human meaning shape.
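The "rotate one shape on top of the other" step has a textbook form: find the rotation that best maps one set of embedding vectors onto another, known as the orthogonal Procrustes problem. The sketch below uses synthetic vectors and a known rotation purely to show the mechanics; real cross-lingual alignment learns the mapping from data, sometimes without any bilingual dictionary, but the geometric idea is the same, and nothing here is Earth Species Project's actual pipeline.

```python
# Align two embedding "shapes" with a rotation (orthogonal Procrustes).
# X and Y stand in for embeddings of the same anchor concepts in two
# languages; Y is built from X with a known rotation so recovery can be checked.
import numpy as np

rng = np.random.default_rng(0)
dim, n_anchors = 50, 200
X = rng.normal(size=(n_anchors, dim))                 # "English" vectors (toy)
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
Y = X @ true_rotation                                  # "German" vectors (toy)

# The orthogonal W minimising ||XW - Y|| is U @ Vt from the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

print(np.allclose(X @ W, Y))  # True: the two shapes line up after rotation
```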

John Thornhill
This is the breakthrough technology that is powering the latest AI revolution, the same technology that ChatGPT is harnessing. And it's not just languages that transformers construct as these geometric shapes. It's everything. In fact, the key thing to understand about transformers, Aza says, is that they treat everything as a language. That's to say, they quantify everything, like images, movement and DNA, in terms of the relationships between its composite parts.

Aza Raskin
And what these technologies let you do is translate to and from any of these different languages. So let's take human faces: if you have a data set of human faces and you build a shape out of it, now imagine a galaxy where every star is a human face. So if you take a face and a smiling version of that face, that gives you a distance and direction: you subtract and you get the relationship of smilingness. You add that distance and direction, that vector, to any other face, and it'll turn it into a smiling version of the face. There's a direction which is oldness, a direction which is youngness. Every internal relationship is represented in these shapes.
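The same arithmetic carries over to images. The sketch below works in the latent space of a hypothetical face model: the latent vectors are random stand-ins (a real encoder and decoder are assumed but not shown), while the averaging step is a common way such attribute directions are estimated. It is an illustration, not any specific system.

```python
# Find a "smilingness" direction in a hypothetical face model's latent space.
# The latents are random stand-ins; in a real system they would come from
# encoding photos of smiling and neutral faces.
import numpy as np

rng = np.random.default_rng(1)
dim = 64
smile_axis = np.zeros(dim)
smile_axis[0] = 3.0  # toy assumption: "smiling" shifts the first coordinate

neutral_latents = rng.normal(size=(200, dim))
smiling_latents = rng.normal(size=(200, dim)) + smile_axis

# The mean difference between the two groups isolates the attribute direction.
smile_direction = smiling_latents.mean(axis=0) - neutral_latents.mean(axis=0)

# Adding that direction to any latent should make the decoded face smile
# (the decoder is assumed, not shown).
edited = neutral_latents[0] + smile_direction
print(round(edited[0] - neutral_latents[0][0], 2))  # close to 3.0: moved along the smile axis
```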

John Thornhill
This is what allows us to do amazing things like turn text into image. It's the reason why you can now generate an image of a skyscraper in the style of Picasso within seconds, or that meme that went viral earlier this year of Pope Francis in a white Balenciaga puffer jacket. It's all possible because the images have been transcribed by transformers as these geometric galaxies, which can then be rotated to align and overlap with the galaxies of human textual language.

Aza Raskin
And that was the core insight, the inspiration for starting Earth Species: if that's true for human languages, might it be true for animal communication? And before your listener says, well, maybe human languages align because we all share the same physiology, we have the same brain. Roughly speaking, we have the same sensorium: eyes, ears. So that's why they align. Well, this is why I gave the example of images of human faces. It turns out you can align not just languages, but pretty much anything.

John Thornhill
We're having a separate debate about the dangers of us anthropomorphising machines and imagining machine intelligence is the same as human intelligence. Is there a danger that we do that with animals, and that in fact human communication just works in a completely different way to animal communication?

Aza Raskin
Well, this is one of, I think, the most exciting things: by being able to look at patterns at scale and sort of see the rich complexity, we can take off the human glasses and just ask what's there, rather than assuming that whatever they communicate in looks like a human language. Like, I would expect that there are some concepts which are conserved. You know, dolphins and elephants look in a mirror and recognise themselves in that mirror. So if they're communicating, maybe they're communicating about a rich sense of interiority. Also, lemurs, lemurs get high. They will take bites of centipedes and enter a trance-like state. Dolphins will intentionally get puffer fish to inflate and get high off of their venom and pass it around, that old puff-puff-pass. So transcendent states of consciousness also seem to be a thing that's shared. So if we communicate about it, maybe there's enough shared structure that we can translate some of those things. But then there's a lot about the world of a chimpanzee or a whale that is so completely different to our world that maybe we can never directly translate it, but we will be able to see the richness and the complexity of the communication. And so maybe the translation ends up not looking like a Dr Dolittle or Google Translate where you get specific words, but maybe it ends up as flashes of colour and some sound and you get a sort of a felt sense of what maybe they mean.

John Thornhill
So does this mean that we will one day be able to understand dogs and cats? Well, Aza says it’s not enough to just record their vocalisations.

Aza Raskin
Anyone that has a dog or a cat knows a lot of communication might not be verbal. It might be in body pose or the way the tail flicks. So it's not enough to just do audio. You want to do audio, body pose, video, the whole suite eventually. It would be amazing to get to smell, though I think that's a little harder. It's the ability to fluently translate between any modality that gives us hope that this is possible.

John Thornhill
So could we start looking to a future where we can communicate with our pets, translating their tail wags and scents into English in order to understand whether they really want to go out and take that walk in the rain? Well, let's not get ahead of ourselves. This area of research is only just starting out, but it is happening. Take Shane Gero, for example, the sperm whale researcher we spoke to earlier. After two decades of working with whales off the coast of Dominica, he has helped to set up a new organisation that is using AI to understand sperm whalish. It's called Project Ceti, with a C. Shane hopes that the AI will help him spot communication patterns where humans haven't been able to.

Shane Gero
(Sperm whale sound) Have we just been looking at letters when really there are words, or have we just been looking at words when really there are letters and sentences at different levels of the hierarchy? That's something that's always been a challenge through the sort of standard statistical biological approach.

John Thornhill
But in order for these algorithms to work well, they need more data, a lot more. The thing with humans is that we generate our own data. Every day we are writing stuff down, uploading pictures and collectively contributing to the mass of data available online. Sperm whales, on the other hand, are not. And if Shane has just been gathering his data in the part of the world where his boat is, he might only be skimming the surface of what the whales are actually saying.

Shane Gero
I call it the dentist's office problem. You know, if you only study English-speaking society and you're only recording in a dentist's office, you're going to think the words root canal and cavity are, like, critically important to English-speaking culture, right? When really it's just a really biased way to collect the English recordings that you have.

John Thornhill
Project Ceti is trying to get rid of that bias and change the scale and the method of data collection. The Dominica Sperm Whale Project is getting a new set-up that aims to record sperm whale codas 24/7, 365 days a year. This will give the researchers an insight into what is happening outside of that dentist's waiting room.

Shane Gero
And that’s where we start, we start being able to see a much clearer picture than just what’s happening at the surface.

John Thornhill
What happens when all this data has been processed and we know a lot more? Does Shane think there could be enough crossover between the scientists and the whales for them to interact one day?

Shane Gero
I mean, most of the time we're trying to let the whales be whales and do whale things. At least right now we're not looking for interactive, acoustic or behavioural moments at all. And sometimes that happens. I mean, there's no doubt in my mind that they know the research boat, and there are these amazing moments where one in particular, this animal, Can Opener, she recognised the system that the research boat has, which is: when they come to the surface, we get behind them and we start recording, and we take pictures when she flukes, and then we move up to that soft circle of water that's left behind when a whale dives, which we call a fluke print, and then we collect skin and faecal matter and make more recordings. And we do that over and over and over, every time Can Opener would surface or any time any whale would surface. But she sort of figured it out and realised that it was a repeating pattern and kind of predicted the future, because she would start faking dives so that when we would come to the fluke print to record, she would come up to the surface and look at us. And by look at us, I don't mean use her sound to echolocate on us to look at the hull of the boat that's underwater. She would come to the surface and roll her eye out of the water and look at the people on the boat and follow the people as we, you know, giddily, as a bunch of biologists would, run up and down the boat looking at her.

John Thornhill
Is this an example of human to animal communication, the whales looking at the scientists and the scientists looking at the whales? Maybe not quite, but perhaps it shows an overlap in consciousness between us land creatures and these great sea creatures. Within the ocean there are already examples of interspecies communication happening. Off the coast of Norway, there's a group of killer whales and dolphins who come together and hunt in a pod. Then on land, humans and birds work together to find honey. In Niassa park in Mozambique, some of the population make their living collecting honey. These honey hunters have a specific call they use to attract a bird known as a honeyguide (sound of a person calling a bird). When the people make the call, the honeyguide comes to them and then flies from tree to tree, calling to the humans and directing them towards the bees' nest. The humans harvest the honey, and the birds eat the wax. It's a win-win. So if humans and birds can work together and orcas can talk to dolphins, why can't we talk to whales? Well, Aza thinks that, with the help of AI, that should be possible.

Aza Raskin
So something really interesting has happened in the last couple of months, which is, you know, AI has progressed to where it just takes 3 seconds of your voice to continue speaking in your voice. Terrifying. Sort of rad, though. What this means is that we are going to be able to do this with animal communication in the next couple of years. In fact, we already have some tests where we can do this with chiffchaffs, which are a type of bird: we can put in 3 seconds of them vocalising and we can continue speaking.

John Thornhill
Aza is talking about this: (bird chirping). The first second of that clip is real, and the rest is AI generated. So somewhere in the third chirp, AI takes over and starts to speak chiffchaff.

Aza Raskin
Which means that, again in the next couple of years, we’re going to be able to build a synthetic whale or synthetic beluga or a synthetic bird that will be able to speak in a way that they cannot determine that it’s not one of their own speaking. Now, here’s the plot twist: we will be able to get, we believe, to two-way communication before we understand what we’re saying.

John Thornhill
This is why AI is so powerful and potentially dangerous. The world view of a whale may be so different from ours that while these computer programs that we have fed data could develop the capacity to synthesise whale voices and speak back to the animals, we humans might not even understand what is being said. Would this really be inter-species communication or is this animal to machine communication?

Aza Raskin
Our ability to understand is limited by our ability to perceive. What AI does is that it opens the aperture of what we, as a species, can perceive and hence extends what we can understand.

John Thornhill
I asked Shane what he thought of the idea of AI synthesised voices speaking to sperm whales.

Shane Gero
For me, the two-way conversation that may or may not be at the end of Project Ceti is kind of secondary. For me, this is a big listening project and trying to understand what’s important to the whales themselves. Because I think the things that we share in terms of the value of what’s important to human society and animal society probably speaks to the fundamentals of what really matters overall. You know, will we get to a point where we can interact back and forth? That’s an awesome question that I think we’ll find out as Ceti moves forward. You know, but I kind of like the larger question of, you know, whale life is different from human life. And there will undoubtedly be things that we don’t understand because we’re not whales. And how much of that sort of domain gap between the human experience and whale experience will prevent us from having meaningful conversations, right? It’s a great question, but we don’t know is the answer. And I think, you know, the fact that we’re launching this big initiative to listen to someone else living on our planet is kind of the bigger message here.

John Thornhill
So while AI may enable us to speak to animals, some researchers like Shane think the focus here should be on understanding them first. And Shane is not the only one with doubts about the prospect of speaking to animals. One of the leading voices in the field of bioacoustics is Karen Bakker. She's a professor at Canada's University of British Columbia and the author of a book called The Sounds of Life. It's about how digital technology is bringing us closer to the natural world. She, too, has doubts about how soon we'll be talking with animals via AI.

Karen Bakker
So I don’t think we should overestimate our capacity to have complex conversations in real time with other species mediated by digital tech. We may be able to do simple things like issue better alarm calls or better interpret the sounds of other species so that we can ensure less interference at certain critical moments like nesting or mating. But I don’t think we’re going to have a zoological version of Google Translate available, you know, in the next decade. I think that’s much further off.

John Thornhill
For the time being, at least then, our dream of conversing with animals may remain just that, a dream. But they say, don’t they, to be careful what you wish for. We may want to speak with animals, but who has the right to speak to them on behalf of humanity? Can we trust ourselves to do so responsibly? And what would we even say to them?

Yossi Yovel
The thing is, if animals don’t talk about these things themselves, there’s no way to talk with them about it. How can I talk with this creature about, I don’t know, peace in the Mediterranean? It’s impossible.

John Thornhill
Angry bats, distressed grapes and AI sceptics.

[MUSIC PLAYING]

John Thornhill
That’s coming up in episode two of this series of Tech Tonic. You’ve been listening to Tech Tonic from the Financial Times with me, John Thornhill. I’ve put some free links related to this episode in the show notes. Do check them out. And do leave us a review. It helps spread the word. This series of Tech Tonic is produced by Persis Love, with thanks to Edwin Lane and Josh Gabert-Doyon. Manuela Saragosa is the executive producer. Sound Design by Breen Turner and Samantha Giovinco. Original music by Metaphor Music, and the FT’s head of audio is Cheryl Brumley.

[MUSIC PLAYING]

Copyright The Financial Times Limited 2024. All rights reserved.