This is an audio transcript of the FT Weekend podcast episode: ‘AI hits the music industry’

Lilah Raptopoulos
Hi, listeners. A quick correction before we start the show. Last week we ran a rerun about women spies. And in the intro we said that Britain recently appointed one of the women in the story to director of GCHQ. That was wrong. Britain did recently appoint the first woman director of GCHQ, but it wasn’t one of the women interviewed for this story.

Also a note on this episode, which is about AI and music. We want to be clear that it does include material that’s been generated by FT journalists using AI tools. We’ve done it to illustrate just how much AI is influencing the music industry. OK, on with the show.

[MUSIC PLAYING]

The legendary musician Tom Waits has always been known for this growling voice and for songs that tell stories. He’s one of those musicians that other musicians idolise because he’s gritty and experimental and hard to pin down. To get a sense, here’s a beat from his song Downtown Train.

[‘DOWNTOWN TRAIN’ BY TOM WAITS PLAYING]

Recently our pop music critic here at the FT, Ludo Hunter-Tilney, did something very un-Tom Waits. He got him to sing Abba.

[AI-GENERATED ‘DANCING QUEEN’ IN THE STYLE OF TOM WAITS PLAYING]

Lilah Raptopoulos
You may be wondering, did Ludo call Tom Waits and convince him to sing Dancing Queen by Abba? He did not. What Ludo did was use an AI algorithm to create a voice that sounded like Tom Waits singing Abba.

Ludo Hunter-Tilney
I mean, it sounded wobbly at the beginning, but then suddenly the voice clicked. It clicked in as Abba really got into their stride. Waits and Abba, for a brief but deliriously exciting moment, as one in harmony.

Lilah Raptopoulos
AI technology is now sophisticated enough that you can use a series of different programs to make up an entire deepfake song. And that means that the music industry is actively trying to figure out what that means as we speak. Here are a few examples. Recently, a deepfake song came out that featured the voices of Drake and The Weeknd, and it started charting. And Universal Music filed a complaint and got it removed very swiftly from Spotify and Apple. Meanwhile, the musician Grimes is encouraging fans to make deepfake songs that use her voice if they split royalties with her. But legal enforcement is murky here across the board because there’s a funny legal loophole on voices.

Ludo Hunter-Tilney
Well, one thing I learned, interestingly, was that we can’t copyright the sound of our voices. We can copyright recordings. But we can’t copyright the sound of them. The sonic frequency at which you and I speak, or the frequencies at which we speak, we don’t own those. They’re not . . . we can’t own a sonic frequency. So I found that I was interested in the implications it has for the idea of ownership.

Lilah Raptopoulos
Today Ludo comes on the show to talk about how these songs are getting made and the implications. He also wrote a whole new song in the actual style of Tom Waits for an AI Tom Waits performance. We get to hear that too. But I am here to say it did not go great.

This is FT Weekend. I’m Lilah Raptopoulos.

[MUSIC PLAYING]

Hi, Ludo. Welcome back to the show.

Ludo Hunter-Tilney
Hello, Lilah. Thank you very much for having me on the show.

Lilah Raptopoulos
I’m so happy to have you here. So, Ludo, you have been playing with music generated by AI. And before we get into what you’ve been making, I have to tell you that late last night, I got a text from my producer, Katya, with a piece of audio that she made for you. And it’s amazing. (Chuckles)

Ludo Hunter-Tilney
OK, I look forward to this.

Lilah Raptopoulos
And we’re going to play it for you.

Clip of AI-generated Snoop Dogg voice clone
What if you could make Snoop Dogg say or sing anything you wanted? I’m doing it right now. I’m using an AI-generated version of Snoop Dogg to say that my colleague Ludo Hunter-Tilney is top dog. Snoop Dogg has never seen this. It’s an algorithm that has been trained to sound like Snoop. Snoop, if you’re out there, we’d love to have you on the show. This is FT Weekend. I’m Lilah Raptopoulos.

Ludo Hunter-Tilney
(Laughter) Brilliant. I love that. A shout-out from Snoop. From an AI Snoop.

Lilah Raptopoulos
We did our best.

Ludo Hunter-Tilney
Well, I thought that that Snoop, I mean, was sounding very convincing. I mean, perhaps because I was sort of so hoping that I would one day hear him say those words, say my name. (Laughter) For me, that sounded very Snoopy.

Lilah Raptopoulos
You know, there’s nothing that I wanted more than having Snoop Dogg pretend to be me and welcome you to the show.

Ludo Hunter-Tilney
So what we have there is the voice clone of Snoop Dogg. The technology to make this has been around for a bit, but this year it has become available to the masses. It’s just out there, it’s become more powerful, it’s become easier to use, and suddenly the world is rushing in to go and create their own celebrity voices.

Lilah Raptopoulos
Right.

Ludo Hunter-Tilney
There’s millions. There is one which I did like, which was Biggie Smalls and Tupac. They did a version of Kanye West and Jay-Z’s song, I won’t say it, “. . . in Paris”.

Lilah Raptopoulos
Right, right, right. (Chuckles) 

Ludo Hunter-Tilney
That was one. That was one. Which was, I enjoyed that.

Lilah Raptopoulos
They like sort of made peace together in a song. (Laughter)

[AI-GENERATED ‘NIGGAS IN PARIS’ IN THE STYLE OF BIGGIE SMALLS AND TUPAC PLAYING]

Lilah Raptopoulos
My producers got really into this and my producer Lulu found one of Kurt Cobain singing Vanessa Carlton’s A Thousand Miles.

[AI-GENERATED ‘A THOUSAND MILES’ IN THE STYLE OF KURT COBAIN PLAYING]

Ludo Hunter-Tilney
(Laughter) It’s a wormhole. It’s a wormhole. Once you get down to it. You’ll find yourself just like, sort of creating increasingly strange and bizarre sort of correspondences. It is. It’s a wormhole.

Lilah Raptopoulos
Totally. To understand how this all worked, Ludo decided to make an AI song in the style of Tom Waits. Not just have the voice of Tom Waits sing an Abba song, but to actually make a whole new song that sounded like something Tom Waits would put out using only AI. It became clear that you need three building blocks to create a new song. You need a voice, you need instrumentals and you need lyrics. So Ludo’s first step was to create the voice. That’s that voice you heard at the beginning. Ludo had to make what’s called a deepfake vocal clone of Tom Waits.

Can you tell me what exactly a deepfake vocal clone is?

Ludo Hunter-Tilney
Well, a deepfake vocal clone would . . . it would essentially be like Snoop at the beginning of this. It would be a voice which has been created through machine learning, into which a whole bunch of audio has been fed. So if we take this voice, my voice that is speaking right now: if I had actually been deepfaked, and you weren’t really talking to Ludo but to some other person who was using my voice, then they would have taken recordings of my voice and they would have fed all of these recordings into the computer, which then, by some process which I, alas, am ill-equipped to describe in tedious detail for all of our listeners . . . by that process, the computer would be able to go and train a voice. A voice which sounded like my voice, into which you could then go and input text. You could text in, “Hello, this is Ludo’s fake voice”. And that would then come out in the voice with which, as it were, I’m speaking. It would sound like that. That’s what a deepfake vocal clone is.

Lilah Raptopoulos
Right. There are databases full of deepfake vocal clones of famous people that are online. You can download them. There’s Taylor Swift and Jay-Z, Snoop Dogg, all the big names. Ludo tried that first, but the voice in the database for Tom Waits was not great. So he had to make his own. It turns out there are programs for that, too. So this is what he did. He took a Tom Waits album. He extracted the music so only the a cappella voice remained. And then he fed those a cappella voice files into a program that learned from it and then made a deepfake voice of Tom Waits for him.

Ludo Hunter-Tilney
And it then spends hours there, training. They literally call it training, like it is a sort of pet. They train it: it will go through those a cappella vocals, and from that it will create the voice.

Lilah Raptopoulos
Right. So it learns from his a cappella voice the way, I mean, they could take all of the FT Weekend podcasts and put them through a system and then create my version of . . . a version of my voice and then use it to . . .

Ludo Hunter-Tilney
It creates your . . . create your voice, Lilah. It creates your voice. Your voice could be created.

Lilah Raptopoulos
Yeah. This is my biggest fear.

Ludo Hunter-Tilney
All of your . . . everything you’ve said in all of these FT podcasts, you will be able to create a very, very realistic AI Lilah.

Lilah Raptopoulos
Let’s not give anyone any ideas. (Chuckles)

Ludo Hunter-Tilney
Someone could just like pop a few words into that, and that’s it. Very straightforward.

Lilah Raptopoulos
So Ludo had the Tom Waits voice. The second step was to create the actual music for his song. There are a ton of AI music generators on the internet. But Ludo used one called Boomy. It basically uses keywords to generate music. So you type in some descriptive words and some parameters like the tempo you want, and it creates a track for you that it thinks will fit.

Ludo Hunter-Tilney
The Boomy thing, you can go on to and you can make a song, which is what I did. It’s a text-based musical generator. You would write in, for instance, “rainy night”, and it creates a rainy night-style music. I think that’s the idea behind it. So you could go and write something like that. I don’t know. What I don’t know about it is, and I don’t think you could write in something as bold as Tom Waits style. I got a feeling that might be a bit of a whole copyright issue.

Lilah Raptopoulos
So Ludo fiddled with it and he made this track that sounded kind of jazzy with this twangy bass, and he was happy with it. Then finally, at step three to making his new fake Tom Waits song, Ludo needed to create the lyrics. For that he went to the omnipresent AI chatbot, the one that’s all over the news: ChatGPT.

Ludo Hunter-Tilney
ChatGPT, you can go on to and then type in, “Please . . .”, you know, “Please . . . ”

Lilah Raptopoulos
“Please . . .” You are British after all. 

Ludo Hunter-Tilney
The voice would be very polite with the AI. You got to be, to our future overlords, you got to be polite. They will remember this. (Laughter) They’ve got some sort of little thing that they’ll recall. So “Please”, I said, “ . . . ChatGPT, could you go and could you please go and do me a song in the style of Tom Waits?” Which it then did, and it spat out this song in sort of like real time, which in a sense it was very impressive, but the lyrics were just like . . . the lyrics, they did things like they rhymed and, you know, they were . . . it told a story of a sort. I was impressed by all of that. But the actual pastiche was just like, really, it was a proper pas . . . it was just like a pretty low-grade pastiche of Tom Waits.

Lilah Raptopoulos
ChatGPT named this Tom Waits song that it made up Gritty Troubadour’s Backstreet, and Ludo was suddenly faced with a new problem, which is how to put all these parts together. This is where the story starts to get stressful for him and funny for us because this song didn’t exist before. He couldn’t just feed the Tom Waits voice into a song like Abba’s Dancing Queen. There was no previous template for this song. He realised that he was going to have to do it himself.

Ludo Hunter-Tilney
That was when the full horror of what I’d done dawned on me: that I was going to have to sing the thing. I was actually going to have to create the vocal melody.

Lilah Raptopoulos
OK, so Ludo, I know that you are a little . . . you know, it doesn’t sound like you would be thrilled to have us hear you singing as Tom Waits.

Ludo Hunter-Tilney
Lilah, how correct you are there.

Lilah Raptopoulos
These are the things we do for journalism. Lit up. But I . . . You did, you were kind enough to send us a clip of the actual music.

Ludo Hunter-Tilney
(Inaudible) because most of you can’t, because you’re not going to get any more. There’s no more ever going to be heard. The rest is going to a digital grave.

Lilah Raptopoulos
Amazing. Well, let’s play it. Let’s play the song itself.

[AI-GENERATED ‘GRITTY TROUBADOUR’S BACKSTREET’ IN THE STYLE OF TOM WAITS PLAYING]

Ludo Hunter-Tilney  
Oh, that’s horrible. I literally have my head in my hands like that.

Lilah Raptopoulos
But Ludo, that wasn’t . . . Was that . . . That wasn’t you. That was you through the filter of him, right?

Ludo Hunter-Tilney
That’s me through the filter of him.

Lilah Raptopoulos
Oh, right, right, right. 

Ludo Hunter-Tilney
This was what I discovered when I did this: that the AI wasn’t able to sing properly. I had to adopt this gruff voice. And then I just sounded like some strange, sort of odd, warbling, posh Englishman (inaudible).

Lilah Raptopoulos
Very gravelly (laughter). 

Ludo Hunter-Tilney  
Gravelly. It was gravelly. Yeah, that I will . . . I’ll grant you that. 

Lilah Raptopoulos
You got a little gravelly. Yeah, (laughter) it was good, Ludo.

OK, so Ludo’s song did not go the way he’d hoped, but it was still in the spirit of experimentation. And that’s the spirit that most of these meme songs are coming from right now on the internet. We’re really at the beginning.

[MUSIC PLAYING]

Ludo, you know, you’re a music critic. I’m kind of curious what your initial reaction was as you were hearing about this phenomenon and as you started learning more? Like, what did you feel? Did you feel indignation? Did you feel just curious? Did you feel excited or were you worried?

Ludo Hunter-Tilney
Well, I think that I felt initially . . . I have to confess to having, in the past, found AI itself a bit like eating your vegetables. You know, it’s one of these topics that you know is really, really important and that you need to get your head around. But this idea that the voice would be something that could actually be imitated and then used in any way that one wants, I did find very interesting, I suppose. And then actually, rather than finding it sort of concerning, I think that what it really made me think about was how . . . well, in truth, how we’ve surrounded ourselves with copies of our voices, which we’ve ceased to really notice. We’ve lost the ability to hear that because we’re so used to doing it. We do it every time we make a telephone call. We do it every time, you know, you listen to a song and the vocals, the way the vocals are presented to you, are actually sort of hyper-realistic, because no one could ever sing in your ear like that. You know, the whisper singing . . . A Billie Eilish song would be a great example of that, how she’ll sing in a very withheld manner but also really at the forefront of your consciousness. There are these curious ways in which the voice is manipulated and treated and presented to us very artfully and cleverly, which we have become to a great degree deaf to. And these copies, for me, are actually making us hear again the fact that we’ve created an entire world of imitated voices, voices which are imitations.

Lilah Raptopoulos
Right. And is that why, like seeing a musician live in a small room feels so different?

Ludo Hunter-Tilney
Well, I think it’s not so much the small room. What for me makes a really big difference, and I think I’ve become more and more aware of it the more gigs I go to, is the difference between amplified and unamplified singing. The singing by a person in a room without any amplification is such a different experience. You hear that voice in such a different way. And amplification is something which people tend to use in a way which is not actually, in the end, very artful. Sinatra used to talk about using the microphone as a musical instrument. For the singer, the microphone was like a saxophone, or the equivalent of a saxophone for a saxophonist. We . . . that’s . . . for me, going to gigs, that’s become less and less apparent because what you get are people basically just using their microphone to be louder. And that’s (inaudible) . . .

Lilah Raptopoulos
Right. And that’s it.

Ludo Hunter-Tilney
That’s . . . they’re not using it in any way to go and do things with amplification which are still interesting. And I found, I suppose, that when you then hear the voice which is not being amplified, there’s something very refreshing about that.

Lilah Raptopoulos
This technology — AI technology — is advancing fast. It’s not just about amplification anymore. Now it’s getting easier to pass fake voices and fake songs off as real. That puts us in an ethical grey area. And it reminds Ludo of when sampling music first became popular.

Ludo Hunter-Tilney
When sampling really took off in the eighties, it was a Wild West, you know. People would just help themselves to whatever they wanted. You would take a piece of a song and you would go and use it in your song, and you would transform it or you wouldn’t. And it wasn’t always a very good song, but nonetheless, often there would be, you know, really inventive uses of these other musical source materials. And then there was a landmark case involving Gilbert O’Sullivan and Biz Markie, I think it was in 1990 or thereabouts, with Gilbert O’Sullivan doing the suing over a sample of his that Markie used, which he won. And that’s what led to samples becoming a business which had to be licensed, which transformed the whole use of samples, so that now you have to go and pay if you want to use a sample. And I wonder whether these voice clones are going to be somehow similar to that, inasmuch as it might be that we’re in the Wild West stage right now, when anyone can make use of them, but there’ll come a point when there’ll be a landmark legal case which will then go and establish some way to be able to license the sound of your voice.

Lilah Raptopoulos
Yeah, interesting. Ludo, what’s going on right now with the legal side of AI music? Can you tell me more? Like, what are the copyright concerns?

Ludo Hunter-Tilney
So, yes, as I said, the copyright . . . Well, for the three different things that you had mentioned before: lyrics, instrumental and voice. The lyrics and the instrumental are covered by copyright. Those are copyrightable. The voice bit isn’t. Now, what Universal, the biggest record company in the world, wants to try to do is basically use the a cappella recordings that I used to create my Waits voice, which are copyrighted, as a way to try to control the creation of these voices.

Lilah Raptopoulos
Interesting. 

Ludo Hunter-Tilney
In other words, to say that the data which is being used to make them comes from copyrighted material. Therefore, to use it, you need to have a license, which, as far as I can tell, talking to legal expert people . . . that doesn’t sound like very sturdy ground. Because it would disallow an awful lot of machine learning. Machine learning would become almost impossible if you had to go and basically license everything in the world. Then all of these machine learning tools become sort of stymied.

Lilah Raptopoulos
Yeah. Right, right, right. OK, that seems unlikely. All right, Ludo, this has been really fascinating. I have one more question for you. If we look back about 10 years, Spotify was launching. Today, people are playing with old songs and they’re making them go viral on places like TikTok. And I’m curious if you were to look into the future like 10 years from now, what do you think is going to be happening?

Ludo Hunter-Tilney
Ten years’ time, ten years’ time, what will the music industry look like? I think that the idea of actually a sort of immersed . . . I think that what we’re discussing right now, which links in with things like karaoke, it links in with fan culture. It links in with the idea of people actually sort of becoming involved in the music that they are very attracted to, as practitioners in some sort of a way. I think that will get stronger and stronger. So the idea of TikTok, with people making videos, lip-synching along to songs, is something that we can now see. But with these voice clones, assuming that that’s the case, then I think we’re going to see this way in which the fans or the listeners can go and create the songs and use all of these AI tools together and just do twists and interesting things to the music, and that will get built into each other. So I would have thought something like Spotify will go and have ways . . . if you can imagine an amalgamation of Spotify and TikTok and all of these voice cloning things, AI usage. One thing delivering all of that together. These are the sorts of things that I would come to imagine.

Lilah Raptopoulos
Yeah, totally. And fans are like more involved actively. They’re like part of the creation process. They’re more like, entwined.

Ludo Hunter-Tilney
Yeah, I think the fan . . . well, I mean, the fans . . . being able to attract people’s attention and passions in that way, and keep hold of them, is so crucial to it.

Lilah Raptopoulos
Yeah. Ludo, this was awesome. It was so interesting.

Ludo Hunter-Tilney
Well, Lilah, thank you very much. Thank you.

Lilah Raptopoulos
Yeah. Thanks for being on the show.

[MUSIC PLAYING]

That’s the show this week. Thank you for listening to FT Weekend, the Life and Arts podcast of the Financial Times. A link to Ludo’s piece and other relevant stories on this topic are in the show notes. Also, I’m not sure if you knew this, but all of our links in the show notes are free to read for a period of time. So if you click through there to the stories that we link to, you’re not going to hit the paywall.

As you know, we love hearing from you. You can email us at ftweekendpodcast@ft.com. The show is on Twitter, @ftweekendpod, and I am on Instagram and Twitter, but mostly talking to you about culture on Instagram, @lilahrap.

Next week we are talking to the legendary writer Lorrie Moore. She’s one of the best short story writers of our time, and she just came out with her first novel in many, many years. It is called I Am Homeless If This Is Not My Home.

I’m Lilah Raptopoulos, and here is my talented team. Katya Kumkova is our senior producer. Lulu Smyth is our producer. Mollie Nugent is our contributing producer. Our sound engineers are Breen Turner and Sam Giovinco, with original music by Metaphor Music. Topher Forhecz is our executive producer and our global head of audio is Cheryl Brumley. Have an incredible weekend and we’ll find each other again next week.

[MUSIC PLAYING]

Copyright The Financial Times Limited 2024. All rights reserved.