We talk far more than we type. Podcasts, online video, internet radio, recordings of meetings and phone conversations – so much information today is contained in audio files. But how to index it, search it and access it?
Though far from perfect, voice recognition is now good enough to make searching audio possible.
The Blinkx search engine uses technology developed by Autonomy to index more than 4m hours of audio and video from podcasts, internet sites, broadcast television and radio. As well as generating a phonetic transcript, it uses metadata (such as the page title), words it has already recognised and other contextual details to improve the index. The search engine can also take the context of your search terms into account, to see whether they suggest a new interpretation of a recording.
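Blinkx has not published how it weighs these signals, so the following is only a minimal sketch of the general idea: an inverted index in which a term found in the metadata counts for more than a term the recogniser heard, and recognised words are scaled by the recogniser's confidence. The weights, the `(word, confidence)` transcript format and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical weights: metadata such as a page title is treated as more
# reliable than words produced by the speech recogniser.
SOURCE_WEIGHTS = {"metadata": 2.0, "transcript": 1.0}


def build_index(recordings):
    """Build an inverted index mapping term -> {recording_id: score}.

    Each recording is a dict with an "id", a "metadata" string and a
    "transcript" list of (word, confidence) pairs (an assumed format).
    """
    index = defaultdict(lambda: defaultdict(float))
    for rec in recordings:
        for word in rec["metadata"].lower().split():
            index[word][rec["id"]] += SOURCE_WEIGHTS["metadata"]
        for word, confidence in rec["transcript"]:
            # Less confident recognitions contribute less to the score.
            index[word.lower()][rec["id"]] += SOURCE_WEIGHTS["transcript"] * confidence
    return index


def search(index, query):
    """Rank recordings by the summed score of every query term they match."""
    scores = defaultdict(float)
    for term in query.lower().split():
        for rec_id, score in index.get(term, {}).items():
            scores[rec_id] += score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


recordings = [
    {"id": "news1", "metadata": "BBC news bulletin",
     "transcript": [("election", 0.9), ("results", 0.8)]},
    {"id": "pod7", "metadata": "Tech podcast",
     "transcript": [("election", 0.4), ("gadgets", 0.95)]},
]
index = build_index(recordings)
print(search(index, "election"))  # news1 ranks above pod7
```

A clean studio recording with high recogniser confidence naturally ranks above a fuzzy one for the same word, which loosely mirrors the accuracy pattern described below.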
According to Suranga Chandratillake, Blinkx chief executive, accuracy varies: “It’s anything from 60 per cent to 95 per cent depending less on the accent and more on the quality of the signal. Background music is bad, fuzzy recordings are bad. Our best results are from professionally recorded, professional speakers in a studio (for example, a newscaster on bbc.co.uk), our worst from amateurs with poor audio quality and music playing in the background.”
Microsoft’s OneNote application already records audio and synchronises it with typed or handwritten notes, so users can click on what they have written to hear what was going on when they wrote it (ideal for checking whether they wrote something down correctly).
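One simple way to get this click-to-hear behaviour is to stamp each note with its offset into the recording and seek there when the note is clicked. This is a minimal sketch under that assumption, not OneNote's actual implementation; the class names and the five-second rewind are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Note:
    text: str
    offset_s: float  # seconds since the recording started when the note was written


class SyncedNotebook:
    def __init__(self, lead_in_s=5.0):
        self.notes = []
        # Hypothetical: start playback slightly before the note was written,
        # since people usually write down what they have just heard.
        self.lead_in_s = lead_in_s

    def add_note(self, text, offset_s):
        self.notes.append(Note(text, offset_s))

    def playback_position(self, note):
        """Where playback should start when the user clicks a note."""
        return max(0.0, note.offset_s - self.lead_in_s)

    def note_at(self, position_s):
        """Inverse lookup: the latest note written at or before a playback position."""
        ordered = sorted(self.notes, key=lambda n: n.offset_s)
        offsets = [n.offset_s for n in ordered]
        i = bisect_right(offsets, position_s)
        return ordered[i - 1] if i else None


nb = SyncedNotebook()
nb.add_note("Budget approved?", 125.0)
nb.add_note("Action: Jane to follow up", 610.0)
print(nb.playback_position(nb.notes[0]))  # 120.0
```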
OneNote 2007, due out early next year, allows audio recordings to be searched directly. It does not transcribe speech into words (although the speech recognition built into Windows XP Tablet PC Edition can be used to transcribe recordings into OneNote). Instead, it converts the audio into phonemes, converts the search terms into their phoneme equivalents and looks for a match.
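Microsoft has not described its matching algorithm in detail, so here is a generic sketch of phoneme-level search under stated assumptions. A tiny hypothetical pronunciation lexicon turns the typed query into phonemes, a list of phoneme symbols stands in for what a phoneme recogniser would produce from the audio, and each window of the recording is compared with the query using edit distance. That tolerance is what lets a slightly misheard word still match.

```python
# Toy pronunciation lexicon (hypothetical, ARPAbet-style symbols).
LEXICON = {
    "budget": ["B", "AH", "JH", "AH", "T"],
    "meeting": ["M", "IY", "T", "IH", "NG"],
    "friday": ["F", "R", "AY", "D", "EY"],
}


def to_phonemes(query):
    """Convert a typed search phrase into a phoneme sequence via the lexicon."""
    phonemes = []
    for word in query.lower().split():
        phonemes.extend(LEXICON[word])  # this sketch raises KeyError for unknown words
    return phonemes


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def phonetic_search(audio_phonemes, query, max_errors=1):
    """Return start positions where the query's phonemes approximately match.

    audio_phonemes stands in for the output of a phoneme recogniser; a real
    system would also keep a timestamp for each phoneme so it can seek there.
    """
    target = to_phonemes(query)
    n = len(target)
    hits = []
    for start in range(len(audio_phonemes) - n + 1):
        if edit_distance(audio_phonemes[start:start + n], target) <= max_errors:
            hits.append(start)
    return hits


# "the budget meeting", with the second vowel of "budget" misheard as IH.
audio = ["DH", "AH", "B", "AH", "JH", "IH", "T", "M", "IY", "T", "IH", "NG"]
print(phonetic_search(audio, "budget"))  # [2]
```

Searching phonemes rather than a word transcript means the system never has to commit to a single spelling of what was said, which is one reason such an approach can cope with names and jargon a dictionary-based recogniser might miss.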
The technology is not perfect, especially with low-fidelity recordings, but lead program manager Owen Braun believes it is good enough to be useful. “People have meetings and conversations every day where having an exact record would be hugely valuable, but most assume audio is too hard to capture, too big to store, and too hard to mine later for the valuable bits. With OneNote 2007 and a recent-model computer, these things simply aren’t true anymore.”
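Braun's point about storage comes down to simple arithmetic. The sketch below estimates the size of compressed recordings; the bitrates are assumptions chosen for illustration (32 kbps as a modest speech-quality setting, 128 kbps as a common MP3 music setting), not figures from OneNote.

```python
def audio_storage_mb(hours, bitrate_kbps):
    """Approximate size in megabytes (10**6 bytes) of audio at a constant bitrate."""
    seconds = hours * 3600
    bits = seconds * bitrate_kbps * 1000  # kbps = thousands of bits per second
    return bits / 8 / 1_000_000           # 8 bits per byte, 10**6 bytes per MB


for label, hours in [("one-hour meeting", 1), ("eight-hour working day", 8)]:
    for kbps in (32, 128):
        print(f"{label} at {kbps} kbps: {audio_storage_mb(hours, kbps):.1f} MB")
```

At the assumed 32 kbps, an hour of meeting comes to about 14.4 MB and a full working day to about 115.2 MB.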
As well as making recordings in OneNote, people can import existing audio. All they need to record phone conversations is a cheap cable (and the permission of the people they are talking to). They can add call recording to a company PBX for around £2,000 with a system such as Storacall’s Intro. And if they use Skype they can record calls as MP3 files that are stored in Outlook folders with an add-in called Skylook (http://www.skylook.biz/).
Recording in OneNote means carrying a laptop or tablet PC (although Braun hopes to be able to search audio recorded on mobile phones in a future release). Luis Elizalde, a design engineer with IBM Design Consulting Services, thinks users will want much more portable devices to record almost everything.
“With all the information I have to keep up with every day, with e-mail and meetings, my social life is shrinking. Wouldn’t it be nice if I could go back in time and replay what somebody said in a meeting, replay that telephone number somebody gave me at the club the other night that I didn’t write down.”
His concept audio recorder, Magic Block, would record weeks or months of conversations on to flash memory and transcribe them with IBM’s ViaVoice software. On top of that, it could learn individual voices so that users could search by who was speaking, Elizalde suggests: “You could ask it to find everything that Jane said between June 3 and July 5, then look for keywords and keep dissecting that information.” Recordings would also preserve the emphasis and emotion lost in transcriptions.
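The kind of query Elizalde describes can be sketched as a filter on speaker and date followed by repeated keyword refinement. This assumes each transcribed segment has already been labelled with a speaker by some voice-identification step; the `Segment` type, the function names and the sample data (including the year) are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Segment:
    speaker: str  # label from a hypothetical speaker-identification step
    day: date     # when the segment was recorded
    text: str     # transcript produced by the speech recogniser


def words(text):
    """Lower-case words in a transcript, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def by_speaker(segments, speaker, start, end):
    """Everything `speaker` said between `start` and `end`, inclusive."""
    return [s for s in segments if s.speaker == speaker and start <= s.day <= end]


def refine(results, keyword):
    """Narrow an earlier result set to segments containing `keyword`."""
    return [s for s in results if keyword.lower() in words(s.text)]


segments = [
    Segment("Jane", date(2006, 6, 10), "The budget is approved."),
    Segment("Jane", date(2006, 6, 20), "Let's move the offsite to Friday."),
    Segment("Jane", date(2006, 7, 20), "Budget review next week."),
    Segment("Tom", date(2006, 6, 12), "The budget looks tight."),
]
janes = by_speaker(segments, "Jane", date(2006, 6, 3), date(2006, 7, 5))
budget = refine(janes, "budget")  # keep dissecting with further refine() calls
print([s.text for s in budget])
```

Because `refine` takes and returns a plain result list, calls can be chained as many times as needed, which is the "keep dissecting" step in Elizalde's description.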
Such a device would raise issues of privacy and confidentiality, so the Magic Block concept has a fingerprint scanner to keep recordings secure. IBM does not currently plan to manufacture the Magic Block, but Elizalde believes recording, playback, recognition and search technologies are mature enough to produce such a recorder, or to add recording and recognition features to something we already carry: the mobile phone.