Audio Analytic: 'The Shazam of real-world sounds'
Building a library of sounds is Quayside firm's mission
Audio Analytic is developing a library of sonic events which will allow machines to understand context through sound.
“This is machine listening,” says CEO Chris Mitchell of the firm’s sound recognition technology. “It’s a sub-group of artificial intelligence (AI). In fact, it’s a new AI area in itself – getting machines to understand sound.”
The fast-evolving soundbank will make life easier for personal assistants like Amazon’s Alexa, Google’s Home and Microsoft’s Cortana. They need to know what a baby crying sounds like, or breaking glass, or a smoke alarm going off, for their decision-making algorithms. If they can hear and understand the sound of a dog barking, they’ll be able to develop an appropriate response. If they hear a fire, they will play loud music until their owner responds. If a gun goes off (this probably applies more to America than the UK), your virtual help will require a strategy to assist you. So they need to recognise every possible sonic event – and how those events sound mixed together.
“No one’s explored the value of sounds to date,” adds Dr Mitchell. “Music yes, but sounds no.”
The project, which began when Quayside-based Audio Analytic was founded in 2010, has made some intriguing discoveries.
“There’s a bird in the south of France that sounds identical to a North American smoke alarm...”
There’s been significant progress in terms of building a speech database, of course, not least from Cambridge-based Speechmatics, which is looking to make speech recognition available in as many of the world’s 7,000 languages as possible. But reading the whole acoustic environment is an equally tall order.
“With AI speech recognition there’s a restriction with what you’re trying to get the machine to understand because of language constraints, but for instance the sound of glass breaking doesn’t have a structured language component so all that AI research is not relevant,” points out Dr Mitchell.
There’s a dearth of language to describe the sonic environment we inhabit, so Audio Analytic has coined new terms, or appropriated existing ones, to describe the new technology.
“Ideophones are the building blocks that sound is made up from.”
So there we have it, a sonic pixel – an ideophone.
“It’s phonemes for speech, ideophones for sound – at least that’s our word,” says Dr Mitchell, who has a PhD in sound information systems and signal processing from Anglia Ruskin University. “It’s in the Oxford English Dictionary but for a slightly different meaning – for artistic impressions. We define it as the building blocks that make up all definitions of sound as opposed to speech.
“At an engineering level ideophones are translatable into onomatopoeic words like ‘bang’, ‘crash’ or ‘oink’ – actually Japan has far more words in this sphere...”
Audio Analytic has a Cambridge crew of 35, plus one person based in California. There are four engineering teams: product, laboratory, data and IT/support. A quick look round the busy studio and yes, there are people who have audio clips on their screens, comparing them to others. I meet Neil Cooper, VP marketing communications. A year ago Neil was a brand director at Arm. “I left because this was too good an opportunity,” he says.
Audio Analytic sells into 65 countries. The firm’s software framework is called ai3: it allows gadgets to make sense of the audio environment around them in real time. Partners include Arm, Intel, Hive (which is being developed by Centrica, formerly British Gas) and Bragi.
“Our goal is to build a world of sounds,” Dr Mitchell says. “When children learn to talk they get some words to start with, and they add to that. The same is true for sound – after 40 or 50 sounds we then think they have a sense of hearing. Bloomberg called it ‘the Shazam for real-world sounds’ and I think that’s fair.”
After his PhD, Dr Mitchell received a prestigious Kauffman/NCGE Fellowship to investigate the commercial implications of his research, which included attending Harvard Business School and a short spell with Cisco Systems in San Jose, USA. How did he find Silicon Valley? “Very enjoyable.”
Smart computers are being embedded in the home, in the car, and in every other aspect of everyday life: Audio Analytic aims to make each of those interactions a meaningful experience for all parties.