Presently, the major voice recognition assistants on the market are Google's voice assistant and Apple's Siri. Both systems are always listening, but Siri currently offers more functionality and control features.
On Android, the software is normally activated by voice from the home screen by saying "Ok Google", followed by a command such as Call, Search, Find, or Tell me.
For example, "Ok Google, should I bring an umbrella today?" The response, either as text or as a computer-generated voice, tells you whether or not you should bring an umbrella with you.
Apple's software works very similarly, but its design tends to favour pressing a button before issuing a command. However, depending on the phone's functionality and the user's configuration, Siri can also respond directly to voice commands.
Cortana on Windows phones offers similar functionality and, like Google and Siri, is always listening for its wake phrase. In theory, its design could therefore be attacked using the same proof of concept.
The Software Used in this Attack
The only tool really needed for this attack was Audacity, which I used to record the Google voice command and combine the audio with the right portion of the song. To get the best result I used the Windows text-to-speech function, but this is not required.
The Hardware Used in this Attack
This attack required an MP3 player with a repeat function and a small portable speaker.
The Theory behind the Attack
The general idea of my attack was to turn the technology against itself. The potential market of opportunity is huge. It is estimated that 500,000 people pass through Dublin city centre each day, and many of them will have a smartphone. Statistics tell us that Android accounts for 49.58% of the market share, with Apple holding the majority of what remains. Based on that information, we can roughly estimate that 247,900 Android devices (500,000 × 0.4958) pass through Dublin city centre every day. If even 200,000 of those devices fell victim to this attack at €1 per call, it could net an attacker €200,000 for a single day's attack. Since this attack is novel and the method is not known, it could take three weeks or more of surveillance for a law enforcement agency to get close to catching the attacker. At the above estimate, working 7 days a week for 3 weeks, that could net a possible €4,200,000.
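The estimate above is simple multiplication, and it can be checked with a few lines of code. This is a minimal sketch that reuses only the article's own figures: 500,000 daily footfall, a 49.58% Android share, 200,000 affected devices, €1 per call and 21 days. The function name `estimate` is hypothetical and exists only for illustration.

```python
def estimate(footfall=500_000, android_share=0.4958,
             victims_per_day=200_000, charge_eur=1, days=7 * 3):
    """Reproduce the back-of-envelope figures from the article.

    All defaults are the article's own assumptions; `estimate` is a
    hypothetical helper, not part of any tool used in the attack.
    """
    # Round to a whole number of devices (the float product is ~247,900).
    android_devices = round(footfall * android_share)
    # Daily takings assume every affected device is charged once.
    daily_total = victims_per_day * charge_eur
    # 7 days a week for 3 weeks = 21 days.
    total = daily_total * days
    return android_devices, daily_total, total


devices, per_day, overall = estimate()
print(devices, per_day, overall)
```

Note that 200,000 out of roughly 247,900 is about 81% of the estimated Android devices. The €4.2 million figure should therefore be read as an optimistic ceiling under these assumptions rather than a prediction.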
The Attack Idea
So let's assume you want to call a phone number from an Android device without pressing any keys. Google has made this simple and user-friendly: you only have to say "Ok Google, call 1234567890", and the phone will dial the number and connect you.
Hypothetically, if you set up a premium-rate phone number that charged €1 per minute on each incoming call, you could then ask a Google device to call it. Of course, it would be neither practical nor advisable to walk up to someone and shout loudly, "OK Google, call ### ### ####".
To make this attack viable, you need a way to instruct multiple devices covertly and simultaneously to call the premium number you have set up. The simplest solution would be to create an audio track using text-to-speech and play it through a speaker. This works within hearing distance of listening devices, but again, the device owners would be aware of what you are doing.
My solution was to embed the instruction within a song. To stop the music from confusing the voice recognition, the instruction needs to be placed at a null point, where there is little bass or beat and no audible vocals.
After further research, I found the perfect song: "I'm an Albatraoz" by AronChupa, which has a null between 1:37 and 1:44. Furthermore, psychology studies have shown that after about a minute of listening to a popular song, most people tend to tune out the lyrics and lose focus on what is being said, concentrating only on the beat. So I decided to add the text-to-speech clip to the song at the low-beat section where you would expect lyrics. It looked like this: