Voice technology is a growing field that enables computers to recognize spoken words and produce appropriate responses. It combines artificial intelligence with speech recognition.
Voice technology is often integrated into assistant-like platforms such as Amazon Echo and Apple’s Siri. These are smart devices that can communicate with and serve consumers through speech-based interactions. They receive verbal commands and reply with an automated verbal response or by performing an action, overall intending to meet the consumer’s demand.
It should be noted that recent voice technology innovations have led to consumer privacy concerns. The speech recognition capabilities of these devices worry consumers because their conversations could be recorded and used by businesses without their knowledge.
Recording Conversations Is the Norm for Voice Technology, But Where Are the Boundaries?
Google has said that it listens to 0.2% of the audio snippets captured by Google Assistant so that its language experts can better understand different languages and accents, calling the practice a "critical part" of building speech technology. Google is investigating the contractor who leaked some of these audio clips to a news outlet, which brought the practice of listening to recorded snippets to light.
Google Assistant is only one of many virtual assistants (Amazon Alexa, Apple's Siri, etc.), and it is not uncommon for makers of voice technology to listen to audio snippets to identify errors the technology may be making. If customers were made aware that their conversations with Google Assistant could be recorded and used to improve the technology, the issue would be far less serious.
The problem is that some of these conversations that end up being recorded and then listened to by contractors are private conversations that consumers are not aware are being recorded. Google Assistant is only supposed to be activated when someone says “Ok Google,” or taps a button on their phone; but somehow, Assistant is being activated and recording conversations that are not meant for it.
Are Voice Biometrics Ready for Primetime Authentication?
First, we need to be careful with terminology. Synthetic speech is not inherently bad. It simply refers to a process of generating natural-sounding speech like a human voice. Typically, synthetic speech applications allow a user to specify a particular gender with a regional accent, or they may attempt to sound like one individual. (Google WaveNet does this well.) The benign uses include making interaction with computers more natural and convenient for customers. There are questions about the ethics of using synthetic speech in applications when the user may not know that they are conversing with an application—like Google Duplex—but that is a different issue from perpetrating fraud using synthetic speech.
Now, let’s separate voice from speech. When humans learn to recognize the voice of a friend or loved one, or business associate, they learn in context. The human voice has enough physical attributes that can be captured in digital form to identify individuals, similar to fingerprints.
A model of an individual’s voice obtained by analyzing samples of actual speech should include a representation of the sounds (the physical components), the way the person speaks (behavior, or speech patterns with pauses, cadence, phrasing, vocabulary, etc.) and perhaps, passively obtained behavioral data such as how they hold their telephone. For example, the angle of the phone and which ear is used can be determined by sensors on the phone.
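To make the idea concrete, here is a minimal sketch of what such a multi-attribute voice model might look like in code. The field names, feature values, weighting, and scoring formula are all hypothetical, chosen only to illustrate combining physical (spectral), behavioral (speech-pattern), and passively sensed (phone-angle) attributes into one comparison:

```python
from dataclasses import dataclass
from math import sqrt

@dataclass
class VoiceModel:
    """Hypothetical enrollment record combining the three attribute
    groups described above. Fields and units are illustrative only."""
    spectral: list        # physical components, e.g. averaged spectral features
    speech_pattern: list  # behavior: pause lengths, cadence, phrasing scores
    phone_angle: float    # passively sensed: typical handset angle, degrees

def _distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_score(enrolled: VoiceModel, sample: VoiceModel) -> float:
    """Toy similarity: smaller combined distance -> score closer to 1.0."""
    d = (_distance(enrolled.spectral, sample.spectral)
         + _distance(enrolled.speech_pattern, sample.speech_pattern)
         + abs(enrolled.phone_angle - sample.phone_angle) / 90.0)
    return 1.0 / (1.0 + d)

enrolled = VoiceModel([0.2, 0.8, 0.5], [0.3, 0.6], 35.0)
same_user = VoiceModel([0.21, 0.79, 0.52], [0.31, 0.58], 34.0)
impostor = VoiceModel([0.9, 0.1, 0.4], [0.8, 0.2], 70.0)

# A genuine sample should score higher than an impostor's.
print(match_score(enrolled, same_user) > match_score(enrolled, impostor))
```

A production system would extract real acoustic features and learn the comparison function from data; the point here is only that the model spans several independent attribute groups, so an attacker must convincingly fake all of them at once.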
A Look at Fraud: Intentional Deception of Voice Authentication Technology
As a rule:
Every system designed to authenticate a user to protect real or intellectual property, including data, will be attacked for sport or profit.
Using synthesized speech to impersonate an individual in order to thwart authentication—fraud—requires an understanding of the system you’re trying to fool and construction of a model of the user’s voice that includes all of the attributes the authentication system monitors. The sound of a voice without the user’s speech pattern should be useless in trying to fool a good security system.
As the value of access increases, fraudsters are more motivated to invest in technology to impersonate an authorized user. Building a model that more completely captures an individual's voice and speech is an ideal application for deep learning, which can identify patterns a human might not detect but that the authentication algorithm relies on. An authentication system that uses a voiceprint captured with state-of-the-art voice biometrics is still vulnerable to attack, although it is significantly more secure than one that relies solely on passwords, which draw from a finite space of possibilities and are therefore vulnerable to brute-force attack.
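The "finite number of possibilities" point is easy to quantify. This sketch assumes an illustrative 8-character alphanumeric password and an assumed offline guessing rate; both figures are hypothetical, but they show why a password space can be exhaustively enumerated in a way a continuous voice signal cannot:

```python
# Rough illustration of why a password's search space is finite and
# enumerable. Alphabet, length, and guess rate are assumed values.
ALPHABET = 26 + 26 + 10          # lowercase + uppercase + digits
LENGTH = 8                       # assumed password length

search_space = ALPHABET ** LENGTH
guesses_per_second = 1_000_000_000   # assumed offline cracking rate

seconds_to_exhaust = search_space / guesses_per_second
print(f"{search_space:,} candidate passwords")
print(f"~{seconds_to_exhaust / 86_400:.1f} days to try every one")
```

Under these assumptions the entire space can be searched in a few days, whereas a voiceprint is a high-dimensional measurement of a physical signal with no enumerable list of candidates to walk through.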
What Can Call Centers and Potential Victims Do to Protect Against Synthetic Speech Fraud?
As the same technologies that enable the creation of voiceprints (deep learning, audio and behavioral analytics, etc.) are also available to those who seek to break the authentication systems, the best solution is to use a combination of voiceprints and additional data that will be difficult for the fraudster to obtain.
Multi-factor authentication for a voice-first access system should include some non-voice components, such as an IP address from a registered device, or behavioral attributes.
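A minimal sketch of that policy, assuming a hypothetical device registry and illustrative threshold, might combine the voiceprint score with the non-voice factors like this:

```python
# Hypothetical registry mapping users to IPs of their registered devices.
REGISTERED_IPS = {"alice": {"203.0.113.7"}}

def authenticate(user: str, voice_score: float, source_ip: str,
                 behavior_ok: bool, voice_threshold: float = 0.9) -> bool:
    """Grant access only when the voiceprint matches AND at least one
    non-voice factor (registered device IP or behavioral attribute)
    checks out. Threshold and factor choice are illustrative."""
    voice_ok = voice_score >= voice_threshold
    ip_ok = source_ip in REGISTERED_IPS.get(user, set())
    return voice_ok and (ip_ok or behavior_ok)

# A strong voice match from a registered device succeeds...
print(authenticate("alice", 0.95, "203.0.113.7", behavior_ok=False))
# ...but the same voice match from an unknown device with no
# corroborating behavior fails, even if the voice was cloned perfectly.
print(authenticate("alice", 0.95, "198.51.100.1", behavior_ok=False))
```

The design choice here is that a perfect synthetic-voice match alone is never sufficient: the fraudster must also steal or spoof a second, independent factor.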
Leading edge solutions today also maintain a database of voiceprints from known fraudsters, similar to directories of fraudulent websites.
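Such a fraudster-voiceprint check can be sketched as a similarity lookup against a blocklist. The voiceprint vectors and the cosine-similarity threshold below are hypothetical placeholders for whatever representation and matcher a real system uses:

```python
from math import sqrt

# Hypothetical blocklist of voiceprint vectors from known fraudsters,
# analogous to a directory of fraudulent websites.
FRAUDSTER_PRINTS = [
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.7],
]

def cosine(a, b):
    """Cosine similarity between two voiceprint vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def matches_known_fraudster(voiceprint, threshold=0.99):
    """Flag an incoming caller whose print closely matches a blocklisted one."""
    return any(cosine(voiceprint, f) >= threshold for f in FRAUDSTER_PRINTS)

print(matches_known_fraudster([0.9, 0.1, 0.3]))  # matches the first entry
print(matches_known_fraudster([0.1, 0.2, 0.9]))  # matches nothing
```

In practice this lookup runs alongside authentication: even a caller who passes the voiceprint match can be flagged if their print resembles one previously tied to fraud.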
For individuals who use voice authentication solutions, it is important to recognize that the weakest link is often a careless user. Insist on multi-factor authentication for any application that has access to your sensitive data. If your service provider uses a catchphrase that you are required to speak to gain access, never utter that phrase outside the application. High-quality audio recorders and digital audio editors are inexpensive, and it gets easier every day to create a realistic audio impersonation of any individual, anywhere.
Companies Need to Take Responsibility When It Comes to Privacy
While it is understandable that some conversations users have with Google Assistant are recorded and used for product development (provided users are aware of this when they purchase and use the product), a line has to be drawn at how conversations are recorded and at the possibility of Google Assistant being activated unintentionally. Google needs to do its due diligence: if recording conversations is necessary, then the privacy of its users must be respected and ensured above all else.
Although Google says that audio snippets are decoupled from the users they are recorded from, it is still unacceptable that these devices that people keep in their home could potentially be recording their private conversations, and this issue needs to be addressed and fixed.