Voice interfaces are definitely gaining momentum. While Google was demonstrating Google Duplex at this year's I/O, Microsoft was demonstrating smart meetings with smart transcripts at Build. This clearly signals a race to win the voice wars. Google has moved from a mobile-first to an AI-first approach, and its products clearly reflect that. While we were all busy building mobile applications over the past half-decade, this shift towards an AI-first approach is huge. It also opens up a new 'voice-first' interface in many cases.
Based on my experience with voice assistants, I have tried to summarize the key elements of porting a mobile app to a voice-based app. Let's look at them one by one.

Authentication – Enabling the user to use a voice assistant app involves two flows.
- Initial auth – The initial auth is nothing but linking your account using the app on your phone. Even for devices like smart speakers, we link the account using the mobile phone.
- Continuous auth – This is a significant piece of the authentication process; once the initial auth is done, the user should be recognized on future requests.
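To make the two flows concrete, here is a minimal Python sketch of account linking: a one-time code issued to the phone app links the assistant user to an existing account, and later requests resolve through that link. All names and the code format are hypothetical illustrations, not any real assistant platform's API (real platforms typically use OAuth for this).

```python
import secrets

class AccountLinker:
    """Sketch of initial auth (account linking) plus continuous auth
    (resolving later requests to the linked account)."""

    def __init__(self):
        self._pending = {}   # one-time code -> app account id
        self._linked = {}    # assistant user id -> app account id

    def start_link(self, app_account_id):
        # The mobile app requests a short-lived code for the user to approve.
        code = secrets.token_hex(4)
        self._pending[code] = app_account_id
        return code

    def complete_link(self, assistant_user_id, code):
        # The assistant platform redeems the code, creating the link.
        account = self._pending.pop(code, None)
        if account is None:
            return False
        self._linked[assistant_user_id] = account
        return True

    def account_for(self, assistant_user_id):
        # Continuous auth: future requests resolve to the linked account.
        return self._linked.get(assistant_user_id)
```

In a real deployment the link would be backed by OAuth tokens with expiry and revocation; this sketch only shows the shape of the two flows.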
Voice assistants on wearables/phones – While researching this area, I found vAuth, from a research paper by the University of Michigan, a good match for the requirements I was looking for: match the user's voice against an additional channel that provides physical assurance. vAuth makes sure it executes the command of the intended user by collecting body vibrations using an accelerometer. This may be most suitable for wearables/smartphones.
Smart assistants like Google Home and Amazon Echo – Enable smart security settings on crucial transactions that may involve purchases. For instance, Alexa has voice purchase settings where you can decide whether or not to enable purchases by voice. If you do, you can even set up a PIN that you say to complete the transaction.
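As a hedged sketch of how such a purchase PIN check might look on the server side (the class and flow are illustrative, not Alexa's actual implementation), the PIN can be stored only as a salted hash and compared in constant time:

```python
import hashlib
import hmac
import secrets

def hash_pin(pin, salt):
    # Derive a slow, salted hash so the raw PIN is never stored.
    return hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, 100_000)

class VoicePurchaseGuard:
    """Illustrative guard for voice purchases: purchases can be
    disabled entirely, or protected by a spoken PIN."""

    def __init__(self, pin, purchases_enabled=True):
        self.purchases_enabled = purchases_enabled
        self._salt = secrets.token_bytes(16)
        self._pin_hash = hash_pin(pin, self._salt)

    def authorize(self, spoken_pin):
        if not self.purchases_enabled:
            return False
        candidate = hash_pin(spoken_pin, self._salt)
        # Constant-time comparison avoids timing side channels.
        return hmac.compare_digest(candidate, self._pin_hash)
```

Note that a spoken PIN can be overheard, which is why it complements rather than replaces the replay/impersonation defenses discussed below.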
Concerns that still need to be addressed –
1) Replay attacks
2) Mangled voice attacks
3) Impersonation attacks
Authorization – Granting authorization is another crucial part of porting mobile app use cases to voice assistant apps. Now that you have authenticated your account to work with smart assistants, the devices should further filter the truly valid requests from the user. Voice pattern matching can recognize different users; however, there is still plenty of scope for improvement in this area.
Security concerns with authorization –
1) Researchers from UC Berkeley have demonstrated that they can embed stealthy commands within songs and carry out actions on platforms like Siri, Alexa, and Google Assistant without humans noticing.
2) Similarly, researchers from Princeton University and China's Zhejiang University have shown that voice assistants can be activated using frequencies inaudible to the human ear (referred to as the Dolphin Attack).
Understanding context – Since requests from the user arrive as natural language, there is far more opportunity to understand context than in a smartphone app. Derive context and meaning, and match the right action to serve the user's request. The real challenge is not just the linguistic understanding of what the customer is saying, but the ability to intuitively decipher emotions, expressions, and linguistic peculiarities.
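As a toy illustration of matching a spoken request to the right action, here is a naive keyword-overlap intent classifier. Production assistants use trained NLU models, so treat this purely as a sketch of the idea; the intents and keywords are made up:

```python
def classify_intent(utterance, intents):
    """Pick the intent whose keyword set overlaps the utterance most.
    `intents` maps intent name -> set of keywords. Returns None when
    nothing matches at all."""
    words = set(utterance.lower().split())
    best, best_score = None, 0
    for intent, keywords in intents.items():
        score = len(words & keywords)
        if score > best_score:
            best, best_score = intent, score
    return best
```

A real NLU layer would also extract entities ("large", "tomorrow at 5") and carry conversational context across turns, which keyword matching cannot do.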
Enable continued conversation – Unlike mobile apps, the user has nothing to visualize with voice assistant apps; hence a continued conversation is needed to keep the customer engaged. Guide the user all the way through to finish the transaction they intended. This can be an opportunity for up-sell/cross-sell in some business cases, to receive feedback, and so on.
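A continued conversation is often implemented as slot filling: the app keeps prompting until everything needed to finish the transaction is collected. A minimal sketch, with slot names and prompts that are purely illustrative:

```python
class PizzaOrderDialog:
    """Sketch of a slot-filling dialog: ask follow-up questions until
    all slots needed to complete the transaction are filled."""

    SLOTS = ["size", "topping", "address"]
    PROMPTS = {
        "size": "What size would you like?",
        "topping": "Which topping should I add?",
        "address": "Where should we deliver it?",
    }

    def __init__(self):
        self.filled = {}

    def next_prompt(self):
        # Ask for the first missing slot; confirm when everything is filled.
        for slot in self.SLOTS:
            if slot not in self.filled:
                return self.PROMPTS[slot]
        return "Great, your order is placed!"

    def handle_answer(self, value):
        # Record the user's answer against the slot we just asked about.
        for slot in self.SLOTS:
            if slot not in self.filled:
                self.filled[slot] = value
                return
```

The final confirmation turn is also a natural place to offer an up-sell ("Would you like a drink with that?") before closing the conversation.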
Always online only – Currently, the popular smart speakers and other voice assistants have no offline functionality. So while porting a feature from a mobile app to a voice assistant app, this should be considered. For example, a mobile app might have served some content from a simple CMS, and such cases should be handled appropriately for voice-based apps.
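One way to keep such backend content from breaking the voice experience is a cache that serves a stale copy when the source is unreachable. The fetch function, cache policy, and TTL below are assumptions for illustration, not a specific platform feature:

```python
import time

class ContentFetcher:
    """Sketch: wrap a backend (e.g. CMS) fetch with a cache so a
    temporary outage degrades to recently cached content instead of
    a failed voice response."""

    def __init__(self, fetch, ttl_seconds=300):
        self._fetch = fetch          # callable: key -> content
        self._ttl = ttl_seconds
        self._cache = {}             # key -> (timestamp, value)

    def get(self, key):
        now = time.time()
        cached = self._cache.get(key)
        if cached and now - cached[0] < self._ttl:
            return cached[1]         # fresh enough, serve from cache
        try:
            value = self._fetch(key)
        except IOError:
            if cached:
                return cached[1]     # backend down: serve stale copy
            raise                    # nothing cached, nothing to say
        self._cache[key] = (now, value)
        return value
```

If even the stale copy is missing, the voice app should fall back to an honest "I can't reach that right now" response rather than silence.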
Making the human-machine interface for voice work – Gartner has stated the following interesting predictions:
1) By 2020, 30 percent of web browsing sessions will be done without a screen.
2) By 2019, 20 percent of brands will abandon their mobile apps.
So this clearly means new approaches are emerging. Google Assistant and Alexa have already built interfaces and platforms around 'voice-first' interactions. A voice-first approach frees one's hands and eyes from browsing and extends the web session to activities like cooking, driving, walking, exercising, and so on.
More personalized results – Advanced voiceprint identification will help voice assistants understand individual voices better. And as we speak to voice assistants more often, they get trained on different voices. Your application should deliver unique results for different users on the same voice assistant.
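Once the assistant tells your application which recognized user is speaking, serving per-user results can be as simple as keying responses by that identity. A sketch with hypothetical preference data:

```python
class PersonalizedResponder:
    """Sketch: the same request ('play my playlist') yields different
    results depending on which household member the assistant
    recognized. Preference data here is made up for illustration."""

    def __init__(self, preferences, default):
        self._prefs = preferences    # user id -> preferred result
        self._default = default      # fallback for unrecognized voices

    def playlist_for(self, user_id):
        return self._prefs.get(user_id, self._default)
```

The important design point is the fallback: when the voiceprint is not recognized, return a sensible default rather than another user's personalized data.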
We are definitely just in the voice 1.0 revolution. We can see many adopting voice as the first interface for their use cases. As the industry moves forward, we will see lots of improvements in voice assistants.
Happy learning!