Tag Archives: voice recognition

Voice control in Space!

20 Nov

I recently attended The Association for Conversational Interaction Design (ACIxD) Brown Bag "Challenges of Implementing Voice Control for Space Applications", presented by the NASA authority in the field, George Salazar. George Salazar is Human Computer Interface Technical Discipline Lead at NASA, with over 30 years of experience and innovation in Space applications. Among a long list of achievements, he was involved in the development of the International Space Station internal audio system and has received several awards, including a John F. Kennedy Astronautics Award, a NASA Silver Achievement Medal and a Lifetime Achievement Award for his service and commitment to STEM. His acceptance speech for that last one brought tears to my eyes! An incredibly knowledgeable and experienced man with astounding modesty and willingness to pass his knowledge and passion on to younger generations.

George Salazar’s Acceptance Speech

Back to Voice Recognition.

Mr Salazar explained how, over the years, space missions slowly migrated from ground control (with dozens of engineers involved) to vehicle control, and from just 50 buttons to hundreds. This put the onus of operating all those buttons on the 4-5 person space crew, which in turn brought in speech recognition as an invaluable interface that makes good sense in such a complex environment.

Screenshot from George Salazar’s ACIxD presentation

Factors affecting ASR accuracy in Space

He described how they tested different Automatic Speech Recognition (ASR) packages, both speaker-independent and speaker-dependent, to see which fared best. As he noted, they all officially claim 99% accuracy, but that is never the case in practice (the sketch after this list shows how accuracy is conventionally measured)! He listed many factors that affect recognition accuracy, including:

  • background noise (speaker vs background signal separation)
  • multiple speakers speaking simultaneously (esp. in such a noisy environment)
  • foreign accent recognition (e.g. Dutch crew speaking English)
  • intraspeaker speech variation due to psychological factors (as being in space can, apparently, make you depressed, which in turn affects your voice!), but presumably also to physiological factors (e.g. just having a cold)
  • astronaut gender (lower pitch in males vs higher pitch in females): the ASR software was designed around male voices, so male astronauts always had lower error rates!
  • the physiological effects of microgravity on voice quality, observed as early as the first flight (using templates from ground testing as the baseline): these are impossible to separate from the effects of the environment and crew stress, and can lead to a 10-30% increase in errors!
Screenshot from George Salazar’s ACIxD presentation

  • even radiation, which can affect not just the ASR software but also the hardware (computing power). By way of comparison, AMAZON Alexa runs on huge server farms, whereas in Space they rely on slow "radiation-hardened" processors: these can handle the radiation, but are actually 5-10 times slower than commercial processors!
Screenshot from George Salazar’s ACIxD presentation
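A quick aside on what those accuracy percentages actually mean: ASR accuracy is conventionally reported as 100% minus the word error rate (WER), i.e. the proportion of substituted, deleted and inserted words after aligning the recogniser's output against a reference transcript. Here is a minimal, self-contained sketch of the standard WER calculation (my own illustration, not from the talk):

```python
# Standard word error rate (WER) computation via edit-distance alignment:
# WER = (substitutions + deletions + insertions) / reference word count.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits between the first i ref words and first j hyp words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # deleting i words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # inserting j words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of ten = 10% WER, i.e. "90% accuracy":
print(wer("tilt up and zoom in on the left hand monitor",
          "tilt up and zoom in on the left and monitor"))  # 0.1
```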

Solutions to Space Challenges

To counter all these negative factors, a few different approaches and methodologies have been employed:

  • on-orbit retrain capability: rendering the system adaptive to changes in voice and background noise, resulting in up to 100% accuracy (see the sketch after this list)
  • macro-commanding: creating shortcuts to more complex commands
  • redundancy as fallback (i.e. pressing a button as a second modality)
Screenshot from George Salazar’s ACIxD presentation
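The talk didn't go into the retraining algorithm itself, but since the recognisers used stored templates (see the microgravity point above), on-orbit retraining essentially means re-enrolling each command in the current acoustic environment so the templates track the crew's changed voices and the new background noise. A purely hypothetical sketch of the idea (names and feature vectors are invented; this is not NASA's implementation):

```python
# Hypothetical template-based command recogniser with on-orbit re-enrolment:
# each command is stored as a feature template, and retraining simply
# replaces the ground-recorded template with one recorded in flight.

from dataclasses import dataclass, field

@dataclass
class CommandRecogniser:
    templates: dict = field(default_factory=dict)  # command -> feature template

    def enrol(self, command: str, features: list) -> None:
        # On-orbit retrain: overwrite the baseline template for this command.
        self.templates[command] = features

    def recognise(self, features: list) -> str:
        # Return the enrolled command whose template is nearest (squared
        # Euclidean distance here, purely for illustration).
        def distance(command):
            template = self.templates[command]
            return sum((a - b) ** 2 for a, b in zip(template, features))
        return min(self.templates, key=distance)

recogniser = CommandRecogniser()
recogniser.enrol("zoom in", [0.2, 0.9, 0.4])    # dummy feature vectors
recogniser.enrol("tilt up", [0.8, 0.1, 0.5])
print(recogniser.recognise([0.75, 0.15, 0.5]))  # -> tilt up
```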

Critical considerations

One of the challenges that Mr Salazar mentioned in improving ASR accuracy is overadaptation, i.e. skewing the system towards a single astronaut.

In addition, he mentioned the importance of Dialog Design in NASA's human-centered design (HCD) development approach. The astronauts should always be able to provide feedback to the system, particularly for error correction (confusability leads to misrecognitions).

Screenshot from George Salazar’s ACIxD presentation

In closing, Mr Salazar stressed that speech recognition for Command and Control in Space applications is viable, especially in the context of a small crew navigating a complex habitat.

Moreover, he underlined the importance of the trust that the ASR system needs to inspire in its users, as in this case the astronauts may literally be staking their lives on its performance and accuracy.

Screenshot from George Salazar’s ACIxD presentation

Q & A

After Mr Salazar’s presentation, I couldn’t help but pose a couple of questions to him, given that I consider myself to be a Space junkie (and not in the sci-fi franchise sense either!).

So, I asked him to give us a few examples of the type of astronaut utterances and commands that their ASR needs to be able to recognise. Below are some such phrases:

  • zoom in, zoom out
  • tilt up, tilt down
  • pan left
  • please repeat

and their synonyms. He also mentioned the case of one astronaut who kept saying “Wow!” (How do you deal with that?!)

I asked whether the system ever had to deal with ambiguity in trying to determine which component to tilt, pan or zoom. He answered that, although they do carry out plenty of confusability studies, the context is quite deterministic: the astronaut selects the monitor by going to the monitor section and speaking the associated command. Thus, there is no real ambiguity as such.
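In other words, the active section acts as a deterministic context that scopes which commands are currently valid, much like a simple state machine. A hypothetical sketch (the commands are taken from the examples above; the section names and structure are my own invention):

```python
# Hypothetical context-scoped command handling: the astronaut first selects
# a section, and subsequent commands are interpreted only in that context,
# so "tilt up" never needs disambiguating between devices.

COMMANDS_BY_SECTION = {
    "monitor": {"zoom in", "zoom out", "tilt up", "tilt down", "pan left"},
    "lighting": {"lights on", "lights off"},
}

class VoiceControl:
    def __init__(self):
        self.section = None                         # no active context yet

    def handle(self, utterance: str) -> str:
        if utterance in COMMANDS_BY_SECTION:        # e.g. "monitor"
            self.section = utterance
            return f"entering {utterance} section"
        if self.section and utterance in COMMANDS_BY_SECTION[self.section]:
            return f"{self.section}: executing '{utterance}'"
        return "please repeat"                      # out-of-context command

vc = VoiceControl()
print(vc.handle("monitor"))    # entering monitor section
print(vc.handle("tilt up"))    # monitor: executing 'tilt up'
```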

Screenshot from George Salazar’s ACIxD presentation

My second question to Mr Salazar was about the type of ASR they have gone for. I understood that the vocabulary is small and contained / unambiguous, but wasn’t sure whether they went for speaker-dependent or speaker-independent recognition in the end. He replied that the standard now is speaker-independent ASR, which has, however, been adapted to a small group of astronauts (i.e. “group-dependent“). Hence, all the challenges of distinguishing between different speakers with different pitches and accents, all against the background noise and the radiation and microgravity effects! They must be really busy!

It was a great pleasure to listen to the talk and an incredible and rare honour to get to speak with such an awe-inspiring pioneer in Space Engineering!

Ad Astra!

UBIQUITOUS VOICE: Essays from the Field now on Kindle!

14 Oct

In 2018, a new book on “Voice First” came out on Amazon and I was proud and deeply honoured, as it includes one of my articles! Now it has come out on Kindle as an e-Book and we are even more excited at the prospect of a much wider reach!

“Ubiquitous Voice: Essays from the Field”: Thoughts, insights and anecdotes on Speech Recognition, Voice User Interfaces, Voice Assistants, Conversational Intelligence, VUI Design, Voice UX issues, solutions, Best practices and visions from the veterans!

I have been part of this effort since its inception, working alongside some of the pioneers in the field who now represent the Market Leaders (GOOGLE, AMAZON, NUANCE, SAMSUNG VIV .. ). Excellent job by our tireless and intrepid Editor, Lisa Falkson!

My contribution “Convenience + Security = Trust: Do you trust your Intelligent Assistant?” is on data privacy concerns and social issues associated with the widespread adoption of voice activation. It is thus platform-, ASR-, vendor- and company-agnostic.

You can get the physical book here and the Kindle version here.

Prepare to be enlightened, guided and inspired!

An Amazon Echo in every hotel room?

16 Dec

The Wynn Las Vegas Hotel just announced that it will be installing the Amazon Echo device in every one of its 4,748 guest rooms by Summer 2017. Apparently, hotel guests will be able to use Echo, Amazon’s hands-free voice-controlled speaker, to control not just room lights, temperature, and drapery, but also some TV functions.

 

CEO Steve Wynn: “I have never, ever seen anything that was more intuitively dead-on to making a guest experience seamlessly delicious, effortlessly convenient than the ability to talk to your room and say… ‘Alexa, I’m here, open the curtains, … lower the temperature, … turn on the news.‘ She becomes our butler, at the service of each of our guests”.

 

The announcement does, however, also raise security concerns. The Alexa device is always listening, at least for the “wake word”. This is, of course, necessary for it to work when you actually need it. It needs to know when it is being “addressed” to start recognising what you say and hopefully act on it afterwards. Interestingly, though, according to the Alexa FAQ:

 

When these devices detect the wake word, they stream audio to the cloud, including a fraction of a second of audio before the wake word.

That could get embarrassing or even dangerous, especially if the “wake word” was actually a “false alarm“, i.e. something the guest said to someone else in the room that merely sounded like the wake word.
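Technically, capturing audio from before the wake word is straightforward: the device keeps the last fraction of a second of audio in a small rolling buffer at all times, and only ships that buffer (plus the live audio that follows) once the wake word detector fires. A generic sketch of the idea (this is not Amazon’s actual code; the frame size and buffer length are assumptions):

```python
# Generic wake-word pre-roll buffer: a fixed-size deque always holds the most
# recent audio frames, so the upload can include speech from just before the
# wake word was detected.

from collections import deque

PRE_ROLL_FRAMES = 25                       # assumed: ~0.5 s at 20 ms frames

pre_roll = deque(maxlen=PRE_ROLL_FRAMES)   # oldest frames fall off the back

def stream_to_cloud(audio: bytes) -> None:
    print(f"uploading {len(audio)} bytes (includes pre-roll)")

def on_audio_frame(frame: bytes, wake_word_detected: bool) -> None:
    pre_roll.append(frame)
    if wake_word_detected:
        # Send the buffered pre-roll first; live frames would follow it.
        stream_to_cloud(b"".join(pre_roll))

# Simulate 100 dummy frames, with the wake word firing on the last one:
for i in range(100):
    on_audio_frame(b"\x00" * 640, wake_word_detected=(i == 99))
```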

All commands are saved on the device’s History. The question is: Will the hotel automatically wipe the device’s history once a guest has checked out? Or at least before the next guest arrives in the room! Can perhaps every guest have access to their own history of commands, so that they can delete it themselves just before check-out? These are crucial security aspects that the Hotel needs to consider, because it would be a shame for this seamlessly delicious and effortlessly convenient experience to be cut short by paranoid guests switching the Echo off as soon as they enter the room!

Meet META, the Meta-cognitive skills Training Avatar!

16 Jun

METALOGUE logo

EU FP7 logo

 

Since November 2013, I’ve had the opportunity to participate in the EU-funded FP7 R&D project METALOGUE through my company DialogCONNECTION Ltd, one of its 10 Consortium Partners. The project aims to develop a natural, flexible, and interactive Multi-perspective and Multi-modal Dialogue system with meta-cognitive abilities; a system that can:

  • monitor, reason about, and provide feedback on its own behaviour, intentions and strategies, and the dialogue itself,
  • guess the intentions of its interlocutor,
  • and accordingly plan the next step in the dialogue.

The system dynamically adapts both its strategy and its behaviour (speech and non-verbal aspects) in order to influence the dialogue partner’s reaction and, as a result, the progress of the dialogue over time, thereby also achieving its own goals in the way most advantageous to both sides.
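To make that loop concrete, here is a purely hypothetical sketch of a single metacognitive turn: the agent guesses the interlocutor’s stance, monitors its own progress towards its goal, and revises its strategy before planning the next move (an illustration of the concept only, not the actual METALOGUE architecture):

```python
# Hypothetical metacognitive negotiation turn: infer the partner's stance,
# self-monitor progress against the goal, and adapt strategy accordingly.

class MetacognitiveNegotiator:
    def __init__(self, goal: float):
        self.goal = goal          # target progress level for the negotiation
        self.strategy = "firm"    # current negotiation strategy
        self.progress = 0.0       # self-monitored progress towards the goal

    def infer_stance(self, utterance: str) -> str:
        # Crude interlocutor model: keyword-spotted stance (a real system
        # would use speech, prosody and gesture cues).
        return "cooperative" if "agree" in utterance.lower() else "resistant"

    def turn(self, utterance: str) -> str:
        stance = self.infer_stance(utterance)              # guess intention
        self.progress += 0.2 if stance == "cooperative" else -0.1  # monitor
        if self.progress < self.goal and stance == "resistant":
            self.strategy = "conciliatory"                 # adapt strategy
        return f"[{self.strategy}] planning next move against a {stance} partner"

agent = MetacognitiveNegotiator(goal=1.0)
print(agent.turn("A total smoking ban would ruin my business"))
# -> [conciliatory] planning next move against a resistant partner
```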

The project is in its 3rd and final year (ending in Oct 2016) and has a budget of € 3,749,000 (EU contribution: € 2,971,000). METALOGUE brings together 10 Academic and Industry partners from 5 EU countries (Germany, Netherlands, Greece, Ireland, and UK).

 

METALOGUE focuses on interactive and adaptive training situations, where negotiation skills play a key role in the decision-making processes. Reusable and customisable software components and algorithms have been developed, tested and integrated into a prototype platform, which provides learners with a rich and interactive environment that motivates them to develop meta-cognitive skills, by stimulating creativity and responsibility in the decision-making, argumentation, and negotiation process. The project is producing a virtual trainer, META, a Training Avatar capable of engaging in natural interaction using gestures, facial expressions, and body language; currently in English, with German and Greek to be added in the future.

METALOGUE Avatar

Pilot systems have been developed for 2 different user scenarios: a) debating and b) negotiation, both tested and evaluated by English-speaking students at the Hellenic Youth Parliament. We are currently targeting various industry verticals, in particular Call Centres, e.g. to semi-automate and enhance Call Centre Agent Training.

 

And here’s META in action!

 

In this video, our full-body METALOGUE Avatar is playing the role of a business owner who is negotiating a smoking ban with a local Government Councillor. Still imperfect (e.g. there is some slight latency before replying – and an embarrassing repetition at some point!), but you can also see the realistic facial expressions, gaze, gestures, and body language, and even selective and effective pauses. It can process natural spontaneous speech in a pre-specified domain (the smoking ban, in this case) and has reached an ASR error rate below 24% (down from almost 50% two years ago!). The idea is to use such an Avatar in Call Centres to provide extra training support on top of existing training courses and workshops. It’s not about replacing the human trainer, but rather about empowering and motivating Call Centre Trainee Agents who are trying to learn how to read their callers and how to successfully negotiate deals and even complaints with them in an optimal way.


My company, DialogCONNECTION, is charged with the task of attracting interest and feedback from industry, to gauge the relevance and effectiveness of the METALOGUE approach in employee training contexts (esp. negotiation and decision-making). We are looking in particular for Call Centres, both small and agile (serving multiple small clients) and large (and probably plagued by the well-known agent burn-out syndrome). Ideally, you would give us access to real-world Call Centre Agent-Caller/Customer recordings, or even simulated Trainer-Trainee phone calls that are used for situational Agent training (either already available or collected specifically for the project). A total of just 15 hours of audio (and video, if available) would suffice to train the METALOGUE speech recognisers and the associated acoustic and language models, as well as its metacognitive models.

However, if you don’t want to commit your organisation’s data, any type of input and feedback would make us happy! As an innovative pioneering research project, we really need guidance, evaluation and any input from the real world of industry! So, if we have sparked your interest in any way and you want to get involved and give it a spin, please get in touch!

The 2015 stats are in!

20 Jan

The WordPress.com stats monkeys prepared an annual report for this blog.

Top blog posts in 2015 were: “A.I.: from Sci-Fi to Science reality”, the ever popular older “Speech Recognition for Dummies”, and the classic “Voice-activated lift won’t do Scottish! (Burnistoun S1E1 – ELEVEN!)”.

Scottish Elevator – Voice Recognition – ELEVEN!

(YouTube – Burnistoun – Series 1 , Episode 1 [ Part 1/3 ])

Voice recognition technology? …  In a lift? … In Scotland? … You ever TRIED voice recognition technology? It don’t do Scottish accents!

🙂

 

So, we had 4,500 unique visitors in 2015! Thank you!

A New York City subway train holds 1,200 people. This blog was viewed about 4,500 times in 2015. If it were a NYC subway train, it would take about 4 trips to carry that many people.

Check out some more stats in the full WordPress report.

Happy 2016! 🙂

Develop your own Android voice app!

26 Dec

Voice application Development for Android

My colleague Michael F. McTear has a new and very topical book out: Voice Application Development for Android, co-authored with Zoraida Callejas. Apart from a hands-on, step-by-step but still condensed guide to voice application development, you get the source code to develop your own Android apps for free!

Get the book here or through Amazon. And have a look at the source code here.

Exciting times ahead for do-it-yourself Android speech app development!

The AVIxD 49 VUI Tips in 45 Minutes!

6 Nov


The illustrious Association for Voice Interaction Design (AVIxD) organised a Workshop in the context of SpeechTEK in August 2010, whose goal was “to provide VUI designers with as many tips as possible during the session“. Initially the goal was 30 Tips in 45 minutes, but they got overexcited and came up with a whopping 49 Tips in the end! The Session was moderated by Jenni McKienzie, and the panelists were David Attwater, Jon Bloom, Karen Kaushansky, and Julie Underdahl. This list dates back 3 years now, but it’s by no means outdated. It is the soundest advice you will find on designing better voice recognition IVRs, and I hated it being buried in a PDF!

So I am audaciously plagiarising and bringing you here: the 49 VUI Tips for Better Voice User Interface Design! Or go and read the .PDF yourselves here:

[Images of the 49 VUI Tips, reproduced from the AVIxD PDF]

 

Have you got a VUI Tip you can’t find in this list that you’d like to share? Tell us here!