
Voice control in Space!

20 Nov

I recently attended the Association for Conversational Interaction Design (ACIxD) brown bag session “Challenges of Implementing Voice Control for Space Applications”, presented by NASA’s authority in the field, George Salazar. George Salazar is Human Computing Interface Technical Discipline Lead at NASA, with over 30 years of experience and innovation in Space applications. Among a long list of achievements, he was involved in the development of the International Space Station internal audio system and has received several awards, including a John F. Kennedy Astronautics Award, a NASA Silver Achievement Medal and a Lifetime Achievement Award for his service and commitment to STEM. His acceptance speech for that last one brought tears to my eyes! He is an incredibly knowledgeable and experienced man, with astounding modesty and a willingness to pass his knowledge and passion on to younger generations.

George Salazar’s Acceptance Speech

Back to Voice Recognition.

Mr Salazar explained how, over the years, space missions slowly migrated from ground control (with dozens of engineers involved) to vehicle control, and from just 50 buttons to hundreds. This put the onus of operating all those buttons on the 4-5 person crew, which in turn made speech recognition an invaluable interface in such a complex environment.

Screenshot from George Salazar’s ACIxD presentation

Factors affecting ASR accuracy in Space

He described how they tested different Automatic Speech Recognition (ASR) software packages, both speaker-independent and speaker-dependent, to see which fared best. As he noted, they all officially claim 99% accuracy, but that is never the case in practice! He listed many factors that affect recognition accuracy, including:

  • background noise (speaker vs background signal separation)
  • multiple speakers speaking simultaneously (esp. in such a noisy environment)
  • foreign accent recognition (e.g. Dutch crew speaking English)
  • intraspeaker speech variation due to psychological factors (being in space can, apparently, make you depressed, which in turn affects your voice!), but presumably also due to physiological factors (e.g. just having a cold)
  • astronaut gender (low pitch in males vs high pitch in females): ASR software was designed for males, so male astronauts always had better error rates!
  • the physiological effects of microgravity on voice quality, already observed on the first flight (using templates from ground testing as the baseline): these are impossible to separate from the effects of the environment and crew stress, and can lead to a 10-30% increase in errors!
Screenshot from George Salazar’s ACIxD presentation

  • radiation, which can affect not only the ASR software but also the hardware (computing power). By comparison, Amazon Alexa uses huge computer farms, whereas in Space they rely on slow “radiation-hardened” processors: these can handle the radiation, but are actually 5-10 times slower than commercial processors!
Screenshot from George Salazar’s ACIxD presentation

Solutions to Space Challenges

To counter all these negative factors, a few different approaches and methodologies have been employed:

  • on-orbit retraining capability: making the system adaptive to changes in voice and background noise, resulting in up to 100% accuracy
  • macro-commanding: creating shortcuts to more complex commands
  • redundancy as fallback (i.e. pressing a button as a second modality)
Screenshot from George Salazar’s ACIxD presentation
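
To make macro-commanding and the button fallback a bit more concrete, here is a minimal Python sketch of my own. It is not NASA's implementation: the macro name "inspect hatch" and the function `expand` are hypothetical, and the primitive commands are simply the example phrases Mr Salazar gave in the Q&A further down.

```python
from typing import Dict, List

# Primitive commands: the example phrases quoted in this post.
PRIMITIVES = {"zoom in", "zoom out", "tilt up", "tilt down", "pan left", "please repeat"}

# Hypothetical macro table (an assumption for illustration only):
# one spoken shortcut expands into a sequence of primitive commands.
MACROS: Dict[str, List[str]] = {
    "inspect hatch": ["pan left", "tilt down", "zoom in"],
}


def expand(utterance: str) -> List[str]:
    """Return the primitive commands for an utterance, or [] if it is unknown."""
    # Normalise case and whitespace before looking the phrase up.
    phrase = " ".join(utterance.lower().split())
    if phrase in MACROS:
        return list(MACROS[phrase])
    if phrase in PRIMITIVES:
        return [phrase]
    # Unknown phrase: the caller would fall back to the second modality
    # (pressing a button) instead of guessing.
    return []
```

An empty result is the cue for the redundancy fallback: rather than acting on a doubtful recognition, the crew member reaches for the button.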

Critical considerations

One of the challenges that Mr Salazar mentioned in improving ASR accuracy is over-adaptation, i.e. skewing the system towards a single astronaut.

In addition, he mentioned the importance of Dialog Design in NASA’s human-centered design (HCD) development approach. The astronauts should always be able to provide feedback to the system, particularly for error correction (confusability leads to misrecognitions).

Screenshot from George Salazar’s ACIxD presentation

In closing, Mr Salazar stressed that speech recognition for Command and Control in Space applications is viable, especially in the context of a small crew navigating a complex habitat.

Moreover, he underlined the importance of the trust that the ASR system needs to inspire in its users, as in this case the astronauts may literally be staking their lives on its performance and accuracy.

Screenshot from George Salazar’s ACIxD presentation

Q & A

After Mr Salazar’s presentation, I couldn’t help but pose a couple of questions to him, given that I consider myself to be a Space junkie (and not in the sci-fi franchise sense either!).

So, I asked him to give us a few examples of the type of astronaut utterances and commands that their ASR needs to be able to recognise. Below are some such phrases:

  • zoom in, zoom out
  • tilt up, tilt down
  • pan left
  • please repeat

and their synonyms. He also mentioned the case of one astronaut who kept saying “Wow!” (How do you deal with that!)

I asked whether the system ever had to deal with ambiguity in trying to determine which component to tilt, pan or zoom. He answered that, although they do carry out plenty of confusability studies, the context is quite deterministic: the astronaut selects the monitor by going to the monitor section and speaking the associated command. Thus, there is no real ambiguity as such.
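
As I understood his answer, the selected monitor acts as the context for every subsequent command. The following Python sketch is my own illustration of that idea under stated assumptions: the class `CameraConsole`, the monitor names and the step sizes are all hypothetical and do not describe the actual NASA system.

```python
class CameraConsole:
    """Toy model: commands always apply to the currently selected monitor."""

    # Phrase -> (axis, step). The phrases are the examples from the talk;
    # the axes and step sizes are assumptions for illustration.
    COMMANDS = {
        "zoom in": ("zoom", 1),
        "zoom out": ("zoom", -1),
        "tilt up": ("tilt", 1),
        "tilt down": ("tilt", -1),
        "pan left": ("pan", -1),
    }

    def __init__(self, monitors):
        # One independent camera state per monitor.
        self.state = {m: {"zoom": 0, "tilt": 0, "pan": 0} for m in monitors}
        self.selected = None

    def select(self, monitor):
        """The astronaut first goes to a monitor section; that fixes the context."""
        if monitor not in self.state:
            raise KeyError(monitor)
        self.selected = monitor

    def handle(self, utterance):
        """Apply a recognised command to the selected monitor; return success."""
        phrase = " ".join(utterance.lower().split())
        if self.selected is None or phrase not in self.COMMANDS:
            return False  # nothing selected, or phrase outside the vocabulary
        axis, step = self.COMMANDS[phrase]
        self.state[self.selected][axis] += step
        return True


console = CameraConsole(["monitor 1", "monitor 2"])
console.select("monitor 2")
console.handle("Zoom in")  # only monitor 2 changes
```

Because the target is fixed before the command is spoken, the recogniser never has to work out which component "zoom in" refers to.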

Screenshot from George Salazar’s ACIxD presentation

My second question to Mr Salazar was about the type of ASR they have gone for. I understood that the vocabulary is small, contained and unambiguous, but wasn’t sure whether they went for speaker-dependent or speaker-independent recognition in the end. He replied that the standard now is speaker-independent ASR which has, however, been adapted to a small group of astronauts (i.e. “group-dependent”). Hence all the challenges of distinguishing between different speakers with different pitch and accents, all against the background noise and the radiation and microgravity effects! They must be really busy!

It was a great pleasure to listen to the talk and an incredible and rare honour to get to speak with such an awe-inspiring pioneer in Space Engineering!

Ad Astra!

UBIQUITOUS VOICE: Essays from the Field now on Kindle!

14 Oct

In 2018, a new book on “Voice First” came out on Amazon and I was proud and deeply honoured, as it includes one of my articles! Now it has come out on Kindle as an e-Book and we are even more excited at the prospect of a much wider reach!

“Ubiquitous Voice: Essays from the Field”: Thoughts, insights and anecdotes on Speech Recognition, Voice User Interfaces, Voice Assistants, Conversational Intelligence, VUI Design, Voice UX issues, solutions, Best practices and visions from the veterans!

I have been part of this effort since its inception, working alongside some of the pioneers in the field who now represent the market leaders (Google, Amazon, Nuance, Samsung Viv, ...). Excellent job by our tireless and intrepid Editor, Lisa Falkson!

My contribution “Convenience + Security = Trust: Do you trust your Intelligent Assistant?” is on data privacy concerns and social issues associated with the widespread adoption of voice activation. It is thus platform-, ASR-, vendor- and company-agnostic.

You can get the physical book here and the Kindle version here.

Prepare to be enlightened, guided and inspired!