Tag Archives: AI

My baby, DialogCONNECTION, is 11!

4 Dec

This week, my company, DialogCONNECTION Limited, turned 11 years old! 🎉 🥂 😁

It feels like only yesterday that, in December 2008, I registered it with Companies House and became its Company Director (wearing multiple hats).

My very first client project was for the NHS Business Services Authority on their EHIC Helpline (which hopefully will survive the Brexit negotiations). Back then, whenever I told anyone what my company does (VUI Design for Speech IVRs), I was greeted by blank stares of confusion or incomprehension. It did feel a bit lonely at times!

Many more clients and thousands of long hours, long days and working weekends later, here we are in December 2019, and I suddenly find myself surrounded by VUI Designers and Voice Strategists who have now seen the potential and inescapable nature of speech interfaces and have followed in my footsteps. I feel vindicated, especially since I started in Voice back in 1996 with my Post-Doc in Spoken Dialogue Management at the University of Erlangen! 😎 (Yet another thing I’m hugely grateful to the EU for!)

We started with Voice-First VUI Design back in 1996, well before Samsung’s BIXBY (2017), Google’s ASSISTANT (2016), Amazon’s ALEXA (2014), Apple’s SIRI (2010) and even before the world started using GOOGLE for internet searches (1998)!

http://dialogconnection.com/who-designs-for-you.html

It’s quite frustrating when I realise that many of these newcomers have never heard of an IVR (Interactive Voice Response) system before, but they will eventually learn. 🤓 For the past 25 years it was the developers who insisted they could design conversational interfaces without any (Computational) Linguistics, Natural Language Processing (NLP) or Speech Recognition (ASR) background, and who therefore didn’t need a VUI Designer; we were an allegedly superfluous luxury and rarity in those times. In the past couple of years it’s the shiny Marketing people, who make a living from their language mastery, and the edgy GUI Designers, who excel in visual design, who think they can design voice interfaces too, but still know nothing about NLP or ASR.

What they don’t know is that, by modifying, for instance, just the wording of what your system says (prompt tuning), you can achieve dramatically better speech recognition and NLU accuracy, because the user is covertly “guided” to say what we expect (and have covered in the grammar). The same holds for tuned grammars (for out-of-vocabulary words), word pronunciations (for local and foreign accents), tuned VUI designs (for error recovery strategies) and tuned ASR engine parameters (for timeouts and barge-ins). It’s all about knowing how the ASR software and our human brain language software works.
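To make the prompt-tuning point concrete, here is a deliberately tiny sketch of my own (the prompts, intents and grammar entries are invented for illustration, not taken from any real project): a directive prompt nudges callers towards phrases the grammar actually covers, so fewer utterances fall out-of-grammar and trigger error recovery.

```python
# Illustrative only: how prompt wording and grammar coverage interact.
# The prompts, intents and phrases below are hypothetical examples.

OPEN_PROMPT = "How can I help you today?"                               # invites free-form replies
DIRECTIVE_PROMPT = "Would you like a new card or a replacement card?"   # steers the caller's wording

# The grammar only covers the phrases the directive prompt suggests.
GRAMMAR = {
    "new card": "ORDER_NEW_CARD",
    "a new card": "ORDER_NEW_CARD",
    "replacement card": "ORDER_REPLACEMENT_CARD",
    "a replacement card": "ORDER_REPLACEMENT_CARD",
}

def interpret(utterance: str) -> str:
    """Map an utterance to an intent, or NO_MATCH if it is out-of-grammar."""
    return GRAMMAR.get(utterance.strip().lower(), "NO_MATCH")

# After the open prompt, callers phrase the request freely and often miss the grammar:
print(interpret("I lost my card, can you sort it out?"))   # NO_MATCH -> error recovery needed
# After the directive prompt, wording converges on covered phrases:
print(interpret("A replacement card"))                      # ORDER_REPLACEMENT_CARD
```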

Excited to see what the next decade is going to bring for DialogCONNECTION and the next quarter of a century for Voice! Stay tuned!

Towards EU collaboration on Conversational AI, Data & Robotics

22 Nov

I was really interested to read the BDVA – Big Data Value Association’s and euRobotics’ recent report on “Strategic Research, Innovation and Deployment Agenda for an AI PPP: A focal point for collaboration on Artificial Intelligence, Data and Robotics”, which you can find here.

Of particular relevance to me was the Section on Physical and Human Action and Interaction (pp. 39-41), which describes the dependencies, challenges and expected outcome of coordinated action on NLP, NLU and multimodal dialogue processing. The associated challenges are:

  • Natural interaction in unstructured contexts, which is the default in the case of voice assistants for instance, as they are expected to hold a conversation on any of a range of different topics and act on them
  • Improved natural language understanding, interaction and dialogue covering all European languages and age ranges, thus shifting the focus from isolated recognition to the interpretation of the semantic and cultural context, and the user intention
  • Development of verbal and non-verbal interaction models for people and machines, underlining the importance of gestures and emotion recognition and generation (and not only in embodied artificial agents)
  • Co-development of technology and regulation to assure safe interaction in safety-critical and unstructured environments, as the only way to assure trust and, hence, widespread citizen and customer adoption
  • The development of confidence measures for interaction and the interpretation of actions, leading to explainable AI and, hence, improved and more reliable decision-making

You can find the excellent and very comprehensive report here.

Voice control in Space!

20 Nov

I recently attended The Association for Conversational Interaction Design (ACIxD) Brown Bag “Challenges of Implementing Voice Control for Space Applications”, presented by the NASA authority in the field, George Salazar. George Salazar is Human Computing Interface Technical Discipline Lead at NASA, with over 30 years of experience and innovation in Space applications. Among a long list of achievements, he was involved in the development of the International Space Station internal audio system and has received several awards, including a John F. Kennedy Astronautics Award, a NASA Silver Achievement Medal and a Lifetime Achievement Award for his service and commitment to STEM. His acceptance speech for that last one brought tears to my eyes! An incredibly knowledgeable and experienced man with astounding modesty and a willingness to pass on his knowledge and passion to younger generations.

George Salazar’s Acceptance Speech

Back to Voice Recognition.

Mr Salazar explained how space missions slowly migrated over the years from ground control (with dozens of engineers involved) to vehicle control, and from just 50 buttons to hundreds. This put the onus of operating all those buttons on the 4-5 person space crew, which in turn brought in speech recognition as an invaluable interface that makes good sense in such a complex environment.

Screenshot from George Salazar’s ACIxD presentation

Factors affecting ASR accuracy in Space

He described how they have tested different Speech Recognition (ASR) software to see which fared the best, both speaker-independent and speaker-dependent. As he noted, they all claim 99% accuracy officially but that is never the case in practice! He listed many factors that affect recognition accuracy, including:

  • background noise (speaker vs background signal separation)
  • multiple speakers speaking simultaneously (esp. in such a noisy environment)
  • foreign accent recognition (e.g. Dutch crew speaking English)
  • intraspeaker speech variation due to psychological factors (as being in space can, apparently, make you depressed, which in turn affects your voice!), but presumably also to physiological factors (e.g. just having a cold)
  • Astronaut gender (lower pitch in males vs higher pitch in females): the ASR software was designed around male voices, so male astronauts always had lower error rates!
  • The effects of microgravity on voice quality (physiological effects), observed as early as the first flight using templates from ground testing as the baseline, are impossible to separate from the environment and crew stress and can lead to a 10-30% increase in recognition errors (see the sketch below for how such error rates are typically measured)!
Screenshot from George Salazar’s ACIxD presentation

  • Even radiation can affect not only the ASR software but also the hardware (computing power). As a comparison, AMAZON Alexa uses huge computer farms, whereas in Space they rely on slow "radiation-hardened" processors: they can handle the radiation, but are actually 5-10 times slower than commercial processors!
Screenshot from George Salazar’s ACIxD presentation
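The talk didn’t specify how that error increase is quantified, but the standard measure is word error rate (WER): substitutions, deletions and insertions counted against a reference transcript, such as the ground-testing templates used as a baseline. A minimal sketch of my own, with made-up command phrases:

```python
# Illustrative only: word error rate (WER) via edit distance between a
# reference transcript and the recogniser's hypothesis.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1   # substitution cost
            dp[i][j] = min(dp[i - 1][j] + 1,              # deletion
                           dp[i][j - 1] + 1,              # insertion
                           dp[i - 1][j - 1] + cost)       # match / substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("tilt up and zoom in", "tilt up and zoom out"))  # 0.2 (one substitution in five words)
```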

Solutions to Space Challenges

To counter all these negative factors, a few different approaches and methodologies have been employed:

  • on-orbit retrain capability: rendering the system adaptive to changes in voice and background noise, resulting in up to 100% accuracy
  • macro-commanding: creating shortcuts to more complex commands (see the sketch below)
  • redundancy as fallback (i.e. pressing a button as a second modality)
Screenshot from George Salazar’s ACIxD presentation
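As a rough illustration of the macro-commanding idea mentioned above (my own sketch: the macro name and its expansion are invented, built from the individual camera commands quoted later in this post):

```python
# Hypothetical sketch: one short spoken shortcut expands into a longer sequence
# of individual commands, reducing both what the astronaut has to say and what
# the recogniser has to get right.

MACROS = {
    "camera sweep": ["pan left", "tilt up", "zoom in", "zoom out", "tilt down"],
}

def expand(command: str) -> list[str]:
    """Expand a macro into its underlying command sequence; pass anything else through unchanged."""
    return MACROS.get(command, [command])

for step in expand("camera sweep"):
    print("executing:", step)    # each step goes to the normal command handler
```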

Critical considerations

One of the challenges Mr Salazar mentioned in improving ASR accuracy is over-adaptation, i.e. skewing the system towards a single astronaut.

In addition, he mentioned the importance of Dialog Design in NASA’s human-centered design (HCD) development approach. The astronauts should always be able to provide feedback to the system, particularly for error correction (confusable commands lead to misrecognitions).

Screenshot from George Salazar’s ACIxD presentation
Screenshot from George Salazar’s ACIxD presentation
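The presentation didn’t detail NASA’s actual dialogue flows, but the error-correction feedback described above often boils down to confidence-gated confirmation: read back what was heard before acting on it whenever the recogniser isn’t sure. A minimal sketch (the threshold and commands are my own illustrative choices):

```python
# Illustrative only: decide whether to execute a recognised command outright
# or to confirm it with the user first, based on recognition confidence.

def handle(recognised: str, confidence: float, threshold: float = 0.6) -> str:
    """Execute high-confidence commands; ask for confirmation otherwise."""
    if confidence >= threshold:
        return f"EXECUTE: {recognised}"
    # A "no" to this question would send the dialogue into error recovery (re-prompt).
    return f'CONFIRM: Did you say "{recognised}"?'

print(handle("zoom in", confidence=0.91))   # EXECUTE: zoom in
print(handle("tilt up", confidence=0.38))   # CONFIRM: Did you say "tilt up"?
```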

In closing, Mr Salazar stressed that speech recognition for Command and Control in Space applications is viable, especially in the context of a small crew navigating a complex habitat.

Moreover, he underlined the importance of the trust that the ASR system needs to inspire in its users, as in this case the astronauts may literally be staking their lives on its performance and accuracy.

Screenshot from George Salazar’s ACIxD presentation

Q & A

After Mr Salazar’s presentation, I couldn’t help but pose a couple of questions to him, given that I consider myself to be a Space junkie (and not in the sci-fi franchise sense either!).

So, I asked him to give us a few examples of the type of astronaut utterances and commands that their ASR needs to be able to recognise. Below are some such phrases:

  • zoom in, zoom out
  • tilt up, tilt down
  • pan left
  • please repeat

and their synonyms. He also mentioned the case of one astronaut who kept saying “Wow!” (How do you deal with that!)

I asked whether the system ever had to deal with ambiguity in trying to determine which component to tilt, pan or zoom. He answered that, although they do carry out plenty of confusability studies, the context is quite deterministic: the astronaut selects the monitor by going to the monitor section and speaking the associated command. Thus, there is no real ambiguity as such.

Screenshot from George Salazar’s ACIxD presentation
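To see why that context removes the ambiguity, here is a small sketch of my own (the synonym table and the monitor-selection phrasing are hypothetical, not NASA’s actual command set): once a monitor has been selected, every subsequent camera command simply applies to it.

```python
# Hypothetical sketch: synonyms are normalised to canonical commands, and the
# currently selected monitor provides the context, so "zoom in" never needs
# disambiguating -- it acts on whatever is selected.

SYNONYMS = {"magnify": "zoom in", "enlarge": "zoom in", "say again": "please repeat"}

class CommandRouter:
    def __init__(self) -> None:
        self.selected_monitor = None

    def handle(self, utterance: str) -> str:
        text = utterance.strip().lower()
        command = SYNONYMS.get(text, text)            # normalise synonyms to canonical commands
        if command.startswith("select monitor"):
            self.selected_monitor = command.split()[-1]
            return f"monitor {self.selected_monitor} selected"
        if self.selected_monitor is None:
            return "no monitor selected yet"
        return f"{command} on monitor {self.selected_monitor}"

router = CommandRouter()
print(router.handle("select monitor 2"))   # monitor 2 selected
print(router.handle("magnify"))            # zoom in on monitor 2
```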

My second question to Mr Salazar was about the type of ASR they have gone for. I understood that the vocabulary is small and contained / unambiguous, but wasn’t sure whether they went for speaker-dependent or speaker-independent recognition in the end. He replied that the standard now is speaker-independent ASR, which however has been adapted to a small group of astronauts (i.e. “group-dependent“). Hence, all the challenges of distinguishing between different speakers with different pitch and accents, all against the background noise and the radiation and microgravity effects! They must be really busy!

It was a great pleasure to listen to the talk and an incredible and rare honour to get to speak with such an awe-inspiring pioneer in Space Engineering!

Ad Astra!

Human-Machine Interaction in Translation (NLPCS 2011)

21 Aug

For a few years now I have been on the Programme Committee of the International Workshop on Natural Language Processing and Cognitive Science (NLPCS), organised by a long-time colleague and friend, Dr. Bernadette Sharp from Staffordshire University. The aim of this annual workshop is “to bring together researchers and practitioners in Natural Language Processing (NLP) working within the paradigm of Cognitive Science (CS)”.

“The overall emphasis of the workshop is on the contribution of cognitive science to language processing, including conceptualisation, representation, discourse processing, meaning construction, ontology building, and text mining.”

There have been NLPCS Workshops in Porto (2004), Miami (2005), Paphos (2006), Funchal (2007), Barcelona (2008), Milan (2009) and Funchal (2010).

Copenhagen Business School

This year’s 8th International NLPCS Workshop just took place this weekend in Copenhagen, Denmark (20-21 Aug 2011). The Workshop topic was “Human-Machine Interaction in Translation”, focussing on all aspects of human and machine translation, and human-computer interaction in translation, including: translators’ experiences with CAT tools, human-machine interface design, evaluation of interactive machine translation, user simulation and human factors. Thus, the topics were approached from a number of different perspectives:

  • from full automation by machines for machines (traditional NLP or HLT),
  • to semi-automated processing, i.e. machine-mediated processing (programs assisting people in their tasks),
  • but also the simulation of human cognitive processes

I had the opportunity once again to review a few of the paper submissions and can therefore highly recommend reading the full Proceedings of the NLPCS 2011 Workshop that have just been made available.

I found the following 3 contributions particularly interesting:

  • Valitutti, A. “How Many Jokes are Really Funny? A New Approach to the Evaluation of Computational Humour Generators”
  • Nilsson, M. and J. Nivre. “Entropy-Driven Evaluation of Models of Eye Movement Control in Reading” 

and

  • Finch, A., Song, W., Tanaka-Ishii, K. and E. Sumita. “Source Language Generation from Pictures for Machine Translation on Mobile Devices”

Enjoy!

FutureEverything 2011 – The Future is now (here in Manchester!)

12 May

Today saw the launch of the very interdisciplinary (some would say "transdisciplinary" even) FutureEverything Festival (previously Futuresonic), a long-running and world-renowned annual Conference and Festival of Technology and Innovation, Art and Music, running from the 11th to the 14th May in Manchester, UK (@FuturEverything #futr). Apart from the annual May events,

FutureEverything creates year-round Digital Innovation projects that combine creativity, participation and new technologies to deliver elegant business and research solutions.   In 2010 we launched the FutureEverything Award, an international prize for artworks, social innovations or software and technology projects that bring the future into the present.

I have always made a point to attend at least one music or art event every year since 2007 (when the Festival was still called Futuresonic), and I have always been particularly interested in the forward-thinking Digital Technologies Conference. So I was over the moon when I was invited to participate in the Conference and informally share my words of wisdom on speech and language technologies for emotional computing. Armed with my complimentary Festival Pass, I am now really looking forward to 2 days (Thu 12 – Fri 13 May 2011) packed with presentations, discussions and debates on: Urban Games and Virtual Identities, Robots and Smart Cities, open data and participatory democracy, community-serving Geeks and Hackers, Open source software and citizen inclusion, and, one of my favourites, emotional computing: making human-computer interfaces personable, engaging and persuasive, and interaction with them more intuitive and even fun.

The FutureEverything Conference is brainstorming on a massive scale. Combined with all the live Twitter updates and feeds, it is once again going to have a viral impact worldwide with the novel, brave and infectious ideas that will be coming out of it and around it. At the same time, the use of dynamic and democratic microblogging will allow massive participation in the Conference by people on both sides of the Atlantic who are not physically present but are still listening and virtually and remotely contributing their feedback and ideas. In fact, the FutureEverything Festival and Conference are quintessential instantiations of the perfect balance of online and offline, virtual and real, local and remote, one-to-many / many-to-one broadcasting. And I’m right in the middle of this awesome time-space continuum (May 2011 in Manchester, UK)! 🙂

Update (Sun 15 May):

There is now a FutureEverything Festival Portal with a compilation of blog posts, photos, audio, video and more related to the 4 days of the Festival and Conference. Check it out here: http://www.fe-2011.org/

I will also be adding my feedback on what I heard at the Conference in the next couple of days.