Archive | Voice System Tuning & Optimisation

How to Design & Optimise Conversational AI Dialogue Systems

26 Jun
Data Futurology EP122 with Dr Maria Aretoulaki

The latest episode of Data Futurology features me talking with Felipe Flores about how I got into AI, Machine Learning and Deep Learning (“When I discovered Artificial Neural Networks back in 1993, I thought I had discovered God!”), and later Data Science and Big Data Speech Analytics for Voice User Interface (VUI) and Voice First Design (for Speech IVRs and Voice Assistants).

In the podcast, I give an overview of all the different steps involved in VUI Design, the often tricky interaction with the different stakeholders (Coders, Business Units, Marketeers) and the challenges of working in an Agile PM framework, where skipping detailed Voice Design is suicide (and my worst nightmare!). I showcase some of the interesting and outright funny things you discover when you analyse real-world human-machine dialogues taken from Speech IVR systems (hint: the most original swearwords!) and I pinpoint the main differences between a Speech IVR and a Voice Assistant skill / action / capsule. I also contemplate where Voice is heading now with the ubiquity of Voice First and the prevalence of skills that were developed by software engineers, web designers and Marketing copywriters, without VUI Design expertise, Linguistics training or knowledge of how Speech Recognition works.

Data Futurology Podcast EP122 – Optimising Conversational AI Dialogue Systems with Dr. Maria Aretoulaki

I even provide tips on how to survive working for a Start-up and strategies on how to stay focused and strong when you run your own SME Business. Below is a brief checklist of some of the core qualities required and lessons I have learned working for Start-ups:

[Image: checklist of core qualities and lessons learned working for Start-ups]

You can listen to the whole episode on Data Futurology, on Apple Podcasts, on Google Podcasts, on Spotify, or wherever you get your Podcasts, or you can watch it on YouTube:

#122 – Optimising Conversational AI Dialogue Systems with Dr. Maria Aretoulaki

It was a delight speaking to Felipe! It made me think about and consolidate some of the – occasionally hard – lessons I have learned along the way about Voice Design, NLP, AI, ML and running a Business, so that others can hopefully prepare for and ideally avoid any heartaches!


The immortality of Data

10 Jun
Boundless Podcast Episode EP48

The latest Boundless Podcast Episode is out! It features me in a deep conversation with Richard Foster-Fletcher about Big Data and Speech Analytics for Voice First & Voice Assistants (but not only), BigTech, AI Ethics and the need for a new legal framework for Responsible AI and Explainable Machine Learning. Below is a snippet of the conversation:

Boundless Podcast Episode EP48 – snippet

“My data is being used and exploited, and I can do nothing about it. We need to modernise the legal system. Apart from all the ethical, moral discussions that need to be made, we need a legal system that takes into consideration the fact that intelligence doesn’t need to be visible to be acting against me.”

I wouldn’t call myself a technophobe. Quite the opposite. I was learning how to program (BASIC!) back in 1986 in the Summer after getting my degree in English Language & Literature; I was teaching computers how to translate between human languages and chatting online with mainframes in University buildings 2 kms away back in 1991; I was programming Artificial Neural Networks and using Parallel computers back in 1993; I was reading internet newsgroups and downloading music – very much legally! – on a web browser (Netscape!) in 1994; I was designing Voice interfaces already in 1996 and voice-controlled home assistants back in 1998. I have even been using LinkedIn since 2005.

Yet, I am very sceptical and pessimistic about our uninhibited sharing of personal data and sensitive information all day every day on multiple channels, to multiple audiences, much of it willingly, much more unwillingly, in the name of sharing and human connection, service and product personalisation and ultimately, far too often, monetisation.

What will that mean for our legacy as individuals? Who will own, control and curate all the generated data after our death? Who can clone us and why? Will there be a second life after death? Will there ever be freedom from a dictator or will there ever be any point in bequeathing anything in a will? These and many more questions are discussed in this episode. I had lots of fun recording this! Thank you so much to Richard for creating this🙏

You can listen to a snippet of the conversation here (Download)


You can listen to the full episode here or wherever you get your podcasts.

Alternatively, you can listen to it on YouTube (in 2 parts):

Part 1

Boundless EP48 – Part 1

And Part 2

Boundless EP48 – Part 2

#BigData #BigTech #SpeechAnalytics #VoiceFirst #VoiceAssistants #Alexaskills #GoogleAssistant #Bixby #AIethics #responsibleAI #explainableVUI #AI #ArtificialIntelligence #MachineLearning #ML #DeepLearning #ANNs

My baby, DialogCONNECTION, is 11!

4 Dec

This week, my company, DialogCONNECTION Limited, turned 11 years old! 🎉 🥂 😁

It feels like yesterday, when in December 2008 I registered it with Companies House and became Company Director (with multiple hats).

My very first client project was for the NHS Business Authority on their EHIC Helpline (which hopefully will survive the Brexit negotiations). Back then, whenever I was telling anyone what my company does (VUI Design for Speech IVRs), I was greeted by blank stares of confusion or incomprehension. It did feel a bit lonely at times!

Many more clients and thousands of long hours, long days and working weekends since, here we are in December 2019 and I suddenly find myself surrounded by VUI Designers and Voice Strategists who have now seen the potential and inescapable nature of speech interfaces and have followed in my footsteps. I feel vindicated, especially since I started in Voice back in 1996 with my Post-Doc in Spoken Dialogue Management at the University of Erlangen! 😎 (Yet another thing I’m hugely grateful to the EU for!)

We started with Voice-First VUI Design back in 1996, well before Samsung’s BIXBY (2017), Google’s ASSISTANT (2016), Amazon’s ALEXA (2014), Apple’s SIRI (2010) and even before the world started using GOOGLE for internet searches (1998)!

http://dialogconnection.com/who-designs-for-you.html

It’s quite frustrating when I realise that many of these newcomers have never heard of an IVR (Interactive Voice Response) system before, but they will eventually learn. 🤓 For the past 25 years it was the developers who insisted they could design conversational interfaces without any (Computational) Linguistics, Natural Language Processing (NLP) or Speech Recognition (ASR) background, and therefore didn’t need a VUI Designer; we were considered a superfluous luxury and a rarity in those days. In the past couple of years it’s the shiny Marketing people, who make a living from their language mastery, and the edgy GUI Designers, who excel in visual design and think they can design voice interfaces too, but still know nothing about NLP or ASR.

What they don’t know is that, by modifying, for instance, just the wording of what your system says (prompt tuning), you can achieve dramatically better speech recognition and NLU accuracy, because the user is covertly “guided” to say what we expect (and have covered in the grammar). The same holds for tuned grammars (for out-of-vocabulary words), word pronunciations (for local and foreign accents), tuned VUI designs (for error recovery strategies) and tuned ASR engine parameters (for timeouts and barge-ins). It’s all about knowing how the ASR software and our human brain’s language software work.
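To make these tuning levers a little more concrete, here is a minimal, purely illustrative sketch in Python. The parameter, grammar and lexicon names are my own invention and do not correspond to any particular ASR vendor’s API; the point is simply how prompt wording, grammar coverage, pronunciations, timeouts, barge-in and error-recovery prompts all sit side by side in a single dialogue state.

```python
# Purely illustrative sketch of typical IVR tuning levers.
# All names below are hypothetical, not any specific vendor's API.

payment_menu_state = {
    # Prompt tuning: an open "How can I help?" invites out-of-grammar replies;
    # a directed prompt covertly guides callers towards phrases the grammar covers.
    "prompt": "Would you like to pay a bill, or check your balance?",

    # Grammar tuning: list the expected phrases plus likely variants,
    # so fewer caller utterances fall out-of-vocabulary.
    "grammar": [
        "pay a bill", "pay my bill", "make a payment",
        "check my balance", "balance please",
    ],

    # Pronunciation tuning: extra lexicon entries for local and foreign accents
    # (the phone strings here are made up for illustration).
    "lexicon": {"balance": ["b ae l ah n s", "b aa l an s"]},

    # ASR engine parameters: how long to wait for speech to start and end,
    # and whether callers may interrupt (barge in on) the prompt.
    "no_input_timeout_ms": 5000,
    "end_of_speech_timeout_ms": 800,
    "barge_in": True,

    # VUI design tuning: an error-recovery strategy that escalates to a more
    # directed re-prompt instead of repeating the original prompt verbatim.
    "reprompt": "Sorry, I didn't catch that. Please say 'pay a bill' or 'check my balance'.",
}
```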

Excited to see what the next decade is going to bring for DialogCONNECTION and the next quarter of a century for Voice! Stay tuned!

Voice control in Space!

20 Nov

I recently attended The Association for Conversational Interaction Design (ACIxD) Brown Bag “Challenges of Implementing Voice Control for Space Applications”, presented by the NASA authority in the field, George Salazar. George Salazar is Human Computing Interface Technical Discipline Lead at NASA, with over 30 years of experience and innovation in Space applications. Among a long list of achievements, he was involved in the development of the International Space Station internal audio system and has received several awards, including a John F. Kennedy Astronautics Award, a NASA Silver Achievement Medal and a Lifetime Achievement Award for his service and commitment to STEM. His acceptance speech for that last one brought tears to my eyes! An incredibly knowledgeable and experienced man with astounding modesty and willingness to pass his knowledge and passion on to younger generations.

George Salazar’s Acceptance Speech

Back to Voice Recognition.

Mr Salazar explained how space missions slowly migrated over the years from ground control (with dozens of engineers involved) to vehicle control, and from just 50 buttons to hundreds. This put the onus of operating all those buttons on the 4-5-person space crew, which in turn brought in speech recognition as an invaluable interface that makes good sense in such a complex environment.

Screenshot from George Salazar’s ACIxD presentation

Factors affecting ASR accuracy in Space

He described how they have tested different Speech Recognition (ASR) software to see which fared the best, both speaker-independent and speaker-dependent. As he noted, they all claim 99% accuracy officially but that is never the case in practice! He listed many factors that affect recognition accuracy, including:

  • background noise (speaker vs background signal separation)
  • multiple speakers speaking simultaneously (esp. in such a noisy environment)
  • foreign accent recognition (e.g. Dutch crew speaking English)
  • intraspeaker speech variation due to psychological factors (as being in space can, apparently, make you depressed, which in turn affects your voice!), but presumably also to physiological factors (e.g. just having a cold)
  • Astronaut gender (low pitch in males vs high pitch in females): the ASR software was designed for male voices, so male astronauts always had lower error rates!
  • The effects of microgravity (physiological effects) on the voice quality, as already observed on the first flight (using templates from ground testing as the baseline), are impossible to separate from the environment and crew stress and can lead to a 10-30% error increase!
Screenshot from George Salazar’s ACIxD presentation

  • Even radiation can affect not just the ASR software but also the hardware (computing power). As a comparison, AMAZON Alexa uses huge computer farms, whereas in Space they rely on slow “radiation-hardened” processors: they can handle the radiation, but are actually 5-10 times slower than commercial processors!
Screenshot from George Salazar’s ACIxD presentation

Solutions to Space Challenges

To counter all these negative factors, a few different approaches and methodologies have been employed:

  • on-orbit retrain capability: rendering the system adaptive to changes in voice and background noise, resulting in up to 100% accuracy
  • macro-commanding: creating shortcuts to more complex commands
  • redundancy as a fallback (i.e. pressing a button as a second modality) – both of these are sketched below
Screenshot from George Salazar’s ACIxD presentation
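As a rough illustration of those last two ideas (my own sketch with invented command names, not NASA’s actual software), macro-commanding and the button fallback could look something like this:

```python
# Illustrative sketch only: one spoken shortcut ("macro") expands into several
# low-level commands, and a hard button acts as the redundant second modality.
from typing import List, Optional

MACROS = {
    # One short spoken command expands into a sequence of low-level operations.
    "camera sweep": ["select camera two", "pan left", "tilt up", "zoom in"],
}

def expand(command: str) -> List[str]:
    """Expand a macro if one exists, otherwise pass the command through."""
    return MACROS.get(command, [command])

def handle(asr_result: Optional[str], button_pressed: Optional[str]) -> List[str]:
    # Redundancy as a fallback: if recognition fails (None), fall back to the
    # hard-button modality so the crew is never blocked by a misrecognition.
    if asr_result is not None:
        return expand(asr_result)
    if button_pressed is not None:
        return [button_pressed]
    return []

print(handle("camera sweep", None))  # ['select camera two', 'pan left', 'tilt up', 'zoom in']
print(handle(None, "tilt up"))       # ['tilt up'] via the button fallback
```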

Critical considerations

One of the challenges that Mr Salazar mentioned in improving ASR accuracy is overadaptation, i.e. skewing the system to the voice of a single astronaut.

In addition, he mentioned the importance of Dialog Design in NASA’s human-centered design (HCD) Development approach. The astronauts should always be able to provide feedback to the system, particularly for error correction (Confusability leads to misrecognitions).

Screenshot from George Salazar’s ACIxD presentation
Screenshot from George Salazar’s ACIxD presentation

In closing, Mr Salazar stressed that speech recognition for Command and Control in Space applications is viable, especially in the context of a small crew navigating a complex habitat.

Moreover, he underlined the importance of the trust that the ASR system needs to inspire in its users, as in this case the astronauts may literally be staking their lives on its performance and accuracy.

Screenshot from George Salazar’s ACIxD presentation

Q & A

After Mr Salazar’s presentation, I couldn’t help but pose a couple of questions to him, given that I consider myself to be a Space junkie (and not in the sci-fi franchise sense either!).

So, I asked him to give us a few examples of the type of astronaut utterances and commands that their ASR needs to be able to recognise. Below are some such phrases:

  • zoom in, zoom out
  • tilt up, tilt down
  • pan left
  • please repeat

and their synonyms. He also mentioned the case of one astronaut who kept saying “Wow!” (How do you deal with that?!)

I asked whether the system ever had to deal with ambiguity in trying to determine which component to tilt, pan or zoom. He answered that, although they do carry out plenty of confusability studies, the context is quite deterministic: the astronaut selects the monitor by going to the monitor section and speaking the associated command. Thus, there is no real ambiguity as such.
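To give a feel for how small and deterministic such a command vocabulary can be, here is an illustrative toy grammar of my own (the extra synonyms are invented, and this is certainly not NASA’s actual grammar), where the currently selected monitor supplies the context that makes each phrase unambiguous:

```python
# Toy version of a small, contained command vocabulary with synonyms.
# The selected monitor is the context that removes any ambiguity.
from typing import Optional

SYNONYMS = {
    "zoom in": ["zoom in", "zoom closer"],
    "zoom out": ["zoom out", "zoom back"],
    "tilt up": ["tilt up"],
    "tilt down": ["tilt down"],
    "pan left": ["pan left"],
    "please repeat": ["please repeat", "say again"],
}

# Invert the synonym table into a flat phrase -> canonical-command lookup.
PHRASE_TO_COMMAND = {
    phrase: command
    for command, phrases in SYNONYMS.items()
    for phrase in phrases
}

def interpret(utterance: str, selected_monitor: str) -> Optional[str]:
    """Map a recognised phrase to an action on the currently selected monitor."""
    command = PHRASE_TO_COMMAND.get(utterance.lower().strip())
    if command is None:
        return None  # out of vocabulary, e.g. an astronaut exclaiming "Wow!"
    return f"{command} on {selected_monitor}"

print(interpret("zoom closer", "monitor 2"))  # zoom in on monitor 2
print(interpret("Wow!", "monitor 2"))         # None
```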

Screenshot from George Salazar’s ACIxD presentation

My second question to Mr Salazar was about the type of ASR they have gone for. I understood that the vocabulary is small and contained / unambiguous, but wasn’t sure whether they went for speaker-dependent or speaker-independent recognition in the end. He replied that the standard now is speaker-independent ASR, which however has been adapted to a small group of astronauts (i.e. “group-dependent“). Hence, all the challenges of distinguishing between different speakers with different pitch and accents, all against the background noise and the radiation and microgravity effects! They must be really busy!

It was a great pleasure to listen to the talk and an incredible and rare honour to get to speak with such an awe-inspiring pioneer in Space Engineering!

Ad Astra!

UBIQUITOUS VOICE: Essays from the Field now on Kindle!

14 Oct

In 2018, a new book on “Voice First” came out on Amazon and I was proud and deeply honoured, as it includes one of my articles! Now it has come out on Kindle as an e-Book and we are even more excited at the prospect of a much wider reach!

“Ubiquitous Voice: Essays from the Field”: Thoughts, insights and anecdotes on Speech Recognition, Voice User Interfaces, Voice Assistants, Conversational Intelligence, VUI Design, Voice UX issues, solutions, Best practices and visions from the veterans!

I have been part of this effort since its inception, working alongside some of the pioneers in the field who now represent the Market Leaders (GOOGLE, AMAZON, NUANCE, SAMSUNG VIV…). Excellent job by our tireless and intrepid Editor, Lisa Falkson!

My contribution “Convenience + Security = Trust: Do you trust your Intelligent Assistant?” is on data privacy concerns and social issues associated with the widespread adoption of voice activation. It is thus platform-, ASR-, vendor- and company-agnostic.

You can get the physical book here and the Kindle version here.

Prepare to be enlightened, guided and inspired!

Amazon Alexa Developers Conference (10 Oct, London)

7 Oct

Exciting times for Amazon Alexa!


Amazon Echo

  • Alexa has crossed over 3,000 skills in the US Skill Store, up from 1,000 in June;
  • Alexa has just started shipping in the UK and is coming soon to Germany too. Echo and Echo Dot were just made available in the UK, whereas in Germany they are available by invitation for those who want to help shape Alexa as she evolves—the devices will start shipping next month.
  • Amazon has also announced a new Echo Dot, enabled with the new Echo Spatial Perception (ESP), which allows the devices to determine which one the user is talking to (meaning that only one will respond when multiple devices hear the user). The Dot will increase the number of devices Alexa can talk to in the home, creating an innovative customer experience. It will retail for £49.99 and the Echo for £149.99.

Here are 2 neat little YouTube videos showing Alexa in action.

 

 

In this context, Amazon is bringing their Alexa training events to Europe in October. Hello Alexa London is on Monday 10th October. 

  • Developers, engineers, QA/testers and anyone who wants to learn how to build skills can participate in the full-day agenda from 8:30am – 4:30pm (+ Happy Hour afterwards!)
  • Business development, strategy and planning, VUX/UX/VUI, account teams, producers and project management professionals can participate in the Alexa Business Development for Agencies session later in the day from 3pm – 4:30pm (and then of course join the Happy Hour!). They can also join the breakfast session, Welcome to Alexa (a 45-minute keynote) and Hello Alexa (a 1-hour session on the basics of creating a skill: what goes into the design, build, test, publish) from 8:30am – 10:45am.
  • Click here to register (although the event is already sold out by now!) and we hope to see you there!

I am really excited at how ubiquitous speech recognition is becoming! It was already ubiquitous as we were dragging it around on our smartphones (Apple SIRI, Google Now), but now it’s penetrated our homes too, crossing over work/personal/family lives. The future is omni-channel but unimodal?!

Award to our METALOGUE Presentation Trainer!

26 Sep

Fantastic news!

Our EU project METALOGUE has won the EC-TEL 2014 Technology-Enhanced Learning Best Demo Award! Our METALOGUE Partners at the Dutch Open University (OUNL) demonstrated and won the audience over with their Presentation Trainer, a public speaking instructor which tracks and analyses the user’s body posture and movements, speaking cadence and voice volume, and provides instructional feedback on their non-verbal communication skills (sensor-based learning). Congratulations to our OUNL partners!

EC-TEL 2014 Best Demo Award

EU FP7 logo

Develop your own Android voice app!

26 Dec

Voice application Development for Android

My colleague Michael F. McTear has a new and very topical book out: Voice Application Development for Android, co-authored with Zoraida Callejas. Apart from a hands-on, step-by-step yet condensed guide to voice application development, you also get the source code to develop your own Android apps for free!

Get the book here or through Amazon. And have a look at the source code here.

Exciting times ahead for do-it-yourself Android speech app development!

The AVIxD 49 VUI Tips in 45 Minutes!

6 Nov


The illustrious Association for Voice Interaction Design (AVIxD) organised a Workshop in the context of SpeechTEK in August 2010, whose goal was “to provide VUI designers with as many tips as possible during the session“. Initially the goal was 30 Tips in 45 minutes, but they got overexcited and came up with a whopping 49 Tips in the end! The Session was moderated by Jenni McKienzie, and the panelists were David Attwater, Jon Bloom, Karen Kaushansky, and Julie Underdahl. The list dates back 3 years now, but it’s by no means outdated. It is the soundest advice you will find on designing better voice recognition IVRs, and I hated it being buried in a PDF!

So I am audaciously plagiarising and bringing them to you here: the 49 VUI Tips for Better Voice User Interface Design! Or go and read the PDF yourselves here:

[Images: the 49 VUI Tips, reproduced from the AVIxD workshop PDF]

 

Have you got a VUI Tip you can’t find in this list that you’d like to share? Tell us here!

 

TEDxManchester 2012: Voice Recognition FTW!

12 Sep

After the extensive TEDxSalford report, and the TEDxManchester Best-of, it’s about time I posted the YouTube video of my TEDxManchester talk!

TEDxManchester took place on Monday 13th February this year at one of the iconic Manchester locations – and my “local” – the Cornerhouse. Among the luminary speakers were people I have always admired, such as the radio Goddess Mary Anne Hobbs, and people I have become very close friends with over the years – which has led to an equal amount of admiration – such as Ian Forrester (@cubicgarden to most of us). You can check out their respective talks, as well as some awesome others, in my TEDxManchester report below.

My TEDxManchester talk

I spoke about the weird and wonderful world of Voice Recognition (“Voice Recognition FTW!”): from the inaccurate – and far too often funny – simple voice-to-text apps and dictation systems on your smartphones, to the most frustrating automated Call Centres, to the next-generation, sophisticated SIRI, and everything in-between. I explained why things go wrong and when things can go wonderfully right. The answer is “CONTEXT”: the more of it you have, the more accurate and relevant the interpretation of the user’s intention will be, and the more relevant and impressive the system’s reaction / reply will be.
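To illustrate the point with a toy example of my own (not one from the talk): the same ambiguous utterance gets a vague reading without any context, and a precise, useful one once dialogue state and location are taken into account.

```python
# Toy illustration of "the more context, the better the interpretation".
# The dialogue context and example values are invented for illustration.

def interpret(utterance: str, context: dict) -> str:
    if utterance == "the next one":
        if not context:
            # Without any context, the best we can do is a vague guess.
            return "unclear: the next what?"
        if context.get("last_prompt") == "Which train would you like?":
            # With dialogue context, the same words resolve to a concrete intent.
            return f"book the next train from {context.get('location', 'here')}"
    return "unknown intent"

print(interpret("the next one", {}))
print(interpret("the next one",
                {"last_prompt": "Which train would you like?",
                 "location": "Manchester Piccadilly"}))
```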

Here is finally my TEDxManchester video on YouTube.

And below are my TEDxManchester slides.