
The voice-activated lift won’t do Scottish! (Burnistoun S1E1 – ELEVEN!)

28 Jul

Voice recognition technology? …  In a lift? … In Scotland? … You ever TRIED voice recognition technology? It don’t do Scottish accents!

Today I found this little gem on YouTube and I thought I must share it: apart from being hilarious, it says a thing or two about speech recognition and speech-activated applications. It’s all based on the urban myth that speech recognisers cannot understand regional accents, such as Scottish and Irish.

Scottish Elevator – Voice Recognition – ELEVEN!

(YouTube – Burnistoun – Series 1 , Episode 1 [ Part 1/3 ])

What? No Buttons?!

These two Scottish guys enter a lift somewhere in Scotland and find that there are no buttons for the floor selection, so they quickly realise it’s a “voice-activated elevator”, as the system calls itself. They want to go to the 11th floor and they first pronounce it the Scottish way:

/eh leh ven/

That doesn’t seem to work at all.

“You need to try an American accent”, says one of them, so they try to mimic one, sadly very unsuccessfully:

/ee leh ven/

Then they try a quite funny, Cockney-like English accent:

/ä leh ven/

to no avail.

VUI Sin No. 1: Being condescending to your users

The system prompts them to “Please speak slowly and clearly”, which is exactly what they had been doing all along! Instead, it should have said something along the lines of “I’m afraid I didn’t get that. Let’s try again.” and later “I’m really sorry, but I don’t seem to understand what you’re saying. Maybe you would like to try one more time?”. Of course, not having any buttons in the lift means that these guys could be stuck in there forever! That’s another fatal usability error: both modalities, speech and button presses, should have been allowed, to cater for different user groups (easy accents, tricky accents) and different use contexts (people who have got their hands full with carrier bags vs people who can press a button!).
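To make the idea of polite, escalating reprompts a bit more concrete, here is a minimal sketch in Python. It is only an illustration: the function name `next_prompt`, the fallback wording and the assumption that the lift also has buttons to fall back on are all mine, not taken from any real lift or speech platform.

```python
# Reprompt wording as suggested in the post; the fallback message is an
# assumption and only makes sense if the lift has buttons as well.
REPROMPTS = [
    "I'm afraid I didn't get that. Let's try again.",
    "I'm really sorry, but I don't seem to understand what you're saying. "
    "Maybe you would like to try one more time?",
]
FALLBACK = "Please use the buttons to choose your floor."


def next_prompt(failed_attempts: int) -> str:
    """Return the message to play after this many consecutive failed attempts."""
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be at least 1")
    if failed_attempts <= len(REPROMPTS):
        # Each failure gets its own, increasingly apologetic wording.
        return REPROMPTS[failed_attempts - 1]
    # Out of reprompts: hand over to the other modality instead of looping.
    return FALLBACK
```

The point of the design is that the system takes the blame for each failure and, after a couple of attempts, offers a way out rather than repeating the same instruction forever.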

I’m gonna teach you a lesson!

One of them tries to teach the system the Scottish accent: “I keep saying it until she understands Scottish!”, a very reasonable expectation, which would work particularly well with a speaker-dependent dictation system of the kind you’ve got on your PC, laptop or hand-held device. This speaker-independent one (‘cos you can’t really have your personal lift in each building you enter!) will take a bit more time to learn anything from a single conversation! Someone has to analyse the recordings, their transcriptions and semantic interpretations, compare what the system understood with what the user actually said, and use those observations to tune the whole system. We are talking at least a week in most cases. They would die of dehydration and starvation by then!
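The "compare what the system understood with what the user actually said" step can be sketched roughly as follows. This is a toy example under my own assumptions: the log format (pairs of human transcription and recogniser result), the function names and the 20% threshold are all hypothetical, and real tuning tools look at much more than whole-utterance matches.

```python
from collections import defaultdict


def misrecognition_rates(log):
    """Compute, per transcribed utterance, the share of wrong recognitions.

    `log` is an iterable of (said, heard) pairs, where `said` is the human
    transcription and `heard` is what the recogniser returned (None if it
    rejected the utterance). This log format is an assumption.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for said, heard in log:
        totals[said] += 1
        if heard != said:
            errors[said] += 1
    return {said: errors[said] / totals[said] for said in totals}


def words_to_tune(log, threshold=0.2):
    """List the utterances whose error rate exceeds the (arbitrary) threshold."""
    rates = misrecognition_rates(log)
    return sorted(said for said, rate in rates.items() if rate > threshold)


# Made-up sample data for illustration only.
sample_log = [
    ("eleven", "seven"),
    ("eleven", None),
    ("eleven", "eleven"),
    ("two", "two"),
    ("two", "two"),
]
print(words_to_tune(sample_log))
```

Even in this simplified form you can see why it takes time: somebody has to produce the `said` column by listening to the recordings first.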

VUI Sin No.2: Patronising your users until they explode

After a while, the system makes it worse by saying what no system should ever dare say to a user’s face: “Please state which floor you would like to go to in a clear and calm manner.” Patronising or what! The guys’ reaction is not surprising: “Why is it telling people to be calm?! .. cos Scottish people would be going out for MONTHS at it!”.

Well, that’s not actually true. These days off-the-shelf speech recognition software is optimised to work with most major accents in a language, yes, including Glaswegian! Millions of real-world utterances spoken by thousands of people with all possible accents in a language (and the same goes for many different languages) are used to statistically train the recognition software to work equally well with most of them, most of the time. These utterances are collected from applications that are already live and running somewhere in the world for the corresponding language. The more real-world data available, the better the software can be tuned and the more accurate the recognition of “weird” pronunciations will be, even straight out of the box.

VUI Best Practice: Tune your application to cater for YOUR user population

An additional safeguarding and optimising technique is tuning the pronunciations for a specific speech recognition application. So when you already know that your system will be deployed in Scotland, you’d better add the Scottish pronunciation for each word explicitly to the recognition lexicon. This includes manually adding /eh leh ven/, as the standard /ee leh ven/ pronunciation is not likely to work very well. Given that applications are usually restricted to a specific domain anyway (selecting floors in a lift, getting your bank account balance, choosing departure and arrival train times etc.), this only needs to be done for the core words and phrases in your application, rather than the whole English, French, or Farsi language! So do not despair, there’s hope for freedom (of speech) even for the Scottish! 🙂
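As a rough illustration of what "adding a pronunciation to the lexicon" means, here is a toy sketch in Python. It uses the informal spellings from this post rather than a proper phoneme alphabet, and the names `LEXICON`, `build_lookup` and `recognise` are made up for the example. A real recogniser matches audio statistically against pronunciations; it does not do an exact string lookup like this.

```python
# Toy recognition lexicon: each word lists every pronunciation we accept.
# The second entry is the manually added Scottish variant.
LEXICON = {
    "eleven": ["ee leh ven", "eh leh ven"],
}


def build_lookup(lexicon):
    """Invert the lexicon into a pronunciation -> word table."""
    lookup = {}
    for word, pronunciations in lexicon.items():
        for pronunciation in pronunciations:
            # Normalise whitespace so "eh  leh ven" and "eh leh ven" match.
            lookup[" ".join(pronunciation.split())] = word
    return lookup


def recognise(pronunciation, lookup):
    """Return the matching word, or None if the pronunciation is unknown."""
    return lookup.get(" ".join(pronunciation.split()))


lookup = build_lookup(LEXICON)
print(recognise("eh leh ven", lookup))  # matched thanks to the added variant
print(recognise("ä leh ven", lookup))   # not in the lexicon
```

Because the application only deals with floor numbers, the lexicon stays tiny, which is exactly why this kind of manual tuning is feasible.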

For a full transcript of the video, check out EnglishCentral.


The Loneliness of the long-distance … VUI Designer!

13 Jun

On Friday 11th June, I took part in the “Pathways” event organised annually by the University of Manchester Career Service to support PhD researchers as well as research staff in “making career choices, exploring future plans and discovering the breadth of opportunities available to them”. I was Guest Panellist at 3 different Sessions:

  1. Opportunities for Engineering and Physical Sciences
  2. Working as a Freelancer or Consultant and
  3. Enterprise, Entrepreneurship and Business Start Up


As a University of Manchester graduate (well, technically UMIST), I felt compelled to take part in those Question and Answer panels in order to give some insight into how a career can develop: from a Bachelors in English & Linguistics in Greece, to a Masters of Science in Machine Translation and a Doctorate in Automatic Text Summarisation in the UK, to a Post-Doctoral Fellowship in Spoken Dialogue Management and a position as a Research Project Manager in Germany, to working in Industry both as a full-time employee and as an external contractor as a Voice User Interface (VUI) Designer in Germany, the UK, Switzerland, the US and further afield. It’s been a fascinating journey for sure! And I probably would never have arrived where I am now, if I hadn’t done those degrees or taken up those jobs in those specific places.

Have a look at the Guest Speaker profiles, including mine (p. 24), here: http://www.careers.manchester.ac.uk/media/media,172749,en.pdf

Some very inspiring career journeys!

I have to say I have thoroughly enjoyed the whole journey, the projects I have worked on, the people I have met on the way, the different organisational cultures I had the chance to experience. Plus, I wouldn’t change what I do now for the world! I love working as an external contractor and coming in to design speech self-service systems and voice-to-text services from scratch, or optimise existing ones, and the whole development, testing and tuning cycles:

  • writing functional specification documents
  • defining the system persona
  • drawing call flows
  • crafting system messages and coaching voice talents for the recordings
  • writing speech recognition grammars and pronunciations
  • devising and carrying out Wizard-of-Oz tests and Usability tests (including recording test subjects on video and interviewing them afterwards!)
  • transcribing and analysing phone calls
  • writing tuning reports

Everything is a lot of fun! It’s also great to be bringing the same VUI Design processes and skills to different organisations and projects, and to get to work in different places in the world at any one time! I love the variety of work and location, as well as the flexibility to work anytime and from anywhere! (Yes, working on your laptop – iPad soon – from a beach in the Caribbean is no longer a daydream but a realistic plan! :))


Okay, it does get lonely. No gossiping in the kitchen during coffee breaks and no Christmas office parties. I still probably get to have as many face-to-face project meetings and conference calls as the average office worker, though. We all have to work independently and in isolation anyway when analysing data or composing a report. The only difference is that office workers also have the hectic running-around of their colleagues and lots of intrusive, loud phone calls that they have to witness unwillingly in silence. So my loneliness is a very content one! 😀