
A speech recognition user interface works when it … disappears!

25 Oct

Today is a big day for me! I’m finally getting to meet in person one of the Coryphées of the VUI Design World (even though as far as I know he’s not a ballet dancer), Bruce Balentine of the Enterprise Integration Group (EIG).  Bruce is the author of one of the best books ever written on IVR / Speech applications / Voice User Interface Design, It’s Better to Be a Good Machine Than a Bad Person – Speech Recognition and Other Exotic User Interfaces at the Twilight of the Jetsonian Age.

Apart from the ingenuity of the title itself, which encapsulates the golden rule of good user experience / usability design, you can readily see to what great lengths Bruce has gone to serve his pearls of design wisdom in a most humorous and utterly witty way. This doesn’t in the least diminish the importance, relevance and truthfulness of his observations and recommendations. Bruce is a veteran designer and he has seen it all before, from the excitement and optimism to the disappointment and pessimism, to the final destination, design realism:

First we tried to make them human. Now it’s time to make them work 

To get a flavour of the type of UX design advice and messages conveyed in the book, here’s an extract from Chapter  132: Will Speech Technology Ever Work? (pp. 393-395 in my 2007 edition):

In closing, I must ask the question. Will it ever work? And, of course, the answer is, yes. Speech recognition—and its related technologies (e.g., speaker verification, text-to-speech, audio indexing, speech data mining, dictation) will work. Indeed they already do. They will fill their respective application niches almost completely. And, in fact, the majority will do so quite soon. What will change is the definition of “work”.

Speech recognition is primarily a user interface technology*. As such, it works when it disappears. It’s really that simple. When the users are not thinking about the user interface, but instead are accomplishing the task to which they are connected by the user interface, then and only then can the interface be said to be “working.” We have to stay on message with this fundamental fact if we are ever to succeed at bringing speech to the performance level where we can legitimately claim that it “works.”

True words!!! As a bonus,  Leslie Degler’s illustrations perfectly complement and enhance the messages conveyed in the text, once again in the wittiest and most original manner.  Buy this book ASAP! After all, if you don’t agree with its theses, you can always return it. All you need to do is:

Write out in longhand, on a separate page, “I,” and add your name, “agree that there’s not a chance in Hell any refund will ever come of this claim.” Label this statement as your “declaration.”  

After you have received your refund, we’ll call you with an outbound IVR that asks you several hundred thought-provoking questions about your customer experience. We value your opinion—please give us your most honest and spontaneous responses. We’ll do our best to recognize them

It says it all really! 🙂

To date, I have only met Bruce virtually, through Skype calls and the Creative Speech Technology Network (CreST) of which we are both members, and I can already tell he is a very funny, witty, creative (musical!),  interesting, as well as intelligent person. So I can’t wait to meet him in person later today and hear some more fascinating stories and hilarious anecdotes from the world of speech recognition application design, voice interface usability and technology abuse!

UPDATE:

I went (to the dinner with Bruce) and (was) conquered by the brilliance and wit of the man! I got my long-awaited autograph in his book too, as I can now prove!

SpeechTEK Europe 2011 – The Voice Solutions Showcase

20 May

(update at the end)

SpeechTEK Europe 2011 takes place in London next week (25 – 26 May 2011, Copthorne Tara Hotel, London, UK) and I am participating very actively! Firstly, I am co-chairing the Workshop on Cross-linguistic & Cross-cultural Voice Interaction Design organised by the Association for Voice Interaction Design (AVIxD). I have already written a blog post on that. Then, I will be presenting the outcome of our discussions at the Workshop in the Main SpeechTEK Conference itself, on Wednesday 25th May (2:45 p.m. – 3:30 p.m.) during Session B104: Speech organisations speak out. It should be a challenge, as the Workshop runs from 1 to 7 p.m. the previous day, so I will have a very busy evening after dinner trying to prepare a coherent and comprehensive presentation!!

And finally, on both days of the Main Conference (Wed 25 – Thu 26 May), I will be holding free one-to-one consultancy appointments as part of the Meet the Consultants Clinic, which is brand new this year. I am one of the “5 global speech tech experts” available “to discuss your speech tech needs and challenges”. Maybe you need to check out my older blog post on speech recognition (for dummies!) to get an idea of what I will be chatting about with everyone. You may also want to check out my presentation slides from last year and from 2007. Get them from these older blog posts: “The Eternal Battle Between the VUI Designer and the Customer” and “Does Your Customer Know What They are Signing off??”. Although you do need to pre-book, these appointments are free for registered conference delegates or Expo visitors, so I’m looking forward to meeting some of you in person!

There’s still time to sign up for the SpeechTEK Europe Conference and Free Entry Expo. Use the following link to register and we’ll see you in London next week! http://www.speechtek.com/europe2011/Registration.aspx

Here’s a quick round-up of what’s happening:

  • Conference Keynotes by Google‘s Engineering Director, Dave Burke, who tells SpeechTEK Europe about Google’s plans for cloud-based speech recognition, and Professor Alex Waibel who describes and demonstrates how speech technology is helping to overcome language and cultural barriers. Free entry for Expo visitors too.
  • Learn from over 50 global expert speakers sharing their experiences – both good and bad – and enabling you to build the ultimate multimodal experience for your customers, saving you money and improving your service.
  • Network with colleagues from all over the world, who have already implemented successful strategies. Companies attending include ABN Amro Bank, Apple, Barclays Bank, Microsoft, Orange, Lloyds Bank, Dell, Cap Gemini and more.
  • Identify, evaluate, integrate, and optimise the latest speech technology solutions from world-leading providers at SpeechTEK Europe’s Expo.

SpeechTEK Europe features over 50 speakers from around the world, and from a wide range of business environments including Google, Barclays Bank, Deutsche Telekom, Nuance, Loquendo, Openstream, Voxeo, Belgian Railways, Telecom Italia, Cable & Wireless, and Westpac.

LEARN ABOUT

Business strategies – Speech biometrics – Multichannel applications – Multilingual applications – Multimodal applications – Assistive technologies – Analytics and Measurement – Voice User Interaction design – Speech application development tools and languages – Case studies, panel discussions and more …

UPDATE

SpeechTEK Europe 2011 has come and gone and I’ve got many interesting things to report (as I have been tweeting through my @dialogconnectio Twitter account).

But first, here are the slides for my presentation at the main conference on the outcome of the AVIxD Workshop on Cross-linguistic & Cross-cultural Voice Interaction Design organised by the Association for Voice Interaction Design (AVIxD). I only had 12 hours to prepare them – including sleep and London tube commute – so I had to practically keep working on them until shortly before the Session! Still I think the slides capture the breadth and depth of topics discussed or at least touched upon at the Workshop. There are several people now writing up on all these topics and there should be one or more White papers on them very soon (by the end of July we hope!). So the slides did their job after all!

Get the slides in PDF here:  Maria Aretoulaki – SpeechTEK Europe 2011 presentation.

FutureEverything 2011 – The Future is now (here in Manchester!)

12 May

Today saw the launch of the very interdisciplinary (some would say “transdisciplinary” even) FutureEverything Festival (previously Futuresonic), a long-running and world-renowned annual Conference and Festival of Technology and Innovation, Art and Music running from the 11th to the 14th May in Manchester, UK (@FuturEverything #futr). Apart from the annual May events,

FutureEverything creates year-round Digital Innovation projects that combine creativity, participation and new technologies to deliver elegant business and research solutions.   In 2010 we launched the FutureEverything Award, an international prize for artworks, social innovations or software and technology projects that bring the future into the present.

I have made a point of attending at least one music or art event every year since 2007 (when the Festival was still called Futuresonic) and I have always been particularly interested in the forward-thinking Digital Technologies Conference. So I was over the moon when I was invited to participate in the Conference and informally share my words of wisdom on speech and language technologies for emotional computing. Armed with my complimentary Festival Pass, I am now really looking forward to 2 days (Thu 12 – Fri 13 May 2011) packed with presentations, discussions and debates on: Urban Games and Virtual Identities, Robots and Smart Cities, open data and participatory democracy, community-serving Geeks and Hackers, Open source software and citizen inclusion, and one of my favourites, emotional computing: making human-computer interfaces personable, engaging and persuasive and interaction with them more intuitive and even fun.

The FutureEverything Conference is brainstorming on a massive scale. Combined with all the live Twitter updates and feeds, it is once again going to have a viral impact worldwide with the novel, brave and infectious ideas that will be coming out of it and around it. At the same time, the use of dynamic and democratic microblogging will allow massive participation in the Conference by people on both sides of the Atlantic who are not physically present but are still listening and contributing their feedback and ideas remotely. In fact, the FutureEverything Festival and the Conference are quintessential instantiations of the perfect balance of online and offline, virtual and real, local and remote, one-to-many / many-to-one broadcasting. And I’m right in the middle of this awesome time-space continuum (May 2011 in Manchester UK)! 🙂

Update (Sun 15 May):

There is now a FutureEverything Festival Portal with a compilation of blog posts, photos, audio, video and more related to the 4 days of the Festival and Conference. Check it out here: http://www.fe-2011.org/

I will also be adding my feedback on what I heard at the Conference in the next couple of days.

Cross-linguistic & Cross-cultural Voice Interaction Design

31 Jan

(update at the end)

2010 saw the first SpeechTEK Conference to have taken place outside of the US, SpeechTEK Europe 2010 in London. This year’s European Conference, SpeechTEK Europe 2011, will take place again in London (25 – 26 May 2011), but this time it will be preceded on Tuesday 24th May by a special Workshop on Cross-linguistic & Cross-cultural Voice Interaction Design organised by the Association for Voice Interaction Design (AVIxD). The main goal of AVIxD is to bring together voice interaction and experience designers from both Industry and Academia and, among other things, to “eliminate apathy and antipathy toward the need for good design of automated voice services” (that’s my favourite!). This is the first AVIxD Workshop to take place in Europe and I am honoured to have been appointed Co-Chair alongside Caroline Leathem-Collins from EIG.

Participation is free to AVIxD members and just £25 for non-members (which may be applied towards AVIxD membership). However, in order to participate in the workshop, you need to submit a brief position paper in English (approx. 500 words) on any of the special topics of interest of the Workshop (see the CFP below). The deadline for electronic submissions is Friday 25 March, so you need to hurry if you want to be part of it!

Here’s the full Call for (Position) Papers from the AVIxD site:

Call for Position Papers

First European AVIxD Workshop

Cross-linguistic & Cross-cultural Voice Interaction Design

Tuesday, 24 May 2011 (just prior to SpeechTEK Europe 2011), 1 – 7 PM

London, England

The Association for Voice Interaction Design (AVIxD) invites you to join us for our first voice interaction design workshop held in Europe, Cross-linguistic & Cross-cultural Voice Interaction Design. The AVIxD workshop is a hands-on day-long session in which voice user interface practitioners come together to debate a topic of interest to the speech community. The workshop is a unique opportunity for them to meet with their peers and delve deeply into a single topic.

As in previous years with the AVIxD Workshops held in the US, we will write papers based on our discussions which we will then publish on www.avixd.org. Please visit our website to see papers from previous workshops, and for more details on the purpose of the organization and how you can be part of it.

In order to participate in the workshop, individuals must submit a position paper of approximately 500 words in English. Possible topics to touch upon in your submission (to be discussed in depth during the workshop) include:

  1. Language choice and user demographics
  2. Presentation of the language options to the caller and caller preference
  3. Creation and (co-)maintenance of dialogue designs, grammars, prompts across languages
  4. Political and sociolinguistic issues in system prompt choices and recognition grammars, such as code-switching, formal versus informal registers
  5. Guidelines for application localization, translation, and interpretation
  6. Setting expectations regarding availability of multilingual agents, Language- and culture-sensitive persona definition
  7. Coordinating usability testing and tuning across diverse linguistic / cultural groups
  8. Language choice and modality preference

We always encourage the use of specific examples from applications you’ve worked on in your position paper.

Participation is free to AVIxD members; non-members will be charged £25, which may be applied towards AVIxD membership at the workshop. Please submit your position papers via email no later than Friday 25 March 2011 to cfp@avixd.org. Letters of acceptance will be sent out on 30 March 2011.

We look forward to engaging with the European speech design community to discuss the particular challenges of designing speech solutions for users from diverse linguistic and cultural backgrounds. Feel free to contact either of the co-chairs below, if you have any questions.

Caroline Leathem-Collins, EIG  (caroline {at} eiginc {dot} com)

Maria Aretoulaki, DialogCONNECTION Ltd (maria {at} dialogconnection {dot} com)

UPDATE

SpeechTEK Europe 2011 has come and gone and I’ve got many interesting things to report (as I have been tweeting through my @dialogconnectio Twitter account).

But first, here are the slides for my presentation at the main conference on the outcome of the AVIxD Workshop on Cross-linguistic & Cross-cultural Voice Interaction Design organised by the Association for Voice Interaction Design (AVIxD). I only had 12 hours to prepare them – including sleep and London tube commute – so I had to practically keep working on them until shortly before the Session! Still I think the slides capture the breadth and depth of topics discussed or at least touched upon at the Workshop. There are several people now writing up on all these topics and there should be one or more White papers on them very soon (by the end of July we hope!). So the slides did their job after all!

Get the slides in PDF here:  Maria Aretoulaki – SpeechTEK Europe 2011 presentation.

The voice-activated lift won’t do Scottish! (Burnistoun S1E1 – ELEVEN!)

28 Jul

Voice recognition technology? …  In a lift? … In Scotland? … You ever TRIED voice recognition technology? It don’t do Scottish accents!

Today I found this little gem on YouTube and I thought I must share it, as apart from being hilarious, it says a thing or two about speech recognition and speech-activated applications. It’s all based on the urban myth that speech recognisers cannot understand regional accents, such as Scottish and Irish.

Scottish Elevator – Voice Recognition – ELEVEN!

(YouTube – Burnistoun – Series 1 , Episode 1 [ Part 1/3 ])

What? No Buttons?!

These two Scottish guys enter a lift somewhere in Scotland and find that there are no buttons for the floor selection, so they quickly realise it’s a “voice-activated elevator”, as the system calls itself. They want to go to the 11th floor and they first pronounce it the Scottish way:

/eh leh ven/

That doesn’t seem to work at all.

“You need to try an American accent”, says one of them, so they try to mimic one, sadly very unsuccessfully:

/ee leh ven/

Then they try a quite funny, Cockney-like English accent:

/ä leh ven/

to no avail.

VUI Sin No. 1: Being condescending to your users

The system prompts them to “Please speak slowly and clearly”, which is exactly what they had been doing up to then in the first place! Instead, it should have said something along the lines of “I’m afraid I didn’t get that. Let’s try again.” and later “I’m really sorry, but I don’t seem to understand what you’re saying. Maybe you would like to try one more time?”. Of course, not having any buttons in the lift means that these guys could be stuck in there forever! That’s another fatal usability error: both modalities, speech and button presses, should have been allowed, to cater for different user groups (easy accents, tricky accents) and different use contexts (people who have got their hands full with carrier bags vs people who can press a button!).
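To make the idea of apologetic, escalating reprompts with a button fallback a bit more concrete, here is a minimal Python sketch. It is only an illustration: the function names, the fallback wording and the "give up after two reprompts" rule are my own assumptions, not how any real lift or speech platform works. In a real lift the buttons would of course be available from the start, not only after speech has failed.

```python
# Reprompt wording taken from the suggestions above; the ladder itself is hypothetical.
REPROMPTS = [
    "I'm afraid I didn't get that. Let's try again.",
    "I'm really sorry, but I don't seem to understand what you're saying. "
    "Maybe you would like to try one more time?",
]
# Assumed fallback wording (not from the video or any real system).
FALLBACK = "Sorry about this. Please press the button for your floor instead."


def next_prompt(failed_attempts: int) -> str:
    """Return the prompt to play after a given number of failed recognitions."""
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be at least 1")
    if failed_attempts <= len(REPROMPTS):
        return REPROMPTS[failed_attempts - 1]
    # Out of reprompts: hand over to the other modality rather than loop forever.
    return FALLBACK


def choose_floor(spoken_results, button_press=None):
    """Resolve a floor from recognition results, falling back to a button press.

    spoken_results: one entry per attempt, either a floor number or None
    (None stands for a rejected or low-confidence recognition).
    Returns (floor, prompts_played).
    """
    prompts = []
    for failures, result in enumerate(spoken_results, start=1):
        if result is not None:
            return result, prompts
        prompts.append(next_prompt(failures))
        if prompts[-1] == FALLBACK:
            break
    return button_press, prompts
```

Note that none of the prompts blames the user or tells them how to speak. The system takes responsibility for the failure and, once it runs out of polite retries, offers another way out.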

I’m gonna teach you a lesson!

One of them tries to teach the system the Scottish accent: “I keep saying it until she understands Scottish!”, a very reasonable expectation, which would work particularly well with a speaker-dependent dictation system of the kind you’ve got on your PC, laptop or hand-held device. This speaker-independent one (‘cos you can’t really have your personal lift in each building you enter!) will take a bit more time to learn anything from a single conversation! It requires time analysing the recordings, their transcriptions and semantic interpretations, comparing what the system understood with what the user actually said and using those observations to tune the whole system. We are talking at least a week in most cases. They would die of dehydration and starvation by then!
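The "comparing what the system understood with what the user actually said" step can be sketched roughly as follows. This is a toy Python illustration under my own assumptions: the log format, the function names, the 80% accuracy threshold and the sample data are all invented for the example, and real tuning tools work on far richer data (audio, confidence scores, semantic interpretations).

```python
from collections import Counter, defaultdict


def misrecognition_report(call_log):
    """Summarise how often each spoken word was recognised correctly.

    call_log: iterable of (transcribed, recognised) pairs, where
    `transcribed` is what a human heard the caller say and `recognised`
    is what the system returned (None for a rejection).
    Returns {word: {"total": n, "correct": n, "confusions": Counter}}.
    """
    report = defaultdict(lambda: {"total": 0, "correct": 0, "confusions": Counter()})
    for transcribed, recognised in call_log:
        entry = report[transcribed]
        entry["total"] += 1
        if recognised == transcribed:
            entry["correct"] += 1
        else:
            entry["confusions"][recognised or "<rejected>"] += 1
    return dict(report)


def tuning_candidates(report, min_accuracy=0.8):
    """List words whose accuracy falls below the (assumed) threshold, worst first."""
    scored = [
        (entry["correct"] / entry["total"], word)
        for word, entry in report.items()
    ]
    return [word for accuracy, word in sorted(scored) if accuracy < min_accuracy]


# Invented sample data, purely for illustration.
SAMPLE_LOG = [
    ("eleven", "seven"),
    ("eleven", None),
    ("eleven", "eleven"),
    ("two", "two"),
    ("two", "two"),
]
```

Running `tuning_candidates(misrecognition_report(SAMPLE_LOG))` on this invented log would single out "eleven" as the word to look at first, which is exactly the kind of finding that ends up in a tuning report.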

VUI Sin No.2: Patronising your users until they explode

After a while, the system makes it worse by saying what no system should ever dare say to a user’s face: “Please state which floor you would like to go to in a clear and calm manner.” Patronising or what! The guys’ reaction is not surprising: “Why is it telling people to be calm?! .. cos Scottish people would be going out for MONTHS at it!”.

Well, that’s not actually true. These days off-the-shelf speech recognition software is optimised to work with most main accents in a language, yes, including Glaswegian! Millions of real-world utterances spoken by thousands of people with all possible accents in a language (and this for many different languages too) are used to statistically train the recognition software to work equally well with most of them and for most of the time. These utterances are collected from applications that are already live and running somewhere in the world for the corresponding language. The more real-world data available, the better the software can be tuned and the more accurate the recognition of “weird” pronunciations will be, even when you take the software out of the box.

VUI Best Practice: Tune your application to cater for YOUR user population

An additional safeguarding and optimising technique is tuning the pronunciations for a specific speech recognition application. So when you already know that your system will be deployed in Scotland, you’d better add the Scottish pronunciation for each word explicitly to the recognition lexicon. This includes manually adding /eh leh ven/, as the standard /ee leh ven/ pronunciation is not likely to work very well. Given that applications are usually restricted to a specific domain anyway (selecting floors in a lift, getting your bank account balance, choosing departure and arrival train times etc.), this only needs to be done for the core words and phrases in your application, rather than the whole English, French, or Farsi language! So do not despair, there’s hope for freedom (of speech) even for the Scottish! 🙂
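As a rough picture of what adding a pronunciation to the lexicon means, here is a small Python sketch. The dictionary layout and the function names are hypothetical, and the pronunciations use the informal notation from this post rather than a real phonetic alphabet. Actual recognisers have their own lexicon formats, so treat this as a mental model, not as any vendor's API.

```python
# Illustrative lexicon: word -> list of accepted pronunciations.
# The notation is the informal one used in this post, not a real phonetic alphabet.
LEXICON = {
    "eleven": ["ee leh ven"],
}


def add_pronunciation(lexicon, word, pronunciation):
    """Add a pronunciation variant for a word, keeping existing ones and avoiding duplicates."""
    variants = lexicon.setdefault(word, [])
    if pronunciation not in variants:
        variants.append(pronunciation)
    return lexicon


def lookup(lexicon, heard):
    """Return the words whose listed pronunciations match what was heard."""
    return [word for word, variants in lexicon.items() if heard in variants]
```

Before tuning, `lookup(LEXICON, "eh leh ven")` finds nothing. After `add_pronunciation(LEXICON, "eleven", "eh leh ven")`, the same lookup returns "eleven", while the standard pronunciation keeps working too.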

For a full transcript of the video, check out EnglishCentral.

The Loneliness of the long-distance … VUI Designer!

13 Jun

On Friday 11th June, I took part in the “Pathways” event organised annually by the University of Manchester Career Service to support PhD researchers as well as research staff in “making career choices, exploring future plans and discovering the breadth of opportunities available to them”. I was Guest Panellist at 3 different Sessions:

  1. Opportunities for Engineering and Physical Sciences
  2. Working as a Freelancer or Consultant and
  3. Enterprise, Entrepreneurship and Business Start Up

As a University of Manchester graduate (well, technically UMIST), I felt compelled to take part in those Question and Answer panels in order to give some insight into how a career can develop: from a Bachelors in English & Linguistics in Greece, to a Masters of Science in Machine Translation and a Doctorate in Automatic Text Summarisation in the UK, to a Post-Doctoral Fellowship in Spoken Dialogue Management and a position as a Research Project Manager in Germany, to working in Industry both as a full-time employee and as an external contractor as a Voice User Interface (VUI) Designer in Germany, the UK, Switzerland, the US and further afield. It’s been a fascinating journey for sure! And I probably would never have arrived where I am now, if I hadn’t done those degrees or taken up those jobs in those specific places.

Have a look at the Guest Speaker profiles, including mine (p. 24), here: http://www.careers.manchester.ac.uk/media/media,172749,en.pdf

Some very inspiring career journeys!

I have to say I have thoroughly enjoyed the whole journey, the projects I have worked on, the people I have met on the way, the different organisational cultures I had the chance to experience. Plus, I wouldn’t change what I do now for the world! I love working as an external contractor and coming in to design speech self-service systems and voice-to-text services from scratch, or optimise existing ones, and the whole development, testing and tuning cycles:

  • writing functional specification documents
  • defining the system persona
  • drawing call flows
  • crafting system messages and coaching voice talents for the recordings
  • writing speech recognition grammars and pronunciations
  • devising and carrying out Wizard-of-Oz tests and Usability tests (including recording test subjects on video and interviewing them afterwards!)
  • transcribing and analysing phone calls
  • writing tuning reports

Everything is a lot of fun! It’s also great to be bringing in the same VUI Design processes and skills in different organisations and projects, and also getting to work at different places in the world at any one time! I love the variety of work and location of work, as well as the flexibility to work anytime and from anywhere! (Yes, working on your laptop – iPad soon – from a beach in the Caribbean is no longer a daydream but a realistic plan! :))

Okay, it does get lonely. No gossiping in the kitchen during coffee breaks and no Christmas office parties. I still get to have probably as many face-to-face project meetings and conference calls as the average office worker though. We all have to work independently and in isolation when analysing data or composing a report anyway. The only difference is that office workers also have to put up with the hectic running-around of their colleagues and lots of loud, intrusive phone calls that they have to witness in silence. So my loneliness is a very content one! 😀