Best Practice in Voice User Interface Design, speech recognition applications, mobile voice-to-text applications; updates in New Technologies including Social Media
“In 1974 Donald Sherman, whose speech was limited by a neurological disorder called Moebius Syndrome, used a new-fangled device designed by John Eulenberg to dial up a pizzeria. The first call went to Dominos, which hung up. They were apparently too busy becoming a behemoth. Mercifully, a humane pizzeria – Mr. Mike’s – took the call, and history was made. It all plays out below, and we hope that Mr. Mike’s is still thriving all these years later….” (Smithsonian.com Blog)
Speech synthesis on this computer was rather slow, and it apparently needed “Yes/No” questions so that it could simply generate a “Yes” or a “No”. Still, it could also synthesize other phrases, such as the pizza toppings (pepperoni and mushrooms, salami ...), the complex delivery address (the Michigan State Computer Science Department), and the contact number for callback. So not bad at all!
I was touched by the patience and kindness of the pizza place employee. He would patiently wait for up to 5 seconds for any answer, which must have been unnerving in itself! And now he is part of History! Good on him!! And well done to the Michigan State University’s Artificial Language Laboratory and Dr. John Eulenberg!
2012 can easily be dubbed the year of TEDx for me, as by mid-February I had already attended two TEDx events! First up was TEDxSalford in late January, where I was just a mindblown attendee, and two weeks later it was TEDxManchester where I had the honour to be a speaker!
TEDxManchester took place on Monday 13th February this year at one of the iconic Manchester locations – and my “local” – the Cornerhouse. Among the luminary speakers were people I have always admired, such as the radio Goddess Mary Anne Hobbs, and people I have become very close friends with over the years and have come to admire just as much, such as Ian Forrester (@cubicgarden to most of us).
Here are their respective talks at TEDxManchester 2012 for you to get a taste of the atmosphere at the event and of the impact of the ideas and the immediacy of the sentiments circulated!
I spoke about the weird and wonderful world of Voice Recognition (“Voice Recognition FTW!”): from the inaccurate – and far too often funny – simple voice-to-text apps and dictation systems on your smartphones, to the most frustrating automated Call Centres, to the sophisticated next-generation Siri and everything in-between. I explained why things go wrong and when things can go wonderfully right. The answer is “CONTEXT”: the more you have of it, the more accurate and relevant the interpretation of user intention will be, and the more relevant and impressive the system reaction / reply will be.
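To make the “CONTEXT” point a little more concrete, here is a minimal sketch of how context can tip the balance between two similar-sounding hypotheses. Everything in it is an assumption for illustration only: the N-best list, the scores, the context names and the function `pick_hypothesis` are made up and do not come from any real recogniser.

```python
import math

# Hypothetical N-best list from a recogniser: (transcript, acoustic log-score).
# The scores are invented for illustration.
N_BEST = [
    ("wreck a nice beach", -4.1),
    ("recognise speech", -4.3),
]

# Hypothetical context models: how likely each phrase is in a given context.
CONTEXT_PRIORS = {
    "tech_talk": {"recognise speech": 0.9, "wreck a nice beach": 0.1},
    "holiday_chat": {"recognise speech": 0.2, "wreck a nice beach": 0.8},
}


def pick_hypothesis(n_best, context=None, floor=1e-6):
    """Return the transcript with the best combined score.

    Without context only the acoustic score counts. With context, the log of
    the context prior is added to the acoustic score before comparing.
    """
    def score(item):
        text, acoustic = item
        if context is None:
            return acoustic
        prior = CONTEXT_PRIORS[context].get(text, floor)
        return acoustic + math.log(prior)

    return max(n_best, key=score)[0]


if __name__ == "__main__":
    print(pick_hypothesis(N_BEST))                      # acoustics only
    print(pick_hypothesis(N_BEST, context="tech_talk")) # acoustics + context
```

With these made-up numbers the acoustics alone slightly prefer the wrong phrase, and knowing that the speaker is giving a talk about technology is enough to flip the choice.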
@ar3toul4ki 15 Feb RT @global_lingo: Maria Aretoulaki on voice recognition software. Will digital transcription ever be any good? #tedxmcr no, no it won’t
Lynne McCadden @lmccadden 14 Feb
Belated I know but many congrats to @herbkim for a fantastic TEDxMCR yesterday been thinking about some of it all day today !
Retweeted by @ar3toul4ki
TEDxManchester @TEDxManchester 14 Feb
Here’s to the #TEDxMCR speakers in Session 2 – @daveerasmus @martinsfp @ar3toul4ki @cubicgarden @brendandawes
Retweeted by @ar3toul4ki
TEDxManchester @TEDxManchester 13 Feb
Thanks to @BandXMedia all today’s #TEDxMCR talks were recorded, will be edited & put online soon #TEDxMCR @s2martin
Retweeted by @ar3toul4ki
Lynne McCadden @lmccadden 13 Feb
#tedxmcr learning about quarks and leptons from @tarashears making particle physics easy – sort of
Retweeted by @ar3toul4ki
How to Wreck a Nice Beach @TEDxManchester #TEDxMCR
Retweeted by @ar3toul4ki
13 Feb Luke Robert Mason @LukeRobertMason
It’s a bright future if you are an algorithm or infomorph #TEDxMCR
Retweeted by @ar3toul4ki
13 Feb Luke Robert Mason @LukeRobertMason
@RichardMichie A little bit of non-human agency can’t hurt… Or can it #TEDxMCH
Retweeted by @ar3toul4ki
In reply to RichardMichie
13 Feb Ian Forrester @cubicgarden
Infomorphs or a weaver… #TEDxMCR love the idea very cool! They could work with #perceptivemedia yfrog.com/gzeg2jij
Retweeted by @ar3toul4ki
13 Feb
Ian Pettigrew @KingfisherCoach
#TEDxMCR @skeuomorphology challenging ‘necessity is the mother of invention’; cars weren’t invented as a response to a shortage of horses!
Retweeted by @ar3toul4ki
13 Feb Luke Robert Mason @LukeRobertMason
Pure information technologies are the first evolutionary aware technologies. They are stochastic… Emerge from randomness #TEDxMCR @weavrs
Retweeted by @ar3toul4ki
13 Feb Luke Robert Mason @LukeRobertMason
Living software ‘bots’ or infomorphs via @weavrs #infomorph #TEDxMCR @skeuomorphology
Retweeted by @ar3toul4ki
13 Feb Michael Di Paola @MichaelDiPaola
Robots made from programmable gel…where the hell am I? A parallel universe, the future. No. Just at #TEDxMCR listening to Dan O’Hara
Retweeted by @ar3toul4ki
13 Feb Luke Robert Mason @LukeRobertMason
Infomorph, a form that exists just of information @skeuomorphology #TEDxMCR
Retweeted by @ar3toul4ki
13 Feb
Luke Robert Mason @LukeRobertMason
Another type of software agent that exhibits life, @weavrs #infomorph @skeuomorphology #TEDxMCR
Retweeted by @ar3toul4ki
13 Feb
@ar3toul4ki
@pgaval never mind, it will be on YouTube forever! (Mum!)
In reply to Petros Gavalakis
13 Feb
@ar3toul4ki
Mondays are my favourite days of the week : D
13 Feb Matthew Brooks @brooksoid
Great, great talk by @brendandawes on the value of pursuing ideas, and the ideas they spawn, without necessarily knowing where you’re going
Retweeted by @ar3toul4ki
from Manchester, Manchester
13 Feb RichardMichie @RichardMichie
Failed art at school? You can still exhibit at #moma @brendandawes #tedxmcr great story love it
Retweeted by @ar3toul4ki
13 Feb @ar3toul4ki
@brendandawes’ cinema redux of Hitchcock’s Vertigo #TEDxMCR twitpic.com/8jfs95
13 Feb
sphey1 @sphey1
If you make something, give it a name – re: Cinema Redux @brendandawes #TEDxMCR
Retweeted by @ar3toul4ki
13 Feb @ar3toul4ki
Things that @brendandawes has done with his 3D printer #TEDxMCR twitpic.com/8jfpn6
13 Feb
@ar3toul4ki
@brendandawes : the creative process is iterative. ( but Battling it against time & cost constraints) #TEDxMCR
13 Feb @ar3toul4ki
RT @CMindsKelly: @brendandawes. The thing we in the room all share is curiosity. That’s why we’re always making new things #TEDxMCR”
13 Feb Martin Bryant @MartinSFP
At #tedxmcr, @cubicgarden explained how @tdobson and @adew saved his life. instagr.am/p/G9BmXRStoc/
Retweeted by @ar3toul4ki
13 Feb
@ar3toul4ki
Ian Forrester: fear the fear #TEDxMCR
13 Feb Claire-Marie @CMBoggiano
‘We are complex & unique organisms And yes, I am still an atheist.’ Ian Forrester, #TEDxMCR
Retweeted by @ar3toul4ki
13 Feb @ar3toul4ki
Indeed! RT @TonyChurnside: @cubicgarden really touching. Very nicely done!
13 Feb @ar3toul4ki
@brooksoid any time!
In reply to Matthew Brooks
13 Feb Matthew Brooks @brooksoid
@ar3toul4ki great talk Maria, speech recog in focus at the beeb right now, be interesting to talk once I’ve worked out what our landscape is
Retweeted by @ar3toul4ki
13 Feb TEDxManchester @TEDxManchester
Link to the funny vid played by @ar3toul4ki – Scottish voice recognition problems.. http://youtu.be/sAz_UvnUeuU
Retweeted by @ar3toul4ki
13 Feb @ar3toul4ki
Thank you! Did you see it, by any chance? RT @pgaval: @ar3toul4ki Good luck!
13 Feb Claire-Marie @CMBoggiano
‘When I was lying in bed dying, where were the real people?’ Ian Forrester, #TEDxMCR
Retweeted by @ar3toul4ki
13 Feb Tony Churnside @TonyChurnside
Watching @cubicgarden talk about his #brushwithdeath. A very scary time. #TEDxMCR pic.twitter.com/hED5mimw
Retweeted by @ar3toul4ki
13 Feb
@ar3toul4ki
@tdobson @cubicgarden is talking about you! : D
In reply to Tim Dobson
13 Feb Tim Dobson @tdobson
so @cubicgarden is talking about it #brushwithdeath when I may or may not have been his flatmate at the time.. #tedxman
Retweeted by @ar3toul4ki
13 Feb Matthew Brooks @brooksoid
And @cubicgarden ‘s talk is about… @cubicgarden ! He’s finally gone recursive. #TEDxMCR
Retweeted by @ar3toul4ki
13 Feb Tony Churnside @TonyChurnside
@cubicgarden you’re looking good! pic.twitter.com/n8xvkzJB
Retweeted by @ar3toul4ki
13 Feb
Ian Forrester @cubicgarden
And next on at #TEDxMCR its @ianforrester. With the story of me…
Retweeted by @ar3toul4ki
13 Feb TEDxManchester @TEDxManchester
Hilarious talk on Voice Recognition from Dr Maria Aretoulaki #TEDxMCR
Retweeted by @ar3toul4ki
13 Feb Tim Dobson @tdobson
@davemee it’s all about context! /cc @ar3toul4ki
Retweeted by @ar3toul4ki
In reply to Dave Mee
13 Feb Tim Dobson @tdobson
@davemee @ar3toul4ki “fetish cheese”
Retweeted by @ar3toul4ki
In reply to Dave Mee
13 Feb Dave Mee @davemee
@tdobson @ar3toul4ki feed her through siri and send me a transcript!
Retweeted by @ar3toul4ki
In reply to Tim Dobson
13 Feb Kate Towey @katiemaymanc
Fascinating talk from Tara Shears on particle physics. ’2012 is year of the Higgs’ #tedxmcr
Retweeted by @ar3toul4ki
13 Feb Ian Pettigrew @KingfisherCoach
So far at #TEDxMCR we’ve covered pursuing your passion, JDI (and make mistakes), technology, algorithms, and particle physics. I’m happy!
Retweeted by @ar3toul4ki
13 Feb Allie Johns @AllieJohns
I propose bringing back Tomorrow’s World and having Tara Shears present it #tedxmcr
Retweeted by @ar3toul4ki
13 Feb @ar3toul4ki
@TaraShears @TEDxManchester: oh my Higgs! We’ve seen something! Or have we?? #TEDxMCR twitpic.com/8jdo56
13 Feb
Ian Forrester @cubicgarden
The goddamn particle explained at #TEDxMCR yfrog.com/obsv5tmj
Retweeted by @ar3toul4ki
13 Feb
@ar3toul4ki
@TaraShears @TEDxManchester: where’s that God-damned Higgs particle?! If we don’t find it, we’ll have to start all over again… #TEDxMCR
13 Feb Claire-Marie @CMBoggiano
“@lmccadden: #tedxmcr learning about quarks and leptons from @tarashears making particle physics easy – sort of”
Retweeted by @ar3toul4ki
13 Feb @ar3toul4ki
@TaraShears @TEDxManchester: symmetry, simplicity, elegance = beauty of the standard model of particle physics #TEDxMCR
13 Feb TEDxManchester @TEDxManchester
Up next @TEDxManchester is @TaraShears – tune in live to ow.ly/92eRf #TEDxMCR
Retweeted by @ar3toul4ki
13 Feb @ar3toul4ki
@TEDx video 1 @TEDxManchester: pragmatic chaos to describe fluid things such as culture #TEDxMCR
13 Feb @ar3toul4ki
@coralgrainger no worries sweetness : )
In reply to coralgrainger
13 Feb @ar3toul4ki
@TEDx video @TEDxManchester: what we don’t understand, we give a name and a story to #TEDxMCR
13 Feb @ar3toul4ki
@maryannehobbs you were, nay ARE, awesome! Xx
In reply to maryanne hobbs
13 Feb @ar3toul4ki
Dan O’Hara @skeuomorphology @TEDxManchester: from random relentless replication (cf. spambots) to guided transformation of chaos #TEDxMCR
13 Feb Kim Willis @KimberleyWillis
Dan O’Hara: technology is not a selection of gadgets but a body of knowledge instagr.am/p/G8sGbrBVY7/ #TEDxMCR
Retweeted by @ar3toul4ki
13 Feb
Ian Wareing @ianwareing
#tedxmcr @skeuomorphology “Necessity is not the mother of invention. Invention is the mother of necessity”
Retweeted by @ar3toul4ki
13 Feb Ian Forrester @cubicgarden
Bloatware… or stimulation of the real on the virtual RT @maanasvarun: Skeumorphism. wait what? #TEDxMCR
Retweeted by @ar3toul4ki
13 Feb @ar3toul4ki
Dan O’Hara @skeuomorphology @TEDxManchester: the creation of living technology by merging the Arts and Sciences #TEDxMCR
13 Feb @ar3toul4ki
@maryannehobbs @TEDxManchester: John Peel saving lives again #TEDxMCR
13 Feb @ar3toul4ki
@maryannehobbs @TEDxManchester: follow your passion! #TEDxMCR twitpic.com/8jcwpn
13 Feb
TEDxManchester @TEDxManchester
Hi all we’re suggesting #TEDxMCR as the hashtag for the event today as it’s a bit shorter than #TEDxManchester
Retweeted by @ar3toul4ki
13 Feb TEDxManchester @TEDxManchester
Sorry folks for the livestream #fail. We’re currently on this channel live.. bit.ly/y9kkZa #TEDxMCR
Retweeted by @ar3toul4ki
13 Feb @ar3toul4ki
@gazshaw cheers!
In reply to Gaz Shaw
13 Feb @ar3toul4ki
see you there Mike! It’s been a loooong time! RT @mike_higham: @ar3toul4ki @TEDxManchester Looking forward to it #TEDxMCR
13 Feb @ar3toul4ki
@heloukee oh nooo : s
In reply to Helen Keegan
13 Feb @ar3toul4ki
Excited & honoured to be speaking @TEDxManchester today. My talk “Voice Recognition FTW!” on the present+future of user interfaces #TEDxMCR
13 Feb @ar3toul4ki
See you there Matt! RT @matthbooth: A bit of work then @TEDxManchester. Looking forward to it.
13 Feb Allie Johns @AllieJohns
“@maryannehobbs: interesting day: speaking about passion at @TEDxManchester 1pm.. ” > we can never have enough passion in our lives.
Retweeted by @ar3toul4ki
13 Feb @ar3toul4ki
Will you be my groupie?? RT @technicalfault: @ar3toul4ki Dr Maria at TEDx!
12 Feb @ar3toul4ki
Looking forward to giving #TEDxMCR an insight into the wondrous+often misconstrued world of voice recognition @TEDxManchester tomorrow
12 Feb TEDxManchester @TEDxManchester
And in other late-breaking news Dr. Maria @Ar3toul4ki will also be taking the stage tomorrow at #TEDxMCR
Retweeted by @ar3toul4ki
12 Feb TEDxManchester @TEDxManchester
A big welcome for our latest speaker @MartinSFP – European Editor @TheNextWeb for #TEDxMCR. Like @MaryAnneHobbs a brave no-slide presenter!
Retweeted by @ar3toul4ki
11 Feb Anna Nachesa @ashalynd
I’ll probably be very evil if I ask during an interview if tail-optimized recursion is possible in C. OTOH, it might be a great icebreaker:)
Retweeted by @ar3toul4ki
11 Feb @ar3toul4ki
Tonight Channel 4 is showing the programme “Richard Wilson On Hold”, or, as it would probably be dubbed, “The UK stuck in IVR Hell”.
“From telephone car parking payment systems to supermarket self-service tills, Richard Wilson investigates the rise of automated services across Britain and puts the machines to the test”
Richard Wilson doesn’t like to be on Hold (Copyright Glowfrog Studios)
This will probably put my profession to shame (even if the programme is only about waiting queues), but they would have a point, as there are some horrid voice recognition self-service IVRs out there! Watch it tonight, Monday 16 January 2012, at 8-9pm GMT on Channel 4, and check my update on here once I’ve watched it myself.
Apart from the ingenuity of the title itself, which encapsulates the golden rule of good user experience / usability design, you can readily see to what great lengths Bruce has gone to serve his pearls of design wisdom in a most humorous and utterly witty way. This doesn’t in the least diminish the importance, relevance and truthfulness of his observations and recommendations. Bruce is a veteran designer and he has seen it all before, from the excitement and optimism to the disappointment and pessimism, to the final destination, design realism:
First we tried to make them human. Now it’s time to make them work
To get a flavour of the type of UX design advice and messages conveyed in the book, here’s an extract from Chapter 132: Will Speech Technology Ever Work? (pp. 393-395 in my 2007 edition):
In closing, I must ask the question. Will it ever work? And, of course, the answer is, yes. Speech recognition—and its related technologies (e.g., speaker verification, text-to-speech, audio indexing, speech data mining, dictation) will work. Indeed they already do. They will fill their respective application niches almost completely. And, in fact, the majority will do so quite soon. What will change is the definition of “work”.
”Speech recognition is primarily a user interface technology*. As such, it works when it disappears. It’s really that simple. When the users are not thinking about the user interface, but instead are accomplishing the task to which they are connected by the user interface, then and only then can the interface be said to be “working.” We have to stay on message with this fundamental fact if we are ever to succeed at bringing speech to the performance level where we can legitimately claim that it “works.”
True words!!! As a bonus, Leslie Degler’s illustrations perfectly complement and enhance the messages conveyed in the text, once again in the wittiest and most original manner. Buy this book ASAP! After all, if you don’t agree with its theses, you can always return it. All you need to do is:
Write out in longhand, on a separate page, “I,” and add your name, “agree that there’s not a chance in Hell any refund will ever come of this claim.” Label this statement as your “declaration.”
…
After you have received your refund, we’ll call you with an outbound IVR that asks you several hundred thought-provoking questions about your customer experience. We value your opinion—please give us your most honest and spontaneous responses. We’ll do our best to recognize them.
It says it all really!
To date, I have only met Bruce virtually, through Skype calls and the Creative Speech Technology Network (CreST) of which we are both members, and I can already tell he is a very funny, witty, creative (musical!), interesting, as well as intelligent person. So I can’t wait to meet him in person later today and hear some more fascinating stories and hilarious anecdotes from the world of speech recognition application design, voice interface usability and technology abuse!
UPDATE:
I went (to the dinner with Bruce) and (was) conquered by the brilliance and wit of the man! I got my long-awaited autograph in his book too, as I can now prove!
“The overall emphasis of the workshop is on the contribution of cognitive science to language processing, including conceptualisation, representation, discourse processing, meaning construction, ontology building, and text mining.”
There have been NLPCS Workshops in Porto (2004), Miami (2005), Paphos (2006), Funchal (2007), Barcelona (2008), Milan (2009) and Funchal (2010).
Copenhagen Business School
This year’s 8th International NLPCS Workshop just took place this weekend in Copenhagen, Denmark (20-21 Aug 2011). The Workshop topic was “Human-Machine Interaction in Translation”, focussing on all aspects of human and machine translation, and human-computer interaction in translation, including: translators’ experiences with CAT tools, human-machine interface design, evaluation of interactive machine translation, user simulation and human factors. Thus, the topics were approached from a number of different perspectives:
full automation by machines for machines (traditional NLP or HLT),
semi-automated processing, i.e. machine-mediated processing (programs assisting people in their tasks),
and simulation of human cognitive processes.
I had the opportunity once again to review a few of the paper submissions and can therefore highly recommend reading the full Proceedings of the NLPCS 2011 Workshop that have just been made available.
I found the following three contributions particularly interesting:
Valitutti, A. “How Many Jokes are Really Funny? A New Approach to the Evaluation of Computational Humour Generators”
Nilsson, M. and J. Nivre. “Entropy-Driven Evaluation of Models of Eye Movement Control in Reading”
and
Finch, A., Song, W., Tanaka-Ishii, K. and E. Sumita. “Source Language Generation from Pictures for Machine Translation on Mobile Devices”
“improve the human condition by advancing the discipline of Interaction Design”
A very worthy cause indeed, especially since it is true that “the human condition is increasingly challenged by poor experiences”!
Today’s Joint Workshop in New York aims to bring together interaction design practitioners from across the voice, interactive, and digital areas to identify the issues and challenges involved in speech interaction design on mobile devices, such as smartphones and tablets, and to come up by the end of the day with ways to approach them or even tackle them. A very ambitious format that, however, really does work!
And if you don’t manage to take part in today’s workshop, make sure you go to the SpeechTEK Conference and Exhibition itself that starts tomorrow and runs until Wednesday the 10th. Listen to presentations and see or even try for yourself market-ready products relating to:
multimodal applications
cross-channel applications
speech analytics
speaker identification and verification
in-car systems
natural language and say-anything technologies
speech translation
voice-enabled personal assistants
as well as the latest speech recognition techniques and technologies
I particularly recommend the Keynote Panel on “Mobility — A Game-Changer for Speech?” on Tuesday on how smartphones are dramatically changing how customers interact with businesses and with the devices themselves. Some really interesting issues and questions will be raised, such as:
* How will voice user interfaces be integrated with graphical user interfaces?
or
* Will users embrace voice as they have embraced keypads on mobile devices?
Sadly I am in the UK today and next week, so I’m going to miss it all. But if you are lucky enough to be in or near New York, make sure you go and enjoy!
And finally, on both days of the Main Conference (Wed 25 – Thu 26 May), I will be holding free one-to-one consultancy appointments as part of the Meet the Consultants Clinic, which is brand new this year. I am one of the “5 global speech tech experts” available “to discuss your speech tech needs and challenges”. Maybe you need to check out my older blog post on speech recognition (for dummies!) to get an idea of what I will be chatting about with everyone. You may also want to check out my presentation slides from last year and from 2007. Get them from these older blog posts: “The Eternal Battle Between the VUI Designer and the Customer” and “Does Your Customer Know What They are Signing off??”. Although you do need to pre-book, these appointments are free for registered conference delegates or Expo visitors, so I’m looking forward to meeting some of you in person!
There’s still time to sign up for the SpeechTEK Europe Conference and Free Entry Expo. Use the following link to register and we’ll see you in London next week! http://www.speechtek.com/europe2011/Registration.aspx
Here’s a quick round-up of what’s happening:
Conference Keynotes by Google’s Engineering Director, Dave Burke, who tells SpeechTEK Europe about Google’s plans for cloud-based speech recognition, and Professor Alex Waibel, who describes and demonstrates how speech technology is helping to overcome language and cultural barriers. Free entry for Expo visitors too.
Learn from over 50 global expert speakers sharing their experiences – both good and bad – and enabling you to build the ultimate multimodal experience for your customers, saving you money and improving your service.
Network with colleagues from all over the world, who have already implemented successful strategies. Companies attending include ABN Amro Bank, Apple, Barclays Bank, Microsoft, Orange, Lloyds Bank, Dell, Cap Gemini and more.
Identify, evaluate, integrate, and optimise the latest speech technology solutions from world-leading providers at SpeechTEK Europe’s Expo.
SpeechTEK Europe features over 50 speakers from around the world, and from a wide range of business environments including Google, Barclays Bank, Deutsche Telekom, Nuance, Loquendo, Openstream, Voxeo, Belgian Railways, Telecom Italia, Cable & Wireless, and Westpac.
LEARN ABOUT
Business strategies – Speech biometrics – Multichannel applications – Multilingual applications – Multimodal applications – Assistive technologies – Analytics and Measurement – Voice User Interaction design – Speech application development tools and languages – Case studies, panel discussions and more …
UPDATE
SpeechTEK Europe 2011 has come and gone and I’ve got many interesting things to report (as I have been tweeting through my @dialogconnectio Twitter account).
But first, here are the slides for my presentation at the main conference on the outcome of the AVIxD Workshop on Cross-linguistic & Cross-cultural Voice Interaction Design organised by the Association for Voice Interaction Design (AVIxD). I only had 12 hours to prepare them – including sleep and the London Tube commute – so I practically had to keep working on them until shortly before the Session! Still, I think the slides capture the breadth and depth of topics discussed or at least touched upon at the Workshop. Several people are now writing up all these topics and there should be one or more white papers on them very soon (by the end of July, we hope!). So the slides did their job after all!
2010 saw the first SpeechTEK Conference to have taken place outside of the US, SpeechTEK Europe 2010 in London. This year’s European Conference, SpeechTEK Europe 2011, will take place again in London (25 – 26 May 2011), but this time it will be preceded on Tuesday 24th May by a special Workshop on Cross-linguistic & Cross-cultural Voice Interaction Design organised by the Association for Voice Interaction Design (AVIxD). The main goal of AVIxD is to bring together voice interaction and experience designers from both Industry and Academia and, among other things, to “eliminate apathy and antipathy toward the need for good design of automated voice services” (that’s my favourite!). This is the first AVIxD Workshop to take place in Europe and I am honoured to have been appointed Co-Chair alongside Caroline Leathem-Collins from EIG.
Participation is free to AVIxD members and just £25 for non-members (which may be applied towards AVIxD membership). However in order to participate in the workshop, you need to submit a brief position paper in English (approx. 500 words) on any of the special topics of interest of the Workshop (See CFP below). The deadline for electronic submissions is Friday 25 March, so you need to hurry if you want to be part of it!
Tuesday, 24 May 2011 (just prior to SpeechTEK Europe 2011), 1 – 7 PM
London, England
The Association for Voice Interaction Design (AVIxD) invites you to join us for our first voice interaction design workshop held in Europe, Cross-linguistic & Cross-cultural Voice Interaction Design. The AVIxD workshop is a hands-on day-long session in which voice user interface practitioners come together to debate a topic of interest to the speech community. The workshop is a unique opportunity for them to meet with their peers and delve deeply into a single topic.
As in previous years with the AVIxD Workshops held in the US, we will write papers based on our discussions which we will then publish on www.avixd.org. Please visit our website to see papers from previous workshops, and for more details on the purpose of the organization and how you can be part of it.
In order to participate in the workshop, individuals must submit a position paper of approximately 500 words in English. Possible topics to touch upon in your submission (to be discussed in depth during the workshop) include:
Language choice and user demographics
Presentation of the language options to the caller and caller preference
Creation and (co-)maintenance of dialogue designs, grammars, prompts across languages
Political and sociolinguistic issues in system prompt choices and recognition grammars, such as code-switching, formal versus informal registers
Guidelines for application localization, translation, and interpretation
Setting expectations regarding availability of multilingual agents
Language- and culture-sensitive persona definition
Coordinating usability testing and tuning across diverse linguistic / cultural groups
Language choice and modality preference
We always encourage the use of specific examples from applications you’ve worked on in your position paper.
Participation is free to AVIxD members; non-members will be charged £25, which may be applied towards AVIxD membership at the workshop. Please submit your position papers via email no later than Friday 25 March 2011 to cfp@avixd.org. Letters of acceptance will be sent out on 30 March 2011.
We look forward to engaging with the European speech design community to discuss the particular challenges of designing speech solutions for users from diverse linguistic and cultural backgrounds. Feel free to contact either of the co-chairs below, if you have any questions.
Maria Aretoulaki, DialogCONNECTION Ltd (maria {at} dialogconnection {dot} com)
I promised some time ago to put up the slides of my presentation at this year’s SpeechTEK Europe 2010 in London, the first SpeechTEK to have taken place outside of the US. My presentation, “The Eternal Battle Between the VUI Designer and the Customer”, was on Wednesday 26th May 2010 and opened the “Voice User Interface Design: Major Issues” Session. It went down really well, and afterwards several people in the audience told me about their own experiences and asked me for tips on how to deal with similar issues.
Here is a PDF with the presentation slides:
Maria Aretoulaki – “The Eternal Battle Between the VUI Designer and the Customer” (SpeechTEK Europe 2010 presentation)
VUI Design is preoccupied with the conception, design, implementation, testing and tuning of solutions that work in the manner that is most efficient, secure and non-irritating for the user. Well, realistically that’s what VUI Design can achieve. In an ideal world, the VUI Designer would actually strive to create speech applications that – apart from taking into consideration the customer’s financial and brand requirements – would also fit the caller’s needs, goals and preferences. The initial Requirements analysis should bring both into focus. So much is already known and accepted among both VUI Designers and customers.
The problems start just after they all leave the meeting room and start working on the implementation: Call flow design, system persona development and prompt crafting, and even recognition grammars, all seem to fall victim to a war of words and attitudes. On one side is the VUI Design expert, who has seen systems being developed and spurned before. On the other is the customer, with his tech-savvy business team and their technical architects and programming geniuses, who all think they know what callers want and how call flows should be structured, prompt wording crafted and grammars written, just because they have got strong opinions! Even the results of Usability tests are liable to different interpretations by each side.
This presentation pinpoints common pitfalls in the communication between a VUI Designer and customer employees and recommends ways to resolve conflicts and disagreements on the application design and implementation.
Credits:
SpeechTEK Europe 2010 was organised by:
Information Today, Inc. 143 Old Marlton Pike Medford NJ 08055 U.S.A. Phone 1 (609) 654-6266. http://www.infotoday.com
“Voice recognition technology? … In a lift? … In Scotland? … You ever TRIED voice recognition technology? It don’t do Scottish accents!”
Today I found this little gem on YouTube and I thought I must share it, as apart from being hilarious, it says a thing or two about speech recognition and speech-activated applications. It’s all based on the urban myth that speech recognisers cannot understand regional accents, such as Scottish and Irish.
Scottish Elevator – Voice Recognition – ELEVEN!
(YouTube – Burnistoun – Series 1, Episode 1 [Part 1/3])
What? No Buttons?!
These two Scottish guys enter a lift somewhere in Scotland and find that there are no buttons for the floor selection, so they quickly realise it’s a “voice-activated elevator“, as the system calls itself. They want to go to the 11th floor and they first pronounce it the Scottish way:
/eh leh ven/
That doesn’t seem to work at all.
“You need to try an American accent“, says one of them, so they try to mimic one, sadly very unsuccessfully:
/ee leh ven/
Then they try a quite funny, Cockney-like English accent:
/ä leh ven/
to no avail.
VUI Sin No. 1: Being condescending to your users
The system prompts them to “Please speak slowly and clearly“, which is exactly what they had been doing up to then in the first place! Instead, it should have said something along the lines of “I’m afraid I didn’t get that. Let’s try again.” and later “I’m really sorry, but I don’t seem to understand what you’re saying. Maybe you would like to try one more time?“. Of course, not having any buttons in the lift means that these guys could be stuck in there forever! That’s another fatal usability error: Both modalities, speech and button presses, should have been allowed to cater for different user groups (easy accents, tricky accents) and different use contexts (people who have got their hands full with carrier bags vs people who can press a button!).
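To make the point a bit more concrete, here is a minimal Python sketch of what gentler, escalating reprompts could look like. The first two prompts are the wordings suggested above; the fallback prompt, the names `NO_MATCH_PROMPTS`, `FALLBACK_PROMPT` and `next_prompt`, and the idea of switching to buttons after two failures are all my own assumptions for illustration, not the API of any real speech platform.

```python
# Apologetic reprompts, in the order they would be played.
# The wording comes from the suggestions above; everything else is hypothetical.
NO_MATCH_PROMPTS = [
    "I'm afraid I didn't get that. Let's try again.",
    "I'm really sorry, but I don't seem to understand what you're saying. "
    "Maybe you would like to try one more time?",
]

# Assumed fallback: hand over to the other modality (buttons).
FALLBACK_PROMPT = "Please press the button for your floor instead."


def next_prompt(failed_attempts: int) -> str:
    """Return the prompt to play after the given number of failed recognitions."""
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be at least 1")
    if failed_attempts <= len(NO_MATCH_PROMPTS):
        # The system takes the blame instead of lecturing the user.
        return NO_MATCH_PROMPTS[failed_attempts - 1]
    # Once the reprompts are used up, offer buttons rather than looping forever.
    return FALLBACK_PROMPT
```

Calling `next_prompt(1)` and `next_prompt(2)` gives the two apologies in turn, and any later failure falls through to the button prompt, so nobody needs to be stuck in the lift.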
I’m gonna teach you a lesson!
One of them tries to teach the system the Scottish accent: “I keep saying it until she understands Scottish!“, a very reasonable expectation, which would work particularly well with a speaker-dependent dictation system of the kind you’ve got on your PC, laptop or hand-held device. This speaker-independent one (‘cos you can’t really have your personal lift in each building you enter!) will take a bit more time to learn anything from a single conversation! It takes time to analyse the recordings, their transcriptions and semantic interpretations, to compare what the system understood with what the user actually said, and to use those observations to tune the whole system. We are talking at least a week in most cases. They would die of dehydration and starvation by then!
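The comparison step in that tuning work can be pictured with a small Python sketch. It assumes the log has already been boiled down to pairs of (what the user actually said, what the system understood); the function name `tuning_report` and that log format are made up for this example, and real tuning tools look at far more than this.

```python
from collections import Counter


def tuning_report(log):
    """Summarise a recognition log.

    log is a list of (said, understood) pairs, where `said` is the human
    transcription and `understood` is the recogniser's result.
    Returns the overall accuracy and the misrecognitions, most frequent first.
    """
    errors = Counter()
    correct = 0
    for said, understood in log:
        if said == understood:
            correct += 1
        else:
            errors[(said, understood)] += 1
    accuracy = correct / len(log) if log else 0.0
    return accuracy, errors.most_common()


# Made-up example data, not taken from any real system.
sample_log = [
    ("eleven", "eleven"),
    ("eleven", "seven"),
    ("eleven", "seven"),
    ("two", "two"),
]
accuracy, top_errors = tuning_report(sample_log)
```

The most frequent confusions at the top of that list are the first candidates for a fix, such as an extra pronunciation or a reworded prompt.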
VUI Sin No.2: Patronising your users until they explode
After a while, the system makes it worse by saying what no system should ever dare say to a user’s face: “Please state which floor you would like to go to in a clear and calm manner.” Patronising or what! The guys’ reaction is not surprising: “Why is it telling people to be calm?! .. cos Scottish people would be going out for MONTHS at it!“.
Well, that’s not actually true. These days off-the-shelf speech recognition software is optimised to work with most main accents in a language, yes, including Glaswegian! Millions of real-world utterances spoken by thousands of people with all possible accents in a language (and this for many different languages too) are used to statistically train the recognition software to work equally well with most of them and for most of the time. These utterances are collected from applications that are already live and running somewhere in the world for the corresponding language. The more real-world data available, the better the software can be tuned and the more accurate the recognition of “weird” pronunciations will be, even when you take the software out of the box.
VUI Best Practice: Tune your application to cater for YOUR user population
An additional safeguarding and optimising technique is tuning the pronunciations for a specific speech recognition application. So when you already know that your system will be deployed in Scotland, you’d better add the Scottish pronunciation for each word explicitly in the recognition lexicon. This includes manually adding /eh leh ven/, as the standard /ee leh ven/ pronunciation is not likely to work very well. Given that applications are usually restricted to a specific domain anyway (selecting floors in a lift, getting your bank account balance, choosing departure and arrival train times etc.), this only needs to be done for the core words and phrases in your application, rather than the whole English, French, or Farsi language! So do not despair, there’s hope for freedom (of speech) even for the Scottish!
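As a rough illustration of adding a pronunciation to the lexicon, here is a toy Python sketch. It treats the lexicon as a plain dictionary from each word to its list of accepted pronunciations, written in the same informal notation used in this post. Real recognisers use their own lexicon formats and phoneme sets, so the structure and the names `add_pronunciation` and `words_for` are assumptions made purely for the example.

```python
def add_pronunciation(lexicon, word, pronunciation):
    """Add an extra pronunciation for a word, skipping duplicates."""
    variants = lexicon.setdefault(word, [])
    if pronunciation not in variants:
        variants.append(pronunciation)


def words_for(lexicon, pronunciation):
    """Return every word in the lexicon that accepts this pronunciation."""
    return [word for word, variants in lexicon.items() if pronunciation in variants]


# Toy lexicon for the lift, starting with the standard pronunciation only.
lift_lexicon = {"eleven": ["ee leh ven"]}

# Before tuning, the Scottish pronunciation matches nothing.
before = words_for(lift_lexicon, "eh leh ven")

# Tuning for a Scottish deployment: add the local variant explicitly.
add_pronunciation(lift_lexicon, "eleven", "eh leh ven")
after = words_for(lift_lexicon, "eh leh ven")
```

Here `before` is an empty list and `after` contains "eleven", while the standard pronunciation keeps working as well.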
For a full transcript of the video, check out EnglishCentral.