Archive | Mobile Apps

TEDxManchester (13 Feb 2012): Voice Recognition FTW!

15 May

2012 can easily be dubbed the year of TEDx for me, as by mid-February I had already attended two TEDx events! First up was TEDxSalford in late January, where I was just a mindblown attendee, and two weeks later it was TEDxManchester where I had the honour to be a speaker!

TEDxManchester took place on Monday 13th February this year at one of the iconic Manchester locations – and my “local” – the Cornerhouse. Among the luminary speakers were people I have always admired, such as the radio Goddess Mary Anne Hobbs, and people I have become very close friends with over the years and have come to admire just as much, such as Ian Forrester (@cubicgarden to most of us).

Here are their respective talks at TEDxManchester 2012 for you to get a taste of the atmosphere at the event and of the impact of the ideas and the immediacy of the sentiments circulated!

Mary Anne Hobbs

Ian Forrester

 My TEDxManchester talk

I spoke about the weird and wonderful world of Voice Recognition (“Voice Recognition FTW!”): from the inaccurate – and far too often funny – simple voice-to-text apps and dictation systems on your smartphones, to the most frustrating automated Call Centres, to the next-generation, sophisticated Siri and everything in between. I explained why things go wrong and when things can go wonderfully right. The answer is “CONTEXT”: the more of it you have, the more accurate and relevant the interpretation of user intention will be, and the more relevant and impressive the system reaction / reply will be. Here are my TEDxManchester slides.
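To make the “CONTEXT” point a little more concrete, here is a toy sketch in Python. It is not how any real recogniser works; the scores, the `Hypothesis` class and the `context_bonus` weighting are all made up for illustration. The idea is simply that a recogniser often has several competing guesses for the same audio, and knowing what the conversation is about can tip the balance towards the right one.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    acoustic_score: float  # higher = closer match to the audio (invented values)


def context_bonus(text, context_words, weight=0.5):
    """Add `weight` for every word in the hypothesis that appears in the context."""
    return weight * sum(1 for word in text.lower().split() if word in context_words)


def pick_best(hypotheses, context_words=frozenset()):
    """Return the hypothesis with the highest acoustic + context score."""
    return max(
        hypotheses,
        key=lambda h: h.acoustic_score + context_bonus(h.text, context_words),
    )


# Two guesses for the same stretch of audio (hypothetical scores).
n_best = [
    Hypothesis("wreck a nice beach", 0.62),
    Hypothesis("recognise speech", 0.58),
]

# Without context, the slightly better acoustic match wins.
print(pick_best(n_best).text)

# With context about the topic, the sensible reading wins.
print(pick_best(n_best, {"speech", "voice", "recognition"}).text)
```

With no context the sketch picks “wreck a nice beach”; once it knows the topic is speech, it picks “recognise speech”. Real systems use far richer context (dialogue state, user history, location), but the principle is the same.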

My TEDxManchester video hasn’t been uploaded yet, as it streamed copyrighted material from YouTube. So in the meantime, I am including the offending video here :)

Burnistoun S1E1 – ELEVEN! 

The (Re)Tweets

(in reverse chronological order)

Maria Ar3toul4ki @ar3toul4ki 17 Feb

thanks for the #TEDxMCR piccie @cubicgarden! http://farm8.staticflickr.com/7050/6875061121_69555f7eb3_b.jpg @TEDxManchester

Cornerhouse @CornerhouseMcr 16 Feb

For those of you who missed #TEDxMCR check out @cubicgarden’s pics! Videos should be with us in a couple of weeks http://www.flickr.com/photos/cubicgarden/tags/tedxmcr/

Retweeted by Maria Ar3toul4ki

Martin Williams @ukcopywriting 15 Feb

@ar3toul4ki ‘s Next level awesome epic bio - http://www.tedxmanchester.com/#speakers #TEDxMCR
Retweeted by Maria Ar3toul4ki
In reply to Maria Ar3toul4ki

Maria Ar3toul4ki @ar3toul4ki 15 Feb

What an awesome (wicked, epic) bio the cool guys and girls @TEDxManchester have written for me!!! http://www.tedxmanchester.com/#speakers #TEDxMCR

Maria Ar3toul4ki @ar3toul4ki 15 Feb
RT @global_lingo: Maria Aretoulaki on voice recognition software. Will digital transcription ever be any good? #tedxmcr no, no it won’t

Lynne McCadden @lmccadden 14 Feb
Belated I know but many congrats to @herbkim for a fantastic TEDxMCR yesterday been thinking about some of it all day today !
Retweeted by Maria Ar3toul4ki

TEDxManchester @TEDxManchester 14 Feb
Here’s to the #TEDxMCR speakers in Session 2 – @daveerasmus @martinsfp @ar3toul4ki @cubicgarden @brendandawes
Retweeted by Maria Ar3toul4ki

TEDxManchester @TEDxManchester 13 Feb
Thanks to @BandXMedia all today’s #TEDxMCR talks were recorded, will be edited & put online soon :-) #TEDxMCR @s2martin
Retweeted by Maria Ar3toul4ki

Lynne McCadden @lmccadden 13 Feb

#tedxmcr learning about quarks and leptons from @tarashears making particle physics easy – sort of
Retweeted by Maria Ar3toul4ki

Lynne McCadden @lmccadden 13 Feb

watching this @TEDxManchester kevin slavin’s TED talk on how algorithms shape our world:  http://www.ted.com/talks/kevin_slavin_how_algorithms_shape_our_world.html
Retweeted by Maria Ar3toul4ki

Dr Marieke Navin ‏ @lisamarieke

depends if fitting gaussians to your data is your thing… Question is, do you understand your data?!
Retweeted by Maria Ar3toul4ki
13 Feb
Maria Ar3toul4ki ‏ @ar3toul4ki

Oh yes! © Bruce Balentine RT @LukeRobertMason: “It’s better to be a good Machine than a bad person” Discuss? #TEDxMCR
13 Feb
Luke Robert Mason ‏ @LukeRobertMason

How to Wreck a Nice Beach @TEDxManchester #TEDxMCR
Retweeted by Maria Ar3toul4ki
13 Feb
Luke Robert Mason ‏ @LukeRobertMason

It’s a bright future if you are an algorithm or infomorph #TEDxMCR
Retweeted by Maria Ar3toul4ki
13 Feb
Luke Robert Mason ‏ @LukeRobertMason

@RichardMichie A little bit of non-human agency can’t hurt… Or can it ;) #TEDxMCH
Retweeted by Maria Ar3toul4ki
In reply to RichardMichie
13 Feb
 Ian Forrester ‏ @cubicgarden 

Infomorphs or a weaver… #TEDxMCR love the idea :-) very cool! They could work with #perceptivemedia yfrog.com/gzeg2jij
Retweeted by Maria Ar3toul4ki
13 Feb


Ian Pettigrew ‏ @KingfisherCoach

#TEDxMCR @skeuomorphology challenging ‘necessity is the mother of invention’; cars weren’t invented as a response to a shortage of horses!
Retweeted by Maria Ar3toul4ki
13 Feb
Luke Robert Mason ‏ @LukeRobertMason 

Pure information technologies are the first evolutionary aware technologies. They are stochastic… Emerge from randomness #TEDxMCR @weavrs
Retweeted by Maria Ar3toul4ki
13 Feb
Luke Robert Mason ‏ @LukeRobertMason 

Living software ‘bots’ or infomorphs via @weavrs #infomorph #TEDxMCR @skeuomorphology
Retweeted by Maria Ar3toul4ki
13 Feb
Michael Di Paola ‏ @MichaelDiPaola

Robots made from programmable gel…where the hell am I? A parallel universe, the future. No. Just at #TEDxMCR listening to Dan O’Hara
Retweeted by Maria Ar3toul4ki
13 Feb
 Luke Robert Mason ‏ @LukeRobertMason 

Infomorph, a form that exists just of information @skeuomorphology #TEDxMCR
Retweeted by Maria Ar3toul4ki
13 Feb

Luke Robert Mason ‏ @LukeRobertMason 

Another type of software agent that exhibits life, @weavrs #infomorph @skeuomorphology #TEDxMCR
Retweeted by Maria Ar3toul4ki
13 Feb

Maria Ar3toul4ki ‏ @ar3toul4ki 

@pgaval never mind, it will be on YouTube forever! (Mum!)
In reply to Petros Gavalakis
13 Feb

Maria Ar3toul4ki ‏ @ar3toul4ki 

Mondays are my favourite days of the week : D
13 Feb
 Matthew Brooks ‏ @brooksoid 

Great, great talk by @brendandawes on the value of pursuing ideas, and the ideas they spawn, without necessarily knowing where you’re going
Retweeted by Maria Ar3toul4ki
from Manchester, Manchester
13 Feb
 RichardMichie ‏ @RichardMichie 

Failed art at school? You can still exhibit at #moma @brendandawes #tedxmcr great story love it
Retweeted by Maria Ar3toul4ki
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

@brendandawes’ cinema redux of Hitchcock’s Vertigo #TEDxMCR twitpic.com/8jfs95
13 Feb

sphey1 ‏ @sphey1

If you make something, give it a name – re: Cinema Redux @brendandawes #TEDxMCR
Retweeted by Maria Ar3toul4ki
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

Things that @brendandawes has done with his 3-printer #TEDxMCR twitpic.com/8jfpn6
13 Feb


 Maria Ar3toul4ki ‏ @ar3toul4ki 

@brendandawes : the creative process is iterative. ( but Battling it against time & cost constraints) #TEDxMCR
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

RT @CMindsKelly: @brendandawes. The thing we in the room all share is curiosity. That’s why we’re always making new things #TEDxMCR”
13 Feb
 Martin Bryant ‏ @MartinSFP 

At #tedxmcr, @cubicgarden explained how @tdobson and @adew saved his life. instagr.am/p/G9BmXRStoc/
Retweeted by Maria Ar3toul4ki
13 Feb

Maria Ar3toul4ki ‏ @ar3toul4ki

Ian Forrester: fear the fear #TEDxMCR
13 Feb
 Claire-Marie ‏ @CMBoggiano

‘We are complex & unique organisms And yes, I am still an atheist.’ Ian Forrester, #TEDxMCR
Retweeted by Maria Ar3toul4ki
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

Indeed! RT @TonyChurnside: @cubicgarden really touching. Very nicely done!
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

@brooksoid any time!
In reply to Matthew Brooks
13 Feb
 Matthew Brooks ‏ @brooksoid 

@ar3toul4ki great talk Maria, speech recog in focus at the beeb right now, be interesting to talk once I’ve worked out what our landscape is
Retweeted by Maria Ar3toul4ki
from Manchester, Manchester
13 Feb
 TEDxManchester ‏ @TEDxManchester

Link to the funny vid played by @ar3toul4ki – Scottish voice recognition problems.. http://youtu.be/sAz_UvnUeuU

Retweeted by Maria Ar3toul4ki
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

Thank you! Did you see it, by any chance? RT @pgaval: @ar3toul4ki Good luck!
13 Feb
 Claire-Marie ‏ @CMBoggiano 

‘When I was lying in bed dying, where were the real people?’ Ian Forrester, #TEDxMCR
Retweeted by Maria Ar3toul4ki
13 Feb
 Tony Churnside ‏ @TonyChurnside 

Watching @cubicgarden talk about his #brushwithdeath. A very scary time. #TEDxMCR pic.twitter.com/hED5mimw
Retweeted by Maria Ar3toul4ki
13 Feb

Maria Ar3toul4ki ‏ @ar3toul4ki 

@tdobson @cubicgarden is talking about you! : D
In reply to Tim Dobson
13 Feb
 Tim Dobson ‏ @tdobson 

so @cubicgarden is talking about it #brushwithdeath when I may or may not have been his flatmate at the time.. #tedxman
Retweeted by Maria Ar3toul4ki
13 Feb
 Matthew Brooks ‏ @brooksoid 

And @cubicgarden ‘s talk is about… @cubicgarden ! He’s finally gone recursive. #TEDxMCR
Retweeted by Maria Ar3toul4ki
from Manchester, Manchester
13 Feb
 Tony Churnside ‏ @TonyChurnside

@cubicgarden you’re looking good! pic.twitter.com/n8xvkzJB
Retweeted by Maria Ar3toul4ki
13 Feb


 Ian Forrester ‏ @cubicgarden

And next on at #TEDxMCR its @ianforrester. With the story of me…
Retweeted by Maria Ar3toul4ki
13 Feb
 TEDxManchester ‏ @TEDxManchester 

Hilarious talk on Voice Recognition from Dr Maria Aretoulaki #TEDxMCR
Retweeted by Maria Ar3toul4ki
13 Feb
 Tim Dobson ‏ @tdobson

@davemee it’s all about context! /cc @ar3toul4ki ;)
Retweeted by Maria Ar3toul4ki
In reply to Dave Mee
13 Feb
 Tim Dobson ‏ @tdobson 

@davemee @ar3toul4ki “fetish cheese”
Retweeted by Maria Ar3toul4ki
In reply to Dave Mee
13 Feb
 Dave Mee ‏ @davemee 

@tdobson @ar3toul4ki feed her through siri and send me a transcript!
Retweeted by Maria Ar3toul4ki
In reply to Tim Dobson
13 Feb
 Kate Towey ‏ @katiemaymanc 

Fascinating talk from Tara Shears on particle physics. ’2012 is year of the Higgs’ #tedxmcr
Retweeted by Maria Ar3toul4ki
13 Feb
 Ian Pettigrew ‏ @KingfisherCoach 

So far at #TEDxMCR we’ve covered pursuing your passion, JDI (and make mistakes), technology, algorithms, and particle physics. I’m happy!
Retweeted by Maria Ar3toul4ki
13 Feb
 Allie Johns ‏ @AllieJohns

I propose bringing back Tomorrow’s World and having Tara Shears present it #tedxmcr
Retweeted by Maria Ar3toul4ki
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki

@TaraShears @TEDxManchester: oh my Higgs! We’ve seen something! Or have we?? #TEDxMCR twitpic.com/8jdo56
13 Feb

Ian Forrester ‏ @cubicgarden 

The goddamn particle explained at #TEDxMCR yfrog.com/obsv5tmj
Retweeted by Maria Ar3toul4ki
13 Feb

Maria Ar3toul4ki ‏ @ar3toul4ki 

@TaraShears @TEDxManchester: where’s that God-damned Higgs particle?! If we don’t find it, we’ll have to start all over again… #TEDxMCR
13 Feb
 Claire-Marie ‏ @CMBoggiano 

“@lmccadden: #tedxmcr learning about quarks and leptons from @tarashears making particle physics easy – sort of”
Retweeted by Maria Ar3toul4ki
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki

@TaraShears @TEDxManchester: symmetry, simplicity, elegance = beauty of the standard model of particle physics #TEDxMCR
13 Feb
 TEDxManchester ‏ @TEDxManchester 

Up next @TEDxManchester is @TaraShears – tune in live to ow.ly/92eRf #TEDxMCR
Retweeted by Maria Ar3toul4ki
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

@TEDx video 1 @TEDxManchester: pragmatic chaos to describe fluid things such as culture #TEDxMCR
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

@coralgrainger no worries sweetness : )
In reply to coralgrainger
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

@TEDx video @TEDxManchester: what we don’t understand, we give a name and a story to #TEDxMCR
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

@maryannehobbs you were, nay ARE, awesome! Xx
In reply to maryanne hobbs
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki

Dan O’Hara @skeuomorphology @TEDxManchester: from random relentless replication (cf. spambots) to guided transformation of chaos #TEDxMCR
13 Feb
 Kim Willis ‏ @KimberleyWillis 

Dan O’Hara: technology is not a selection of gadgets but a body of knowledge instagr.am/p/G8sGbrBVY7/ #TEDxMCR
Retweeted by Maria Ar3toul4ki
13 Feb

Ian Wareing ‏ @ianwareing 

#tedxmcr @skeuomorphology “Necessity is not the mother of invention. Invention is the mother of necessity”
Retweeted by Maria Ar3toul4ki
13 Feb
 Ian Forrester ‏ @cubicgarden

Bloatware… or stimulation of the real on the virtual RT @maanasvarun: Skeumorphism. wait what? #TEDxMCR
Retweeted by Maria Ar3toul4ki
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

Dan O’Hara @skeuomorphology @TEDxManchester: the creation of living technology by merging the Arts and Sciences #TEDxMCR
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

@tombloxhammbe @TEDxManchester: go through life making mistakes, otherwise you don’t take any decisions, just do it ! © #TEDxMCR
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

@maryannehobbs @TEDxManchester: John Peel saving lives again #TEDxMCR
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

@maryannehobbs @TEDxManchester: follow your passion! #TEDxMCR twitpic.com/8jcwpn
13 Feb


 TEDxManchester ‏ @TEDxManchester

Hi all we’re suggesting #TEDxMCR as the hashtag for the event today as it’s a bit shorter than #TEDxManchester :-)
Retweeted by Maria Ar3toul4ki
13 Feb
 TEDxManchester ‏ @TEDxManchester 

Sorry folks for the livestream #fail. We’re currently on this channel live.. bit.ly/y9kkZa #TEDxMCR
Retweeted by Maria Ar3toul4ki
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

@gazshaw cheers!
In reply to Gaz Shaw
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki

:D see you there Mike! It’s been a loooong time! RT @mike_higham: @ar3toul4ki @TEDxManchester Looking forward to it #TEDxMCR
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

@heloukee oh nooo : s
In reply to Helen Keegan
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

Excited & honoured to be speaking @TEDxManchester today. My talk “Voice Recognition FTW!” on the present+future of user interfaces #TEDxMCR
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

See you there Matt! RT @matthbooth: A bit of work then @TEDxManchester. Looking forward to it.
13 Feb
 Allie Johns ‏ @AllieJohns 

“@maryannehobbs: interesting day: speaking about passion at @TEDxManchester 1pm.. ” > we can never have enough passion in our lives.
Retweeted by Maria Ar3toul4ki
13 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

:D Will you be my groupie?? RT @technicalfault: @ar3toul4ki Dr Maria at TEDx!
12 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

Looking forward to giving #TEDxMCR an insight into the wondrous+often misconstrued world of voice recognition @TEDxManchester tomorrow
12 Feb
 TEDxManchester ‏ @TEDxManchester 

And in other late-breaking news Dr. Maria @Ar3toul4ki will also be taking the stage tomorrow at #TEDxMCR :-)
Retweeted by Maria Ar3toul4ki
12 Feb
 TEDxManchester ‏ @TEDxManchester 

A big welcome for our latest speaker @MartinSFP – European Editor @TheNextWeb for #TEDxMCR. Like @MaryAnneHobbs a brave no-slide presenter!
Retweeted by Maria Ar3toul4ki
11 Feb
 Anna Nachesa ‏ @ashalynd 

I’ll probably be very evil if I ask during an interview if tail-optimized recursion is possible in C. OTOH, it might be a great icebreaker:)
Retweeted by Maria Ar3toul4ki
11 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

RT @TEDxManchester: @ar3toul4ki:really excited bout Mons #TEDxMCR @maryannehobbs @brendandawes @cubicgarden @skeuomorphology @tombloxhammbe

10 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

Can’t wait to finally meet you! : D RT @maryannehobbs: @ar3toul4ki :)

10 Feb
 Maria Ar3toul4ki ‏ @ar3toul4ki 

Getting really excited about Monday’s #TEDxMCR @CornerhouseMcr: @maryannehobbs @brendandawes @cubicgarden Dan O’Hara & my Uni’s Tom Bloxham

A speech recognition user interface works when it … disappears!

25 Oct

Today is a big day for me! I’m finally getting to meet in person one of the Coryphées of the VUI Design World (even though as far as I know he’s not a ballet dancer), Bruce Balentine of the Enterprise Integration Group (EIG).  Bruce is the author of one of the best books ever written on IVR / Speech applications / Voice User Interface Design, It’s Better to Be a Good Machine Than a Bad Person – Speech Recognition and Other Exotic User Interfaces at the Twilight of the Jetsonian Age.

Apart from the ingenuity of the title itself, encapsulating the golden rule of good user experience / usability design, you can readily see to what great lengths Bruce has gone to serve his pearls of design wisdom in a most humorous and utterly witty way. This doesn’t in the least diminish the importance, relevance and truthfulness of his observations and recommendations. Bruce is a veteran designer and he has seen it all before, from the excitement and optimism to the disappointment and pessimism, to the final destination, design realism:

First we tried to make them human. Now it’s time to make them work 

To get a flavour of the type of UX design advice and messages conveyed in the book, here’s an extract from Chapter  132: Will Speech Technology Ever Work? (pp. 393-395 in my 2007 edition):

In closing, I must ask the question. Will it ever work? And, of course, the answer is, yes. Speech recognition—and its related technologies (e.g., speaker verification, text-to-speech, audio indexing, speech data mining, dictation) will work. Indeed they already do. They will fill their respective application niches almost completely. And, in fact, the majority will do so quite soon. What will change is the definition of “work”.

Speech recognition is primarily a user interface technology*. As such, it works when it disappears. It’s really that simple. When the users are not thinking about the user interface, but instead are accomplishing the task to which they are connected by the user interface, then and only then can the interface be said to be “working.” We have to stay on message with this fundamental fact if we are ever to succeed at bringing speech to the performance level where we can legitimately claim that it “works.”

True words!!! As a bonus,  Leslie Degler’s illustrations perfectly complement and enhance the messages conveyed in the text, once again in the wittiest and most original manner.  Buy this book ASAP! After all, if you don’t agree with its theses, you can always return it. All you need to do is:

Write out in longhand, on a separate page, “I,” and add your name, “agree that there’s not a chance in Hell any refund will ever come of this claim.” Label this statement as your “declaration.”  

After you have received your refund, we’ll call you with an outbound IVR that asks you several hundred thought-provoking questions about your customer experience. We value your opinion—please give us your most honest and spontaneous responses. We’ll do our best to recognize them

It says it all really! :)

To date, I have only met Bruce virtually, through Skype calls and the Creative Speech Technology Network (CreST) of which we are both members, and I can already tell he is a very funny, witty, creative (musical!),  interesting, as well as intelligent person. So I can’t wait to meet him in person later today and hear some more fascinating stories and hilarious anecdotes from the world of speech recognition application design, voice interface usability and technology abuse!

UPDATE:

I went (to the dinner with Bruce) and (was) conquered by the brilliance and wit of the man! I got my long-awaited autograph in his book too, as I can now prove!

Human-Machine Interaction in Translation (NLPCS 2011)

21 Aug

For a few years now I have been on the Programme Committee of the International Workshop on Natural Language Processing and Cognitive Science (NLPCS), organised by a long-time colleague and friend, Dr. Bernadette Sharp from Staffordshire University. The aim of this annual workshop is “to bring together researchers and practitioners in Natural Language Processing (NLP) working within the paradigm of Cognitive Science (CS)”.

“The overall emphasis of the workshop is on the contribution of cognitive science to language processing, including conceptualisation, representation, discourse processing, meaning construction, ontology building, and text mining.”

There have been NLPCS  Workshops in Porto (2004), Miami (2005), Paphos (2006), Funchal (2007), Barcelona (2008), Milan (2009) and Funchal (2010).

Copenhagen Business School


This year’s 8th International NLPCS Workshop just took place this weekend in Copenhagen, Denmark (20-21 Aug 2011). The Workshop topic was “Human-Machine Interaction in Translation”, focussing on all aspects of human and machine translation, and human-computer interaction in translation, including: translators’ experiences with CAT tools, human-machine interface design, evaluation of interactive machine translation, user simulation and human factors. Thus, the topics were approached from a number of different perspectives:

  • from full automation by machines for machines (traditional NLP or HLT),
  • semi-automated processing, i.e. machine-mediated processing (programs assisting people in their tasks),
  • but also simulation of human cognitive processes

I had the opportunity once again to review a few of the paper submissions and can therefore highly recommend reading the full Proceedings of the NLPCS 2011 Workshop that have just been made available.

I found the following 3 contributions particularly interesting:

  • Valitutti, A. “How Many Jokes are Really Funny? A New Approach to the Evaluation of Computational Humour Generators”
  • Nilsson, M. and J. Nivre. “Entropy-Driven Evaluation of Models of Eye Movement Control in Reading” 

and

  • Finch, A., Song, W., Tanaka-Ishii, K. and E. Sumita. “Source Language Generation from Pictures for Machine Translation on Mobile Devices”

Enjoy!

Speech Interaction on Mobile Devices at SpeechTEK 2011 (New York)

7 Aug

Today sees the launch of the Joint AVIxD / IxDA Workshop on Speech Interaction on Mobile Devices that kick-starts the mother of Voice Solutions Fairs, SpeechTEK 2011 in New York next week (8-10 Aug).

AVIxD

AVIxD is the Association for Voice Interaction Design, a professional organisation that aims to

“eliminate apathy and antipathy toward the need for good design of automated voice services”, 

which has become my favourite VUI mantra!

IxDA is the Interaction Design Association, a much bigger professional “un-organisation” which  intends to:

“improve the human condition by advancing the discipline of Interaction Design”

A very worthy cause indeed, especially since it is true that “the human condition is increasingly challenged by poor experiences”!

IxDA

Today’s Joint Workshop in New York aims to bring together interaction design practitioners from across the voice, interactive, and digital areas to identify the issues and challenges involved in  speech interaction design on mobile devices, such as smartphones and tablets, and to come up by the end of the day with ways to approach them or even tackle them. A very ambitious format that, however, really does work!

AVIxD organised another Workshop this year, on Cross-linguistic & Cross-cultural Voice Interaction Design, which was also the 1st European Workshop, just before SpeechTEK Europe in London this past May. See what we all came up with in those 6 hours in the SpeechTEK Europe PDF presentation below.

And if you don’t manage to take part in today’s workshop, make sure you go to the SpeechTEK Conference and Exhibition itself that starts tomorrow and runs until Wednesday the 10th. Listen to presentations and see or even try for yourself market-ready products relating to:

  • multimodal applications
  • cross-channel applications
  • speech analytics
  • speaker identification and verification
  • in-car systems
  • natural language and say-anything technologies
  • speech translation
  • voice-enabled personal assistants
  • as well as the latest speech recognition techniques and technologies

I particularly recommend the Keynote Panel on “Mobility — A Game-Changer for Speech?” on Tuesday on how smartphones are dramatically changing how customers interact with businesses and with the devices themselves. Some really interesting issues and questions will be raised, such as:

* How will voice user interfaces be integrated with graphical user interfaces?

or

* Will users embrace voice as they have embraced keypads on mobile devices? 

Sadly I am in the UK today and next week, so I’m going to miss it all. But if you are lucky enough to be in or near New York, make sure you go and enjoy!

SpeechTEK 2011 New York

SpeechTEK Europe 2011 – The Voice Solutions Showcase

20 May

(update at the end)

SpeechTEK Europe 2011 takes place in London next week (25 – 26 May 2011, Copthorne Tara Hotel, London, UK) and I am participating very actively! Firstly, I am co-chairing the Workshop on Cross-linguistic & Cross-cultural Voice Interaction Design organised by the Association for Voice Interaction Design (AVIxD). I have already written a blog post on that. Then, I will be presenting the outcome of our discussions at the Workshop in the Main SpeechTEK Conference itself, on Wednesday 25th May (2:45 p.m. – 3:30 p.m.) during Session B104: Speech organisations speak out. It should be a challenge, as the Workshop runs from 1-7pm the previous day, so I will have a very busy evening after dinner trying to prepare a coherent and comprehensive presentation!!

And finally, on both days of the Main Conference (Wed 25 – Thu 26 May), I will be holding free one-to-one consultancy appointments as part of the Meet the Consultants Clinic, which is brand new this year. I am one of the “5 global speech tech experts” available “to discuss your speech tech needs and challenges”. Maybe you need to check out my older blog post on speech recognition (for dummies!) to get an idea of what I will be chatting about with everyone. You may also want to check out my presentation slides from last year and from 2007. Get them from these older blog posts: “The Eternal Battle Between the VUI Designer and the Customer” and “Does Your Customer Know What They are Signing off??”. Although you do need to pre-book, these appointments are free for registered conference delegates or Expo visitors, so I’m looking forward to meeting some of you in person!

There’s still time to sign up for the SpeechTEK Europe Conference and Free Entry Expo. Use the following link to register and we’ll see you in London next week! http://www.speechtek.com/europe2011/Registration.aspx

Here’s a quick round-up of what’s happening:

  • Conference Keynotes by Google’s Engineering Director, Dave Burke, who tells SpeechTEK Europe about Google’s plans for cloud-based speech recognition, and Professor Alex Waibel who describes and demonstrates how speech technology is helping to overcome language and cultural barriers. Free entry for Expo visitors too.
  • Learn from over 50 global expert speakers sharing their experiences – both good and bad – and enabling you to build the ultimate multimodal experience for your customers, saving you money and improving your service.
  • Network with colleagues from all over the world, who have already implemented successful strategies. Companies attending include ABN Amro Bank, Apple, Barclays Bank, Microsoft, Orange, Lloyds Bank, Dell, Cap Gemini and more.
  • Identify, evaluate, integrate, and optimise the latest speech technology solutions from world-leading providers at SpeechTEK Europe’s Expo.

SpeechTEK Europe features over 50 speakers from around the world, and from a wide range of business environments including Google, Barclays Bank, Deutsche Telekom, Nuance, Loquendo, Openstream, Voxeo, Belgian Railways, Telecom Italia, Cable & Wireless, and Westpac.

LEARN ABOUT

Business strategies – Speech biometrics – Multichannel applications – Multilingual applications – Multimodal applications – Assistive technologies – Analytics and Measurement – Voice User Interaction design – Speech application development tools and languages – Case studies, panel discussions and more …

UPDATE

SpeechTEK Europe 2011 has come and gone and I’ve got many interesting things to report (as I have been tweeting through my @dialogconnectio Twitter account).

But first, here are the slides for my presentation at the main conference on the outcome of the AVIxD Workshop on Cross-linguistic & Cross-cultural Voice Interaction Design organised by the Association for Voice Interaction Design (AVIxD). I only had 12 hours to prepare them – including sleep and London tube commute – so I had to practically keep working on them until shortly before the Session! Still, I think the slides capture the breadth and depth of topics discussed or at least touched upon at the Workshop. There are several people now writing up all these topics and there should be one or more White Papers on them very soon (by the end of July, we hope!). So the slides did their job after all!

Get the slides in PDF here:  Maria Aretoulaki – SpeechTEK Europe 2011 presentation.

FutureEverything 2011 – The Future is now (here in Manchester!)

12 May

Today saw the launch of the very interdisciplinary (some would say “transdisciplinary” even) FutureEverything Festival (previously Futuresonic), a long-running and world-renowned annual Conference and Festival of Technology and Innovation, Art and Music running from the 11th to the 14th May in Manchester, UK (@FuturEverything #futr). Apart from the annual May events,

FutureEverything creates year-round Digital Innovation projects that combine creativity, participation and new technologies to deliver elegant business and research solutions.   In 2010 we launched the FutureEverything Award, an international prize for artworks, social innovations or software and technology projects that bring the future into the present.

I have always made a point of attending at least one music or art event every year since 2007 (when the Festival was still called Futuresonic) and I have always been particularly interested in the forward-thinking Digital Technologies Conference. So I was over the moon when I was invited to participate in the Conference and informally share my words of wisdom on speech and language technologies for emotional computing. Armed with my complimentary Festival Pass, I am now really looking forward to 2 days (Thu 12 – Fri 13 May 2011) packed with presentations, discussions and debates on: Urban Games and Virtual Identities, Robots and Smart Cities, open data and participatory democracy, community-serving Geeks and Hackers, Open source software and citizen inclusion, and one of my favourites, emotional computing: making human-computer interfaces personable, engaging and persuasive and interaction with them more intuitive and even fun.

The FutureEverything Conference is brainstorming on a massive scale. Combined with all the live Twitter updates and feeds, it is once again going to have a viral impact worldwide with the novel, brave and infectious ideas that will be coming out of it and around it. At the same time, the use of dynamic and democratic microblogging will allow massive participation in the Conference by people on both sides of the Atlantic who are not physically present but are still listening and contributing their feedback and ideas virtually and remotely. In fact, the FutureEverything Festival and the Conference are quintessential instantiations of the perfect balance of online – offline, virtual and real, local and remote, one-to-many / many-to-one broadcasting. And I’m right in the middle of this awesome time-space continuum (May 2011 in Manchester UK)! :)

Update (Sun 15 May):

There is now a FutureEverything Festival Portal with a compilation of blog posts, photos, audio, video and more related to the 4 days of the Festival and Conference. Check it out here: http://www.fe-2011.org/

I will also be adding my feedback on what I heard at the Conference in the next couple of days.

Cross-linguistic & Cross-cultural Voice Interaction Design

31 Jan

(update at the end)

2010 saw the first SpeechTEK Conference to have taken place outside of the US, SpeechTEK Europe 2010 in London. This year’s European Conference, SpeechTEK Europe 2011, will take place again in London (25 – 26 May 2011), but this time it will be preceded on Tuesday 24th May by a special Workshop on Cross-linguistic & Cross-cultural Voice Interaction Design organised by the Association for Voice Interaction Design (AVIxD). The main goal of AVIxD is to bring together voice interaction and experience designers from both Industry and Academia and, among other things, to “eliminate apathy and antipathy toward the need for good design of automated voice services” (that’s my favourite!). This is the first AVIxD Workshop to take place in Europe and I am honoured to have been appointed Co-Chair alongside Caroline Leathem-Collins from EIG.

Participation is free to AVIxD members and just £25 for non-members (which may be applied towards AVIxD membership). However in order to participate in the workshop, you need to submit a brief position paper in English (approx. 500 words) on any of the special topics of interest of the Workshop (See CFP below). The deadline for electronic submissions is Friday 25 March, so you need to hurry if you want to be part of it!

Here’s the full Call for (Position) Papers from the AVIxD site:

Call for Position Papers

First European AVIxD Workshop

Cross-linguistic & Cross-cultural Voice Interaction Design

Tuesday, 24 May 2011 (just prior to SpeechTEK Europe 2011), 1 – 7 PM

London, England

The Association for Voice Interaction Design (AVIxD) invites you to join us for our first voice interaction design workshop held in Europe, Cross-linguistic & Cross-cultural Voice Interaction Design. The AVIxD workshop is a hands-on day-long session in which voice user interface practitioners come together to debate a topic of interest to the speech community. The workshop is a unique opportunity for them to meet with their peers and delve deeply into a single topic.

As in previous years with the AVIxD Workshops held in the US, we will write papers based on our discussions which we will then publish on www.avixd.org. Please visit our website to see papers from previous workshops, and for more details on the purpose of the organization and how you can be part of it.

In order to participate in the workshop, individuals must submit a position paper of approximately 500 words in English. Possible topics to touch upon in your submission (to be discussed in depth during the workshop) include:

  1. Language choice and user demographics
  2. Presentation of the language options to the caller and caller preference
  3. Creation and (co-)maintenance of dialogue designs, grammars, prompts across languages
  4. Political and sociolinguistic issues in system prompt choices and recognition grammars, such as code-switching, formal versus informal registers
  5. Guidelines for application localization, translation, and interpretation
  6. Setting expectations regarding availability of multilingual agents, Language- and culture-sensitive persona definition
  7. Coordinating usability testing and tuning across diverse linguistic / cultural groups
  8. Language choice and modality preference

We always encourage the use of specific examples from applications you’ve worked on in your position paper.

Participation is free to AVIxD members; non-members will be charged £25, which may be applied towards AVIxD membership at the workshop. Please submit your position papers via email no later than Friday 25 March 2011 to cfp@avixd.org. Letters of acceptance will be sent out on 30 March 2011.

We look forward to engaging with the European speech design community to discuss the particular challenges of designing speech solutions for users from diverse linguistic and cultural backgrounds. Feel free to contact either of the co-chairs below, if you have any questions.

Caroline Leathem-Collins, EIG  (caroline {at} eiginc {dot} com)

Maria Aretoulaki, DialogCONNECTION Ltd (maria {at} dialogconnection {dot} com)

UPDATE

SpeechTEK Europe 2011 has come and gone and I’ve got many interesting things to report (as I have been tweeting through my @dialogconnectio Twitter account).

But first, here are the slides for my presentation at the main conference on the outcome of the AVIxD Workshop on Cross-linguistic & Cross-cultural Voice Interaction Design organised by the Association for Voice Interaction Design (AVIxD). I only had 12 hours to prepare them – including sleep and London tube commute – so I had to practically keep working on them until shortly before the Session! Still I think the slides capture the breadth and depth of topics discussed or at least touched upon at the Workshop. There are several people now writing up on all these topics and there should be one or more White papers on them very soon (by the end of July we hope!). So the slides did their job after all!

Get the slides in PDF here:  Maria Aretoulaki – SpeechTEK Europe 2011 presentation.

The eternal battle between the VUI Designer & the Customer

7 Dec

I promised some time ago to put up the slides of my presentation at this year’s SpeechTEK Europe 2010 in London, the first SpeechTEK to have taken place outside of the US. My presentation, “The Eternal Battle Between the VUI Designer and the Customer”, was on Wednesday 26th May 2010 and opened the “Voice User Interface Design: Major Issues” Session. It went down really well, and afterwards several people in the audience told me about their own experiences and asked me for tips on how to deal with similar issues.

Here is a PDF with the presentation slides:

 

Maria Aretoulaki – “The Eternal Battle Between the VUI Designer and the Customer” (SpeechTEK Europe 2010 presentation)

Maria Aretoulaki – SpeechTEK Europe 2010 presentation UPDATED ppt

And here’s the gist of it:

VUI Design is preoccupied with the conception, the design, the implementation, the testing, and the tuning of solutions that work in the most efficient, secure and non-irritating manner for the user. Well, realistically that’s what VUI Design can achieve. In an ideal world, the VUI Designer would actually strive to create speech applications that – apart from taking into consideration the customer’s financial and brand requirements – would also fit the caller’s needs, goals and preferences. The initial Requirements analysis should bring both into focus. This much is already known and accepted by both VUI Designers and customers.

The problems start just after they all leave the meeting room and start working on the implementation. Call flow design, system persona development, prompt crafting and even recognition grammars all seem to fall victim to a war of words and attitudes between the VUI Design expert, who has seen systems being developed and spurned before, and the customer with his tech-savvy business team, technical architects and programming geniuses, who all think they know what callers want and how call flows should be structured, prompts worded and grammars written, just because they have got strong opinions! Even the results of Usability tests are liable to different interpretations by each side.

This presentation pinpoints common pitfalls in the communication between a VUI Designer and customer employees and recommends ways to resolve conflicts and disagreements on the application design and implementation.

Credits:

SpeechTEK Europe 2010 was organised by:

Information Today, Inc.
143 Old Marlton Pike
Medford NJ 08055 U.S.A.
Phone 1 (609) 654-6266.
http://www.infotoday.com

Speech Recognition for Dummies

20 May

OK, I often have to explain to people what I do and in most cases I get an enquiring and mystified look! What is Speech Recognition, let alone VUI Design! So I guess I have to go back to basics for a bit and explain what Speech Recognition is and what speech recognition applications involve.

What is Speech Recognition then?

Speech Recognition is the conversion of speech to text.  The words that you speak are turned into a written representation of those words for the computer to process further (figure out what you want in order to decide what to do or say next). This is not an exact science because – even among us humans – speech recognition is difficult and is fraught with misunderstandings or incomplete understanding. How many times have you had to repeat your name to someone (both in person and on the phone)? How many times have you had someone cracking up with laughter, because they thought you said something different to what you actually said? These are examples of human speech recognition failing magnificently! So it is no wonder that machines do it even less well. It’s all guesswork really.

In the case of machine speech recognition, the machine will have a kind of lexicon at its disposal with possible words in the corresponding language (English, French, German etc.) and their phonetic representation. This phonetic representation describes the ways that people are most likely to pronounce each specific word (think of Queen’s English or Hochdeutsch for German, at this point). Now if you bring regional accents and foreigners speaking the language into the equation, things get even more complicated. The very same letter combinations or whole words are pronounced completely differently depending on whether you are from London, Liverpool, Newcastle, Edinburgh, Dublin, Sydney, New York, or New Orleans. Likewise, the very same English letter combinations and words will sound even more different when spoken by a Greek, a German or a Japanese person. In order to deal with those cases, speech recognition lexica are augmented with additional “pronunciations” for each problematic word. So the machine can hear 3 different versions of the same word spoken by different people and still recognise it as one and the same word. Sorted! Of course you don’t need to go to all this trouble for every possible word or phrase in the language you are covering with your speech application. You only need to go to such lengths for words and phrases that are relevant to your specific application (and domain), as well as for accents that are representative of your end-user population. If an app is going to be used mainly in England, you are better off covering Punjabi and Chinese pronunciations of your English app words rather than Japanese or German variants. There will of course be Japanese and German users of your system, but they represent a much smaller percentage of your user population and we can’t have everything!!
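To make the idea of a lexicon with several “pronunciations” per word a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the two words, the informal sound spellings (they are not a real phonetic alphabet) and the function names. A real recogniser matches sounds statistically rather than by exact lookup.

```python
# Hypothetical toy lexicon: each word lists several pronunciation variants.
# The sound spellings are informal and invented for this sketch.
LEXICON = {
    "bath": ["b aa th", "b a th", "b aa f"],
    "water": ["w aw t er", "w aw d er", "w o t a"],
}


def build_reverse_index(lexicon):
    """Map every pronunciation variant back to the word it belongs to."""
    index = {}
    for word, variants in lexicon.items():
        for pronunciation in variants:
            index[pronunciation] = word
    return index


def recognise(pronunciation, index):
    """Return the word for a heard pronunciation, or None if it is unknown."""
    return index.get(pronunciation)


INDEX = build_reverse_index(LEXICON)

# Three different ways of saying the word all end up as the same word.
for heard in ["b aa th", "b a th", "b aa f"]:
    print(heard, "->", recognise(heard, INDEX))
```

The point of the sketch is only that adding a variant to the lexicon is what lets a new accent be recognised as the same word; anything not listed simply comes back as unknown.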

Speech recognition may be based on text representations of words and their phonetic “translation” (pronunciations) but the whole process is actually statistical. What you say to the system will be processed by the system as a wave signal like this one here:

Speech signal for ".. and sadly crime experts predict that one day even a friendly conversation between mother and daughter will be conducted at gunpoint" :) (Based on the Channel 4 comedy series "Brass Eye" - Season 1)

So the machine will have to figure out what you’re saying by chopping this signal up into parts, each representing a word that makes sense in the context of the surrounding words. Unfortunately the same signal can potentially be chopped up in several different ways, each representing a different string of words and of course a different meaning! There’s a famous example of the following ambiguous string:

signal for "How to Wreck a Nice Beach" err I mean "How to recognise Speech"!! (Taken from FNLP 2010: Lecture 1: Copyright (C) 2010 Henry S. Thompson)

The same speech signal can be heard as “How to wreck a nice beach” or .. “How to recognise speech”!!! They sound very similar actually!! (Taken from FNLP 2010: Lecture 1) So you can see the types of problems that we humans, let alone a machine, are faced with when trying to recognise each other’s speech!
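The “chopping up” problem described above can be sketched in a few lines of Python. To keep the sound sequences honest, this sketch uses a different well-known pair, “ice cream” versus “I scream”, instead of the beach example. The tiny lexicon, the informal sound symbols and the function name are all assumptions for illustration only.

```python
# Hypothetical toy lexicon: word -> sequence of informal sound symbols.
LEXICON = {
    "I": ("ai",),
    "ice": ("ai", "s"),
    "scream": ("s", "k", "r", "ii", "m"),
    "cream": ("k", "r", "ii", "m"),
}


def segmentations(sounds, lexicon):
    """Return every way of splitting the sound sequence into lexicon words."""
    if not sounds:
        return [[]]  # one way to segment nothing: no words at all
    results = []
    for word, pronunciation in lexicon.items():
        n = len(pronunciation)
        if tuple(sounds[:n]) == pronunciation:
            # This word fits the start of the signal; segment the remainder.
            for rest in segmentations(sounds[n:], lexicon):
                results.append([word] + rest)
    return results


heard = ("ai", "s", "k", "r", "ii", "m")
for words in segmentations(heard, LEXICON):
    print(" ".join(words))
```

The same sequence of sounds comes back as two different word strings, and nothing in the signal alone says which one the speaker meant. That choice has to come from context.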

Speech Recognition Techniques

The approach to speech recognition described above, which uses hand-crafted lexica, is the standard “manual” approach. This is effective and sufficient for applications that represent very limited domains, e.g. ordering a printer or getting your account balance. The lexica and the corresponding manual “grammars” can describe most relevant phrases that are likely to be spoken by the user population. Any other phrases will be just irrelevant one-offs that can be ignored without negatively affecting the performance of the system.
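As a rough illustration of what a hand-crafted “grammar” for such a limited domain might look like, here is a small Python sketch using regular expressions. The two intents, the phrasings and the function name are invented for this example; commercial recognisers use their own grammar formats, so this only shows the principle of listing the expected phrases by hand.

```python
import re

# Hypothetical hand-written grammar: one pattern per thing the caller may want.
GRAMMAR = {
    "get_balance": re.compile(
        r"^(what is|what's|tell me|check) my (account )?balance( please)?$"
    ),
    "order_printer": re.compile(
        r"^(i want to |i would like to |i'd like to )?order a printer( please)?$"
    ),
}


def interpret(utterance):
    """Return the matching intent name, or None for out-of-grammar phrases."""
    text = utterance.lower().strip()
    for intent, pattern in GRAMMAR.items():
        if pattern.match(text):
            return intent
    return None


print(interpret("What's my balance"))
print(interpret("I'd like to order a printer please"))
print(interpret("Do you sell toner?"))
```

Anything the grammar writer did not anticipate, like the toner question, falls outside the grammar. That is fine for a narrow domain and is exactly the limitation the statistical approach tries to overcome.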

For anything more complex and advanced, there is the “statistical” approach. This involves the collection of large amounts of real-world speech data, preferably in your application domain: medical data for medical apps, online shopping data for a catalogue ordering app etc. The statistical recogniser is trained by being run over this data multiple times, resulting in statistical representations of the most likely and meaningful combinations of sounds in the specific human language (English, German, French, Urdu etc.). This type of speech recogniser is much more robust and accurate than a “symbolic” recogniser (which uses the manual approach), because it can accurately predict sound and word combinations that could not have been pre-programmed in a hand-crafted grammar. Thus statistical recognisers have got much better coverage of what people actually say (rather than what the programmer or linguist thinks that people say). Sadly, most speech apps (the Interactive Voice Response systems or IVRs, for instance, used in Call Centre automation) are based on the manual symbolic approach rather than the fancy statistical one, because the latter requires considerable amounts of data and this data is not readily available (especially for a new app that has never existed before). A lot of time would need to be spent recording relevant human-to-human conversations and even more time analysing them in a useful manner. Even when data is available, things such as cost and privacy protection get in the way of either acquiring it or putting it into use.
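The statistical idea can be sketched with a very small word-pair (bigram) model in Python. This is only the language-model half of the story and a deliberate simplification: the four-sentence “corpus” is made up, the add-one smoothing is one simple choice among many, and real recognisers combine such scores with acoustic scores learned from far more data.

```python
import math
from collections import Counter

# Hypothetical miniature "corpus" standing in for large amounts of real data.
CORPUS = [
    "how to recognise speech",
    "machines recognise speech badly",
    "how to build a speech app",
    "a nice day at the beach",
]


def train_bigrams(sentences):
    """Count single words and adjacent word pairs in the training sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()  # <s> marks the sentence start
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def log_score(sentence, unigrams, bigrams):
    """Log-probability of a word string under the add-one smoothed model."""
    words = ["<s>"] + sentence.split()
    vocabulary_size = len(unigrams)
    total = 0.0
    for previous, current in zip(words, words[1:]):
        count = bigrams[(previous, current)]
        total += math.log((count + 1) / (unigrams[previous] + vocabulary_size))
    return total


UNIGRAMS, BIGRAMS = train_bigrams(CORPUS)
candidates = ["how to recognise speech", "how to wreck a nice beach"]
best = max(candidates, key=lambda c: log_score(c, UNIGRAMS, BIGRAMS))
print(best)
```

With this toy data the model prefers “how to recognise speech”, simply because those word pairs were seen in the training sentences. With different data the preference could just as easily flip, which is why domain-specific data matters so much.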

Speech Recognition Applications

By now you should have realised how complex speech recognition is at the best of times, let alone how difficult it is to recognise people with different regional accents, linguistic backgrounds, and .. even moods or health conditions! (more on that later) Now let’s look at the different types of speech recognition applications. First of all, we should distinguish between speaker-dependent and speaker-independent apps.

Speaker-dependent applications involve the automatic speech recognition of a single person / speaker. It could be the dictation system that you’ve installed on your PC to take notes down, or start writing emails and letters. It could be the hand-held dictation system that you carry around as a doctor or a lawyer, composing a medical report on your patients or talking to your clients, walking up and down the room. It could even be your standard mobile phone or smartphone (iPhone / Android) that you use to call (voice dial) one of your saved contacts, search through your music library for a track with a simple voice command (or two), or even to tweet. All these are speaker-dependent applications in that the corresponding recogniser has been trained to work with your voice and your voice only. You may have trained it by speaking to it for as little as 5 minutes, or for a longer or shorter time in other cases, but it will work sufficiently well with your voice, even if you’ve got a cold (and therefore a hoarse voice) or you’re feeling low (and are therefore quieter than usual). Give it to your mate or colleague though and it will break down, or misrecognise them in some way. The same recogniser will have to be retrained with any other speaker in order to work.

Enter speaker-independent speech recognition systems! They have been trained on huge amounts of real-world data with thousands of speakers of all kinds of different linguistic, ethnic, regional, or educational backgrounds. As a result, those systems can recognise anyone, both you and your mate and even all your colleagues or anyone else you are likely to meet in the future. They are not tied to the way you pronounce things, your physiology or your voiceprint; they have been developed to work with any human (or indeed machine pretending to be a human, come to think of it!). So when you buy off-the-shelf speech recognition software, it’s going to work immediately with any speaker, even if badly in some cases. You can later customise it to work for your specific app world and for your target user population, usually with some external help (enter Professional Services providers). Speaker-independent applications can work on any phone (mobile or landline) and are used mainly to (partly) automate Call Centres and Helplines, e.g. speech and DTMF IVRs for online shopping, telephone banking or e-Government apps. OK, speech recognition on a mobile can be tricky: the signal may be poor or intermittent, the line could be crackling, and of course there is the additional problem of background noise, since you are most likely to use it out in the busy streets or some kind of loud environment. Speaker-independent recognition is also used to create voice portals, i.e. speech-enabled versions of websites for greater accessibility and usability (think of disabled Web users). Moreover, a speaker-independent recogniser is also used for voicemail transcription, that is when you get all the voicemails you have received on your phone transcribed automatically and sent to you as text messages, for instant and – importantly – discreet accessibility. Speaker-independent apps are B2B applications, which means that the solution is sold to a company (a Call Centre, a Bank, a Government organisation). In contrast, speaker-dependent apps are B2C apps: they are sold directly to an individual, the end customer.

Because speaker-independent apps have to work with any speaker calling from any device or channel (even the web, think of Skype), the corresponding speech recogniser is usually stored on a server or cloud somewhere. Speaker-dependent apps on the other hand are stored locally on your personal PC, laptop, Mac, mobile phone or handheld.

And to clear up any potential confusion beforehand, when you ring up an automated Call Centre IVR from your mobile (for instance to pay a utilities bill), you are using a speech recogniser stored in the server rooms of that Call Centre, the company, its reseller or its solution provider. So in that case, although you are using your unique voice on your personal mobile phone, the recogniser does not reside on it. The same holds for voicemail transcription, curiously! Although you are using your unique voiceprint on your personal phone to leave a voicemail on your mate’s phone, the speech recogniser used for the automatic transcription of your mate’s voicemail will be residing on some secret server somewhere, perhaps at the Headquarters of their mobile provider or whoever is charging your mate for this handy service. In contrast, when you use a dictation / voice-to-text app on your smartphone to voice dial one of your contacts, your personal voiceprint, created during training and stored on the device, is used for the speech recognition process. So recognition is a built-in feature. Nowadays there is, however, a third case: if you are using your smartphone to search for an Indian restaurant on Google Maps, the recogniser actually resides in the cloud, on Google servers, rather than on the device. So there are more and more permutations of system configurations now!

There are many off-the-shelf speech recognition software packages out there. Nuance is one of the biggest technology providers for both speaker-independent and speaker-dependent / dictation apps. Other automatic speech recognition (ASR) software companies are Loquendo, Telisma, and LumenVox. Companies specialising in speaker-dependent / dictation systems are Philips, Grundig and Olympus, among others. However, Microsoft has also long been active in Speech processing and lately Google has also been catching up very fast.

The sky is the limit, as the saying goes!
