Voice Assistants and Kids

7 Nov

This week I bumped into Echodad and “the world’s harshest critics of Alexa skills”, i.e. his 2 kids. Hat tip for the idea, by the way!! 😀

I read with particular interest his Medium post on “How Voice Interfaces are infiltrating society and changing our children”, a playfully deceptive title for a list of witty ripostes to all the usual arguments against children using Voice Assistants and speech interfaces. Highly enjoyable! 🙂

He addresses the perennial argument that children unlearn good manners when they interact with VAs too much, and draws on his own experience with his kids to assure us that, actually, children do know the difference between a human and a Voice Assistant. Phew, so there is no need to panic that they are suddenly going to become ruder and ruder to their teachers and grandparents and, hence, no need to reprogram voice interfaces to insist on “Pleases” and “Thank-yous” (Phew, again). I am fully in agreement there.

Similarly, he discusses the argument that children unlearn the importance and necessity of patience in the face of inescapable delays and life’s frustrations when they can’t instantly get what they want (an answer to an obscure question about the animal kingdom, or their favourite YouTube video). This seasoned dad knows that patience is something all kids have always struggled with and have to practise continually at that age. It is part of learning and growing up. And I can attest to that, too. (Oh, the tantrums!)

Echodad & “the world’s harshest critics of Alexa skills”

However, in contrast to him, I am not at all comfortable or happy with the idea that my kid would go straight to Alexa or Google or Bixby for an answer to his Space or Transport questions and not even bother to approach me. I may well resort to voice search myself for the answer, but I want to be the Gatekeeper to the Information Castle and not a passive bystander in his quests for knowledge. At least that’s what I consider one of my – traditional – roles as a parent, and I would like him to think that way too, at least until he is older. And it’s not just a hunch: having studied Psycholinguistics and Child Developmental Psychology at University in an earlier life (last century, though!), I know the paramount importance of parental input, continuous feedback and playful interaction for the quality and rate of a child’s learning of anything, but particularly of language.

In this respect, Voice Assistants, Voicebots, Chatbots and Voice Search itself have changed parent-child roles and interaction forever, and neither I nor many other people can predict the long-term effects of this societal development. Kids are already becoming more and more independent, for sure, and that is not a negative thing in itself, but will it at the same time make me – as a parent – less dependable and approachable? Will he automatically assume I lack encyclopaedic knowledge, and will that lessen his respect for me? And will the kid get the same quantity and quality of validation cues from Alexa or Google Assistant as he would have got from me? More soberingly, will it teach kids that they don’t need to rely on, or even invest much in, relationships with people, because interacting with technology is easier, more fun, safer and more efficient? It’s not just the relationship with their parents we should be worried about, but all other family, work, peer and romantic relationships as well.

I am all for the use of voice interfaces and spoken dialogue systems by adults, e.g. myself, naturally, and my own clients; VUIs have been my bread and butter for decades! Nonetheless, at the risk of sounding like a regressive technophobe, I am vehemently against their use by children and will keep mine shielded for as long as I can manage! It’s already hard to keep him from doing a regular Google voice search for “pictures for buses” (sic). 🤔 😅

Design Voice Assistants as Performers

4 Nov

I recently read a very interesting article in Fast Company, which resonated a lot with me. Entitled “It’s time to rethink voice assistants completely”, it addresses a common obsession with designing voice interfaces that could easily be mistaken for humans.

It undoubtedly seems like a very noble goal, given that humans are at the apex of the communication pyramid among living beings (let alone inorganic material!). However, the article makes the case, which I fully support, that human-like is not always desirable or even appropriate, especially if it fools you into thinking you are interacting with another human. This is a particularly painful faux pas if the user did indeed believe the dialogue bot was human, only to then be frustrated by its lack of understanding or by the lack of rationale in its responses.

As the article suggests, human conversations should, of course, be studied thoroughly, but only be taken as inspiration for VUI and Voice Design, and not as the gold standard of interaction.

In fact, a disruptive idea is put forward: to consider Voice Assistants as “performers, rather than human-like conversationalists”. That is precisely how you can craft more expressive, emotional, engaging and sticky conversational interfaces and the corresponding conversation designs.

UBIQUITOUS VOICE: Essays from the Field now on Kindle!

14 Oct

In 2018, a new book on “Voice First” came out on Amazon and I was proud and deeply honoured, as it includes one of my articles! Now it has come out on Kindle as an e-Book and we are even more excited at the prospect of a much wider reach!

“Ubiquitous Voice: Essays from the Field”: Thoughts, insights and anecdotes on Speech Recognition, Voice User Interfaces, Voice Assistants, Conversational Intelligence, VUI Design, Voice UX issues, solutions, Best practices and visions from the veterans!

I have been part of this effort since its inception, working alongside some of the pioneers in the field who now represent the Market Leaders (GOOGLE, AMAZON, NUANCE, SAMSUNG VIV .. ). Excellent job by our tireless and intrepid Editor, Lisa Falkson!

My contribution “Convenience + Security = Trust: Do you trust your Intelligent Assistant?” is on data privacy concerns and social issues associated with the widespread adoption of voice activation. It is thus platform-, ASR-, vendor- and company-agnostic.

You can get the physical book here and the Kindle version here.

Prepare to be enlightened, guided and inspired!

DESIGN your conversational interface before Coding it!

20 Sep

I just listened to the latest Google Cloud Platform podcast with Google’s own Cathy Pearl and was delighted (and vindicated!) to hear her stress the importance of DESIGNING your conversational interface before CODING it. You really need to think hard about the user needs you’re trying to meet with the interface and the possible dialogue “scripts” that would help you do that. Only then should you start coming up with training phrases for your intents or actions. This will save you a lot of headaches, time and, actually, face among your customers too. I’ve seen so many instances myself of developers treating VUI Design and Voice UX design as an afterthought and then wondering why their conversational interface is not conversational at all, or indeed much of an interface to the intended data and services, as it doesn’t always respond appropriately to what the customer wants or even to what they say.
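
To make that concrete, here is a minimal sketch of what “design first” can look like in practice: the user needs and dialogue scripts are captured as plain data (intents, sample phrases, prompts) that can be reviewed and iterated on before any platform-specific coding starts. The intent names, phrases and the naive matcher below are invented purely for illustration; this is not the Dialogflow/Actions API.

```python
# A toy illustration of "design before code": the dialogue design lives in
# plain data (intents, sample phrases, prompts) that can be reviewed with
# stakeholders before any platform-specific development starts.
# Intent names and phrases are invented for illustration only.

from dataclasses import dataclass


@dataclass
class Intent:
    name: str                    # what the user is trying to achieve
    training_phrases: list[str]  # sample utterances taken from the dialogue scripts
    response: str                # the prompt the assistant speaks back


# The "design" artefact, derived from user needs and sample dialogue scripts.
DESIGN = [
    Intent(
        name="check_opening_hours",
        training_phrases=["what time do you open", "are you open on sunday"],
        response="We are open every day from 9 am to 6 pm.",
    ),
    Intent(
        name="book_table",
        training_phrases=["book a table for two", "i would like to make a reservation"],
        response="Sure, for how many people and at what time?",
    ),
]


def match_intent(utterance: str) -> Intent | None:
    """Naive keyword-overlap matcher, standing in for the real NLU engine."""
    words = set(utterance.lower().split())
    best, best_overlap = None, 0
    for intent in DESIGN:
        overlap = max(len(words & set(p.split())) for p in intent.training_phrases)
        if overlap > best_overlap:
            best, best_overlap = intent, overlap
    return best


if __name__ == "__main__":
    for test in ["are you open tomorrow", "can I book a table tonight"]:
        intent = match_intent(test)
        print(test, "->", intent.name if intent else "fallback prompt")
```

The point is that everything above the matcher is a design document first and code second: it can be walked through with users and stakeholders, and only once the scripts hold up do you invest in training phrases and platform integration.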

Listen to the enlightening podcast here.

An Amazon Echo in every hotel room?

16 Dec

The Wynn Las Vegas Hotel just announced that it will be installing the Amazon Echo device in every one of its 4,748 guest rooms by Summer 2017. Apparently, hotel guests will be able to use Echo, Amazon’s hands-free voice-controlled speaker, to control room lights, temperature, and drapery, but also some TV functions.


CEO Steve Wynn:  “I have never, ever seen anything that was more intuitively dead-on to making a guest experience seamlessly delicious, effortlessly convenient than the ability to talk to your room and say .. ‘Alexa, I’m here, open the curtains, … lower the temperature, … turn on the news.‘ She becomes our butler, at the service of each of our guests”.


The announcement does, however, also raise security concerns. The Alexa device is always listening, at least for the “wake word”. This is, of course, necessary for it to work when you actually need it. It needs to know when it is being “addressed” to start recognising what you say and hopefully act on it afterwards. Interestingly, though, according to the Alexa FAQ:


When these devices detect the wake word, they stream audio to the cloud, including a fraction of a second of audio before the wake word.

That could get embarrassing or even dangerous, especially if the “wake word” was actually a “false alarm”, i.e. something the guest said to someone else in the room that merely sounded like the wake word.
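
For the curious, here is a minimal sketch of the generic technique behind that FAQ statement: the microphone audio is continuously written into a small rolling buffer, so that when the wake-word detector fires, the half-second or so of audio preceding the wake word can be included in what is streamed. This is purely illustrative, not Amazon’s actual implementation; the frame sizes and callback names are assumptions.

```python
# Illustrative pre-roll buffer: audio frames are continuously kept in a small
# rolling window, so that when the wake word is detected the ~0.5 s of audio
# preceding it can be included in what is streamed to the cloud.
# Generic sketch only; frame sizes and callbacks are assumptions.

from collections import deque
from typing import Callable, Iterable

FRAME_MS = 20                              # duration of one audio frame
PRE_ROLL_MS = 500                          # how much audio before the wake word to keep
PRE_ROLL_FRAMES = PRE_ROLL_MS // FRAME_MS  # = 25 frames


class PreRollBuffer:
    """Holds only the most recent ~500 ms of audio at any moment."""

    def __init__(self) -> None:
        # Old frames fall off the left end automatically once the deque is full.
        self._frames: deque[bytes] = deque(maxlen=PRE_ROLL_FRAMES)

    def push(self, frame: bytes) -> None:
        self._frames.append(frame)

    def flush(self) -> bytes:
        """Return (and clear) the buffered pre-roll once the wake word fires."""
        audio = b"".join(self._frames)
        self._frames.clear()
        return audio


def capture_loop(
    mic_frames: Iterable[bytes],
    wake_word_detected: Callable[[bytes], bool],
    stream_to_cloud: Callable[[bytes], None],
) -> None:
    """mic_frames yields raw audio frames; the two callables are placeholders."""
    buffer = PreRollBuffer()
    for frame in mic_frames:
        buffer.push(frame)
        if wake_word_detected(frame):
            # The pre-roll (which includes the wake word itself) goes out first;
            # a real device would then keep streaming live audio for the request.
            stream_to_cloud(buffer.flush())
            break
```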

All commands are saved in the device’s History. The question is: will the hotel automatically wipe the device’s history once a guest has checked out? Or at least before the next guest arrives in the room! Could every guest perhaps have access to their own history of commands, so that they can delete it themselves just before check-out? These are crucial security aspects that the hotel needs to consider, because it would be a shame for this seamlessly delicious and effortlessly convenient experience to be cut short by paranoid guests switching the Echo off as soon as they enter the room!
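
One way a hotel could operationalise the first option is to hook a history wipe into the property-management system’s check-out event. The sketch below is entirely hypothetical: the PMS event format and the wipe_voice_history() call are placeholders for whatever integration the hotel and the device vendor actually provide, and no claim is made about Amazon’s real APIs.

```python
# Hypothetical check-out hook: when the property-management system (PMS)
# reports that a guest has checked out, the voice history of the device in
# that room is wiped before the next guest arrives.
# Both the PMS event format and wipe_voice_history() are placeholders,
# not real Amazon or hotel APIs.

import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("room-device-privacy")


def wipe_voice_history(room_id: str) -> None:
    """Placeholder for the vendor-specific call that clears the room device's history."""
    log.info("Wiping voice command history for room %s", room_id)


def on_pms_event(event: dict) -> None:
    """Handle a single PMS event of the (assumed) form {'room_id': ..., 'status': ...}."""
    if event.get("status") == "checked_out":
        wipe_voice_history(event["room_id"])


if __name__ == "__main__":
    # Simulated PMS events; in reality these would come from the hotel's systems.
    for event in [{"room_id": "1204", "status": "checked_out"},
                  {"room_id": "1205", "status": "occupied"}]:
        on_pms_event(event)
```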

User Interface Design is the new black!

14 Dec


LinkedIn Unveils The Top Skills That Can Get You Hired In 2017


Number 5: USER INTERFACE DESIGN!

“User interface design is the new black:

UI Design, which is designing the part of products that people interact with, is increasingly in-demand among employers. It ranked #14 in 2014, #10 last year, and #5 this year (second largest jump on this year’s Global Top Skills of 2016 list). Data has become central to many products, which has created a need for people with user interface design skills who can make those products easy for customers to use.”


Read all about it here.

https://blog.linkedin.com/2016/10/20/top-skills-2016-week-of-learning-linkedin

2 for 1: Sponsor a Top Speech, NLP & Robotics Event (SPECOM & ICR 2017)

9 Dec


Joint SPECOM 2017 and ICR 2017 Conference

The 19th International Conference on Speech and Computer (SPECOM 2017) and the 2nd International Conference on Interactive Collaborative Robotics (ICR 2017) will be jointly held in Hatfield, Hertfordshire on 12-16 September 2017.

SPECOM has established itself as one of the major international scientific events in the areas of speech technology and human-machine interaction over the last 20 years. It attracts scientists and engineers from several European, American and Asian countries, and every year the Programme Committee consists of internationally recognised experts in speech technology and human-machine interaction from diverse countries and institutes, which ensures the scientific quality of the proceedings.

SPECOM TOPICS: Affective computing; Applications for human-machine interaction; Audio-visual speech processing; Automatic language identification; Corpus linguistics and linguistic processing; Forensic speech investigations and security systems; Multichannel signal processing; Multimedia processing; Multimodal analysis and synthesis; Signal processing and feature extraction; Speaker identification and diarization; Speaker verification systems; Speech analytics and audio mining; Speech and language resources; Speech dereverberation; Speech disorders and voice pathologies; Speech driving systems in robotics; Speech enhancement; Speech perception; Speech recognition and understanding; Speech translation automatic systems; Spoken dialogue systems; Spoken language processing; Text mining and sentiment analysis; Text-to-speech and speech-to-text systems; Virtual and augmented reality.

Since last year, SPECOM has been jointly organised with the ICR conference, extending its scope to human-robot interaction. This year the joint conferences will host 3 Special Sessions, co-organised by academic institutes from Europe, the USA, Asia and Australia.

ICR 2017 Topics: Assistive robots; Child-robot interaction; Collaborative robotics; Educational robotics; Human-robot interaction; Medical robotics; Robotic mobility systems; Robots at home; Robot control and communication; Social robotics; Safety robot behaviour.

Special Session 1: Natural Language Processing for Social Media Analysis

The exploitation of natural language from social media data is an intriguing task in the fields of text mining and natural language processing (NLP), with plenty of applications in social sciences and social media analytics. In this special session, we call for research papers in the broader field of NLP techniques for social media analysis. The topics of interest include (but are not limited to): sentiment analysis in social media and beyond (e.g., stance identification, sarcasm detection, opinion mining), computational sociolinguistics (e.g., identification of demographic information such as gender, age), and NLP tools for social media mining (e.g., topic modeling for social media data, text categorization and clustering for social media).

Special Session 2: Multilingual and Low-Resourced Languages Speech Processing in Human-Computer Interaction

Multilingual speech processing has been an active topic for many years. Over the last few years, the availability of big data in a vast variety of languages and the convergence of speech recognition and synthesis approaches towards statistical parametric techniques (mainly deep neural networks) have put this field at the centre of research interest, with special attention to low- or even zero-resourced languages. In this special session, we call for research papers in the field of multilingual speech processing. The topics include (but are not limited to): multilingual speech recognition and understanding, dialectal speech recognition, cross-lingual adaptation, text-to-speech synthesis, spoken language identification, speech-to-speech translation, multi-modal speech processing, keyword spotting, emotion recognition and deep learning in speech processing.

Special Session 3: Real-Life Challenges in Voice and Multimodal Biometrics

Complex passwords and cumbersome dongles are now obsolete. Biometric technology offers a secure and user-friendly solution for authentication and has been employed in various real-life scenarios. This special session seeks to bring together researchers, professionals and practitioners to present and discuss recent developments and challenges in real-life applications of biometrics. Topics of interest include (but are not limited to):

Biometric systems and applications; Identity management and biometrics; Fraud prevention; Anti-spoofing methods; Privacy protection of biometric systems; Uni-modalities, e.g. voice, face, fingerprint, iris, hand geometry, palm print and ear biometrics; Behavioural biometrics; Soft-biometrics; Multi-biometrics; Novel biometrics; Ethical and societal implications of biometric systems and applications.

Delegates’ profile

Speech technology, human-machine interaction and human-robot interaction attract a multidisciplinary group of students and scientists from computer science, signal processing, machine learning, linguistics, social sciences, natural language processing, text mining, dialogue systems, affective modelling, interactive interfaces, collaborative and social robotics, and intelligent and adaptive systems. The estimated number of delegates who will attend the joint SPECOM and ICR conferences is approximately 150.

Who should sponsor:

  • Research Organisations
  • Universities and Research Labs
  • Research and Innovation Projects
  • Academic Publishers
  • Innovative Companies

Sponsorship Levels

Depending on the sponsorship level, sponsors will be able to disseminate their research, innovation and/or commercial activities through distributed leaflets/brochures and/or 3-day booths in the common area shared with the coffee breaks and poster sessions.

Location

The joint SPECOM and ICR conferences will be held in the College Lane Campus of the University of Hertfordshire, in Hatfield. Hatfield is located 20 miles (30 kilometres) north of London and is connected to the capital via the A1(M) and direct trains to London King’s Cross (20 minutes), Finsbury Park (16 minutes) and Moorgate (35 minutes). It is easily accessible from 3 international airports (Luton, Stansted and Heathrow) via public transportation.

Contact:

Iosif Mporas

i.mporas@herts.ac.uk 

School of Engineering and Technology

University of Hertfordshire

Hatfield, UK