Tag Archives: Artificial Intelligence

My baby, DialogCONNECTION, is 11!

4 Dec

This week, my company, DialogCONNECTION Limited, turned 11 years old! 🎉 🥂 😁

It feels like yesterday when, in December 2008, I registered it with Companies House and became a Company Director (with multiple hats).

My very first client project was for the NHS Business Services Authority on their EHIC Helpline (which will hopefully survive the Brexit negotiations). Back then, whenever I told anyone what my company does (VUI Design for Speech IVRs), I was greeted by blank stares of confusion or incomprehension. It did feel a bit lonely at times!

Many more clients and thousands of long hours, long days and working weekends later, here we are in December 2019, and I suddenly find myself surrounded by VUI Designers and Voice Strategists who have now seen the potential and inescapable nature of speech interfaces and have followed in my footsteps. I feel vindicated, especially since I started in Voice back in 1996 with my Post-Doc in Spoken Dialogue Management at the University of Erlangen! 😎 (Yet another thing I’m hugely grateful to the EU for!)

We started with Voice-First VUI Design back in 1996, well before Samsung’s BIXBY (2017), Google’s ASSISTANT (2016), Amazon’s ALEXA (2014), Apple’s SIRI (2010) and even before the world started using GOOGLE for internet searches (1998)!

http://dialogconnection.com/who-designs-for-you.html

It’s quite frustrating when I realise that many of these newcomers have never heard of an IVR (Interactive Voice Response) system before, but they will eventually learn. 🤓 In the past 25 years it was the developers who insisted they could design conversational interfaces without any (Computational) Linguistics, Natural Language Processing (NLP) or Speech Recognition (ASR) background, and who therefore saw no need for a VUI Designer; we were an allegedly superfluous luxury and a rarity in those times. In the past couple of years it’s the shiny Marketing people, who make a living from their language mastery, and the edgy GUI Designers, who excel in visual design, who think they can design voice interfaces too, yet still know nothing about NLP or ASR.

What they don’t know is that, by modifying just the wording of what your system says (prompt tuning), for instance, you can achieve dramatically better speech recognition and NLU accuracy, because the user is covertly “guided” to say what we expect (and have covered in the grammar). The same holds for tuned grammars (for out-of-vocabulary words), word pronunciations (for local and foreign accents), tuned VUI designs (for error recovery strategies) and tuned ASR engine parameters (for timeouts and barge-ins). It’s all about knowing how the ASR software and the language software in our human brains work.
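To make the prompt-tuning point concrete, here is a deliberately simplified sketch in plain Python (not any real IVR platform, SRGS grammar or ASR engine; the prompt, phrases and intents are all made up for illustration): the directed prompt names exactly the options the grammar covers, so callers are nudged towards in-grammar wordings, and anything else is flagged for error recovery.

```python
# Toy illustration of the prompt-tuning / grammar-coverage idea (hypothetical).

# A directed prompt: it names only the options we can actually recognise.
PROMPT = "Would you like opening hours, directions, or to speak to an agent?"

# The grammar covers the wordings the prompt has made likely, plus a few variants.
GRAMMAR = {
    "opening hours": "HOURS",
    "hours": "HOURS",
    "directions": "DIRECTIONS",
    "how to get there": "DIRECTIONS",
    "speak to an agent": "AGENT",
    "agent": "AGENT",
    "operator": "AGENT",
}

def interpret(utterance: str) -> str:
    """Map a recognised utterance to an intent, or flag it as out-of-grammar."""
    text = utterance.lower()
    for phrase, intent in GRAMMAR.items():
        if phrase in text:
            return intent
    return "OUT_OF_GRAMMAR"  # would trigger an error-recovery re-prompt in the VUI

if __name__ == "__main__":
    print(interpret("I'd like the opening hours, please"))  # -> HOURS
    print(interpret("umm, my account balance"))             # -> OUT_OF_GRAMMAR
```

In a real deployment the same principle applies, just with SRGS/GRXML grammars, pronunciation lexica and engine-specific parameters: the prompt and the grammar are tuned together, not in isolation.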

Excited to see what the next decade is going to bring for DialogCONNECTION and the next quarter of a century for Voice! Stay tuned!

Towards EU collaboration on Conversational AI, Data & Robotics

22 Nov

I was really interested to read the BDVA – Big Data Value Association‘s and euRobotics‘ recent report on “Strategic Research, Innovation and Deployment Agenda for an AI PPP: A focal point for collaboration on Artificial Intelligence, Data and Robotics“, which you can find here.

Of particular relevance to me was the Section on Physical and Human Action and Interaction (pp. 39-41), which describes the dependencies, challenges and expected outcome of coordinated action on NLP, NLU and multimodal dialogue processing. The associated challenges are:

  • Natural interaction in unstructured contexts, which is the default in the case of voice assistants for instance, as they are expected to hold a conversation on any of a range of different topics and act on them
  • Improved natural language understanding, interaction and dialogue covering all European languages and age ranges, thus shifting the focus from isolated recognition to the interpretation of the semantic and cultural context, and the user intention
  • Development of verbal and non-verbal interaction models for people and machines, underlining the importance of gestures and emotion recognition and generation (and not only in embodied artificial agents)
  • Co-development of technology and regulation to assure safe interaction in safety-critical and unstructured environments, as the only way to assure trust and, hence, widespread citizen and customer adoption
  • The development of confidence measures for interaction and the interpretation of actions, leading to explainable AI and, hence, improved and more reliable decision-making

You can find the excellent and very comprehensive report here.

UBIQUITOUS VOICE: Essays from the Field now on Kindle!

14 Oct

In 2018, a new book on “Voice First” came out on Amazon and I was proud and deeply honoured, as it includes one of my articles! Now it has come out on Kindle as an e-Book and we are even more excited at the prospect of a much wider reach!

“Ubiquitous Voice: Essays from the Field”: Thoughts, insights and anecdotes on Speech Recognition, Voice User Interfaces, Voice Assistants, Conversational Intelligence, VUI Design, Voice UX issues, solutions, Best practices and visions from the veterans!

I have been part of this effort since its inception, working alongside some of the pioneers in the field who now represent the Market Leaders (GOOGLE, AMAZON, NUANCE, SAMSUNG VIV …). Excellent job by our tireless and intrepid Editor, Lisa Falkson!

My contribution “Convenience + Security = Trust: Do you trust your Intelligent Assistant?” is on data privacy concerns and social issues associated with the widespread adoption of voice activation. It is thus platform-, ASR-, vendor- and company-agnostic.

You can get the physical book here and the Kindle version here.

Prepare to be enlightened, guided and inspired!

An Amazon Echo in every hotel room?

16 Dec

The Wynn Las Vegas Hotel just announced that it will be installing the Amazon Echo device in every one of its 4,748 guest rooms by Summer 2017. Apparently, hotel guests will be able to use Echo, Amazon’s hands-free voice-controlled speaker, to control room lights, temperature and drapery, as well as some TV functions.

 

CEO Steve Wynn:  “I have never, ever seen anything that was more intuitively dead-on to making a guest experience seamlessly delicious, effortlessly convenient than the ability to talk to your room and say .. ‘Alexa, I’m here, open the curtains, … lower the temperature, … turn on the news.‘ She becomes our butler, at the service of each of our guests”.

 

The announcement does, however, also raise security concerns. The Alexa device is always listening, at least for the “wake word”. This is, of course, necessary for it to work when you actually need it. It needs to know when it is being “addressed” to start recognising what you say and hopefully act on it afterwards. Interestingly, though, according to the Alexa FAQ:

 

When these devices detect the wake word, they stream audio to the cloud, including a fraction of a second of audio before the wake word.

That could get embarrassing or even dangerous, especially if the “wake word” was actually a “false alarm“, i.e. something the guest said to someone else in the room that merely sounded like the wake word.
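For the technically curious, here is a rough sketch (plain Python, and emphatically not Amazon’s actual implementation; the sample rate and buffer length are assumptions) of how an always-listening device can keep that “fraction of a second” of pre-wake-word audio: the microphone feed goes into a short rolling buffer, and only when the wake word fires does the buffered audio get attached to what is streamed.

```python
# Hypothetical sketch of pre-wake-word ("pre-roll") buffering.
from collections import deque

SAMPLE_RATE = 16000        # audio samples per second (assumed)
PRE_ROLL_SECONDS = 0.5     # how much audio to keep from before the wake word (assumed)

class PreRollBuffer:
    """Keeps the most recent half-second of audio in a fixed-size ring buffer."""

    def __init__(self):
        self._ring = deque(maxlen=int(SAMPLE_RATE * PRE_ROLL_SECONDS))

    def feed(self, samples):
        """Called continuously with microphone samples; the oldest ones fall off."""
        self._ring.extend(samples)

    def on_wake_word(self, following_samples):
        """On detection, return the pre-roll audio plus the speech that follows it."""
        return list(self._ring) + list(following_samples)
```

Nothing leaves the device until the wake word fires, but what is then sent includes audio captured just before the trigger, which is exactly why false alarms matter.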

All commands are saved in the device’s History. The question is: will the hotel automatically wipe the device’s history once a guest has checked out? Or at least before the next guest arrives in the room! Could every guest perhaps have access to their own history of commands, so that they can delete it themselves just before check-out? These are crucial security aspects that the hotel needs to consider, because it would be a shame for this seamlessly delicious and effortlessly convenient experience to be cut short by paranoid guests switching the Echo off as soon as they enter the room!

2 for 1: Sponsor a Top Speech, NLP & Robotics Event (SPECOM & ICR 2017)

9 Dec


Joint SPECOM 2017 and ICR 2017 Conference

The 19th International Conference on Speech and Computer (SPECOM 2017) and the 2nd International Conference on Interactive Collaborative Robotics (ICR 2017) will be jointly held in Hatfield, Hertfordshire on 12-16 September 2017.

SPECOM has established itself over the last 20 years as one of the major international scientific events in the areas of speech technology and human-machine interaction. It attracts scientists and engineers from several European, American and Asian countries, and every year the Programme Committee consists of internationally recognised experts in speech technology and human-machine interaction from diverse countries and institutes, who ensure the scientific quality of the proceedings.

SPECOM TOPICS: Affective computing; Applications for human-machine interaction; Audio-visual speech processing; Automatic language identification; Corpus linguistics and linguistic processing; Forensic speech investigations and security systems; Multichannel signal processing; Multimedia processing; Multimodal analysis and synthesis; Signal processing and feature extraction; Speaker identification and diarization; Speaker verification systems; Speech analytics and audio mining; Speech and language resources; Speech dereverberation; Speech disorders and voice pathologies; Speech driving systems in robotics; Speech enhancement; Speech perception; Speech recognition and understanding; Speech translation automatic systems; Spoken dialogue systems; Spoken language processing; Text mining and sentiment analysis; Text-to-speech and speech-to-text systems; Virtual and augmented reality.

Since last year, SPECOM has been jointly organised with the ICR conference, extending its scope to human-robot interaction. This year the joint conferences will feature 3 Special Sessions co-organised by academic institutes from Europe, the USA, Asia and Australia.

ICR 2017 Topics: Assistive robots; Child-robot interaction; Collaborative robotics; Educational robotics; Human-robot interaction; Medical robotics; Robotic mobility systems; Robots at home; Robot control and communication; Social robotics; Safety robot behaviour.

Special Session 1: Natural Language Processing for Social Media Analysis

The exploitation of natural language from social media data is an intriguing task in the fields of text mining and natural language processing (NLP), with plenty of applications in social sciences and social media analytics. In this special session, we call for research papers in the broader field of NLP techniques for social media analysis. The topics of interest include (but are not limited to): sentiment analysis in social media and beyond (e.g., stance identification, sarcasm detection, opinion mining), computational sociolinguistics (e.g., identification of demographic information such as gender, age), and NLP tools for social media mining (e.g., topic modeling for social media data, text categorization and clustering for social media).

Special Session 2: Multilingual and Low-Resourced Languages Speech Processing in Human-Computer Interaction

Multilingual speech processing has been an active topic for many years. Over the last few years, the availability of big data in a vast variety of languages and the convergence of speech recognition and synthesis approaches towards statistical parametric techniques (mainly deep neural networks) have put this field at the centre of research interest, with special attention to low- or even zero-resourced languages. In this special session, we call for research papers in the field of multilingual speech processing. The topics include (but are not limited to): multilingual speech recognition and understanding, dialectal speech recognition, cross-lingual adaptation, text-to-speech synthesis, spoken language identification, speech-to-speech translation, multi-modal speech processing, keyword spotting, emotion recognition and deep learning in speech processing.

Special Session 3: Real-Life Challenges in Voice and Multimodal Biometrics

Complex passwords and cumbersome dongles are now obsolete. Biometric technology offers a secure and user-friendly solution for authentication and has been employed in various real-life scenarios. This special session seeks to bring together researchers, professionals, and practitioners to present and discuss recent developments and challenges in real-life applications of biometrics. Topics of interest include (but are not limited to):

Biometric systems and applications; Identity management and biometrics; Fraud prevention; Anti-spoofing methods; Privacy protection of biometric systems; Uni-modalities, e.g. voice, face, fingerprint, iris, hand geometry, palm print and ear biometrics; Behavioural biometrics; Soft-biometrics; Multi-biometrics; Novel biometrics; Ethical and societal implications of biometric systems and applications.

Delegates’ profile

Speech technology, human-machine interaction and human-robot interaction attract a multidisciplinary group of students and scientists from computer science, signal processing, machine learning, linguistics, social sciences, natural language processing, text mining, dialogue systems, affective modelling, interactive interfaces, collaborative and social robotics, and intelligent and adaptive systems. The estimated number of delegates attending the joint SPECOM and ICR conferences is approximately 150.

Who should sponsor:

  • Research Organisations
  • Universities and Research Labs
  • Research and Innovation Projects
  • Academic Publishers
  • Innovative Companies

Sponsorship Levels

Depending on their sponsorship level, sponsors will be able to disseminate their research, innovation and/or commercial activities through distributed leaflets/brochures and/or a 3-day booth in the common area used for the coffee breaks and poster sessions.

Location

The joint SPECOM and ICR conferences will be held in the College Lane Campus of the University of Hertfordshire, in Hatfield. Hatfield is located 20 miles (30 kilometres) north of London and is connected to the capital via the A1(M) and direct trains to London King’s Cross (20 minutes), Finsbury Park (16 minutes) and Moorgate (35 minutes). It is easily accessible from 3 international airports (Luton, Stansted and Heathrow) via public transportation.

Contact:

Iosif Mporas

i.mporas@herts.ac.uk 

School of Engineering and Technology

University of Hertfordshire

Hatfield, UK

Our META Avatar immortalised in a TEDx talk!

31 Oct

 


Our very own Niels Taatgen from the University of Groningen gave a TEDx talk last Summer on “Why computers are not intelligent (yet!)” and why they still have a long way to go.

 

Computers can be better than humans at very specialised tasks, such as chess and Go, but they find it much harder to learn how to learn new tasks and to reason about intentions and goals other than their own, as humans do. Enter our EU R&D project METALOGUE.

 


 

Niels briefly shows an example interaction with our META Avatar negotiating the terms of a smoking ban and actually “thinking about the (human) opponent”, a crucial human negotiation and life skill. You can see META from minute 12 onwards, but the whole talk is really interesting to watch, so don’t just skip to that part!

 


Amazon Alexa Developers Conference (10 Oct, London)

7 Oct

Exciting times for Amazon Alexa!


  • Alexa now has over 3,000 skills in the US Skill Store, up from 1,000 in June.
  • Alexa has just started shipping in the UK and is coming soon to Germany too. Echo and Echo Dot were just made available in the UK, whereas in Germany they are available by invitation for those who want to help shape Alexa as she evolves; the devices will start shipping there next month.
  • Amazon has also announced a new Echo Dot, enabled with the new Echo Spatial Perception (ESP), which allows Alexa to determine which device a user is talking to, meaning that only one will respond when multiple devices hear the user (a rough sketch of how such arbitration could work follows this list). The Dot will increase the number of devices Alexa can talk to in the home, creating an innovative customer experience. It will retail for £49.99 and the Echo for £149.99.
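Amazon hasn’t published how ESP works, but conceptually this kind of multi-device arbitration can be as simple as the toy sketch below (my own illustration in Python; the device names and scores are invented): every device that hears the wake word reports a detection score, and only the highest-scoring one responds.

```python
# Toy illustration of multi-device wake-word arbitration (not Amazon's actual ESP).
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Detection:
    device_id: str
    score: float  # e.g. wake-word confidence or estimated signal energy

def arbitrate(detections: List[Detection]) -> Optional[str]:
    """Pick the single device that should respond, or None if nobody heard the wake word."""
    if not detections:
        return None
    return max(detections, key=lambda d: d.score).device_id

# Hypothetical example: the kitchen Echo heard the user most clearly, so only it answers.
print(arbitrate([Detection("kitchen-echo", 0.91), Detection("lounge-dot", 0.62)]))  # -> kitchen-echo
```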

Here are 2 neat little YouTube videos showing Alexa in action.

 

 

In this context, Amazon is bringing their Alexa training events to Europe in October. Hello Alexa London is on Monday 10th October. 

  • Developers, engineers, QA/testers and anyone who wants to learn how to build skills can participate in the full-day agenda from 8:30am – 4:30pm (+ Happy Hour afterwards!)
  • Business development, strategy and planning, VUX/UX/VUI, account teams, producers and project management professionals can participate in the Alexa Business Development for Agencies session later in the day, from 3pm – 4:30pm (and then of course join the Happy Hour!). They can also join the breakfast session, Welcome to Alexa (a 45-minute keynote), and Hello Alexa (a 1-hour session on the basics of creating a skill: what goes into designing, building, testing and publishing it), from 8:30am – 10:45am.
  • Click here to register (although the event is already sold out by now!) and we hope to see you there!

I am really excited at how ubiquitous speech recognition is becoming! It was already ubiquitous as we dragged it around on our smartphones (Apple SIRI, Google Now), but now it has penetrated our homes too, crossing over work/personal/family lives. The future is omni-channel but unimodal?!