
An Amazon Echo in every hotel room?

16 Dec

The Wynn Las Vegas Hotel just announced that it will be installing the Amazon Echo device in every one of its 4,748 guest rooms by Summer 2017. Apparently, hotel guests will be able to use Echo, Amazon’s hands-free voice-controlled speaker, to control the room lights, temperature, and drapery, as well as some TV functions.

 

CEO Steve Wynn: “I have never, ever seen anything that was more intuitively dead-on to making a guest experience seamlessly delicious, effortlessly convenient than the ability to talk to your room and say … ‘Alexa, I’m here, open the curtains, … lower the temperature, … turn on the news.’ She becomes our butler, at the service of each of our guests.”

 

The announcement does, however, also raise security concerns. The Alexa device is always listening, at least for the “wake word”. This is, of course, necessary for it to work when you actually need it: it has to know when it is being “addressed”, so that it can start recognising what you say and, hopefully, act on it. Interestingly, though, according to the Alexa FAQ:

 

When these devices detect the wake word, they stream audio to the cloud, including a fraction of a second of audio before the wake word.

That could get embarrassing or even dangerous, especially if the “wake word” was actually a “false alarm”, i.e. something the guest said to someone else in the room that merely sounded like the wake word.
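That “fraction of a second of audio before the wake word” is typical of how wake-word engines are built: the device keeps a short rolling buffer of recent audio so that, when the detector fires, the pre-roll can be streamed along with the command. Here is a minimal sketch of such a pre-roll buffer; the frame and buffer sizes are made-up illustrative values, not Amazon’s:

```python
from collections import deque

FRAME_MS = 20     # hypothetical frame length
PRE_ROLL_MS = 500 # keep roughly half a second of audio before the wake word

class PreRollBuffer:
    """Rolling buffer of recent audio frames, drained when the wake word fires."""

    def __init__(self, pre_roll_ms=PRE_ROLL_MS, frame_ms=FRAME_MS):
        # deque with maxlen silently discards the oldest frame on overflow
        self.frames = deque(maxlen=pre_roll_ms // frame_ms)

    def push(self, frame: bytes) -> None:
        self.frames.append(frame)

    def drain(self) -> bytes:
        """Return the buffered audio (the 'fraction before the wake word')."""
        audio = b"".join(self.frames)
        self.frames.clear()
        return audio

buf = PreRollBuffer()
for i in range(100):                 # simulate 2 s of incoming audio
    buf.push(bytes([i % 256]) * 10)  # dummy 10-byte frames
pre_roll = buf.drain()
print(len(pre_roll))                 # only the last 25 frames (0.5 s) survive: 250 bytes
```

The point of the fixed `maxlen` is exactly the privacy trade-off discussed above: everything older than the buffer window is gone for good, but everything inside it goes to the cloud with the command.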

All commands are saved in the device’s history. The question is: will the hotel automatically wipe the device’s history once a guest has checked out? Or at least before the next guest arrives in the room! Could every guest perhaps have access to their own history of commands, so that they can delete it themselves just before check-out? These are crucial security aspects that the hotel needs to consider, because it would be a shame for this seamlessly delicious and effortlessly convenient experience to be cut short by paranoid guests switching the Echo off as soon as they enter the room!

2 for 1: Sponsor a Top Speech, NLP & Robotics Event (SPECOM & ICR 2017)

9 Dec


 

Joint SPECOM 2017 and ICR 2017 Conference

The 19th International Conference on Speech and Computer (SPECOM 2017) and the 2nd International Conference on Interactive Collaborative Robotics (ICR 2017) will be jointly held in Hatfield, Hertfordshire on 12-16 September 2017.

Over the last 20 years, SPECOM has established itself as one of the major international scientific events in the areas of speech technology and human-machine interaction. It attracts scientists and engineers from several European, American and Asian countries, and every year the Programme Committee consists of internationally recognised experts in speech technology and human-machine interaction from a range of countries and institutes, which ensures the scientific quality of the proceedings.

SPECOM TOPICS: Affective computing; Applications for human-machine interaction; Audio-visual speech processing; Automatic language identification; Corpus linguistics and linguistic processing; Forensic speech investigations and security systems; Multichannel signal processing; Multimedia processing; Multimodal analysis and synthesis; Signal processing and feature extraction; Speaker identification and diarization; Speaker verification systems; Speech analytics and audio mining; Speech and language resources; Speech dereverberation; Speech disorders and voice pathologies; Speech driving systems in robotics; Speech enhancement; Speech perception; Speech recognition and understanding; Speech translation automatic systems; Spoken dialogue systems; Spoken language processing; Text mining and sentiment analysis; Text-to-speech and speech-to-text systems; Virtual and augmented reality.

Since last year, SPECOM has been organised jointly with the ICR conference, extending its scope to human-robot interaction. This year the joint conferences will feature 3 Special Sessions co-organised by academic institutes from Europe, the USA, Asia and Australia.

ICR 2017 Topics: Assistive robots; Child-robot interaction; Collaborative robotics; Educational robotics; Human-robot interaction; Medical robotics; Robotic mobility systems; Robots at home; Robot control and communication; Social robotics; Safety robot behaviour.

Special Session 1: Natural Language Processing for Social Media Analysis

The exploitation of natural language from social media data is an intriguing task in the fields of text mining and natural language processing (NLP), with plenty of applications in social sciences and social media analytics. In this special session, we call for research papers in the broader field of NLP techniques for social media analysis. The topics of interest include (but are not limited to): sentiment analysis in social media and beyond (e.g., stance identification, sarcasm detection, opinion mining), computational sociolinguistics (e.g., identification of demographic information such as gender, age), and NLP tools for social media mining (e.g., topic modeling for social media data, text categorization and clustering for social media).

Special Session 2: Multilingual and Low-Resourced Languages Speech Processing in Human-Computer Interaction

Multilingual speech processing has been an active topic for many years. Over the last few years, the availability of big data in a vast variety of languages and the convergence of speech recognition and synthesis approaches to statistical parametric techniques (mainly deep learning neural networks) have put this field in the center of research interest, with a special attention for low- or even zero-resourced languages. In this special session, we call for research papers in the field of multilingual speech processing. The topics include (but are not limited to): multilingual speech recognition and understanding, dialectal speech recognition, cross-lingual adaptation, text-to-speech synthesis, spoken language identification, speech-to-speech translation, multi-modal speech processing, keyword spotting, emotion recognition and deep learning in speech processing.

Special Session 3: Real-Life Challenges in Voice and Multimodal Biometrics

Complex passwords and cumbersome dongles are now obsolete. Biometric technology offers a secure and user-friendly authentication solution and has been employed in various real-life scenarios. This special session seeks to bring together researchers, professionals, and practitioners to present and discuss recent developments and challenges in real-life applications of biometrics. Topics of interest include (but are not limited to):

Biometric systems and applications; Identity management and biometrics; Fraud prevention; Anti-spoofing methods; Privacy protection of biometric systems; Uni-modalities, e.g. voice, face, fingerprint, iris, hand geometry, palm print and ear biometrics; Behavioural biometrics; Soft-biometrics; Multi-biometrics; Novel biometrics; Ethical and societal implications of biometric systems and applications.

Delegates’ profile

Speech technology, human-machine interaction and human-robot interaction attract a multidisciplinary group of students and scientists from computer science, signal processing, machine learning, linguistics, social sciences, natural language processing, text mining, dialogue systems, affective modelling, interactive interfaces, collaborative and social robotics, intelligent and adaptive systems. Approximately 150 delegates are expected to attend the joint SPECOM and ICR conferences.

Who should sponsor:

  • Research Organisations
  • Universities and Research Labs
  • Research and Innovation Projects
  • Academic Publishers
  • Innovative Companies

Sponsorship Levels

Depending on the sponsorship level, sponsors will be able to disseminate their research, innovation and/or commercial activities by distributing leaflets/brochures and/or by staffing a booth for the 3 days, in the common area hosting the coffee breaks and poster sessions.

Location

The joint SPECOM and ICR conferences will be held in the College Lane Campus of the University of Hertfordshire, in Hatfield. Hatfield is located 20 miles (30 kilometres) north of London and is connected to the capital via the A1(M) and direct trains to London King’s Cross (20 minutes), Finsbury Park (16 minutes) and Moorgate (35 minutes). It is easily accessible from 3 international airports (Luton, Stansted and Heathrow) via public transportation.

Contact:

Iosif Mporas

i.mporas@herts.ac.uk 

School of Engineering and Technology

University of Hertfordshire

Hatfield, UK

Our META Avatar immortalised in a TEDx talk!

31 Oct

 


Our very own Niels Taatgen from the University of Groningen gave a TEDx talk last summer on “Why computers are not intelligent (yet!)” and why they still have a long way to go.

 

Computers can be better than humans at very specialised tasks, such as chess and Go, but they find it much harder to learn how to learn new tasks and to reason about intentions and goals other than their own, the way humans do. Enter our EU R&D project METALOGUE.

 

METALOGUE logo

 

Niels briefly shows an example interaction with our META Avatar negotiating the terms of a smoking ban and actually “thinking about the (human) opponent”, a crucial human negotiation and life skill. You can see META from minute 12 onwards, but the whole talk is really interesting, so don’t just skip ahead to that part!

 

EU FP7 logo

Amazon Alexa Developers Conference (10 Oct, London)

7 Oct

Exciting times for Amazon Alexa!


Amazon Echo

  • Alexa has crossed over 3,000 skills in the US Skill Store, up from 1,000 in June;
  • Alexa has just started shipping in the UK and is coming soon to Germany too. Echo and Echo Dot are now available in the UK, while in Germany they are available by invitation for those who want to help shape Alexa as she evolves; the devices will start shipping next month.
  • Amazon has also announced a new Echo Dot with the new Echo Spatial Perception (ESP), which determines which device a user is talking to, so that only one responds when multiple devices hear the user. The Dot will increase the number of devices Alexa can talk to in the home, creating an innovative customer experience. It will retail for £49.99 and Echo for £149.99.
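Amazon hasn’t published ESP’s internals, but the basic idea (several devices hear the same utterance, yet only the best-placed one answers) can be sketched as each device reporting a wake-word detection score and an arbiter picking the maximum. All device names and scores below are hypothetical:

```python
def arbitrate(reports):
    """Pick the single device that should respond.

    reports: dict mapping device_id -> wake-word detection score
    (e.g. signal energy or detector confidence). The device with
    the highest score answers; the rest stay silent.
    """
    if not reports:
        return None
    return max(reports, key=reports.get)

# Three Echos in different rooms hear the same utterance:
scores = {"kitchen-dot": 0.41, "living-room-echo": 0.87, "bedroom-dot": 0.12}
print(arbitrate(scores))  # living-room-echo responds; the others suppress their reply
```

In practice such arbitration would have to happen within a short time window (the devices can’t wait indefinitely for every report), but the winner-takes-all selection is the core of it.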

Here are 2 neat little YouTube videos showing Alexa in action.

 

 

In this context, Amazon is bringing their Alexa training events to Europe in October. Hello Alexa London is on Monday 10th October. 

  • Developers, engineers, QA/testers and anyone who wants to learn how to build skills can participate in the full-day agenda from 8:30am – 4:30pm (+ Happy Hour afterwards!)
  • Business development, strategy and planning, VUX/UX/VUI, account teams, producers and project management professionals can participate in the Alexa Business Development for Agencies session later in the day, from 3pm – 4:30pm (and then of course join the Happy Hour!). They can also join the breakfast session, Welcome to Alexa (a 45-minute keynote), and Hello Alexa (a 1-hour session on the basics of creating a skill: design, build, test, publish) from 8:30am – 10:45am.
  • Click here to register (although the event is already sold out by now!) and we hope to see you there!
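For a flavour of what “building a skill” involves: a custom skill’s endpoint simply maps the Alexa service’s JSON request to a JSON response containing the speech to be spoken. A minimal sketch follows; the intent name and speech texts are made up, and a real skill would typically use the Alexa Skills Kit SDK rather than raw dictionaries:

```python
def handler(event, context=None):
    """Minimal Alexa skill endpoint: map an incoming request to a speech response.

    `event` follows the Alexa Skills Kit request JSON; the return value is the
    standard PlainText outputSpeech response envelope.
    """
    req_type = event.get("request", {}).get("type", "")
    if req_type == "LaunchRequest":
        text = "Welcome! Ask me anything."
    elif req_type == "IntentRequest":
        intent = event["request"]["intent"]["name"]
        text = f"You invoked the {intent} intent."
    else:
        text = "Goodbye."
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }

# Simulated IntentRequest, as the Alexa service would POST it to the endpoint:
event = {"request": {"type": "IntentRequest", "intent": {"name": "HelloIntent"}}}
print(handler(event)["response"]["outputSpeech"]["text"])
# prints: You invoked the HelloIntent intent.
```

The design/build/test/publish cycle taught at these events is largely about defining the intents and sample utterances around a handler like this one.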

I am really excited at how ubiquitous speech recognition is becoming! It was already ubiquitous as we were dragging it around on our smartphones (Apple SIRI, Google Now), but now it’s penetrated our homes too, crossing over work/personal/family lives. The future is omni-channel but unimodal?!

Meet META, the Meta-cognitive skills Training Avatar!

16 Jun

METALOGUE logo

EU FP7 logo

 

Since November 2013, I’ve had the opportunity to participate in the EU-funded FP7 R&D project METALOGUE through my company DialogCONNECTION Ltd, one of its 10 Consortium Partners. The project aims to develop a natural, flexible, and interactive multi-perspective and multimodal dialogue system with meta-cognitive abilities; a system that can:

  • monitor, reason about, and provide feedback on its own behaviour, intentions and strategies, and the dialogue itself,
  • guess the intentions of its interlocutor,
  • and accordingly plan the next step in the dialogue.

The system tries to dynamically adapt both its strategy and behaviour (speech and non-verbal aspects) in order to influence the dialogue partner’s reaction, and, as a result, the progress of the dialogue over time, and thereby also achieve its own goals in the most advantageous way for both sides.

The project is in its 3rd and final year (ending in Oct 2016) and has a budget of € 3,749,000 (EU contribution: € 2,971,000). METALOGUE brings together 10 Academic and Industry partners from 5 EU countries (Germany, Netherlands, Greece, Ireland, and UK).

 

METALOGUE focuses on interactive and adaptive training situations, where negotiation skills play a key role in the decision-making processes. Reusable and customisable software components and algorithms have been developed, tested and integrated into a prototype platform, which provides learners with a rich and interactive environment that motivates them to develop meta-cognitive skills, by stimulating creativity and responsibility in the decision-making, argumentation, and negotiation process. The project is producing a virtual trainer, META, a Training Avatar capable of engaging in natural interaction in English (currently, with the addition of German and Greek in the future), using gestures, facial expressions, and body language.

METALOGUE Avatar

Pilot systems have been developed for 2 different user scenarios: a) debating and b) negotiation, both tested and evaluated by English-speaking students at the Hellenic Youth Parliament. We are currently targeting various industry verticals, in particular Call Centres, e.g. to semi-automate and enhance Call Centre Agent Training.

 

And here’s META in action!

 

In this video, our full-body METALOGUE Avatar is playing the role of a business owner who is negotiating a smoking ban with a local Government Councillor. Still imperfect (e.g. there is some slight latency before replying – and an embarrassing repetition at some point!), but you can also see the realistic facial expressions, gaze, gestures, and body language, and even selective and effective pauses. It can process natural spontaneous speech in a pre-specified domain (a smoking ban, in this case) and has reached an ASR error rate below 24% (down from almost 50% 2 years ago!). The idea is to use such an Avatar in Call Centres to provide extra training support on top of existing training courses and workshops. It’s not about replacing the human trainer, but rather about empowering and motivating Call Centre Trainee Agents who are trying to learn how to read their callers and how to successfully negotiate deals and even complaints with them in an optimal way.
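For context, the ASR error rate quoted here is word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) between the recogniser’s output and a reference transcript, divided by the number of reference words. The example utterance below is made up:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words (Levenshtein).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

ref = "alexa open the curtains and lower the temperature"
hyp = "alexa open the curtain and lower temperature"
print(round(wer(ref, hyp), 3))  # 2 word errors over 8 reference words -> 0.25
```

A WER below 24% on natural spontaneous speech means roughly one word in four is still wrong, which is why constraining the domain (here, the smoking-ban negotiation) matters so much.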


 

My company, DialogCONNECTION, is charged with the task of attracting interest and feedback from industry, to gauge the relevance and effectiveness of the METALOGUE approach in employee training contexts (esp. negotiation and decision-making). We are looking in particular for Call Centres; both small and agile ones (serving multiple small clients) and large ones (probably plagued by the well-known agent burn-out syndrome). Ideally, you would give us access to real-world Call Centre Agent-Caller/Customer recordings or even simulated Trainer-Trainee phone calls that are used for situational Agent training (either already available or collected specifically for the project). A total of just 15 hours of audio (and video, if available) would suffice to train the METALOGUE speech recognisers and the associated acoustic and language models, as well as its metacognitive models.

However, if you don’t want to commit your organisation’s data, any type of input and feedback would make us happy! As an innovative pioneering research project, we really need guidance, evaluation and any input from the real world of industry! So, if we have sparked your interest in any way and you want to get involved and give it a spin, please get in touch!