
‘statistics ≠ understanding’

7 May

I recently read an article on a new approach to Common Sense understanding, which combines traditional Good Old-Fashioned AI (GOFAI) symbolic methods with the latest data-intensive Machine Learning (ML) / Deep Learning neural-network approaches to tackle the hard problem of human reasoning. Here’s a link to the article (with thanks to Phillip Hunter for the pointer!):

My favourite quote from the article is:

“statistics ≠ understanding”

That’s because (another favourite quote):

“common sense, like natural language, remains fundamentally fuzzy”

I was delighted to read about this research, especially because almost 30 years ago, when I was doing my PhD at the University of Manchester, I, too, realised that the only promising way to capture this fuzziness, ambiguity and complexity of language and meaning is through a hybrid approach, combining hand-crafted “rules” (human annotations, i.e. symbolic processing) with the automatic weight distribution and semi-supervised learning of a neural network (connectionist processing).

My PhD Thesis

Thus, I used human-generated text annotations, which encoded morphosyntactic/grammatical, lexical-semantic and discourse-pragmatic features of each sentence in a news article.

Sentence annotations with discourse-pragmatic features

I would then feed these annotations into a basic feed-forward backpropagation neural network (ANN), which calculated the degree of “importance” of each sentence in the whole article and generated a YES or NO answer to the question of whether that sentence would be included (not necessarily verbatim) in the final summary of that news article.

ANN decides the degree of importance of a sentence in a summary
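A rough sketch of that pipeline in modern terms (with invented feature values and toy labels; this is not my original code): each sentence becomes a vector of hand-coded features, and a 3-layer feed-forward network with a single 30-unit hidden layer is trained by plain batch backpropagation to emit a YES/NO inclusion decision per sentence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the annotated corpus: each sentence is a vector of
# hand-coded features (morphosyntactic, lexical-semantic and
# discourse-pragmatic flags); labels mark inclusion in the summary.
n_features, n_hidden = 12, 30                     # one hidden layer, 30 units
X = rng.random((1100, n_features))                # 1,100 annotated sentences
y = (X[:, 0] + X[:, 1] > 1.0).astype(float).reshape(-1, 1)  # toy labels

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
W1 = rng.normal(0.0, 0.1, (n_features, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, (n_hidden, 1));          b2 = np.zeros(1)

losses = []
for epoch in range(3000):                         # plain batch backpropagation
    h = sigmoid(X @ W1 + b1)                      # hidden activations
    p = sigmoid(h @ W2 + b2)                      # "importance" per sentence
    losses.append(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
    g_out = (p - y) / len(X)                      # d(loss)/d(output logits)
    g_hid = (g_out @ W2.T) * h * (1 - h)          # backprop through hidden layer
    W2 -= 0.5 * (h.T @ g_out); b2 -= 0.5 * g_out.sum(axis=0)
    W1 -= 0.5 * (X.T @ g_hid); b1 -= 0.5 * g_hid.sum(axis=0)

include = p >= 0.5                                # YES/NO: goes into the summary?
```

Thresholding the network’s output at 0.5 gives the binary inclusion decision; the raw value of `p` can still be read as a graded “importance” score for ranking sentences.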

It was a neat idea, very imperfectly executed: the data set was small by today’s standards (1,100 sentences from 55 news articles) and the ANN had barely 3 layers, with a single hidden layer of only 30 units (so very skin-deep learning!).

You can find my PhD thesis as a PDF below:

My PhD thesis only scratched the surface. It’s awesome to see a similarly hybrid approach now gaining momentum! We now have both the huge data collections and the sophisticated Deep Learning algorithms to try out different approaches and to better simulate human intelligence in AI systems and, hence, achieve deeper understanding and generate more relevant and useful responses and actions. This will also contribute to more Explainable AI and, by extension, more Explainable Conversational AI for transparency and reusability in Voice User Interface (VUI) Design.

My baby, DialogCONNECTION, is 11!

4 Dec

This week, my company, DialogCONNECTION Limited, turned 11 years old! 🎉 🥂 😁

It feels like only yesterday that, in December 2008, I registered it with Companies House and became its Company Director (wearing multiple hats).

My very first client project was for the NHS Business Authority, on their EHIC Helpline (which hopefully will survive the Brexit negotiations). Back then, whenever I told anyone what my company does (VUI Design for Speech IVRs), I was greeted by blank stares of confusion or incomprehension. It did feel a bit lonely at times!

Many more clients and thousands of long hours, long days and working weekends later, here we are in December 2019, and I suddenly find myself surrounded by VUI Designers and Voice Strategists who have now seen the potential and the inescapable nature of speech interfaces and have followed in my footsteps. I feel vindicated, especially since I started in Voice back in 1996 with my Post-Doc in Spoken Dialogue Management at the University of Erlangen! 😎 (Yet another thing I’m hugely grateful to the EU for!)

We started with Voice-First VUI Design back in 1996, well before Samsung’s BIXBY (2017), Google’s ASSISTANT (2016), Amazon’s ALEXA (2014), Apple’s SIRI (2010) and even before the world started using GOOGLE for internet searches (1998)!

http://dialogconnection.com/who-designs-for-you.html

It’s quite frustrating when I realise that many of these newcomers have never heard of an IVR (Interactive Voice Response) system before, but they will eventually learn. 🤓 In the past 25 years, it was the developers who insisted they could design conversational interfaces without any (Computational) Linguistics, Natural Language Processing (NLP) or Speech Recognition (ASR) background and who, therefore, didn’t need a VUI Designer; we were an allegedly superfluous luxury and rarity in those times. In the past couple of years, it’s the shiny Marketing people, who make a living from their language mastery, and the edgy GUI Designers, who excel in visual design, who think they can design voice interfaces too, but still know nothing about NLP or ASR.

What they don’t know is that, by modifying, for instance, just the wording of what your system says (prompt tuning), you can achieve dramatically better speech recognition and NLU accuracy, because the user is covertly “guided” to say what we expect (and have covered in the grammar). The same holds for tuned grammars (for out-of-vocabulary words), word pronunciations (for local and foreign accents), tuned VUI designs (for error recovery strategies) and tuned ASR engine parameters (for timeouts and barge-ins). It’s all about knowing how the ASR software and the language software in our human brains work.
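As a toy illustration of the prompt-tuning point (invented prompts, replies and grammar, not from any real deployment): the recognition grammar stays fixed, but a directed prompt steers callers towards in-grammar phrasings, so far fewer utterances fall out of vocabulary.

```python
# Toy illustration of prompt tuning: the grammar (what the ASR can
# recognise) is fixed, but the prompt wording changes what callers
# are likely to say, and hence the in-grammar rate.
GRAMMAR = {"renew my card", "order a new card", "check my application"}

# Hypothetical caller replies to an open prompt: "How can I help you?"
open_prompt_replies = [
    "erm I lost the thing you sent me",     # out of grammar
    "I want to sort out my card",           # out of grammar
    "renew my card",
    "can you help me",                      # out of grammar
]

# Hypothetical replies to a directed prompt: "Say 'renew my card',
# 'order a new card' or 'check my application'."
directed_prompt_replies = [
    "renew my card",
    "order a new card",
    "check my application",
    "renew my card please",                 # still out of grammar
]

def in_grammar_rate(replies):
    """Share of utterances the fixed grammar can recognise."""
    return sum(r in GRAMMAR for r in replies) / len(replies)

open_rate = in_grammar_rate(open_prompt_replies)
directed_rate = in_grammar_rate(directed_prompt_replies)
```

In this made-up example the directed prompt triples the in-grammar rate without touching the grammar at all; in a real IVR you would, of course, also tune the grammar to absorb frequent out-of-vocabulary variants like the trailing “please”.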

Excited to see what the next decade is going to bring for DialogCONNECTION and the next quarter of a century for Voice! Stay tuned!

Towards EU collaboration on Conversational AI, Data & Robotics

22 Nov

I was really interested to read the BDVA – Big Data Value Association’s and euRobotics’ recent report on “Strategic Research, Innovation and Deployment Agenda for an AI PPP: A focal point for collaboration on Artificial Intelligence, Data and Robotics”, which you can find here.

Of particular relevance to me was the Section on Physical and Human Action and Interaction (pp. 39-41), which describes the dependencies, challenges and expected outcome of coordinated action on NLP, NLU and multimodal dialogue processing. The associated challenges are:

  • Natural interaction in unstructured contexts, which is the default for voice assistants, for instance, as they are expected to hold a conversation on any of a range of different topics and act on them
  • Improved natural language understanding, interaction and dialogue covering all European languages and age ranges, thus shifting the focus from isolated recognition to the interpretation of the semantic and cultural context, and the user intention
  • Development of verbal and non-verbal interaction models for people and machines, underlining the importance of gestures and emotion recognition and generation (and not only in embodied artificial agents)
  • Co-development of technology and regulation to assure safe interaction in safety-critical and unstructured environments, as the only way to assure trust and, hence, widespread citizen and customer adoption
  • The development of confidence measures for interaction and the interpretation of actions, leading to explainable AI and, hence, improved and more reliable decision-making

You can find the excellent and very comprehensive report here.

UBIQUITOUS VOICE: Essays from the Field now on Kindle!

14 Oct

In 2018, a new book on “Voice First” was published on Amazon, and I was proud and deeply honoured, as it includes one of my articles! Now it has come out on Kindle as an e-Book, and we are even more excited at the prospect of a much wider reach!

“Ubiquitous Voice: Essays from the Field”: thoughts, insights and anecdotes on Speech Recognition, Voice User Interfaces, Voice Assistants, Conversational Intelligence, VUI Design and Voice UX: issues, solutions, best practices and visions from the veterans!

I have been part of this effort since its inception, working alongside some of the pioneers in the field who now represent the Market Leaders (GOOGLE, AMAZON, NUANCE, SAMSUNG VIV…). Excellent job by our tireless and intrepid Editor, Lisa Falkson!

My contribution “Convenience + Security = Trust: Do you trust your Intelligent Assistant?” is on data privacy concerns and social issues associated with the widespread adoption of voice activation. It is thus platform-, ASR-, vendor- and company-agnostic.

You can get the physical book here and the Kindle version here.

Prepare to be enlightened, guided and inspired!