Thursday, 5 April 2018

Naming Intents

How should you name intents? Here's one way and an explanation as to why.

In this post we described clustering a topic into intents. The naming scheme I used was TopicIntent.

When you go to improve accuracy you will merge and split intents. You tend not to do this across Topics. I find that if the topic name is in the intent name when you make these changes, it is easier to keep your brain in one context.

Cluster Topics

"Happy families are all alike; every unhappy family is unhappy in its own way." the Anna Karenina principle

Some Topics cover loads but you don't really care about the individual intents inside. For example, a Complaints topic could cover all sorts of things people moan about.

No one wants a message back saying "This robot cares we have lost your bags". A complaint question will have to be passed on to a person. If we can tell that person that we have a complaint, they can then decide what to do next. If you do not break the complaint topic down into intents, though, all sorts of questions will end up in one intent. It will deal with damage, delays, queues, lost items, dirty conditions etc. This giant varied intent will suck in other questions, damaging your overall system accuracy.

A varied topic like complaints is one that your chatbot cannot handle by itself, and making it one giant intent will damage your overall accuracy. But because complaints tend to be about a few things at once, 'The food was terrible and the portions were small', there is often not one solid intent anyway. By labelling every complaint intent in the form ComplaintIntent (topic name first), it is possible to ignore the intent part, as getting the topic right is good enough.

In our accuracy tests we can strip the intent part off and say that if we land in Complaint that is good enough. That way we do not create one giant intent that covers too much and sucks in all other questions.
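As a rough illustration of stripping the intent part off in an accuracy test, here is a minimal Python sketch. It assumes labels are written in CamelCase with the topic as the first word; the labels ComplaintLostBag, ComplaintDelay and ComplaintQueue and the helper names are made up for the example.

```python
import re

def topic_of(label):
    """Return the leading word of a TopicIntent label, e.g. 'ComplaintLostBag' -> 'Complaint'."""
    match = re.match(r"[A-Za-z][a-z0-9]*", label)
    return match.group(0) if match else label

def accuracy(results, topic_only=frozenset({"Complaint"})):
    """results is a list of (expected_label, returned_label) pairs.

    For topics listed in topic_only, landing anywhere in the right topic counts as correct.
    """
    correct = 0
    for expected, returned in results:
        if expected == returned:
            correct += 1
        elif topic_of(expected) in topic_only and topic_of(expected) == topic_of(returned):
            correct += 1
    return correct / len(results) if results else 0.0

# Hypothetical test results: (expected, returned)
results = [
    ("ComplaintLostBag", "ComplaintDelay"),  # same topic, counted as correct
    ("ComplaintQueue", "BookCheck"),         # wrong topic
    ("BookCheck", "BookPay"),                # Book is not a topic-only topic, so wrong
    ("BookPay", "BookPay"),                  # exact match
]
print(accuracy(results))
```

Only the topics you name in topic_only get the relaxed scoring, so the other intents are still tested strictly.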

This issue of big topics particularly happens with Off Topic topics, where questions are out of scope, silly or just cover large areas that you can't really answer.

There are other ways to label intents. This TopicIntent method is what I use. If you have a different way please mention it in the comments.

Wednesday, 4 April 2018

Clustering Questions Part 2: Intentions

Once you have divided your questions into Topics, the next step is to divide them into Intents. This is how I would find the intents inside a topic.

An Intent is a purpose or goal expressed by a user’s input such as finding contact information or booking a trip.

Imagine you had an airline booking chatbot and you had these questions in the topic Booking.

There is a dataset of travel questions here. I will take some questions from there and invent some myself.

A booking topic could have

Question | Intent
I'd like to book a trip to Atlantis from Caprica on May 13 | BookTicket
I'd like to book a trip from Chicago to San Diego between Aug 26th and Sept 5th | BookTicket
i wanna go to Kobe whats available? | BookTicket
Can I get information for a trip from Toluca to Paris on August 25th? | BookTicket
I'd like to book a trip to Tel Aviv from Tijuana. I was wondering if there are any packages from August 23rd to 26th | BookTicket
I want to know how far in advance I can book a flight | BookFuture
When do bookings open for 6 months time | BookFuture
I want to get a ticket for my christmas flight home | BookFuture
Can I check my booking? | BookCheck
can i check my booking status | BookCheck
can i check the status of my booking | BookCheck
how do i check the status of my booking | BookCheck
i need to check my booking status | BookCheck
Let me know the status of my Booking | BookCheck
Can I book now pay later | BookPay
How can I pay for a booking? | BookPay

In this list the verbs Check and Pay each seem to form an intent. There is one intent about when bookings can happen. There may also be a few unknowns that make more sense once we have more questions.

At this stage, accept that you are going to make mistakes and will have to go back over your intents as you learn by doing. I will come back to fixing your intents once you have made a first cut at defining them.

One good way to find intents in a topic is to look for verbs. Unrelated actions tend to have different verbs. In this case the topic Booking is already a verb and a noun. This is common enough. Dual meanings like this can be a nightmare with Entities, but that is another blogpost.

In this topic something like 'cancel a booking' is likely to be an intention. Here Cancel is the verb and booking is the object of the sentence.

Other clues to the intention are the Lexical Answer Type (LAT), the subject and the object. The LAT is the type of question: Who questions have different types of answers to When questions. In practice I don't find you commonly use the LAT to define intentions.

One possible exception to this is definitional questions, where users ask "What is a..." for a domain term to be explained. If more than 5% of your questions are definitional, you may not have collected representative questions: questions manufactured by non-real users, or by real users forced to ask questions, tend to be definitional. When someone runs out of real questions they will ask 'What is a booking'.
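A quick way to check the 5% figure is to count the questions that start with "What is a...". The sketch below is only an approximation: the regular expression and the sample questions are my own assumptions, and real definitional questions come in more shapes than this pattern catches.

```python
import re

# Rough pattern for definitional questions such as "What is a booking"
DEFINITIONAL = re.compile(r"^\s*what\s+(is|are)\s+(a|an)\b", re.IGNORECASE)

def definitional_share(questions):
    """Return the fraction of questions that look definitional."""
    if not questions:
        return 0.0
    hits = sum(1 for q in questions if DEFINITIONAL.match(q))
    return hits / len(questions)

# Made-up sample; use your collected questions instead
sample = [
    "What is a booking",
    "Can I check my booking?",
    "How can I pay for a booking?",
    "what is available to Kobe?",
]
share = definitional_share(sample)
print(f"{share:.0%} definitional")
if share > 0.05:
    print("More than 5%: the questions may not be representative")
```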

The Subject of the sentence is also rarely useful. Sometimes who is doing an action changes the answer, but usually there is a set scheme to buy, book, cancel etc. and who is doing it doesn't matter.

The Object of the sentence is more often useful. Frequently an intention is a combination of the verb and what it is being done to. Whichever one isn't the Topic is usually the intent. Booking might be a topic and the various things you do with a booking would be intents.

In summary, go through each topic. If there are verbs shared across questions, those questions might go together in an intent. But you have to use the domain expert's knowledge of which questions have the same intention. This step cannot be automated.
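A script can at least line up candidate groups for the domain expert to review. This is a minimal sketch that groups the questions in one topic by the first verb it finds from a hand-written list; the verb list is hypothetical and the grouping is only a starting point, not a replacement for the expert's judgement.

```python
from collections import defaultdict

# Hypothetical verb list for a Booking topic; a domain expert would supply the real one.
VERBS = ["check", "pay", "cancel", "change"]

def group_by_verb(questions, verbs=VERBS):
    """Group questions by the first listed verb they contain; the rest go to 'unknown'."""
    groups = defaultdict(list)
    for question in questions:
        words = question.lower().replace("?", " ").split()
        verb = next((v for v in verbs if v in words), "unknown")
        groups[verb].append(question)
    return dict(groups)

questions = [
    "Can I check my booking?",
    "how do i check the status of my booking",
    "Can I book now pay later",
    "How can I pay for a booking?",
    "When do bookings open for 6 months time",
]
groups = group_by_verb(questions)
for verb, members in groups.items():
    print(verb, len(members))
```

The 'unknown' group is the one worth reading most carefully, since that is where new intents tend to hide.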

Tuesday, 3 April 2018

Clustering Questions into Topics

IBM Watson used to claim it took 15 minutes to match up a question with an intent. The technique described here halves that time. Context switching is mentally draining and wastes a lot of time. Concentrating on one part of a job until it is done is much more efficient than switching between tasks.

In a similar way, once we have our questions collected, the next task is to divide them into topics. Then these topics will be looked at individually.

A topic is a category of types of questions people will ask your chatbot.

In an airline these might be Checkin, Booking, Airmiles.

In an insurance company: Renewal, Claim, Coverage.

Before looking at the questions, try to think of 5 topics that might occur in customer questions to your business.

How many topics?

Roughly 20. You might have 10 or 30. A rule of thumb used in clustering (for example with k-means) is that if you have N documents you expect to have about Sqrt(N) clusters. This works out as about 45 for 2000 questions. You won't have 2000 questions at this stage; more likely under 1000.
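To see what the square-root rule of thumb gives for different numbers of questions, a few lines of Python are enough. The function name is just for illustration.

```python
import math

def expected_topics(num_questions):
    """Square-root rule of thumb: about sqrt(N) clusters for N questions."""
    return round(math.sqrt(num_questions))

for n in (500, 1000, 2000):
    print(n, expected_topics(n))
```

For 500 questions this gives about 22 topics, which is in line with the rough figure of 20.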

Can you automate discovering topics?

Yes, you can, using a clustering algorithm such as k-means with the number of clusters given above. No, you really should not. You learn a hell of a lot clustering 500 questions. You will have to read all these questions eventually anyway, so you might as well learn this stuff now.

Process of Marking up topics

Say you have 500 questions in a spreadsheet. What we are trying to do here is mark up a new column 'Topic' that puts each of these questions in a topic.

Go through your 500 questions, looking for the 5 topics you listed earlier. You may find that what you thought was one cluster is actually two, or that a topic you expected is missing. If you are looking for the clothesReturn topic I would search for the key words 'return' and 'bring back'. I would look for the obvious words in each of the topics I expect.

Once I had marked up the questions from my list of 500 that matched the obvious clothesReturn keywords, if I found a new question in that topic I would look for the word that showed me it belonged to that topic but was not in my original search list. For example:

Can I exchange a jumper I bought yesterday for a new one

I would then search for other uses of 'exchange'. It is a word likely to be used in clothesReturn but one I missed earlier.

If you know the domain, roughly half of the questions will be classified by your obvious keywords.

I would read through the remaining questions with my 5 expected topics in my head. If I see something that is obviously a new topic I add that to the topic list.

Feel free to mark 5-10% of questions as unknown. These might make more sense when you have more questions, or might be part of the long tail, out of scope or off topic questions that your chatbot will not handle.
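The keyword pass described above can be done with spreadsheet search, but here is a minimal Python sketch of the same idea for anyone who prefers a script. The keyword lists and sample questions are hypothetical, and anything the keywords miss is left blank for you to read by hand.

```python
def mark_topics(questions, topic_keywords):
    """Return (question, topic) rows; topic is '' when no keyword matched."""
    rows = []
    for question in questions:
        text = question.lower()
        topic = next(
            (name for name, words in topic_keywords.items()
             if any(word in text for word in words)),
            "",
        )
        rows.append((question, topic))
    return rows

# Hypothetical starting keywords: the obvious words for each expected topic
topic_keywords = {"clothesReturn": ["return", "bring back"]}

questions = [
    "How do I return a shirt?",
    "Can I bring back shoes I wore once?",
    "Can I exchange a jumper I bought yesterday for a new one",
]

first_pass = mark_topics(questions, topic_keywords)

# Reading the unmarked question shows 'exchange' was missing, so add it and search again
topic_keywords["clothesReturn"].append("exchange")
second_pass = mark_topics(questions, topic_keywords)

for question, topic in second_pass:
    print(topic or "unknown", "|", question)
```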

What Next

Once you have a spreadsheet with a column marked up with the Topic of each question, the next step is to find the intent of each question. But now you are reviewing a series of questions in one topic, which makes it much easier to concentrate and work in a batch mode.

I will describe this step of marking up intents in a later blogpost.

Monday, 5 February 2018

An alternative way of training Watson Discovery Service

Watson Discovery Service (WDS) provides an excellent natural language query service. This service works well out of the box, but many users like to improve the results for their particular domain by training the service. In order to train the service to better rank the results of natural language queries, you need to provide it with some sample queries and, for each query, indicate which documents are good results and, equally importantly, which documents would be bad results.

The standard user interface to the training capability allows you to view the potential results in a browser and then click on a button to indicate if the result is good or bad. Clicking on the results is easy for a small sample of queries, but it quickly becomes tedious. For this reason, many users prefer to use the API for the training service which gives additional control and capabilities.

Unfortunately the WDS training service only works well with large amounts of training data and in many cases it is not feasible to collect this volume of training data. Luckily there is an alternative (homegrown) way of training WDS which works significantly better for small amounts of training data. The method (which is known as hinting) is amazingly simple. All you need to do is add a new field to your target documents (e.g. named hints) containing the text of the question for which you want the document to be selected as an answer. When you ask this question (or a similar question) the natural language query engine will select your target document and rank it highly, since it is clearly a good match.

This alternative training method is called hinting because you are providing hints to WDS about which questions this document provides an answer to. An additional benefit of this training method is that it helps find matches where the question and the answer document don't have any words in common. By contrast, the standard WDS training method only affects the ranking of results, so if the answer document you want to be selected is not even in the list of top 100 answers fetched for the query, the normal training would not help.
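As a rough sketch of what hinting looks like in practice, the Python below adds a hints field to documents before they are ingested again. The document shape (an id and a text field), the sample questions and the function name are all assumptions for illustration; uploading the updated documents to WDS is not shown.

```python
import copy

def add_hints(documents, hints_by_id, field="hints"):
    """Return copies of the documents with a hints field holding the target questions."""
    updated = []
    for doc in documents:
        doc = copy.deepcopy(doc)
        questions = hints_by_id.get(doc["id"], [])
        if questions:
            doc[field] = " ".join(questions)
        updated.append(doc)
    return updated

# Hypothetical documents and the questions each one should answer
documents = [
    {"id": "doc1", "text": "Passengers may carry one item of up to 10kg in the cabin."},
    {"id": "doc2", "text": "Our opening hours are 9 to 5."},
]
hints_by_id = {"doc1": ["How big can my hand luggage be?", "What can I bring on board?"]}

hinted = add_hints(documents, hints_by_id)
print(hinted[0]["hints"])
```

Note that the first hint question shares almost no words with the document text, which is exactly the case where hinting helps and ranking-only training would not.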

Tuesday, 30 January 2018

Visualizing Chatbot Quality with Swarm Plot

When you create a chatbot you frequently want to see where it is going wrong so that you can fix problems. When you look at the logs or run tests you get results of the form

Question, Correct Intent, Returned Intent, Confidence

Can I update my account settings?,Update,Check,0.332

I have a demo dataset here you can use to follow along with the code: Swarm csv.

Usually the confusion matrix of which intentions are mixed up with each other can be shown with a heatmap. But an interesting visualisation for this type of data is a swarm plot using the Python seaborn library. There is a nice guide to the seaborn visualization library here.

# Pandas for managing datasets
import pandas as pd
# Matplotlib for additional customization
from matplotlib import pyplot as plt
%matplotlib inline
# Seaborn for plotting and styling
import seaborn as sns
#read in the csv
df = pd.read_csv('swarm.csv', index_col=0, encoding='mac_roman')
df.columns = ['Intent', 'Expected','Confidence']
#draw the swarm chart
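# NOTE: the original call was cut off after y='Confidence'. The x, hue and data
# arguments below are a reconstruction that assumes 'Expected' is the correct
# intent and 'Intent' is the intent the chatbot returned.
swarm_plot = sns.swarmplot(x='Expected', y='Confidence', hue='Intent', data=df)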
plt.legend(bbox_to_anchor=(1, 1), loc=2,title='Got')
plt.title('Swarm Report')

The graph shows you which intentions are being mixed up and the confidence that your chatbot has in its answers.
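If you just want the raw counts behind the confusion matrix mentioned above, without any plotting, a minimal sketch using only the standard library looks like this. It assumes rows in the Question, Correct Intent, Returned Intent, Confidence form shown earlier; the second and third rows are made up for the example.

```python
import csv
import io
from collections import Counter

# First row is the example from the post; the other two are invented sample rows
log = io.StringIO(
    "Can I update my account settings?,Update,Check,0.332\n"
    "How do I change my password?,Update,Update,0.871\n"
    "Where can I see my balance?,Check,Check,0.764\n"
)

# Count how often each (correct, returned) pair of intents occurs
confusion = Counter()
for question, correct, returned, confidence in csv.reader(log):
    confusion[(correct, returned)] += 1

for (correct, returned), count in sorted(confusion.items()):
    marker = "" if correct == returned else "  <- mixed up"
    print(f"{correct:8} {returned:8} {count}{marker}")
```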

Tuesday, 16 January 2018

Combining the annotation capabilities of both Watson Knowledge Studio and Watson Discovery Service

Watson Discovery Service (WDS) provides a capability to automatically annotate the documents being ingested. This capability is available in several languages and it is able to recognize a wide range of entity types commonly found in typical texts written in these languages.

Unfortunately many users of WDS have to deal with documents which are not typical. For example, they could be dealing with medical documents that contain unusual drug and disease names or they could be dealing with a business domain that has obscure terminology that would not be understood by WDS (or indeed by most speakers of the language in question).

Luckily Watson Knowledge Studio (WKS) can be used to create a language model that understands the specialized terminology for any domain. However, many document collections will contain a mixture of specialized terminology and normal text. By default, when users choose to specify that a customized WKS domain model is to be used instead of the generic WDS model, it acts as a replacement and none of the normal entities will be annotated by WDS.

It is not feasible for users to build a complete WKS model that incorporates all of the normal language dictionaries as well as the specialized domain terminology. However, there is a trick which can be used to get WDS to use both the domain specific annotator from WKS and the generic language annotator from WDS.

Unfortunately this trick is not possible with the normal WDS UI; it requires the use of the REST API. Hopefully you are already familiar with this and are able to export your configuration to a JSON file. Assuming that you have configured a number of enrichments for the field named "text" you will see that your configuration contains a fragment that looks something like the following:

  "enrichments": [
    {
      "enrichment": "natural_language_understanding",
      "source_field": "text",
      "destination_field": "enriched_text",
      "options": {
        "features": {
          "keywords": {},
          "entities": {
            "model": "a3398f8b-2282-4fdc-b062-227a162dc0eb"
          },
          "sentiment": {},
          "emotion": {},
          "categories": {},
          "relations": {},
          "concepts": {},
          "semantic_roles": {}
        }
      }
    }
  ]

This fragment means that you have selected a number of different enrichment types to be computed for the text field and the results to be placed in the field named "enriched_text". For most of these enrichments you will use the language model which is provided with the natural language understanding unit that is built into WDS, but for entities it will instead rely upon the WKS model ID "a3398f8b-2282-4fdc-b062-227a162dc0eb".

If you want to have the core WDS detected entities as well as the WKS detected ones, you need to define an additional enrichment entry in your configuration file to place these enrichments in a differently named field, e.g. wds_enriched_text. The fragment of JSON above needs to be replaced with the fragment below and then the new configuration should be uploaded via the API.

  "enrichments": [
    {
      "enrichment": "natural_language_understanding",
      "source_field": "text",
      "destination_field": "enriched_text",
      "options": {
        "features": {
          "keywords": {},
          "entities": {
            "model": "a3398f8b-2282-4fdc-b062-227a162dc0eb"
          },
          "sentiment": {},
          "emotion": {},
          "categories": {},
          "relations": {},
          "concepts": {},
          "semantic_roles": {}
        }
      }
    },
    {
      "enrichment": "natural_language_understanding",
      "source_field": "text",
      "destination_field": "wds_enriched_text",
      "options": {
        "features": {
          "entities": {}
        }
      }
    }
  ]

What this configuration will produce is two different enrichment fields containing the entities detected by WDS and WKS. However, it is likely that you want to have all of the detected entities available in a single field. Luckily this is possible by configuring the collection to merge the two fields during the "Normalize" phase.
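If you would rather script the configuration change than edit the JSON by hand, a minimal Python sketch follows. It assumes you pass in the object that directly holds the "enrichments" list from your exported configuration (in a real export that list may be nested inside another object), and the function name is made up. Downloading and uploading the configuration through the REST API is not shown.

```python
import copy
import json

def add_generic_entities(config, source_field="text",
                         destination_field="wds_enriched_text"):
    """Return a copy of config with an extra enrichment that uses the generic entity model."""
    updated = copy.deepcopy(config)
    updated["enrichments"].append({
        "enrichment": "natural_language_understanding",
        "source_field": source_field,
        "destination_field": destination_field,
        # An empty entities object means no WKS model is named for this enrichment
        "options": {"features": {"entities": {}}},
    })
    return updated

# Cut-down version of the exported fragment shown above
config = {
    "enrichments": [
        {
            "enrichment": "natural_language_understanding",
            "source_field": "text",
            "destination_field": "enriched_text",
            "options": {
                "features": {
                    "entities": {"model": "a3398f8b-2282-4fdc-b062-227a162dc0eb"}
                }
            },
        }
    ]
}

new_config = add_generic_entities(config)
print(json.dumps(new_config, indent=2))
```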

Wednesday, 20 September 2017

Adding a speech interface to the Watson Conversation Service

The IBM Watson Conversation Service does a great job of providing an interface that closely resembles a conversation with a real human being. However, with the advent of products like the Amazon Echo, Microsoft Cortana and the Google Home, people increasingly prefer to interact with services by speaking rather than typing. Luckily IBM Watson also has Text to Speech and Speech to Text services. In this post we show how to hook these services together to provide a unified speech interface to Watson's capabilities.

In this blog we will build upon the existing SpeechToSpeech sample which takes text spoken in one language and then leverages Watson's machine translation service to speak it back to you in another language. You can try the application described here on Bluemix or access the code on GitHub to see how you can customise the code and/or deploy on your own server.

This application has only one page and it is quite simple from the user's point of view.
  • At the top there is some header text introducing the sample and telling users how to use it. 
  • The sample uses some browser audio interfaces that are only available in recent browser versions. If we detect that these features are not present we put up a message telling the user that they need to choose a more modern browser. Hopefully you won't ever see this message.
  • In the original sample there are two drop down selection boxes which allow you to specify the source and target language. We removed these drop downs since they are not relevant to our modified use case.
  • The next block of the UI gives the user a number of different ways to enter speech samples:
    • There is a button which allows you to start capturing audio directly from the microphone. Whatever you say will be buffered and then passed directly to the transcription service. While capturing audio, the button changes colour to red and the icon changes; this is a visual indication that recording is in progress. When you are finished talking, click the button again to stop audio capture.
    • If you are working in a noisy environment or if you don't have a good quality microphone, it might be difficult for you to speak clearly to Watson. To help solve this problem we have provided you with some sample files hosted in the web app. Click on one of the buttons to play the associated file and use it as input.
    • If you have your own recording, you can click on the "Select File" button and select the file containing the audio input that you want to send to the speech-to-text service.
    • Last, but not least, you can drag and drop an audio file onto the page to have it instantly uploaded.
  • The transcribed text is displayed in an input box (so you can see if Watson is hearing properly) and sent to either the translation service (in the original version) or the conversation service in our updated version. If there is a problem with the way your voice is being transcribed, see this previous article on how to improve it.
  • When we get a response from the conversation or translation service we place the received text in an output text box and we also call the text-to-speech service to read out the response and save you the bother of having to read.
I know that you want to understand what is going on under the covers so here is a brief overview:
  • The app.js file is the core of the web application. It implements the connections between the front end code that runs in the browser and the various Watson services. This involves establishing 3 back-end REST services. This indirection is needed because you don't want to include your service credentials in the code sent to the browser and because your browser's cross site script protections will prohibit you from making a direct call to the Watson service from your browser. The services are
    • /message - this REST service implements the interface to the Watson Conversation service. Every time we have a text utterance transcribed, we do a POST on this URL with a JSON payload like {"context":{...},"input":{"text":"<transcribed_text>"}}. The first time we call the service we specify an empty context {} and in each subsequent call we supply the context object that the server sent back to us the last time. This allows the server to keep track of the state of the conversation.
      Most conversation flows are programmed to give a trite greeting in response to the first message. To avoid spending time on this, the client code sends an initial blank message when the page loads to get this out of the way.
    • /synthesize - this REST service is used to convert the response into audio. All that this service does is convert a GET on http://localhost:3000/synthesize?voice=en-US_MichaelVoice&text=Some%20response into a GET on the corresponding Watson text-to-speech URL. This will return a .wav file with the text "some response" being spoken in US English by the voice "Michael".
    • /token - the speech to text transcription is an exception to the normal rule that your browser shouldn't connect directly to the Watson service. For performance reasons we chose to use the websocket interface to the speech to text service. At page load time, the browser will do a GET on this /token REST service and it will respond with a token code that can then be included in the URL used to open the websocket. After this, all sound information captured from the microphone (or read from a sample file) is sent via the websocket directly from the browser to the Watson speech to text service.
  • The index.html file is the UI that the user sees. 
    • As well as defining the main UI elements which appear on the page, it also includes main.js which is the client side code that handles all interaction in your browser.
    • It also includes the JQuery and Bootstrap modules. But I won't cover these in detail.
  • You might want to have a closer look at the client side code which is contained in a file public/js/main.js:
    • The first 260 lines of code are concerned with how to capture audio from the client's microphone (if the user allows it - there are tight controls on when/if browser applications are allowed to capture audio). Some of the complexity of this code is due to the different ways that different browsers deal with audio. Hopefully it will become easier in the future. 
    • Regardless of what quality audio your computer is capable of capturing, we downsample it to 16-bit mono at 16 kHz because this is what the speech recognition is expecting.
    • Next we declare which language model we want to use for speech recognition. We have hardcoded this to a model named "en-GB_BroadbandModel" which is a model tuned to work with high fidelity captures of speakers of UK English (sadly there is no language model available for Irish English). However, we have left in a few other language models commented out to make it easy for you if you want to change to another language. Consult the Watson documentation for a full list of language models available.
    • The handleFileUpload function deals with file uploads, whether they happen as a result of explicitly clicking on the "Select File" button or as a result of a drag-and-drop event.
    • The initSocket function manages the interface to the websocket that we use to communicate to/from the speech_to_text service. It declares that the showResult function should be called when a response is received. Since it is not always clear when a speaker is finished talking, the speech-to-text service can return several times. As a result, the msg.results[0].final variable is used to determine if the current transcription is final. If it is an intermediate result, we just update the resultsText field with what we heard. If it is the final result, the msg.results[0].alternatives[0].transcript variable is used as the most likely transcription of what the user said and it is passed on to the converse function.
    • The converse function handles sending the detected text to the Watson Conversation Service (WCS) via the /message REST interface which was described above. When the service gives a response to the question, we pass it to the text-to-speech service via the TTS function and we write it on the response textarea so it can be read as well as listened to.
  • In addition there are many other files which control the look and feel of the web page, but won't be described in detail here e.g. 
    • Style sheets in the /public/css directory
    • Audio sample files in the /public/audio directory
    • Images in the /public/images directory
    • etc.
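The flow described above (interim versus final transcriptions, and handing the context object back on every call to /message) can be summarised in a short sketch. The application itself is written in JavaScript; this Python version is only an illustration of the logic. The handle_result and fake_server names, the state dictionary and the shape of the fake response are all assumptions, with fake_server standing in for a real POST to /message.

```python
def handle_result(msg, state, post_message):
    """Sketch of the showResult/converse flow.

    msg          -- speech-to-text result with results[0].final and
                    results[0].alternatives[0].transcript
    state        -- dict holding the displayed text and the conversation context
    post_message -- hypothetical callable standing in for a POST to /message
    """
    result = msg["results"][0]
    transcript = result["alternatives"][0]["transcript"]
    state["results_text"] = transcript       # always show what was heard so far
    if not result["final"]:
        return None                          # interim result: wait for more audio
    payload = {"context": state["context"], "input": {"text": transcript}}
    response = post_message(payload)
    state["context"] = response["context"]   # send this back on the next call
    return response

def fake_server(payload):
    """Stand-in for the conversation service: counts turns in the context."""
    turn = payload["context"].get("turn", 0) + 1
    return {"context": {"turn": turn}, "output": {"text": [f"reply {turn}"]}}

state = {"context": {}, "results_text": ""}   # the first call uses an empty context
interim = {"results": [{"final": False,
                        "alternatives": [{"transcript": "can I check"}]}]}
final = {"results": [{"final": True,
                      "alternatives": [{"transcript": "can I check my booking"}]}]}

first = handle_result(interim, state, fake_server)
second = handle_result(final, state, fake_server)
print(first, second["output"]["text"], state["context"])
```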
Anyone with a knowledge of how web applications work should be able to figure out how it works. If you have any trouble, post your question as a comment on this blog.
At the time of writing, there is an instance of this application running on Bluemix, so you can see it running even if you are having trouble with your local deployment. However, I can't guarantee that this instance will stay running due to limits on my personal Bluemix account.