Thursday, 5 April 2018

Naming Intents

How should you name intents? Here's one way and an explanation of why.

In this post we described clustering a topic into intents. The naming scheme I used was TopicIntent.

When you work on improving accuracy you will merge and split intents, and you rarely do this across Topics. I find that having the topic name in the intent name while you make these changes makes it easier to keep your brain in one context.

Cluster Topics

"Happy families are all alike; every unhappy family is unhappy in its own way." the Anna Karenina principle

Some Topics cover loads, but you don't really care about the individual intents inside. For example, a Complaints topic could cover all sorts of things people moan about.

No one wants a message back saying "This robot cares we have lost your bags". A complaint question will have to be passed on to a person. If we can tell that person that we have a complaint, they can then decide what to do next. If you do not break the complaint topic down into intents, though, all sorts of questions will end up in one intent. It will deal with damage, delays, queues, lost items, dirty conditions etc. This giant varied intent will suck in other questions, damaging your overall system accuracy.

Complaints are a varied topic that your chatbot cannot handle by itself. One giant intent would damage your overall accuracy, but because complaints tend to be about a few things at once, 'The food was terrible and the portions were small', there is often not one solid intent anyway. By labelling all complaints with the TopicIntent pattern, ComplaintIntent, it is possible to ignore the intent part, as getting the topic right is good enough.

In our accuracy tests we can strip the intent part off and say that if we land in Complaint that is good enough. That way we do not create one giant intent that covers too much and sucks in all other questions.

This big-topic issue particularly happens with Off Topic topics, where questions are out of scope, silly or just cover large areas that you can't really answer.
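Here is a minimal sketch of what stripping the intent part off could look like in an accuracy test. It assumes labels are written in CamelCase with the topic as the first word; the helper names `topic_of` and `accuracy`, and the labels ComplaintDelay and ComplaintLostItem, are made up for illustration.

```python
import re


def topic_of(label):
    """Return the leading word of a TopicIntent label, e.g. 'Complaint'."""
    match = re.match(r"[A-Za-z][a-z0-9]*", label)
    return match.group(0) if match else label


def accuracy(expected, predicted, topic_only=()):
    """Share of correct predictions.

    Labels whose topic is listed in topic_only count as correct
    when the topic matches, whatever the intent part says.
    """
    hits = 0
    for want, got in zip(expected, predicted):
        if topic_of(want) in topic_only:
            hits += topic_of(want) == topic_of(got)
        else:
            hits += want == got
    return hits / len(expected)


# Hypothetical test results: what we expected and what the classifier returned.
expected = ["ComplaintDelay", "ComplaintLostItem", "BookCheck", "BookPay"]
predicted = ["ComplaintLostItem", "ComplaintLostItem", "BookCheck", "BookCheck"]

print(accuracy(expected, predicted))                            # strict scoring
print(accuracy(expected, predicted, topic_only={"Complaint"}))  # topic is enough for complaints
```

The strict score punishes a complaint that landed in the wrong complaint intent. The second score does not, while still holding the Booking intents to an exact match.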

There are other ways to label intents. This TopicIntent method is what I use. If you have a different way please mention it in the comments.

Wednesday, 4 April 2018

Clustering Questions Part 2: Intentions

Once you have divided your questions into Topics, the next step is to divide them into Intents. This is how I would find the intents inside a topic.

An Intent is a purpose or goal expressed by a user’s input such as finding contact information or booking a trip.

Imagine you had an airline booking chatbot, and you had these questions in the Topic Booking.

There is a dataset of travel questions here. I will take some questions from there and invent some myself.

A booking topic could have:

Question | Intent
I'd like to book a trip to Atlantis from Caprica on May 13 | BookTicket
I'd like to book a trip from Chicago to San Diego between Aug 26th and Sept 5th | BookTicket
i wanna go to Kobe whats available? | BookTicket
Can I get information for a trip from Toluca to Paris on August 25th? | BookTicket
I'd like to book a trip to Tel Aviv from Tijuana. I was wondering if there are any packages from August 23rd to 26th | BookTicket
I want to know how far in advance I can book a flight | BookFuture
When do bookings open for 6 months time | BookFuture
I want to get a ticket for my christmas flight home | BookFuture
Can I check my booking? | BookCheck
can i check my booking status | BookCheck
can i check the status of my booking | BookCheck
how do i check the status of my booking | BookCheck
i need to check my booking status | BookCheck
Let me know the status of my Booking | BookCheck
Can I book now pay later | BookPay
How can I pay for a booking? | BookPay

In this list the verbs Check and Pay each seem to form an intent. There is one intent about when bookings can happen. And there are a few unknowns that might make more sense when we have more questions later.

At this stage realise you are going to make mistakes and will have to go back over your intents as you learn by doing. I will come back to fixing your intents once you have had a first cut at defining them.

One good way to find intents in a topic is to look for verbs. Unrelated actions tend to have different verbs. In this case the topic Booking is itself both a verb and a noun. This is common enough. Dual meanings like this can be a nightmare with Entities but that is another blogpost.

In this topic something like 'cancel a booking' is likely to be an intention. Here cancel is the verb and booking is the object of the sentence.

Other clues to the intention are the Lexical Answer Type (LAT), the subject and the object. The LAT is the type of answer the question asks for. Who questions have different types of answers to When questions. In practice I don't find you commonly use the LAT to define intentions.

One possible exception to this is definitional questions, where users ask "What is a..." for a domain term to be explained. If more than 5% of your questions are definitional, you may not have collected representative questions. Questions manufactured by non-real users, or by real users forced to ask questions, tend to be definitional. When someone runs out of real questions they will ask 'What is a booking'.
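The 5% check is easy to run over a question list. This is a rough sketch that only catches questions starting with "What is a" or "What is an"; the pattern and the function name `definitional_share` are my own assumptions, and real definitional questions come in more shapes than this.

```python
import re

# Assumed pattern: a question that opens with "What is a ..." or "What is an ...".
DEFINITIONAL = re.compile(r"^\s*what\s+is\s+an?\s+\w+", re.IGNORECASE)


def definitional_share(questions):
    """Return the fraction of questions that look definitional."""
    if not questions:
        return 0.0
    hits = sum(1 for q in questions if DEFINITIONAL.match(q))
    return hits / len(questions)


questions = [
    "What is a booking",
    "Can I check my booking?",
    "How can I pay for a booking?",
    "i wanna go to Kobe whats available?",
]

share = definitional_share(questions)
if share > 0.05:
    print(f"{share:.0%} definitional: check how these questions were collected")
```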

The Subject of the sentence is also rarely useful. Sometimes who is doing an action changes the answer but usually there is a set scheme to buy, book, cancel etc and who is doing it doesn't matter.

The Object of the sentence is more often useful. Frequently an intention is a combination of the verb and what it is being done to. Whichever one isn't the Topic is usually the intent. Booking might be a topic and various things you do with a booking would be intents.

In summary, go through each topic. If there are verbs shared across questions, they might go together in an intent. But you have to use the domain expert's knowledge of which questions have the same intention; this step cannot be automated.
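A script cannot decide the intents, but it can give the domain expert a first pass to react to. The sketch below groups the questions in one topic by a hand-picked list of candidate verbs. The verb list and the name `group_by_verb` are assumptions for illustration, and plain word matching stands in for a proper part-of-speech tagger.

```python
import re
from collections import defaultdict

# Hypothetical verbs the expert thinks matter inside the Booking topic.
CANDIDATE_VERBS = ["check", "pay", "cancel"]


def group_by_verb(questions, verbs=CANDIDATE_VERBS):
    """Group questions by the single candidate verb they contain.

    Questions with no candidate verb, or more than one, go in 'review'
    so that a person looks at them.
    """
    groups = defaultdict(list)
    for question in questions:
        words = set(re.findall(r"[a-z']+", question.lower()))
        matched = [verb for verb in verbs if verb in words]
        key = matched[0] if len(matched) == 1 else "review"
        groups[key].append(question)
    return dict(groups)


questions = [
    "Can I check my booking?",
    "can i check my booking status",
    "How can I pay for a booking?",
    "Can I book now pay later",
    "I want to know how far in advance I can book a flight",
]

for verb, grouped in group_by_verb(questions).items():
    print(verb, len(grouped))
```

Anything that lands in review still needs a person to read it, and so do the groups themselves, because a shared verb is only a hint that questions share an intention.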

Tuesday, 3 April 2018

Clustering Questions into Topics

IBM Watson used to claim it took 15 minutes to match up a question with an intent. The technique described here halves that time. Context switching is mentally draining and wastes a lot of time. Concentrating on one part of a job until it is done is much more efficient than switching between tasks.

In a similar way once we have our questions collected the next task is to divide them into topics. Then these topics will be looked at individually.

A topic is a category of types of questions people will ask your chatbot.

In an Airline these might be Checkin, Booking, Airmiles

In an Insurance company Renewal, Claim, Coverage

Before looking at the questions, try to think of 5 topics that might occur in customer questions to your business.

How many topics?

Roughly 20. You might have 10 or 30. A rule of thumb borrowed from K Nearest Neighbour classification, where it is used to pick k, is that if you have N documents you expect about Sqrt(N) clusters. This works out as about 45 for 2000 questions. You won't have 2000 questions at this stage, more likely under 1000.

Can you automate discovering topics?

Yes, you can, using a clustering algorithm such as k-means with the number of clusters given above. No, you really should not. You learn a hell of a lot clustering 500 questions. You will have to read all these questions eventually anyway, so you might as well learn this stuff now.
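The square root rule of thumb is quick to check for the sizes mentioned here. This is only arithmetic; the function name `expected_clusters` is mine.

```python
import math


def expected_clusters(n_questions):
    """Rule-of-thumb number of clusters for a given number of questions."""
    return math.sqrt(n_questions)


for n in (500, 1000, 2000):
    print(f"{n} questions -> about {expected_clusters(n):.1f} topics by the rule of thumb")
```

For 2000 questions the square root is a little under 45, and for 500 questions it is a little over 22, which is close to the rough 20 above.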

Process of Marking up topics

Say you have 500 questions in a spreadsheet. What we are trying to do here is mark up a new column 'Topic' that puts each of these questions in a topic.

Go through your 500 questions, looking for the 5 topics you listed earlier. You may find that what you thought was one cluster is actually two, or that a topic you expected is missing. If you are looking for the clothesReturn topic, I would search for the key words 'return' and 'bring back'. I would look for the obvious words in each of the topics I expect.

Once I had marked up the questions in my list of 500 that the obvious keywords showed were clothesReturn, if I found a new question in that topic I would look for the word in it that showed me it was that topic but was not in my original search list. For example:

Can I exchange a jumper I bought yesterday for a new one

I would then search for other uses of 'exchange'. It is a word likely to be used in clothesReturn but one I missed earlier.

If you know the domain, roughly half of the questions will be classified by your obvious keywords.
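If the spreadsheet is awkward to search by hand, the keyword pass can be sketched in a few lines. The keywords for clothesReturn come from the example above; the delivery topic, its keywords and the function names are invented for illustration. Matching is a plain substring search, so it will make mistakes that you catch when you read the questions.

```python
# Topic -> obvious keywords. Only the clothesReturn entries come from the post;
# the delivery topic is a hypothetical second topic.
TOPIC_KEYWORDS = {
    "clothesReturn": ["return", "bring back", "exchange"],
    "delivery": ["deliver", "shipping"],
}


def mark_topics(questions, topic_keywords=TOPIC_KEYWORDS):
    """Return (question, topic) pairs, using 'unknown' when no single topic matches."""
    rows = []
    for question in questions:
        text = question.lower()
        hits = [
            topic
            for topic, keywords in topic_keywords.items()
            if any(keyword in text for keyword in keywords)
        ]
        rows.append((question, hits[0] if len(hits) == 1 else "unknown"))
    return rows


def coverage(rows):
    """Fraction of questions that received a topic from the keyword pass."""
    return sum(1 for _, topic in rows if topic != "unknown") / len(rows)


questions = [
    "Can I exchange a jumper I bought yesterday for a new one",
    "How do I return shoes that do not fit",
    "When will my order be delivered",
    "Do you sell gift cards",
]

rows = mark_topics(questions)
for question, topic in rows:
    print(f"{topic}: {question}")
print(f"classified by keywords: {coverage(rows):.0%}")
```

Questions that match two topics are left as unknown on purpose, so that a person decides rather than the keyword order.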

I would read through the remaining questions with my 5 expected topics in my head. If I see something that is obviously a new topic I add that to the topic list.

Feel free to mark 5-10% of questions as unknown. These might make more sense when you have more questions, or might be part of the long tail, out of scope or off topic questions that your chatbot will not handle.

What Next

Once you have a spreadsheet with a column marked up with the Topic of each question the next step is to find the intent of each question. But now you are reviewing a series of questions in one topic which makes it much easier to concentrate and work in a batch mode.

I will describe this step of marking up intents in a later blogpost.