Wednesday, 11 September 2019

Convert Watson csv's to RASA yml

Watson allows you to export the ground truth (questions and intents), but RASA expects the data in a slightly different format.
The Watson export is in the form
Question, Intent
Can I buy a sandwich?, #buy_sandwich

RASA expects the form
## intent:buy_sandwich
- Can I buy a sandwich?

This is yml format but RASA calls it .md. The code for converting from the Watson format to the RASA format is below:

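import pandas as pd

# filePath is the exported Watson CSV and outPath is the RASA file to write; set both yourself.
# Assumes no header row and the column order shown above (use pd.read_excel for a spreadsheet).
questions = pd.read_csv(filePath, names=['Question', 'Intent'], skipinitialspace=True)
with open(outPath, "w", encoding="utf-8") as file:
    for x, z in questions.groupby('Intent'):
        # print the intention name in rasa way (without Watson's leading '#')
        file.write("## intent:" + str(x).lstrip('#') + "\n")
        # then print every question in the dataset with that label
        for question in z['Question']:
            file.write("- " + str(question) + "\n")
        # blank line between intents
        file.write("\n")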


Monday, 12 August 2019

Migrating your Watson Assistant workspace to WebHooks

In the last post we examined the differences between webhooks and the old way (sometimes called web actions) in which IBM Watson Assistant called REST functions through IBM Cloud Functions. In this post we will look at a simple example of migrating from one to the other.

There is a tutorial on DeveloperWorks which guides you through all of the steps to connect your Watson Assistant skill to the Wikipedia API using the old style mechanism. In this blog post we will assume that you have already gone through the steps in the original tutorial and we will describe how you can convert your workspace to use the newly released webhooks feature.

The first step in the migration is to make sure that the Cloud Function you created to look up Wikipedia can be called externally.

If the function definition is associated with a resource group, the security model will make it hard to call, so you need to ensure that your function definition is associated with a Cloud Foundry space. You can tell whether your selected namespace is IAM based or Cloud Foundry based because the drop-down selector at the top of the page will say (CF-Based) for a Cloud Foundry namespace.

If your function is defined in an IAM resource group, the easiest way to move it is to create a new function (with the exact same code) in a Cloud Foundry space, i.e. switch to the new space with the drop-down and then create the function as described in the original tutorial.

After you have saved the Action you should try it out by changing the object_of_interest to different things you might be interested in looking up and then see what the function returns.

Since WebHooks doesn't interact directly with Cloud Functions as such, you will need to ensure that your action is turned into a WebAction which can be called by any REST client. To do this, click on the EndPoints link in the left margin. This will give you the option to make your action invokable by a REST URL.

Once you do this, you will see a curl command which can be used to invoke the web action. Initially the screen only shows API-KEY rather than the actual API key assigned to you. Click on the eye icon on the right to display the complete curl command.

If you have curl installed you can copy this command and execute it in your command line window. However, you will get an error because you haven't given any input parameters. To solve this, add more command line options to specify the object of interest and the fact that the data you are supplying is in JSON format, e.g.:
curl -u 3a686c56-12fc-4bd9-8a08-55317fec468d:CE4A88p1qGYV69dF43iVENNn3Ok6DHdtlYz7tlrCh0yG7aNRvUgzcHHBNJxi15z9 --header "Content-Type: application/json"  --data "{\"object_of_interest\": \"love\"}" -X POST
If you prefer using another tool like POSTMAN, you can easily convert this command to suit. The one thing you need to be aware of is the fact that the authorisation you supply with the -u parameter to the curl command consists of two parts - the part before the colon is effectively a username and the part after the colon is the password.
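If you would rather script the call than use curl or Postman, the same request can be sketched in Python using only the standard library. This is a minimal sketch: the URL, username and password below are placeholders (the real values are the ones shown on your own Endpoints page), and build_web_action_request is a hypothetical helper name.

```python
import base64
import json
import urllib.request


def build_web_action_request(url, username, password, object_of_interest):
    """Build (but do not send) the POST request that the curl command makes."""
    # curl's -u username:password becomes an HTTP Basic Authorization header
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    # curl's --data "{...}" becomes a JSON request body
    body = json.dumps({"object_of_interest": object_of_interest}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": "Basic " + token,
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values: substitute the URL and key from your own Endpoints page.
    req = build_web_action_request(
        "https://example.invalid/your-web-action", "my-username", "my-password", "love"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the headers and body before any network traffic happens.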

Once you have verified that your WebAction is working correctly, you next need to change the Watson Assistant skill to use webhooks when calling Wikipedia. You do this by clicking on the options tab when editing the dialog, selecting the Webhooks option on the left (it should be selected by default), and then entering details of the URL you want to call, what credentials to use and any other headers you want to pass to the function.

You can use the URL and credentials from the curl command that you got from the web actions page described above. You might be slightly worried that there is a single URL assigned to a skill, because in some cases you might need to access services from different sites. There are ways of getting around this limitation, but I won't describe them here since our use case doesn't need to connect to multiple services. The next blog post in this series will describe in detail how you can connect to multiple REST services from a single WA workspace.

The Dialog node which implements the interface to Wikipedia through Webhooks will be significantly different from the old one, therefore I suggest that you either delete or disable the old node. For example you could rename the node to Old Wikipedia and disable it by changing the match condition to false as illustrated below.

Now you have to create a new Dialog node for calling the webhook. You should call the node Wikipedia or something similar and make sure it is activated whenever the #tell_me_about intent is detected. Next click on the 'customize' icon and this will give you an option to turn on Webhooks for this node.

As soon as you close the customize dialog you will see additional UI elements which let you specify which parameters you would like to pass to the REST call and what context variable you would like to use to store the response.

You will also see that the node has been converted into a multi condition response node and that it pre-configures two output slots: one for what to say when your REST call succeeds (i.e. when the context variable has been set) and one for when the variable isn't set (which probably indicates a network error or something similar).

You can use the same responses as in the original tutorial since the format of the response won't have changed. You can now test your application and see that it behaves more or less as before.

There are two things that you should note about the way that Watson Assistant Webhooks work:

  1. We specified that you add a parameter named object_of_interest and set its value to the contents of the @object_of_interest entity (let's assume that you asked "what is love" so the value will be "love").

    Normally when people say that they are adding parameters to a POST call they mean that they are adding a header with the value "object_of_interest: love". However, this is not what Watson Assistant does. Instead it sends a JSON body with each of the parameter values e.g. {"object_of_interest": "love"}.

    This is actually a better thing to do, but make sure you don't get confused by the terminology in the documentation.
  2. Watson Assistant tells you that it stores the response from the REST call in the context variable you specify, but this is not exactly what it does. While the Webhooks functionality is not totally tied to the Cloud Functions, it does make certain assumptions based upon the way Cloud Functions operate.

    The response from a call to a Cloud Function will contain lots of information about the call other than just the response from the REST service called. It looks something like:
{
   "activationId": "xxx",
   "response": {
      "result": { ...},
      "status": "success",
      "success": true
   }
}

When your IBM Cloud Function returns, the data returned by the REST service is contained in the response.result part of the JSON structure returned. If you are not using Cloud Functions, make sure you follow this convention because Watson Assistant will be expecting it. Similarly, you should also set the response.success variable to the value true because otherwise Watson Assistant will assume that the call has failed.
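If your webhook is not a Cloud Function, the convention described above can be sketched in Python as follows. This is only an illustration of the envelope shape: wrap_result and extract_result are hypothetical helper names, and extract_result mimics the check described in this post rather than Watson Assistant's actual implementation.

```python
def wrap_result(result):
    """Wrap a REST service's payload in the envelope a Cloud Function would return."""
    return {
        "response": {
            "result": result,      # the data from the REST service goes here
            "status": "success",
            "success": True,       # without this the call is treated as failed
        }
    }


def extract_result(envelope):
    """Return response.result when response.success is true, otherwise None."""
    response = envelope.get("response", {})
    if response.get("success") is True:
        return response.get("result")
    return None


if __name__ == "__main__":
    envelope = wrap_result({"extract": "Love encompasses a range of emotional states."})
    print(extract_result(envelope))
```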

Thursday, 8 August 2019

What is the benefit of the new webhooks feature in Watson Assistant

When building an AI chatbot it is impossible to incorporate all knowledge directly in your skill. As a result developers often find themselves wanting to call external functions to answer certain queries. IBM has responded to this requirement by supporting the calling of cloud functions from within a Dialog node in Watson Assistant.

While developers have found this useful, they have also complained that it is inflexible and not so easy to use. To answer these complaints, IBM has recently released a new feature called webhooks. This feature was available in limited Beta for several months, but has just been released generally so now is a good time to look at it.

This table summarises the differences between the two mechanisms:

Aspect | Webhooks | Old way
URL Flexibility | With webhooks you can call any arbitrary URL. This means that you are not tied to using IBM Cloud Functions as an intermediate layer. Of course Watson Assistant will always make a POST call to your URL and supply the parameters in JSON. If the REST API you want to call does not accept this, you will need some mechanism to transform the call. However, you are free to choose any transformation tool that you want. | With the old mechanism, you could only call a Cloud Function which is defined in the same environment as the Watson Assistant instance which is doing the calling. This was quite restrictive and although it worked OK with Cloud Foundry based authentication, it was not really compatible with the new IAM style resource group authentication currently used in the IBM cloud.
UI Assistance | There is a UI to guide you in defining the authorisation and other parameters for your call to a webhook. This makes it quite user friendly. | The way that you specified that a REST call should be made was by editing the JSON response from a node to include an action parameter. Apart from the documentation there was no assistance to help the developer define this correctly.

Now that we have compared the two mechanisms, our next blog post will look at a simple example of migrating from one to the other.

Wednesday, 3 July 2019

Watson is starting to sound much more natural

IBM has recently implemented a very significant change in the technology that they use for speech synthesis.

To simplify, the traditional technology involved splitting up the training audio into chunks of roughly half a phoneme; when given a snippet of speech to synthesise, it picks the most suitable chunks to use and combines them. Sometimes it will be lucky and it will find a large part of the desired speech already in the training corpus and in this case it can generate a very realistic output (because it is essentially replaying a recorded sample). However, more often Watson will need to combine chunks from different utterances in the training data. While there are techniques to try and seamlessly fuse the different chunks together, users frequently complain that they can hear a choppiness and the voice sounds more robotic than human.

The newly released technology generates the synthesised speech from scratch rather than leveraging recorded chunks of speech. It makes use of three different Deep Neural Networks (DNNs) that look after prosody, acoustic features and voice signal creation. The result is a much more natural sounding voice. Another advantage is that it is much easier to adapt the engine to a new voice because the amount of speech we require from the actor is much less (since we don't need a large corpus to pick samples from).

You can read an academic description of the research here and a more end user based description here.

Most users agree that this new technology sounds much better. You can try it out for yourself here at the normal Watson TTS demo page. When you select a voice to use, the ones with this new technology are identified by having 'dnn technology' written after their name. I am sure that you will agree that these sound better than the traditional voices (which are still available).

Tuesday, 12 February 2019

Matching Only on the Number of Digits you Want

Frequently in Watson you will have two entities that both involve numbers. Say a birth month is two digits long and a birth year four. A problem can arise where the shorter number is found in the longer number because there are two digits inside the four digits.

Month Entity

Year Entity

But when Year is given Month is found.

A way around this is to wrap the pattern in the \b word boundary regular expression, which says that I only want the digits when they stand on their own rather than inside a longer number.

And now it works. The \b is not included in the captured entity, just the number, which is handy.
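The effect of \b can be reproduced outside Watson with a few lines of Python. This is only an illustration: the patterns below are assumed stand-ins for the Month and Year entity definitions, and Python's re module is not necessarily the same regular expression engine that Watson uses.

```python
import re

# Assumed stand-ins for the entity patterns (not copied from the workspace).
MONTH_LOOSE = re.compile(r"\d{2}")       # two digits anywhere
MONTH_STRICT = re.compile(r"\b\d{2}\b")  # two digits standing on their own
YEAR_STRICT = re.compile(r"\b\d{4}\b")   # four digits standing on their own

text = "born 07 1985"

print(MONTH_LOOSE.findall(text))   # ['07', '19', '85'] - the year is wrongly matched too
print(MONTH_STRICT.findall(text))  # ['07']
print(YEAR_STRICT.findall(text))   # ['1985']
```

Because \b is a zero-width assertion, the matched text contains only the digits, which mirrors the behaviour of the captured entity described above.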

I have tested this in Chinese, where they do not use spaces, and it also works, which is great.

Wednesday, 28 November 2018

Matching patterns and getting their values in Watson Assistant/Conversation

When IBM Watson Assistant (formerly known as Watson Conversation) is deciding how to respond to a user's utterance it is vital that it correctly identifies the intent (what the user wants to do) and the entities (what are the things involved in the intent). For example, if the user says "I want to buy a book" - the intent would be #MakePurchase and the entity @ItemOfInterest would have a value of "book".

In earlier releases of Watson Assistant, the only way to specify possible entity values was either by manually specifying a list of possible values or else by selecting one of the predefined system entities such as @sys-date. Sometimes this works quite well, but other times (e.g. when you are expecting an email address or an account number) it is not feasible to list all of the possible values that people might enter.

Luckily, the latest version of the Watson Assistant service allows you to specify allowable entity values with a regular expression. Unfortunately, people sometimes find it hard to retrieve the matched value from a pattern match. If you are not careful you will be told that an email address was specified and not what exact email address was given. Therefore this blog post works through a very simple conversational design to explain what you need to do.

First off, you define an intent. We will call our intent #sendMessage and we give Watson a few examples of what the user might say when they want to send a message.

Then we create a @contact_info entity which we expect users to specify when they are sending a message. To complete this entity, the user types a message indicating that they want to send a message. We expect that the message will also contain details of where to send the message, either an email address or a phone number (the phone number can be specified in US style or in the e164 standard common in other parts of the world).

This picture shows how the entity definition will look. Don't worry if you can't read the regular expressions in the screenshots, you can download the workspace design.

Now you need to insert a dialog node to handle requests to send messages. We create a node in our dialog flow which is triggered when Watson detects that the user's intention is to send a message. We know that it is necessary to have contact information to send a message, so if the user didn't supply this we will prompt them.

Then we need sub-nodes which deal with sending either emails or phone messages. We select which to activate depending on the value of the @contact_info variable, which will be either email, us_phone_num or e164_phone_num.

When sending a message, it is not enough to know that the user gave us an email address - we need to know the exact email address given. To do that, we need to define a variable whose value will be specified as "<? @contact_info.literal ?>". The screen shot below shows the dialog node for sending a phone message.
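The difference between knowing which pattern matched and knowing the literal text that matched can be sketched in Python. The regular expressions here are simplified assumptions made up for illustration (the real ones are in the downloadable workspace), and find_contact_info is a hypothetical helper, not part of Watson Assistant.

```python
import re

# Simplified, assumed patterns keyed by the entity value names used in this post.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "us_phone_num": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "e164_phone_num": re.compile(r"\+[1-9]\d{6,14}\b"),
}


def find_contact_info(utterance):
    """Return which pattern fired ("value") and the exact text matched ("literal")."""
    for value, pattern in PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            return {"value": value, "literal": match.group(0)}
    return None


if __name__ == "__main__":
    for utterance in (
        "send a message to joe@example.com",
        "text 555-123-4567 please",
        "message +14155550100",
        "send a message",
    ):
        print(utterance, "->", find_contact_info(utterance))
```

The "value" field corresponds to what @contact_info gives you on its own, while "literal" corresponds to what @contact_info.literal retrieves.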

This is the end of our very simple BOT. If you want to see this in action, download the design file here and import it into your own Watson Assistant instance. Here is a screenshot of what I saw when I clicked on the "Try it out" button to see the bot in action.

In summary, regular expression entities can be really useful, so long as you remember to use the @entity_name.literal syntax to get the actual content that was matched rather than simply which rule was fired.

Friday, 14 September 2018

Connecting IBM Watson Speech services to the public phone system

Many use cases for IBM Watson speech services involve connecting phone calls. This can be tricky so I decided that it might be useful to publish a sample which shows such a connection in action. This application is very simple: it uses Speech to Text (STT) to understand what the caller says and, when the caller pauses, it uses Text to Speech (TTS) to read it back to them. This simple application can easily be used as a starting point for building a more complex application which does more with the received speech.

Flow Diagram

I chose to use the NEXMO service because it is the easiest way to connect a phone call to a websocket. You can visit their documentation site if you want to learn details of how this works. The short summary is that it acts as a broker between the phone system and a web application of your choice. You need to provide two URLs that define the interface. Firstly, nexmo will do a GET on the '/answer' URL every time a call comes in to the number you configure - the way your application handles this request is the key part of the application experience. Secondly, nexmo will do a POST to your '/events' URL any time anything happens on your number (e.g. a call comes in or the person hangs up). In our case we don't do anything interesting with these except write them to the log for debugging purposes.
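To make the '/answer' side more concrete, here is a rough Python sketch of the kind of JSON reply that asks nexmo to connect the call to a websocket. The sample application itself is written in node.js, so this is not its code; the field names follow nexmo's call control object format as best understood here and should be checked against the nexmo documentation, and the websocket URI and audio content type are assumptions.

```python
import json


def answer_ncco(websocket_uri):
    """Build the list of actions that the /answer URL returns to nexmo."""
    return [
        {
            "action": "connect",
            "endpoint": [
                {
                    "type": "websocket",
                    "uri": websocket_uri,
                    # assumed audio format: 16-bit linear PCM at 16 kHz
                    "content-type": "audio/l16;rate=16000",
                }
            ],
        }
    ]


if __name__ == "__main__":
    # Placeholder URI: use the websocket address of your own deployed application.
    print(json.dumps(answer_ncco("wss://example.invalid/socket"), indent=2))
```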

To get this working for yourself, the first thing you need to do is deploy my sample application. You can get the code from this GIT repository. Before you deploy it to IBM Bluemix, you need to edit the manifest.yml file and choose a unique URL for your instance of the application. You also need to create instances of the IBM Watson STT and TTS services and bind them to your application.

Next you need to configure nexmo to connect to your application. You need to login to the nexmo website and then click on the 'Voice' menu on the left side and then 'Create Application'. This pops up a form where you can enter details of the /events and /answer URLs for the web application you just deployed. After you fill in this form, you will get a nexmo application id.

Unfortunately connecting to the phone system costs money. Nexmo charges different amounts of money for numbers depending upon what country they are associated with. In my case I bought the number +35315134721 which is a local number in Dublin, Ireland. This costs me €2.50 per month so I might not leave it live too long or maybe swap for a US based number at a reduced cost of US$0.67 per month.

Once you get out your credit card and buy a number, you must tell nexmo which application id you want to associate with the number. Visit the 'your numbers' page and enter the details (like you see below).

Having done this, you can now ring the number and see it in action. When we receive a call, we open a websocket interface to the STT service and start echoing all audio received from the phone line to the STT service. Although the TTS service supports a websocket interface, we don't use it: because the chunks of audio data from the TTS service won't necessarily be returned evenly spaced, the nexmo service would produce crackly audio output. Instead we use the REST interface and write the returned audio into a temporary file before streaming it back into the phone call at a smooth rate.

The bulk of the code is in a file named index.js and it is fairly well explained in the comments, but here are a few more explanatory notes:

  • The first 70 lines or so are boilerplate code which should be familiar to anyone who has experience of deploying node.js applications to BlueMix. First we import the required libraries that we use and then we try and figure out the details of the Watson service instances that we are using. If running on the cloud, this will be parsed from the environment variables. However, if you want to run it locally, you will need a file named vcap-local.json that contains the same information. I have included a file named vcap-local-sample.json in the repository to show you the required structure of the file.
  • Next comes a function named tts_stream which acts as an interface to the TTS service. It takes two parameters, the text to synthesise and the socket on which to play the result. We use the REST interface instead of opening a websocket to the TTS service (like we do with the STT service). The reason for this choice is that the websocket interface results in crackly audio, because the audio chunks coming back from the TTS service are not evenly spaced. The way it works is that it saves the audio to a temporary file and then pipes the file smoothly into the nexmo socket before deleting the temporary file. This approach introduces a slight delay because we need to wait for the entire response to be synthesised before we start playing. However, the problem is not as bad as you might think because a 15 second audio response might get sent back in under a second.
  • Next come the two functions which respond to the /events and /answer URLs. As mentioned earlier, the /events handler is very simple because it just echoes the POST data into the log. The /answer function is surprisingly simple also. Firstly it creates a websocket and then it sends a specially formatted message back to nexmo to tell it you want to connect the incoming phone call into the new websocket.
  • The real meat of the code is in the on connect method which we associate with the websocket that we created.
    • The first thing we do is stream audio from a file named greeting.wav which explains to the user what to do. While this message is helpful to the user, it also gives the application some breathing room because it might take some time to initialise the various services and the greeting will stop the user talking before we are ready.
    • Next we create a websocket stt_ws which is connected to the Watson STT service. 
      • As soon as the connection is established, we send a special JSON message to the service to let it know what type of audio we will send and what features we want to enable.
      • When the connection to STT is started, a special first message is sent back saying that it is ready to receive audio. We use a boolean variable stt_connected to record whether or not this message is received. This is because attempting to send audio data to the websocket before it is ready will cause errors.
      • When starting the STT service, we specify that we would like to receive interim results i.e. when it is transcribing some audio and it thinks it knows what was said, but it does not yet consider the results to be final (because it might change its mind when it hears what comes next). We do this because we want to speed up responses, but we don't want to echo back a transcription which might later be revised. For this reason we check the value of the final variable in the returned JSON and only call the tts_stream function when the results are final.
    • For the nexmo websocket, we simply say that all input received should automatically be echoed to the STT websocket (once we have received confirmation that the STT service link has been initialised properly).
    • When the nexmo websocket closes we also try to close the STT web socket.
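The STT handling described in the list above can be sketched in Python. The real application is node.js, so this is an illustration of the logic rather than the code in index.js: the SttSession class and its callbacks are hypothetical, and the audio content type in the start message is an assumption.

```python
import json

# Start message sent as soon as the STT websocket opens.
# The content type assumes 16 kHz, 16-bit linear PCM audio from the phone line.
START_MESSAGE = json.dumps({
    "action": "start",
    "content-type": "audio/l16;rate=16000",
    "interim_results": True,
})


class SttSession:
    def __init__(self, speak):
        self.stt_connected = False  # set once STT says it is ready for audio
        self.speak = speak          # callback standing in for tts_stream

    def on_stt_message(self, raw):
        """Handle one JSON text message received from the STT websocket."""
        message = json.loads(raw)
        if message.get("state") == "listening":
            self.stt_connected = True
            return
        for result in message.get("results", []):
            # Interim results may still be revised, so only speak final ones.
            if result.get("final"):
                self.speak(result["alternatives"][0]["transcript"])

    def on_phone_audio(self, chunk, send_to_stt):
        """Forward phone audio to STT only after the ready message has arrived."""
        if self.stt_connected:
            send_to_stt(chunk)
            return True
        return False
```

Keeping the readiness flag and the final-result check in one small object also hints at how several calls could be handled at once, with one session object per call.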
To give credit, I should point out that my starting point was this sample from nexmo. I should also point out that the code is currently able to deal with only one call at a time. It should be possible to solve this problem, but I will leave this as a learning exercise for some reader of the blog.