Tuesday, 30 January 2018

Visualizing Chatbot Quality with Swarm Plot

When you create a chatbot, you frequently want to see where it is going wrong so that you can fix the problems. When you look at the logs or run tests, you get results of the form:

Question, Correct Intent, Returned Intent, Confidence

Can I update my account settings?,Update,Check,0.332

I have a demo dataset you can use to follow along with the code: swarm.csv

Usually the confusion matrix, which records which intents are mixed up with each other, is shown as a heatmap. But an interesting visualisation for this type of data is a swarm plot, drawn with the Python seaborn library. There is a nice guide to the seaborn visualization library here

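# Pandas for managing datasets
import pandas as pd
# Matplotlib for additional customization
from matplotlib import pyplot as plt
# Jupyter notebook magic: render plots inline
%matplotlib inline
# Seaborn for plotting and styling
import seaborn as sns

# Read in the CSV; the first column (the question text) becomes the index
df = pd.read_csv('swarm.csv', index_col=0, encoding='mac_roman')
# Name the remaining columns; check the order against your own CSV
df.columns = ['Intent', 'Expected', 'Confidence']

# Draw the swarm chart: one point per question, coloured by the intent returned
plt.figure(figsize=(10, 6))
swarm_plot = sns.swarmplot(y='Confidence',
                           x='Expected',
                           hue='Intent',
                           data=df)
# Place the legend outside the plot area, to the right
plt.legend(bbox_to_anchor=(1, 1), loc=2, title='Got')
plt.title('Swarm Report')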

The graph shows you which intents are being mixed up and the confidence that your chatbot has in its answers.
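
For a numeric counterpart to the plot, the same rows can be tallied into confusion counts. The following is a minimal sketch using only the standard library; the sample rows and the function name `confusion_counts` are made up for illustration and assume the Question, Correct Intent, Returned Intent, Confidence column order shown earlier.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows: Question, Correct Intent, Returned Intent, Confidence
SAMPLE = """\
Can I update my account settings?,Update,Check,0.332
How do I change my password?,Update,Update,0.911
What is my balance?,Check,Check,0.874
Show me my settings,Check,Update,0.455
Please update my address,Update,Check,0.402
"""


def confusion_counts(lines):
    """Count (correct intent, returned intent) pairs for misclassified rows."""
    counts = Counter()
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        _question, correct, returned, _confidence = row
        if correct != returned:
            counts[(correct, returned)] += 1
    return counts


counts = confusion_counts(io.StringIO(SAMPLE))
# Most frequently confused pairs first
for (correct, returned), n in counts.most_common():
    print(f"{correct} -> {returned}: {n}")
```

The pairs with the highest counts are the ones worth looking at first in the swarm plot.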

Tuesday, 16 January 2018

Combining the annotation capabilities of both Watson Knowledge Studio and Watson Discovery Service

Watson Discovery Service (WDS) provides a capability to automatically annotate the documents being ingested. This capability is available in several languages and it is able to recognize a wide range of entity types commonly found in typical texts written in these languages.

Unfortunately many users of WDS have to deal with documents which are not typical. For example, they could be dealing with medical documents that contain unusual drug and disease names or they could be dealing with a business domain that has obscure terminology that would not be understood by WDS (or indeed by most speakers of the language in question).

Luckily, Watson Knowledge Studio (WKS) can be used to create a language model that understands the specialized terminology for any domain. However, many document collections will contain a mixture of specialized terminology and normal text. By default, when users specify that a customized WKS domain model is to be used instead of the generic WDS model, it acts as a replacement and none of the normal entities will be annotated by WDS.

It is not feasible for users to build a complete WKS model that incorporates all of the normal language dictionaries as well as the specialized domain terminology. However, there is a trick which can be used to get WDS to use both the domain specific annotator from WKS and the generic language annotator from WDS.

Unfortunately, this trick is not possible with the normal WDS UI; it requires the use of the REST API. Hopefully you are already familiar with this and are able to export your configuration to a JSON file. Assuming that you have configured a number of enrichments for the field named "text", you will see that your configuration contains a fragment that looks something like the following:

  "enrichments": [
    {
      "enrichment": "natural_language_understanding",
      "source_field": "text",
      "destination_field": "enriched_text",
      "options": {
        "features": {
          "keywords": {},
          "entities": {
            "model": "a3398f8b-2282-4fdc-b062-227a162dc0eb"
          },
          "sentiment": {},
          "emotion": {},
          "categories": {},
          "relations": {},
          "concepts": {},
          "semantic_roles": {}
        }
      }
    }
  ],

This fragment means that you have selected a number of different enrichment types to be computed for the "text" field, with the results placed in the field named "enriched_text". Most of these enrichments use the language model provided with the natural language understanding unit that is built into WDS, but entities instead rely upon the WKS model with ID "a3398f8b-2282-4fdc-b062-227a162dc0eb".
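
To check which features in an exported configuration point at a custom model, a short script can walk the "enrichments" list. This is only a sketch: the helper name `custom_models` is hypothetical, and it assumes that "enrichments" sits at the top level of the exported JSON, as the fragment above suggests.

```python
import json

# Trimmed copy of the fragment above, wrapped in braces so it parses as JSON
CONFIG_JSON = """
{
  "enrichments": [
    {
      "enrichment": "natural_language_understanding",
      "source_field": "text",
      "destination_field": "enriched_text",
      "options": {
        "features": {
          "keywords": {},
          "entities": {"model": "a3398f8b-2282-4fdc-b062-227a162dc0eb"},
          "sentiment": {}
        }
      }
    }
  ]
}
"""


def custom_models(config):
    """Return {(destination_field, feature): model_id} for features that name a model."""
    found = {}
    for entry in config.get("enrichments", []):
        features = entry.get("options", {}).get("features", {})
        for feature, settings in features.items():
            if "model" in settings:
                found[(entry["destination_field"], feature)] = settings["model"]
    return found


models = custom_models(json.loads(CONFIG_JSON))
print(models)
```

Any feature that does not appear in the result is handled by the built-in model.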

If you want to have the core WDS detected entities as well as the WKS detected ones, you need to define an additional enrichment entry in your configuration file that places these enrichments in a differently named field, e.g. "wds_enriched_text". The fragment of JSON above needs to be replaced with the fragment below, and the new configuration should then be uploaded via the API.

  "enrichments": [
    {
      "enrichment": "natural_language_understanding",
      "source_field": "text",
      "destination_field": "enriched_text",
      "options": {
        "features": {
          "keywords": {},
          "entities": {
            "model": "a3398f8b-2282-4fdc-b062-227a162dc0eb"
          },
          "sentiment": {},
          "emotion": {},
          "categories": {},
          "relations": {},
          "concepts": {},
          "semantic_roles": {}
        }
      }
    }, 
    {
      "enrichment": "natural_language_understanding",
      "source_field": "text",
      "destination_field": "wds_enriched_text",
      "options": {
        "features": {
          "entities": {}
        }
      }
    }
  ],
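
If you would rather script this change than edit the exported file by hand, something along the following lines could work. It is a sketch under assumptions: the function name `add_generic_entities` is hypothetical, the `exported` dictionary is a cut-down stand-in for a real configuration, and "enrichments" is assumed to be a top-level key. Uploading the result via the API is not shown.

```python
import copy
import json


def add_generic_entities(config, source_field="text",
                         destination_field="wds_enriched_text"):
    """Return a copy of config with an extra entities-only enrichment (no custom model)."""
    updated = copy.deepcopy(config)
    enrichments = updated.setdefault("enrichments", [])
    # Do nothing if an entry already writes to the destination field
    already = any(e.get("destination_field") == destination_field
                  for e in enrichments)
    if not already:
        enrichments.append({
            "enrichment": "natural_language_understanding",
            "source_field": source_field,
            "destination_field": destination_field,
            "options": {"features": {"entities": {}}},
        })
    return updated


# Minimal stand-in for an exported configuration (assumption, not a full config)
exported = {
    "enrichments": [{
        "enrichment": "natural_language_understanding",
        "source_field": "text",
        "destination_field": "enriched_text",
        "options": {"features": {"entities": {
            "model": "a3398f8b-2282-4fdc-b062-227a162dc0eb"}}},
    }]
}

updated = add_generic_entities(exported)
print(json.dumps(updated, indent=2))
```

The function works on a copy, so the exported configuration stays untouched and can be kept as a backup.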

What this configuration will produce is two different enrichment fields containing the entities detected by WDS and WKS. However, it is likely that you want to have all of the detected entities available in a single field. Luckily this is possible by configuring the collection to merge the two fields during the "Normalize" phase.
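
To get a feel for what a single merged field would contain, the two entity lists can be combined on the client side. Note that this is not the Normalize-phase configuration itself, only an illustration in Python; the document shape (an "entities" list of objects with "type" and "text" keys under each enrichment field), the sample values and the helper name `merge_entities` are all assumptions.

```python
def merge_entities(document, fields=("enriched_text", "wds_enriched_text")):
    """Combine the entity lists from several enrichment fields, dropping duplicates."""
    merged = []
    seen = set()
    for field in fields:
        for entity in document.get(field, {}).get("entities", []):
            # Treat entities with the same type and text as duplicates
            key = (entity.get("type"), entity.get("text"))
            if key not in seen:
                seen.add(key)
                merged.append(entity)
    return merged


# Hypothetical enriched document with one WKS entity and two WDS entities
doc = {
    "enriched_text": {"entities": [
        {"type": "DRUG", "text": "atorvastatin"},
    ]},
    "wds_enriched_text": {"entities": [
        {"type": "Location", "text": "Dublin"},
        {"type": "DRUG", "text": "atorvastatin"},
    ]},
}

merged = merge_entities(doc)
for entity in merged:
    print(entity["type"], entity["text"])
```

The de-duplication step is a choice made for this sketch; a merge performed by the service may keep both copies.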