Watson Tips and Tricks: WDS

Monday, 5 February 2018

An alternative way of training Watson Discovery Service

Watson Discovery Service (WDS) provides an excellent natural language query service. This service works well out of the box, but many users like to improve the results for their particular domain by training the service. In order to train the service how to better rank the results of natural language query you need to provide the service with some sample queries and for each query indicate which documents are good results for this query and equally importantly which documents would be a bad result for the query.

The standard user interface to the training capability allows you to view the potential results in a browser and then click on a button to indicate if the result is good or bad. Clicking on the results is easy for a small sample of queries, but it quickly becomes tedious. For this reason, many users prefer to use the API for the training service which gives additional control and capabilities.

Unfortunately the WDS training service only works well with large amounts of training data and in many cases it is not feasible to collect this volume of training data. Luckily there is an alternative (homegrown) way of training WDS which works significantly better for small amounts of training data. The method (which is known as hinting) is amazingly simple. All you need to do is add a new field to your target documents (e.g. named hints) with the text of the question that you want the document to be selected as an answer. Obviously when you as this question (or a similar question) the natural language query engine will select your target document and rank it highly since it is clearly a good match.

This alternative training method is sometimes called hinting because you are providing hints to WDS about which questions this document provides and answer. An additional benefit of this training method is that it helps find matches where the question and the answer document don't have any words in common. Whereas, the standard WDS training method only impacts upon the ranking of results so if the answer document you want to be selected is not even in the list of top 100 answers fetched for the query the normal training would not help.

Tuesday, 16 January 2018

Combining the annotation capabilities of both Watson Knowledge Studio and Watson Discovery Service

Watson Discovery Service (WDS) provides a capability to automatically annotate the documents being ingested. This capability is available in several languages and it is able to recognize a wide range of entity types commonly found in typical texts written in these languages.

Unfortunately many users of WDS have to deal with documents which are not typical. For example, they could be dealing with medical documents that contain unusual drug and disease names or they could be dealing with a business domain that has obscure terminology that would not be understood by WDS (or indeed by most speakers of the language in question).

Luckily Watson Knowledge Studio (WKS) is can be used to create a language model that understands the specialized terminology for any domain. However many document collections will contain a mixture of specialized terminology and normal test. By default, when users choose to specify that a customized WKS domain model is to be used instead of the generic WDS model it is as a replacement and none of the normal entities will be annotated by WDS.

It is not feasible for users to build a complete WKS model that incorporates all of the normal language dictionaries as well as the specialized domain terminology. However, there is a trick which can be used to get WDS to use both the domain specific annotator from WKS and the generic language annotator from WDS.

Unfurtunately this trick is not possible with the normal WDS UI, but it requires the use of the REST API - hopefully you are already familiar with this and you should be able to export your configuration to a JSON file. Assuming that you have configured a number of enrichments for the field named "text" you will see that your configuration contains a fragment that looks something like the following:

  "enrichments": [
    {
      "enrichment": "natural_language_understanding",
      "source_field": "text",
      "destination_field": "enriched_text",
      "options": {
        "features": {
          "keywords": {},
          "entities": {
            "model": "a3398f8b-2282-4fdc-b062-227a162dc0eb"
          },
          "sentiment": {},
          "emotion": {},
          "categories": {},
          "relations": {},
          "concepts": {},
          "semantic_roles": {}
        }
      }
    }
  ],

This fragment means that you have selected a number of different enrichment types to be computed for the text field and the results to be placed in the field named "enriched_text". For most of these enrichments you will use the language model which is provided with the natural language understanding unit that is built into WDS, but for entities it will instead rely upon the WKS model ID "a3398f8b-2282-4fdc-b062-227a162dc0eb".

If you want to have the core WDS detected entities as well as the WKS detected ones, you need to define an additional enrichment entry in your configuration file to place these enrichments in a different named field e.g. wds_enriched_text. The fragment of JSON above needs to be replaced with the fragment below and then the new configuration should be uploaded via the API.

  "enrichments": [
    {
      "enrichment": "natural_language_understanding",
      "source_field": "text",
      "destination_field": "enriched_text",
      "options": {
        "features": {
          "keywords": {},
          "entities": {
            "model": "a3398f8b-2282-4fdc-b062-227a162dc0eb"
          },
          "sentiment": {},
          "emotion": {},
          "categories": {},
          "relations": {},
          "concepts": {},
          "semantic_roles": {}
        }
      }
    }, 
    {
      "enrichment": "natural_language_understanding",
      "source_field": "text",
      "destination_field": "wds_enriched_text",
      "options": {
        "features": {
          "entities": {}
        }
      }
    }
  ],

What this configuration will produce is two different enrichment fields containing the entities detected by WDS and WKS. However, it is likely that you want to have all of the detected entities available in a single field. Luckily this is possible by configuring the collection to merge the two fields during the "Normalize" phase.