Monday, 5 February 2018

An alternative way of training Watson Discovery Service

Watson Discovery Service (WDS) provides an excellent natural language query service. This service works well out of the box, but many users like to improve the results for their particular domain by training the service. In order to train the service how to better rank the results of natural language query you need to provide the service with some sample queries and for each query indicate which documents are good results for this query and equally importantly which documents would be a bad result for the query.

The standard user interface to the training capability allows you to view the potential results in a browser and then click on a button to indicate if the result is good or bad. Clicking on the results is easy for a small sample of queries, but it quickly becomes tedious. For this reason, many users prefer to use the API for the training service which gives additional control and capabilities.

Unfortunately the WDS training service only works well with large amounts of training data and in many cases it is not feasible to collect this volume of training data. Luckily there is an alternative (homegrown) way of training WDS which works significantly better for small amounts of training data. The method (which is known as hinting) is amazingly simple. All you need to do is add a new field to your target documents (e.g. named hints) with the text of the question that you want the document to be selected as an answer. Obviously when you as this question (or a similar question) the natural language query engine will select your target document and rank it highly since it is clearly a good match.

This alternative training method is sometimes called hinting because you are providing hints to WDS about which questions this document provides and answer. An additional benefit of this training method is that it helps find matches where the question and the answer document don't have any words in common. Whereas, the standard WDS training method only impacts upon the ranking of results so if the answer document you want to be selected is not even in the list of top 100 answers fetched for the query the normal training would not help.