Watson Discovery Service (WDS) provides a capability to automatically annotate the documents being ingested. This capability is available in several languages and it is able to recognize a wide range of entity types commonly found in typical texts written in these languages.
Unfortunately, many users of WDS have to deal with documents that are not typical. For example, they could be dealing with medical documents that contain unusual drug and disease names, or with a business domain whose obscure terminology would not be understood by WDS (or indeed by most speakers of the language in question).
Luckily, Watson Knowledge Studio (WKS) can be used to create a language model that understands the specialized terminology for any domain. However, many document collections will contain a mixture of specialized terminology and normal text. By default, when users specify that a customized WKS domain model is to be used instead of the generic WDS model, it acts as a replacement and none of the normal entities will be annotated by WDS.
It is not feasible for users to build a complete WKS model that incorporates all of the normal language dictionaries as well as the specialized domain terminology. However, there is a trick which can be used to get WDS to use both the domain-specific annotator from WKS and the generic language annotator from WDS.
Unfortunately, this trick is not possible with the normal WDS UI; it requires the use of the REST API. Hopefully you are already familiar with the API and are able to export your configuration to a JSON file. Assuming that you have configured a number of enrichments for the field named "text", you will see that your configuration contains a fragment that looks something like the following:
"enrichments": [
  {
    "enrichment": "natural_language_understanding",
    "source_field": "text",
    "destination_field": "enriched_text",
    "options": {
      "features": {
        "keywords": {},
        "entities": {
          "model": "a3398f8b-2282-4fdc-b062-227a162dc0eb"
        },
        "sentiment": {},
        "emotion": {},
        "categories": {},
        "relations": {},
        "concepts": {},
        "semantic_roles": {}
      }
    }
  }
],
This fragment means that you have selected a number of different enrichment types to be computed for the "text" field, with the results placed in the field named "enriched_text". For most of these enrichments, WDS will use the language model provided with its built-in natural language understanding unit, but for entities it will instead rely upon the WKS model with the ID "a3398f8b-2282-4fdc-b062-227a162dc0eb".
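To check which enrichments in an exported configuration rely on a custom model, a short script along the following lines may help. It is only a sketch: the function name describe_entity_models is hypothetical, and it assumes the exported JSON has the same shape as the fragment above (a real export will contain other keys as well).

```python
import json

def describe_entity_models(config):
    """Return (destination_field, model_id or None) for every enrichment
    in the configuration that extracts entities."""
    results = []
    for enrichment in config.get("enrichments", []):
        features = enrichment.get("options", {}).get("features", {})
        if "entities" in features:
            # No "model" key means the generic WDS entity model is used.
            model_id = features["entities"].get("model")
            results.append((enrichment["destination_field"], model_id))
    return results

# A cut-down version of the fragment above, wrapped in braces so it parses.
config = json.loads("""
{
  "enrichments": [
    {
      "enrichment": "natural_language_understanding",
      "source_field": "text",
      "destination_field": "enriched_text",
      "options": {
        "features": {
          "keywords": {},
          "entities": {"model": "a3398f8b-2282-4fdc-b062-227a162dc0eb"}
        }
      }
    }
  ]
}
""")

for field, model_id in describe_entity_models(config):
    print(field, "->", model_id or "generic WDS model")
```

An entry with no model ID is using the generic WDS entity annotator; an entry with a model ID is using the WKS model instead.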
If you want to have the core WDS detected entities as well as the WKS detected ones, you need to define an additional enrichment entry in your configuration file that places these enrichments in a differently named field, e.g. wds_enriched_text. The fragment of JSON above needs to be replaced with the fragment below, and the new configuration should then be uploaded via the API.
"enrichments": [
  {
    "enrichment": "natural_language_understanding",
    "source_field": "text",
    "destination_field": "enriched_text",
    "options": {
      "features": {
        "keywords": {},
        "entities": {
          "model": "a3398f8b-2282-4fdc-b062-227a162dc0eb"
        },
        "sentiment": {},
        "emotion": {},
        "categories": {},
        "relations": {},
        "concepts": {},
        "semantic_roles": {}
      }
    }
  },
  {
    "enrichment": "natural_language_understanding",
    "source_field": "text",
    "destination_field": "wds_enriched_text",
    "options": {
      "features": {
        "entities": {}
      }
    }
  }
],
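If you would rather not edit the exported JSON by hand, the same change can be made programmatically before uploading. The following is a minimal sketch, assuming the configuration has already been loaded into a Python dictionary; the function name add_generic_entities and its parameters are hypothetical and not part of any WDS SDK.

```python
import copy

def add_generic_entities(config, source_field="text",
                         destination_field="wds_enriched_text"):
    """Return a copy of config with an extra enrichment entry that runs the
    generic WDS entity annotator and writes to destination_field."""
    new_config = copy.deepcopy(config)
    enrichments = new_config.setdefault("enrichments", [])
    # Do nothing if an entry for this destination field already exists.
    if any(e.get("destination_field") == destination_field for e in enrichments):
        return new_config
    enrichments.append({
        "enrichment": "natural_language_understanding",
        "source_field": source_field,
        "destination_field": destination_field,
        # An empty "entities" object (no "model" key) selects the generic model.
        "options": {"features": {"entities": {}}},
    })
    return new_config

# Example input: a configuration with only the WKS-backed enrichment.
original = {
    "enrichments": [{
        "enrichment": "natural_language_understanding",
        "source_field": "text",
        "destination_field": "enriched_text",
        "options": {"features": {
            "entities": {"model": "a3398f8b-2282-4fdc-b062-227a162dc0eb"}
        }},
    }]
}
updated = add_generic_entities(original)
```

The function works on a deep copy so that the exported configuration is left untouched, which makes it easy to compare the two versions before uploading the new one.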
This configuration will produce two different enrichment fields: "enriched_text", containing the entities detected by the WKS model, and "wds_enriched_text", containing the entities detected by the generic WDS model. However, it is likely that you want to have all of the detected entities available in a single field. Luckily, this is possible by configuring the collection to merge the two fields during the "Normalize" phase.
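As a rough illustration of the merge step, the sketch below appends a merge entry to the configuration. It rests on several assumptions that should be checked against the WDS API reference: that the configuration JSON holds its Normalize steps in a "normalizations" list, that each step has "operation", "source_field" and "destination_field" keys, and that "merge" is a valid operation. The function name add_merge_normalization and the default field paths are hypothetical.

```python
import copy

def add_merge_normalization(config,
                            source_field="wds_enriched_text.entities",
                            destination_field="enriched_text.entities"):
    """Return a copy of config with a merge step appended to its
    normalizations list (assumed format, see the note above)."""
    new_config = copy.deepcopy(config)
    new_config.setdefault("normalizations", []).append({
        "operation": "merge",            # assumed operation name
        "source_field": source_field,    # assumed field path
        "destination_field": destination_field,
    })
    return new_config

# Example: start from a configuration with no normalization steps.
merged_config = add_merge_normalization({"enrichments": []})
```

Whether nested field paths such as "enriched_text.entities" are accepted by the merge step is not something this sketch can confirm, so it is worth testing on a small collection first.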