Tuesday, 12 February 2019

Matching Only on the Number of Digits you Want

Frequently in Watson you will have two entities that both involve numbers. Say a birth month is two digits long and a birth year four. A problem can arise where the shorter number is found in the longer number because there are two digits inside the four digits.

Month Entity

Year Entity

But when Year is given Month is found.

A way around this is to use the \b word boundary regular expression to say I only want numbers if there are spaces or words around the digits.

and now it works. The \b is not included in the captured entity just the number which is handy

I have tested this in Chinese where they do not use spaces and it also works. Which is great.