Configure Tangible Characters
The TangibleCharacters configuration parameter specifies a list of characters to treat as part of a word, rather than as word boundaries. You can set this value when using the Named Entity Recognition SDK, Named Entity Recognition Server, or the Named Entity Recognition command-line utility (edktool).
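To see why this setting matters, the following sketch models word splitting with and without tangible characters. It is an illustrative model only, not the tokenizer that Named Entity Recognition actually uses. The function name split_words and the example characters "@" and "." are hypothetical. Word characters are assumed to be Python's \w class (letters, digits, and underscore) plus whatever you list as tangible.

```python
import re


def split_words(text: str, tangible: str = "") -> list[str]:
    """Split text into words (hypothetical illustration, not the SDK tokenizer).

    Letters, digits, and underscore (Python's \\w) always belong to a word.
    Any character listed in `tangible` is also treated as part of a word
    instead of as a word boundary.
    """
    # re.escape makes characters such as "." or "-" safe inside the class.
    word_chars = r"\w" + re.escape(tangible)
    return re.findall(f"[{word_chars}]+", text)


text = "Contact john.doe@example.com today"

# Without tangible characters, "." and "@" act as boundaries.
print(split_words(text))
# With "@" and "." as tangible characters, the address stays one word.
print(split_words(text, tangible="@."))
```

In the first call, the address breaks into separate words, so an entity that expects a complete email address as one word cannot match it. In the second call, the address survives as a single word.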
Some entities in the Named Entity Recognition Grammars Package grammar files require you to set tangible characters before they can match correctly. For details, see the descriptions of the entities in the appropriate grammar reference: PII Grammar Reference, PHI Grammar Reference, PCI Grammar Reference, or Government Grammar Reference.
When you use Named Entity Recognition to search for matches, the TangibleCharacters setting applies to all of your chosen entities. If you use multiple entities that have different recommended tangible character sets, you might need to take some extra steps, depending on which interface you use:
- In the Named Entity Recognition SDK, create a separate configuration file for each distinct set of tangible characters and associated entities, and create an EDK engine for each configuration file.
- In Named Entity Recognition Server, send a separate action (EduceFromText or EduceFromFile) for each distinct set of tangible characters. In each action, set the TangibleCharacters and Entities action parameters to specify which set of tangible characters and which entities to use.
- In the command-line utility (edktool), create a separate configuration file for each distinct set of tangible characters and associated entities, and process your input text once with each configuration file.
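All three approaches start with the same planning step: group your entities by the tangible character set that each one needs, so that each group gets one configuration file, one engine, or one action. The following sketch shows that grouping and derives one set of TangibleCharacters and Entities action parameters per group. The entity names and character sets are hypothetical placeholders, so take the real values from the grammar references. Joining entity names with commas, and omitting TangibleCharacters for a group that needs none, are assumptions rather than documented server behavior.

```python
from collections import defaultdict


def group_by_tangible(entity_chars: dict[str, str]) -> dict[str, list[str]]:
    """Group entity names by the tangible character set each one needs."""
    groups: dict[str, list[str]] = defaultdict(list)
    for entity, chars in entity_chars.items():
        # Normalize the set so that "@." and ".@" land in the same group.
        key = "".join(sorted(set(chars)))
        groups[key].append(entity)
    return dict(groups)


def action_params(groups: dict[str, list[str]]) -> list[dict[str, str]]:
    """Build one parameter set per group (one action, engine, or config each)."""
    result = []
    for chars, entities in groups.items():
        # ASSUMPTION: entity names are comma-separated in the Entities value.
        params = {"Entities": ",".join(entities)}
        if chars:  # ASSUMPTION: leave TangibleCharacters unset if none needed.
            params["TangibleCharacters"] = chars
        result.append(params)
    return result


# Hypothetical entities and their recommended tangible characters.
needs = {
    "example/email": "@.",
    "example/phone": "",
    "example/url": ".@",
    "example/card": "-",
}
for params in action_params(group_by_tangible(needs)):
    print(params)
```

Here the two entities that need "@" and "." share one group, so this input needs three separate passes rather than four.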
For more information about the TangibleCharacters configuration parameter, refer to the Named Entity Recognition User and Programming Guide.