Configure Tangible Characters

The TangibleCharacters configuration parameter specifies a list of characters to treat as part of a word, rather than as word boundaries. You can set this value when using the Named Entity Recognition SDK, Named Entity Recognition Server, or the Named Entity Recognition command-line utility (edktool).
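
As a rough illustration of the idea, the following Python sketch uses a toy model of whole-word matching. It is not the product's matching algorithm, and the function names and sample text are hypothetical. A match counts as standalone only if it is not directly next to a word character. Adding a character to the tangible set makes it count as part of a word, so a match that touches that character is rejected.

```python
def is_word_char(ch: str, tangible: str) -> bool:
    """Toy rule: letters, digits, and any tangible character are part of a word."""
    return ch.isalnum() or ch in tangible


def is_standalone_match(text: str, start: int, end: int, tangible: str = "") -> bool:
    """Return True if text[start:end] is bounded by non-word characters or the text edges."""
    before_ok = start == 0 or not is_word_char(text[start - 1], tangible)
    after_ok = end == len(text) or not is_word_char(text[end], tangible)
    return before_ok and after_ok


# Hypothetical sample: the same digits appear glued to '#' and on their own.
text = "ref#12345 and 12345"
first = text.index("12345")              # occurrence that follows '#'
second = text.index("12345", first + 1)  # free-standing occurrence

# With no tangible characters, '#' acts as a word boundary, so both occurrences qualify.
both_without = [is_standalone_match(text, i, i + 5) for i in (first, second)]

# With '#' tangible, the first occurrence is part of the longer word "#12345".
both_with_hash = [is_standalone_match(text, i, i + 5, tangible="#") for i in (first, second)]
```

In this toy model, making `#` tangible changes which occurrences qualify, which is why a single tangible character set might not suit every entity.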

Some entities in the Named Entity Recognition Grammars Package grammar files work correctly only when you set specific tangible characters. For details, see the descriptions of the entities in the appropriate grammar reference: PII Grammar Reference, PHI Grammar Reference, PCI Grammar Reference, or Government Grammar Reference.

When you use Named Entity Recognition to search for matches, the TangibleCharacters setting applies to all of the entities that you choose. If you use multiple entities that have different recommended tangible character sets, you might need to process your input separately for each set. For example:

  • In the Named Entity Recognition SDK, create a separate configuration file for each distinct set of tangible characters and associated entities, and create an EDK engine for each configuration file.
  • In Named Entity Recognition Server, send a separate action (EduceFromText or EduceFromFile) for each distinct set of tangible characters. In each action, set the TangibleCharacters and Entities action parameters to specify which set of tangible characters and which entities to use.
  • In the command-line utility (edktool), create a separate configuration file for each distinct set of tangible characters and associated entities, and process your input text once with each configuration file.
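
The grouping step that these approaches share can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the entity names and tangible character values are hypothetical placeholders, the `Text` parameter name and the comma-separated `Entities` value are assumptions, and the code only builds query strings. How you send them to Named Entity Recognition Server depends on your deployment. Only the EduceFromText action name and the TangibleCharacters and Entities parameter names come from the text above.

```python
from collections import defaultdict
from urllib.parse import urlencode

# Hypothetical mapping: entity name -> recommended tangible characters.
# Take the real values from the appropriate grammar reference.
RECOMMENDED = {
    "example/entity_a": "@.",
    "example/entity_b": "@.",
    "example/entity_c": "-",
}


def group_entities(recommended: dict) -> dict:
    """Group entities that share the same set of tangible characters."""
    groups = defaultdict(list)
    for entity, chars in recommended.items():
        groups[chars].append(entity)
    return dict(groups)


def build_action_queries(text: str, recommended: dict) -> list:
    """Build one EduceFromText query string per distinct tangible character set."""
    queries = []
    for chars, entities in group_entities(recommended).items():
        queries.append(urlencode({
            "action": "EduceFromText",
            "Text": text,                      # assumed parameter name
            "TangibleCharacters": chars,
            "Entities": ",".join(entities),    # assumed comma-separated list
        }))
    return queries


queries = build_action_queries("sample input text", RECOMMENDED)
```

The same grouping applies to the SDK and edktool cases, with one configuration file per group instead of one action per group.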

For more information about the TangibleCharacters configuration parameter, refer to the Named Entity Recognition User and Programming Guide.