.NET API Concepts
The Named Entity Recognition SDK provides a .NET API that enables your application to create an extraction engine and perform entity extractions.
This section describes the concepts used to write .NET applications with the Named Entity Recognition SDK.
The .NET SDK consists of:
EductionDotNet.dll, which contains the Named Entity Recognition .NET class library.
edk.dll (Windows) or libedk.so (macOS and Linux), which performs the Named Entity Recognition functionality.
NOTE: You might also need additional runtime libraries to run the Named Entity Recognition SDK. See Named Entity Recognition SDK Package.
Concurrency Control
Concurrency in Named Entity Recognition is handled using sessions, represented by an ITextExtractionSession object.
You initialize an instance of an ITextExtractionEngine object with a configuration file that describes the grammars and settings that you want to use for entity extraction. You can create multiple ITextExtractionSession objects from this engine, each of which uses the same grammars and settings as the parent engine. Each session maintains its state independently of the others.
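The pattern described above (one engine created once, with a separate session for each concurrent unit of work) can be sketched as follows. This is only an illustration of the shape: DemoEngine, DemoSession, CreateSession, and Process are hypothetical stand-ins defined in the example itself, not the SDK's types or methods. In a real application the ITextExtractionEngine and ITextExtractionSession objects would presumably take their place.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for an engine: holds configuration shared by all sessions.
// NOT an SDK type; it only mimics the "one engine, many sessions" shape.
public sealed class DemoEngine
{
    public string Config { get; }
    public DemoEngine(string configPath) { Config = configPath; }

    // Each call returns a new session that uses this engine's configuration.
    public DemoSession CreateSession() { return new DemoSession(this); }
}

// Hypothetical stand-in for a session: carries its own state.
public sealed class DemoSession
{
    private readonly DemoEngine _engine;
    private int _processed; // per-session state, never shared between sessions

    public DemoSession(DemoEngine engine) { _engine = engine; }

    public string Process(string text)
    {
        _processed++;
        return _engine.Config + ":" + text.Length + ":" + _processed;
    }
}

public static class SessionPerWorkerDemo
{
    public static string[] Run(string[] inputs)
    {
        var engine = new DemoEngine("demo.cfg"); // created once and shared
        var results = new string[inputs.Length];

        Parallel.For(0, inputs.Length, i =>
        {
            // One session per unit of work, so no session state crosses threads.
            DemoSession session = engine.CreateSession();
            results[i] = session.Process(inputs[i]);
        });
        return results;
    }

    public static void Main()
    {
        foreach (string line in Run(new[] { "alpha", "be", "gam" }))
        {
            Console.WriteLine(line);
        }
    }
}
```

Because every session starts with its own counter, each result ends in ":1" regardless of how the work is scheduled across threads.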
Character Encoding
The underlying edk.dll and grammars assume that all your input is UTF-8 encoded. The Named Entity Recognition .NET SDK functions that accept a System.String automatically handle conversion from UTF-16 to UTF-8. However, functions that accept a System.IO.Stream (for example, Eduction.ITextExtractionSession.SetInputStream) require the byte data in the stream to be UTF-8.
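As a minimal sketch of preparing stream input, the following helpers use only standard .NET classes to produce a UTF-8 stream from a string or from a stream in another encoding. The class and method names (Utf8StreamDemo, ToUtf8Stream, TranscodeToUtf8) are illustrative and not part of the SDK, and the call that hands the resulting stream to a session is deliberately left out.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8StreamDemo
{
    // Encodes a .NET string (UTF-16 in memory) as UTF-8 bytes and wraps them
    // in a read-only stream. GetBytes does not add a byte order mark.
    public static Stream ToUtf8Stream(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return new MemoryStream(utf8, writable: false);
    }

    // Reads a stream in a known source encoding and returns a UTF-8 stream.
    // Disposing the reader also closes the source stream.
    public static Stream TranscodeToUtf8(Stream source, Encoding sourceEncoding)
    {
        using (var reader = new StreamReader(source, sourceEncoding))
        {
            return ToUtf8Stream(reader.ReadToEnd());
        }
    }

    public static void Main()
    {
        // "café" as UTF-16LE bytes, standing in for a non-UTF-8 input.
        byte[] utf16 = Encoding.Unicode.GetBytes("caf\u00E9");

        using (Stream utf8 = TranscodeToUtf8(new MemoryStream(utf16), Encoding.Unicode))
        {
            Console.WriteLine(utf16.Length); // 8 bytes as UTF-16
            Console.WriteLine(utf8.Length);  // 5 bytes as UTF-8
        }
    }
}
```

This sketch reads the whole input into memory, which is fine for small documents. For large inputs a chunked transcoding approach would be more appropriate.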
Some of the metadata that the SDK returns represents byte counts or offsets. These values are correct for the UTF-8 representation of the matched text. Character counts and offsets are independent of the encoding.
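To illustrate how byte offsets and character offsets diverge, the following sketch uses only standard .NET classes; OffsetDemo and ByteOffsetOf are illustrative names, not SDK members. Note that .NET string indices count UTF-16 code units, so for text that contains characters outside the Basic Multilingual Plane they might not match the character offsets that the SDK reports; the example assumes text without such characters.

```csharp
using System;
using System.Text;

public static class OffsetDemo
{
    // Returns the UTF-8 byte offset that corresponds to a UTF-16 index in text,
    // by counting the UTF-8 bytes needed for everything before that index.
    public static int ByteOffsetOf(string text, int charIndex)
    {
        return Encoding.UTF8.GetByteCount(text.Substring(0, charIndex));
    }

    public static void Main()
    {
        string text = "caf\u00E9 Paris"; // "café Paris", with a precomposed é
        string match = "Paris";

        int charOffset = text.IndexOf(match, StringComparison.Ordinal);
        int byteOffset = ByteOffsetOf(text, charOffset);
        int byteLength = Encoding.UTF8.GetByteCount(match);

        Console.WriteLine(charOffset); // 5: five characters precede the match
        Console.WriteLine(byteOffset); // 6: é occupies two bytes in UTF-8
        Console.WriteLine(byteLength); // 5: the match itself is all ASCII
    }
}
```

When you slice the original input by a reported offset, use byte offsets against the UTF-8 bytes and character offsets against the string, and avoid mixing the two.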