Optical Character Recognition

Optical Character Recognition (OCR) recognizes text in media. This includes text that appears in images and video, and text embedded in PDF files and Office document file formats.

Configuration Parameter Description
Blacklist Characters to exclude from the character set used for recognition.
CharacterTypes The types of characters to include in the character set used for recognition.
ContextCheck Specifies whether to use context checking to improve OCR results.
DetectAlphabet Specifies whether to detect the alphabet for each image or page.
FontType The basic character type of the text that you want to recognize.
HollowText Specifies whether to look for outlined text.
Input The image track to process.
KeepOnly Keep only particular types of words and discard all others.
Languages The languages to use, which affects the character set and dictionaries used.
MaxInputQueueLength Limits the length of the input queue, which can be used to place a limit on latency.
NumParallel The maximum number of video frames to analyze simultaneously.
OcrMode The OCR mode to use when you ingest images or documents.
Orientation The orientation of text in the ingested media.
ProcessTextElements Specifies whether to merge the content of text elements into the OCR results.
Region A region of the image or video frame to restrict processing to.
RegionUnit The units that the Region parameter uses to specify the size and position of a region.
RestrictToInputRegion Specifies whether to analyze a region of the input image or video frame that is specified in the input record, instead of the entire image.
SampleInterval The interval at which frames are selected to be analyzed.
Spacing Specifies whether to allow multiple spaces between words in the output from OCR.
Type The analysis engine to use. Set this parameter to OCR.
UserDictionary A comma-separated list of dictionaries to use in addition to the standard dictionaries.
Whitelist Extra characters to add to the character set.
WordRejectThreshold The minimum confidence level required to include a word in the output.
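
The parameters above are set in a task section of the configuration file. The following Python sketch renders such a section as INI-style text, purely to illustrate how the parameter names fit together. The section name OCR, the helper build_ocr_task, and the values for Languages and WordRejectThreshold are assumptions made for illustration; only the parameter names and Type=OCR come from the table above.

```python
import configparser
import io


def build_ocr_task(section="OCR", **params):
    """Render an INI-style task section (hypothetical helper).

    The section name and all parameter values are illustrative;
    only the parameter names are taken from the table above.
    """
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep parameter names in their original case
    cfg[section] = {"Type": "OCR", **{k: str(v) for k, v in params.items()}}
    buf = io.StringIO()
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()


# Assumed values: "en" and 50 are placeholders, not documented defaults.
text = build_ocr_task(Languages="en", WordRejectThreshold=50)
print(text)
```

Check the accepted values for each parameter in its reference entry before copying anything from this sketch into a real configuration.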

Output Tracks

Output track Type Description Output
Data OCRResult Contains one record, describing the analysis results, per line of text, per video frame. No
DataWithSource OCRResultAndImage The same as the Data track, but each record also includes the source frame. No
Result OCRResult Contains one record, describing the analysis results, for each line of text. When a line of text appears in many consecutive frames, Media Server produces a single result. Yes
ResultWithSource OCRResultAndImage The same as the Result track, but each record also includes the best source frame. No
CharResult OCRDetail Contains one record, describing the analysis results, for each line of text. However, the records in this track provide detail about individual characters rather than the whole line. This track is available only when you ingest images or documents. It is not available if the source is a video file or stream. No
WordResult OCRResult Contains one record, describing the analysis results, for each word. This track is available only when you ingest images or documents. It is not available if the source is a video file or stream. No
Start OCRResult The same as the Data track, except it contains only the first record of each event. No
End OCRResult The same as the Data track, except it contains only the last record of each event. No

Note: The Output column indicates whether the information contained in the track is included by default in the output created by an output task (when you don't set the Input parameter for the output task).
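
The relationship between the Data, Start, and End tracks can be sketched in Python. This example assumes that an "event" is the run of Data records that share the same id, which is an interpretation rather than something the table states. The record layout (plain dictionaries with a hypothetical frame field) and the helper first_and_last_per_id are also illustrative.

```python
def first_and_last_per_id(data_records):
    """Return ({id: first record}, {id: last record}) for records in arrival order.

    Assumption: records that share an id belong to the same event.
    """
    first, last = {}, {}
    for rec in data_records:
        first.setdefault(rec["id"], rec)  # keep only the earliest record per id
        last[rec["id"]] = rec             # overwrite so the latest record wins
    return first, last


# Hypothetical Data-track records; "frame" is not a documented field.
data = [
    {"id": "a", "frame": 1, "text": "BREAKING"},
    {"id": "a", "frame": 2, "text": "BREAKING"},
    {"id": "b", "frame": 2, "text": "Weather"},
    {"id": "a", "frame": 3, "text": "BREAKING NEWS"},
]
start, end = first_and_last_per_id(data)
print(start["a"]["frame"], end["a"]["frame"])
```

Under that assumption, start approximates what the Start track would carry and end approximates the End track.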

OCRResult

Field name Type Description
id UUIDData A unique identifier for the line of text. Every record in the Result track has a different id. Records in all of the other tracks have the same id if they correspond to the same line.

text TextData The result of running OCR on the text.
region RectangleData The location of the text in the frame.
confidence Integer The confidence score from OCR, or 100 for text extracted from text elements.
angle Integer The orientation of the text in degrees (rotated clockwise 0, 90, 180, or 270 degrees from upright).
source String Specifies the origin of the text: static text from an image or video (image), text from video of a news ticker, with text scrolling from right to left (scroller, left), or a text element in a document (text).
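
As an illustration of how the OCRResult fields might be consumed downstream, the following Python sketch models a record and filters by confidence. The class names, the Rect field names (left, top, width, height), and the sample values are assumptions; the source only names the field types. The sketch also assumes that confidence is on a 0 to 100 scale, which the value 100 for text elements suggests but does not confirm.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    # Field names are assumed; the source only gives the type RectangleData.
    left: int
    top: int
    width: int
    height: int


@dataclass
class OcrResult:
    id: str
    text: str
    region: Rect
    confidence: int
    angle: int    # 0, 90, 180, or 270 degrees clockwise from upright
    source: str   # for example "image" or "text"


def keep_confident(records, min_confidence):
    """Return the records whose confidence is at least min_confidence."""
    return [r for r in records if r.confidence >= min_confidence]


# Made-up sample records for illustration only.
records = [
    OcrResult("1", "Invoice 2041", Rect(10, 10, 200, 30), 92, 0, "image"),
    OcrResult("2", "lnv0ice", Rect(10, 50, 120, 30), 35, 0, "image"),
    OcrResult("3", "Total due", Rect(10, 90, 150, 30), 100, 0, "text"),
]
kept = keep_confident(records, 50)
print([r.text for r in kept])
```

Text taken from text elements always passes such a filter, because its confidence is reported as 100.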

OCRResultAndImage

The same as OCRResult records, but with the following additional fields.

Field name Type Description
image ImageData The source frame.

OCRDetail

Field name Type Description
id UUIDData A unique identifier for the line of text. Every record in the Result track has a different id. Records in all of the other tracks have the same id if they correspond to the same line.

character OCRChar There is a character element for each character on the line, including spaces. This element includes the following information:

  • text (TextData type) - the character that was recognized. This element is empty if the character is a space.
  • region (RegionData type) - the location of the character in the source frame.
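
Because a space is reported as a character element with empty text, rebuilding the line from an OCRDetail record means substituting a space for each empty element. The following Python sketch shows one way to do this and to compute a bounding box for the line. The dictionary layout and the (left, top, width, height) tuple used for region are assumptions made for illustration.

```python
def line_text(characters):
    """Join character elements into a line; empty text stands for a space."""
    return "".join(ch["text"] or " " for ch in characters)


def bounding_box(characters):
    """Return the (left, top, width, height) box that encloses every character."""
    left = min(ch["region"][0] for ch in characters)
    top = min(ch["region"][1] for ch in characters)
    right = max(ch["region"][0] + ch["region"][2] for ch in characters)
    bottom = max(ch["region"][1] + ch["region"][3] for ch in characters)
    return (left, top, right - left, bottom - top)


# Hypothetical character elements; the region layout is assumed.
chars = [
    {"text": "H", "region": (0, 0, 10, 20)},
    {"text": "i", "region": (10, 0, 5, 20)},
    {"text": "", "region": (15, 0, 5, 20)},   # a space
    {"text": "!", "region": (20, 0, 5, 20)},
]
print(repr(line_text(chars)), bounding_box(chars))
```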
