LangDetectUTF8

Whether automatic language detection treats files that contain only 7-bit ASCII as UTF-8, rather than as a single-byte encoding such as ISO-8859-1 (ASCII).

Automatic language detection uses the contents of the LangDetectType fields to detect the language of a document. By default, if these fields contain only 7-bit ASCII, the Content component detects the document as UTF-8. If you want to group these documents with documents that use a single-byte (8-bit) encoding, such as ISO-8859-1, set LangDetectUTF8 to False.

After Content detects the language of a document, it identifies the encoding by checking against the encoding options that you configure for the language (see Language Configuration). If you have not configured any compatible encodings, Content assigns the default language type. To ensure that documents in a language can be detected as UTF-8, you must include UTF8 as one of the encoding options for that language.
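
For example, the following configuration fragment is a minimal sketch of how these settings might fit together. The language name (English), the language type names (englishASCII and englishUTF8), and the placement of AutoDetectLanguagesAtIndex in the [Server] section are assumptions for illustration only. Check them against your own Language Configuration before you use them.

```
[Server]
AutoDetectLanguagesAtIndex=True

[LanguageTypes]
// Group 7-bit ASCII documents with the single-byte encoding instead of UTF-8.
LangDetectUTF8=False
0=English

[English]
// UTF8 is listed as an encoding option so that English documents can still be identified as UTF-8.
Encodings=ASCII:englishASCII,UTF8:englishUTF8
```

In this sketch, a document whose LangDetectType fields contain only 7-bit ASCII would be grouped with the single-byte language type, while documents that contain multibyte UTF-8 sequences could still match the UTF8 encoding option.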

Type: Boolean
Default: True
Required: No
Configuration Section: LanguageTypes
Example: LangDetectUTF8=True
See Also: AutoDetectLanguagesAtIndex