KillDuplicatesOption

For the DREADDDATA index action, you can append the KillDuplicates option directly to the #DREENDDATA string that terminates your data. For example, #DREENDDATAREFERENCE uses the REFERENCE option to remove duplicates.

NOTE: If you set the KillDuplicates action parameter as well, the Content component ignores the #DREENDDATA option.

This parameter determines how the Content component handles duplicate documents. It allows you to prevent the same document or document content from being stored in Content more than once.

Use one of the following options:

NONE Allows duplicate documents in the Content index. Content does not replace or delete documents.
REFERENCE Replaces an existing document with the new document if the document to index has the same value in its DREREFERENCE field.
REFERENCEMATCHN

Replaces the existing document with the new document if the content of the document is more than N percent similar to the existing document. Content determines the similarity by comparing the content of the SourceType fields in the document, or the Index fields if no SourceType fields are configured.

NOTE: This method can deduplicate only documents that are already synced in the Content component index. It cannot deduplicate similar documents in the same index job.

FieldName

Replaces the existing document with the new document if the document to index contains a ReferenceType field named FieldName that has the same content as the FieldName field in the existing document.

You can specify multiple ReferenceType fields in this option, separated by a plus symbol (+) or a space. In this case, Content deletes documents that contain any of the specified fields with identical content. You must percent-encode any punctuation characters in the field name.

NOTE: You identify fields as ReferenceType fields by using field processes in the Content component configuration file. If you list multiple fields in the same PropertyFieldCSVs parameter where you list the FieldName for deduplication, Content uses all the fields to eliminate duplicate documents. If you want to define multiple ReferenceType fields, but do not want to use all fields for duplicate elimination, set up multiple field processes. Refer to the Knowledge Discovery Administration Guide.
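
As an illustration of the multiple-field form described above, the following Python sketch builds the option string from a list of field names. The function name build_field_option is hypothetical and not part of Content. The sketch assumes that standard URL percent-encoding is an acceptable way to encode punctuation; urllib.parse.quote leaves the characters - . _ ~ unencoded, so check whether your field names need further handling.

```python
from urllib.parse import quote


def build_field_option(field_names):
    """Join ReferenceType field names into one KillDuplicates option value.

    Hypothetical helper: percent-encodes each name, then joins the
    encoded names with a plus symbol (+), as the text above describes.
    """
    if not field_names:
        raise ValueError("at least one field name is required")
    # safe="" makes quote() encode "/" as well, which it leaves alone by default.
    return "+".join(quote(name, safe="") for name in field_names)


if __name__ == "__main__":
    # The field names here are made up for illustration.
    print(build_field_option(["DOCUMENT/REF", "TITLE"]))
```

The plus symbol is added after encoding, so it stays a literal separator rather than being encoded itself.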

ReferenceField,GREATER:VersionField

Replaces the existing document with the new document if the document to index contains a ReferenceType field named ReferenceField that has the same content as the ReferenceField in the existing document, and if the VersionField in the document to index has a higher value than the VersionField in the existing document. For XML documents, you must fully qualify the path of the XML field that you want to use as the version field (you cannot use wildcard values).

VersionField must contain a positive integer value, but you do not need to configure it as a numeric field. If only one of the incoming and current documents has a valid value in the VersionField, Content keeps the version with a valid VersionField. When both documents have the same VersionField value, Content keeps the existing document.

NOTE: When you index IDX documents, for the version comparison to work correctly, the value in the field that you use as the VersionField must be listed in quotation marks (""). That is, the field must have the following format in the IDX:

#DREFIELD MyField="N"

Content treats existing documents with a missing or non-numeric value in the VersionField as having a version number of negative infinity. It treats a new document with a missing or non-numeric value in the VersionField as having a version number of 0.
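
The version rules above can be summarized in a short Python sketch. This is a model of the documented behavior, not Content's actual implementation. The function names are hypothetical, and the sketch assumes that the ReferenceField values of the two documents already match. It also assumes that "positive integer" means a string of ASCII digits with a value greater than zero.

```python
import math
import re


def effective_version(raw, is_existing):
    """Return the version number used for the comparison.

    A valid value is a positive integer. An invalid or missing value counts
    as negative infinity for an existing document and as 0 for a new one.
    """
    if raw is not None:
        text = raw.strip()
        if re.fullmatch(r"[0-9]+", text) and int(text) > 0:
            return int(text)
    return -math.inf if is_existing else 0


def new_document_replaces(existing_raw, new_raw):
    """True if the new document would replace the existing one."""
    # Strictly greater: on equal versions the existing document is kept.
    return effective_version(new_raw, False) > effective_version(existing_raw, True)


if __name__ == "__main__":
    print(new_document_replaces("1", "2"))   # newer version arrives
    print(new_document_replaces("2", "2"))   # same version
    print(new_document_replaces("5", None))  # new document has no version
    print(new_document_replaces(None, "3"))  # existing document has no version
```

Under these rules, if neither document has a valid version, the new document replaces the existing one, because 0 is greater than negative infinity. This follows from the rules as stated, so confirm it against your own deployment before relying on it.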

You can postfix any of these options with =2, to apply the KillDuplicates process to all Content databases (rather than only to the database into which the current IDX or XML file is being indexed).

If you do not set KillDuplicatesOption, it defaults to the option specified for KillDuplicates in the Content configuration file [Server] section. You can also set the following option:

NOOP Content uses the KillDuplicates setting in its configuration file [Server] section to determine how it treats duplicate text.
Actions: DREADDDATA
Type: String
Default:  
Example: To set the required KillDuplicates option, append it directly to the #DREENDDATA tag:

DREADDDATA?[optionalParameters]Data#DREENDDATAREFERENCE\n\n
In this example, KillDuplicates is set to REFERENCE.
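
As a further illustration, the following Python sketch assembles the data portion of a DREADDDATA request, with the option appended to the #DREENDDATA terminator. The helper name append_kill_duplicates_option and the sample IDX content are hypothetical. The sketch assumes that the terminator is written on its own line after the data. It only builds the string and does not send anything to Content.

```python
def append_kill_duplicates_option(idx_data, option, all_databases=False):
    """Terminate IDX data with #DREENDDATA plus a KillDuplicates option.

    Hypothetical helper. Set all_databases=True to add the =2 postfix,
    which applies the KillDuplicates process to all Content databases.
    """
    if not option:
        raise ValueError("option must be a non-empty string, for example REFERENCE")
    suffix = option + ("=2" if all_databases else "")
    # Assumption: the terminator sits on its own line after the data.
    if not idx_data.endswith("\n"):
        idx_data += "\n"
    return f"{idx_data}#DREENDDATA{suffix}\n\n"


if __name__ == "__main__":
    sample = "#DREREFERENCE doc1\n#DRECONTENT\nHello\n#DREENDDOC\n"
    print(append_kill_duplicates_option(sample, "REFERENCE"))
    print(append_kill_duplicates_option(sample, "REFERENCE", all_databases=True))
```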
See Also: KeepExisting
KillDuplicates
KillDuplicatesDB