KillDuplicatesOption

For the DREADDDATA index action, you can append a KillDuplicates option directly to the #DREENDDATA string that terminates your data. For example, #DREENDDATAREFERENCE applies the REFERENCE option to remove duplicates.

NOTE: If you set the KillDuplicates action parameter as well, the Content component ignores the #DREENDDATA option.

This parameter determines how the Content component handles duplicate documents. It allows you to prevent the same document or document content from being stored in Content more than once.

Use one of the following options:

NONE

Duplicate documents are allowed in Content and are not replaced or deleted.

REFERENCE

If the document being indexed has the same DREREFERENCE field value as a document that already exists in the Content component, Content deletes the existing document and replaces it with the new document.

REFERENCEMATCHN

If the content of the document being indexed is more than N percent similar to the content of a document that already exists in the Content database, Content deletes the existing document and replaces it with the new document. Content determines the similarity by comparing the content of the SourceType fields in the document, or the Index fields if no SourceType fields are configured.

NOTE: This method can deduplicate only against documents that are already synced in the index. It cannot deduplicate similar documents within the same index job.

FieldName

If the document being indexed contains a FieldName reference field with the same value as the FieldName reference field in a document that already exists in Content, Content deletes the existing document and replaces it with the new document.

To specify multiple Reference fields, separate the fields with a plus sign (+) or a space. Content deletes documents that contain any of the specified fields with identical content. You must percent-encode any punctuation characters in the field name.

NOTE: Fields are identified as Reference fields by field processes in the Content configuration file. If you use a FieldName Reference field to eliminate duplicate documents, Content automatically reads any fields listed alongside this field for the PropertyFieldCSVs parameter in the field process, and also uses these fields to eliminate duplicate documents. If you want to define multiple Reference fields but do not want them all to be used for document elimination, you must set up multiple field processes (see Configure a Field Process).

ReferenceField,GREATER:VersionField

If the document being indexed contains a ReferenceField reference field with the same value as the ReferenceField reference field in a document that already exists in Content, Content checks the VersionField for the documents, and keeps the copy of the document with the highest value in the VersionField. For XML documents, you must fully qualify the path of the XML field that you want to use as the version field (you cannot use wildcard values).

VersionField must contain a positive integer value, but you do not need to configure it as a numeric field. If only one of the incoming and current documents has a valid value in the VersionField, Content keeps the version with a valid VersionField. When both documents have the same VersionField value, Content keeps the existing document.

NOTE: When you index IDX documents, the version comparison works correctly only if the value in the field that you use as the VersionField is enclosed in quotation marks (""). That is, the field must have the following format in the IDX:

#DREFIELD MyField="N"

Content treats existing documents with a missing or non-numeric value in the VersionField as having a version number of negative infinity. It treats a new document with a missing or non-numeric value in the VersionField as having a version number of 0.
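
The comparison rules above can be modeled in a few lines. The following Python sketch is illustrative only and is not Content's implementation. The function names are hypothetical, and treating zero or negative values as invalid is an assumption based on the requirement that VersionField contain a positive integer.

```python
import math


def effective_version(raw_value, is_existing):
    """Return the version number used for the comparison.

    A valid value is a positive integer. A missing or non-numeric value
    counts as negative infinity for an existing document and as 0 for a
    new (incoming) document, as described in the text above.
    """
    try:
        value = int(str(raw_value).strip())
        if value > 0:  # assumption: zero and negative values are invalid
            return value
    except (TypeError, ValueError):
        pass
    return -math.inf if is_existing else 0


def incoming_replaces_existing(existing_raw, incoming_raw):
    """True if the incoming document would be kept instead of the existing one.

    The incoming document must have a strictly higher version, so a tie
    keeps the existing document.
    """
    existing = effective_version(existing_raw, is_existing=True)
    incoming = effective_version(incoming_raw, is_existing=False)
    return incoming > existing
```

Under this model, an incoming document with version "5" replaces an existing document with version "3", a tie keeps the existing document, and an existing document with no valid version loses to any incoming document.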

You can append =2 to any of these options to apply the KillDuplicates process to all Content databases (rather than only to the database into which the current IDX or XML file is being indexed).
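
As a rough illustration of how these option strings are composed, the following Python sketch builds a REFERENCEMATCHN value, a multi-field value, and the =2 form. All function names are hypothetical helpers, not part of any product API. The sketch assumes that N is a whole number between 0 and 100, and it percent-encodes every character that is not an ASCII letter or digit, which is a conservative reading of the punctuation rule described above.

```python
def encode_field_name(name):
    """Percent-encode every byte that is not an ASCII letter or digit."""
    encoded = []
    for byte in name.encode("utf-8"):
        char = chr(byte)
        if char.isascii() and char.isalnum():
            encoded.append(char)
        else:
            encoded.append(f"%{byte:02X}")
    return "".join(encoded)


def reference_match_option(percent):
    """Build a REFERENCEMATCHN option, for example REFERENCEMATCH80."""
    percent = int(percent)
    if not 0 <= percent <= 100:  # assumption: N is a percentage from 0 to 100
        raise ValueError("percent must be between 0 and 100")
    return f"REFERENCEMATCH{percent}"


def field_option(field_names):
    """Join one or more reference field names with plus signs."""
    return "+".join(encode_field_name(name) for name in field_names)


def apply_to_all_databases(option):
    """Append =2 so the option applies to all Content databases."""
    return f"{option}=2"
```

For example, field_option(["MY_REF", "doc/id"]) returns MY%5FREF+doc%2Fid, and apply_to_all_databases("REFERENCE") returns REFERENCE=2.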

If you do not set KillDuplicatesOption, it defaults to the option specified for KillDuplicates in the Content configuration file [Server] section. You can also set the following option:

NOOP

Content uses the KillDuplicates setting in its configuration file [Server] section to determine how it treats duplicate text.

Actions: DREADDDATA
Type: String
Default: The KillDuplicates setting in the [Server] section of the Content configuration file
Example: To set the required KillDuplicates option, append it directly to the #DREENDDATA tag:

DREADDDATA?[optionalParameters]Data#DREENDDATAREFERENCE\n\n
In this example, KillDuplicates is set to REFERENCE.
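
The following Python sketch shows one way to assemble the data portion of such a request as a string. It is a minimal illustration, not a client implementation: build_index_data is a hypothetical helper, the sample IDX document is invented for the example, and sending the data to the Content index port is left out.

```python
def build_index_data(idx_data, kill_duplicates_option=""):
    """Terminate IDX data with #DREENDDATA plus an optional option string.

    The option is appended directly to the tag with no separator, and the
    data ends with two newline characters, as in the example above.
    """
    body = idx_data if idx_data.endswith("\n") else idx_data + "\n"
    return f"{body}#DREENDDATA{kill_duplicates_option}\n\n"


# Illustrative IDX document (contents are made up for this sketch).
sample_idx = (
    "#DREREFERENCE doc-001\n"
    '#DREFIELD MyField="3"\n'
    "#DRECONTENT\n"
    "Example document text.\n"
    "#DREENDDOC\n"
)

payload = build_index_data(sample_idx, "REFERENCE")
```

Here payload ends with #DREENDDATAREFERENCE followed by two newlines. If the KillDuplicates action parameter is also set on the request, Content ignores the option appended to #DREENDDATA.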
See Also: KeepExisting, KillDuplicates, KillDuplicatesDB