PutInfoArchive
Archives information into OpenText InfoArchive.
The processor inserts data into InfoArchive in the form of Submission Information Packages (SIPs). The fields in the SIP descriptor are described in the InfoArchive documentation.
Properties
Name | Default Value | Description |
---|---|---|
IDOL License Service | | An IdolLicenseServiceImpl that provides a way to communicate with a Knowledge Discovery License Server. |
InfoArchive API URL | | The base URL of the InfoArchive REST API. |
InfoArchive Application ID | | The InfoArchive application ID. |
InfoArchive DSS Application | | A value to use in the application element of Submission Information Package (SIP) descriptors generated by the processor. |
InfoArchive DSS Entity | | A value to use in the entity element of Submission Information Package (SIP) descriptors generated by the processor. |
InfoArchive DSS Holding | | A value to use in the holding element of Submission Information Package (SIP) descriptors generated by the processor. |
InfoArchive DSS PDI Schema | | A value to use in the pdi_schema element of Submission Information Package (SIP) descriptors generated by the processor. |
InfoArchive DSS Producer | | A value to use in the producer element of Submission Information Package (SIP) descriptors generated by the processor. |
InfoArchive XSL Transform Path | | The path to a file containing an XSL transform that converts the Knowledge Discovery document metadata into the PDI file used in the Submission Information Package (SIP). |
OAuth2 Site Name | | To configure OAuth authentication, right-click the processor and click ADVANCED, and follow the instructions on the OAUTH SETUP tab. |
OAuth2 Sites File | | To configure OAuth authentication, right-click the processor and click ADVANCED, and follow the instructions on the OAUTH SETUP tab. |
Document Registry Service | | A DocumentRegistryServiceImpl controller service that manages and updates a document registry database. This ensures that documents are indexed in the correct order. |
OAuth Database Service | | A DatabaseServiceImpl to use to store OAuth tokens in an external database. This is necessary if you want to use the processor on a NiFi cluster. |
Proxy Configuration Service | | A ProxyConfigurationServiceImpl that specifies the proxy server to use. |
SSL Config Service | | An optional IdolSSLConfigServiceImpl that specifies the settings to use to index documents over SSL/TLS. |
InfoArchive Batch Size | 100 | The number of Knowledge Discovery documents to include in each Submission Information Package (SIP). |
InfoArchive Batch Timeout | 10 mins | The maximum amount of time to wait for documents before sending a Submission Information Package (SIP) to InfoArchive. |
InfoArchive DSS Base Retention Date | +0 | A value to use in the base_retention_date element of Submission Information Package (SIP) descriptors generated by the processor. Specify a time duration (an offset from the time when the processor runs), for example "-3 weeks". |
InfoArchive DSS ID | | A value to use in the id element of Submission Information Package (SIP) descriptors generated by the processor. |
InfoArchive DSS Priority | 0 | A value to use in the priority element of Submission Information Package (SIP) descriptors generated by the processor. |
InfoArchive DSS Production Date | +0 | A value to use in the production_date element of Submission Information Package (SIP) descriptors generated by the processor. Specify a time duration (an offset from the time when the processor runs), for example "-1 day". |
InfoArchiveOrderByDateFormats | "ISO-8601" | The date format to expect when you set InfoArchiveOrderByFieldType to Date. Specify a standard Knowledge Discovery date format. |
InfoArchiveOrderByFieldPath | | The path to a document metadata field to use to sort documents before creating SIP files. |
InfoArchiveOrderByFieldType | String | The type of data contained in the field specified by InfoArchiveOrderByFieldPath. The field value is converted to this type for sorting. The values referenced on this page are String (the default) and Date. |
InfoArchivePartitionByFieldPath | | The path to a document metadata field to use to partition documents before creating SIP files. |
InfoArchivePartitionByRegex | (.*) | A regular expression with a single match group that extracts the part of the field value to use to partition documents. For example, ([-\d]+)T.* partitions documents by the year, month, and day contained within an ISO-8601 timestamp: two documents created on different days are inserted in different SIPs, but documents created on the same day might be in the same SIP. |
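As a rough illustration of how the InfoArchive DSS properties relate to the SIP descriptor, the following Python sketch builds an XML fragment with one element per property. The element names (holding, id, pdi_schema, producer, entity, priority, application) come from the property descriptions above. The enclosing dss element name, the sample values, and the build_dss helper are assumptions made for illustration, and a real descriptor may contain additional elements that are omitted here; see the InfoArchive documentation for the authoritative layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical property values; the keys mirror the element names
# named in the property descriptions.
dss_properties = {
    "holding": "EmailHolding",
    "id": "batch-0001",
    "pdi_schema": "urn:example:email:1.0",
    "producer": "NiFi",
    "entity": "Legal",
    "priority": "0",
    "application": "EmailArchive",
}


def build_dss(properties):
    """Build an XML element with one child element per configured property.

    The wrapper element name "dss" is an assumption for this sketch.
    """
    dss = ET.Element("dss")
    for name, value in properties.items():
        child = ET.SubElement(dss, name)
        child.text = value
    return dss


dss_xml = ET.tostring(build_dss(dss_properties), encoding="unicode")
print(dss_xml)
```

The sketch only shows the mapping from property to element; the processor generates the actual descriptor itself.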
Relationships
Name | Description |
---|---|
success | FlowFiles that were indexed successfully. |
failure | FlowFiles that were not indexed successfully. |
Sort and Partition Data
A deployment of InfoArchive can contain a large amount of data. To help InfoArchive store the data efficiently, documents must be sorted and partitioned (by a relevant key such as created or modified date) before they are sent to InfoArchive. OpenText recommends sorting documents before sending them to the PutInfoArchive processor, but if this is not possible, the processor can sort and partition documents itself. If you use these options, OpenText recommends making the input queue to the PutInfoArchive processor large (many times the InfoArchive Batch Size), so that the processor has a wide range of documents to choose from when creating batches.
To enable sorting, specify the path of the field to sort on by setting InfoArchiveOrderByFieldPath. For example, you could choose a field that contains a timestamp, so that InfoArchive receives the documents ordered by creation date.
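The following Python sketch illustrates why the field type matters when sorting. It is not the processor's implementation: the sort_key helper, the document dictionaries, and the created field name are hypothetical, and Python's datetime.fromisoformat stands in for the configured date format. Two timestamps with different UTC offsets sort one way as strings and the opposite way as dates.

```python
from datetime import datetime


def sort_key(value, field_type):
    """Convert a raw field value to the configured type before comparing."""
    if field_type == "Date":
        # Parse the ISO-8601 timestamp, including its UTC offset.
        return datetime.fromisoformat(value)
    return str(value)


# Hypothetical documents: "x" is earlier in absolute time than "y",
# even though its local date is later.
docs = [
    {"reference": "x", "created": "2024-03-15T01:00:00+05:00"},
    {"reference": "y", "created": "2024-03-14T22:00:00+00:00"},
]

by_string = [d["reference"]
             for d in sorted(docs, key=lambda d: sort_key(d["created"], "String"))]
by_date = [d["reference"]
           for d in sorted(docs, key=lambda d: sort_key(d["created"], "Date"))]

print(by_string)
print(by_date)
```

Sorting as a string compares the text character by character, while sorting as a date compares the instants that the timestamps represent.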
To enable partitioning, set InfoArchivePartitionByFieldPath, which specifies the path of the field to use for partitioning. Partitioning means that documents with different field values are never included in the same SIP file.
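The effect of partitioning combined with a batch size can be sketched in Python as follows. This is a minimal sketch under assumptions, not the processor's algorithm: the make_batches function, the document dictionaries, and the day field are hypothetical. It groups documents by partition key and then splits each group into batches, so no batch mixes two partition values.

```python
from itertools import groupby


def make_batches(docs, partition_key, batch_size):
    """Group documents by partition key, then split each group into batches."""
    batches = []
    # groupby only merges adjacent items, so sort by the key first.
    ordered = sorted(docs, key=partition_key)
    for _, group in groupby(ordered, key=partition_key):
        members = list(group)
        for start in range(0, len(members), batch_size):
            batches.append(members[start:start + batch_size])
    return batches


# Hypothetical documents with a precomputed "day" partition value.
docs = [
    {"id": 1, "day": "2024-03-14"},
    {"id": 2, "day": "2024-03-15"},
    {"id": 3, "day": "2024-03-14"},
    {"id": 4, "day": "2024-03-14"},
]

batches = make_batches(docs, partition_key=lambda d: d["day"], batch_size=2)
batch_ids = [[d["id"] for d in batch] for batch in batches]
print(batch_ids)
```

In this sketch each batch corresponds to one SIP file: a partition with more documents than the batch size produces several batches, but a batch never spans two partitions.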
TIP: When partitioning by timestamp, you might want to extract a substring of the timestamp. For example, you might want documents created on different days to be inserted in different partitions, but it would be unnecessary to partition documents created a few seconds apart. To use a substring of a field value, set the InfoArchivePartitionByRegex property.
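To see what the example expression ([-\d]+)T.* extracts from a timestamp, the following Python sketch applies it to a few sample values. The processor's own regular expression engine and matching mode are not described here, so treat this as an approximation: the partition_key helper, the whole-value match, and the fallback to the full field value when nothing matches are all assumptions.

```python
import re

# The example expression from the InfoArchivePartitionByRegex description.
PARTITION_REGEX = re.compile(r"([-\d]+)T.*")


def partition_key(field_value):
    """Return the first match group, or the whole value if there is no match.

    The fallback behavior is an assumption made for this sketch.
    """
    match = PARTITION_REGEX.fullmatch(field_value)
    return match.group(1) if match else field_value


morning = partition_key("2024-03-15T08:01:12Z")
evening = partition_key("2024-03-15T21:47:03Z")
next_day = partition_key("2024-03-16T00:00:05Z")

print(morning, evening, next_day)
```

The two timestamps from the same day yield the same partition key, so those documents could share a SIP file, while the timestamp from the following day yields a different key.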