Overview

It is sometimes convenient to update CollectionSpace data directly in the database using SQL, rather than through the CollectionSpace UI or REST API. After changes are made this way, the Nuxeo full-text index usually needs to be updated; otherwise, the new values will not be found by full-text (keyword) search. The "Reindex Full Text" batch job can be used to perform this reindexing.

...

The batch job is adapted from the nuxeo-reindex-fulltext Nuxeo plugin. Nuxeo has no public API that only recomputes the full text of a document; the only way to trigger that logic is to save the document. So, for each document to be reindexed, the batch job temporarily changes an arbitrary field (dc:title), saves the document, restores the field's original value, and saves the document again. This is done through a low-level Nuxeo session, so that the document's last-modified date is not changed and Nuxeo event handlers do not fire.

...

Use the Services REST API to install the batch job in a tenant by POSTing XML to /cspace-services/batch. An example curl command follows, where the XML is in a file called install-payload.xml. Replace <username>, <password>, and <hostname> with appropriate values.

Code Block

curl -X POST -i -u "<username>:<password>" https://<hostname>/cspace-services/batch -T install-payload.xml
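
The csid of the newly installed batch job is needed later to invoke and update it. A minimal sketch for capturing it follows, assuming (as is typical for CollectionSpace create requests) that the service responds with a Location header whose last path segment is the new record's csid. The function name csid_from_location is hypothetical, not part of CollectionSpace.

```shell
# Hypothetical helper (not part of CollectionSpace): read raw HTTP response
# headers on stdin and print the last path segment of the Location header,
# which is assumed to be the csid of the record just created.
csid_from_location() {
  # Strip the CRs of CRLF line endings, then keep only the trailing path
  # segment of the Location header line. Both "Location" and "location"
  # are accepted, since HTTP/2 servers send lowercase header names.
  tr -d '\r' |
    sed -n 's#^[Ll]ocation:[[:space:]]*.*/\([^/[:space:]]*\)[[:space:]]*$#\1#p'
}
```

For example, run the install command with -s -D - -o /dev/null in place of -i, so that only the response headers are written to standard output, and pipe the result into csid_from_location.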

...

The XML below registers the Reindex Full Text batch job to run on all doctypes known to the PAHMA 3.3 tenant; doctypes in other tenants and versions may differ. To find the appropriate doctypes, log in to the deployment's server. Under the Tomcat installation directory, the directory cspace/config/services/tenants/<tenantname> contains the file tenant-bindings.merged.xml. In that file, each doctype is represented by a service:object tag, and the doctype name is given by the name attribute of that tag.

Code Block

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<document name="batch">
	<ns2:batch_common xmlns:ns2="http://collectionspace.org/services/batch" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
		<name>Reindex Full Text</name>
		<notes>Recomputes the indexed full text of all or specified records.</notes>
		<className>org.collectionspace.services.batch.nuxeo.ReindexFullTextBatchJob</className>
		<supportsNoContext>true</supportsNoContext>
		<supportsSingleDoc>true</supportsSingleDoc>
		<supportsDocList>true</supportsDocList>
		<supportsGroup>false</supportsGroup>
		<createsNewFocus>false</createsNewFocus>
		<forDocTypes>
			<forDocType>Acquisition</forDocType>
			<forDocType>Batch</forDocType>
			<forDocType>CSNote</forDocType>
			<forDocType>Citation</forDocType>
			<forDocType>Citationauthority</forDocType>
			<forDocType>Claim</forDocType>
			<forDocType>Conceptauthority</forDocType>
			<forDocType>Conceptitem</forDocType>
			<forDocType>Contact</forDocType>
			<forDocType>Dimension</forDocType>
			<forDocType>Group</forDocType>
			<forDocType>Intake</forDocType>
			<forDocType>Loanin</forDocType>
			<forDocType>Loanout</forDocType>
			<forDocType>Locationauthority</forDocType>
			<forDocType>Locationitem</forDocType>
			<forDocType>ObjectExit</forDocType>
			<forDocType>Organization</forDocType>
			<forDocType>Orgauthority</forDocType>
			<forDocType>Person</forDocType>
			<forDocType>Personauthority</forDocType>
			<forDocType>Placeauthority</forDocType>
			<forDocType>Placeitem</forDocType>
			<forDocType>PublicItem</forDocType>
			<forDocType>Report</forDocType>
			<forDocType>Taxon</forDocType>
			<forDocType>Taxonomyauthority</forDocType>
			<forDocType>Vocabulary</forDocType>
			<forDocType>Vocabularyitem</forDocType>
			<forDocType>CollectionObject</forDocType>
			<forDocType>Blob</forDocType>
			<forDocType>Media</forDocType>
			<forDocType>Movement</forDocType>
			<forDocType>Relation</forDocType>
		</forDocTypes>
	</ns2:batch_common>
</document>
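
Rather than reading tenant-bindings.merged.xml by eye, the doctype names can be extracted with standard text tools. The sketch below assumes that each service:object start tag, including its name attribute, sits on a single line; the function names list_doctypes and to_fordoctypes are hypothetical, and the output should be reviewed before it is pasted into <forDocTypes>.

```shell
# Hypothetical helper: print the name attribute of every service:object tag
# in the given tenant-bindings.merged.xml, one per line, sorted and unique.
# Assumes each start tag (with its name attribute) fits on one line.
list_doctypes() {
  grep -o '<service:object[^>]*[[:space:]]name="[^"]*"' "$1" |
    sed 's/.*[[:space:]]name="\([^"]*\)"$/\1/' |
    sort -u
}

# Hypothetical helper: wrap each doctype name read from stdin in a
# <forDocType> element, indented to match the install payload.
to_fordoctypes() {
  while IFS= read -r name || [ -n "$name" ]; do
    printf '\t\t\t<forDocType>%s</forDocType>\n' "$name"
  done
}
```

For example, list_doctypes "$CATALINA_HOME/cspace/config/services/tenants/<tenantname>/tenant-bindings.merged.xml" | to_fordoctypes prints lines ready to paste between <forDocTypes> and </forDocTypes>. CATALINA_HOME is assumed here to point at the Tomcat installation directory.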

...

An example curl command to execute the batch job follows, where the XML payload is in a file called reindex_payload.xml. Replace <username>, <password>, and <hostname> with appropriate values. Replace <csid> with the csid of the Reindex Full Text batch job.

Code Block

curl -X POST -i -u "<username>:<password>" https://<hostname>/cspace-services/batch/<csid> -T reindex_payload.xml

...

To reindex a single document through the REST API, post an XML payload containing the document's csid and doctype to the batch job. An example payload follows:

Code Block

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<ns2:invocationContext xmlns:ns2="http://collectionspace.org/services/common/invocable">
	<mode>single</mode>
	<docType>CollectionObject</docType>
	<singleCSID>5c33d85a-e4f2-41f3-b187</singleCSID>
</ns2:invocationContext>

...

To reindex a list of documents through the REST API, post an XML payload containing the doctype of the documents and a list of their csids to the batch job. All of the listed documents are reindexed in a single database transaction. An example payload follows:

Code Block

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<ns2:invocationContext xmlns:ns2="http://collectionspace.org/services/common/invocable">
	<mode>list</mode>
	<docType>CollectionObject</docType>
	<listCSIDs>
		<csid>5c33d85a-e4f2-41f3-b187</csid>
		<csid>3716f6ff-132c-49d1-87cb</csid>
		<csid>716be5a9-c8e3-40a6-959d</csid>
	</listCSIDs>
</ns2:invocationContext>
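
When the documents to reindex were identified by a SQL query, it can be convenient to generate the list payload from the query's output instead of typing csids by hand. A minimal sketch follows; the function name list_payload is hypothetical, and it assumes csids arrive on standard input one per line.

```shell
# Hypothetical helper: write a list-mode invocation payload to stdout.
# $1 is the doctype; csids are read from stdin, one per line.
# Blank lines are skipped. csids are inserted verbatim (no XML escaping),
# which is safe for plain UUID-style csids.
list_payload() {
  printf '%s\n' '<?xml version="1.0" encoding="utf-8" standalone="yes"?>'
  printf '%s\n' '<ns2:invocationContext xmlns:ns2="http://collectionspace.org/services/common/invocable">'
  printf '\t<mode>list</mode>\n\t<docType>%s</docType>\n\t<listCSIDs>\n' "$1"
  # The "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
  while IFS= read -r csid || [ -n "$csid" ]; do
    if [ -n "$csid" ]; then
      printf '\t\t<csid>%s</csid>\n' "$csid"
    fi
  done
  printf '\t</listCSIDs>\n</ns2:invocationContext>\n'
}
```

For example, list_payload CollectionObject < csids.txt > reindex_payload.xml turns a (hypothetical) file of csids into the payload. Because the whole list is reindexed in a single database transaction, a very long list may be better split into several smaller payloads, for instance with split -l.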

...

All documents of a given doctype may be reindexed in a single batch invocation. To do this, post an XML payload in no-context mode that specifies the doctype to be reindexed. An example payload follows:

Code Block

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<ns2:invocationContext xmlns:ns2="http://collectionspace.org/services/common/invocable">
	<mode>nocontext</mode>
	<docType>CollectionObject</docType>
</ns2:invocationContext>

...

Additional doctypes to be reindexed may be specified as docType parameters. An example payload follows:

Code Block

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<ns2:invocationContext xmlns:ns2="http://collectionspace.org/services/common/invocable">
	<mode>nocontext</mode>
	<docType>CollectionObject</docType>
	<params>
		<param>
			<key>docType</key>
			<value>Acquisition</value>
		</param>
		<param>
			<key>docType</key>
			<value>Loanin</value>
		</param>
		<param>
			<key>docType</key>
			<value>Loanout</value>
		</param>
		<param>
			<key>docType</key>
			<value>Person</value>
		</param>
	</params>
</ns2:invocationContext>

...

In no-context mode, if no doctypes are specified, all known doctypes will be reindexed. The following XML payload reindexes all documents in the system, one doctype at a time:

Code Block

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<ns2:invocationContext xmlns:ns2="http://collectionspace.org/services/common/invocable">
	<mode>nocontext</mode>
</ns2:invocationContext>
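
The three no-context payloads above differ only in how many doctypes they name: the first doctype goes in <docType>, any others become docType parameters, and with none at all every known doctype is reindexed. A sketch of a generator that follows this pattern is shown below; the function name nocontext_payload is hypothetical.

```shell
# Hypothetical helper: write a no-context invocation payload to stdout.
# With no arguments, the payload names no doctype, so all known doctypes
# are reindexed. Otherwise the first argument becomes <docType> and each
# further argument is added as a docType parameter.
nocontext_payload() {
  printf '%s\n' '<?xml version="1.0" encoding="utf-8" standalone="yes"?>'
  printf '%s\n' '<ns2:invocationContext xmlns:ns2="http://collectionspace.org/services/common/invocable">'
  printf '\t<mode>nocontext</mode>\n'
  if [ "$#" -gt 0 ]; then
    printf '\t<docType>%s</docType>\n' "$1"
    shift
    if [ "$#" -gt 0 ]; then
      printf '\t<params>\n'
      for doctype in "$@"; do
        printf '\t\t<param>\n\t\t\t<key>docType</key>\n\t\t\t<value>%s</value>\n\t\t</param>\n' "$doctype"
      done
      printf '\t</params>\n'
    fi
  fi
  printf '%s\n' '</ns2:invocationContext>'
}
```

For example, nocontext_payload CollectionObject Acquisition Loanin Loanout Person > reindex_payload.xml reproduces the multi-doctype example, and nocontext_payload with no arguments reproduces the payload that reindexes everything.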

...

The following example reindexes CollectionObjects from batch number 10 through batch number 20, using a batch size of 1000 and a pause of 100 ms between batches:

Code Block

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<ns2:invocationContext xmlns:ns2="http://collectionspace.org/services/common/invocable">
	<mode>nocontext</mode>
	<params>
		<param>
			<key>batchSize</key>
			<value>1000</value>
		</param>
		<param>
			<key>startBatch</key>
			<value>10</value>
		</param>
		<param>
			<key>endBatch</key>
			<value>20</value>
		</param>
		<param>
			<key>batchPause</key>
			<value>100</value>
		</param>
		<param>
			<key>docType</key>
			<value>CollectionObject</value>
		</param>
	</params>
</ns2:invocationContext>
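
For a large doctype it may be convenient to generate paged payloads like the one above and submit them one range of batches at a time. A minimal sketch follows; the function name paged_payload and its argument order are hypothetical, and the sketch makes no assumption about how batches are numbered: the values are copied into the payload unchanged.

```shell
# Hypothetical helper: write a no-context payload for one doctype with the
# batchSize, startBatch, endBatch, and batchPause parameters.
# Usage: paged_payload DOCTYPE BATCH_SIZE START_BATCH END_BATCH PAUSE_MS
paged_payload() {
  if [ "$#" -ne 5 ]; then
    echo "usage: paged_payload DOCTYPE BATCH_SIZE START_BATCH END_BATCH PAUSE_MS" >&2
    return 1
  fi
  printf '%s\n' '<?xml version="1.0" encoding="utf-8" standalone="yes"?>'
  printf '%s\n' '<ns2:invocationContext xmlns:ns2="http://collectionspace.org/services/common/invocable">'
  printf '\t<mode>nocontext</mode>\n\t<params>\n'
  for kv in "batchSize=$2" "startBatch=$3" "endBatch=$4" "batchPause=$5" "docType=$1"; do
    # ${kv%%=*} is the key (text before the first "="); ${kv#*=} is the value.
    printf '\t\t<param>\n\t\t\t<key>%s</key>\n\t\t\t<value>%s</value>\n\t\t</param>\n' \
      "${kv%%=*}" "${kv#*=}"
  done
  printf '\t</params>\n</ns2:invocationContext>\n'
}
```

Calling paged_payload CollectionObject 1000 10 20 100 produces a payload equivalent to the example above.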

...

After installation, it may be necessary to update the batch job, for example to register additional doctypes or to change its name or notes. To do this, use the REST API to PUT an XML payload to the batch job. An example curl command follows, where the XML is in a file called update-payload.xml. Replace <username>, <password>, and <hostname> with appropriate values. Replace <csid> with the csid of the Reindex Full Text batch job.

Code Block

curl -X PUT -i -u "<username>:<password>" https://<hostname>/cspace-services/batch/<csid> -T update-payload.xml

The example XML payload below changes the <forDocTypes> of the batch job:

Code Block

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<document name="batch">
	<ns2:batch_common xmlns:ns2="http://collectionspace.org/services/batch" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
		<forDocTypes>
			<forDocType>Acquisition</forDocType>
			<forDocType>Claim</forDocType>
			<forDocType>CollectionObject</forDocType>
		</forDocTypes>
	</ns2:batch_common>
</document>
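
When many doctypes need to be registered, the update payload can be generated from a list of names, for example the names found in tenant-bindings.merged.xml as described earlier. A minimal sketch follows; the function name update_fordoctypes_payload is hypothetical and reads doctype names from standard input, one per line.

```shell
# Hypothetical helper: write a batch update payload whose <forDocTypes>
# contains one <forDocType> per doctype name read from stdin.
# Blank lines are skipped.
update_fordoctypes_payload() {
  printf '%s\n' '<?xml version="1.0" encoding="utf-8" standalone="yes"?>'
  printf '%s\n' '<document name="batch">'
  printf '\t%s\n' '<ns2:batch_common xmlns:ns2="http://collectionspace.org/services/batch" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
  printf '\t\t<forDocTypes>\n'
  while IFS= read -r name || [ -n "$name" ]; do
    if [ -n "$name" ]; then
      printf '\t\t\t<forDocType>%s</forDocType>\n' "$name"
    fi
  done
  printf '\t\t</forDocTypes>\n\t</ns2:batch_common>\n</document>\n'
}
```

For example, printf '%s\n' Acquisition Claim CollectionObject | update_fordoctypes_payload > update-payload.xml produces the payload shown above.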

...