Imports Service Home

The Imports service accepts uploads in a specific XML format and imports the records found in each request. Successfully imported records become available in the Nuxeo shell, the Nuxeo Workbench, and CollectionSpace.

If you are importing Location / Movement / Inventory records, or any other types of records that can be "locked" and prevented from further modification, please see the special note on Specifying the lifecycle for imported records.

The request

Sending the request:

(In both of the following examples, be sure to substitute the actual name of your import file for myimportfile.xml.)

In CollectionSpace v2.2 and above, by submitting your XML input file as an 'application/xml' part of a 'multipart/form-data' payload:

curl "http://localhost:8180/cspace-services/imports?type=xml" \
    -i \
    -u admin@core.collectionspace.org:Administrator \
    -F "file=@myimportfile.xml;type=application/xml"

In any recent version of CollectionSpace, including v2.2 and above, by submitting your XML input file as an 'application/xml' payload:

curl -X POST http://localhost:8180/cspace-services/imports \
     -i \
     -u admin@core.collectionspace.org:Administrator \
     -H "Content-Type: application/xml" \
     -T myimportfile.xml
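
For scripted imports, the second curl form above can be approximated in Python. The following is a minimal sketch using only the standard library. The function names build_import_request and send_import are hypothetical, and the URL and credentials are simply the sample values from the curl commands. It assumes HTTP Basic authentication, which is what curl's -u option sends by default. The request is built without being sent, so nothing contacts a server until send_import is called.

```python
import base64
import urllib.request


def build_import_request(base_url, xml_bytes, username, password):
    """Build (but do not send) a POST request carrying an import file.

    base_url is the services root, e.g. http://localhost:8180/cspace-services
    """
    credentials = f"{username}:{password}".encode("utf-8")
    headers = {
        "Content-Type": "application/xml",
        # HTTP Basic authentication, equivalent to curl's -u option.
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/imports",
        data=xml_bytes,
        headers=headers,
        method="POST",
    )


def send_import(request, timeout_seconds=60):
    """Send a prepared request; returns (HTTP status, response body text)."""
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return response.status, response.read().decode("utf-8")


# Sample values taken from the curl examples; the body is a placeholder.
request = build_import_request(
    "http://localhost:8180/cspace-services",
    b"<imports/>",
    "admin@core.collectionspace.org",
    "Administrator",
)
print(request.get_method(), request.full_url)
```

In real use, xml_bytes would be the contents of your import file, read in binary mode.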

Large Payloads and Timeouts

Large payloads can cause CollectionSpace to time out. If a request times out, no data is imported into CollectionSpace. The default timeout for all requests to CollectionSpace is defined by your administrator in the file at ${CSPACE_JEESERVER_HOME}/webapps/cspace-services/META-INF/context.xml.

You can also specify the timeout for an individual request by specifying a URL query parameter. The query parameter is "impTimout" with units of seconds and overrides the default value. For example, to make a request that times out after 1 hour, use the following URL:

http://localhost:8180/cspace-services/imports?impTimout=3600
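
If you build request URLs in a script, the timeout parameter can be appended programmatically. This is a small sketch, not part of CollectionSpace itself; the helper name with_import_timeout is hypothetical. It keeps any existing query parameters (such as type=xml) and replaces an earlier impTimout value if one is present.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_import_timeout(url, timeout_seconds):
    """Return url with the impTimout query parameter set, in seconds."""
    parts = urlsplit(url)
    # Keep every existing parameter except a previous impTimout.
    query = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name != "impTimout"
    ]
    query.append(("impTimout", str(int(timeout_seconds))))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_import_timeout("http://localhost:8180/cspace-services/imports", 3600))
```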

Example Request

The example requests above both uploaded an XML-based import file named myimportfile.xml, from the local filesystem to a CollectionSpace server.

Here's an example, in turn, of what an import file looks like. This import file creates two Object Exit records, with each record containing data in two fields, exitNumber and exitNote:

<?xml version="1.0" encoding="UTF-8"?>
<imports>
  <import service="ObjectExit" type="ObjectExit">
    <schema xmlns:objectexit_common="http://collectionspace.org/services/objectexit" name="objectexit_common">
      <exitNumber>OE2010.1</exitNumber>
      <exitNote>Puedo comer vidrio, no me hace daño. Je peux manger du verre, ça ne me fait pas mal.</exitNote>
    </schema>
  </import>
  <import service="ObjectExit" type="ObjectExit">
    <schema xmlns:objectexit_common="http://collectionspace.org/services/objectexit" name="objectexit_common">
      <exitNumber>OE2010.2</exitNumber>
      <exitNote>This USASCII OBJECT has been declared a World Heritage Nuisance and must be disposed of forthwith - Sebastian</exitNote>
    </schema>
  </import>
</imports>
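
An import file like the one above can also be generated from tabular data. The sketch below is one possible approach, assuming the same Object Exit structure and namespace shown in the example; the function name build_objectexit_imports is hypothetical. Field values are XML-escaped, but field names are written as-is, so they must already be valid element names.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Namespace URI copied from the example import file above.
OBJECTEXIT_NS = "http://collectionspace.org/services/objectexit"


def build_objectexit_imports(records):
    """Build an import document from a list of {field name: value} dicts."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<imports>"]
    for record in records:
        lines.append('  <import service="ObjectExit" type="ObjectExit">')
        lines.append(
            f"    <schema xmlns:objectexit_common={quoteattr(OBJECTEXIT_NS)}"
            ' name="objectexit_common">'
        )
        for field, value in record.items():
            # escape() protects &, < and > in the field value.
            lines.append(f"      <{field}>{escape(value)}</{field}>")
        lines.append("    </schema>")
        lines.append("  </import>")
    lines.append("</imports>")
    return "\n".join(lines) + "\n"


payload = build_objectexit_imports(
    [
        {"exitNumber": "OE2010.1", "exitNote": "Fragile & heavy"},
        {"exitNumber": "OE2010.2", "exitNote": "Returned to lender"},
    ]
)
print(payload)
```

When writing the result to disk, encode it as UTF-8 so that it matches the encoding named in the XML declaration.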

Example Result

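Here is an example response from a successful import request. Note that the doctype and CSID values shown here correspond to the Person Authority import file that follows this response, rather than to the Object Exit import file above: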

Example Response
<?xml version="1.0"?>
<import>
    <msg>SUCCESS</msg>
    <importedRecords>
        <importedRecord>
            <doctype>Personauthority</doctype>
            <csid>11111111-2222-3333-4444-123456789012</csid>
        </importedRecord>
        <importedRecord>
            <doctype>Personauthority</doctype>
            <csid>2b1d79d9-e240-46e6-9932-74cc45b54e73</csid>
        </importedRecord>
    </importedRecords>
    <status>Success</status>
    <totalRecordsImported>2</totalRecordsImported>
    <numRecordsImportedByDocType>
        <numRecordsImported>
            <docType>Personauthority</docType>
            <numRecords>2</numRecords>
        </numRecordsImported>
    </numRecordsImportedByDocType>
    <report>READ: /usr/local/share/apache-tomcat-7.0.64/temp/imports-109403651390819578/Personauthorities/11111111-2222-3333-4444-123456789012/document.xml /Personauthorities/11111111-2222-3333-4444-123456789012 READ: /usr/local/share/apache-tomcat-7.0.64/temp/imports-109403651390819578/Personauthorities/2b1d79d9-e240-46e6-9932-74cc45b54e73/document.xml /Personauthorities/2b1d79d9-e240-46e6-9932-74cc45b54e73 </report>
</import>
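
A script that submits imports will usually want to pull the status, record count, and CSIDs out of a response like the one above. The following is a minimal sketch based only on the element names visible in that example; the function name summarize_import_response is hypothetical, and the sample string is an abbreviated copy of the response shown above.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the example response above.
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<import>
    <msg>SUCCESS</msg>
    <importedRecords>
        <importedRecord>
            <doctype>Personauthority</doctype>
            <csid>11111111-2222-3333-4444-123456789012</csid>
        </importedRecord>
        <importedRecord>
            <doctype>Personauthority</doctype>
            <csid>2b1d79d9-e240-46e6-9932-74cc45b54e73</csid>
        </importedRecord>
    </importedRecords>
    <status>Success</status>
    <totalRecordsImported>2</totalRecordsImported>
</import>
"""


def summarize_import_response(response_xml):
    """Extract the message, total count, and (doctype, csid) pairs."""
    root = ET.fromstring(response_xml)
    return {
        "msg": root.findtext("msg"),
        # Fall back to 0 when the element is absent from the response.
        "total": int(root.findtext("totalRecordsImported", default="0")),
        "records": [
            (record.findtext("doctype"), record.findtext("csid"))
            for record in root.iter("importedRecord")
        ],
    }


summary = summarize_import_response(SAMPLE_RESPONSE)
print(summary["msg"], summary["total"])
for doctype, csid in summary["records"]:
    print(doctype, csid)
```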

 

The following example import file creates two Person Authority records. (Each authority can, in turn, contain zero or more person terms.) The first Person Authority in the payload will be assigned a random CollectionSpace ID (CSID), while the second will be created with a supplied ID.

<?xml version="1.0" encoding="UTF-8"?>
<imports>
    <import seq="1" service="Personauthorities" type="Personauthority">
        <schema xmlns:personauthorities_common="http://collectionspace.org/services/person" name="personauthorities_common">
            <personauthorities_common:displayName>American Poets</personauthorities_common:displayName>
            <personauthorities_common:shortIdentifier>americanpoets</personauthorities_common:shortIdentifier>
            <personauthorities_common:vocabType>PersonAuthority</personauthorities_common:vocabType>
            <personauthorities_common:refName>urn:cspace:core.collectionspace.org:Personauthorities:name(americanpoets)'American Poets'</personauthorities_common:refName>
        </schema>
    </import>
    <import seq="2" service="Personauthorities" type="Personauthority" CSID="11111111-2222-3333-4444-123456789012">
        <schema xmlns:personauthorities_common="http://collectionspace.org/services/person" name="personauthorities_common">
            <personauthorities_common:displayName>French Poets</personauthorities_common:displayName>
            <personauthorities_common:shortIdentifier>frenchpoets</personauthorities_common:shortIdentifier>
            <personauthorities_common:vocabType>PersonAuthority</personauthorities_common:vocabType>
            <personauthorities_common:refName>urn:cspace:core.collectionspace.org:Personauthorities:name(frenchpoets)'French Poets'</personauthorities_common:refName>
        </schema>
    </import>
</imports>
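
The refName values in this example appear to follow a fixed pattern built from the tenant domain, the service name, the short identifier, and the display name. The sketch below reproduces that pattern as inferred from the two examples above; it is not an authoritative definition of the refName format, and the function name authority_ref_name is hypothetical.

```python
def authority_ref_name(tenant_domain, service, short_identifier, display_name):
    """Assemble a refName in the pattern seen in the example import file."""
    return (
        f"urn:cspace:{tenant_domain}:{service}"
        f":name({short_identifier})'{display_name}'"
    )


print(
    authority_ref_name(
        "core.collectionspace.org", "Personauthorities", "americanpoets", "American Poets"
    )
)
```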

Sample Requests

There are also two samples of Import files included in CollectionSpace's source code tree, for testing. These files can be used to import, respectively:

Structure of the request:

The body of an imports request is an XML document, which contains data for one or more CollectionSpace records to import.
There are three main elements in this document:

| Element | Notes |
|---|---|
| <imports> | A container element for multiple records to be imported. |
|   <import> | Identifies an import of one record into a particular CollectionSpace service (e.g. Cataloging, Acquisitions, Loans Out, Persons, Organizations). You can set attributes for this record as shown below. |
|     <schema> | Contains the actual data for one part of that record, conforming to a particular schema. You can have multiple <schema> elements (parts) per record. These might include schema elements for a record's common part - the default set of fields for that type of record - and (optionally) one or more extension parts. This data conforms to the format used by the Nuxeo import and export services. |

(The format of the <schema> element is discussed further below in How to determine the correct values to put into a /imports/import/schema element.)

Attribute values in the request

Here are the available attributes for the three main elements:

| Element | Attribute | Required/Optional | Value |
|---|---|---|---|
| imports | – | | |
| import | service | Required | This routes you to the correct service. |
| import | type | Required | This declares your document type as Nuxeo sees it. |
| import | CSID | Optional | This may be absent. If present, it must be a value as returned from UUID. If present, it forces the record to use the CSID as its CSID, which is also the nuxeo document ID. |
| import | seq | Optional | This value does not appear in an imported document at all. It is only present to help determine which element caused a problem in the event of bad data. This number will appear in status reports from the imports service. Currently, the import is done in file order. |
| import | createdAt | Optional | This gives the record a user-supplied createdAt timestamp, rather than simply reflecting the date and time when the record was imported. Useful for preserving metadata (such as original creation dates and times) when importing legacy data. |
| import | createdBy | Optional | This gives the record a user-supplied identifier for the creator of the record. Useful for preserving metadata when importing legacy data. |
| import | updatedAt | Optional | This gives the record a user-supplied updatedAt timestamp, rather than simply reflecting the date and time when the record was imported. Useful for preserving metadata when importing legacy data. |
| import | updatedBy | Optional | This gives the record a user-supplied identifier for the last updater of the record. Useful for preserving metadata when importing legacy data. |
| schema | – | | |

Example:

<imports><import service="Personauthorities" type="Personauthority">...

Example:

<imports><import service="Personauthorities" type="Personauthority" CSID="11111111-2222-3333-4444-123456789012">...

Example:

<imports><import service="Intakes" type="Intake">...

Example:

<imports><import service="Intakes" type="Intake" createdAt="2012-05-31T18:18:18Z" createdBy="creator@example.com">...
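
When generating import files from legacy data, the opening <import> tag with its optional attributes can be assembled by a small helper. This is a sketch only; the function name import_open_tag is hypothetical, and the timestamp layout is assumed from the createdAt example above (UTC with a trailing Z). Pass timezone-aware datetime values so that the conversion to UTC is unambiguous.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import quoteattr


def _utc_stamp(moment):
    """Format an aware datetime like the createdAt example above."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def import_open_tag(service, doc_type, csid=None, seq=None,
                    created_at=None, created_by=None,
                    updated_at=None, updated_by=None):
    """Return an opening <import> tag; optional attributes are omitted if unset."""
    attrs = [("service", service), ("type", doc_type)]
    if csid:
        attrs.append(("CSID", csid))
    if seq is not None:
        attrs.append(("seq", str(seq)))
    if created_at is not None:
        attrs.append(("createdAt", _utc_stamp(created_at)))
    if created_by:
        attrs.append(("createdBy", created_by))
    if updated_at is not None:
        attrs.append(("updatedAt", _utc_stamp(updated_at)))
    if updated_by:
        attrs.append(("updatedBy", updated_by))
    return "<import " + " ".join(f"{n}={quoteattr(v)}" for n, v in attrs) + ">"


print(
    import_open_tag(
        "Intakes",
        "Intake",
        created_at=datetime(2012, 5, 31, 18, 18, 18, tzinfo=timezone.utc),
        created_by="creator@example.com",
    )
)
```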

How to find the service and type values.

You can find the acceptable values for service and type for your tenant in the $CSPACE_JEESERVER_HOME/cspace/config/services/tenants/{tenantname}/tenant-bindings.merged.xml file. (If you are importing records that use a custom schema extension specific to your tenant, you can instead find these values in the tenant-bindings-delta.xml file for your tenant.)

 <tenant:serviceBindings name="Personauthorities" version="0.1">
...
    <service:object
        id="1"
        name="Personauthority"
        version="0.1"
        xmlns:service='http://collectionspace.org/services/common/service'>

How to generate your own CSID value

The Imports service will automatically create unique identifiers (CSIDs) for your records. If you wish, you can instead manually specify the CSIDs to be given to some or all of the records you import, as noted above in Attribute values in the request.

A CSID must be a Type 4 (randomly generated) UUID. Many publicly available tools can generate UUIDs in this format.
Here's how to generate one in Java:

String csid = java.util.UUID.randomUUID().toString();

There is also a trivial Java class, which you can easily compile and run to generate CSIDs, here:
https://github.com/collectionspace/Tools/blob/master/scripts/GenerateUUID.java
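
If your import files are produced by a Python script, the standard library offers an equivalent of the Java one-liner above. In this minimal sketch, the function name generate_csid is hypothetical; uuid.uuid4() returns a randomly generated (version 4) UUID.

```python
import uuid


def generate_csid():
    """Return a random (version 4) UUID as a string, for use as a CSID."""
    return str(uuid.uuid4())


print(generate_csid())
```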

How to determine the correct values to put into an "/imports/import/schema" element:

  • First, create and save a data record in a running CollectionSpace system that contains any customizations you have created for your deployment.  If you plan to import data into fields that are part of multi-value or repeating groups (those with a plus-sign icon above the field or fields), make sure the record you have created includes data in those repeating fields.
  • You then need the CSID of the new record, which is not displayed when you have created the record.  Use the keyword search capability (or Find and Edit) to find and display your new record.  The CSID will be displayed at the end of the URL.  An example for a collectionobject record is 6bcc588e-1343-4f91-8409 from the URL:

    http://core.collectionspace.org:8180/collectionspace/ui/core/html/cataloging.html?csid=6bcc588e-1343-4f91-8409
  • You can change this URL directly, or construct one in a new browser tab or window, so that it looks like this instead, with the same CSID but with a different path preceding it:

    http://core.collectionspace.org:8180/cspace-services/collectionobjects/6bcc588e-1343-4f91-8409
  • When you enter this URL in your browser, you will be prompted to supply a user name and password. Use the same credentials that you use to log in to the regular CollectionSpace user interface.
  • The appropriate XML record should be returned in the browser window. (Depending on your browser, you may need to "view source" or "view page source" to be able to copy a version of that record that you can readily paste into a text editor, etc.)
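
The URL rewriting in the steps above can be sketched as a small helper. The function name services_url_from_ui_url is hypothetical. The caller supplies the services path (collectionobjects in the example above), because the sketch does not attempt to derive it from the name of the UI page.

```python
from urllib.parse import parse_qs, urlsplit


def services_url_from_ui_url(ui_url, service_path):
    """Turn a UI record URL into the matching services URL.

    service_path is the services resource name, e.g. "collectionobjects".
    Raises KeyError if the UI URL has no csid query parameter.
    """
    parts = urlsplit(ui_url)
    csid = parse_qs(parts.query)["csid"][0]
    return f"{parts.scheme}://{parts.netloc}/cspace-services/{service_path}/{csid}"


print(
    services_url_from_ui_url(
        "http://core.collectionspace.org:8180/collectionspace/ui/core/html/"
        "cataloging.html?csid=6bcc588e-1343-4f91-8409",
        "collectionobjects",
    )
)
```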

The URL for an authority record (person, organization, taxon, and so on) is more complex because you have to include the instance of the authority that the record appears in. If the record is in the default instance of the person authority, for example, the URL will look like:

http://core.collectionspace.org:8180/cspace-services/personauthorities/urn:cspace:name(person)/items/fea3b33b-a504-4a01-be5c

or

http://core.collectionspace.org:8180/cspace-services/personauthorities/cc221187-3199-4397-aa73/items/fea3b33b-a504-4a01-be5c

XML records obtained in this way require an additional transformation step in order to fit the requirements of the Imports service. Specifically, each element needs a namespace prefix. An XSLT stylesheet can be used to insert this information, or you can edit the resulting schema by hand. For example, the foreName element in the person authority will need to be edited from

<foreName>Chris</foreName>

to

<persons_common:foreName>Chris</persons_common:foreName>
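
As an alternative to XSLT or hand editing, the prefixing step can be scripted. The sketch below is one possible approach using Python's standard library; the function name qualify_schema_fields is hypothetical, and the namespace URI used for persons_common in the usage line is an assumption (check the URI returned with your own records). It prefixes every unprefixed element, following the statement above that each element needs the namespace; whether nested elements inside repeating groups need the same treatment is not confirmed here.

```python
import xml.etree.ElementTree as ET


def qualify_schema_fields(fields_xml, prefix, namespace_uri):
    """Wrap unprefixed field elements in a <schema> part and prefix them."""
    # Note: register_namespace affects ElementTree output globally.
    ET.register_namespace(prefix, namespace_uri)
    schema = ET.fromstring(f"<schema>{fields_xml}</schema>")
    schema.set("name", prefix)
    for element in schema.iter():
        # Skip the wrapper itself and anything already in a namespace.
        if element is not schema and not element.tag.startswith("{"):
            element.tag = f"{{{namespace_uri}}}{element.tag}"
    return ET.tostring(schema, encoding="unicode")


# The namespace URI below is an assumed value, used for illustration only.
result = qualify_schema_fields(
    "<foreName>Chris</foreName>",
    "persons_common",
    "http://collectionspace.org/services/person",
)
print(result)
```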

The following method, using the Nuxeo administrator console, only works in versions of CollectionSpace earlier than 2.0:

  • Fire up the Nuxeo Workbench:
  • Log in.
  • Click on "Default domain"
  • Click on "Workspaces"
  • Click on a service you want to know the format of, e.g. "Personauthorities"
  • Click on the export icon.
    • On the top right of the nuxeo workbench web page are two icons: one for export, one to print.
    • This will save the whole folder as a zip file.
  • Open that zip file and navigate to the document.xml file contained within one of the CSID-named folders.

The XML file that you have obtained will contain many elements and sections that you will not need for your import format. You only want to look at the "/document/schema" element that has an attribute of the schema part you want.

In this example, we are only concerned with /document/schema/

  <document repository="default" id="a382b125-abc8-4792-a532-088dfda4d8a9">
    <system></system>
    <schema name="dublincore">...</schema>
    <schema name="common">...</schema>
    <schema name="collectionspace_core">...</schema>
    <schema name="personauthorities_common">
        <personauthorities_common:displayName>Test Person Authority</personauthorities_common:displayName>
        <personauthorities_common:shortIdentifier>persontest1</personauthorities_common:shortIdentifier>
        <personauthorities_common:vocabType>PersonAuthority</personauthorities_common:vocabType>
        <personauthorities_common:refName/>
    </schema>
  </document>

So for Personauthorities, the schema part name is "personauthorities_common".

  <schema name="personauthorities_common">
	<personauthorities_common:displayName>Test Person Authority</personauthorities_common:displayName>
	<personauthorities_common:shortIdentifier>persontest1</personauthorities_common:shortIdentifier>
	<personauthorities_common:vocabType>PersonAuthority</personauthorities_common:vocabType>
	<personauthorities_common:refName/>
  </schema>

This is exactly the format that Nuxeo expects on import.
Sections like this must then be wrapped in an "/imports/import" element. And you may have multiple /imports/import/schema elements as shown above.

Variables supported in expansion of request

${docID}

this will be the CSID.

You may refer to the CSID anywhere in the schema upload.
For example, to show the CSID in the displayName, you would do this in your request file

<personauthorities_common:displayName>Perf Test Person Auth ${docID}</personauthorities_common:displayName>

Remember that the CSID attribute is not required, and should be absent if empty. If you provide it, it becomes both the docID and the CSID that CollectionSpace uses, and it is expanded wherever ${docID} is found.
If you don't provide it, ${docID} is expanded with a value generated by the system.
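
To make the expansion behavior concrete, here is an illustration of the rule described above. It is not the Imports service's actual implementation; it only mimics the described behavior with Python's string.Template, and the function name expand_doc_id is hypothetical. A supplied CSID is used as-is, otherwise a random UUID stands in for the system-generated value.

```python
import uuid
from string import Template


def expand_doc_id(schema_body, csid=None):
    """Return (doc_id, schema_body with ${docID} replaced by doc_id)."""
    doc_id = csid or str(uuid.uuid4())
    # safe_substitute leaves any other ${...} placeholders untouched.
    return doc_id, Template(schema_body).safe_substitute(docID=doc_id)


doc_id, expanded = expand_doc_id(
    "<personauthorities_common:displayName>Perf Test Person Auth ${docID}"
    "</personauthorities_common:displayName>",
    csid="11111111-2222-3333-4444-123456789012",
)
print(expanded)
```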

Internal variables

These are supported in the expansion of the service-document.xml template:

https://github.com/collectionspace/services/blob/master/services/imports/service/src/main/resources/templates/service-document.xml

You would not use these in the imports request, but be aware that these are reserved words, when present with the dollar-sign and braces.

${Schema}

this will be the body of each of the schema parts you send in.
You would NOT use this in a schema upload.

${ServiceType}

this will be the service type, such as "Personauthority"

${ServiceName}

this will be the service name, such as "Personauthorities"

Where to find error logs

Nuxeo errors show up in one of these two files:

$CSPACE_JEESERVER_HOME/logs/cspace-services.log
$CSPACE_JEESERVER_HOME/logs/catalina.out

For example, if $CSPACE_JEESERVER_HOME is /usr/local/share/apache-tomcat-6.0.33, these files would be:

/usr/local/share/apache-tomcat-6.0.33/logs/cspace-services.log
/usr/local/share/apache-tomcat-6.0.33/logs/catalina.out

The following logged error does not seem to indicate an actual problem with an import:

[org.nuxeo.ecm.core.io.impl.AbstractDocumentReader:60] no ID for document, won't add org.nuxeo.ecm.core.io.impl.ExportedDocumentImpl@1c7aae9d

Results

The result of the POST is a report showing which files were imported, and which workspaces they were imported to.
For example, our POST above results in this being sent as the response:

HTTP/1.1 100 Continue

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
X-Powered-By: Servlet 2.4; JBoss-4.2.3.GA (build: SVNTag=JBoss_4_2_3_GA date=200807181439)/JBossWeb-2.0
Set-Cookie: JSESSIONID=5DB37B19B46D9F2BE3F94074FA55F84B; Path=/
Content-Type: application/xml
Content-Length: 295
Date: Thu, 14 Apr 2011 19:56:47 GMT

<?xml ?><import><msg>SUCCESS</msg><report></report>
READ: /private/var/folders/gj/gjyPnAPuH7qFUS27+IsYwk+++TM/-Tmp-/imports-fefc12c1-cd53-4860-b692-05daa0f9a7a9/CollectionObjects/5d15f3eb-0600-4582-bab9-452df9c7d189/document.xml

/CollectionObjects/5d15f3eb-0600-4582-bab9-452df9c7d189
</import>

This means that a schema was wrapped in a document, placed in a temporary directory on the CollectionSpace server, imported into the Nuxeo workspace called /CollectionObjects/, and given the document ID 5d15f3eb-0600-4582-bab9-452df9c7d189.
You will get one such line for each schema part imported.
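
The READ entries in a report can be picked apart with a regular expression. This is a rough sketch that assumes the layout seen in the examples on this page (a temporary file path followed by whitespace and the workspace path) and that neither path contains spaces; the function name parse_import_report is hypothetical.

```python
import re

# "READ:", then the temporary file path, then the workspace document path.
READ_ENTRY = re.compile(r"READ:\s+(\S+)\s+(/\S+)")


def parse_import_report(report_text):
    """Return one dict per READ entry found in the report text."""
    entries = []
    for temp_file, doc_path in READ_ENTRY.findall(report_text):
        workspace, _, doc_id = doc_path.rpartition("/")
        entries.append(
            {"temp_file": temp_file, "workspace": workspace, "doc_id": doc_id}
        )
    return entries


# Shortened temporary path; the workspace path matches the example above.
sample_report = (
    "READ: /tmp/imports-fefc12c1/CollectionObjects/"
    "5d15f3eb-0600-4582-bab9-452df9c7d189/document.xml\n\n"
    "/CollectionObjects/5d15f3eb-0600-4582-bab9-452df9c7d189\n"
)
for entry in parse_import_report(sample_report):
    print(entry["workspace"], entry["doc_id"])
```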

Importing records exported from a CollectionSpace instance

Note: If you have exported records via the Nuxeo workbench, using the steps described in How to determine the correct values to put into an "/imports/import/schema" element, you will have downloaded a .zip file. Expand that file, and you will get many individual directories - one per record - each containing a single XML file named document.xml.

If you wish to import these records, either into the same CollectionSpace system or a different system, the following Ruby script can help pull the relevant data out of these export files into a single import file, in the correct structure for submitting that file to the Imports Service:

https://github.com/collectionspace/Tools/blob/master/scripts/pkg-records-for-import.rb

To install Ruby (1.8 or higher), as required by this script:

  • Aptitude-compatible Linux (e.g. Debian, Ubuntu): sudo aptitude install ruby or sudo apt-get install ruby-full
  • Yum-compatible Linux (e.g. RedHat, Fedora): sudo yum install ruby
  • Microsoft Windows: RubyInstaller for Windows
  • Mac OS X: Ruby 1.8 or higher is included with Mac OS X 10.4 Tiger and above

To run this script:

  • Save it into a new file, within the folder that contains one or more sub-folders, each containing a record to be imported.
  • Edit the values in the Configuration section near the top of the script.
  • Run the script at your command prompt via the command ruby pkg-records-for-import.rb

Note that the script needs to be configured via constants, and that some cleanup or other manual modification of records may still be required after running the script, as described in comments near the top of the script.

Specifying the lifecycle for imported records

By default, imported records are given the default lifecycle. (A "lifecycle" is a specific set of states into which a record can be placed, such as active and deleted, and the associated set of transitions between those states.)

If you are importing Location / Movement / Inventory records, or any other types of records that can be 'locked' and prevented from further modification, you will need to ensure that those records are given the cs_locking lifecycle, which includes a locked state.

Before importing such records:

  • Edit the file $CSPACE_JEESERVER_HOME/cspace/config/services/resources/templates/service-document.xml
  • Change the value of <lifecycle-policy> to cs_locking; e.g.

    <document repository="default" id="${docID}">
      <system>
    ...
        <lifecycle-policy>cs_locking</lifecycle-policy>
    

After importing such records:

  • Edit the file $CSPACE_JEESERVER_HOME/cspace/config/services/resources/templates/service-document.xml
  • Change the value of <lifecycle-policy> to cs_default

These changes take effect immediately and do not require a server restart.
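
If you switch the lifecycle policy often, the edit to service-document.xml described above could be scripted. The sketch below operates on the template text only and leaves reading and writing the file to the caller. The function name set_lifecycle_policy is hypothetical, and the sketch assumes the template contains a single <lifecycle-policy> element, as in the fragment above. Back up the file before changing it.

```python
import re

LIFECYCLE = re.compile(r"(<lifecycle-policy>)(.*?)(</lifecycle-policy>)", re.DOTALL)


def set_lifecycle_policy(template_text, policy):
    """Return template_text with the first <lifecycle-policy> value replaced."""
    updated, count = LIFECYCLE.subn(
        lambda match: match.group(1) + policy + match.group(3),
        template_text,
        count=1,
    )
    if count == 0:
        raise ValueError("no <lifecycle-policy> element found")
    return updated


before = "<system><lifecycle-policy>cs_default</lifecycle-policy></system>"
print(set_lifecycle_policy(before, "cs_locking"))
```

After the import, calling the same function with cs_default restores the original setting.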

Allowing users to specify, at import time, the lifecycle to be applied to a set of imported records is covered in issue CSPACE-5306.