Imports Service Home
The imports service will accept uploads in a specific XML format, and will import the records found in that request. Successfully imported records become available in nuxeo shell, nuxeo workbench, and cspace.
If you are importing Location / Movement / Inventory records, or any other types of records that can be "locked" and prevented from further modification, please see the special note on Specifying the lifecycle for imported records.
The request
Sending the request:
(In both of the following examples, be sure to substitute the actual name of your import file for myimportfile.xml.)
In CollectionSpace v2.2 and above, by submitting your XML input file as an 'application/xml' part of a 'multipart/form-data' payload:
curl "http://localhost:8180/cspace-services/imports?type=xml" -i -u admin@core.collectionspace.org:Administrator -F "file=@myimportfile.xml;type=application/xml"
In any recent version of CollectionSpace, including v2.2 and above, by submitting your XML input file as an 'application/xml' payload:
curl -X POST http://localhost:8180/cspace-services/imports -i -u admin@core.collectionspace.org:Administrator -H "Content-Type: application/xml" -T myimportfile.xml
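If you would rather script the upload than use curl, the second request style above (a plain 'application/xml' payload) can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the function names build_import_request and send_import_file are made up for illustration, and the URL and credentials are simply the ones from the curl examples.

```python
import base64
import urllib.request

# Values copied from the curl examples above; adjust them for your server.
IMPORTS_URL = "http://localhost:8180/cspace-services/imports"
USER = "admin@core.collectionspace.org"
PASSWORD = "Administrator"


def build_import_request(xml_bytes, url=IMPORTS_URL, user=USER, password=PASSWORD):
    """Build (but do not send) a POST request carrying the import file."""
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=xml_bytes,
        method="POST",
        headers={
            "Content-Type": "application/xml",
            "Authorization": f"Basic {credentials}",
        },
    )


def send_import_file(path, **kwargs):
    """Read the import file at `path`, POST it, and return the response body."""
    with open(path, "rb") as handle:
        request = build_import_request(handle.read(), **kwargs)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling send_import_file("myimportfile.xml") would then play the same role as the second curl command.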
Large Payloads and Timeouts
Large payloads can cause CollectionSpace to time out. If a request times out, no data is imported into CollectionSpace. The default timeout for all requests to CollectionSpace is defined by your administrator in the file at ${CSPACE_JEESERVER_HOME}/webapps/cspace-services/META-INF/context.xml.
You can also set the timeout for an individual request with a URL query parameter. The query parameter is "impTimout"; its value is in seconds and overrides the default value. For example, to make a request that times out after 1 hour, use the following URL:
http://localhost:8180/cspace-services/imports?impTimout=3600
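When the timeout is computed by a script, the URL can be assembled rather than typed by hand. The following is a small sketch; the helper name imports_url is hypothetical, and the base URL is the one used in the example above.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8180/cspace-services/imports"


def imports_url(timeout_seconds=None, base=BASE_URL):
    """Return the imports URL, optionally with a per-request timeout in seconds."""
    if timeout_seconds is None:
        return base  # fall back to the server's default timeout
    if timeout_seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    # The parameter name is spelled "impTimout", exactly as documented above.
    return f"{base}?{urlencode({'impTimout': int(timeout_seconds)})}"
```

For a one-hour timeout, imports_url(60 * 60) yields the same URL as shown above.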
Example Request
The example requests above both uploaded an XML-based import file named myimportfile.xml from the local filesystem to a CollectionSpace server.
Here's an example, in turn, of what an import file looks like. This import file creates two Object Exit records, with each record containing data in two fields, exitNumber and exitNote:
<?xml version="1.0" encoding="UTF-8"?>
<imports>
  <import service="ObjectExit" type="ObjectExit">
    <schema xmlns:objectexit_common="http://collectionspace.org/services/objectexit" name="objectexit_common">
      <exitNumber>OE2010.1</exitNumber>
      <exitNote>Puedo comer vidrio, no me hace daño. Je peux manger du verre, ça ne me fait pas mal.</exitNote>
    </schema>
  </import>
  <import service="ObjectExit" type="ObjectExit">
    <schema xmlns:objectexit_common="http://collectionspace.org/services/objectexit" name="objectexit_common">
      <exitNumber>OE2010.2</exitNumber>
      <exitNote>This USASCII OBJECT has been declared a World Heritage Nuisance and must be disposed of forthwith - Sebastian</exitNote>
    </schema>
  </import>
</imports>
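If your source data lives in a spreadsheet or database, you will probably generate a file like this rather than write it by hand. Here is one possible sketch using Python's standard xml.etree.ElementTree module. The function name build_object_exit_imports is invented for this example, and the element and attribute names simply mirror the import file above.

```python
import xml.etree.ElementTree as ET

# Namespace URI and part name as they appear in the example import file.
OBJECTEXIT_NS = "http://collectionspace.org/services/objectexit"
PART_NAME = "objectexit_common"


def build_object_exit_imports(records):
    """Build an import file (as UTF-8 bytes) from (exitNumber, exitNote) pairs."""
    root = ET.Element("imports")
    for exit_number, exit_note in records:
        record = ET.SubElement(
            root, "import", {"service": "ObjectExit", "type": "ObjectExit"}
        )
        # The xmlns declaration is written as a literal attribute so that the
        # output matches the hand-written example.
        schema = ET.SubElement(
            record, "schema", {f"xmlns:{PART_NAME}": OBJECTEXIT_NS, "name": PART_NAME}
        )
        ET.SubElement(schema, "exitNumber").text = exit_number
        ET.SubElement(schema, "exitNote").text = exit_note
    # ElementTree escapes characters such as & and < in field values.
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

The returned bytes can be written to myimportfile.xml, or sent directly as the body of the request.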
Example Result
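Here is an example response from a successful import request. (Judging by its doctype and CSID values, this particular response corresponds to an import of two Person Authority records, like the Person Authority import file shown further below, rather than to the Object Exit payload above.)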
<?xml version="1.0"?>
<import>
  <msg>SUCCESS</msg>
  <importedRecords>
    <importedRecord>
      <doctype>Personauthority</doctype>
      <csid>11111111-2222-3333-4444-123456789012</csid>
    </importedRecord>
    <importedRecord>
      <doctype>Personauthority</doctype>
      <csid>2b1d79d9-e240-46e6-9932-74cc45b54e73</csid>
    </importedRecord>
  </importedRecords>
  <status>Success</status>
  <totalRecordsImported>2</totalRecordsImported>
  <numRecordsImportedByDocType>
    <numRecordsImported>
      <docType>Personauthority</docType>
      <numRecords>2</numRecords>
    </numRecordsImported>
  </numRecordsImportedByDocType>
  <report>READ: /usr/local/share/apache-tomcat-7.0.64/temp/imports-109403651390819578/Personauthorities/11111111-2222-3333-4444-123456789012/document.xml /Personauthorities/11111111-2222-3333-4444-123456789012
READ: /usr/local/share/apache-tomcat-7.0.64/temp/imports-109403651390819578/Personauthorities/2b1d79d9-e240-46e6-9932-74cc45b54e73/document.xml /Personauthorities/2b1d79d9-e240-46e6-9932-74cc45b54e73
  </report>
</import>
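A script that submits imports will usually want to pull the CSIDs of the new records out of a response like this one. The sketch below assumes the response has the element names shown in the example (status, totalRecordsImported, importedRecord, doctype, csid); the function name summarize_import_response is hypothetical, and elements missing from a response are reported as None rather than treated as errors.

```python
import xml.etree.ElementTree as ET


def summarize_import_response(response_xml):
    """Extract the status, record count, and (doctype, csid) pairs from a response."""
    root = ET.fromstring(response_xml)
    records = [
        (record.findtext("doctype"), record.findtext("csid"))
        for record in root.iter("importedRecord")
    ]
    total_text = root.findtext("totalRecordsImported")
    return {
        "msg": root.findtext("msg"),
        "status": root.findtext("status"),
        "total": int(total_text) if total_text else None,
        "records": records,
    }
```

For the response above, the "records" entry would list both Personauthority CSIDs, in document order.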
The following example import file creates two Person Authority records. (Each authority can, in turn, contain zero or more person terms.) The first Person Authority in the payload will be assigned a random CollectionSpace ID (CSID), while the second will be created with a supplied ID.
<?xml version="1.0" encoding="UTF-8"?>
<imports>
  <import seq="1" service="Personauthorities" type="Personauthority">
    <schema xmlns:personauthorities_common="http://collectionspace.org/services/person" name="personauthorities_common">
      <personauthorities_common:displayName>American Poets</personauthorities_common:displayName>
      <personauthorities_common:shortIdentifier>americanpoets</personauthorities_common:shortIdentifier>
      <personauthorities_common:vocabType>PersonAuthority</personauthorities_common:vocabType>
      <personauthorities_common:refName>urn:cspace:core.collectionspace.org:Personauthorities:name(americanpoets)'American Poets'</personauthorities_common:refName>
    </schema>
  </import>
  <import seq="2" service="Personauthorities" type="Personauthority" CSID="11111111-2222-3333-4444-123456789012">
    <schema xmlns:personauthorities_common="http://collectionspace.org/services/person" name="personauthorities_common">
      <personauthorities_common:displayName>French Poets</personauthorities_common:displayName>
      <personauthorities_common:shortIdentifier>frenchpoets</personauthorities_common:shortIdentifier>
      <personauthorities_common:vocabType>PersonAuthority</personauthorities_common:vocabType>
      <personauthorities_common:refName>urn:cspace:core.collectionspace.org:Personauthorities:name(frenchpoets)'French Poets'</personauthorities_common:refName>
    </schema>
  </import>
</imports>
Sample Requests
There are also two samples of Import files included in CollectionSpace's source code tree, for testing. These files can be used to import, respectively:
- A Cataloging record, with all or nearly all of its standard fields, including repeating (multivalued) fields: https://github.com/collectionspace/services/blob/master/services/imports/service/src/test/resources/requests/collectionobject-request.xml
- Two Person Authority records (identical to the import file also shown in one of the examples above): https://github.com/collectionspace/services/blob/master/services/imports/service/src/test/resources/requests/authority-request.xml
Structure of the request:
The body of an imports request is an XML document, which contains data for one or more CollectionSpace records to import.
There are three main elements in this document:
Element | Notes |
---|---|
<imports> | A container element for multiple records to be imported. |
 <import> | Identifies an import of one record into a particular CollectionSpace service (e.g. Cataloging, Acquisitions, Loans Out, Persons, Organizations). You can set attributes for this record as shown below. |
  <schema> | Contains the actual data for one part of that record, conforming to a particular schema. You can have multiple <schema> elements within a single <import>. |
(The format of the <schema> element is discussed further below in How to determine the correct values to put into an "/imports/import/schema" element.)
Attribute values in the request
Here are the available attributes for the three main elements:
Element | Attribute | Required/Optional | Value |
---|---|---|---|
imports | – |  | |
import | service | Required | This routes you to the correct service. |
 | type | Required | This declares your document type as Nuxeo sees it. |
 | CSID | Optional | This may be absent. If present, it must be a UUID (see How to generate your own CSID value). It forces the record to use this value as its CSID, which is also the Nuxeo document ID. |
 | seq | Optional | This value does not appear in an imported document at all. |
 | createdAt | Optional | This gives the record a user-supplied createdAt timestamp, rather than simply reflecting the date and time when the record was imported. Useful for preserving metadata (such as original creation dates and times) when importing legacy data. |
 | createdBy | Optional | This gives the record a user-supplied identifier for the creator of the record. Useful for preserving metadata when importing legacy data. |
 | updatedAt | Optional | This gives the record a user-supplied updatedAt timestamp, rather than simply reflecting the date and time when the record was imported. Useful for preserving metadata when importing legacy data. |
 | updatedBy | Optional | This gives the record a user-supplied identifier for the last updater of the record. Useful for preserving metadata when importing legacy data. |
schema | – |  |
Example:
<imports><import service="Personauthorities" type="Personauthority">...
Example:
<imports><import service="Personauthorities" type="Personauthority" CSID="11111111-2222-3333-4444-123456789012">...
Example:
<imports><import service="Intakes" type="Intake">...
Example:
<imports><import service="Intakes" type="Intake" createdAt="2012-05-31T18:18:18Z" createdBy="creator@example.com">...
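When <import> elements are generated by a script, it helps to check the attributes before sending anything. The sketch below builds one <import> element with the attributes from the table above. The function name make_import_element is hypothetical, and the timestamp format is inferred only from the createdAt example above (for instance 2012-05-31T18:18:18Z), so treat it as an assumption.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def make_import_element(service, doc_type, csid=None, created_at=None, created_by=None):
    """Create an <import> element; `created_at` should be a timezone-aware datetime."""
    if not service or not doc_type:
        raise ValueError("service and type are both required")
    attrs = {"service": service, "type": doc_type}
    if csid is not None:
        # uuid.UUID raises ValueError if the text is not a well-formed UUID.
        attrs["CSID"] = str(uuid.UUID(csid))
    if created_at is not None:
        # Convert to UTC and format like the documented example value.
        attrs["createdAt"] = created_at.astimezone(timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        )
    if created_by is not None:
        attrs["createdBy"] = created_by
    return ET.Element("import", attrs)
```

Optional attributes that are not supplied are left out entirely, rather than written with empty values.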
How to find the service and type values.
You can find the acceptable values for service and type for your tenant in the $CSPACE_JEESERVER_HOME/cspace/config/services/tenants/{tenantname}/tenant-bindings.merged.xml file. (If you are importing records that use a custom schema extension specific to your tenant, you can instead find these values in the tenant-bindings-delta.xml file for your tenant.)
- The service attribute value can be found as the name attribute of the tenant:serviceBindings element.
- The type can be found as the name attribute of the service:object element.
https://raw.githubusercontent.com/collectionspace/services/master/services/common/src/main/cspace/config/services/tenants/tenant-bindings-proto-unified.xml
<tenant:serviceBindings name="Personauthorities" version="0.1"> ... <service:object id="1" name="Personauthority" version="0.1" xmlns:service='http://collectionspace.org/services/common/service'>
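To list every service/type pair in a bindings file without reading through it by hand, something like the following sketch may help. It matches elements by their local names (serviceBindings and object), so it does not need to know the namespace URIs. It also assumes, based on the fragment above, that service:object is a direct child of tenant:serviceBindings. The function name list_service_types is hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def list_service_types(bindings_xml):
    """Return (service, type) pairs found in tenant bindings XML text."""
    root = ET.fromstring(bindings_xml)
    pairs = []
    for element in root.iter():
        if local_name(element.tag) != "serviceBindings":
            continue
        for child in element:  # direct children only (an assumption, see above)
            if local_name(child.tag) == "object":
                pairs.append((element.get("name"), child.get("name")))
    return pairs
```

Each pair gives the values to use for the service and type attributes of an <import> element, in that order.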
How to generate your own CSID value
The Imports service will automatically create unique identifiers (CSIDs) for your records. If you wish, you can instead manually specify the CSIDs to be given to some or all of the records you import, as noted above in Attribute values in the request.
A CSID must be a Type 4 (randomly generated) UUID. UUIDs in this format can be generated by many publicly available tools.
Here's how you do it in Java:
String csid = java.util.UUID.randomUUID().toString();
There is a trivial Java class that you can easily compile and run to generate CSIDs here:
https://github.com/collectionspace/Tools/blob/master/scripts/GenerateUUID.java
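If you are preparing your import file with a script rather than with Java, the same kind of value is available in most languages. For example, a Python equivalent of the Java one-liner above might look like this (the function name generate_csid is just for illustration):

```python
import uuid


def generate_csid():
    """Return a new random (Type 4) UUID as a lowercase, hyphenated string."""
    return str(uuid.uuid4())
```

Each call returns a different value, suitable for the CSID attribute of one <import> element.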
How to determine the correct values to put into an "/imports/import/schema" element:
- First, create and save a data record in a running CollectionSpace system that contains any customizations you have created for your deployment. If you plan to import data into fields that are part of multi-value or repeating groups (those with a plus-sign icon above the field or fields), make sure the record you have created includes data in those repeating fields.
You then need the CSID of the new record, which is not displayed when you create the record. Use the keyword search capability (or Find and Edit) to find and display your new record. The CSID will be displayed at the end of the URL. An example for a collectionobject record is 6bcc588e-1343-4f91-8409 from the URL:
http://core.collectionspace.org:8180/collectionspace/ui/core/html/cataloging.html?csid=6bcc588e-1343-4f91-8409
You can change this URL directly, or construct one in a new browser tab or window, so that it looks like this instead, with the same CSID but with a different path preceding it:
http://core.collectionspace.org:8180/cspace-services/collectionobjects/6bcc588e-1343-4f91-8409
- When entering this URL in your browser, you will be prompted to supply a user name and password. Use the same information that you use to login to the regular CollectionSpace user interface.
- The appropriate XML record should be returned in the browser window. (Depending on your browser, you may need to "view source" or "view page source" to be able to copy a version of that record that you can readily paste into a text editor, etc.)
The URL for an authority record (person, organization, taxon, and so on) is more complex because you have to include the instance of the authority that the record appears in. If the record is in the default instance of the person authority, for example, the URL will look like:
http://core.collectionspace.org:8180/cspace-services/personauthorities/urn:cspace:name(person)/items/fea3b33b-a504-4a01-be5c
or
http://core.collectionspace.org:8180/cspace-services/personauthorities/cc221187-3199-4397-aa73/items/fea3b33b-a504-4a01-be5c
XML records obtained in this way require an additional transformation step in order to fit the requirements of the Imports Service. Specifically, each element needs a namespace prefix. An XSLT stylesheet can be used to insert this information, or you can edit the resulting schema by hand. For example, the foreName element in the person authority will need to be edited from
<foreName>Chris</foreName>
to
<persons_common:foreName>Chris</persons_common:foreName>
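As an alternative to XSLT or hand editing, the prefixing step can be scripted. The sketch below rests on some assumptions: that the part you copied from the services response is a single element in the part's namespace (for example <ns2:persons_common xmlns:ns2="...">) whose field elements are unprefixed, and that every nested element should receive the same prefix, as the text above suggests. The function name prefix_schema_part is hypothetical; check the output against a known-good import file before relying on it.

```python
import xml.etree.ElementTree as ET


def _prefixed_copy(element, prefix):
    """Copy an element tree, renaming every tag to 'prefix:localname'."""
    local = element.tag.rsplit("}", 1)[-1]
    copy = ET.Element(f"{prefix}:{local}")
    copy.text = element.text
    copy.tail = element.tail
    for child in element:
        copy.append(_prefixed_copy(child, prefix))
    return copy


def prefix_schema_part(part_xml, part_name):
    """Turn one part of a services record into a <schema> element for import."""
    part = ET.fromstring(part_xml)
    attrs = {}
    if part.tag.startswith("{"):
        # Re-declare the part's namespace under the part name, as in the
        # import file examples earlier on this page.
        attrs[f"xmlns:{part_name}"] = part.tag[1:].split("}", 1)[0]
    attrs["name"] = part_name
    schema = ET.Element("schema", attrs)
    for child in part:
        schema.append(_prefixed_copy(child, part_name))
    return ET.tostring(schema, encoding="unicode")
```

The resulting <schema> element can then be placed inside an <import> element for the corresponding service.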
This method, using the Nuxeo administrator console, only works in versions of CollectionSpace earlier than 2.0.
- Fire up the Nuxeo Workbench:
- Locally, if you have started Nuxeo:
- On CollectionSpace's nightly server:
- Log in.
- Click on "Default domain"
- Click on "Workspaces"
- Click on a service you want to know the format of, e.g. "Personauthorities"
- Click on the export icon.
- On the top right of the nuxeo workbench web page are two icons: one for export, one to print.
- This will save the whole folder as a zip file.
- Open that zip file and navigate to the document.xml file contained within one of the CSID-named folders.
The XML file that you have obtained will contain many elements and sections that you will not need for your import format. You only want to look at the "/document/schema" element that has an attribute of the schema part you want.
In this example, we are only concerned with /document/schema/
<document repository="default" id="a382b125-abc8-4792-a532-088dfda4d8a9">
  <system></system>
  <schema name="dublincore">...</schema>
  <schema name="common">...</schema>
  <schema name="collectionspace_core">...</schema>
  <schema name="personauthorities_common">
    <personauthorities_common:displayName>Test Person Authority</personauthorities_common:displayName>
    <personauthorities_common:shortIdentifier>persontest1</personauthorities_common:shortIdentifier>
    <personauthorities_common:vocabType>PersonAuthority</personauthorities_common:vocabType>
    <personauthorities_common:refName/>
  </schema>
</document>
So for Personauthorities, the schema part name is "personauthorities_common".
<schema name="personauthorities_common">
  <personauthorities_common:displayName>Test Person Authority</personauthorities_common:displayName>
  <personauthorities_common:shortIdentifier>persontest1</personauthorities_common:shortIdentifier>
  <personauthorities_common:vocabType>PersonAuthority</personauthorities_common:vocabType>
  <personauthorities_common:refName/>
</schema>
This is exactly the format that Nuxeo expects on import.
Sections like this must then be wrapped in an "/imports/import" element. And you may have multiple /imports/import/schema elements as shown above.
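The extract-and-wrap step described above can be sketched as follows. The sketch works on the raw text of document.xml with a regular expression instead of an XML parser, because an exported fragment like the one above may use prefixes without declaring them, which a strict parser would reject. The function name wrap_schema_part is hypothetical, and the sketch handles a single schema part from a single document.

```python
import re


def wrap_schema_part(document_xml, part_name, service, doc_type):
    """Pull one <schema name="..."> section out of document.xml and wrap it."""
    pattern = re.compile(
        r'<schema\b[^>]*\bname="' + re.escape(part_name) + r'"[^>]*>.*?</schema>',
        re.DOTALL,
    )
    match = pattern.search(document_xml)
    if match is None:
        raise ValueError(f"no schema part named {part_name!r} found")
    return (
        "<imports>\n"
        f'<import service="{service}" type="{doc_type}">\n'
        f"{match.group(0)}\n"
        "</import>\n"
        "</imports>\n"
    )
```

For the document above, wrap_schema_part(text, "personauthorities_common", "Personauthorities", "Personauthority") would keep only the personauthorities_common section and drop the dublincore, common, and collectionspace_core parts.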
Variables supported in expansion of request
${docID} | this will be the CSID. |
You may refer to the CSID anywhere in the schema upload.
For example, to show the CSID in the displayName, you would do this in your request file
<personauthorities_common:displayName>Perf Test Person Auth ${docID}</personauthorities_common:displayName>
Remember that the CSID attribute is not required, and should be absent if empty. If you provide it, it becomes the docID and the CSID that cspace uses.
If you provide it, it will expand where ${docID} is found. If you don't provide it, ${docID} will be expanded with a value generated by the system.
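To preview what a schema section will look like after expansion, you can mimic the substitution locally. The sketch below is only an illustration of the behavior described above, not the code the Imports service actually runs; the function name expand_doc_id is hypothetical. It relies on Python's string.Template, which happens to use the same ${name} placeholder syntax.

```python
import uuid
from string import Template


def expand_doc_id(schema_text, csid=None):
    """Return (doc_id, expanded_text), generating an ID when no CSID is given."""
    doc_id = csid if csid else str(uuid.uuid4())
    # safe_substitute leaves any other ${...} placeholders untouched.
    expanded = Template(schema_text).safe_substitute(docID=doc_id)
    return doc_id, expanded
```

Passing the displayName line from the example above together with a CSID returns that line with ${docID} replaced by the CSID.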
Internal variables
These are supported in the expansion of the service-document.xml template:
You would not use these in the imports request, but be aware that these are reserved words, when present with the dollar-sign and braces.
${Schema} | this will be the body of each of the schema parts you send in. |
${ServiceType} | this will be the service type, such as "Personauthority" |
${ServiceName} | this will be the service name, such as "Personauthorities" |
Where to find error logs
Nuxeo errors show up in one of these two files:
$CSPACE_JEESERVER_HOME/logs/cspace-services.log
$CSPACE_JEESERVER_HOME/logs/catalina.out
For example, where $CSPACE_JEESERVER_HOME is /usr/local/share/apache-tomcat-6.0.33, these would be:
/usr/local/share/apache-tomcat-6.0.33/logs/cspace-services.log
/usr/local/share/apache-tomcat-6.0.33/logs/catalina.out
The following logged message does not seem to indicate an actual problem with an import:
[org.nuxeo.ecm.core.io.impl.AbstractDocumentReader:60] no ID for document, won't add org.nuxeo.ecm.core.io.impl.ExportedDocumentImpl@1c7aae9d
Results
The result of the POST is a report showing which files were imported, and which workspaces they were imported to.
For example, our POST above results in this being sent as the response:
HTTP/1.1 100 Continue

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
X-Powered-By: Servlet 2.4; JBoss-4.2.3.GA (build: SVNTag=JBoss_4_2_3_GA date=200807181439)/JBossWeb-2.0
Set-Cookie: JSESSIONID=5DB37B19B46D9F2BE3F94074FA55F84B; Path=/
Content-Type: application/xml
Content-Length: 295
Date: Thu, 14 Apr 2011 19:56:47 GMT

<?xml ?><import><msg>SUCCESS</msg><report></report>
READ: /private/var/folders/gj/gjyPnAPuH7qFUS27+IsYwk+++TM/-Tmp-/imports-fefc12c1-cd53-4860-b692-05daa0f9a7a9/CollectionObjects/5d15f3eb-0600-4582-bab9-452df9c7d189/document.xml /CollectionObjects/5d15f3eb-0600-4582-bab9-452df9c7d189
</import>
This means that a schema was wrapped in a document, placed in a temp dir on the cspace server, and imported into the nuxeo workspace called /CollectionObjects/ and given the documentID 5d15f3eb-0600-4582-bab9-452df9c7d189.
You will get one such line for each schema part imported.
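The READ lines in the report can be turned into structured data with a few lines of code. This sketch assumes each entry has the shape shown above: the word READ:, then the temporary file path, then the workspace path, separated by whitespace and containing no spaces themselves. The function name parse_import_report is hypothetical.

```python
import re

# One report entry: "READ: <temp file path> <workspace path>"
READ_ENTRY = re.compile(r"READ:\s+(\S+)\s+(\S+)")


def parse_import_report(report_text):
    """Return a list of dicts describing each imported document in the report."""
    entries = []
    for temp_path, workspace_path in READ_ENTRY.findall(report_text):
        # "/CollectionObjects/<id>" splits into the workspace and the document ID.
        workspace, _, document_id = workspace_path.strip("/").rpartition("/")
        entries.append(
            {"temp_path": temp_path, "workspace": workspace, "document_id": document_id}
        )
    return entries
```

Applied to the response above, this gives one entry whose workspace is CollectionObjects and whose document ID is 5d15f3eb-0600-4582-bab9-452df9c7d189.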
Importing records exported from a CollectionSpace instance
Note: If you have exported records via the Nuxeo workbench, using the steps described in How to determine the correct values to put into an "/imports/import/schema" element, you will have downloaded a .zip file. Expand that file, and you will get many individual directories - one per record - each containing a single XML file named document.xml.
If you wish to import these records, either into the same CollectionSpace system or a different system, the following Ruby script can help pull the relevant data out of these export files into a single import file, in the correct structure for submitting that file to the Imports Service:
https://github.com/collectionspace/Tools/blob/master/scripts/pkg-records-for-import.rb
To install Ruby (1.8 or higher), as required by this script:
- Aptitude-compatible Linux (e.g. Debian, Ubuntu):
sudo aptitude install ruby
or
sudo apt-get install ruby-full
- Yum-compatible Linux (e.g. RedHat, Fedora):
sudo yum install ruby
- Microsoft Windows: RubyInstaller for Windows
- Mac OS X: Ruby 1.8 or higher is included with Mac OS X 10.4 Tiger and above
To run this script:
- Save it into a new file, within the folder that contains one or more sub-folders, each containing a record to be imported.
- Edit the values in the Configuration section near the top of the script.
- Run the script at your command prompt via the command
ruby pkg-records-for-import.rb
Note that the script needs to be configured via constants, and that some cleanup or other manual modification of records may still be required after running the script, as described in comments near the top of the script.
Specifying the lifecycle for imported records
By default, imported records are given the default lifecycle. (A "lifecycle" is a specific set of states into which a record can be placed, such as active and deleted, and the associated set of transitions between those states.)
If you are importing Location / Movement / Inventory records, or any other types of records that can be 'locked' and prevented from further modification, you will need to ensure that those records are given the cs_locking lifecycle, which includes a locked state.
Before importing such records:
- Edit the file
$CSPACE_JEESERVER_HOME/cspace/config/services/resources/templates/service-document.xml
- Change the value of <lifecycle-policy> to cs_locking; e.g.
<document repository="default" id="${docID}">
  <system>
    ...
    <lifecycle-policy>cs_locking</lifecycle-policy>
After importing such records:
- Edit the file
$CSPACE_JEESERVER_HOME/cspace/config/services/resources/templates/service-document.xml
- Change the value of <lifecycle-policy> to cs_default
These changes take effect immediately and do not require a server restart.
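Because the template has to be switched before the import and switched back afterwards, you may prefer to script the edit. The following sketch only rewrites the text of the template; reading and writing the file at the path given above is left to the caller. It assumes the template contains exactly one <lifecycle-policy> element, and the function name set_lifecycle_policy is hypothetical.

```python
import re

LIFECYCLE_POLICY = re.compile(r"(<lifecycle-policy>)[^<]*(</lifecycle-policy>)")


def set_lifecycle_policy(template_text, policy):
    """Return the template text with the <lifecycle-policy> value replaced."""
    updated, count = LIFECYCLE_POLICY.subn(
        lambda match: match.group(1) + policy + match.group(2), template_text
    )
    if count != 1:
        # Refuse to guess if the element is missing or appears more than once.
        raise ValueError(f"expected one <lifecycle-policy> element, found {count}")
    return updated
```

You would call set_lifecycle_policy(text, "cs_locking") before importing lockable records, and set_lifecycle_policy(text, "cs_default") afterwards.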
Letting the user specify, at import time, the lifecycle to apply to a set of imported records is covered in issue CSPACE-5306.