Wiki Markup |
---|
Draft and notes |
...
Note: A process and documentation for initializing an authority are required. |
...
1. Create at least one sample record for each of your record types. |
...
2. Export those records using the Web-based administrative interface to Nuxeo, using Laramie's step-by-step instructions here: |
...
<[http://wiki.collectionspace.org/display/collectionspace/Imports+Service+Home#ImportsServiceHome-Howtodeterminethecorrectvaluestoputintoan%22%2Fimports%2Fimport%2Fschema%22element%3A]> |
...
(The Nuxeo web-based administrative interface comes with the CSpace installation and runs on port 8080.) |
...
(Note: We need to check whether a complex catalog record is returned correctly, e.g., one with nested repeating elements, multiple schemas, or ampersands and other character entities.) (Note: We need to confirm how to change the Nuxeo administrator password, both in Nuxeo and in the services configuration.) |
...
3. Use the Ruby script near the end of that document to take the |
...
records you export in step 2, and convert them to a format that the |
...
import service can ingest. You can do this manually, as well, but a |
...
script makes this easier. |
...
(Glen created an 'ed' script for the same purpose; you can ask him |
...
for that, if you wish.) |
...
Note: We should rewrite the Ruby script in Groovy or something more standard. In the meantime, the script should be in Subversion; multiple copies are sitting around. Aron will create a location for it, in a scripts directory.
If you use the Ruby script, you'll need to do two things:
a. A one-time task: install Ruby on your system, if it isn't already present. There are examples / links here on how to do so: <[http://wiki.collectionspace.org/display/collectionspace/Imports+Service+Home#ImportsServiceHome-ImportingrecordsexportedfromaCollectionSpaceinstance]> |
...
b. Edit the three variables near the top of the script to reflect |
...
the specifics of the record type you're importing. Step 2 should give |
...
you the information you need to fill in these values: |
...
servicename = "Persons" |
...
recordtype = "Person" |
...
schemas = \[ "persons_common" \] |
...
In the case of records where pertinent data is stored in more than |
...
one schema, e.g. a common schema, like collectionobjects_common, and |
...
one or more extension schemas, like collectionobjects_naturalhistory, |
...
the values of the 'schemas' variable might look like this: |
...
schemas = \[ "collectionobjects_common", |
...
"collectionobjects_naturalhistory" \] |
...
(Scripting contributions to make these variables command-line |
...
parameters are welcome \:-) |
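As one possible starting point for that contribution, here is a minimal sketch of reading the three settings from the command line rather than editing them in the script. The helper name `parse_settings` and the argument order are assumptions for illustration; the existing Ruby script is not reproduced here, so this would need to be adapted to how it actually uses the variables.

```ruby
# Hypothetical helper (not part of the existing script): build the three
# settings from command-line arguments, falling back to the Persons example.
def parse_settings(argv)
  servicename = argv[0] || "Persons"
  recordtype  = argv[1] || "Person"
  # Any remaining arguments are treated as schema names.
  schemas     = argv.length > 2 ? argv[2..-1] : ["persons_common"]
  { servicename: servicename, recordtype: recordtype, schemas: schemas }
end

settings = parse_settings(ARGV)
servicename = settings[:servicename]
recordtype  = settings[:recordtype]
schemas     = settings[:schemas]
```

With something like this in place, a multi-schema record type could be handled by passing the service name, the record type, and then each schema name (for example collectionobjects_common followed by collectionobjects_naturalhistory) as arguments.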
...
4. If you need to generate your own CSIDs for authority terms, see |
...
Glen's comment at the bottom, which provides useful additions to that |
...
Ruby script. |
...
5. Manually do whatever cleanup may be needed of characters or |
...
character sequences in the data itself that may trip up the import |
...
service, either via search and replace, or using a 'sed' script, |
...
BBEdit text factory, etc. From what I recall, I had to do the |
...
following:
a. Search for each instance of an ampersand, which likely would
indicate an XML character entity, and determine what substitutions
might need to be made:
* Per Susan, instances of &amp; (which result in an ampersand
character being imported) each needed to be replaced by this literal
string:
&amp;amp;
* Instances of &lt; and &gt; needed to be replaced as well. Since
those appeared to simply be used as brackets, in the UCJEPS records we
were given, I replaced these with [ and ], rather than doing
something like the above.
* I didn't spot either of the other predefined XML entities in the
UCJEPS data - &quot; or &apos; - but those would also need to be
treated similarly.
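The substitutions described above could be scripted rather than done by hand. The following is a minimal sketch in Ruby; the names `XML_ENTITIES`, `escape_xml`, and `double_entities` are hypothetical and are not taken from the existing script. It assumes that "doubling" means rewriting the leading ampersand of an existing entity, so that &amp; becomes &amp;amp;.

```ruby
# The five predefined XML entities, keyed by the raw character.
XML_ENTITIES = {
  "&" => "&amp;",
  "<" => "&lt;",
  ">" => "&gt;",
  '"' => "&quot;",
  "'" => "&apos;"
}.freeze

# Escape raw (unescaped) text once: "&" becomes "&amp;", "<" becomes "&lt;", etc.
def escape_xml(text)
  text.gsub(/[&<>"']/) { |ch| XML_ENTITIES[ch] }
end

# Double text that already contains entities: only "&" is rewritten,
# so "&amp;" becomes "&amp;amp;" and "&lt;" becomes "&amp;lt;".
def double_entities(text)
  text.gsub("&", "&amp;")
end
```

Which of the two functions applies depends on whether the exported data still contains raw characters or already contains entities, so it is worth checking a sample record before running either over a whole file.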
# Unescaped special XML characters (&, <, >, ', ") need to be turned into XML entities. This should be done in the Groovy or Ruby script.
# XML entities need to be doubled (e.g., in the Ruby or Groovy script). (This should be fixed in the Ruby script, replacing with &amp\;)
# As well, you might look for dollar signs, which are triggers for macro interpolation. I didn't happen to run across any of those in the UCJEPS Person records, and so don't know first-hand how you might munge those. Maybe that is only an issue if the format is ${sometext}, so this might not be an issue.
6. Perform the import (using curl, or wrapped into the Ruby or Groovy script).
Notes:
* Currently, import performance seems to slow down with large record sets. Glen splits these into batches of 5k to 10k records.
* The file system is also filling up. |
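The batching mentioned in the notes could be sketched in Ruby as follows. Both helper names (`batch_records`, `wrap_batch`) are hypothetical, and the default of 5000 simply takes the low end of the 5k to 10k range. The `<imports>` root element is an assumption inferred from the "/imports/import/schema" path named in the Imports Service documentation linked earlier; check it against that page before relying on it.

```ruby
# Split an array of records into batches of at most batch_size items.
def batch_records(records, batch_size = 5000)
  raise ArgumentError, "batch_size must be positive" unless batch_size > 0
  records.each_slice(batch_size).to_a
end

# Wrap one batch of already-converted <import> fragments in a single
# root element (assumed here to be <imports>).
def wrap_batch(import_fragments)
  "<imports>\n" + import_fragments.join("\n") + "\n</imports>\n"
end
```

Each wrapped batch could then be written to its own file and posted separately, which keeps any single import request within the size that currently performs acceptably.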