Draft and notes

Note: Process and documentation for initializing an authority are required.

1. Create at least one sample record for each of your record types.

2. Export those records using the Web-based administrative interface to Nuxeo, following Laramie's step-by-step instructions here:

<[http://wiki.collectionspace.org/display/collectionspace/Imports+Service+Home#ImportsServiceHome-Howtodeterminethecorrectvaluestoputintoan%22%2Fimports%2Fimport%2Fschema%22element%3A]>

(The Nuxeo Web-based administrative interface comes with the CSpace installation and runs on port 8080.)

(Note: We need to check whether a complex catalog record is correctly returned, e.g., one with nested repeating elements, multiple schemas, or ampersands and other character entities.)

(Note: Need to confirm how we change the Nuxeo administrator password, both in Nuxeo and in the services configuration.)

3. Use the Ruby script near the end of that document to take the records you export in step 2, and convert them to a format that the import service can ingest. You can do this manually, as well, but a script makes this easier. (Glen created an 'ed' script for the same purpose; you can ask him for that, if you wish.)

Note: We should rewrite this in groovy or something more standard. In the meantime, the script should be in subversion. Multiple copies are sitting around. Aron will create a location for this, in a scripts directory.

If you use the Ruby script, you'll need to do two things:

a. A one-time task: install Ruby on your system, if it isn't already present. There are examples / links here on how to do so:

<[http://wiki.collectionspace.org/display/collectionspace/Imports+Service+Home#ImportsServiceHome-ImportingrecordsexportedfromaCollectionSpaceinstance]>

b. Edit the three variables near the top of the script to reflect the specifics of the record type you're importing. Step 2 should give you the information you need to fill in these values:

servicename = "Persons"

recordtype = "Person"

schemas = \[ "persons_common" \]

In the case of records where pertinent data is stored in more than one schema, e.g. a common schema, like collectionobjects_common, and one or more extension schemas, like collectionobjects_naturalhistory, the value of the 'schemas' variable might look like this:

schemas = \[ "collectionobjects_common", "collectionobjects_naturalhistory" \]

(Scripting contributions to make these variables command-line parameters are welcome \:-)
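
As a possible starting point for such a contribution, here is a minimal sketch of reading the three variables from the command line with Ruby's standard optparse library. The option names (--servicename, --recordtype, --schemas) and the parse_import_options helper are hypothetical and not part of the existing script; the defaults are simply the Person values shown above.

```ruby
require 'optparse'

# Hypothetical helper (not part of the existing Ruby script): reads the three
# script variables from command-line arguments, falling back to the Person
# example values when an option is not given.
def parse_import_options(argv)
  options = {
    servicename: 'Persons',
    recordtype: 'Person',
    schemas: ['persons_common']
  }

  OptionParser.new do |opts|
    opts.banner = 'Usage: SCRIPT [options]'
    opts.on('--servicename NAME', 'Service name, e.g. Persons') do |value|
      options[:servicename] = value
    end
    opts.on('--recordtype TYPE', 'Record type, e.g. Person') do |value|
      options[:recordtype] = value
    end
    # Passing Array makes optparse split a comma-separated list into an array.
    opts.on('--schemas LIST', Array, 'Comma-separated schema names') do |value|
      options[:schemas] = value
    end
  end.parse!(argv.dup) # parse a copy so the caller's argument list is untouched

  options
end
```

With this in place, a multi-schema record type could be requested with something like --schemas collectionobjects_common,collectionobjects_naturalhistory, and the script would call parse_import_options(ARGV) instead of hard-coding the values.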

...





4. If you need to generate your own CSIDs for authority terms, see

...

 Glen's comment at the bottom, which provides useful additions to that

...

 Ruby script.

...




5. Manually do whatever cleanup may be needed of characters or character sequences in the data itself that may trip up the import service, either via search and replace, or using a 'sed' script, BBEdit text factory, etc. From what I recall, I had to do the following:
# Unescaped special XML characters (&, <, >, ', ") need to be turned into XML entities. This should be done in the groovy or ruby script.
# XML entities need to be doubled, e.g., each &amp; replaced with the literal string &amp;amp\; (per Susan, this results in an ampersand character being imported). This should be fixed in the ruby or groovy script. In the UCJEPS records we were given, &lt; and &gt; appeared to be used simply as brackets, so I replaced them with \[ and \] rather than doubling them. I didn't spot either of the other predefined XML entities, &quot; or &apos;, in the UCJEPS data, but those would also need to be treated similarly.
# As well, you might look for dollar signs, which are triggers for macro interpolation. I didn't happen to run across any of those in the UCJEPS Person records, and so don't know first-hand how you might munge those. This may only be an issue when the format is ${sometext}, so it might not be a problem in practice.
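
The cleanup steps above could be sketched in Ruby roughly as follows. This is only an illustration under stated assumptions: the helper names (escape_xml, double_entities, macro_like_sequences) are hypothetical and not part of the existing script, only the five predefined XML entities and numeric character references are recognized as "already escaped", and whether dollar signs actually need handling is unconfirmed, so the last helper only reports them.

```ruby
# The five predefined XML entities, keyed by the raw character they encode.
XML_ENTITIES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;'
}.freeze

# What may follow an ampersand in a recognized entity or character reference.
# Assumption: anything else (e.g. &nbsp;) is treated as a raw ampersand.
ENTITY_BODY = /(?:amp|lt|gt|quot|apos|#\d+|#x\h+);/

# Step 1: turn unescaped special characters into XML entities, leaving
# ampersands that already start a recognized entity untouched.
def escape_xml(text)
  text.gsub(/&(?!#{ENTITY_BODY})|[<>"']/) { |char| XML_ENTITIES[char] }
end

# Step 2: double every entity by escaping its leading ampersand,
# e.g. &amp; becomes &amp;amp;
def double_entities(text)
  text.gsub(/&(?=#{ENTITY_BODY})/, '&amp;')
end

# Step 3: report ${sometext}-style sequences so they can be reviewed by hand.
def macro_like_sequences(text)
  text.scan(/\$\{[^}]*\}/)
end
```

Applying escape_xml and then double_entities to a field value covers the first two list items in order; macro_like_sequences is deliberately read-only, since the right substitution for dollar signs has not been worked out.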








6. Perform the import (curl, or wrap into the ruby-groovy script).

Notes:
* Currently, import performance seems to slow down with large record sets. Glen splits these into batches of 5k to 10k records.
* File system is filling up also.
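
The batching workaround mentioned in the notes could look something like the sketch below. The split_into_batches helper is hypothetical, and the default of 5,000 is just the lower end of the range Glen uses; how each batch is then written out and submitted to the import service is left open here.

```ruby
# Hypothetical helper: splits an array of records into batches of at most
# batch_size records each, so that each batch can be imported separately.
def split_into_batches(records, batch_size = 5_000)
  raise ArgumentError, 'batch_size must be positive' unless batch_size.positive?

  records.each_slice(batch_size).to_a
end
```

For example, 12,000 records with the default batch size would yield three batches: two of 5,000 records and one of 2,000.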