
Draft with notes and next steps

Note: Process and documentation for initializing an authority required. Currently, Glen creates a record in the CSpace UI, which initializes the default authority and creates a CSID. That CSID can then be used in the steps below.

1. Create at least one sample record each of your various record types. Make sure you have data in any repeating fields or field groups.

2. Export those records using the Web-based administrative interface to Nuxeo, using Laramie's step-by-step instructions here:

<http://wiki.collectionspace.org/display/collectionspace/Imports+Service+Home#ImportsServiceHome-Howtodeterminethecorrectvaluestoputintoan%22%2Fimports%2Fimport%2Fschema%22element%3A>

Notes:

  • The Nuxeo web-based administrative interface comes with the CSpace installation, and runs on port 8080.
  • We need to check whether a complex catalog record is correctly returned, e.g., one with nested repeating elements, multiple schemas, and ampersands and other character entities.
  •   Note: It looks like you want to export a record that has data in all fields, but especially in nested (repeating) structures. Otherwise the XML file might not have the XML elements for the nested structure. See the Talend-ETL work for UCB deployments notes as well.
  • Need to confirm how to change the Nuxeo administrator password, both in Nuxeo and in the services configuration.
  • This Nuxeo-based export file can also be used to seed the XML tree for outputs created via Talend Open Studio.

3. Use the Ruby script near the end of that document to take the records you export in step 2, and convert them to a format that the import service can ingest. You can do this manually, as well, but a script makes this easier.  (Glen created an 'ed' script for the same purpose; you can ask him for that, if you wish.)

Note: We should rewrite this in Groovy or something more standard. In the meantime, the Ruby script should be in Subversion; multiple copies are sitting around. Aron will create a location for it, in a scripts directory.

If you use the Ruby script, you'll need to do two things:

a. A one-time task: install Ruby on your system, if it isn't already present. There are examples / links here on how to do so:

<http://wiki.collectionspace.org/display/collectionspace/Imports+Service+Home#ImportsServiceHome-ImportingrecordsexportedfromaCollectionSpaceinstance>

b. Edit the three variables near the top of the script to reflect the specifics of the record type you're importing. Step 2 should give you the information you need to fill in these values:

servicename = "Persons"
recordtype = "Person"
schemas = [ "persons_common" ]

In the case of records where pertinent data is stored in more than one schema, e.g. a common schema, like collectionobjects_common, and one or more extension schemas, like collectionobjects_naturalhistory, the value of the 'schemas' variable might look like this:

schemas = [ "collectionobjects_common", "collectionobjects_naturalhistory" ]

(Scripting contributions to make these variables command-line parameters are welcome :-)
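One possible take on the command-line idea, as a minimal sketch rather than a patch to the actual script: Ruby's standard OptionParser can supply the three values, with the Person settings from the example as defaults. The function name parse_import_options and the option names (--servicename, --recordtype, --schemas) are made up for illustration; how the real script consumes its variables may differ.

```ruby
require 'optparse'

# Hypothetical helper: read the three per-record-type settings from an
# argument list. Defaults mirror the Person example in these notes.
def parse_import_options(argv)
  options = {
    servicename: 'Persons',
    recordtype: 'Person',
    schemas: ['persons_common']
  }

  parser = OptionParser.new do |opts|
    opts.on('--servicename NAME', 'Service name, e.g. Persons') do |value|
      options[:servicename] = value
    end
    opts.on('--recordtype TYPE', 'Record type, e.g. Person') do |value|
      options[:recordtype] = value
    end
    # The Array acceptor splits a comma-separated value into a list.
    opts.on('--schemas LIST', Array, 'Comma-separated schema names') do |value|
      options[:schemas] = value
    end
  end

  # parse (without the bang) leaves the caller's argument list untouched.
  parser.parse(argv)
  options
end
```

For a record type with an extension schema, the call might look like parse_import_options(%w[--schemas collectionobjects_common,collectionobjects_naturalhistory]), leaving the other two values at their defaults unless they are passed as well.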

4. If you need to generate your own CSIDs for authority terms, see Glen's comment at the bottom, which provides useful additions to that Ruby script.

5. Manually do whatever cleanup may be needed of characters or character sequences in the data itself that may trip up the import service, either via search and replace, or using a 'sed' script, BBEdit text factory, etc. From what I recall, I had to do the following:

    1. Un-escaped special XML characters (&, <, >, ', ") need to be turned into XML entities. This should be done in the groovy or ruby script.
    2. XML entities need to be doubled, e.g., in the ruby or groovy script. (This should be fixed in the ruby script, replacing & with &amp;.)
    3. As well, you might look for dollar signs, which are triggers for macro interpolation. I didn't happen to run across any of those in the UCJEPS Person records, and so don't know first-hand how you might munge those. Maybe that is only an issue if the format is ${sometext}, so this might not be an issue.
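The cleanup items above could be scripted roughly as follows. This is a sketch under assumptions, not the existing Ruby script: the helper names are invented, "doubling" is read here as escaping the leading ampersand of each entity a second time, and the dollar-sign helper only reports ${...} sequences, since these notes don't establish how they should be rewritten.

```ruby
# The five predefined XML entities.
XML_ENTITIES = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  "'" => '&apos;',
  '"' => '&quot;'
}.freeze

# Item 1: turn un-escaped special characters into XML entities.
# Only apply this to raw text; already-escaped text would be escaped again.
def escape_xml(text)
  text.gsub(/[&<>'"]/, XML_ENTITIES)
end

# Item 2 (assumed meaning of "doubled"): escape the ampersand of every
# entity once more, so that &amp; becomes &amp;amp;.
def double_entities(text)
  text.gsub('&', '&amp;')
end

# Item 3: list ${...} sequences so they can be reviewed by hand.
def macro_like_sequences(text)
  text.scan(/\$\{[^}]*\}/)
end

raw = 'Smith & Sons <Ltd>'
escaped = escape_xml(raw)
puts escaped                   # Smith &amp; Sons &lt;Ltd&gt;
puts double_entities(escaped)  # Smith &amp;amp; Sons &amp;lt;Ltd&amp;gt;
puts macro_like_sequences('cost ${price} and $5').inspect  # ["${price}"]
```

Keeping the two escaping passes as separate functions makes it easy to drop the second one if the doubling turns out to be unnecessary once the script is fixed.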

6. Perform the import (curl, or wrap into the ruby-groovy script).

Notes:

  • Currently, import performance seems to slow down with large record sets.
  •   Glen splits these into batches of 5k to 10k records.
  • The file system is also filling up.
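The batching mentioned in the notes above could be sketched like this, assuming the converted records are already held in memory as a list. The function name split_into_batches and the default of 5,000 (the low end of the 5k to 10k range) are illustrative choices, not something taken from Glen's actual process.

```ruby
# Hypothetical helper: split a list of records into batches no larger
# than batch_size, so each batch can be imported separately.
def split_into_batches(records, batch_size = 5_000)
  raise ArgumentError, 'batch_size must be positive' unless batch_size.positive?

  records.each_slice(batch_size).to_a
end

# Stand-in data: 12,000 placeholder records.
records = (1..12_000).map { |i| "record-#{i}" }
batches = split_into_batches(records)
puts batches.map(&:size).inspect  # [5000, 5000, 2000]
```

Each batch would then be written to its own import file and submitted on its own, which also makes it easier to watch disk usage between runs.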