This page contains information about data ETL work for PAHMA's deployment of CollectionSpace.
Please see our current list of questions and meeting notes.
Total fields in TMS and TMSThesaurus: 3242 (3111 + 131)
Fields to be mapped to CSpace from TMS and TMSThesaurus: 763 (699 + 64) or fewer
The plan is to do the data migration in three phases:
Job | Description | Inputs | Outputs | Notes |
---|---|---|---|---|
tms_obj_v0.1_all.ktr | Object records -- first job | 11 TMS queries | text | CSpace 1.3 schema. For Object records, run this job first. See merge_object_data.ktr for notes on the sequence for running object transformations, starting with tms_obj_v0.1_all (collection site is a job with 3 transformations). |
tms_obj_collsite1.ktr | Object records -- collection site job#1 | 3 TMS queries (object site, provenance from object, textentries) | matching text file & non-matching text file | Extract sitename/sitenote/provenance and merge with textentry places |
tms_obj_collsite2.ktr | Object records -- collection site job#2 | 3 Thesaurus files (tmsthesaurus..terms, tms..thesxrefs, tmsthesaurus..termmaster) | text file | Take the records that failed to match with textentry places (last step of tms_obj_collsite1.ktr) and merge them with the thesaurus lookup (provenance & card header geog info) |
tms_obj_collsite3.ktr | Object records -- collection site job#3 | 2 text files | text file | Merge the outputs of the last two steps to create a single collection site file with objectid, sitename & sitenote |
coll_site_job.kjb | Sequences "collection site" job runs | 3 Kettle jobs for "collection site" | resulting file from tms_obj_collsite3.ktr | Ensures that the 3 jobs for object "collection site" run sequentially |
tms_obj_collector.ktr | Object records -- collector | 4 TMS queries (object collector, collection date, field collection number, date) | text file "tms_obj_collector" | Extract/merge to create a file w/ objectid, collectorRefName, collectionDate, fieldnum, collectionnote |
tms_obj_production.ktr | Creates object records' production info | 4 TMS queries (production person/org, date, reason-tourist, place-minting) and TMS Thesaurus of culture term | text file | Merge object production person/org RefName/role with production date (made/photo/pub/mint/ruler), culture, reason and minting place |
merge_object_data.ktr | Creates final Object records | 4 text files (main object, collection site, collector, production) | text | Run this after creating: |
reference.ktr | Creates object's reference string | 3 TMS queries (referencemaster, refobjxref, object) and a CSID text file | text | Create "ref" by concatenating title/subtitle/placepublished/copyright... in the referencemaster & refobjxref DBs, then merge/join with "ref" (created by normalizing pubreference & bibliography) from the "object" DB. In the last step, merge in CSID (link key objectid=tmsID). |
tms_inscription.ktr | Creates object's inscription information | 3 TMS queries (TextEntries, Constituents, object) and a CSID text file | text | Merge object's (1st/single) TextTypeID/TextEntry (i.e. Inscription Content)/Inscriber/MTB_dated/MTB_Language/MTB_Interpretation/MTB_... with CSID (link key objectid=tmsID). |
media.ktr | Creates Media Handling records | | text | CSpace 1.6 schema. |
acq_xml.ktr | Creates Acquisition records (PAHMA accessions) | 8 TMS queries, 1 txt file of accession methods from another job | XML file (/home/sstone/cspace_general/import/acq_import.xml), CSID output (/tmp/acq_csids) | Note: Collector refname should be collecting event here. |
loan_in.ktr | Creates Loan In records | | text | CSpace 1.3 schema |
loan_out.ktr | Creates Loan Out records | | text | CSpace 1.3 schema |
loc_move.ktr | Creates Location and Movement Control records | | text | CSpace 1.3 schema |
object_exit.ktr | Creates Object Exit records | | text | CSpace 1.3 schema |
Obsolete jobs | | | | |
person_authority.ktr | Creates Person authority records | 3 TMS queries (person, birth, death) | person.txt (7012 records in file dated Feb 11, 2011) | Mapping is pretty good. Legacy data has many duplicates with alternative spellings. MTB has created a field that identifies the preferred name for each record. We should take advantage of that to help create synonym relationships. The job can also select distinct Salutation and NameTitle values. |
org_authority.ktr | Creates Organization authority records | 2 TMS queries (org, foundation date) | org.txt (1577 records in file dated Feb 14, 2011) | Mapping is pretty good. Legacy data has many duplicates with alternative spellings. MTB has created a field that identifies the preferred name for each record. We should take advantage of that to help create synonym relationships. The job can also produce Org Functions values. |
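
The run-order notes in the job listing above (run tms_obj_v0.1_all first, run the three collection site transformations in sequence, run merge_object_data last) can be written down as a small dependency map. The following is only a sketch in Python 3.9+ and is not part of the Kettle jobs themselves. The `DEPENDS_ON` map and the `run_order` helper are hypothetical names introduced for illustration. Treating "run this job first" as a dependency of the other object jobs is an assumption about what "first" means here.

```python
from graphlib import TopologicalSorter

FIRST = "tms_obj_v0.1_all.ktr"

# Each job maps to the jobs that must finish before it starts.
# Read off the Notes column; making FIRST a prerequisite of the other
# object jobs is an assumption that models "run this job first".
DEPENDS_ON = {
    FIRST: set(),
    "tms_obj_collsite1.ktr": {FIRST},
    "tms_obj_collsite2.ktr": {"tms_obj_collsite1.ktr"},
    "tms_obj_collsite3.ktr": {"tms_obj_collsite1.ktr", "tms_obj_collsite2.ktr"},
    "tms_obj_collector.ktr": {FIRST},
    "tms_obj_production.ktr": {FIRST},
    "merge_object_data.ktr": {
        FIRST,
        "tms_obj_collsite3.ktr",
        "tms_obj_collector.ktr",
        "tms_obj_production.ktr",
    },
}


def run_order(depends_on):
    """Return one valid run order (prerequisites before dependents).

    Raises graphlib.CycleError if the map contains a circular dependency.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step, job in enumerate(run_order(DEPENDS_ON), start=1):
        print(step, job)
```

The collector and production jobs have no ordering between them, so their relative position in the result may vary. Only the constraints in the map are guaranteed.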