UCBG-CollectionSpace data mapping, v2.0
With this iteration, we are targeting the data migration mappings for the migration of SAGE data into a CollectionSpace instance, based on the version 2.0 release of CollectionSpace. This is an interim release Changes from the previous mapping are in blue.
Person
Field mapping
SQL
View: cspace_person_2_0
Notes
- Temporarily excluding shortIdentifers 0, 60397 because they are not unique in the view. They are associated with multiple Orgs (which are going into the Groups field). Break into a repeating group and merge in separately.
Talend/ETL
bgimportperson1 0.2: Revising earlier talend job (based on 1.8) to change refname format, remove ampersand handling, and include collectionspace_core:uri element
Organization
Field mapping
SQL
View:
Notes
- Skipping for now
- Multiple records per org because many orgs have multiple contacts
- Probably need to extend Org to have repeating PersonContacts information
Talend/ETL
Taxonomy
Field mapping
SQL
View: cspace_taxonomy
Notes
- Not getting author shortIDs in view, just names. For now putting them in Taxon Notes. Adjust view or get them separately in SQL.
- Do we need to adjust the format of scientific names?
- Lots of other Sage data for Names to add
Talend/ETL
bgtaxonimport2 0.2
Accessions
Field mapping
SQL
View: xdb_accession_qv
Notes
- Need dropdown values for form
- Dead Flag Using expression (row1.dead_flag==0) ? "no" : "yes" but should be "alive" "dead"
- Need to add dead_date, structured
- Need to add accession date
- Need hybrid flag
- Need requested by
- Is Source from a vocab? Need field (wait for Donor block being added for UCJEPS)
- Add More Notes to Accession Notes
- Geog location is often Earth so we know geography needs some work
- For now, putting Source in Brief Description along with Material type
- Put Accession Notes in Comments: Rename field
- What is Plant Status?
- What is Data Quality?
- record_status drop down "1"
- Do we need to fix the form of taxonomic names?
Talend/ETL
bgaccessionimport 0.1
Garden Locations
Field mapping
SQL
View: cspace_location
Notes
- Type (storage location type) field expects a dynamic controlled list with URN-formatted values. For now just loading the raw data fields from the locationType field. Nothing will display.
Talend/ETL
bgimportgardenlocation 0.1
Plant Locations (Movement and Location procedure)
Field mapping
SQL
View: xdb_plant_location_qv
Notes
- xdb_plant_location_qv.action_code map to Reason For Move, and rename? Right now it's a numeric code in Sage. What are decode values? Ignoring for now.
Talend/ETL
bgplantlocations 0.1
Relationships between Accessions (Cataloging) and Plant Locations (Movement)
Field mapping
SQL
Text file: plantlocids.txt, produced in Talend bgplantlocations job
Notes
- Text file has CSID from Plant Location record and CSID from Accession record
Talend/ETL
bgplantlocations 0.1
- Talend job creates two XML outputs (cataloging-to-movement, movement-to-cataloging)
Propagation
Field mapping
SQL
Control lists:
- Pot Type: select pot_type from pot_type_code order by pot_type;
- Pot Size: select pot_size from pot_size_code order by pot_size;
- Treatment Type: select treatment_type_name from treatment_type order by treatment_type_name;
Notes
Prop Type:
- lookup table: prop_type_code
- propagation.prop_type = pro_type_code.prop_type
- 21 values in prop_type_code
Reason for Prop: propagation.purpose
- database has a rule that limits the values to: 'class use', 'research', 'garden collection',
'distribution', 'conservation' - 5 distinct values in propagation.purpose
Extra Seeds: propagation.extra_seed_flag
- boolean (0,1)
Activity Type:
- lookup table: activity_type_code
- activity_type_code.activity_type where growth_history.activity_type_id = activity_type_code.activity_type_id
- 24 values in activity_type_code
Medium:
- lookup table: medium
- growth_history.medium_name = medium.medium_name
- 1302 values in medium
Pot Type:
- lookup table: pot_type
- growth_history.pot_type = pot_type_code.pot_type
- 35 values in pot_type_code
Pot Size:
- lookup table: pot_size_code
- pot_size_code.pot_size where growth_history.pot_size_id = pot_size_code.pot_size_id
- 69 values in pot_size_code
Treatment Type:
- lookup table: treatment_type
- treatment_type.treatment_type_name where treatment.treatment_type_id = treatment_type.treatment_type_id
- 91 values in treatment_type
Concentration: treatment.concentration
- varchar(10) field
- no rules or constraints on this field
- 80 distinct values in treatment.concentration