UCJEPS-CollectionSpace data mapping, v2.3
With this iteration, we are targeting the data migration mappings for the migration of SMaSCH data into a CollectionSpace instance, based on the version 2.3 release of CollectionSpace. Although there will be additional changes supported in the future, this is the major migration and launch release.
Cataloging
Core Specimen data
See attachment for accession mapping. The grid below is not complete.
Field mapping
SMaSCH |
CollectionSpace object schema |
CS User Interface Title (** = different from UI) |
Note |
accession.accession_id |
object_number |
Accession Number ** |
Change field title |
institution.inst_name |
collection |
Collection |
Dropdown (Collection values) |
accession.coll_num_prefix + coll_number + coll_num_suffix |
other_number |
Number |
Moved to Field Collection Number. |
"collector number" |
other_number_type |
Number Type |
Not needed if we move this to Field Collection Number. |
accession.notes |
comments |
Comments |
|
objkind.kind |
form (Object Description Information) |
Form |
Dropdown (Form values) |
taxon_fullname.fullname |
title |
Taxon ** |
Change field title |
accession.phenology (decoded) |
phase (Object Description Information) |
Phenology ** |
Change field title. Dropdown (Phenology values) |
accession.early_jdate (calc) |
field_collection_date_earliest |
Field Collection Date Earliest |
Custom |
accession.late_jdate (calc) |
field_collection_date_latest |
Field Collection Date Latest |
Custom |
accession.datestring |
field_collection_date |
Field Collection Date |
We are converting this to the structured date format |
committee.committee_abbr |
field_collection_collector |
Field Collection Collector |
|
accession.loc_lat_decimal |
field_loc_lat_decimal |
Field Location Latitude Decimal |
 |
accession.loc_long_decimal |
field_loc_long_decimal |
Field Location Longitude Decimal |
|
accession.loc_place (calc) |
field_collection_place (?) |
Field Location Verbatim** |
Will map to verbatim field in schema extension for now. When the Place Authority is available, we will work on creating authority records as well. |
accession.loc_county |
field_loc_county |
Field Location County |
Custom |
accession.loc_state |
field_loc_state |
Field Location State |
Custom |
accession.loc_country |
field_loc_country (temp custom) |
Field Location Country |
Custom |
accession.loc_elevation |
field_loc_elevation (temp custom) |
Field Location Country |
Custom |
accession.coll_num_prefix + coll_number + coll_num_suffix |
? |
Field collection number |
|
committee.committee_abbr |
? |
Field collection collector |
|
"catalog date" |
date_association |
Date Association |
Fixed text. Do not load into 1.0. May end up in Associated Date information eventually, or create date/modify date. |
accession.catalog_date |
catalog_date |
Date Text |
Fixed text. Do not load into 1.0. May end up in Associated Date information eventually, or create date/modify date. |
SQL query
Using three views to get subsets of the records, identical in their selects, except for the beginning of their accession number:
- cspace_collobj_uc_2_3b
- cspace_collobj_jeps_2_3b
- cspace_collobj_others_2_3b
These views make the minimal joins needed since refnames (collectors, collection number assignors, filed_in_cabinet genus) will be looked up using text files coming from earlier Talend jobs.Â
ETL
- Talend: GET_COLLOBJ_UC, GET_COLLOBJ_JEPS, GET_COLLOBJ_OTHERS
- Inputs
- cspace_collobj_uc/jeps/others views
- ucjeps.cspace.berkeley.edu: finds accessions already in CSpace (so we don't add those a second time)
- all org-collectors to get refnames for collectors (ucjepsorgs-all.txt -- this file created by combining two files by hand)
- all org-collectors to get refnames for collection number assignors (ucjepsorgs-all.txt -- this file created by combining two files by hand)
- all taxon-taxon records to get refnames for filed_in_cabinet values (smasch-taxonids.txt -- this file created by combining two files by hand)
- locations_from_smasch.txt -- a text file provided by Dick Moe that combines several locality fields in SMaSCH into one location field for each accession. Dick uses this to produce location data for CCH.
- Outputs
- objectsincspacealready-others.txt -- objects already in CSpace that will need to be updated separately
- collobj-others-add.xml -- main XML output to be merged with other files
- collobject-others-ids.txt -- text file with legacy_id, CSID, uri, and some refnames for bookkeeping
Notes
Controlled list values
See these links
- https://github.com/cspace-deployment/application/blob/ucjeps_2.3/tomcat-main/src/main/resources/tenants/ucjeps/local-collectionobject.xml
- https://github.com/cspace-deployment/application/blob/ucjeps_2.3/tomcat-main/src/main/resources/defaults/naturalhistory-collectionobject.xml
- https://github.com/cspace-deployment/application/blob/ucjeps_2.3/tomcat-main/src/main/resources/defaults/base-collectionobject.xml
Cataloging data sets for dropdowns and controlled lists
- Collection values:
- Jepson Herbarium
- University Herbarium, University of California, Berkeley (UC)
- University of California
- University of California- Santa Barbara
- Herbarium (UCSC)
- DHN
- Form values:
- Illustration
- Mounted on Paper
- Photocopy
- Photograph
- Stored in a Box or Bag
- Phase (Phenology) values:
- Cone
- Flowering
- Flowering/Fruiting
- Fruiting
- Spores/Sporangia
- Vegetative (non-reproductive)
- Loaned Object Status values:
- Active (Unknown)
- Active (All Out)
- Active (All In)
- Active (Partial)
- Active (Discrepancy)
- Cancelled (Unknown)
- Cancelled (All Out)
- Cancelled (All In)
- Cancelled (Partial)
- Cancelled (Discrepancy)
- Other Number Type values:
- Other herbaria accession number
- Copied from accession number
- Exsiccatae number
- Project number
- Genbank number
- Destructive sampling number
- Internal cross reference
- Miscellaneous
MERGE: Scientific Determination History (repeating within Cataloging schema)
Combines the name as filed and the annotation_history table from SMASCH.
Field mapping
See the determinations worksheet in the attached accessions mapping document
SQL
With some dates fixed, all determinations are now in cspace_determ_2_3.
Variant determinations: vname_id field recalculated to be "unverified"||vname_id in order to avoid collisions with taxon_id
ETL
Talend job MERGE_COLLOBJ_DETERMINATIONS (0.2)
Run job three times to modify SQL on view to get subsets: UC, JEPS, OTHER. These will map against Collection Object subsets.
JEPS:
where object_number like 'JEPS%'
order by object_number, legacy_id
UC:
where substring(object_number, 1, patindex('%[A-Z][0-9]%', object_number)) = 'UC'
order by object_number, legacy_id
OTHER:
where substring(object_number, 1, patindex('%[A-Z][0-9]%', object_number)) in ('UCSC', 'UCLA', 'UCSB', 'DHN')
order by object_number, legacy_id
Inputs
- org_determiners (org-determiners vocabulary refnames)
- all_taxon (text file with default taxonomy refnames) and taxon_unverified (text file with unverified/verified taxonomy refnames) are merged (tUnite)
Notes
View modified to remove time from aa.determined_date.
MERGE: Type Kind information (repeating within Cataloging schema)
Field mapping
See the type kind worksheet in the attached accessions mapping document
SQL
Using this query with view
SELECT legacy_id spectype_id, rtrim(object_number) accession_id, type_kind, REFERENCE, figure, determined_by_id, determined_by, notes, basionym_id, verified_basionym FROM dbo.cspace_spectype_2_3 where object_number <> '' order by object_number, legacy_id desc
create view cspace_spectype_2_3 as
select
   ti.typeinfo_id as legacy_id,
   rtrim(ti.accession_id) as object_number,
   tk.kind as type_kind,
   case
       when ti.place_pub is not null and ti.place_pub != ''
               and ti.year_pub is not null and ti.year_pub not in ('', '0', 'unkn')
           then ti.place_pub + ', ' + ti.year_pub
       when (ti.place_pub is null or ti.place_pub = '')
               and ti.year_pub is not null and ti.year_pub not in ('', '0', 'unkn')
           then year_pub
       else null
   end as reference,
   ti.figure as figure,
   ti.typ_committee_id as determined_by_id,
   c.committee_abbr as determined_by,
   case
       when ti.notes is null and v.descr is not null
           then 'Type Voucher: ' + v.descr
       when ti.notes is null and v.descr is null
           then null
       when ti.notes is not null and v.descr is not null
           then ti.notes + '; Type Voucher: ' + v.descr
       when ti.notes is not null and v.descr is null
           then ti.notes
   end as notes,
   ti.basionym_id as basionym_id,
   tf.fullname as verified_basionym
from
   types_info ti,
   typekind tk,
   committee c,
   voucher v,
   taxon_fullname tf
where ti.typekind_id *= tk.typekind_id
and ti.typ_committee_id *= c.committee_id
and ti.accession_id *= v.accession_id
and v.vouchkind_id = 56
and ti.basionym_id *= tf.taxon_id
ETL
Talend job MERGE_COLLOBJ_SPECTYPE
Notes
MERGE: Text annotations (repeating within Cataloging schema)
Note that in SMaSCH, a table named voucher was used to capture text notes of various kinds. Many of these are being cleaned up and put in other places, but some will remain.
Field mapping
See vouchers worksheet in attached accession mapping document
SQL
SELECT Â Â Â object_number, Â Â Â annotation_type, Â Â Â annotation_note, Â Â Â annotation_date FROM Â Â Â dbo.cspace_anno_2_3 ORDER by object_number, annotation_date
create view cspace_anno_2_3 as select rtrim(v.accession_id) as object_number, vk.kind as annotation_type, v.descr as annotation_note, v.voucher_date as annotation_date from voucher v, vouchkind vk where v.vouchkind_id *= vk.vouchkind_id and v.vouchkind_id in (24, 29, 41, 56, 73, 78, 79) union select rtrim(ti.accession_id) as object_number, 'holotype location' as annotation_type, ti.inst_abbr as annotation_note, ti.mod_date as annotation_date from types_info ti where ti.inst_abbr is not null and ti.inst_abbr != '' union select aa.accession_id as object_number, 'additional taxa' as annotation_type, tf.fullname as annotation_note, aa.folder_name_date as annotation_date from annotation_asfiled aa, taxon_fullname tf where aa.addl_taxon_id *= tf.taxon_id and addl_taxon_id is not null and addl_taxon_id != 0
ETL
Using the SQL above directly for now. Rtrim the accession_id.
Talend job MERGE_COLLOBJ_ANNOTATIONS
Notes
MERGE: Hybrid Parents (repeating within Cataloging schema)
From annotation_asfiled table
Field mapping
See annotations tab in attached accessions mapping document
SQL
SELECT annotation_asfiled.accession_id, annotation_asfiled.parent_name_id, (case when parent_name_modifier_kind.taxon_modifier=' ' then null else parent_name_modifier_kind.taxon_modifier end) as parent_modifier FROM annotation_asfiled, parent_name_modifier_kind where annotation_asfiled.parent_name_modifier_id = parent_name_modifier_kind.taxon_modifier_id and parent_name_id <> 0 order by accession_id, parent_seqno
ETL
Talend job MERGE_COLLOBJ_HYBRIDPARENTS
Notes
MERGE: Other Numbers (repeating within Cataloging schema)
From accession and voucher tables
Field mapping
See voucher worksheet in attached accession mapping document.
SQL
SELECT Â Â Â legacy_id, Â Â Â other_number, Â Â Â other_number_type FROM Â Â Â dbo.cspace_othernum_2_3 ORDER by legacy_id
create view cspace_othernum_2_3 as select rtrim(v.accession_id) as legacy_id, v.descr as other_number, case when v.vouchkind_id = 45 then 'GenBank Code' when v.vouchkind_id = 55 then 'Other Label Number' when v.vouchkind_id = 71 then 'U.C. Botanical Garden Accession Number' end as other_number_type from voucher v where v.vouchkind_id in (45, 55, 71)
ETL
Talend job MERGE_COLLOBJ_OTHERNUM
Notes
45 = 'genbank code', 55 = 'other label numbers', 71 = 'U.C. Botanical Garden'
Merge collection objects and load
UPDATE: Collection Object - URI (done)
Use SQL to update collectionspace_core:uri for existing collection objects.
Update collection object records already in ucjeps.cspace (Phase II)
There are approximately 99 collection objects in ucjeps.cspace already. All are UC. Check to see if any have been modified in CSpace and update as needed.
Procedures
Insert loan object status values into dynamic controlled list
UCJEPS needs approximately 12 new values in a dynamic controlled list that is used by both loans in and loans out. These can be loaded via POST commands and should be ready to go.
Loans Out
Field mapping
See attachment for field mapping. This grid is outdated.
SMaSCH |
CollectionSpace loans out schema |
CS User Interface Title (** = different from UI) |
Note |
loan_event.uc_loan_num + loan_event.jeps_loan_num |
Loan out number |
 |
smasch has two types of loan identifiers. |
loan_event.curr_inst |
Borrower |
 |
Probably should keep this as borrowing institution (organization authority) |
loan_event:Â Â Â Â Â Â Â Â Â |
Borrower's contact |
 |
Probably should keep this as borrowing agent (person authority) |
loan_event.inhouse_notes + loan_event.noteworthy_inclusions |
Loan out note |
 |
 |
loan_event.loan_status + |
Loaned object status |
 |
It looks like this field is not currently active. smasch keeps track of two status types: |
Loan date? |
 |
Loan Out Date |
Does SMaSCH have a loan date? |
SQL query
view:
Notes
MERGE: Loan Out Items (repeating group within Loan Out procedure records)
From loan_accn_ids table
Field mapping
SQL
ETL
Notes
MERGE: Loan Taxonomy (repeating group within Loan Out procedure records)
From several fields in loan_event table
Field mapping
SQL
ETL
Notes
MERGE: Loan Out Returns (repeating group within Loan Out procedure records)
From return_detail table
Field mapping
SQL
ETL
Notes
MERGE: Loan Out Transfers (repeating group within Loan Out procedure records)
From transfer_event table and possibly from loan_event table (orig_inst, orig_for_lname, orig_for_fname, orig_for_position)
Field mapping
SQL
ETL
Notes
Merge loan out information and load
Object to Loan Out Relationships (Phase II)
These will be added as detail items. Relationship records might not be needed.
Relationships between UC and JEPS loans that have the same loan_id
Relationship records need to be created between separate loan out records to handle UC and JEPS loans that have the same loan_id.
Field mapping
SQL
ETL
Notes
Borrows (Loans In)
Field mapping
See attachment for field mapping.
SQL
Notes
MERGE: Loan In Taxonomy (repeating group within Loan In procedure records)
From several fields in borrow table
Field mapping
SQL
ETL
Notes
MERGE: Loan In Returns (repeating group within Loan In procedure records)
From borrow_returns table
Field mapping
SQL
ETL
Notes
MERGE: Loan In Transfers (repeating group within Loan In procedure records)
From several fields in borrow_xfer_out table
Field mapping
SQL
ETL
Notes
Merge loans in information and load
Handle unmatched borrow_return records
There are 20 records in borrow_return whose borrow_id does not match with one in the main borrow table. They look like they could match on lender_loan_num and institution. It will probably be easiest to update these by hand.
Handle unmatched borrow_xfer_out records
There are3 records in borrow_xfer_out whose borrow_id does not match with one in the main borrow table. They look like they could match on lender_loan_num and institution. It will probably be easiest to update these by hand.
Location (Location and Movement procedure)
Use annotation_asfiled.collectionkind_id (Name of Collection, e.g., Main, Reference, Type) to indicate storage of specimen
Field mapping
SQL
ETL
Notes
Object-Location Relationship records
Field mapping
SQL
ETL
Notes
Vocabularies (load these first)
Person vocabulary (default person-person)
Field mapping
See agents worksheet in authorities mapping file for current field mapping.
SQL query
View: cspace_person_2_3.
Note: View modified to exclude the persons from loans and borrows. Those are now handled separately.
ETL
Talend job GET_PERSON_ADDS adds new person records.
Updates to be found later.
Obsolete 5/18: Talend job diff_person3 compares SMaSCH view (cspace_persons_2_0) to ucjeps.cspace (persons_common) on legacy_id, displayname, bio_notes, birthdate, and deathdate. May 12: Found 3 new records and 2 updates. This can also be confirmed by viewing person table in SMaSCH in table order, noting the records added or updated in reverse chronological order.
Talend job GET_PERSON_V1 (Hoffman) was used to load data initially:
- Input: Query view
- See  SMaSCH data mapping notes for some special data handling
- Output: XML file for loading (batches of 3000)
- Output: List of CSIDs (to facilitate deleting)
- Output: CSV file of basic identifiers (for record keeping and as input to one of the Org loading jobs).
- 8945 rows
Notes
Status: SMaSCH persons records already loaded, plus persons derived from borrow and loan tables. However, these might have been updated: Person records might be updated, and borrow and loan records might be added or changed. Person records last queried from SMaSCH on 3/8/2012 at 9:32.
Creates URI
5/18/2012
New view persons_2_3 includes people from borrows, loans, transfers, etc, so new approach required.
Excluding records where forename and surname are both null.
Too many non-unique names picked up by the select distinct statements for the trans_person table. Need to fix.
Those records are static from transaction tables, so they will have to be refreshed when it's time to switchover!
ADD records can be identifying by comparing legacy_id in SMaSCH view to person_common.shortidentifier in CSpace.
UPDATES can be found two ways: 1) using my person_diff approach AFTER new records have been added or 2) doing a table scan on smasch.person and looking in database order.
ucjepspersonids-all.txt is used by many other jobs. Create it by concatenating the ID's output from original load (in CRH archive2 directory) with new adds from this batch.
UPDATE: Person - URI (done)
Existing person records need collectionspace_core:uri. REST update does not work on the collectionspace_core schema, so this has to be done in SQL.
Delete loan persons loaded already into ucjeps.cspace
See if any are used already. Can use short identifiers in a certain range.
Add person records parsed from various loans and borrows tables
In SMaSCH, there are several unstructured person fields. These are being parsed and then loaded as unique person authority records.
Organization-Collectors vocabulary (default organization vocabulary)
From SMaSCH committee table where committee_func= "coll"
Field mapping
See attachment (agents worksheet in authority mapping file) for current field mapping
SQL query
Use view: cspace_org_2_0 but limit to func="coll"; link to committee table to get collection number assignor (links back into committee where func=coll) and data source and to limit the query to get collection committee records added since last load.
ETL
Talend job GET_COLLECTOR_ADDS, just finds adds.Â
Talend job GET_COLLECTOR_UPDATES finds 10 records to update (by hand presumably, output format is for Import Service do not run).
Talend job GET_ORG_V2 (Hoffman) was used to load initially:
- See SMaSCH data mapping notes for some special handling
- Input: Query view
- Input: ucjepspersonout-ids.txt from GET_PERSON_V1 (to get refnames of organization contacts from person authority)
- Output: XML file for loading (batches of 4000)
- Output: List of CSIDs (to facilitate deleting)
- Output: CSV file of basic identifiers (for record keeping).
- 49845 rows
Talend job (Cheng):
- Input: Text file from Dick Moe, non-colliding-collectors.in, containing names of seaweed collectors
- 1546 rows
- Already loaded into ucjeps.cspace
Notes
Need to update existing collector committees also with collection number assignor
Creating URI in ADD and UPDATE
UPDATE: Organization - URI (done)
Existing organization records need collectionspace_core:uri. REST update does not work on the collectionspace_core schema, so this has to be done in SQL.
UPDATE: Organization - collection number assignors
Update existing collectors with their collection number assignor from SMaSCH. REST api.
Organization-Nomenclature Committee vocabulary (organization-nomenclature vocabulary)
From SMaSCH committee table where committee_func= "nomen".
Field mapping
See notes on Organization-Collectors above.
SQL query
ETL
Talend job GET_ORG_NOMEN, gets all nomenclature committees and adds to new org-nomenclature vocabulary
Notes
Creates URI
Need to remove existing nomenclature records from default organization vocabulary and update refnames as needed.
Organization-Determination Committee vocabulary (organization-determination vocabulary)
From SMaSCH committee table where committee_func= "annot".
Field mapping
See notes on Organization-Collectors above.
SQL query
select legacy_id, display_name, long_name, short_name, foundation_place, contact_name, org_function, substring(contact_name, (patindex('%SMASCH%', dbo.cspace_org_2_0.contact_name)+6), ((patindex('%)%', dbo.cspace_org_2_0.contact_name))-(patindex('%SMASCH%', dbo.cspace_org_2_0.contact_name)+6))) contact_id, coll_num_person_id, data_src_name from dbo.cspace_org_2_0, dbo.committee, data_source where legacy_id=convert(varchar, committee_id) and dbo.committee.data_src_id=data_source.data_src_id and committee_func = 'annot
ETL
Talend job GET_ORG_DETERMINATION gets all determination committees for adding to new org-determination vocabulary
Notes
Creating URI
Need to remove existing determination records from default organization vocabulary and update refnames as needed.
Organization-Type Assertion Committee vocabulary (organization-typeassertion vocabulary)
From SMaSCH committee table where committee_func= "typ".
Field mapping
See notes on Organization-Collectors above.
SQL query
select legacy_id, display_name, long_name, short_name, foundation_place, contact_name, org_function, substring(contact_name, (patindex('%SMASCH%', dbo.cspace_org_2_0.contact_name)+6), ((patindex('%)%', dbo.cspace_org_2_0.contact_name))-(patindex('%SMASCH%', dbo.cspace_org_2_0.contact_name)+6))) contact_id, coll_num_person_id, data_src_name from dbo.cspace_org_2_0, dbo.committee, data_source where legacy_id=convert(varchar, committee_id) and dbo.committee.data_src_id=data_source.data_src_id and committee_func = 'typ
ETL
Talend job GET_ORG_TYPEASSERTION gets all committees of function "typ" for loading into new org-typeassertion vocabulary
Notes
Creates URI
Need to remove existing type assertion records from default organization vocabulary and update refnames as needed.
Clean up organization-organization (collectors) vocabulary in CSpace
Remove nomen, determ, type, and inst values loaded previously.
Organization-Institution vocabulary (organization-institute vocabulary)
Field mapping
See the agents worksheet in the authorities mapping attachment
Use the institute table
SQL query
Access the institute table directly, including the contact schema fields.
Might need to transform the short identifiers and refnames for institutes whose abbreviation starts with an asterisk. No, they seem to act OK.
SELECT dbo.institute.inst_abbr, dbo.institute.inst_name, dbo.institute.inst_address1, dbo.institute.inst_address2, dbo.institute.inst_city, dbo.institute.inst_state, dbo.institute.inst_zipcode, dbo.institute.inst_country, dbo.institute.location, dbo.institute.exch_bal_total, dbo.institute.exch_bal_vasc, dbo.institute.exch_bal_myc, dbo.institute.exch_bal_phyc, dbo.institute.exch_bal_bry, dbo.institute.exch_bal_types, dbo.institute.exch_bal_correction, dbo.institute.areas_they_want, dbo.institute.taxa_they_want, dbo.institute.areas_they_have, dbo.institute.taxa_they_have, dbo.institute.swap_for_lit, dbo.institute.avg_desid, dbo.institute.avg_specs, dbo.institute.phone, dbo.institute.FAX, dbo.institute.URL, dbo.institute.email FROM dbo.institute
ETL
Talend job GET_ORG_INSTITUTION creates XML outputs for org-institution vocabulary and contacts schema.
Notes
Creating URI in Institution and Contact
Need to remove existing institution records from default organization vocabulary and update refnames as needed.
Add institutions parsed from loans and borrows tables
There are approximately 12 institutions in various loans and borrows tables whose abbreviation does not match any in the institute table.
Common Names vocabulary (taxon-common vocabulary)
Note: Load before taxon-taxon vocabulary
Use the common_name table.
Create collectionspace_core:URI
Field mapping
See the taxonomy worksheet in the attached authorities mapping attachment
SQL
SELECT common_name_id, name common_name, dbo.common_name.data_src_id, dbo.data_source.data_src_name FROM dbo.common_name, dbo.data_source where dbo.common_name.data_src_id = dbo.data_source.data_src_id order by common_name_id
ETL
Talend job GET_TAXONOMY_COMMON
Notes
Controlled list values for the Name Source field on taxonomy vocabularies (taxon-taxon, taxon-common, and taxon-unverified) are documented at (they are identical to the display value):
Taxonomy vocabulary (default taxon-taxon vocabulary)
NOTE: Need to load the taxon-common vocabulary first!
The taxon_name table is the primary source for this information. Some information coming from taxon_fullname and taxon_noauth_name. Note the repeating information for Taxonomy Authors and Common Names described below.
Note: Check to see if inAuth value is identical on ucjeps2 and ucjeps.cspace. They are: 87036424-e55f-4e39-bd12
Field mapping
See the taxonomy worksheet in the authorities mapping attachment
SQL
Using the view cspace_taxonomy_2_3.
ETL
Talend job GET_TAXONOMY_V1 (0.2)
Input: basionym_lookup.txt (replace with all_taxon?)
Notes
Basionym values handled (skipped if 0 or equal to taxon_id).
Controlled list values for the Name Source field on taxonomy vocabularies (taxon-taxon, taxon-common, and taxon-unverified) are documented in
UPDATE: Taxon - URI
Existing taxonomy records need collectionspace_core:uri. REST update does not work on the collectionspace_core schema, so this has to be done in SQL.
9292 records have a valid basionym
MERGE: Common Names for Taxonomic Terms (repeating within Taxonomy schema for taxon-taxon vocabulary)
Field mapping
See the taxonomy worksheet in the authorities mapping attachment
Use the smasch_common_names table.
Get Common Name refnames from the taxon-common output
SQL
SELECT taxon_id, common_name_id FROM dbo.smasch_common_names order by taxon_id
ETL
Talend job MERGE_TAXON_COMMON
Notes
MERGE: Authors for Taxonomic Terms (repeating within Taxonomy schema for taxon-taxon vocabulary)
Field mapping
See the taxonomy worksheet in the authorities mapping attachment.
From taxon_name table (or a new view)
Get author refnames from the org-nomenclature output
SQL
SELECT taxon_id, author_id, 'author' author_type FROM dbo.taxon_name where author_id > 0 union SELECT taxon_id, ascr_auth_id, 'ascribed author' author_type FROM dbo.taxon_name where ascr_auth_id > 0 union SELECT taxon_id, p_author_id, 'parenthetical author' author_type FROM dbo.taxon_name where p_author_id > 0 union SELECT taxon_id, p_ascr_auth_id, 'parenthetical ascribed author' author_type FROM dbo.taxon_name where p_ascr_auth_id > 0 order by taxon_id, author_type
ETL
Talend job MERGE_TAXON_AUTHORS
Notes
Taxonomy Hierarchical Relations
Field mapping
See taxonomy worksheet in authorities mapping file.
Use taxon_name fields (taxon_id and parent_id) to create hierarchical relationship records. Get their refnames from the taxon-taxon output.
SQL
Notes
Merge default taxonomy information and load
Unverified Taxonomic Names vocabulary (taxon-unverified vocabulary)
Field mapping
See the taxonomy worksheet in the authorities mapping attachment
Use variant_taxon_name table.
SQL
SELECT vname_id, variant_name, dbo.variant_taxon_name.data_src_id, dbo.data_source.data_src_name FROM dbo.variant_taxon_name, dbo.data_source where dbo.variant_taxon_name.data_src_id = dbo.data_source.data_src_id order by vname_id
ETL
Talend job GET_TAXONOMY_UNVERIFIED
Notes
Changing legacy_id to "unverified" + vname_id so that taxon-variant and taxon-taxon lookup files can be combined for MERGE_COLLOBJ_DETERMINATIONS