UCBG-CollectionSpace data mapping, v2.4
With this iteration, we are targeting the data migration mappings for the migration of SAGE data into a CollectionSpace instance, based on the version 2.4 release of CollectionSpace. This launch is intended to support the full migration of the UC Botanical Garden SAGE system into a production CollectionSpace system. Changes from the previous mapping are in blue.
Authorities
Person (default person vocabulary) - Lam (SQL ready), Talend dev (Yuteh)
Tentatively, we will have one person vocabulary, mapping the agents from SAGE that are also in the persons table.
There might be some need to produce merge parts, for instance if we need to create, preferred and non-preferred terms for individual records.
Field mapping
SQL
select    p.person_id as short_identifier,    an.name as display_name,    p.birth_date as birth_date,    p.death_date as death_date,    datepart(year, p.birth_date) as birth_date_year,    datepart(month, p.birth_date) as birth_date_month,    datepart(day, p.birth_date) as birth_date_day,    case when p.birth_date is null then null        else str_replace(convert(char(10), p.birth_date, 102), '.', '-')            + 'T00:00:00Z'    end as birth_date_Esclr,    case when p.birth_date is null then null        else str_replace(convert(char(10), p.birth_date, 102), '.', '-')            + 'T23:59:59Z'    end as birth_date_Lsclr,    datepart(year, p.death_date) as death_date_year,    datepart(month, p.death_date) as death_date_month,    datepart(day, p.death_date) as death_date_day,    case when p.death_date is null then null        else str_replace(convert(char(10), p.death_date, 102), '.', '-')            + 'T00:00:00Z'    end as death_date_Esclr,    case when p.death_date is null then null        else str_replace(convert(char(10), p.death_date, 102), '.', '-')            + 'T23:59:59Z'    end as death_date_Lsclr,    p.person_remark as name_note from    person p,    agent_name an where p.person_id = an.agent_id and an.name_type = 'pref' and p.person_id > 0
Notes
- Â
Talend/ETL
bgimportperson1 0.2: Revising earlier talend job (based on 1.8) to change refname format, remove ampersand handling, and include collectionspace_core:uri element
New Talend job by Yuteh: BG_person 0.1
MERGE: Person Terms (repeating group) - Lam (SQL ready), Talend job (yuteh)
E.g., Different name forms (preferred, alias)
Field mapping
SQL
-- kewabbr as preferred names when they are duplicates of preferred name select    an.agent_id as short_identifier,    an.name as display_name,    an.name as term_name,    an.name_type as term_type,    'accepted' as term_status,    'English' as language,    an_source.name as source,    p.data_source_id as source_id,    p.prefix as title,    p.first_name as forename,    p.middle_name as middle_name,    p.last_name as surname,    p.suffix as name_addition,    case        when p.first_name is null and p.middle_name is null            then substring(p.last_name, 1, 1)        when p.first_name is not null and p.middle_name is null            then substring(p.first_name, 1, 1) + substring(p.last_name, 1, 1)        when p.first_name is null and p.middle_name is not null            then substring(p.middle_name, 1, 1) + substring(p.last_name, 1, 1)        else substring(p.first_name, 1, 1) +            substring(p.middle_name, 1, 1) +            substring(p.last_name, 1, 1)    end as initials,    1 as preferred_fg from    agent_name an,    agent_name an2,    person p,    agent_name an_source,    agent_name_role anr where p.person_id = an.agent_id and an.agent_id = an2.agent_id and an2.name_type = 'pref' and an.name_type = 'kewabbr' and an.name = an2.name and p.data_source_id = an_source.agent_id and an_source.agent_name_id = anr.agent_name_id and an_source.name_type = anr.name_type and an_source.name_type = 'pref' and anr.role_type = 'datasrc' and p.person_id > 0 union -- preferred names except those that are duplicates with a kewabbr select    an.agent_id as short_identifier,    an.name as display_name,    an.name as term_name,    an.name_type as term_type,    'accepted' as term_status,    'English' as language,    an_source.name as source,    p.data_source_id as source_id,    p.prefix as title,    p.first_name as forename,    p.middle_name as middle_name,    p.last_name as surname,    p.suffix as name_addition,    case        when p.first_name is null and p.middle_name is null            then substring(p.last_name, 1, 1)        when p.first_name is not null and p.middle_name is null            then substring(p.first_name, 1, 1) + substring(p.last_name, 1, 1)        when p.first_name is null and p.middle_name is not null            then substring(p.middle_name, 1, 1) + substring(p.last_name, 1, 1)        else substring(p.first_name, 1, 1) +            substring(p.middle_name, 1, 1) +            substring(p.last_name, 1, 1)    end as initials,    1 as preferred_fg from    agent_name an,    person p,    agent_name an_source,    agent_name_role anr where p.person_id = an.agent_id and an.name_type = 'pref' and p.data_source_id = an_source.agent_id and an_source.agent_name_id = anr.agent_name_id and an_source.name_type = anr.name_type and an_source.name_type = 'pref' and anr.role_type = 'datasrc' and p.person_id > 0 and an.agent_name_id not in (    select an_pref.agent_name_id    from person p, agent_name an_pref, agent_name an_kew    where p.person_id = an_pref.agent_id    and an_pref.agent_id = an_kew.agent_id    and an_pref.name_type = 'pref'    and an_kew.name_type = 'kewabbr'    and an_pref.name = an_kew.name) union -- alternate names that are not duplicates of pref name select    an.agent_id as short_identifier,    an.name as display_name,    an.name as term_name,    an.name_type as term_type,    'accepted' as term_status,    'English' as language,    an_source.name as source,    p.data_source_id as source_id,    p.prefix as title,    p.first_name as forename,    p.middle_name as middle_name,    p.last_name as surname,    p.suffix as name_addition,    case        when p.first_name is null and p.middle_name is null            then substring(p.last_name, 1, 1)        when p.first_name is not null and p.middle_name is null            then substring(p.first_name, 1, 1) + substring(p.last_name, 1, 1)        when p.first_name is null and p.middle_name is not null            then substring(p.middle_name, 1, 1) + substring(p.last_name, 1, 1)        else substring(p.first_name, 1, 1) +            substring(p.middle_name, 1, 1) +            substring(p.last_name, 1, 1)    end as initials,    0 as preferred_fg from    agent_name an,    agent_name an2,    person p,    agent_name an_source,    agent_name_role anr where p.person_id = an.agent_id and an.agent_id = an2.agent_id and an2.name_type = 'pref' and an.name_type != 'pref' and an.name != an2.name and p.data_source_id = an_source.agent_id and an_source.agent_name_id = anr.agent_name_id and an_source.name_type = anr.name_type and an_source.name_type = 'pref' and anr.role_type = 'datasrc' and p.person_id > 0 order by short_identifier, preferred_fg desc;
Notes
Talend/ETL
New Talend job by Yuteh: BG_personTerm 0.1
Organization (default Org vocabulary will be UCBG institutions) - Lam (SQL ready), Talend dev (yuteh)
We will have the following three organization vocabularies:
- organizations (institutions from SAGE organizations plus the voucher institutions)
- collectors (coming from the separate SAGE table)
- groups (mostly KEW names for nomenclature)
Field mapping
SQL
select convert(varchar(5), org_id) as short_identifier, org_name as display_name, org_remark as notes from organization union select distinct voucher_number as short_identifier, case when voucher_name is null then voucher_number||' (voucher institution)' else voucher_name end as display_name, null as notes from voucher_number_code
Notes
Talend/ETL
New Talend job by Yuteh: BG_org_org 0.1
MERGE: Organization Terms (repeating group) - Lam (SQL ready), Talend dev (Yuteh)
Field mapping
SQL
select    convert(varchar(5), o.org_id) as short_identifier,    an.name as display_name,    an.name_type as term_type,    'accepted' as term_status,    an.name as term_name,    'English' as term_language,    an_source.name as term_source,    an_source.agent_id as term_source_id,    an.name as main_body_name,    o.dept_unit as additions_to_name,    1 as preferred_fg from    organization o,    agent_name an,    agent_name an_source,    agent_name_role anr where o.org_id = an.agent_id and an.name_type = 'pref' and o.data_source_id = an_source.agent_id and an_source.agent_name_id = anr.agent_name_id and an_source.name_type = anr.name_type and an_source.name_type = 'pref' and anr.role_type = 'datasrc' union select    convert(varchar(5), o.org_id) as short_identifier,    an.name as display_name,    an.name_type as term_type,    'accepted' as term_status,    an.name as term_name,    'English' as term_language,    an_source.name as term_source,    an_source.agent_id as term_source_id,    an.name as main_body_name,    o.dept_unit as additions_to_name,    0 as preferred_fg from    organization o,    agent_name an,    agent_name an2,    agent_name an_source,    agent_name_role anr where o.org_id = an.agent_id and an.agent_id = an2.agent_id and an.name_type != 'pref' and an2.name_type = 'pref' and an.name != an2.name and o.data_source_id = an_source.agent_id and an_source.agent_name_id = anr.agent_name_id and an_source.name_type = anr.name_type and an_source.name_type = 'pref' and anr.role_type = 'datasrc' union select voucher_number as short_identifier, case when voucher_name is null then voucher_number||' (voucher institution)' else voucher_name end as display_name, 'pref' as term_type, 'accepted' as term_status, case when voucher_name is null then voucher_number||' (voucher institution)' else voucher_name end as term_name, 'English' as term_language, null as term_source, null as term_source_id, case when voucher_name is null then voucher_number||' (voucher institution)' else voucher_name end as main_body_name, null as additions_to_name, 1 as preferred_fg from voucher_number_code order by short_identifier, preferred_fg desc
Notes
Talend/ETL
New Talend job by Yuteh: BG_org_orgTerm 0.1
MERGE: Organization Associated Persons (repeating group) - Lam (SQL ready), Talend dev (Yuteh)
Field mapping
SQL
-- need to verify person_id = 0 records. select    op.org_id as legacy_org_id,    op.person_id as legacy_affperson_id,    an.name as affiliated_person,    'urn:cspace:botgarden.cspace.berkeley.edu:personauthorities:name(person):item:name(' + convert(varchar(10), op.person_id) + ')'''+ an.name + ''''    as affiliated_person_refname,    'member' as affiliated_person_type from    org_person op,    agent_name an where op.person_id = an.agent_id and an.name_type = 'pref'
Notes
Talend/ETL
New Talend job by Yuteh: BG_org_orgAssocPsn 0.1
Collectors (Organization vocabulary) - Lam (SQL ready), Talend dev (Yuteh)
Note: Only one name in SAGE so no repeating groups
Field mapping
SQL
select    cc.collector_id as short_identifier,    cc.collector_name as display_name,    cc.collector_name as term_name,    'English' as term_language,    'pref' as term_type,    'accepted' as term_status,    cc.collector_name as main_body_name,    1 as preferred_fg from collector_code cc where collector_id != 1 order by cc.collector_name
Notes
Talend/ETL
New Talend job by Yuteh: BG_org_collector 0.1
Groups (Organization Vocabulary) - Lam (SQL ready), Talend dev (Yuteh)
Field mapping
SQL
select    group_id as short_identifier,    group_name as display_name,    group_remark as notes from    group_master order by group_id
Notes
Talend/ETL
New Talend job by Yuteh: BG_org_group 0.1
MERGE: Groups Terms (repeating group) - Lam (SQL ready), Talend dev (Yuteh)
Field mapping
SQL
-- kewabbr group names that are duplicates of pref group names select    gm.group_id as short_identifier,    an.name as display_name,    an.name_type as term_type,    'accepted' as term_status,    an.name as term_name,    'English' as term_language,    an.name as term_source,    an.agent_id as term_source_id,    an.name as main_body_name,    1 as preferred_fg from    group_master gm,    agent_name an,    agent_name an2,    agent_name an_source,    agent_name_role anr where gm.group_id = an.agent_id and an.agent_id = an2.agent_id and an.name_type = 'kewabbr' and an2.name_type = 'pref' and an.name = an2.name and gm.data_source_id = an_source.agent_id and an_source.agent_name_id = anr.agent_name_id and an_source.name_type = anr.name_type and an_source.name_type= 'pref' and anr.role_type = 'datasrc' union -- pref group names except for pref names that are duplicates of kewabbr select    gm.group_id as short_identifier,    an.name as display_name,    an.name_type as term_type,    'accepted' as term_status,    an.name as term_name,    'English' as term_language,    an.name as term_source,    an.agent_id as term_source_id,    an.name as main_body_name,    1 as preferred_fg from    group_master gm,    agent_name an,    agent_name an_source,    agent_name_role anr where gm.group_id = an.agent_id and gm.data_source_id = an_source.agent_id and an_source.agent_name_id = anr.agent_name_id and an_source.name_type = anr.name_type and an_source.name_type= 'pref' and anr.role_type = 'datasrc' and an.name_type = 'pref' and an.agent_name_id not in (    select an_pref.agent_name_id    from group_master gm, agent_name an_pref, agent_name an_kew    where gm.group_id = an_pref.agent_id    and an_pref.agent_id = an_kew.agent_id    and an_pref.name_type = 'pref'    and an_kew.name_type = 'kewabbr'    and an_pref.name = an_kew.name) union -- non-pref group names that are not duplicates of pref name select    gm.group_id as short_identifier,    an.name as display_name,    an.name_type as term_type,    'accepted' as term_status,    an.name as term_name,    'English' as term_language,    an.name as term_source,    an.agent_id as term_source_id,    an.name as main_body_name,    0 as preferred_fg from    group_master gm,    agent_name an,    agent_name an2,    agent_name an_source,    agent_name_role anr where gm.group_id = an.agent_id and an.agent_id = an2.agent_id and an.name_type != 'pref' and an2.name_type = 'pref' and an.name != an2.name and gm.data_source_id = an_source.agent_id and an_source.agent_name_id = anr.agent_name_id and an_source.name_type = anr.name_type and an_source.name_type = 'pref' and anr.role_type = 'datasrc' order by short_identifier, preferred_fg desc;
Notes
Talend/ETL
New Talend job by Yuteh: BG_org_groupTerm 0.1
MERGE: Groups Associated Persons (repeating group) - Lam (SQL ready), Talend dev (Yuteh)
Field mapping
SQL
--need to verify person_id = 0 records select    gp.group_id as legacy_org_id,    gp.person_id as legacy_affperson_id,    an.name as affiliated_person,    'urn:cspace:botgarden.cspace.berkeley.edu:personauthorities:name(person):item:name('        + convert(varchar(10), gp.person_id) + ')''' + an.name + ''''    as affiliated_person_refname,    'member' as affiliated_person_type from    group_person gp,    agent_name an where gp.person_id = an.agent_id and an.name_type = 'pref'
Notes
Talend/ETL
New Talend job by Yuteh: BG_org_groupAssocPsn 0.1
Taxonomy (default taxonomy vocabulary for verified, and possibly unverified, names) - Lam (SQL ready), Yuteh (Talend)
Note: This query requires a table be refreshed on SAGE, cspace_lexicon_cv.
Field mapping
SQL
-- records that do not have scope notes and are in taxon_name table select    convert(varchar(10), l.lexicon_id) as short_identifier,    l.term as display_name,    case        when l.node_type in (            'cl', 'div', 'f', 'fam', 'gen', 'or', 'sect', 'sp',            'subcl', 'subgen', 'subsect', 'subsp', 'var')        then l.node_type        else null    end as taxon_rank,    fam.term as family,    case when fam.term is null or fam.term = ''        then null        else            'urn:cspace:botgarden.cspace.berkeley.edu:taxonomyauthority:name(taxon):item:name('            + convert(varchar(10), fam.ancestor_id) + ')''' + fam.term + ''''    end as family_refname, tn.notho_taxon as hybrid_fg,    case        when tn.access_code is not null then            case                when tn.access_code = 0 then 'Unrestricted'                when tn.access_code = 1 then 'Restricted'                when tn.access_code = 2 then 'Dead'            end        else            case                when l.access_code = 0 then 'Unrestricted'                when l.access_code = 1 then 'Restricted'                when l.access_code = 2 then 'Dead'            end    end as access_restrictions,    null as taxon_note                                                         from    lexicon l,    taxon_name tn,    (select tc.descendant_id, tc.ancestor_id, l.term        from trans_closure tc, lexicon l        where tc.ancestor_id = l.lexicon_id        and l.node_type = 'fam') fam where l.lexicon_id *= tn.taxon_id and l.lexicon_id *= fam.descendant_id and l.node_type not in ('common', 'bioname', 'na', 'root', 'cv') and l.lexicon_id not in (select lexicon_id from scope_notes) and l.lexicon_id not in (    select lexicon_id from lexicon_term_relation    where relation_type in ('TRV', 'TNA')) and lexicon_id not in ( 40017020, 40017429, 40031851, 40031856, 40031857, 40032337, 40012018, 40016672, 40010160, 40015069, 40015070, 40029985, 40014456, 40014457, 40014462, 40014463, 40030806, 40014465, 40014466, 202870, 40025519, 100064, 40012457, 40031859, 40013623, 20100090, 40032453, 40032457, 40034294, 40014967, 40014471, 40014472, 20201711, 20201785, 40014468, 40014469, 20100116, 40016773, 40005868, 40033585, 40033586, 40025813, 315305, 40014791, 40014792, 202729, 40032253, 40032254, 20100160, 40030847, 40032250, 40032251, 40024254, 40030854, 40030855, 40033591, 40033592, 20202247, 40014474, 40014475, 202261, 40021891, 40014477, 40014478, 40012355, 40017959, 40005017, 40014459, 40014460, 40030850, 40014480, 40014481, 200287, 40022148, 40022149, 40014995, 40011239, 40011242, 40031768, 40034171, 40011246, 40019386, 40028973, 40029003, 402302, 40020726, 40030845, 40030846, 40033588, 40033589, 314246, 20201549, 40011993, 20401262, 20401279, 20401277, 20401278, 20401291, 20401273, 40021845, 40020725, 40014483, 40014484, 40021609, 40005593, 309267, 201171, 40010997, 40020903) union -- records that have scope notes select    convert(varchar(10), l.lexicon_id) as short_identifier,    l.term as display_name,    case        when l.node_type in (            'cl', 'div', 'f', 'fam', 'gen', 'or', 'sect', 'sp',            'subcl', 'subgen', 'subsect', 'subsp', 'var')        then l.node_type        else null    end as taxon_rank,    fam.term as family,    case when fam.term is null or fam.term = ''        then null        else            'urn:cspace:botgarden.cspace.berkeley.edu:taxonomyauthority:name(taxon):item:name('            + convert(varchar(10), fam.ancestor_id) + ')''' + fam.term + ''''    end as family_refname, tn.notho_taxon as hybrid_fg,    case        when tn.access_code is not null then            case                when tn.access_code = 0 then 'Unrestricted'                when tn.access_code = 1 then 'Restricted'                when tn.access_code = 2 then 'Dead'            end        else            case                when l.access_code = 0 then 'Unrestricted'                when l.access_code = 1 then 'Restricted'                when l.access_code = 2 then 'Dead'            end    end as access_restrictions,    convert(varchar(255), sn.scope_notes) as taxon_note from    lexicon l,    taxon_name tn,    (select tc.descendant_id, tc.ancestor_id, l.term        from trans_closure tc, lexicon l        where tc.ancestor_id = l.lexicon_id        and l.node_type = 'fam') fam,    scope_notes sn where l.lexicon_id *= tn.taxon_id and l.lexicon_id *= fam.descendant_id and l.node_type not in ('common', 'bioname', 'na', 'root', 'cv') and l.lexicon_id = sn.lexicon_id and l.lexicon_id not in (    select lexicon_id from lexicon_term_relation    where relation_type in ('TRV', 'TNA')) union -- cultivar records without scope notes select    convert(varchar(10), c.lexicon_id) as short_identifier,    c.new_term as display_name,    'cv' as taxon_rank,    fam.term as family,    case when fam.term is null or fam.term = ''        then null        else            'urn:cspace:botgarden.cspace.berkeley.edu:taxonomyauthority:name(taxon):item:name('            + convert(varchar(10), fam.ancestor_id) + ')''' + fam.term + ''''    end as family_refname, tn.notho_taxon as hybrid_fg,    case        when tn.access_code is not null then            case                when tn.access_code = 0 then 'Unrestricted'                when tn.access_code = 1 then 'Restricted'                when tn.access_code = 2 then 'Dead'            end        else            case                when c.access_code = 0 then 'Unrestricted'                when c.access_code = 1 then 'Restricted'                when c.access_code = 2 then 'Dead'            end    end as access_restrictions,    null as taxon_note from    cspace_lexicon_cv c,    taxon_name tn,    (select tc.descendant_id, tc.ancestor_id, l.term        from trans_closure tc, lexicon l        where tc.ancestor_id = l.lexicon_id        and l.node_type = 'fam') fam where c.lexicon_id *= tn.taxon_id and c.lexicon_id *= fam.descendant_id and c.lexicon_id not in (select lexicon_id from scope_notes) and c.lexicon_id not in (    select lexicon_id from lexicon_term_relation    where relation_type in ('TRV', 'TNA')) union -- cultivar records that have scope notes select    convert(varchar(10), c.lexicon_id) as short_identifier,    c.new_term as display_name,    'cv' as taxon_rank,    fam.term as family,    case when fam.term is null or fam.term = ''        then null        else            'urn:cspace:botgarden.cspace.berkeley.edu:taxonomyauthority:name(taxon):item:name('            + convert(varchar(10), fam.ancestor_id) + ')''' + fam.term + ''''    end as family_refname, tn.notho_taxon as hybrid_fg,    case        when tn.access_code is not null then            case                when tn.access_code = 0 then 'Unrestricted'                when tn.access_code = 1 then 'Restricted'                when tn.access_code = 2 then 'Dead'            end        else            case                when c.access_code = 0 then 'Unrestricted'                when c.access_code = 1 then 'Restricted'                when c.access_code = 2 then 'Dead'            end    end as access_restrictions,    convert(varchar(255), sn.scope_notes) as taxon_note from    cspace_lexicon_cv c,    taxon_name tn,    (select tc.descendant_id, tc.ancestor_id, l.term        from trans_closure tc, lexicon l        where tc.ancestor_id = l.lexicon_id        and l.node_type = 'fam') fam,    scope_notes sn where c.lexicon_id *= tn.taxon_id and c.lexicon_id *= fam.descendant_id and c.lexicon_id = sn.lexicon_id and c.lexicon_id not in (    select lexicon_id from lexicon_term_relation    where relation_type in ('TRV', 'TNA')) order by short_identifier;
Notes
- Not getting author shortIDs in view, just names. For now putting them in Taxon Notes. Adjust view or get them separately in SQL.
- Do we need to adjust the format of scientific names?
- Lots of other Sage data for Names to add
- SQL query above is a union of names without scope notes and names with scope notes. the query that added an outer join with the scope notes table took over 50 minutes to run, so it was easier to split it out into a union of two queries.
- SQL query above excludes the following node_types: bioname, common, na, root. It also excludes TRE, TRV, and TNA terms. TNA terms will be added as an alternate name of type 'no author name'. TRE names are all bionames, so the bioname exclusion will also exclude TRE names.
- This query requires a table be refreshed on SAGE, cspace_lexicon_cv.
Talend/ETL
bgtaxonimport2 0.2
MERGE: Taxon Terms (repeating group) - Lam (SQL ready), Yuteh (Talend)
Field mapping
SQL
Note: This query requires a table be refreshed on SAGE, cspace_lexicon_cv.
-- verified names select    convert(varchar(10), l.lexicon_id) as short_identifier,    l.term as display_name,    l.term as term_name,    'Verified Taxonomic Name' as term_type,    an.name as term_source,                                                       an.agent_id as term_source_uid,    'accepted' as term_status,    case        when tn.catalog_flag = 1 then 'valid'        else 'invalid'    end as taxonomic_status,    1 as preferred_flag from    lexicon l,    taxon_name tn,    agent_name an,    agent_name_role anr where l.lexicon_id = tn.taxon_id and tn.data_src_id = an.agent_id and an.agent_id = anr.agent_id and an.agent_name_id = anr.agent_name_id and an.name_type = anr.name_type and an.name_type = 'pref' and anr.role_type = 'datasrc' and l.node_type not in ('common', 'bioname', 'na', 'root', 'cv') union -- no author names select    convert(varchar(10), rl.lexicon_id) as short_identifier,    l.term as display_name,    l.term as term_name,    'Taxon No Author Name' as term_type,    null as term_source,    null as term_source_uid,    'accepted' as term_status,    case        when l.valid_catalog_term_flag = 1 then 'valid'        else 'invalid'    end as taxonomic_status,    0 as preferred_flag from    lexicon l,    lexicon_term_relation ltr,    lexicon rl where l.lexicon_id = ltr.lexicon_id and ltr.relation_type = 'TNA' and ltr.related_lexicon_id = rl.lexicon_id and l.node_type != 'cv' and l.term != rl.term union -- remaining 821 terms that are not verified terms and are not no author names: select    convert(varchar(10), l.lexicon_id) as short_identifier,    l.term as display_name,    l.term as term_name,    'Unverified Name' as term_type,    null as term_source,    null as term_source_uid,    'accepted' as term_status,    case        when l.valid_catalog_term_flag = 1 then 'valid'        else 'invalid'    end as taxonomic_status,    0 as preferred_flag from    lexicon l where l.lexicon_id not in (    select taxon_id from taxon_name) and l.node_type not in ('bioname', 'common', 'root', 'na', 'cv') and l.lexicon_id not in (    select lexicon_id from lexicon_term_relation where relation_type = 'TNA') and l.lexicon_id not in (    select lexicon_id from lexicon_term_relation where relation_type = 'TRV') union -- cv verified names select    convert(varchar(10), c.lexicon_id) as short_identifier,    c.new_term as display_name,    c.new_term as term_name,    'Verified Taxonomic Name' as term_type,    an.name as term_source,    an.agent_id as term_source_uid,    'accepted' as term_status,    case        when tn.catalog_flag = 1 then 'valid'        else 'invalid'    end as taxonomic_status,    1 as preferred_flag from    cspace_lexicon_cv c,    taxon_name tn,    agent_name an,    agent_name_role anr where c.lexicon_id = tn.taxon_id and tn.data_src_id = an.agent_id and an.agent_id = anr.agent_id and an.agent_name_id = anr.agent_name_id and an.name_type = anr.name_type and an.name_type = 'pref' and anr.role_type = 'datasrc' union -- cv no author names select    convert(varchar(10), rl.lexicon_id) as short_identifier,    c.new_term as display_name,    c.new_term as term_name,    'Taxon No Author Name' as term_type,    null as term_source,    null as term_source_uid,    'accepted' as term_status,    case        when c.valid_catalog_term_flag = 1 then 'valid'        else 'invalid'    end as taxonomic_status,    0 as preferred_flag from    cspace_lexicon_cv c,    lexicon_term_relation ltr,    cspace_lexicon_cv rl where c.lexicon_id = ltr.lexicon_id and ltr.relation_type = 'TNA' and ltr.related_lexicon_id = rl.lexicon_id and c.new_term != rl.new_term union -- remaining 821 terms that are not verified terms and are not no author names: select    convert(varchar(10), c.lexicon_id) as short_identifier,    c.new_term as display_name,    c.new_term as term_name,    'Unverified Name' as term_type,    null as term_source,    null as term_source_uid,    'accepted' as term_status,    case        when c.valid_catalog_term_flag = 1 then 'valid'        else 'invalid'    end as taxonomic_status,    0 as preferred_flag from    cspace_lexicon_cv c where c.lexicon_id not in (    select taxon_id from taxon_name) and c.lexicon_id not in (    select lexicon_id from lexicon_term_relation where relation_type = 'TNA') and c.lexicon_id not in (    select lexicon_id from lexicon_term_relation where relation_type = 'TRV') union -- grex names select    convert(varchar(10), l.lexicon_id) as short_identifier,    rtrim(str_replace(substring(l.term, 1, patindex('% ''%''', l.term)), ' cv. ', ' ')) as display_name,    rtrim(str_replace(substring(l.term, 1, patindex('% ''%''', l.term)), ' cv. ', ' ')) as term_name,    'Grex Name' as term_type,    an.name as term_source,    an.agent_id as term_source_uid,    'accepted' as term_status,    case        when tn.catalog_flag = 1 then 'valid'        else 'invalid'    end as taxonomic_status,    0 as preferred_flag from    lexicon l,    taxon_name tn,    agent_name an,    agent_name_role anr where l.lexicon_id = tn.taxon_id and tn.data_src_id = an.agent_id and an.agent_id = anr.agent_id and an.agent_name_id = anr.agent_name_id and an.name_type = anr.name_type and an.name_type = 'pref' and anr.role_type = 'datasrc' and l.node_type = 'cv' and l.term like '%cv. _%''%''' and l.term not like 'cv. %' and l.lexicon_id != 40025687 order by preferred_flag desc, short_identifier;
Notes
- Note: This query requires a table be refreshed on SAGE, cspace_lexicon_cv.
Talend/ETL
MERGE: Taxon Authors (repeating group) - Lam (SQL ready)
Field mapping
SQL
select    tn.taxon_id as legacy_id,    an.agent_id as agent_id,    tn.author_id as author_name_id,    an.name as author_name,    'author' as author_type,    1 as author_order from    taxon_name tn,    agent_name an where tn.author_id = an.agent_name_id and an.name_type = 'kewabbr' and tn.author_id <> 5 union select    tn.taxon_id as legacy_id,    an.agent_id as agent_id,    tn.ascr_auth_id as author_name_id,    an.name as author_name,    'ascribed author' as author_type,    2 as author_order from    taxon_name tn,    agent_name an where tn.ascr_auth_id = an.agent_name_id and an.name_type = 'kewabbr' and tn.ascr_auth_id <> 5 union select    tn.taxon_id as legacy_id,    an.agent_id as agent_id,    tn.p_author_id as author_name_id,    an.name as author_name,    'parenthetical author' as author_type,    3 as author_order from    taxon_name tn,    agent_name an where tn.p_author_id = an.agent_name_id and an.name_type = 'kewabbr' and tn.p_author_id <> 5 union select    tn.taxon_id as legacy_id,    an.agent_id as agent_id,    tn.p_ascr_auth_id as author_name_id,    an.name as author_name,    'parenthetical ascribed author' as author_type,    4 as author_order from    taxon_name tn,    agent_name an where tn.p_ascr_auth_id = an.agent_name_id and an.name_type = 'kewabbr' and tn.p_ascr_auth_id <> 5 order by legacy_id, author_order
Notes
Talend/ETL
MERGE: Taxon Common Names (repeating group) - Lam (SQL ready)
Field mapping
Need to get relationship type to and map to values in CSpace: Preferred, Secondary, Rejected
SQL
select    ltr.related_lexicon_id as legacy_id,    ltr.lexicon_id as common_name_id,    l.term as common_name,    'English' as language,    case when ltr.relation_type = 'TCN' then 'preferred'         when ltr.relation_type = 'TCNS' then 'secondary'         when ltr.relation_type = 'TCNR' then 'rejected'         end as common_name_type from    lexicon_term_relation ltr,    lexicon l where ltr.lexicon_id = l.lexicon_id and ltr.relation_type in ('TCN', 'TCNS', 'TCNR')
Notes
Talend/ETL
MERGE: Taxon Attributes (repeating group) - Lam (SQL ready)
E.g., dimensions, habitat, etc.
Field mapping
SQL
select    pa.lexicon_id as legacy_id,    pa.date_entered as attribute_date,    pa.agent_id as recorded_by_id,    rb.name as recorded_by_name,    pa.plant_height as height,    pa.plant_width as width,    pa.plant_dbh as dbh,    pa.plant_habit as habitat,    pa.climate_rating as climate_rating,    ca.agent_id as conservation_agent_id,    ca.name as conservation_agent_name,    cc.conserv_cat as convservation_code,    case     when ca.name = 'California Native Plant Society' then 'CNPS'     when ca.name = 'not applicable' then ''     when ca.name = 'International Union for Conservation of Nature and Natural Resources - 1997' then 'IUCN-1997' --  when ca.name = 'Convention on International Trade in Endangered Species' then 'CNPS' when ca.name = 'Convention on International Trade in Endangered Species' then 'CITES'     when ca.name = 'Rhododendron Species Foundation' then 'RSF' --  when ca.name = 'International Union for Conservation of Nature and Natural Resources - 2006' then 'CNPS' when ca.name = 'International Union for Conservation of Nature and Natural Resources - 2006' then 'IUCN-2006'     when ca.name = 'South African National Biodiversity Institute' then 'SANBI'     when ca.name = 'UNEP World Conservation Monitoring Center - CITES' then 'UNEP-CITES'     when ca.name = 'New York Botanical Garden' then 'NYBG'     when ca.name = 'US Fish & Wildlife Service' then 'USFWS'     when ca.name = 'California Department of Fish & Game' then 'CADFG'     when substring(cc.comp_key, 1, 5) = '60801' or ca.name = 'IUCN/SSC Global Tree Specialist Group, Red List of Rhododendrons - 2011' then 'IUCN-Rhododendrons-2011'     when substring(cc.comp_key, 1, 5) = '60804' or ca.name = 'International Union for Conservation of Nature and Natural Resources - 2011.1' then 'IUCN-2011.1'     when substring(cc.comp_key, 1, 5) = '60849' or ca.name = 'International Union for Conservation of Nature and Natural Resources - 2011.2' then 'IUCN-2011.2'     when substring(cc.comp_key, 1, 5) = '60905' or ca.name = 'International Union for Conservation of Nature and Natural Resources - 2012.1' then 'IUCN-2012.1'     when substring(cc.comp_key, 1, 5) = '60917' or ca.name = 'International Union for Conservation of Nature and Natural Resources - 2012.2' then 'IUCN-2012.2'     when ca.name = 'Non-conservation publication' then 'Non-conservation pub'     when cc.conserv_id in (38, 39, 41, 42, 47) then ''     end as conservCode_append,    pa.frost_sensitive as frost_sensitive,    pa.medicinal_flag as medicinal_use,    pa.economic_flag as economic_use from    plant_attr pa,    agent_name rb,    agent_name ca,    conserv_code cc,    agent_name_role anr where pa.agent_id = rb.agent_id and rb.name_type = 'pref' and pa.conserv_id = cc.conserv_id and cc.comp_key *= anr.comp_key and cc.role_type *= anr.role_type and cc.role_type = 'conserv' and anr.agent_id *= ca.agent_id and anr.agent_name_id *= ca.agent_name_id and anr.name_type *= ca.name_type and ca.name_type = 'pref'
Notes
Talend/ETL
MERGE: Taxon Related Terms (repeating group) - Lam (SQL ready)
Might be needed for relationships to synonyms, unverified terms, verified terms (?), to hybrid parent names (?), etc.
Field mapping
SQL
select    ltr.related_lexicon_id as legacy_id,    ltr.lexicon_id as related_term_id,    l.term as related_term,    ltr.relation_type as relation_type    'urn:cspace:botgarden.cspace.berkeley.edu:taxonomyauthority:name(taxon):item:name('        + convert(varchar(10), ltr.lexicon_id) + ')''' + l.term + ''''        as related_term_refname from    lexicon_term_relation ltr,    lexicon l where ltr.lexicon_id = l.lexicon_id and ltr.relation_type in ('FHP', 'MHP', 'SYN') and l.node_type != 'cv' union -- cv relations select    ltr.related_lexicon_id as legacy_id,    ltr.lexicon_id as related_term_id,    c.term as related_term,    ltr.relation_type as relation_type    'urn:cspace:botgarden.cspace.berkeley.edu:taxonomyauthority:name(taxon):item:name('        + convert(varchar(10), ltr.lexicon_id) + ')''' + c.term + ''''        as related_term_refname from    lexicon_term_relation ltr,    cspace_lexicon_cv c where ltr.lexicon_id = c.lexicon_id and ltr.relation_type in ('FHP', 'MHP', 'SYN') order by legacy_id, relation_type;
Notes
Talend/ETL
Taxonomic Hierarchy (hierarchical relations) - Lam (SQL ready)
Based on parent taxa identified in taxon table
Field mapping
SQL
select    taxon_id as short_identifier,    parent_id as parent_id from    taxon_name tn,    lexicon l where tn.taxon_id = l.lexicon_id and l.node_type not in ('common', 'bioname', 'na', 'root') and parent_id <> 0 order by short_identifier;
Notes
Talend/ETL
Common Taxonomic Names (common vocabulary in taxonomy authority) - Lam (SQL ready), Yuteh (Talend)
Should get Access Code in this SQL too, right?
Field mapping
SQL
-- this query includes 16 common name duplicates. -- duplicates are referenced in lexicon_term_relation.lexicon_id where relation_type in ('TCN', 'TCNS', 'TCNR') select    l.lexicon_id as short_identifier,    l.term as display_name,    case                when l.access_code = 0 then 'Unrestricted'                when l.access_code = 1 then 'Restricted'                when l.access_code = 2 then 'Dead'    end as access_restrictions from    lexicon l where l.node_type = 'common';
Notes
exact duplicates (across term, access_code, valid_catalog_term_flag):
select count(*), term
from lexicon where node_type = 'common'
group by term, access_code, valid_catalog_term_flag
having count(*) > 1
order by term
2 Arnica
2 California-Lilac
2 crimson glory vine
2 Hakeke, Mountain Holly
2 Horoeka, Lancewood
2 Horopito, Pepper Tree
2 Lion's Spoor
2 Monkeyflower
2 Mountain Rimu, Pigmy pine
2 Mountain totara, Snow totara
2 Papapa, Snowberry
2 Tanekaha, Celery pine
2 Tarata, Lemonwood
2 Wi Kura, Narrow-leaved Snow Tussock
These two names are term duplicates (access_code or valid_catalog_term_flag are different)
2 SECTION CHONIASTRUM
2 ti kauka, Cabbage tree
Here are the duplicates that are referenced in lexcon_term_relation for common names:
select distinct l.lexicon_id, l.term
from lexicon l, lexicon_term_relation ltr
where l.lexicon_id = ltr.lexicon_id
and relation_type in ('TCN', 'TCNS', 'TCNR')
and l.term in (
       select term
       from lexicon
       where node_type = 'common'
       group by term
       having count(*) > 1)
order by term, lexicon_id;
40000115 Arnica
40002353 Arnica
40000552 California-Lilac
40000588 California-Lilac
40006165 Crimson Glory Vine
40001587 Hakeke, Mountain Holly
40001588 Hakeke, Mountain Holly
40001687 Horoeka, Lancewood
40001688 Horoeka, Lancewood
40001690 Horopito, Pepper Tree
40001691 Horopito, Pepper tree
40024379 Lion's Spoor
40024418 Lion's Spoor
40002311 Monkeyflower
40002314 Monkeyflower
40002384 Mountain Rimu, Pigmy pine
40002385 Mountain Rimu, Pigmy Pine
40002391 Mountain totara, Snow totara
40002392 Mountain totara, Snow totara
40002615 Papapa, Snowberry
40002616 Papapa, Snowberry
40003741 Tanekaha, Celery pine
40003742 Tanekaha, Celery pine
40003751 Tarata, Lemonwood
40003752 Tarata, Lemonwood
40003809 Ti kauka, Cabbage tree
40004151 Wi Kura, Narrow-leaved Snow Tussock
40004152 Wi Kura, Narrow-leaved snow tussock
Note: 40004151 and 40004152 are duplicates because of case insensitivity in Sybase db.
Talend/ETL
Plant Sales taxonomic vocabulary (plantsales vocabulary in taxonomy authority)
We will be constructing a taxonomy vocabulary (third instance) called Plant Sales that will be used to support the Pot Tag procedure. This will be made up of the unique taxonomic names in SAGE's volunteer_label table (using the formatted name field).
Question: Will we also use family names from the volunteer_label table?
Field Mapping
...
SQL
/* -- View to get smallest label_id as the short identifier for the formatted_taxon name. -- Manually excludes 7 sets of duplicates due to the taxon name being a valid genus name as well as a species name. -- The name in volunteer_label.formatted_taxon is the genus name, but due to the Sybase db being case insensitive, it creates a false match. create view cspace_formatted_taxon_id_rank as select    min(vl.label_id) as label_id,    vl.formatted_taxon,    l.node_type,    l.term from volunteer_label vl, lexicon l where vl.formatted_taxon *= l.term and l.node_type not in ('bioname', 'common', 'na', 'root') and l.lexicon_id not in (    20300328, 20300815, 20302343, 20302378, 40022302, 20304254, 20304427) group by vl.formatted_taxon, l.node_type, l.term; */ -- Query to get the plant sales taxon main record. -- There is only one term name, so no need for a second query for term names for xml merge. select distinct    id_rank.label_id as short_identifier,    vl.formatted_taxon as display_name,    case        when id_rank.node_type is not null then id_rank.node_type        else            case                when vl.formatted_taxon like '% cv. %' then 'cv'                when vl.formatted_taxon like '% var. %' then 'var'                when (vl.formatted_taxon like '% f. %'                    and vl.formatted_taxon not like '% F. Ritter%'                    and vl.formatted_taxon not like '% F. Muell.%'                    and vl.formatted_taxon not like '% F. Dietr.%'                    and vl.formatted_taxon not like '% F. Schmidt%'                    and vl.formatted_taxon not like '% F. Phil.%'                    and vl.formatted_taxon not like '% F. Schwarz%') then 'f'                when vl.formatted_taxon like '% subsp. %' then 'subsp'                else 'sp'            end    end as taxon_rank,    vl.formatted_taxon as term_name,    null as term_type,    'accepted' as term_status,    'valid' as taxonomic_status from    volunteer_label vl,    cspace_formatted_taxon_id_rank id_rank where vl.formatted_taxon *= id_rank.formatted_taxon and vl.formatted_taxon is not null;
Notes
...
Talend/ETL
...
Garden Locations (Storage locations vocabulary) - SQL ready, Talend (Chris)
Field mapping
SQL
SELECT 'garden'||convert(varchar(5), garden_loc_id) shortidentifier, loc_type locationType, ltrim(full_loc_name) locationDisplayName, garden_loc_id legacy_id, parent_id, loc_hier termSourceNote, ltrim(loc_name) locationName, entered_staff_id, date_entered, last_change_staff_id, last_change_date FROM dbo.garden_location
Notes
- Location Types: bed, section, collection
- Location Type format: <locationType>urn:cspace:botgarden.cspace.berkeley.edu:vocabularies:name(locationtype):item:name(drawer)'Drawer'</locationType>
- Vocabulary CSID to persist: c2833911-4bca-49aa-a3f3
- Refname format: <refName>urn:cspace:botgarden.cspace.berkeley.edu:locationauthorities:name(location):item:name(crhbed11350962164634)'crhbed1'</refName>
Talend/ETL
bgimportgardenlocation 0.2 in Chris's Talend botgarden repository
Rename output file from ucbggardenlocation job to gardenlocout.txt!!
Garden Locations Hierarchy (hierarchical relationships) - SQL ready, Talend (Chris)
Field mapping
SQL
Notes
Talend/ETL
Rename output file from ucbggardenlocation job to gardenlocout.txt!!
ucbglocrelations v0.1in Chris's Talend botgarden repository
Job creates 571 relations, but one needs to be removed. garden0/'none' has no broader record but has output in the file. Remove by hand. Looked like:
<import service="Relations" type="Relation" CSID="1fde0aff-dbb0-4446-9745-eaa6df4039ae"> <schema xmlns:relations_common="http://collectionspace.org/relation" name="relations_common"> <relationshipType>hasBroader</relationshipType> <subjectDocumentType>Locationitem</subjectDocumentType> <subjectCsid>96fbf471-1f5f-4dab-8a25-516f0eab3e5b</subjectCsid> <subjectUri>/locationauthorities/dbf254e8-437d-4926-be5d/items/96fbf471-1f5f-4dab-8a25-516f0eab3e5b</subjectUri> <subjectRefName>urn:cspace:botgarden.cspace.berkeley.edu:locationauthorities:name(location):item:name(garden0)'none'</subjectRefName> <objectDocumentType>Locationitem</objectDocumentType> <objectCsid/> <objectUri/> <objectRefName/> </schema> </import>
Place Authority - Lam (SQL), Yuteh (Talend)
Field mapping
SQL
select    gl.geog_lex_id as place_id,    gl.geog_term as display_name,    gtc.geog_type_name as place_type,    gn.notes as place_note from    geog_lexicon gl,    geog_type_code gtc,    geog_name gn where gl.geog_lex_id not in (    select geog_lex_id    from geog_term_relation    where relation_type in ('SYN', 'RQN')) and gl.geog_type not in ('element', 'root') and gl.geog_type = gtc.geog_type and gl.geog_lex_id *= gn.geog_id;
Notes
Talend/ETL
MERGE: Place terms (repeating group)
Field mapping
SQL
-- geography term name repeating group -- verified geographic names select    gl.geog_lex_id as place_id,    gl.geog_term as display_name,    gl.geog_term as term_name,    'English' as term_language,    gn.iso_country_cd as abbreviation,    'Verified Geographic Name' as term_type,    gn.data_src_id as term_source_id,    an.name as term_source,    gn.valid_term_flag as term_status,    1 as preferred_flag from    geog_lexicon gl,    geog_name gn,    agent_name an where gl.geog_lex_id = gn.geog_id and gn.data_src_id = an.agent_id and an.name_type = 'pref' union -- short names select    gn.geog_id as place_id,    gl.geog_term as display_name,    gl.geog_term as term_name,    'English' as term_language,    gn.iso_country_cd as abbreviation,    'Short Name' as term_type,    gn.data_src_id as term_source_id,    an.name as term_source,    0 as term_status,    0 as preferred_flag from    geog_lexicon gl,    geog_name gn,    agent_name an where gl.geog_lex_id = gn.geog_element_id and gn.data_src_id = an.agent_id and an.name_type = 'pref' union -- reverse qualified names and synonyms                                        select    gtr.related_geog_lex_id as place_id,    gl.geog_term as display_name,    gl.geog_term as term_name,    'English' as term_language,    null as abbreviation,    case        when gtr.relation_type = 'RQN' then 'Reverse Qualified Name'        when gtr.relation_type = 'SYN' then 'Synonym'        else null    end as term_type,    null as term_source_id,    null as term_source,    gl.valid_term_flag as term_status,    0 as preferred_flag from    geog_term_relation gtr,    geog_lexicon gl,    geog_lexicon rgl where gtr.geog_lex_id = gl.geog_lex_id and gtr.related_geog_lex_id = rgl.geog_lex_id and gtr.relation_type in ('RQN', 'SYN') and gl.geog_term != rgl.geog_term union -- remaining 71 geog terms that are not verified names, reverse qualified names, or synonyms select    gl.geog_lex_id as place_id,    gl.geog_term as display_name,    gl.geog_term as term_name,    'English' as term_language,    null as abbreviation,    'Unverified Geographic Name' as term_type,    null as term_source_id,    null as term_source,    gl.valid_term_flag as term_status,    1 as preferred_flag from    geog_lexicon gl where gl.geog_lex_id not in (    select geog_lex_id from geog_term_relation where relation_type in ('RQN', 'SYN')) and gl.geog_lex_id not in (    select geog_id from geog_name) and gl.geog_type not in ('element', 'root') order by place_id, preferred_flag, display_name;
Notes
Talend/ETL
Place hierarchy (hierarchical relationships)
Field mapping
SQL
select    gn.geog_id as place_id,    case        when gn.geog_parent_id = 0 then null        else gn.geog_parent_id    end as parent_id from    geog_name gn,    geog_lexicon gl where gl.geog_lex_id = gn.geog_id and gl.geog_type not in ('element', 'root');
Notes
Talend/ETL
Research Projects (Concept vocabulary) - Chris (SQL ready), Talend (Yuteh)
Field mapping
SQL
SELECT project_id legacy_id, 'researchproject'||convert(varchar(20), project_id) shortIdentifier, project_name conceptName, person_id conceptPerson, org_id conceptOrganization, agent_id createdBy, date_entered createdAt, last_change_date updatedAt, last_change_agent_id updatedBy FROM dbo.research_project
Notes
- refname format: urn:cspace:botgarden.cspace.berkeley.edu:conceptauthorities:name(research_ca):item:name(research11350934474341)'research1'
- vocabulary CSID to persist: deb53f97-9829-47b9-99cb
Talend/ETL
New Talend job by Yuteh: BG_concept_researchProj 0.1
Classes (Concept vocabulary) - Chris (SQL ready), Talend (Yuteh)
Field mapping
SQL
SELECT course_id legacy_id, 'class'||convert(varchar(20), course_id) shortIdentifier, course_title conceptName, ad.name||' '||course_num conceptTerm_coursenum, person_id conceptPerson, cu.org_id conceptOrganization_dept, ad.name deptName, inst_id institution_orgParent, ai.name conceptNote_instName, semester conceptQualifier_semester, cu.agent_id createdBy, cu.date_entered createdAt, cu.last_change_date updatedAt, cu.last_change_agent_id updatedBy FROM dbo.class_use cu left outer join agent_name ad on (cu.org_id = ad.agent_id) left outer join agent_name ai on (cu.inst_id = ai.agent_id)
Notes
- Institution (inst_id) is treated as the parent institution of the department (org_id). However, this does not necessarily match what is recorded in the agent tables. We will put the institution name in the scope notes field.
- refname format: urn:cspace:botgarden.cspace.berkeley.edu:conceptauthorities:name(class_ca):item:name(crhtestclass11350934263503)'crhtestclass1'
- vocabulary CSID to persist: 96141839-4bac-4e3d-a511
Talend/ETL
New Talend job by Yuteh: BG_concept_class 0.1
Propagation medium (Concept vocabulary) - Chris (SQL ready), Talend (Yuteh)
Field mapping
SQL
SELECT case when medium_description is null or medium_description = '' then medium_name else medium_description end as conceptDisplayName, medium_name conceptTerm, entered_staff_id, date_entered, last_change_staff_id, last_change_date FROM dbo.medium order by medium_description
Notes
- Construct the shortidentifier as 'medium' + an incrementing number, in Talend
- Refname format: urn:cspace:botgarden.cspace.berkeley.edu:conceptauthorities:name(concept):item:name(medium11350937075031)'medium1'
- Vocabulary CSID to persist: bbe39bb2-a622-4570-a946
Talend/ETL
New Talend job by Yuteh: BG_concept_propgMedium 0.1
Conservation Category (concept vocabulary) - Chris (SQL ready), Talend (Yuteh)
Field mapping
SQL
SELECT Â Â Â conserv_id, Â Â Â 'conserv'||convert(varchar(5), conserv_id) shortIdentifier, Â Â Â comp_key, Â Â Â substring(comp_key, 1, 5) consagent_name_id, Â Â Â an.name agentname, Â Â Â rtrim(conserv_cat) conserv_code, Â Â Â case when an.name = 'California Native Plant Society' then 'CNPS' Â Â Â Â when an.name = 'not applicable' then '' Â Â Â Â when an.name = 'International Union for Conservation of Nature and Natural Resources - 1997' then 'IUCN-1997' --Â Â Â when an.name = 'Convention on International Trade in Endangered Species' then 'CNPS' when an.name = 'Convention on International Trade in Endangered Species' then 'CITES' Â Â Â Â Â when an.name = 'Rhododendron Species Foundation' then 'RSF' --Â Â when an.name = 'International Union for Conservation of Nature and Natural Resources - 2006' then 'CNPS' when an.name = 'International Union for Conservation of Nature and Natural Resources - 2006' then 'IUCN-2006' Â Â Â Â when an.name = 'South African National Biodiversity Institute' then 'SANBI' Â Â Â Â when an.name = 'UNEP World Conservation Monitoring Center - CITES' then 'UNEP-CITES' Â Â Â Â when an.name = 'New York Botanical Garden' then 'NYBG' Â Â Â Â when an.name = 'US Fish & Wildlife Service' then 'USFWS' Â Â Â Â when an.name = 'California Department of Fish & Game' then 'CADFG' Â Â Â Â when substring(comp_key, 1, 5) = '60801' or an.name = 'IUCN/SSC Global Tree Specialist Group, Red List of Rhododendrons - 2011' then 'IUCN-Rhododendrons-2011' Â Â Â Â when substring(comp_key, 1, 5) = '60804' or an.name = 'International Union for Conservation of Nature and Natural Resources - 2011.1' then 'IUCN-2011.1' Â Â Â Â when substring(comp_key, 1, 5) = '60849' or an.name = 'International Union for Conservation of Nature and Natural Resources - 2011.2' then 'IUCN-2011.2' Â Â Â Â when substring(comp_key, 1, 5) = '60905' or an.name = 'International Union for Conservation of Nature and Natural Resources - 2012.1' then 'IUCN-2012.1' Â Â Â Â when substring(comp_key, 1, 5) = '60917' or an.name = 'International Union for Conservation of Nature and Natural Resources - 2012.2' then 'IUCN-2012.2' Â Â Â Â when an.name = 'Non-conservation publication' then 'Non-conservation pub' Â Â Â Â when conserv_id in (38, 39, 41, 42, 47) then '' Â Â Â Â end as conservCode_append, Â Â Â cat_descr conserv_descr, Â Â Â case when red_dot = 1 then 'red dot on label' Â Â Â Â Â end as scope_note_red_dot FROM Â Â Â dbo.conserv_code c left outer join agent_name an on (substring(comp_key, 1, 5) = convert(varchar(20), an.agent_id) and an.name_type = 'pref') -- where conserv_id not in (38, 39, 41, 42, 47) order by comp_key
Notes
Conservation Categories are codes that are tied to Conservation Organizations. Each organization can have its own list of conservation codes, and these can and do change over time. Most have a code of 'rare'.Â
Need to figure out what display name will be. Should the display name in the vocabulary be like: "EN IUCN-1997", "IUCN-1997: Endangered - meets criteria for endangered and is facing a high risk of extinction in the wild". Think about how term completion will work. Might need to derive some organization abbreviations for these display names. Consider these "endangered" categories from different organizations (org, code, description):
EN International Union for Conservation of Nature and Natural Resources - 1997 Taxa in danger of extinction and whose survival is unlikely if the causal factors continue operating EN International Union for Conservation of Nature and Natural Resources - 2006 (null) EN UNEP World Conservation Monitoring Center - CITES (null) EN (null) (null) EN New York Botanical Garden (null) EN (null) (null) EN IUCN/SSC Global Tree Specialist Group, Red List of Rhododendrons - 2011 Endangered - meets criteria for endangered and is facing a high risk of extinction in the wild EN International Union for Conservation of Nature and Natural Resources - 2011.1 endangered EN Non-conservation publication endangered EN International Union for Conservation of Nature and Natural Resources - 2011.2 vulnerable EN International Union for Conservation of Nature and Natural Resources - 2012.1 endangered Endangered US Fish & Wildlife Service Endangered Species Act listing Endangered California Department of Fish & Game (null)
Talend/ETL
New Talend job by Yuteh: BG_concept_conservCatg 0.1
Collection Objects
Accessions - SQL ready (Chris), Talend dev (Chris), Data botgarden-dev loaded (Chris)
Field mapping
SQL
SELECT rtrim(a.accession_number) objectnumber, convert(date, accession_date) accessionDate, case when dead_flag=1 then 'true' else 'false' end as dead_flg, convert(date, dead_date) deadDate, material_type briefDescMaterial, case when data_quality = 1 then 'Complete' when data_quality = 0 then 'Unknown' end as recordStatusQuality, material_source, case when requested_by <> 0 then requested_by end as requested_by, case when collector_id <> 0 then collector_id end as collector_id, rtrim(collector_number) collector_number, collection_date, case when begin_date > 0 then dateadd(dd,(begin_date - 2415021), convert(date, '1900-01-01', 103)) end as begin_date_scalar, case when begin_date > 0 then datepart(year, dateadd(dd,(begin_date - 2415021), convert(datetime, '1900-01-01', 103))) end as begin_date_year, case when begin_date > 0 then datepart(month, dateadd(dd,(begin_date - 2415021), convert(datetime, '1900-01-01', 103))) end as begin_date_month, case when begin_date > 0 then datepart(day, dateadd(dd,(begin_date - 2415021), convert(datetime, '1900-01-01', 103))) end as begin_date_day, case when end_date > 0 then dateadd(dd,(end_date - 2415021), convert(date, '1900-01-01', 103)) end as end_date_scalar, case when end_date > 0 then datepart(year, dateadd(dd,(end_date - 2415021), convert(datetime, '1900-01-01', 103))) end as end_date_year, case when end_date > 0 then datepart(month, dateadd(dd,(end_date - 2415021), convert(datetime, '1900-01-01', 103))) end as end_date_month, case when end_date > 0 then datepart(day, dateadd(dd,(end_date - 2415021), convert(datetime, '1900-01-01', 103))) end as end_date_day, case when an.notes is not null then accession_notes||'; '||an.notes else accession_notes end as accnNotes, case when plant_status='1' then 'true' else 'false' end as rare_flag, case when cbd_restrict_flag=1 then 'true' else 'false' end as cbd_restrict_flag, a.agent_id, a.date_entered, a.last_change_agent_id, a.last_change_date, case when l.access_code = 1 then 'no' else 'yes' end as posttopublic, cs.habitat_descr fieldCollectionNote, cs.veg_type vegetationType, flower_color, fruit_color, security_risk, case when (prov_type_cd = 'G' or cs.geog_lex_id in (2031, 2326)) then 'true' end as cultivar_flag, case when prov_type_cd = 'W' then 'W-wild source' when prov_type_cd = 'Z' then 'Z-cultivated from wild source' when (prov_type_cd = 'G' or cs.geog_lex_id in (2031, 2326)) then 'G-not wild source' when prov_type_cd = 'U' then 'U-unknown' end as provenance_type, case when prop_hist_cd = 'VA' then 'vegetative reproduction' when prop_hist_cd = 'S' then 'sexual reproduction' end as propagation_history, case when class_use_flag = 1 then 'true' else 'false' end as class_use_flg, case when research_flag = 1 then 'true' else 'false' end as research_flg, case when distrib_flag = 1 then 'true' else 'false' end as distrib_flg, case when ignore_red_dot_flag = 1 then 'true' else 'false' end as ignore_red_dot_flg, case when fragrance_flag = 1 then 'true' when fragrance_flag=0 then 'false' end as fragrance_flg, case when breeding_system = 'M' then 'male' when breeding_system = 'F' then 'female' when breeding_system = 'B' then 'both' when breeding_system = 'Q' then 'dioecious and unknown sex' when breeding_system = 'U' then 'unknown' end as breeding_system, case when flower_jan = 0 then 'No' when flower_jan = 1 then 'Some' when flower_jan = 2 then 'Many' end as flowers_jan, case when flower_feb = 0 then 'No' when flower_feb = 1 then 'Some' when flower_feb = 2 then 'Many' end as flowers_feb, case when flower_mar = 0 then 'No' when flower_mar = 1 then 'Some' when flower_mar = 2 then 'Many' end as flowers_mar, case when flower_apr = 0 then 'No' when flower_apr = 1 then 'Some' when flower_apr = 2 then 'Many' end as flowers_apr, case when flower_may = 0 then 'No' when flower_may = 1 then 'Some' when flower_may = 2 then 'Many' end as flowers_may, case when flower_jun = 0 then 'No' when flower_jun = 1 then 'Some' when flower_jun = 2 then 'Many' end as flowers_jun, case when flower_jul = 0 then 'No' when flower_jul = 1 then 'Some' when flower_jul = 2 then 'Many' end as flowers_jul, case when flower_aug = 0 then 'No' when flower_aug = 1 then 'Some' when flower_aug = 2 then 'Many' end as flowers_aug, case when flower_sep = 0 then 'No' when flower_sep = 1 then 'Some' when flower_sep = 2 then 'Many' end as flowers_sep, case when flower_oct = 0 then 'No' when flower_oct = 1 then 'Some' when flower_oct = 2 then 'Many' end as flowers_oct, case when flower_nov = 0 then 'No' when flower_nov = 1 then 'Some' when flower_nov = 2 then 'Many' end as flowers_nov, case when flower_dec = 0 then 'No' when flower_dec = 1 then 'Some' when flower_dec = 2 then 'Many' end as flowers_dec, case when fruit_jan=1 then 'true' else 'false' end as fruit_jan_flg, case when fruit_feb=1 then 'true' else 'false' end as fruit_feb_flg, case when fruit_mar=1 then 'true' else 'false' end as fruit_mar_flg, case when fruit_apr=1 then 'true' else 'false' end as fruit_apr_flg, case when fruit_may=1 then 'true' else 'false' end as fruit_may_flg, case when fruit_jun=1 then 'true' else 'false' end as fruit_jun_flg, case when fruit_jul=1 then 'true' else 'false' end as fruit_jul_flg, case when fruit_aug=1 then 'true' else 'false' end as fruit_aug_flg, case when fruit_sep=1 then 'true' else 'false' end as fruit_sep_flg, case when fruit_oct=1 then 'true' else 'false' end as fruit_oct_flg, case when fruit_nov=1 then 'true' else 'false' end as fruit_nov_flg, case when fruit_dec=1 then 'true' else 'false' end as fruit_dec_flg FROM dbo.accession a left outer join accession_notes an on (a.accession_number = an.accession_number) left outer join collection_site cs on (a.accession_number = cs.accession_number) left outer join dbo.accession_attr aa on (a.accession_number = aa.accession_number) left outer join lexicon l on (a.lexicon_id = l.lexicon_id) where a.accession_number <> '00.0000' order by a.accession_number
Notes
- Exclude from requestedBy:
- urn:cspace:botgarden.cspace.berkeley.edu:personauthorities:name(person):item:name(0)'Owner, Database'
- Exclude from identBy:Â
- urn:cspace:botgarden.cspace.berkeley.edu:personauthorities:name(person):item:name(0)'Owner, Database'
- urn:cspace:botgarden.cspace.berkeley.edu:null:item:name(null)'null'
Talend/ETL
ucbgaccessions 0.5 in Chris's Talend repository, botgarden
Run Steps
- Merge into accessions the fragments for:
- determinations
- collection sites
- accession use
- Change "schema2" and "schema3" to "schema"
- Import
MERGE: Taxonomic Determinations and Hybrid Parents (repeating group) - SQL ready to test (Chris), Talend dev-sample data ready (Chris)
Field mapping
SQL
-- for all accessions: get plant_identity records for non-hybrids, one row per plant_identity record if no hybrid parent; multiple rows per accession, one with accepted_flag=1 SELECT rtrim(accession_number) objectnumber, lexicon_id, case when accepted_flag = 0 then 'false' else 'true' end as accepted_flg, case when aff_lexicon_id = -5 then null else aff_lexicon_id end as aff_lexicon, 'false' hybrid_flg, date_entered, case when id_made_date = '3000-01-01 00:00:00' then null else id_made_date end as id_made_date, case when id_made_by > 0 then id_made_by end as id_made_by, id_reference, id_modifier, id_remarks, null parent_id, null parent_qualifier, null fhp_lexicon_id, null mhp_lexicon_id, case when id_made_date = '3000-01-01 00:00:00' then null else datepart(year, id_made_date) end as id_year, case when id_made_date = '3000-01-01 00:00:00' then null else datepart(month, id_made_date) end as id_month, case when id_made_date = '3000-01-01 00:00:00' then null else datepart(day, id_made_date) end as id_day FROM dbo.plant_identity where fhp_lexicon_id = -5 and fhp_lexicon_id = -5 and accession_number <> '00.0000' union -- for all accessions: get plant_identity records for hybrids, usually two rows per plant_identity record if any hybrid parent; multiple rows/pairs per accession with one pair having accepted flag=1. SELECT rtrim(accession_number) objectnumber, case when lexicon_id = -5 then null else lexicon_id end as lexicon_id, case when accepted_flag = 0 then 'false' else 'true' end as accepted_flg, case when aff_lexicon_id = -5 then null else aff_lexicon_id end as aff_lexicon, 'true' hybrid_flg, date_entered, case when id_made_date = '3000-01-01 00:00:00' then null else id_made_date end as id_made_date, case when id_made_by > 0 then id_made_by end as id_made_by, id_reference, id_modifier, id_remarks, fhp_lexicon_id parent_id, 'female' parent_qualifier, fhp_lexicon_id, mhp_lexicon_id, case when id_made_date = '3000-01-01 00:00:00' then null else datepart(year, id_made_date) end as id_year, case when id_made_date = '3000-01-01 00:00:00' then null else datepart(month, id_made_date) end as id_month, case when id_made_date = '3000-01-01 00:00:00' then null else datepart(day, id_made_date) end as id_day FROM dbo.plant_identity where fhp_lexicon_id <> -5 union -- get male hybrid parents from plant_identity SELECT rtrim(accession_number) objectnumber, case when lexicon_id = -5 then null else lexicon_id end as lexicon_id, case when accepted_flag = 0 then 'false' else 'true' end as accepted_flg, case when aff_lexicon_id = -5 then null else aff_lexicon_id end as aff_lexicon, 'true' hybrid_flg, date_entered, case when id_made_date = '3000-01-01 00:00:00' then null else id_made_date end as id_made_date, case when id_made_by > 0 then id_made_by end as id_made_by, id_reference, id_modifier, id_remarks, mhp_lexicon_id parent_id, 'male' parent_qualifier, fhp_lexicon_id, mhp_lexicon_id, case when id_made_date = '3000-01-01 00:00:00' then null else datepart(year, id_made_date) end as id_year, case when id_made_date = '3000-01-01 00:00:00' then null else datepart(month, id_made_date) end as id_month, case when id_made_date = '3000-01-01 00:00:00' then null else datepart(day, id_made_date) end as id_day FROM dbo.plant_identity where mhp_lexicon_id <> -5 union -- for the 414 where there is no accepted record in plant_identity, get the accession record for non-hybrids, one row per accession record; set accepted flag=1 SELECT rtrim(accession_number) objectnumber, lexicon_id, 'true' accepted_flg, -- assume these are accepted determinations though no record in plant_identity case when aff_lexicon_id = -5 then null else aff_lexicon_id end as aff_lexicon, case when hybrid_flag = 0 then 'false' else 'true' end as hybrid_flg, date_entered, -- this is accession.date_entered but probably is correct for this set null id_made_date, null id_made_by, null id_reference, null id_modifier, 'determination from SAGE accession record' id_remarks, null parent_id, null parent_qualifier, null fhp_lexicon_id, null mhp_lexicon_id, null id_year, null id_month, null id_day FROM dbo.accession where fhp_lexicon_id = -5 and fhp_lexicon_id = -5 and accession_number in ( select a.accession_number from accession a left outer join plant_identity p on (a.accession_number = p.accession_number and p.accepted_flag =1) where p.accession_number is null ) union -- for the 414 where there is no accepted record in plant_identity, get the accession record for hybrids, usually two rows per accession record; set accepted flag=1 SELECT rtrim(accession_number) objectnumber, lexicon_id, 'true' accepted_flg, -- assume these are accepted determinations though no record in plant_identity case when aff_lexicon_id = -5 then null else aff_lexicon_id end as aff_lexicon, case when hybrid_flag = 0 then 'false' else 'true' end as hybrid_flg, date_entered, -- this is accession.date_entered but probably is correct for this set null id_made_date, null id_made_by, null id_reference, null id_modifier, 'determination from SAGE accession record' id_remarks, fhp_lexicon_id parent_id, 'female' parent_qualifier, fhp_lexicon_id, mhp_lexicon_id, null id_year, null id_month, null id_day FROM dbo.accession where fhp_lexicon_id <> -5 and accession_number in ( select a.accession_number from accession a left outer join plant_identity p on (a.accession_number = p.accession_number and p.accepted_flag =1) where p.accession_number is null ) union -- get male hybrid parents for the query above SELECT rtrim(accession_number) objectnumber, lexicon_id, 'true' accepted_flg, -- assume these are accepted determinations though no record in plant_identity case when aff_lexicon_id = -5 then null else aff_lexicon_id end as aff_lexicon, case when hybrid_flag = 0 then 'false' else 'true' end as hybrid_flg, date_entered, -- this is accession.date_entered but probably is correct for this set null id_made_date, null id_made_by, null id_reference, null id_modifier, 'determination from SAGE accession record' id_remarks, mhp_lexicon_id parent_id, 'male' parent_qualifier, fhp_lexicon_id, mhp_lexicon_id, null id_year, null id_month, null id_day FROM dbo.accession where mhp_lexicon_id <> -5 and accession_number in ( select a.accession_number from accession a left outer join plant_identity p on (a.accession_number = p.accession_number and p.accepted_flag =1) where p.accession_number is null ) order by objectnumber, accepted_flg desc, date_entered desc, parent_qualifier
Notes
So this is now determinations plus hybrids in one big denormalized set. Psuedo-SQL:
1 for all accessions (including the 418 that have no accepted record in plant_identity): get plant_identity records where FHP = -5 and MHP = -5 (one row per plant_identity record if no hybrid parent; multiple rows per accession, usually one with accepted_flag=1; order by accession_number, accepted_flag desc)
union
2 for all accessions (including the 418 that have no accepted record in plant_identity): get plant_identity records where FHP <> -5 or MHP <> -5 (usually two rows per plant_identity record if any hybrid parent; multiple rows/pairs per accession with one pair having accepted flag =1. Order by accession_number, accepted_flag desc, parent_type female before male)
union
3. for the 418 accessions that have no accepted record in plant_identity, get the accession record where FHP = -5 and MHP = -5 (one row per accession record if no hybrid parent; set accepted flag=1? or not; maybe indicate this is a record from accession table; no accepted record in plant_identity)
union
4. for the 418 accessions that have no accepted record in plant_identity, get the accession records where FHP <> -5 or MHP <> -5 (usually two rows per accession record if any hybrid parent; order by accession_number, parent_type female before male; set accepted flag=1)
Talend/ETL
Chris ucbgdeterminations 0.5 in Chris's Talend repository botgarden
MERGE: Collection Sites (not repeating but easier to extract separately) - SQL dev (Chris), Talend dev-sample data ready (Chris)
Including significant cleanup of geography data from SAGE.
Field mapping
These are the rules we are using for cleaning up the geographic data and collection sites in SAGE:
For accessions from years 2000 and later:
- If Collection Site = 'Earth' or Hort, and Accession Notes has parentheses
- Populate the new Taxonomic Range field (in Collection Object, Locality Info) with the Accession Notes info in parentheses (Taxonomic Range data come from accession table, parenthetical text in accession.accession_notes)
- Field collection place is blank
- Field collection county, Field collection state, Field collection country are blank
- If Collection Site = some known non-Earth, non-Hort location
- Taxonomic range information is blank
- Field collection place gets the Collection Site value (which will be controlled by the Place Authority) (Field Collection Place data come from collection_site table, via geog_lexicon.geog_term)
- Field collection county, state, and country will be blank in the cataloging record but will be available via the Place Authority by importing hierarchical Place Authority records.
For accessions from years 1999 and earlier:
- If any of the county, state, country fields has parentheses, use data from
- Taxonomic Range data come from Original Accession Geography, the accession_loc table (accession_loc.country_id, accession_loc.state_id, accession_loc.county_id). Strip parentheses, concatenate, and put into the new Taxonomic range field.
- Field collection place is blank
- Field collection county, Field collection state, Field collection country are blank
- If all three do not have parentheses, and collection site is not Earth or Hort., use data from Collection Site
- Taxonomic range field is blank
- Field Collection Place data come from collection_site table (geog_lexicon.geog_term) Collection Site
- Field collection county, state, and country will be blank in the cataloging record but will be available via the Place Authority by importing hierarchical Place Authority records.
- If all three do not have parentheses; collection site = Earth or Hort.; and there are some legitimate non-range values in country, state, or county
- Taxonomic range field is blank
- Field collection place is blank
- Field collection county, Field collection state, Field collection country get the values from Original Accession Geography (accession_loc)
Note: If collection site = 'hort.' or if country = 'HORT.' then the cultivated checkbox should be checked, and provenance drop down should be "not wild source".
SQL
SELECT rtrim(cs.accession_number) objectnumber, case when lat_deg is not null then convert(varchar,convert(int, lat_deg))||' '||convert(varchar,convert(int, lat_min))||' '||convert(varchar,convert(int, lat_sec))||lat_dir end as vlatitude, case when long_deg is not null then convert(varchar,convert(int, long_deg))||' '||convert(varchar,convert(int, long_min))||' '||convert(varchar,convert(int, long_sec))||long_dir end as vlongitude, latitude declatitude, longitude declongitude, case when UTM_north is not null then convert(varchar,UTM_east)||' mE, '||convert(varchar,UTM_north)||' mN' end as vcoords, elevation velevation, lower_elev minelevation, upper_elev maxelevation, case when lower_elev is not null then 'meters' end as unitselevation, case when geog_detail is not null and geog_distance is null then geog_detail when geog_detail is null and geog_distance is not null then geog_distance when geog_detail is not null and geog_distance is not null then geog_detail||'; '||geog_distance end as vlocation, habitat_descr, veg_type, use_range_flag, case when (country_name not like '%(%' and state_name not like '%(%)%' and county_name not like '%(%)%' and cs.geog_lex_id <> 0) then cs.geog_lex_id when (cs.geog_lex_id in (2031, 2326)) then 2326 end as geog_lex, case when gl.geog_term = 'hort' then 'hort.' else gl.geog_term end as geog_term, case when (country_name like '%(%' or state_name like '%(%)%' or county_name like '%(%)%') then rtrim(county_name||', '||state_name||', '||country_name) end as taxonomic_range, case when (country_name not like '%(%' and state_name not like '%(%)%' and county_name not like '%(%)%' and cs.geog_lex_id in (0) and (al.country_id not in (0, 442, 562) or al.state_id <> 0 or al.county_id <> 0)) then country_name end as new_country, case when (country_name not like '%(%' and state_name not like '%(%)%' and county_name not like '%(%)%' and cs.geog_lex_id in (0) and (al.country_id not in (0, 442, 562) or al.state_id <> 0 or al.county_id <> 0) and al.state_id <> 0) then state_name end as new_state, case when (country_name not like '%(%' and state_name not like '%(%)%' and county_name not like '%(%)%' and (cs.geog_lex_id < 5953) and (al.country_id not in (0, 442, 562) or al.state_id <> 0 or al.county_id <> 0) and al.county_id not in (0, 14) ) then county_name end as new_county FROM collection_site cs left outer join dbo.accession_loc al on (al.accession_number = cs.accession_number) left outer join country_code cc on (al.country_id = cc.country_id) left outer join state_code sc on (al.state_id = sc.state_id) left outer join county_code c on (al.county_id = c.county_id) left outer join geog_lexicon gl on (cs.geog_lex_id = gl.geog_lex_id) where substring(cs.accession_number, 3, 1) = '.' and cs.accession_number <> '00.0000' union SELECT rtrim(a.accession_number) objectnumber, case when lat_deg is not null then convert(varchar,convert(int, lat_deg))||' '||convert(varchar,convert(int, lat_min))||' '||convert(varchar,convert(int, lat_sec))||lat_dir end as vlatitude, case when long_deg is not null then convert(varchar,convert(int, long_deg))||' '||convert(varchar,convert(int, long_min))||' '||convert(varchar,convert(int, long_sec))||long_dir end as vlongitude, latitude declatitude, longitude declongitude, case when UTM_north is not null then convert(varchar,UTM_east)||' mE, '||convert(varchar,UTM_north)||' mN' end as vcoords, elevation velevation, lower_elev minelevation, upper_elev maxelevation, case when lower_elev is not null then 'meters' end as unitselevation, case when geog_detail is not null and geog_distance is null then geog_detail when geog_detail is null and geog_distance is not null then geog_distance when geog_detail is not null and geog_distance is not null then geog_detail||'; '||geog_distance end as vlocation, habitat_descr, veg_type, use_range_flag, case when (cs.geog_lex_id not in (0, 2031) ) then cs.geog_lex_id when (cs.geog_lex_id = 2031 ) then 2326 end as geog_lex, case when gl.geog_term = 'hort' then 'hort.' else gl.geog_term end as geog_term, case when (cs.geog_lex_id in (0, 2031, 2326) and accession_notes like '%(%)%') then substring( substring( a.accession_notes, (patindex('%(%)%', a.accession_notes)) + 1, char_length(a.accession_notes)), 1, (patindex('%)%', substring( a.accession_notes, (patindex('%(%)%', a.accession_notes)) + 1, char_length(a.accession_notes)))) - 1 ) end as taxonomic_range, case when (country_name not like '%(%' and state_name not like '%(%)%' and county_name not like '%(%)%' and cs.geog_lex_id =0 and (al.country_id not in (0, 442, 562) or al.state_id <> 0 or al.county_id <> 0)) then country_name end as new_country, case when (country_name not like '%(%' and state_name not like '%(%)%' and county_name not like '%(%)%' and cs.geog_lex_id =0 and (al.country_id not in (0, 442, 562) or al.state_id <> 0 or al.county_id <> 0) and al.state_id <> 0) then state_name end as new_state, case when (county_name not like '%(%)%' and county_name is not null and al.county_id <> 0 and cs.geog_lex_id < 5953) then county_name end as new_county FROM dbo.accession a left outer join accession_loc al on (a.accession_number = al.accession_number) left outer join collection_site cs on (a.accession_number = cs.accession_number) left outer join geog_lexicon gl on (cs.geog_lex_id = gl.geog_lex_id) left outer join country_code cc on (al.country_id = cc.country_id) left outer join state_code sc on (al.state_id = sc.state_id) left outer join county_code c on (al.county_id = c.county_id) where substring(a.accession_number, 5, 1) = '.' order by objectnumber
Notes
Sites can have hort and range..
County, state, country only used in a small handful of cases where they provided useful information not in collection site or taxonomic range. Concatenated onto verbatim locality field.
Talend/ETL
ucbgcollectionsite 0.6 in Chris's Talend repository botgarden
MERGE: Use of Accessions (repeating group) - SQL ready (Chris), Talend dev-sample data ready (Chris)
Research projects and classes
Field mapping
SQL
SELECT rtrim(accession_number) accession_number, 'research' usetype, 'researchproject'||convert(varchar(20), rb.project_id) conceptShortID, rp.project_name conceptName, request_date, filled_date, material_type, notes FROM dbo.research_plant rb join research_project rp on (rp.project_id = rb.project_id) union SELECT rtrim(accession_number) accession_number, 'class' usetype, 'class'||convert(varchar(20), cp.course_id) conceptShortID, cu.course_title conceptName, cp.last_change_date request_date, null filled_date, null material_type, concept notes FROM dbo.class_plant cp join class_use cu on (cp.course_id=cu.course_id) order by accession_number, request_date
Notes
URN format for materialType: <materialType>urn:cspace:botgarden.cspace.berkeley.edu:vocabularies:name(materialtype):item:name(bulb)'bulb'</materialType>
Talend/ETL
ucbgaccessionuse 0.1 in Chris's Talend repository botgarden
Procedures
Plant Locations (LMI procedure, including current location and history/version) and Relationships to Objects - SQL ready (Chris), Talend (Yuteh)
See attached accessions map, current_location and accession_history tables
Field mapping
SQL
Initially, this SQL just gets the current location records, plant_location table.
SELECT null record_id, rtrim(accession_number) accession_number, 'plant_location' datasource, action_date, case when action_code = 0 then 'Dead' when action_code = 1 then 'Planted Out' when action_code = 2 then 'Moved' when action_code = 3 then 'Label Request' when action_code = 4 then 'Theft' when action_code = 5 then 'Other' when action_code = 6 then 'Revived' end as reasonForMove, null previous_location, 'garden'||convert(varchar(5), garden_loc_id) current_location, case when label_req_flag = 1 then 'true' else 'false' end as label_request, label_size, stand_type, label_count, comments, entered_staff_id, date_entered, last_change_staff_id, last_change_date FROM dbo.plant_location where accession_number <> '00.0000' order by accession_number, datasource desc, action_date desc
Notes
Accession History records are being loaded as versioned LMI records. We still need to know how to do that.
Talend/ETL
Uses output from ucbgaccessions Talend job that contains CSIDs and Object Numbers. Rename most recent output file to ucbg-accessions-ids.txt
Also uses a file sage_createupdateby_lookup.txt that contains a few user email addresses to decode the staff ID for the createdBy and updateBy info
ucbgplantlocations 0.1
Run steps
- Movements: Change "schema2" to "schema"
- Movements: Import
- Move-to-object relationships: Import
- Object-to-move relationships: Import
Vouchers and Voucher-Object relationships - SQL ready (Chris), Talend dev (Lam)
Field mapping
Using Loan Out for Voucher
SQL
SELECT voucher_id, -- loanOutNumber accession_number, -- for relationship to Accession voucher_number voucherInstitution, -- get refname for borrower case when h_w_code='h' then 'Horticultural' when h_w_code='w' then 'Wild' end as hortWild, -- decode to h Horticultural or w Wild vouchered_date, case when voucher_early_date>2415021 and voucher_early_date<2556233 and vouchered_date not like ('%.%') then dateadd(dd,(voucher_early_date - 2415021), convert(date, '1900-01-01', 103)) end as voucherDate, -- loanOutDate voucher_early_date, -- ?? voucher_late_date, -- ?? case when vouchered_by <> 0 then vouchered_by end as vouchered_by, case when garden_loc_id <> 0 then garden_loc_id end as garden_loc_id, -- get refname for garden loc label_count numSheets, -- numLent smasch_flag, -- skip case when flowering_flag=1 then 'true' else 'false' end as flowering, -- case 1=true case when fruiting_flag=1 then 'true' else 'false' end as fruiting, -- case 1=true case when sterile_flag=1 then 'true' else 'false' end as sterile, -- case 1=true entered_staff_id createdBy, date_entered createdAt, last_change_staff_id updatedBy, last_change_date updatedAt FROM dbo.voucher
Notes
Still need to get UCJEPS accession number for some number of vouchers
- Uses ID output from job that creates garden locations vocabulary. Make sure file is renamed to ucbg-gardenloc-ids.txt
- Uses ID output from job that creates accessions. Make sure that file is renamed to ucbg-accessions-ids.txt
Talend/ETL
ucbgvoucher 0.1
Run steps
- Vouchers: Change "schema2" and "schema3" to "schema"
- Vouchers: Import
- Voucher-to-object relationships: Import
- Object-to-voucher relationships: Import
Propagation SQL ready (Chris), REST Import development (John)
Field mapping
SQL
Control lists:
- Pot Type: select pot_type from pot_type_code order by pot_type;
- Pot Size: select pot_size from pot_size_code order by pot_size;
- Treatment Type: select treatment_type_name from treatment_type order by treatment_type_name;
Notes
See Botanical propagation data analysis
Prop Type:
- lookup table: prop_type_code
- propagation.prop_type = pro_type_code.prop_type
- 21 values in prop_type_code
Reason for Prop: propagation.purpose
- database has a rule that limits the values to: 'class use', 'research', 'garden collection',
'distribution', 'conservation' - 5 distinct values in propagation.purpose
Extra Seeds: propagation.extra_seed_flag
- boolean (0,1)
Activity Type:
- lookup table: activity_type_code
- activity_type_code.activity_type where growth_history.activity_type_id = activity_type_code.activity_type_id
- 24 values in activity_type_code
Medium:
- lookup table: medium
- growth_history.medium_name = medium.medium_name
- 1302 values in medium
Pot Type:d
- lookup table: pot_type
- growth_history.pot_type = pot_type_code.pot_type
- 35 values in pot_type_code
Pot Size:
- lookup table: pot_size_code
- pot_size_code.pot_size where growth_history.pot_size_id = pot_size_code.pot_size_id
- 69 values in pot_size_code
Treatment Type:
- lookup table: treatment_type
- treatment_type.treatment_type_name where treatment.treatment_type_id = treatment_type.treatment_type_id
- 91 values in treatment_type
Concentration: treatment.concentration
- varchar(10) field
- no rules or constraints on this field
- 80 distinct values in treatment.concentration
Talend/ETL
MERGE: Propagation Activities (repeating group) SQL ready (Chris), REST import (John)
Field mapping
SQL
Notes
See Botanical propagation data analysis
Relationships between Accessions and Propagations; REST import (John)
Field mapping
SQL
Notes
Talend/ETL
Talend/ETL
Distributions and Distribution-Accession relationships (using Object Exit procedure) - SQL dev (Chris), Talend dev (Chris), Data botgarden-dev loaded (Chris)
Field mapping
SQL
SELECT rtrim(accession_number) accession_number, -- for relations e.org_id, -- Exchange Org, get refname o.org_name, convert(date, exchange_date) exchangeDate, -- for display date exchange_date, -- for scalar dates datepart(year, exchange_date) as exchange_date_year, datepart(month, exchange_date) as exchange_date_month, datepart(day, exchange_date) as exchange_date_day, plant_count, e.entered_staff_id, e.date_entered, exchange_comments FROM dbo.exchange e, organization o where e.org_id=o.org_id
Notes
- Uses ID output from job that creates accessions. Make sure that file is renamed to ucbg-accessions-ids.txt
Talend/ETL
ucbgdistributions 0.1
Run steps
- Distributions: Change "schema2" to "schema"
- Distributions: Import
- Distribution-to-object relationships: Import
- Object-to-distribution relationships: Import
Pot Tags (Volunteer Labels) - Lam (SQL), Chris (Talend)
Field mapping
SQL
select vl.label_id as legacyId, vl.accession_number as accessionNumber, vl.formatted_taxon as taxonName, vl.common_name as commonName, vl.label_data as labelData, vl.family as family, vl.country_name as locale, vl.quantity as numberOfLabels, case when vl.label_req_flag = 0 then 'no' when vl.label_req_flag = 1 then 'yes' end as printLabels from volunteer_label vl
Notes
Talend/ETL
ucbgpottags 0.1 creates pot tag records and relationship records to collection objects