Natural science taxonomy controlled vocabulary management

Needed by whom and when

Basic statements about who needs the functionality and by when.

UCJEPS, UC Botanical Garden, and other Natural History collections: Significant functionality needed on day one.

Overview

Note: Findings from earlier discussions on this topic are recorded in various places.

Note: Specimen identification user stories are recorded elsewhere.

The scientific taxonomy used by natural history collections is a complex hierarchical controlled vocabulary.  A collection management system needs sufficient capabilities for managing that vocabulary, but it probably cannot do everything that a comprehensive scientific taxonomy management system would do.  In most cases, the taxonomy is not poly-hierarchical: each term has a single parent.

However, for plant collections, living or otherwise, hybrids have two or more parents.  In some cases, this is more a matter for specimen identification.  That is, a specimen can have two current identifications, and the relationship between those identifications is "hybrid".  However, there are some hybrids that are so accepted that one might want to put the entry into the controlled vocabulary and give it a name.  These are called named hybrids.  Question: What are best practices for named hybrids?  See Plant hybrid names for more information.
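The distinction between a single-parent tree and hybrid parentage can be sketched as a small data structure.  This is only an illustration under assumptions: the class name Taxon, its field names, and the helper functions are hypothetical, and the taxa are mock names in the style of "Aus bus".  It shows one way to keep the hierarchy strictly single-parent while recording a named hybrid's parents as a separate relationship.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Taxon:
    """One term in the vocabulary (all field names are illustrative)."""
    name: str
    rank: str
    # Single parent link: the tree itself is not poly-hierarchical.
    parent: Optional[str] = None
    # Hybrid parentage is kept separate from the tree; empty for ordinary terms.
    hybrid_parents: List[str] = field(default_factory=list)


def lineage(terms: Dict[str, Taxon], name: str) -> List[str]:
    """Walk the single-parent links from a term up to the root."""
    path: List[str] = []
    current: Optional[str] = name
    while current is not None:
        path.append(current)
        current = terms[current].parent
    return path


def is_named_hybrid(taxon: Taxon) -> bool:
    """A named hybrid records two or more parent terms."""
    return len(taxon.hybrid_parents) >= 2


# Mock data: a genus, two species, and a named hybrid between them.
terms = {
    "Aus": Taxon("Aus", "genus"),
    "Aus bus": Taxon("Aus bus", "species", parent="Aus"),
    "Aus cus": Taxon("Aus cus", "species", parent="Aus"),
    "Aus × dus": Taxon(
        "Aus × dus", "species", parent="Aus",
        hybrid_parents=["Aus bus", "Aus cus"],
    ),
}
```

With this layout, browsing the tree never encounters more than one parent per term, and the hybrid relationship remains available for search.  Whether this matches best practice for named hybrids is still the open question above.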

There is a lot of discussion in the biodiversity informatics domain about the proper way to model entities and relationships.  For an overview, see for example http://www.gbif.org/informatics/name-services/background/

It is fair to say that standards groups are doing a lot of important work that will facilitate data sharing but that use of these standards is still evolving.  For instance, TDWG (http://tdwg.org) makes the important distinction between a scientific name (e.g., "Aus Bus" as a mock genus-species name) and the scientific usage of that name by particular scientists (e.g., "Aus Bus Linnaeus" vs "Aus Bus Hoffman").  In practice, many systems maintain taxonomic controlled vocabularies in which each scientific name does have a citation, but because the controlled vocabulary is the authority for that particular scientific collection, the system might not record the different usages of that same name.

  • Question: Does UC Botanical Garden allow the same name to be used multiple times with different scientific authors recorded?
  • Question: How do we handle differences between taxonomic trees used in different natural history collections?
  • Question: What are best practices related to higher taxonomy?  That is, some systems do not treat higher levels of taxonomy with rigor because the scientific users are usually most interested in the lower levels (genus and species).  However, when data are shared, higher levels are required in order to disambiguate terms used across kingdoms (for instance) or when collections from multiple domains are aggregated.  Is it important to have a true hierarchical set of relationships from kingdom to sub-species?  Some systems have elected to have a very flat scientific taxonomy system.
  • Note: The user stories below assume that the unique entity in the controlled vocabulary is the combination of name and citation.
  • Note: Natural science taxonomies vary significantly from domain to domain (e.g., from paleontological to botanical collections).  Variations occur, for example, in the depth of trees, in the attributes attached to terms, and even in the way that scientific names with author citations are formatted (e.g., see citation formats for botany and zoology and see the botanical usage note from UCJEPS).
  • Note: For entomological collections such as UCB's Essig Museum, it is common for only a small subset of the collection to be described at the specimen level.  Instead, the list of taxa represented in the collection (essentially the taxonomic vocabulary) is crucial.  See the "Present in this collection" user story below.  For the Essig, this is what the Species system provides.  (See documentation for the Essig Species system.)
  • Note: Biodiversity research projects increasingly are expecting to have globally unique identifiers for taxonomic terms.  Currently there is no accepted standard, but we should expect that GUIDs will be needed now and in the future.
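The assumption that the unique entity is the combination of name and citation, together with the expected need for GUIDs, can be illustrated with a minimal sketch.  The names TermKey, Vocabulary, add, and usages are hypothetical, and a random UUID stands in for whatever identifier scheme is eventually adopted, since no standard is settled.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TermKey:
    """Unique key for a term: scientific name plus citation (assumed model)."""
    name: str
    citation: str


class Vocabulary:
    """Hypothetical in-memory vocabulary keyed by (name, citation)."""

    def __init__(self) -> None:
        self._guids: Dict[TermKey, str] = {}

    def add(self, name: str, citation: str) -> str:
        """Add a term and return its identifier; reject exact duplicates."""
        key = TermKey(name.strip(), citation.strip())
        if key in self._guids:
            raise ValueError(f"term already exists: {key.name} {key.citation}")
        # Placeholder identifier only: a random UUID, not an agreed standard.
        guid = str(uuid.uuid4())
        self._guids[key] = guid
        return guid

    def usages(self, name: str) -> List[str]:
        """List the citations recorded for one scientific name."""
        return sorted(k.citation for k in self._guids if k.name == name)


vocabulary = Vocabulary()
linnaeus_guid = vocabulary.add("Aus bus", "Linnaeus")
hoffman_guid = vocabulary.add("Aus bus", "Hoffman")
```

Under this model the same name can appear more than once as long as the citation differs, which is the behavior the first question in this list asks about.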

Common use case: Scientists recognize that one species is really two.  Of the current specimens associated with the old term, some subset will need to be moved to the new term.  Best practice: Create new term based on existing term; perform batch update on selected set of specimens to new taxonomic term.
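The best practice above (create the new term from the existing one, then batch update the selected specimens) might look like the following sketch.  The function name split_term, the dictionary layout, and the specimen identifiers are all assumptions made for illustration; a real system would probably record a new determination for each specimen rather than overwrite the old one.

```python
from typing import Dict, Iterable


def split_term(
    terms: Dict[str, dict],
    identifications: Dict[str, str],
    old_name: str,
    new_name: str,
    specimen_ids: Iterable[str],
) -> int:
    """Create new_name as a copy of old_name and move the selected specimens.

    Returns the number of specimens that were re-identified.
    """
    if new_name in terms:
        raise ValueError(f"term already exists: {new_name}")

    # New term inherits the attributes (rank, parent, ...) of the existing term.
    new_term = dict(terms[old_name])
    new_term["name"] = new_name
    terms[new_name] = new_term

    # Batch update: only specimens currently identified to the old term move.
    moved = 0
    for specimen_id in specimen_ids:
        if identifications.get(specimen_id) == old_name:
            identifications[specimen_id] = new_name
            moved += 1
    return moved


# Mock data: three specimens identified to "Aus bus"; two belong to the new species.
terms = {"Aus bus": {"name": "Aus bus", "rank": "species", "parent": "Aus"}}
identifications = {"SPEC-1": "Aus bus", "SPEC-2": "Aus bus", "SPEC-3": "Aus bus"}
moved = split_term(terms, identifications, "Aus bus", "Aus cus", ["SPEC-2", "SPEC-3"])
```

Specimens not in the selected set keep their original identification, so the old term stays valid for the remainder of the collection.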

User stories for definition

Please feel free to rewrite these or eliminate them completely!  Then move to the prioritized headings below.

  • Natural science taxonomy management: User can add terms to the vocabulary, indicating the parent term. 
    • Some attributes of taxonomic terms are domain-specific (e.g., currency status, author, alternate name forms, remarks, citation).
    • In most UCB collections, only an authorized person will be adding terms, so there is no need to mark them by default as provisional; it would be nice to set status by default to approved rather than provisional.  However, some other collections will require new terms be marked as provisional.
  • Natural science taxonomy management: User can browse the hierarchical taxonomic tree and pivot to associated specimens in the collection.
  • Natural science taxonomy management: User can search and navigate scientific taxonomy in a hierarchical taxonomy tree containing terms and relationships.
  • Natural science taxonomy management: User can search the hierarchical taxonomic tree, and then pivot to associated specimens in the collection.
  • Natural science taxonomy management: User can search the hierarchical taxonomic tree by synonyms, common names, old names and hybrids, and then pivot to associated specimens in the collection.
  • Natural science taxonomy management: User can search the hierarchical taxonomic tree, and then pivot to specimens in the collection identified with that term.
  • Natural science taxonomy management: User can see number of specimens in collection that are associated with a taxonomic term.
  • Natural science taxonomy management: User can update attributes on an existing term.
  • Natural science taxonomy management: User with appropriate permissions can rename an existing term under certain circumstances, e.g., if it is a clear misspelling or when an earlier author of the name in question is discovered (see the botanical usage note).
    • Should this be limited to users with some advanced role, or limited to programmers?
    • Should this action be audited?
  • Natural science taxonomy management: User with appropriate permissions can mark a term so that it does not show up as a valid term during subsequent data entry (e.g., during scientific taxonomy identification).
  • Natural science taxonomy management: Present in this collection: User with appropriate permissions can mark a term to indicate that specimens identified to that term do exist in this collection.
  • Natural science taxonomy management: User with appropriate permissions can move an existing term to another location in the taxonomic tree (i.e., can change its parent).  All child terms in the taxonomy vocabulary will be moved with it.  Any identifications or determinations will show the new relationship.
    • Should this action be audited?
  • Natural science taxonomy management: User can identify synonyms for terms.  Note that there are different kinds of synonyms, and presumably other attributes (fields) will be needed by different domains.
    • Nomenclatural synonyms = homotypic synonyms = objective synonyms; synonyms by law.
    • Taxonomic synonyms = heterotypic synonyms = subjective synonyms; synonyms by opinion.
  • Natural science taxonomy management: User can identify common names for terms.
  • Natural science taxonomy management: System will create full scientific names with authors in the format that is accepted by that domain (i.e., system must be configurable to produce different formats).
  • Natural science taxonomy management: Administrator can merge in new terms from an established authority.
  • Natural science taxonomy management: Administrator can import data to initialize a taxonomic tree (from a recognized authority or from a set of authority files maintained for the collection).
  • Natural science taxonomy management: Administrator can publish local taxonomic information to other systems using some accepted standard.
  • Natural science taxonomy management: Taxonomic ordering: User can set the order in which terms within a node are displayed.  The desired order in which taxonomies present terms is phylogenetic rather than alphabetical.  However, most systems display terms alphabetically.  Unless some authority manages the order, phylogenetic ordering will not be maintained.
  • Natural science taxonomy management: Administrator can create and view an alternative tree (set of relationships) in order to see what the collection would look like with an alternative controlled vocabulary.  Note, most systems do not allow this.  Instead, this is the kind of research and analysis performed by taxonomists and scientists.
  • Natural science taxonomy management: Possibly out of scope: in February 2007, John Wieczorek (MVZ) defined some capabilities of a name resolver service:
    • Get all synonyms for name x at same rank as x
    • Get accepted valid name for name x
    • Get all accepted valid names for children of rank y for name x
    • Get accepted valid parent name for name x at rank y
    • Get full accepted valid name hierarchy of name x starting at the same rank as x

Prioritization of user stories

As definitions and priorities are clarified, the user stories above should be moved into relative order below.

Must have for 1.x-MUSEUM (when they go live in system)

Placeholder for required functionality.  As a general rule, functionality that you need and use now, or for which you have existing data, should go here.  However, this is up to the museum.  We will have to balance requirements against resources and timelines.

Placeholder

MUSEUM could wait six to twelve months

What could wait?  These will be re-prioritized at a later date.

Placeholder

MUSEUM would like to have this eventually

These are nice to have but not a near term requirement.

Placeholder