Vocabulary Service Questions

Questions

How do we model the use-of/reference-to a vocabulary item/taxon?

We want to be able to use a vocabulary term (especially for flat vocabularies that are basically enumeration values) in fields of a form (e.g., for the responsible department or person on a Loan). This raises the question of whether, how, and where we should maintain information that relates the vocabulary to a given entity and/or field. We need such information if we want to find all uses of some vocabulary item across the entire application.

The dumb and brittle approach is to hard-code the search to check every field where a vocabulary is used.
Another approach is to model every use of a vocabulary item as a relationship, as RDF does. This makes broad searching easy, but it will likely slow down structured search, and it somewhat complicates the model (compared to a direct reference from the entity schema).
Yet another approach is to describe where the vocabulary can be used, allowing for a reflection-like mechanism that search can use to construct field-based search through an entity. This makes the main information modeling simpler, but makes the search model more complex. OTOH, if our users demand a traditional structured search model in which they (can) specify each field to search within, then we'll have to do this anyway.
Where the reference is already within an association (e.g., in a semantic index, or in a Determination service that formally describes a specimen as being of a given taxon), it would be awkward to impose another level of indirection by associating the vocabulary item with the association, and that in turn with the specimen.
OTOH, if we are importing data that is more of the nature of text than associations to vocabularies, and for which the desire is to impose an authority for consistency, then we may want to model the association separately, for several reasons:
1. The process of association may be subject to refinements over time, resulting in a revision of the association.
2. Several algorithms may be used, and may yield varying results that should be modeled independent of one another, and independent of any association made by users.
3. The original text imported should be preserved.
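
The three reasons above can be sketched as a small data model. This is only an illustration under assumptions: the names `TermAssociation`, `ImportedValue`, and `source`, and the identifier strings, are hypothetical and not part of any agreed schema. The sketch keeps the imported text untouched and records each proposed match, whether from an algorithm or a user, as its own revisable record.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class TermAssociation:
    """One proposed link from imported text to a vocabulary item (hypothetical)."""
    vocab_item_id: str  # identifier of the vocabulary item
    source: str         # who or what proposed it, e.g. "algo:exact-match"
    revision: int = 1   # incremented when the same source refines its result


@dataclass
class ImportedValue:
    """An imported text value plus every association proposed for it."""
    original_text: str  # reason 3: the imported text is never overwritten
    associations: List[TermAssociation] = field(default_factory=list)

    def associate(self, vocab_item_id: str, source: str) -> TermAssociation:
        # Reason 1: a refinement by the same source becomes a new revision;
        # earlier revisions stay in the list as history.
        prior = [a for a in self.associations if a.source == source]
        assoc = TermAssociation(vocab_item_id, source, revision=len(prior) + 1)
        self.associations.append(assoc)
        return assoc

    def current(self, source: str) -> Optional[TermAssociation]:
        # Reason 2: each source's result is kept independent of the others.
        matches = [a for a in self.associations if a.source == source]
        return matches[-1] if matches else None


value = ImportedValue("Bufo bufo (L.)")
value.associate("taxon:1001", source="algo:exact-match")
value.associate("taxon:1002", source="algo:fuzzy-match")
value.associate("taxon:1003", source="algo:exact-match")  # a later refinement
```

A user-made association would simply be another `source` value, so it never overwrites what an algorithm proposed.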

For now, we will assume that fields can be direct references to a vocabulary value, and that we will solve the search problem as part of the general approach to search and reflection. In addition, we will separately consider the problem of indexing a collection (which builds associations between objects and entries in a vocabulary/authority). That problem has more in common with the Life Science Determination case than with the case of a structured reference from an object to some vocabulary.
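
To make the working assumption concrete, here is a rough sketch of direct references combined with a description of where a vocabulary can be used. Everything in it is hypothetical: the `VOCAB_USAGE` registry, the entity and field names, and the in-memory `RECORDS` list that stands in for real storage. The point is only that a "find all uses" search can be constructed from the usage description instead of being hard-coded.

```python
from typing import Dict, List, Tuple

# Hypothetical registry: vocabulary name -> (entity type, field name) pairs
# describing where that vocabulary may be referenced.
VOCAB_USAGE: Dict[str, List[Tuple[str, str]]] = {
    "departments": [
        ("Loan", "responsible_department"),
        ("Acquisition", "owning_department"),
    ],
}

# Toy records standing in for stored entities; each field holds a direct
# reference (an item identifier) to a vocabulary value.
RECORDS = [
    {"type": "Loan", "id": "L1", "responsible_department": "dept:zoology"},
    {"type": "Loan", "id": "L2", "responsible_department": "dept:botany"},
    {"type": "Acquisition", "id": "A1", "owning_department": "dept:zoology"},
]


def find_uses(vocabulary: str, item_id: str) -> List[Tuple[str, str]]:
    """Return (record id, field name) for every field that references item_id."""
    hits = []
    for entity_type, field_name in VOCAB_USAGE.get(vocabulary, []):
        for record in RECORDS:
            if record["type"] == entity_type and record.get(field_name) == item_id:
                hits.append((record["id"], field_name))
    return hits
```

Calling `find_uses("departments", "dept:zoology")` walks only the registered fields, which is the same information a structured, per-field search form would need.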

Can we model rank as an attribute of a Vocabulary Item?

It seems as though rank is a quality of a vocabulary item. However, one of the changes a researcher might make is to move a taxon within a taxonomy. This raises the question of whether that is still the original taxon at a new rank, or whether it is a new taxon. If rank is not intrinsic to the taxon/item, then it has to be part of the relation that links the item to others, which is kind of messy.

In general, I [Patrick] would argue that a taxon is defined by rank as well as name, publication, etc. There are both species and genus taxa for "bufo", for "buteo", etc., so the rank is distinctive, IMO. This simplifies at least one part of the model.
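
A minimal sketch of that position, assuming a hypothetical `Taxon` record whose identity is the combination of name, rank, and publication (the field names are illustrative only). With rank as part of the identity, the genus and the species named "bufo" are distinct items, and moving a taxon to another rank yields a new item.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Taxon:
    """Hypothetical taxon whose identity includes its rank."""
    name: str
    rank: str              # e.g. "genus" or "species"
    publication: str = ""  # also part of the identity in this sketch


genus = Taxon("bufo", "genus")
species = Taxon("bufo", "species")

# Same name, different rank: two distinct taxa.
distinct = {genus, species, Taxon("bufo", "genus")}

# "Moving" a taxon to another rank produces a different taxon.
moved = replace(species, rank="subspecies")
```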

Where do we configure the association of a given vocabulary to a given schema (and UI form) field?

There has been some discussion recently about how and where to configure the information that XSD cannot reasonably express. Whatever the outcome, we will need to make a decision (at least for the near term) on how to bind a vocabulary namespace as the legal range for a given schema field.

How do we refer to a vocabulary in config files? (I [Patrick] think we need some namespace equivalent.)
In what config file do we specify previously-configured vocabularies to load into the system?
In what config file do we link from a schema field to one or more vocabularies?
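
One possible shape for such configuration, written as Python data so that it can be exercised. All of it is assumed for illustration: the URN-style namespace strings, the `VOCABULARIES` and `FIELD_BINDINGS` tables, and the `is_legal` helper are hypothetical and do not describe any existing config file.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical preloaded vocabularies, keyed by a namespace-like identifier.
VOCABULARIES: Dict[str, Set[str]] = {
    "urn:example:vocab:departments": {"dept:zoology", "dept:botany"},
    "urn:example:vocab:loan-status": {"status:open", "status:closed"},
}

# Hypothetical binding from (entity, schema field) to one or more vocabularies.
FIELD_BINDINGS: Dict[Tuple[str, str], List[str]] = {
    ("Loan", "responsible_department"): ["urn:example:vocab:departments"],
    ("Loan", "status"): ["urn:example:vocab:loan-status"],
}


def is_legal(entity: str, field_name: str, value: str) -> bool:
    """Check a value against the vocabularies bound to a schema field."""
    namespaces = FIELD_BINDINGS.get((entity, field_name))
    if namespaces is None:
        return True  # unbound fields are unconstrained in this sketch
    return any(value in VOCABULARIES.get(ns, set()) for ns in namespaces)
```

Whether these tables live in one file or several is exactly the open question; the sketch only shows the information that has to be expressed somewhere.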

By the same token, we need to let the UI know about that association, so that appropriate UI mechanisms (and App tools?) are available.

In what config file do we link from a UI Form field to the vocabulary(ies) for the associated schema field, or is this automatic in some manner from the App layer?

Finally, we probably need to make clear the class or quality of the vocabulary namespace so the UI can pick the appropriate tools. E.g.,

If the vocabulary is flat (as for a simple enumeration), then a drop-down list is a likely UI mechanism.
If the vocabulary is tree-shaped (as for a taxonomy), then some sort of path-aware type-ahead picker is likely.
If the vocabulary is a faceted ontology, then some variant on the picker may be needed.
If the vocabulary is a graph (e.g., a taxonomy with relations and/or additional context links), then more UI may be required.
If multiple vocabularies are allowed, then some blend will likely be required.
Whether the vocabulary is closed (no new terms allowed) or open (users can add new terms) will determine associated UI features.
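
The cases above could be expressed as a small lookup from vocabulary metadata to a UI mechanism. This is a sketch under assumptions: `Shape`, `VocabInfo`, `pick_widget`, and the widget names are placeholders invented here, not existing UI components.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Shape(Enum):
    FLAT = "flat"
    TREE = "tree"
    FACETED = "faceted"
    GRAPH = "graph"


@dataclass(frozen=True)
class VocabInfo:
    """Hypothetical metadata the UI would need about a vocabulary."""
    shape: Shape
    is_open: bool  # True if users may add new terms


# Placeholder widget names, one per vocabulary shape.
WIDGET_FOR_SHAPE: Dict[Shape, str] = {
    Shape.FLAT: "drop-down list",
    Shape.TREE: "path-aware type-ahead picker",
    Shape.FACETED: "faceted picker",
    Shape.GRAPH: "graph browser",
}


def pick_widget(vocabs: List[VocabInfo]) -> dict:
    """Choose a widget and features for the vocabularies bound to a field."""
    if not vocabs:
        raise ValueError("field has no vocabulary bound to it")
    if len(vocabs) > 1:
        widget = "blended picker"  # multiple vocabularies need some blend
    else:
        widget = WIDGET_FOR_SHAPE[vocabs[0].shape]
    # An "add term" feature is offered if any bound vocabulary is open.
    return {"widget": widget, "allow_new_terms": any(v.is_open for v in vocabs)}
```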