Vocabulary Service Questions

Questions

How do we model the use-of/reference-to a vocabulary item/taxon?

We want to be able to use a vocabulary term (especially for flat vocabularies that are basically enumeration values) in fields of a form (e.g., for the responsible department or person on a Loan). This raises the question of whether, how, and where we should maintain information that relates the vocabulary to a given entity and/or field. We need such information if we want to find all uses of some vocabulary item across the entire application.

  • The dumb and brittle approach is to hard-code the search to check every field where a vocabulary is used.
  • Another approach is to model every use of a vocabulary as a relationship, as in RDF. This makes broad searching easy, but it will likely slow down structured search, and it somewhat complicates the model (compared to a direct reference from the entity schema).
  • Yet another approach is to describe where the vocabulary can be used, allowing for a reflection-like mechanism that search can use to construct field-based search through an entity. This makes the main information modeling simpler, but makes the search model more complex. OTOH, if our users demand a traditional structured search model in which they (can) specify each field to search within, then we'll have to do this anyway.
  • Where the reference is already within an association (e.g., in a semantic index, or in a Determination service that formally describes a specimen as being of a given taxon), then it would be awkward to impose another level of indirection by associating the vocabulary item with the association, and in turn to the specimen.
  • OTOH, if we are importing data that is more of the nature of text than associations to vocabularies, and for which the desire is to impose an authority for consistency, then we may want to model the association separately, for several reasons:
    1. The process of association may be subject to refinements over time, resulting in a revision of the association.
    2. Several algorithms may be used, and may yield varying results that should be modeled independent of one another, and independent of any association made by users.
    3. The original text imported should be preserved.
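
The three reasons above can be sketched as a small data model. This is only an illustration under our own assumptions: the class names, the `source` labels, and the `urn:example:` item references are hypothetical and not part of any agreed schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class TermAssociation:
    """One proposed link from imported text to a vocabulary item."""
    item_uri: str   # hypothetical reference to the vocabulary item
    source: str     # which algorithm (or user) proposed the link
    revision: int   # grows each time the same source refines its result


@dataclass
class ImportedValue:
    """An imported text value plus every association proposed for it."""
    original_text: str  # preserved exactly as imported
    associations: List[TermAssociation] = field(default_factory=list)

    def associate(self, item_uri: str, source: str) -> TermAssociation:
        """Record a new association without discarding earlier ones."""
        prior = [a for a in self.associations if a.source == source]
        assoc = TermAssociation(item_uri, source, revision=len(prior) + 1)
        self.associations.append(assoc)
        return assoc

    def latest(self, source: str) -> Optional[TermAssociation]:
        """Return the most recent association from one source, if any."""
        matches = [a for a in self.associations if a.source == source]
        return matches[-1] if matches else None


value = ImportedValue("Bufo sp.")
value.associate("urn:example:taxon:bufo-draft", source="exact-match")
value.associate("urn:example:taxon:bufo", source="exact-match")  # a refinement
value.associate("urn:example:taxon:bufo", source="fuzzy-match")  # second algorithm
```

Each source keeps its own revision history, and the imported text is never overwritten.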

For now, we will assume that fields can be direct references to a vocabulary value, and that we will solve the search problem as part of the general approach to search and reflection. In addition, we will separately consider the problem of indexing a collection (which builds associations between objects and entries in a vocabulary/authority). That problem has more in common with the Life Science Determination case than with the case of a structured reference from an object to some vocabulary.
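
A minimal sketch of how direct references plus a reflection-like description might support "find all uses", assuming we declare somewhere which fields draw on which vocabulary. The entity names, field names, and vocabulary ids below are invented for illustration, and records are plain dicts rather than real service payloads.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FieldUsage:
    entity: str      # e.g. "Loan"
    field_name: str  # e.g. "responsibleDepartment"
    vocabulary: str  # hypothetical vocabulary id


# Declared once, wherever the schema is configured (illustrative values).
USAGES: List[FieldUsage] = [
    FieldUsage("Loan", "responsibleDepartment", "departments"),
    FieldUsage("Acquisition", "owningDepartment", "departments"),
    FieldUsage("Loan", "loanStatus", "loan-status"),
]


def fields_using(vocabulary: str) -> List[Tuple[str, str]]:
    """List the (entity, field) pairs declared to use a vocabulary."""
    return [(u.entity, u.field_name) for u in USAGES if u.vocabulary == vocabulary]


def find_uses(term: str, vocabulary: str,
              records: Dict[str, List[dict]]) -> List[Tuple[str, str, int]]:
    """Search only the declared fields, returning (entity, field, record index)."""
    hits = []
    for entity, field_name in fields_using(vocabulary):
        for index, record in enumerate(records.get(entity, [])):
            if record.get(field_name) == term:
                hits.append((entity, field_name, index))
    return hits


RECORDS = {
    "Loan": [
        {"responsibleDepartment": "Herpetology", "loanStatus": "open"},
        {"responsibleDepartment": "Botany", "loanStatus": "closed"},
    ],
    "Acquisition": [{"owningDepartment": "Herpetology"}],
}
uses = find_uses("Herpetology", "departments", RECORDS)
```

The same `USAGES` table could also drive a structured search form that lets users pick the field to search within.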

Can we model rank as an attribute of a Vocabulary Item?

It seems as though rank is a quality of a vocabulary item. However, one of the changes a researcher might make is to move a taxon within a taxonomy. This raises the question of whether that is still the original taxon at a new rank, or whether it is a new taxon. If rank is not intrinsic to the taxon/item, then it has to be part of the relation that links the item to others, which is kind of messy.

  • In general, I [Patrick] would argue that a taxon is defined by rank as well as name, publication, etc. There are both species and genus taxa for "bufo", for "buteo", etc., so the rank is distinctive, IMO. This simplifies at least one part of the model.
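
If we take that position, rank simply becomes part of what makes two items equal. The sketch below assumes a hypothetical `Taxon` type; the field set is illustrative and not a proposed schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Taxon:
    """Identity is the whole tuple, so rank distinguishes same-named taxa."""
    name: str
    rank: str
    publication: str = ""


genus = Taxon("bufo", "genus")
species = Taxon("bufo", "species")


def move_to_rank(taxon: Taxon, new_rank: str) -> Taxon:
    """Under this model a change of rank yields a different taxon value."""
    return replace(taxon, rank=new_rank)


moved = move_to_rank(species, "genus")
```

Here `genus` and `species` compare unequal despite sharing a name, and `moved` is a new value rather than a mutation of `species`.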

Where do we configure the association of a given vocabulary to a given schema (and UI form) field?

There has been some discussion recently about how and where to configure the information that XSD cannot reasonably express. However, we will need to make a decision (at least for the near term) on how to bind a vocabulary namespace as the legal range for a given schema field.

  • How do we refer to a vocabulary in config files? (I [Patrick] think we need some namespace equivalent.)
  • In what config file do we specify previously-configured vocabularies to load into the system?
  • In what config file do we link from a schema field to one or more vocabularies?
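
To make these questions concrete, here is one possible shape for such configuration, written as plain Python data rather than any particular file format. Everything in it is an assumption for discussion: the `urn:example:vocab:` namespace style, the `load_from` key, the file names, and the `Entity.field` paths.

```python
from typing import Dict, List

# Previously-configured vocabularies to load, keyed by a namespace-like id.
VOCABULARIES: Dict[str, dict] = {
    "urn:example:vocab:departments": {"load_from": "departments.csv"},
    "urn:example:vocab:persons": {"load_from": "persons.csv"},
    "urn:example:vocab:organizations": {"load_from": "organizations.csv"},
}

# Schema field -> one or more vocabularies forming its legal range.
FIELD_BINDINGS: Dict[str, List[str]] = {
    "Loan.responsibleDepartment": ["urn:example:vocab:departments"],
    "Loan.responsibleParty": [
        "urn:example:vocab:persons",
        "urn:example:vocab:organizations",
    ],
}


def vocabularies_for(field_path: str) -> List[str]:
    """Return the vocabularies bound to a schema field (empty if unbound)."""
    return FIELD_BINDINGS.get(field_path, [])


def undeclared_bindings() -> List[str]:
    """Report bindings that point at a vocabulary that was never declared."""
    return sorted(
        f"{field_path} -> {vocab}"
        for field_path, vocabs in FIELD_BINDINGS.items()
        for vocab in vocabs
        if vocab not in VOCABULARIES
    )
```

Whether these two tables live in one file or two is exactly the open question above; the sketch only shows that both lookups are needed.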

By the same token, we need to let the UI know about that association, so that appropriate UI mechanisms (and App tools?) are available.

  • In what config file do we link from a UI Form field to the vocabulary(ies) for the associated schema field, or is this automatic in some manner from the App layer?

Finally, we probably need to make clear the class or quality of the vocabulary namespace so the UI can pick the appropriate tools. E.g.,

  • If the vocabulary is flat (as for a simple enumeration), then a drop-down list is a likely UI mechanism.
  • If the vocabulary is tree-shaped (as for a taxonomy), then some sort of path-aware type-ahead picker is likely.
  • If the vocabulary is a faceted ontology, then some variant on the picker may be needed.
  • If the vocabulary is a graph (e.g., a taxonomy with relations and/or additional context links), then more UI may be required.
  • If multiple vocabularies are allowed, then some blend will likely be required.
  • Whether the vocabulary is closed (no new terms allowed) or open (users can add new terms) will determine the associated UI features.
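
The cases above amount to a lookup from vocabulary class to UI mechanism. A rough sketch follows; the widget names are placeholders, and treating a field as open when any of its vocabularies is open is our own assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Shape(Enum):
    FLAT = "flat"
    TREE = "tree"
    FACETED = "faceted"
    GRAPH = "graph"


@dataclass(frozen=True)
class VocabularyInfo:
    shape: Shape
    is_open: bool  # True if users may add new terms


# Placeholder widget names; the real UI components are undecided.
WIDGETS: Dict[Shape, str] = {
    Shape.FLAT: "drop-down list",
    Shape.TREE: "path-aware type-ahead picker",
    Shape.FACETED: "faceted picker",
    Shape.GRAPH: "graph browser",
}


def choose_widget(vocabs: List[VocabularyInfo]) -> dict:
    """Pick a UI mechanism for a field bound to one or more vocabularies."""
    if not vocabs:
        raise ValueError("field has no vocabulary bound to it")
    widget = "blended picker" if len(vocabs) > 1 else WIDGETS[vocabs[0].shape]
    # Assumption: offer "add term" if any bound vocabulary is open.
    return {"widget": widget, "allow_new_terms": any(v.is_open for v in vocabs)}
```

For example, `choose_widget([VocabularyInfo(Shape.FLAT, False)])` selects the drop-down list with term creation disabled.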

How will we model and implement the services that support vocabulary functionality?

We can define the basic schemas around a vocabulary item, and around the relations among items (e.g., parent-child). However, we want to have both a Vocabulary service that supports generic vocabularies (like Subject or Concept vocabularies, enumerations, etc.), as well as more specialized services like Location, Person, Organization, etc. that have additional semantics:

  • Additional methods that provide specific semantics (e.g., "get closest location given a lat-long pair").
  • Extensions to the VocabularyItem to model additional information (e.g., lat-long, altitude or height, origin, taxonomic rank, etc.).

One approach to the modeling is to consider the specialized services as sub-classes or extensions of the Vocabulary service. They can then implement all the Vocabulary methods, and then pass them through to a superclass in the implementation.

Another similar approach is to model the vocabulary schema and methods like an interface definition. We would then have a service that is a base implementation for the simple cases, and then all the variants would support the interface methods and schema as part of their definition (and we can work out the implementation details on the back end).
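
Both approaches can be seen in one small sketch: an interface, a base implementation for the simple cases, and a specialized Location service that reuses the base and adds its own method and fields. All class, method, and field names here are hypothetical, and the distance calculation is deliberately naive.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VocabularyItem:
    id: str
    label: str
    parent_id: Optional[str] = None


@dataclass
class LocationItem(VocabularyItem):
    """Extension of the basic item with location-specific information."""
    lat: float = 0.0
    lon: float = 0.0


class VocabularyService(ABC):
    """The interface every vocabulary-like service would support."""

    @abstractmethod
    def get_item(self, item_id: str) -> VocabularyItem: ...

    @abstractmethod
    def children(self, item_id: str) -> List[VocabularyItem]: ...


class SimpleVocabularyService(VocabularyService):
    """Base implementation for generic vocabularies and enumerations."""

    def __init__(self, items: List[VocabularyItem]) -> None:
        self._items: Dict[str, VocabularyItem] = {i.id: i for i in items}

    def get_item(self, item_id: str) -> VocabularyItem:
        return self._items[item_id]

    def children(self, item_id: str) -> List[VocabularyItem]:
        return [i for i in self._items.values() if i.parent_id == item_id]


class LocationService(SimpleVocabularyService):
    """Specialized service: inherits the generic methods, adds its own."""

    def closest(self, lat: float, lon: float) -> LocationItem:
        # Naive squared difference in degrees, not a geodesic distance.
        located = [i for i in self._items.values() if isinstance(i, LocationItem)]
        return min(located, key=lambda i: (i.lat - lat) ** 2 + (i.lon - lon) ** 2)


locations = LocationService([
    LocationItem("site-a", "Site A", None, 10.0, 20.0),
    LocationItem("site-b", "Site B", None, 50.0, 60.0),
])
nearest = locations.closest(12.0, 19.0)
```

A client written against `VocabularyService` works with either service, while only clients that know about locations call `closest`.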

A related issue is the management of namespaces in the vocabularies. We will need to validate a vocab reference, and so need some sort of registry that can map the pattern of a URI to a vocabulary and so to a handler that validates the actual reference. If we have to have such a registry with handlers, then we might consider having all vocabulary functionality go through a central proxy that dispatches to the different implementations. This basically inverts the interface model. This might be somewhat simpler for clients, but it introduces additional issues:

  • If we ask for info about a vocabulary item from a proxy, what is the contract-schema for this call? Is it just the basic vocabulary item information, or does it include the extended schema?
    • If the former, then the client has to make a different call to get the additional information.
    • If the latter, then the contract is effectively variable, which is kind of funky.
  • If we instead make the calls on the individual services, how do we model the notion of a handler service in our REST/SOA world, so that a client can get the handler for a VocabularyItem reference and then operate on it?
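
A minimal sketch of the registry idea, assuming references are URIs whose pattern identifies the vocabulary and that each vocabulary has a handler able to validate an item id. The `urn:example:vocab:` scheme and the `Handler` class are invented for illustration; a real handler would presumably call the owning service.

```python
import re
from typing import List, Optional, Pattern, Set, Tuple


class Handler:
    """Stand-in for a service that knows which item ids exist."""

    def __init__(self, name: str, known_ids: Set[str]) -> None:
        self.name = name
        self._known_ids = known_ids

    def validate(self, item_id: str) -> bool:
        return item_id in self._known_ids


class VocabularyRegistry:
    """Maps URI patterns to handlers and dispatches validation to them."""

    def __init__(self) -> None:
        self._routes: List[Tuple[Pattern[str], Handler]] = []

    def register(self, pattern: str, handler: Handler) -> None:
        # Each pattern must define a named group "id" for the item id.
        self._routes.append((re.compile(pattern), handler))

    def resolve(self, uri: str) -> Optional[Tuple[Handler, str]]:
        """Return the handler and item id for a reference, if any route matches."""
        for pattern, handler in self._routes:
            match = pattern.fullmatch(uri)
            if match:
                return handler, match.group("id")
        return None

    def validate(self, uri: str) -> bool:
        resolved = self.resolve(uri)
        if resolved is None:
            return False
        handler, item_id = resolved
        return handler.validate(item_id)


registry = VocabularyRegistry()
registry.register(r"urn:example:vocab:location:(?P<id>[\w-]+)",
                  Handler("location", {"site-a", "site-b"}))
registry.register(r"urn:example:vocab:person:(?P<id>[\w-]+)",
                  Handler("person", {"patrick"}))
```

`resolve` corresponds to the "get the handler for a VocabularyItem reference" option, while `validate` corresponds to the central proxy option; the contract question for richer calls remains open either way.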

Should vocabularies define not just allowed relations, but also required relations?

For cases like life-science taxonomy, we may want to preclude orphan VocabularyItems, and so require parent or child linkage. Perhaps this is something that the individual services should implement as part of their semantics, rather than trying to generalize it. For now, we will not try to model this generally, and we can see if some common themes emerge from experience.
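
As an example of what a single service might enforce on its own, the check below flags items that have neither a parent nor a child. It assumes a hypothetical representation in which each item id maps to its parent id (or `None`); the ids are made up.

```python
from typing import Dict, List, Optional


def find_orphans(parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Return ids of items with no parent and no children, sorted by id."""
    has_child = {parent for parent in parent_of.values() if parent is not None}
    return sorted(
        item_id
        for item_id, parent in parent_of.items()
        if parent is None and item_id not in has_child
    )


# Hypothetical taxonomy fragment: one root with a child, plus one stray item.
PARENT_OF = {
    "family-x": None,
    "genus-y": "family-x",
    "stray-z": None,
}
orphans = find_orphans(PARENT_OF)
```

A taxonomy service could run such a check on save and reject the change, while a flat enumeration service would simply never call it.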