Vocabulary and Authority Overview

General Requirements

The system should allow users to create and maintain vocabularies, authorities, and term lists that:

  • Establish preferred terms
  • Allow for the management of synonyms and related terms, including non-hierarchical "associative relationships" among authorities
  • Are hierarchical
  • Enable term control in object records, procedural records, organizational records, and media support records
  • Comply with ISO and NISO standards for thesauri
  • Allow users to merge duplicate or redundant records
  • Import vocabularies maintained by outside organizations
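
As a rough illustration of how several of these requirements fit together, the sketch below models preferred terms, synonyms, associative relationships, a broader-term hierarchy, and a merge of duplicate records. The class and field names (Term, Vocabulary, broader, related) are hypothetical and are not taken from any existing schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """One term record; all field names here are illustrative."""
    preferred: str
    synonyms: set[str] = field(default_factory=set)
    related: set[str] = field(default_factory=set)  # associative, non-hierarchical links
    broader: Optional[str] = None                   # parent term in the hierarchy


class Vocabulary:
    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}

    def add(self, term: Term) -> None:
        self.terms[term.preferred] = term

    def narrower(self, preferred: str) -> list[str]:
        """Return the direct children of a term, sorted by name."""
        return sorted(t.preferred for t in self.terms.values() if t.broader == preferred)

    def merge(self, keep: str, duplicate: str) -> None:
        """Fold a duplicate record into the record being kept."""
        kept = self.terms[keep]
        dup = self.terms.pop(duplicate)
        # The duplicate's preferred form survives as a synonym of the kept record.
        kept.synonyms |= dup.synonyms | {dup.preferred}
        kept.synonyms.discard(keep)
        kept.related |= dup.related - {keep}
        # Re-point children and associative links at the kept record.
        for t in self.terms.values():
            if t.broader == duplicate:
                t.broader = keep
            if duplicate in t.related:
                t.related.discard(duplicate)
                if t.preferred != keep:
                    t.related.add(keep)


vocab = Vocabulary()
vocab.add(Term("furniture"))
vocab.add(Term("case furniture", broader="furniture"))
vocab.add(Term("bookcases", synonyms={"book case"}, broader="case furniture"))
vocab.add(Term("book-cases", synonyms={"book-case"}, broader="case furniture"))
vocab.merge(keep="bookcases", duplicate="book-cases")
print(vocab.narrower("case furniture"))  # ['bookcases']
```

Keeping the merged record's preferred form as a synonym is one possible design choice: object records that still carry the old form can then be resolved to the surviving term.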

Vocabularies

From: A Guide to Enhancing Access to Art and Material Culture Information by Elisa Lanzi, revised by Patricia Harpring (2000). Available online through the Getty Research Institute.

A vocabulary is a body of knowledge represented by language. It answers the question: "How do we talk (or write) about this particular subject area?" Glossaries, dictionaries, thesauri, and word lists are all examples of vocabularies. Most vocabularies focus on a special subject area (e.g., a glossary of geographical terms) or audience (e.g., a dictionary for the architecture and construction trades).

Structured vocabularies are collections of words and phrases (called terminology) that are structured to show relationships between terms and concepts. One of the tasks of a structured vocabulary is to allow better retrieval, be it in a card catalog or a computerized database. For example, a vocabulary for furniture would show that there is a relationship among the three terms bookcase, book case, and book-case. In this example, the relationship is quite simple: they are spelling variations for the same concept, a piece of case furniture with shelves for books. These vocabularies may be applied as "controlled vocabularies," where a given term (such as the "descriptor" or "preferred term") is used consistently to represent a given concept.
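
The bookcase example can be sketched as a lookup from any entered form to its descriptor. This is a minimal sketch: the ENTRIES table and the function names are hypothetical, and choosing "bookcase" as the preferred term is an assumption made only for illustration.

```python
from typing import Optional

# Illustrative data: one descriptor and its spelling variants.
ENTRIES = {"bookcase": ["book case", "book-case"]}


def build_index(entries: dict[str, list[str]]) -> dict[str, str]:
    """Map every known form (descriptor or variant) to its descriptor."""
    index: dict[str, str] = {}
    for preferred, variants in entries.items():
        for form in [preferred, *variants]:
            index[form.casefold()] = preferred
    return index


def to_preferred(entered: str, index: dict[str, str]) -> Optional[str]:
    """Return the descriptor for an entered term, or None if it is unknown."""
    return index.get(entered.strip().casefold())


INDEX = build_index(ENTRIES)
print(to_preferred("Book-Case", INDEX))  # bookcase
print(to_preferred("armoire", INDEX))    # None
```

An explicit variant table is used here rather than a normalization rule (such as stripping hyphens and spaces), because a rule of that kind could also conflate terms that are not true variants of each other.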

Why do we need vocabularies? It is because language is ever-changing, nuanced, and complex. These very characteristics that make language so wonderfully expressive can cause ambiguity and confusion in documentation, and ultimately, hamper access to materials in databases. Here are a few examples of how language can cause confusion:

  • National and regional differences
    A particular type of a rectangular, gable-roofed barn is called a Connecticut Barn in the United States. The same type of barn is called an English Barn in Great Britain.
  • Historical and contemporary names
    The nation that is today called Iran was, before 1935, called Persia.
  • Indigenous vs. culturally inappropriate terms
    The terms KhoiKhoi and Hottentot have both been used to refer to a group of people in Southern Africa. In early 20th century Western texts, these people were called Hottentot. Today, KhoiKhoi is preferred and Hottentot is considered culturally inappropriate.
  • Linguistic differences
    The Italian artist Tiziano is called Titian in English and Titien in French.

Structured vocabularies are especially designed to identify and make these connections among terms by managing synonyms and disambiguating homographs, resulting in improved results for the database searcher. In this way, the terms in a vocabulary serve as a knowledge base for the materials in the database. Vocabularies are most effective when used together with other standards, especially data structure and data content standards.
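
The two jobs named above, managing synonyms and disambiguating homographs, can be sketched as follows. The concept list, the parenthetical qualifiers, and the function names are assumptions for illustration; the "drums" entries are an invented homograph pair, not data copied from a published vocabulary.

```python
# Each concept has a preferred label, an optional qualifier, and variant labels.
CONCEPTS = [
    {"id": 1, "label": "Titian", "qualifier": None, "variants": ["Tiziano", "Titien"]},
    {"id": 2, "label": "drums", "qualifier": "musical instruments", "variants": []},
    {"id": 3, "label": "drums", "qualifier": "column components", "variants": []},
]


def candidates(query: str) -> list[dict]:
    """Return every concept whose preferred label or variant matches the query."""
    q = query.strip().casefold()
    hits = []
    for concept in CONCEPTS:
        labels = [concept["label"], *concept["variants"]]
        if any(q == label.casefold() for label in labels):
            hits.append(concept)
    return hits


def display(concept: dict) -> str:
    """Show the qualifier in parentheses so homographs can be told apart."""
    if concept["qualifier"]:
        return f'{concept["label"]} ({concept["qualifier"]})'
    return concept["label"]


for query in ("Titien", "drums"):
    print(query, "->", [display(c) for c in candidates(query)])
# Titien -> ['Titian']
# drums -> ['drums (musical instruments)', 'drums (column components)']
```

A single match means the synonym resolved cleanly to one concept. More than one match signals a homograph, and the searcher or cataloger would need to choose among the qualified labels.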

Authorities

From: A Guide to Enhancing Access to Art and Material Culture Information by Elisa Lanzi, revised by Patricia Harpring (2000). Available online through the Getty Research Institute.

The role of authority work

Authority work, in which terms and names are verified and validated, is a critical part of documentation practice. The concept originated in the library cataloging domain in the days of manual card catalogs and indexes, when strict consistency was necessary for minimal access. Today authority work has extended to other information management communities, and its processes and procedures have benefited greatly from computerization. The development and application of standard controlled vocabularies is a significant outcome of authority work.

Authority work is defined by the following characteristics:

  • Authority files are compilations of authorized terms or headings used by a single organization or consortium in cataloging, indexing, or documentation. Authority files are strictly maintained as terms are applied and often include associated information about the term or subject heading. This associated information can include: synonyms (e.g., "see references"), related or associated terms (e.g., "see also references"), and original sources (e.g., a note that the term was found in a particular dictionary).
  • Authority control is a system of procedures that maintains consistent information in database records. These procedures include the recording of terms and the validation of terms using the authority files. The purpose of authority control is to ensure that the database searcher can collocate like material and relate it to other material in the database. Today authority control is important in the online environment for making searching easier for users and improving precision in searching.
  • An authority file is a controlled vocabulary, but not all controlled vocabularies are authority files. This is because the main purpose of an authority file is to regulate usage in a particular database. In fact, you will find that some authority files use multiple structured vocabularies as a source for their files. For example, a historical society may use both AAT and LCSH as a source for terms in their institution's subject authority files. Most authority files also include "local terms," originating from the institution itself.
  • Authority files are an integral part of most automated information systems, but you will find differing levels of implementation depending on the system. One of the most useful implementations is when the authority file is available as a resource for catalogers and is interactive in the search interface to assist users as they query the database.
  • Authority work procedures may be automated, but the intellectual processes needed to create quality authority files are still best accomplished by humans. This work may include: verification of the proposed term or name in authoritative sources, such as dictionaries, monographs, or (if relevant) historical sources; research of synonyms, such as variant spellings; establishing relationships to other terms/names in the authority file; and creation of an authority record to be added to the database. Authority work at the local level is often expensive and time-consuming, and as data sharing becomes more prevalent, shared authority files are being explored.
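
The validation step of authority control described above might look like the sketch below. The AuthorityRecord and AuthorityFile classes, the status strings, and the source notes are all hypothetical; the example values reuse the Iran/Persia and Titian/Tiziano pairs mentioned earlier on this page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AuthorityRecord:
    heading: str                                        # authorized term or heading
    see_from: list[str] = field(default_factory=list)   # "see references" (synonyms)
    see_also: list[str] = field(default_factory=list)   # related headings
    source: str = ""                                    # where the term was verified


class AuthorityFile:
    def __init__(self, records: list[AuthorityRecord]) -> None:
        self._by_form: dict[str, tuple[str, AuthorityRecord]] = {}
        for record in records:
            self._by_form[record.heading.casefold()] = ("authorized", record)
        for record in records:
            for variant in record.see_from:
                # Never let a variant shadow an authorized heading.
                self._by_form.setdefault(variant.casefold(), ("see", record))

    def check(self, value: str) -> tuple[str, Optional[str]]:
        """Validate an entered value; return (status, authorized heading or None)."""
        hit = self._by_form.get(value.strip().casefold())
        if hit is None:
            return ("unverified", None)  # candidate for human authority work
        status, record = hit
        return (status, record.heading)


authority = AuthorityFile([
    AuthorityRecord("Iran", see_from=["Persia"], source="hypothetical gazetteer"),
    AuthorityRecord("Titian", see_from=["Tiziano", "Titien"], source="hypothetical name list"),
])

print(authority.check("Persia"))    # ('see', 'Iran')
print(authority.check("titian"))    # ('authorized', 'Titian')
print(authority.check("Atlantis"))  # ('unverified', None)
```

The "unverified" status reflects the point made above that the intellectual work stays with people: the code only flags a value, and a cataloger would decide whether it becomes a new heading, a local term, or a variant of an existing record.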

Standards, Guidelines, Use Cases

Collections Trust Terminology Bank

OCLC Terminology Services

Useful presentation from Murtha Baca of the Getty Research Institute

Use Cases and Community Design Workshop Notes

See also the closely related Natural Science Taxonomy Use Cases

Workflows

Hierarchical Authorities - BT NT Workflows

Fields

Vocabulary Management Schema

Term List Management Schema