User Manual: CSV Importer: Formatting Data

Data must be properly formatted before using the CSV Importer. The tool will provide error messages for improperly formatted data, but it’s best to do as much cleanup as possible before getting to the tool. In the templates generated from GitHub, the header rows hold the information needed to determine what should/can go in each field.

Data Types: Strings, Integers, Floats, Dates and Structured Dates

The data type of your field is noted in Row 3 of the GitHub-generated import template.

  • Strings are any combination of alphanumeric characters. 

    • Example: a, b, c, 1, 2, 3

  • Integers are whole numbers

    • Example: 1, 2, 3 

  • Floats are numbers that may include decimal points

    • Example: 1.2, 3.456

    • CollectionSpace does not currently support fractions in float fields (e.g. 1 ½).

  • Booleans are boxes that may be checked or unchecked

    • If you would like the box to be checked, enter TRUE into the field

    • If you would like the box to be unchecked, you may enter FALSE or leave the field blank

  • Dates are single calendar dates

    • For example: 01/02/2020

  • Structured dates are groups of fields in CollectionSpace that provide for fuzzy, ranged, and other non-single dates. You can import the whole date value as a string. If possible, it will be parsed into the more specific date elements (dateEarliestYear, dateEarliestMonth, datePeriod, dateLatestQualifierValue, etc.) that are viewable in the box that pops up when you click on a structured date in the record.

    • If you have more granular date data (separate day, month, year values), you cannot import them directly into the more specific structured date fields. You will need to combine them into a full date for import. 

    • Regardless of whether your date string can be parsed into the more specific structured date elements, the string you import will remain in the “dateDisplayDate” and will be visible in the record.

    • Example: 01/02/2020, ca. 1920, 1950-1980, early 20th century

Controlled terms: Value Source Types and Value Sources: Optionlist, Vocabulary, and Authority Fields

The Value Source Type and Value Source for your field are noted in Rows 6 & 7 of the GitHub-generated import template. If the Value Source Type and Value Source fields have values in them other than N/A, the data in the field comes from a controlled source such as a pick list or authority.

  • Option lists are static pick lists in CollectionSpace - those that cannot be edited via the UI. For fields populated by an option list, you must use the values exactly as they appear in Row 7 of the GitHub-generated template, Value Source. For example, in Object Cataloging - Common, the collection value “permanent-collection” will import as expected. The following would not: “permanent collection” or “Permanent-collection”.

  • Vocabularies are dynamic pick lists in Collection - those that can be edited via the UI. For fields populated by a vocabulary, you must use the values exactly as they appear in the “Name” field of the vocabulary. This includes capitalization, spacing, and any included punctuation. (Variations will be imported and will display, but they will not be matched up with/reconciled against the vocabulary, introducing inconsistency in your data).

    • Tip: As every museum may have different terms in dynamic lists, these values are not included in the templates on GitHub. You may wish to add these values to your locally saved version of the template.

    • Critical Tip: You must create new terms in a given vocabulary before importing other records that use those terms. If you use a vocabulary term in a record, and that vocabulary term does not exist in the vocabulary, the CSV Importer will not import the record that uses the missing term. The Processing step will prepare a report of all unique missing terms in a given batch. This report includes the term and what authority or vocabulary the term must be added to.

  • Authority fields pull data from one or more authorities. For fields populated by an authority , the CSV value should exactly match the Display value field of the term in the appropriate authority. If there is a mismatch, the term used in your record will not be connected to the authority as expected and a new term will be created.

    • Tip: If a field is linked to more than one authority, the GitHub-generated template will include multiple columns for that field; for example, acquisitionSourcePerson and acquisitionSourceOrganization.

    • Critical Tip: You must create new terms in a given authority before importing other records that use those terms. If you use an authority term in a record, and that authority term does not exist in the authority, the CSV Importer will not import the record that uses the missing term. The Processing step will prepare a report of all unique missing terms in a given batch. This report includes the term and what authority or vocabulary the term must be added to.

  • Fields with N/A in the value source type can be formatted according to your local practice.

Multi-valued or Repeatable Fields

Whether or not a field is repeatable is noted in Row 4 of the GitHub-generated import template.

  • Values for multi-valued fields should be separated with a pipe: |

    • To enter the pipe | character, select shift and the key directly under your delete/backspace key

    • Example: Eleanor Brown|Daphne Brown

  • For multi-valued fields with multi-valued sub-fields, such as the Title block in Object Cataloging, non-repeating sub fields should be separated with a |, repeating sub-fields should be separated with a | for each top-level instance and ^^ for each repeating value of the subfield (sample below). The ^ is above the 6 key on your keyboard.

    • Tip: Still not sure how to format? Create a dummy record with data in the fields you’d like to import in your CSpace instance or on one of our demo sites and Export those fields to see the correct formatting.

  • Critical tip: If the data you’d like to update includes multi-valued fields, all existing values will be replaced by the data you import. To make sure you don’t delete existing values by mistake, we suggest first exporting the records you’d like to update via the Search Results Export feature, and then editing that CSV vs. creating one from scratch.

    • For example, if your existing record has colors Red and Blue, and in your spreadsheet you enter the value Green into the cell, after updating, your Color field will only include Green. If you want to keep Red and Blue and add Green, your Color cell needs to read: Red|Blue|Green.

title

A Man | A Woman

title

A Man | A Woman

titleLanguage

English | English

titleType

assigned-by-artist | popular

titleTranslation

Un Homme^^Un Hombre | Une Femme^^Eine Frau

titleTranslationLanguage

French^^Spanish | French^^German

TIP: Remove leading, trailing, and extra/repeated spaces in your data

Both LibreOffice Calc and Excel have a TRIM() function that can remove these spaces for you. We recommend you always use this while preparing your data, especially in fields containing record identification numbers or controlled term values.

Why is this important?

If you create a person name “Prince, Diana “, then try to use “Prince, Diana” as an Object Production Person value, you will get an error because computers are literal and your person term literally does not match the term you are using in your Object record.

Likewise if your Object Identification Number is “ 2024.003.028”, attempting to update object number “2024.003.028” will tell you that object does not exist and would need to be transferred as new.

Likewise, Storage Location “Bdg A, Room 2” (2 spaces after comma) is not the same as “Bdg A, Room 2” (1 space after comma), and wouldn’t match if you had one form in your Storage location term, and tried to use the other in an LMI.