User Manual: CSV Importer: Formatting Data
Data must be properly formatted before using the CSV Importer. The tool will provide error messages for improperly formatted data, but it’s best to do as much cleanup as possible before getting to the tool. In the templates generated from GitHub, the header rows hold the information needed to determine what should/can go in each field.
Data Types: Strings, Integers, Floats, Dates and Structured Dates
The data type of your field is noted in Row 3 of the GitHub-generated import template.
Strings are any combination of alphanumeric characters.
Example: a, b, c, 1, 2, 3
Integers are whole numbers
Example: 1, 2, 3
Floats are numbers that may include decimal points
Example: 1.2, 3.456
CollectionSpace does not currently support fractions in float fields (e.g. 1 ½).
Booleans are boxes that may be checked or unchecked
If you would like the box to be checked, enter TRUE into the field
If you would like the box to be unchecked, you may enter FALSE or leave the field blank
Dates are single calendar dates
For example: 01/02/2020
Structured dates are groups of fields in CollectionSpace that provide for fuzzy, ranged, and other non-single dates. You can import the whole date value as a string. If possible, it will be parsed into the more specific date elements (dateEarliestYear, dateEarliestMonth, datePeriod, dateLatestQualifierValue, etc.) that are viewable in the box that pops up when you click on a structured date in the record.
If you have more granular date data (separate day, month, year values), you cannot import them directly into the more specific structured date fields. You will need to combine them into a full date for import.
Regardless of whether your date string can be parsed into the more specific structured date elements, the string you import will remain in the “dateDisplayDate” and will be visible in the record.
Example: 01/02/2020, ca. 1920, 1950-1980, early 20th century
Controlled terms: Value Source Types and Value Sources: Optionlist, Vocabulary, and Authority Fields
The Value Source Type and Value Source for your field are noted in Rows 6 & 7 of the GitHub-generated import template. If the Value Source Type and Value Source fields have values in them other than N/A, the data in the field comes from a controlled source such as a pick list or authority.
Option lists are static pick lists in CollectionSpace - those that cannot be edited via the UI. For fields populated by an option list, you must use the values exactly as they appear in Row 7 of the GitHub-generated template, Value Source. For example, in Object Cataloging - Common, the collection value “permanent-collection” will import as expected. The following would not: “permanent collection” or “Permanent-collection”.
Vocabularies are dynamic pick lists in Collection - those that can be edited via the UI. For fields populated by a vocabulary, you must use the values exactly as they appear in the “Name” field of the vocabulary. This includes capitalization, spacing, and any included punctuation. (Variations will be imported and will display, but they will not be matched up with/reconciled against the vocabulary, introducing inconsistency in your data).
Tip: As every museum may have different terms in dynamic lists, these values are not included in the templates on GitHub. You may wish to add these values to your locally saved version of the template.
Critical Tip: You must create new terms in a given vocabulary before importing other records that use those terms. If you use a vocabulary term in a record, and that vocabulary term does not exist in the vocabulary, the CSV Importer will not import the record that uses the missing term. The Processing step will prepare a report of all unique missing terms in a given batch. This report includes the term and what authority or vocabulary the term must be added to.
Authority fields pull data from one or more authorities. For fields populated by an authority , the CSV value should exactly match the Display value field of the term in the appropriate authority. If there is a mismatch, the term used in your record will not be connected to the authority as expected and a new term will be created.
Tip: If a field is linked to more than one authority, the GitHub-generated template will include multiple columns for that field; for example, acquisitionSourcePerson and acquisitionSourceOrganization.
Critical Tip: You must create new terms in a given authority before importing other records that use those terms. If you use an authority term in a record, and that authority term does not exist in the authority, the CSV Importer will not import the record that uses the missing term. The Processing step will prepare a report of all unique missing terms in a given batch. This report includes the term and what authority or vocabulary the term must be added to.
Fields with N/A in the value source type can be formatted according to your local practice.
Multi-valued or Repeatable Fields
Whether or not a field is repeatable is noted in Row 4 of the GitHub-generated import template.
Values for multi-valued fields should be separated with a pipe: |
To enter the pipe | character, select shift and the key directly under your delete/backspace key
Example: Eleanor Brown|Daphne Brown
For multi-valued fields with multi-valued sub-fields, such as the Title block in Object Cataloging, non-repeating sub fields should be separated with a |, repeating sub-fields should be separated with a | for each top-level instance and ^^ for each repeating value of the subfield (sample below). The ^ is above the 6 key on your keyboard.
Tip: Still not sure how to format? Create a dummy record with data in the fields you’d like to import in your CSpace instance or on one of our demo sites and Export those fields to see the correct formatting.
Critical tip: If the data you’d like to update includes multi-valued fields, all existing values will be replaced by the data you import. To make sure you don’t delete existing values by mistake, we suggest first exporting the records you’d like to update via the Search Results Export feature, and then editing that CSV vs. creating one from scratch.
For example, if your existing record has colors Red and Blue, and in your spreadsheet you enter the value Green into the cell, after updating, your Color field will only include Green. If you want to keep Red and Blue and add Green, your Color cell needs to read: Red|Blue|Green.
title | A Man | A Woman |
---|---|
titleLanguage | English | English |
titleType | assigned-by-artist | popular |
titleTranslation | Un Homme^^Un Hombre | Une Femme^^Eine Frau |
titleTranslationLanguage | French^^Spanish | French^^German |
TIP: Remove leading, trailing, and extra/repeated spaces in your data
Both LibreOffice Calc and Excel have a TRIM() function that can remove these spaces for you. We recommend you always use this while preparing your data, especially in fields containing record identification numbers or controlled term values.
Why is this important?
If you create a person name “Prince, Diana “, then try to use “Prince, Diana” as an Object Production Person value, you will get an error because computers are literal and your person term literally does not match the term you are using in your Object record.
Likewise if your Object Identification Number is “ 2024.003.028”, attempting to update object number “2024.003.028” will tell you that object does not exist and would need to be transferred as new.
Likewise, Storage Location “Bdg A, Room 2” (2 spaces after comma) is not the same as “Bdg A, Room 2” (1 space after comma), and wouldn’t match if you had one form in your Storage location term, and tried to use the other in an LMI.