Page scope

Follow the steps on this page for opening:

  • .csv files exported from the CollectionSpace application

  • CSV Importer templates

  • reports created by the CSV Importer that you will use to prepare additional import batches

  • any other .csv data you will be using to prepare data for ingest via the CSV Importer

You can open .csv files with LibreOffice Calc, Microsoft Excel, Google Sheets, and other spreadsheet programs.

This page gives instructions for opening .csv files in LibreOffice Calc and Microsoft Excel so that the expected data format and encoding are retained. It also gives instructions for opening .csv files in Google Sheets and explains why this is not recommended. Finally, it gives some examples of why it is important to follow the instructions outlined here, or similar procedures in your spreadsheet application of choice.

If you do not follow these steps, you may run into things like:

  • garbled diacritics or other special characters

  • leading or trailing zeroes from record number fields being dropped (if “010.10” is interpreted as a number by the application, it will be changed to “10.1”)

  • date values being “helpfully” changed to another format

  • non-date values being interpreted as and changed to dates

  • long numeric IDs (such as ISBNs) being converted to scientific notation format

You may not notice these changes, but if you save them and import them via the CSV Importer, you run the risk of adding incorrect data.

We are providing this small sample CSV for you to download and use to experiment with your own CSV workflows: csv_unforced_quotes.csv. This file contains data formats that will clearly illustrate whether the process you use to open it changes the data:

idNumber,note,date,otherId
20220831.009,"Something, something",2022-08,9784326784987
20220831.010,Something something,9/1/2022,9784326784988
20220831.011,"CSV, not really a standard format",01-04,9784326784989

Instructions for recommended applications

LibreOffice Calc (recommended, available here)

If LibreOffice opens your .csv with Writer (word processing) instead of Calc:

On LibreOffice home screen, click Create > Calc Spreadsheet. Once in that spreadsheet, open the existing file as described above (File > Open or CTRL-o).

Microsoft Excel

The following instructions are for the Excel desktop application. The browser/Office 365 version of Excel does not currently support opening .csv files in a safe way.

To be extra safe

  • Click on “Queries & Connections” in the Data ribbon

  • Select the connection (there should only be one) and click “Remove”

  • Click “OK” on the scary-looking alert box

Why? There are some situations and combinations of settings/events that could cause the data you have imported to change in Excel. If you had made changes to the data in the Excel, they might be lost. If you remove the connection to the external data source, this absolutely cannot happen.

Instructions and warning for opening .csv files with Google Sheets

In our testing, Google Sheets sometimes converted 20220831.010 to 20220831.01 even with the “Convert text to numbers, dates, and formulas” box unchecked. This depends on how the .csv was formatted/saved. Since .csv data exported from CollectionSpace does not force wrapping of all values in quotes (see below), we do not recommend using Google Sheets to prepare data for the CSV Importer unless you are certain your data will not be affected by this kind of issue.

Details:
Most applications default to wrapping a CSV data value in quotes only if the data value itself contains a comma:

exitNumber,exitNote
20220831.009,"Something, something"
20220831.010,Something something
20220831.011,"CSV, not really standard"

If your .csv was written like this, rbGoogle Sheets removes the trailing zero from the second exitNumber, even with the “Convert text to numbers, dates, and formulas” box unchecked. This is because it parses the value as a decimal number, not text.

Some applications have the option (usually well-hidden) to wrap all values or all text values in quotes, resulting in:

exitNumber,exitNote
"20220831.009","Something, something"
"20220831.010","Something something"
"20220831.011","CSV, not really standard"

Google Sheets retains your data integrity for the above if you uncheck the “Convert text to numbers, dates, and formulas” box.

Instructions:

Cautionary examples

These examples all use as their source data the downloadable csv_unforced_quotes.csv. This file uses the common CSV dialect where only values themselves containing commas are wrapped in quotes. Here is the raw data in this file:

idNumber,note,date,otherId
20220831.009,"Something, something",2022-08,9784326784987
20220831.010,Something something,9/1/2022,9784326784988
20220831.011,"CSV, not really a standard format",01-04,9784326784989

LibreOffice Calc

Opened without setting column type to Text:

Trailing zero removed from second idNumber.

Microsoft Excel

Opened normally (File > Open or CTRL-o)

The date strings have been changed. Our otherId values have been switched to scientific notation.

idNumber has been interpreted as a decimal numeric value. If I increase the number of decimal points to three, it looks correct and the idNumber will be saved correctly. If I increase the number of decimal points to four:

And saving the file writes:

idNumber,note,date,otherId
20220831.0090,"Something, something",Aug-22,9.78433E+12
20220831.0100,Something something,9/1/2022,9.78433E+12
20220831.0110,"CSV, not really a standard format",04-Jan,9.78433E+12

Opened with File > Import, but not setting all column data formats to Text

You get the exact same result as opening normally.