User Manual: CSV Importer: Opening CSV Files Safely

Page scope

Follow the steps on this page for opening:

  • .csv files exported from the CollectionSpace application

  • CSV Importer templates

  • reports created by the CSV Importer that you will use to prepare additional import batches

  • any other .csv data you will be using to prepare data for ingest via the CSV Importer

You can open .csv files with LibreOffice Calc, Microsoft Excel, Google Sheets, and other spreadsheet programs.

This page gives instructions for opening .csv files in LibreOffice Calc and Microsoft Excel so that the expected data format and encoding are retained. It also gives instructions for opening .csv files in Google Sheets and explains why this is not recommended. Finally, it gives some examples of why it is important to follow the instructions outlined here, or similar procedures in your spreadsheet application of choice.

If you do not follow these steps, you may run into things like:

  • garbled diacritics or other special characters

  • leading or trailing zeroes from record number fields being dropped (if “010.10” is interpreted as a number by the application, it will be changed to “10.1”)

  • date values being “helpfully” changed to another format

  • non-date values being interpreted as and changed to dates

  • long numeric IDs (such as ISBNs) being converted to scientific notation format

You may not notice these changes, but if you save them and import them via the CSV Importer, you run the risk of adding incorrect data.

We are providing this small sample CSV for you to download and use to experiment with your own CSV workflows: csv_unforced_quotes.csv. This file contains data formats that will clearly illustrate whether the process you use to open it changes the data:

idNumber,note,date,otherId
20220831.009,"Something, something",2022-08,9784326784987
20220831.010,Something something,9/1/2022,9784326784988
20220831.011,"CSV, not really a standard format",01-04,9784326784989

Instructions for recommended applications

LibreOffice Calc (recommended, available here)

  • Open the LibreOffice application

  • From the main screen, choose “Open File” or do CTRL-o

  • Navigate to the .csv file you wish to open

  • The text import dialog will open.

    • Ensure the settings are as shown below.

    • Click anywhere in the data preview at the bottom, do CTRL-a to select all columns, and set the Column Type to “Text”.

       

If LibreOffice opens your .csv with Writer (word processing) instead of Calc:

On LibreOffice home screen, click Create > Calc Spreadsheet. Once in that spreadsheet, open the existing file as described above (File > Open or CTRL-o).

Microsoft Excel

  • Create a new, blank Excel workbook

  • Select/click into A1 of Sheet1

  • Do File > Import (in the application menu bar, not the ribbon)

  • Make sure “CSV file” is selected and click “Import”

  • Navigate to your .csv file and click “Get Data”

  • In the first import screen, ensure “Delimited” is selected and File origin = “Unicode (UTF-8)”, then click “Next”

     

  • In the next import screen, deselect Tab and select Comma. Make sure the Text qualifier is a double quote. Then, click “Next”.

     

  • In the next import screen:

    • Click in the first column of “Preview of selected data”. It will turn black to show that column is selected.

    • Then, pull the little gray slider at the bottom of the “Preview of selected data” all the way to the right. (If there is no slider, all your columns fit in the preview display)

    • While holding down the Shift key, click in the right-most column. All columns should now be selected/black. If not, you may have clicked off the first column while moving the slider.

    • Once all columns are selected, select “Column data format” = Text

    • Click “Finish”

       

  • If you started off in cell A1, just click “OK” in the next box. Otherwise, make sure it looks like the image below and click OK.

  • Now you may save the Excel workbook as an .xlsx file. You can open this file normally in Excel.

Instructions and warning for opening .csv files with Google Sheets

Instructions:

  • Create a new blank spreadsheet

  • From inside that spreadsheet, do File > Import

  • Navigate to/select the .csv you want to open. The details depend on whether this .csv is saved in your Google Drive, or whether you need to upload it from your computer/device.

  • In the Import file dialog, choose “Separator type” = Comma and uncheck “Convert text to numbers, dates, and formulas”

Cautionary examples

These examples all use as their source data the downloadable csv_unforced_quotes.csv. This file uses the common CSV dialect where only values themselves containing commas are wrapped in quotes. Here is the raw data in this file:

idNumber,note,date,otherId
20220831.009,"Something, something",2022-08,9784326784987
20220831.010,Something something,9/1/2022,9784326784988
20220831.011,"CSV, not really a standard format",01-04,9784326784989

LibreOffice Calc

Opened without setting column type to Text:

Trailing zero removed from second idNumber.

Microsoft Excel

Opened normally (File > Open or CTRL-o)

The date strings have been changed. Our otherId values have been switched to scientific notation.

idNumber has been interpreted as a decimal numeric value. If I increase the number of decimal points to three, it looks correct and the idNumber will be saved correctly. If I increase the number of decimal points to four:

And saving the file writes:

idNumber,note,date,otherId 20220831.0090,"Something, something",Aug-22,9.78433E+12 20220831.0100,Something something,9/1/2022,9.78433E+12 20220831.0110,"CSV, not really a standard format",04-Jan,9.78433E+12

Opened with File > Import, but not setting all column data formats to Text

You get the exact same result as opening normally.