User Manual: CSV Importer: Data Prep Special Topics and Tips

Importing media files with Media Handling procedure records

The CSV templates for media records include a “mediaFileURI” column on the far right.

Enter the URI of the media file you wish to ingest on the given media record in that column.

The media files will be ingested in the following cases:

  • the record is being newly created in CollectionSpace

  • the record exists in CollectionSpace but does not already have a media file associated with it

If a record exists in CollectionSpace and already has a media file associated with it, any data from other columns will be updated in that record, but the Importer will ignore the “mediaFileURI” in that row.

Due to limitations of the Services API, you cannot currently batch-add multiple or additional media files to a Media Handling procedure. Ingesting a second media file on an existing record via the API will cause the link to the first media file to be removed.

If you are using the CSV Importer at importer.collectionspace.org, in order for the Services API to ingest the file, the URI for each media file must be publicly accessible on the Web and cannot require any authentication credentials.

You can achieve this by moving your files to a Dropbox folder, AWS S3 Bucket, etc., and making the permissions of that folder/bucket (and all files in it) public (or “available if you have the URL”, if that is an option) for the duration of the ingest.
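Before starting a large ingest, it can help to confirm that each URI in your CSV really is reachable without credentials. The following is a minimal Python sketch of such a pre-flight check, not part of the CSV Importer. The function names are hypothetical, and the "identificationNumber" column in the usage example is only an assumed second column. It sends an anonymous HEAD request for each "mediaFileURI" value; some hosts reject HEAD requests or answer with a login page instead of an error, so treat the result as a hint rather than a guarantee.

```python
import csv
import io
import urllib.error
import urllib.request


def is_publicly_reachable(uri, opener=urllib.request.urlopen, timeout=10):
    """Return True if an anonymous HEAD request for `uri` gets a 2xx response."""
    # This sketch only checks web URIs; file:// and other schemes are skipped.
    if not uri.lower().startswith(("http://", "https://")):
        return False
    try:
        request = urllib.request.Request(uri, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # URLError, HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers malformed URIs.
        return False


def unreachable_media_uris(csv_text, opener=urllib.request.urlopen):
    """List (CSV line number, URI) pairs whose mediaFileURI could not be fetched.

    Line numbers assume no cell contains an embedded line break.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        uri = (row.get("mediaFileURI") or "").strip()
        if uri and not is_publicly_reachable(uri, opener=opener):
            problems.append((line_number, uri))
    return problems
```

To use it, read your CSV into a string (for example with open("media.csv", encoding="utf-8").read(), where the file name is a placeholder) and pass it to unreachable_media_uris. An empty list means every non-blank URI answered the anonymous request.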

We do not recommend putting files into Google Drive for ingest via the CSV Importer. The Google Drive documentation is not transparent about the quotas and limits it places on file downloads, and users have reported problems ingesting files from Google Drive. The pattern we see is that the first few files ingest fine, but all the rest fail because Google Drive rejects the CSpace API’s request for the file. The number of successful files varies, but is relatively small (fewer than 25).

If you are using a locally-provided instance of the CSV Importer, you may be able to provide URIs using the file:// protocol to ingest locally-available files. Consult with the person who administers your CollectionSpace and CSV Importer for details.

Preparing a batch of records to delete

For the most efficient processing, include only the field containing the unique identifier of the records to be deleted. For example: “objectNumber” for collectionobjects, or “termDisplayName” for authority records.

Deleting will work with a CSV that includes more data, but the Processing step will take longer.
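If your starting point is a full export, you can cut it down to the identifier column before uploading. This is a minimal Python sketch under that assumption; the function name is hypothetical, and the "title" and "briefDescription" columns in the usage example are just stand-ins for whatever extra columns your CSV contains.

```python
import csv
import io


def keep_only_identifier(csv_text, id_column):
    """Return CSV text containing only the `id_column` column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if id_column not in (reader.fieldnames or []):
        raise KeyError(f"column {id_column!r} not found in CSV header")
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    writer.writerow([id_column])
    for row in reader:
        writer.writerow([row[id_column]])
    return output.getvalue()


full_csv = (
    "objectNumber,title,briefDescription\n"
    "2020.1.1,Vase,Blue\n"
    "2020.1.2,Bowl,Red\n"
)
delete_csv = keep_only_identifier(full_csv, "objectNumber")
```

Here delete_csv holds a header line "objectNumber" followed by one identifier per line, which is all the delete batch needs.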

Deleting field values when updating existing records

In order to prevent accidental data destruction, if you leave a cell in your CSV blank, the CSV Importer does nothing to existing data in the corresponding field in the existing CollectionSpace record.

If you actually want to completely delete the value of a field in an existing CollectionSpace record, you must deploy the bomb 💣:

Copy/paste the bomb emoji 💣 into any cell of your CSV to completely clear that field in that record on update.

The bomb must be the only thing in the CSV cell, with no other characters before or after it. This ensures you can round-trip a note such as “The artist known as 🧶💣 wraps urban objects and structures in knitted nets” without accidental data loss. The exception to this is in multivalued fields, where you can do something like: “value|💣|value”. The value of the second instance of the field would be removed, since after splitting on the multivalue delimiter “|”, the bomb is the only thing in the second instance.
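The rule above can be illustrated with a few lines of Python. This is only a sketch of the rule as described here, not the Importer's own code; the function name is hypothetical, and it assumes that no whitespace is trimmed around values before the comparison.

```python
BOMB = "\U0001F4A3"  # the bomb emoji


def bombed_instances(cell, delimiter="|"):
    """Return the positions (0-based) of field instances that consist of the bomb alone."""
    return [
        index
        for index, value in enumerate(cell.split(delimiter))
        if value == BOMB
    ]


whole_cell = bombed_instances(BOMB)
second_only = bombed_instances("value|" + BOMB + "|value")
inside_note = bombed_instances("The artist known as \U0001F9F6" + BOMB + " wraps urban objects")
```

A bomb on its own marks position 0, the multivalued example marks only position 1, and a bomb embedded in a longer note marks nothing, so the note survives the update.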

Bombing boolean fields (like “Preferred for lang” in authority records)

This applies to boolean values in general, most of which are formatted in the application web forms as checkboxes, but may also show up as radio buttons or other widgets.

If you create an authority manually in the application and do not check the “Preferred for lang” box, the application produces the following in the underlying XML record:

<termPrefForLang>false</termPrefForLang>

We’ll call this False Value.

If you batch import new authority terms via the CSV Importer and you do not fill in the termPrefForLang field in the CSV, the resulting XML record has:

<termPrefForLang/>

We’ll call this NULL Value.

Using 💣 in a boolean field results in a False Value, not a NULL Value. If a boolean field has ever had a value in it, the system does not let you change it back to a NULL Value, but you can change it to false.

If you import a CSV with 💣 in a boolean field, you will get a boolean_value_transform warning. This is because what you have in your CSV is not a valid value for a boolean field, and the processor has tried to be helpful and assign the correct valid boolean value. The bomb in a boolean field is always “corrected” to false.
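If you have the XML of a record at hand and want to tell a False Value from a NULL Value, a check along the following lines may help. It is a minimal Python sketch: the function name and the wrapping element in the usage example are hypothetical, and it ignores XML namespaces, so it expects the field element to be un-namespaced as in the snippets above.

```python
import xml.etree.ElementTree as ET


def boolean_field_state(record_xml, tag):
    """Return the field's text ('true' or 'false'), 'null' if empty, or 'absent'."""
    root = ET.fromstring(record_xml)
    element = root if root.tag == tag else root.find(f".//{tag}")
    if element is None:
        return "absent"
    text = (element.text or "").strip()
    return text if text else "null"


false_value = boolean_field_state(
    "<term><termPrefForLang>false</termPrefForLang></term>", "termPrefForLang"
)
null_value = boolean_field_state("<term><termPrefForLang/></term>", "termPrefForLang")
```

Here false_value is "false" and null_value is "null", matching the two cases named above.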

False Value display

 

NULL Value display

Bombing multivalued fields

Given an existing Object record with:

And CSV data like:

| objectNumber | objectName           | objectNameCurrency | objectNameLevel   |
|--------------+----------------------+--------------------+-------------------|
| mvbombtest   | name 1|name 2|name 3 | current|💣|unknown | 💣|group|subgroup |

Results in:

NOTE: You will get an unknown_option_list_value warning because the bomb is not a known term for the objectNameCurrency or objectNameLevel fields, but this is expected.

If I manually reset the record to the original values and this time ingest the following CSV:

| objectNumber | objectName           | objectNameCurrency | objectNameLevel   |
|--------------+----------------------+--------------------+-------------------|
| mvbombtest   | name 1|name 2|name 3 | 💣                 | 💣|group|subgroup |

We get:

objectNameCurrency has been treated as a single-valued field, so only the first value has been removed. To completely clear the Currency column, the objectNameCurrency value in the CSV would need to be: 💣|💣|💣
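When a repeating field has many instances, typing the right number of bombs by hand is error-prone. A small helper can build the cell from a column that already has one value per instance. This is a hedged Python sketch with a hypothetical function name; it assumes the reference cell has the same number of instances as the existing record.

```python
BOMB = "\U0001F4A3"  # the bomb emoji


def bomb_every_instance(reference_cell, delimiter="|"):
    """Build a cell with one bomb per instance found in `reference_cell`."""
    count = len(reference_cell.split(delimiter))
    return delimiter.join([BOMB] * count)


currency_cell = bomb_every_instance("name 1|name 2|name 3")
```

With the three object names from the example, currency_cell is 💣|💣|💣, the value needed to clear all three Currency instances.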

For the sake of example, I went in and manually edited this record to have:

Now, if I ingest the following CSV:

| objectNumber | objectName           | objectNameCurrency    | objectNameLevel   |
|--------------+----------------------+-----------------------+-------------------|
| mvbombtest   | name 1|💣|name 3|💣  | current|💣|unknown|💣 | |💣|subgroup|💣   |

We get:

For a number of reasons, the CSV Importer cannot currently identify fully empty rows in a repeating field group and intelligently remove them. You would need to manually remove those empty Object name rows in the application.