This page lists known issues with and limitations of the CSV Importer. There is no formal roadmap or timeline for addressing these limitations.

It also names some key things to understand about the design and function of the CSV Importer, which are by design, but important to understand.

General

Processed records are only stored for one day

The processing step creates CollectionSpace XML records. The transfer step ingests these XML records into the CollectionSpace instance.

The processed XML records expire after one day. If you wait too long between the processing step and transferring, you will transfer errors with the message: “No processed data found. Processed record may have expired from cache, in which case you need to start over with the batch.”

There are two reasons for this:

We cannot support storing all untransferred records forever for everyone who uses the the CSV Importer
The status or underlying data values of the target records can change after the processing step. More time between processing and transfer means greater likelihood of transferring outdated records, which is dangerous to data integrity.

The CSV Importer works only with the underlying data layer

The CSV Importer converts each row of your CSV to a CollectionSpace XML record, then performs the API calls to ingest those records into the application.

The CSV Importer has no access to settings or functionality that exist in the user interface or application layers of the web application. This includes things like:

Record ID generators
Field display label customizations
Derived/computed fields such as Object > Computed current location

This means:

You have to provide a record id in your CSV for every record you create. The CSV Importer cannot auto-generate record ids for you.
If you have customized your display change a field label in the Object record from Number of objects to Object count, you still have to use the numberOfObjects column in your CSV to import data into that field.
You cannot import Computed current location values into your Object records, since this field is automatically generated based on the latest Location/Movement/Inventory record associated with each Object.

Outlook: This is the intended functionality and it is not expected to change

The field names in CSV Importer templates do not match those you see displayed in the application

This is related to the above point: the CSV Importer only knows about the underlying data layer, and not any display configuration.

If you are using one of the community supported domain profiles with no field label customizations, you can look up the displayed field label-to-underlying XML field name (which the template column names are based on) in the “all_fields_{version}_dates_collapsed.csv” files available here.

Outlook: This will not change

Unique record identifier values are required to batch update or delete records

See our Data Documentation on Record IDs for more detail on what is meant by “record identifier”. Note that for authority term records, the initial Display name value is considered to be the record identifier.

CollectionSpace itself does not require you to enter a record ID in some record types (Location/Movement/Inventory (LMI) is an example). It will warn about non-unique record ID values in record types that require an ID or reference number value, but it does not require unique record ID values if you ignore the warnings.

The CSV Importer uses the Record ID values to determine what records should be updated or deleted. If it does not find a given Record ID value in your CollectionSpace instance, it treats that CSV row as a new record.

Thus, if you have two Objects with Identification number: “2022.001.1”, the CSV Importer will refuse to process any row with that ID, as it has no way of knowing which of those two records it should update or delete. Likewise, if you have not entered Movement reference numbers in your LMIs, you cannot use CSV Importer to update or delete those LMIs, or to import relationships between them and Objects.

CSV Importer does not let you import records with duplicate IDs. If two records in the same CSV have the same ID value, one will get an error and not be processed.

If you attempt to ingest as new a record with an ID value that already exists, the CSV Importer will flag the record’s status as Existing. If you proceed and opt to transfer existing records, you will overwrite the existing record. This is why it is crucial to check your processing step reports!

CollectionSpace itself considers record IDs case-sensitively. That is, you will not get a duplicate warning for “2022.001.A” if you have an existing record with “2002.001.a”. CSV Importer also treats record IDs case-sensitively, so if you have the aforementioned IDs in two Object records, you can batch update or delete those Objects.

Relationship records (objecthierarchy, authorityhierarchy, and nonhierarchicalrelationship) are the exception to the “every row in your CSV must contain a unique identifier value” rule. For these, the record identifer for the relationship record is derived from the combination of broader/narrower or subject/object record IDs in the CSV.

Outlook: This is by design, and intended to make it easy to work with you data via the values you see and are familiar with. However, we are exploring the feasibility of eventually adding an expert option to include each record’s underlying system URI in your CSV as a record match point. There is no timeline for when this might be implemented, and no guarantee that it will.

Duplicate authority terms

Creating/updating/deleting authority term records

The initial term Display name value is used as the authority record ID, so all of the points from the above section apply.

Populating fields in other records with authority terms

CollectionSpace does not prevent or even warn you from creating two terms in the same authority vocabulary (e.g. Person/ULAN, or Organization/Local) with the exact same Display name values. For example, if there are multiple persons whose identities are unknown, but you know they cannot all be the same person, you may create multiple Person/Local records with the Display name “Anonymous.” These records may or may not be distinguished from each other by other data in the record, such as birth dates, but when preparing data for CSV Import, we only use the Display name value.

In this example, if you enter “Anonymous” as a field value in the CSV, the Importer will produce a warning that more than one record for that value was found and that it used one of them (whichever was first when it did its search, which may not be the correct one).

Make sure to check your processing step report for warnings, and manually ensure the correct “Anonymous” has been added to the relevant fields.

The processing step must convert each Display name value like “Anonymous” into a unique “refname URN”. You will see these refname URNs in XML records retrieved via the API. They are also what is stored in the database. This is also partly why there are separate columns in the CSV templates for ownerPersonLocal and ownerOrganizationLocal to populate the owner field: which column the displayName value is in tells the system how to look up the correct refname URN.

An “Easter egg” of the CSV Importer is that it supports importing CSV files prepared with refname URNs as field values instead of Display name values. Since preparing such CSVs is only feasible with some scripting, we don’t keep the special “populate with refnames” CSV templates up publicly, but we can easily provide them upon request.

If you have lots of duplicate term Display names and the ability to generate CSVs with the desired refname URN values, this may be a workaround for this limitation.

Batch updating record ID values is not possible

Because the CSV Importer uses the record ID to determine what record to update, you cannot use the CSV Importer to change the record ID. If you try to change an object’s Identification Number from “2022.01” to “2022.001” via the CSV Importer, here is what will happen:

An object with “2022.001” does not yet exist in your instance, so this will be flagged as a new record when the processing step checks record status.
If you proceed with transfer of new records, you will now have records with “2022.01” and “2022.001” in your instance.

This is a use case that has come up repeatedly, so we are thinking about possible ways we might support something like this, though we have no timeline for the work.

You cannot ingest record rows having authority or vocabulary terms that do not exist

If the processing step cannot find a refname URN for a term in a row, an error is recorded for that row. It will not be further processed or transferred.

The CSV Importer produces a report of missing terms. This report can be split up into CSVs to ingest the missing terms into the appropriate authority vocabularies if desired.

Outcome: This is by design and will not change as the default behavior. Technically it would be possible to add a setting to switch on auto-creation of any authority/vocabulary terms that do not already exist. However, (a) this would significantly slow down the Processing step; and (b) would empower folks to create a lot of difficult-to-clean-up mess in their data. No decision has been made on this, and no timeline exists for making a decision

History: Early versions of the CSV Importer did allow you to ingest record rows with missing authority/vocabulary terms. The thought was that you could subsequently ingest data from your missing terms report to provide the missing terms, but a) some users were not paying attention to their processing reports and doing the subsequent data loads, meaning they were creating bad data in their systems; and b) we began to suspect we could not 100% guarantee that the authority terms ingested after the fact would have the exact same underlying system identifier values that had been put in the previously ingested records, which would pollute the data.

Repeatable fields populated by multiple authorities - order of terms from different authorities

A full list of fields affected by this issue is available here. Look for the “multi_auth_repeatable_fields_#_#.csv” file for the current version in the file listing and click on it.

You can view a rendering of the CSV in your browser, but if you want to be able to filter the list by profile and record type (recommended), click on the “Raw” link and do “File > Save” from your browser. You should be able to open the resulting file in your spreadsheet application of choice.

When entering data manually in the application, you can create the following in an Acquisition procedure:

CollectionSpace > User Manual: CSV Importer: Known issues/limitations > image-20220817-011132.png

It is currently not possible to achieve this via a batch import because of limitations introduced by the fact that Source can be populated from Organization/Local or Person/Local authorities. Because the CSV Importer needs to know which authority a term belongs to in order to look up the proper refname URN (see Info box under Populating fields in other records with authority terms section above for more details), there are two columns in the CSV template for Source values.

CollectionSpace > User Manual: CSV Importer: Known issues/limitations > acq_funding_csv_data.png

When there are multiple columns that can be sources for the same field, the data processor/transformer used by the CSV Importer always processes the data columns in left-to-right order. Thus, for the Acquisition Funding > Source field, Person values are handled before Organization values. The row for TEST1 results in the following, which is very wrong if Carmen only contributed 50 Euros:

CollectionSpace > User Manual: CSV Importer: Known issues/limitations > acq_funding_test1.png

The person column is processed first, so it gets lined up with “US Dollar” and “5000”.

CollectionSpace itself will export CSV data as seen in the row for TEST2. This does not have the desired effect:

CollectionSpace > User Manual: CSV Importer: Known issues/limitations > acq_funding_test2.png

If you are round-tripping data in the affected fields, the effects could be quite bad.

The CSV Importer’s data processor/transformer sees this as:

the first Source field value is blank
the second Source field value is Carmen SanDiego
the third Source field value is University of Place
the fourth Source field value is also blank, and there are no fourth values for any other fields in this group, so I’m not adding an empty group/row

The order of columns in the CSV templates indicates the order of processing; it does not control the order of processing

Both the CSV templates and the data processing/transformation code used by the CSV Importer are based on how fields and values are defined for a record in the CollectionSpace UI code. So, moving the acquisitionFundingSourceOrganizationLocal column so that it comes before acquisitionFundingSourcePersonLocal will not solve this issue. Only changing the application’s UI field definitions would change the order of processing.

The known solutions to this issue require changing how you enter data in the CSV for import. The new way of entering data would be more complex and easy for humans to mess up. Since this issue does not affect most uses of the CSV Importer, we have opted to keep the simpler data entry conventions.

The alternate data preparation/import described in the Info box under the Populating fields in other records with authority terms section above was actually developed and used successfully to address this issue in one institution's complex use case. If this is a blocker for you, please refer to that Info box and get in touch.

Outcome: Given that the CSV Importer cannot correctly import data that is exported from CollectionSpace (round-tripping), this is considered a BUG and needs to be fixed. However, it is a complex thing to solve and currently have no timeline for addressing it.

Structured date fields

A number of these are not limitations/issues as much as they are things to be aware of about how dates are handled.

The structured date fields in the CSV templates are typically named ending with “Group”: acquisitionDateGroup, objectProductionDateGroup, etc.
In this column, you enter the date string that you would type into the date field in the user interface.
The processing step attempts to parse the date string.
- If date parsing is successful, the date is transformed into the underlying detailed structured date fields you see in the drop-down interface in the user interface.
- If date parsing is unsuccessful the string is retained in the dateDisplayDate field and will be visible in the record, much as happens if you enter an unparseable date manually.

You can find unparsed dates in your CollectionSpace instance using an advanced search where All of the following conditions must be satisfied:

{Date field} - Represented dates computed is (unchecked/blank box)
{Date field} - Display date is not blank

Currently, you cannot import the individual fields within a structured date field group directly. This means you cannot set Period, Association, Note, Certainty, or Qualifier values via the CSV Importer.
We understand these fields to be fairly rarely used, but if you have needs/use cases for this, we’d be interested in hearing them.

The CSV Importer’s date parsing algorithm does not behave exactly the same as the one applied when you type a date manually in the application.
For many common date formats, the CSV Importer runs a fast date parser of its own that does not require making a call to /wiki/spaces/DOC/pages/2930128484 (which is quite slow). If the Importer’s own date parsing can’t figure out the date, it makes the API call, which returns exactly what you would get by typing the date into the user interface.
CollectionSpace’s internal date parser’s behavior is specified here, and it makes some odd/unexpected decisions (at least for someone who is more used to dealing with standard date formats). One example: two digit years are assumed to literally be two digit years. So an acquisition date of “1/7/19” gets parsed as “0019-01-07” (yyyy-mm-dd). Another: “2001-03”, which I commonly see used to record “March 2001” is parsed by CollectionSpace as the date range: “0003-01-01 - 2001-12-31”.

The “Config” box on the New Batch creation page allows you to change some aspects of how the CSV Importer behaves and interprets your data. This includes some date parsing settings such as date format (whether 3/4/2020 is March 4 (the default) or April 3), and two-digit year handling (whether 1/21/19 is in the year 19 or the year 2019 (the default)).

We are working on some more powerful date parsing tools on the backend/migration side (available, mostly-working options here), and would eventually like to make this parser available as a configurable expert user option in the CSV Importer.

Osteology inventory details cannot be batch imported

This only affects users of the Anthropology domain profile (and any custom profiles based on it).

The form interface for this section of the Osteology procedure is graphically and technically structured very differently than other CollectionSpace record data. It contains hundreds of cryptically named fields that would be extremely unwieldy to manage in a tabular format.

Outcome: Not feasible to support; will not be added

CollectionSpace > User Manual: CSV Importer: Known issues/limitations > Screenshot_2020-10-22 Osteology CollectionSpace.png