User Manual: CSV Importer: Known issues/limitations

This page lists known issues with and limitations of the CSV Importer. There is no formal roadmap or timeline for addressing these limitations.

It also describes some key aspects of the design and function of the CSV Importer that are intentional, but important to understand.

General

Processed records are only stored for one day

The processing step creates CollectionSpace XML records. The transfer step ingests these XML records into the CollectionSpace instance.

The processed XML records expire after one day. If you wait too long between the processing step and transferring, you will get transfer errors with the message: “No processed data found. Processed record may have expired from cache, in which case you need to start over with the batch.”

There are two reasons for this:

  • We cannot support storing all untransferred records forever for everyone who uses the CSV Importer

  • The status or underlying data values of the target records can change after the processing step. More time between processing and transfer means greater likelihood of transferring outdated records, which is dangerous to data integrity.

The CSV Importer works only with the underlying data layer

The CSV Importer converts each row of your CSV to a CollectionSpace XML record, then performs the API calls to ingest those records into the application.

The CSV Importer has no access to settings or functionality that exist in the user interface or application layers of the web application. This includes things like:

  • Record ID generators

  • Field display label customizations

  • Derived/computed fields such as Object > Computed current location

This means:

  • You have to provide a record ID in your CSV for every record you create. The CSV Importer cannot auto-generate record IDs for you.

  • If you have customized your display configuration to change a field label in the Object record from Number of objects to Object count, you still have to use the numberOfObjects column in your CSV to import data into that field (see the example after this list).

  • You cannot import Computed current location values into your Object records, since this field is automatically generated based on the latest Location/Movement/Inventory record associated with each Object.
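
For example, here is a minimal sketch of an Object CSV that supplies its own record ID and writes into the underlying numberOfObjects field. The objectnumber and title column names are illustrative; check them against your actual template:

  objectnumber,numberOfObjects,title
  2024.001.1,3,Example object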

Outlook: This is the intended functionality and it is not expected to change

The field names in CSV Importer templates do not match those you see displayed in the application

This is related to the above point: the CSV Importer only knows about the underlying data layer, and not any display configuration.

If you are using one of the community supported domain profiles with no field label customizations, you can look up the mapping from displayed field label to underlying XML field name (on which the template column names are based) in the “all_fields_{version}_dates_collapsed.csv” files available here.

Outlook: This will not change

Unique record identifier values are required to batch update or delete records

See our Data Documentation on Record IDs for more detail on what is meant by “record identifier”. Note that for authority term records, the initial Display name value is considered to be the record identifier.

For some record types, CollectionSpace itself does not require you to enter a record ID at all (Location/Movement/Inventory (LMI) is an example). In record types that do require an ID or reference number value, it will warn about non-unique record ID values, but if you ignore the warnings it will not enforce uniqueness.

The CSV Importer uses the Record ID values to determine what records should be updated or deleted. If it does not find a given Record ID value in your CollectionSpace instance, it treats that CSV row as a new record.

Thus, if you have two Objects with Identification number: “2022.001.1”, the CSV Importer will refuse to process any row with that ID, as it has no way of knowing which of those two records it should update or delete. Likewise, if you have not entered Movement reference numbers in your LMIs, you cannot use CSV Importer to update or delete those LMIs, or to import relationships between them and Objects.

The CSV Importer does not let you import records with duplicate IDs. If two rows in the same CSV have the same ID value, one of them will get an error and not be processed.

If you attempt to ingest as new a record with an ID value that already exists, the CSV Importer will flag the record’s status as Existing. If you proceed and opt to transfer existing records, you will overwrite the existing record. This is why it is crucial to check your processing step reports!

CollectionSpace itself treats record IDs case-sensitively. That is, you will not get a duplicate warning for “2022.001.A” if you have an existing record with “2022.001.a”. The CSV Importer also treats record IDs case-sensitively, so if you have the aforementioned IDs in two Object records, you can still batch update or delete those Objects.

Relationship records (objecthierarchy, authorityhierarchy, and nonhierarchicalrelationship) are the exception to the “every row in your CSV must contain a unique identifier value” rule. For these, the record identifier for the relationship record is derived from the combination of broader/narrower or subject/object record IDs in the CSV.
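
For instance, in a hedged sketch of an objecthierarchy CSV (the column names are illustrative; check your template), each row’s relationship identifier is effectively the pair of Object IDs it contains:

  broader_object_number,narrower_object_number
  2022.001,2022.001.1
  2022.001,2022.001.2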

Outlook: This is by design, and intended to make it easy to work with your data via the values you see and are familiar with. However, we are exploring the feasibility of eventually adding an expert option to include each record’s underlying system URI in your CSV as a record match point. There is no timeline for when this might be implemented, and no guarantee that it will be.

Duplicate authority terms

Creating/updating/deleting authority term records

The initial term Display name value is used as the authority record ID, so all of the points from the above section apply.

Populating fields in other records with authority terms

CollectionSpace does not prevent you from creating, or even warn you about, two terms in the same authority vocabulary (e.g. Person/ULAN, or Organization/Local) with exactly the same Display name value. For example, if there are multiple persons whose identities are unknown, but you know they cannot all be the same person, you may create multiple Person/Local records with the Display name “Anonymous.” These records may or may not be distinguished from each other by other data in the record, such as birth dates, but when preparing data for CSV import, only the Display name value is used.

In this example, if you enter “Anonymous” as a field value in the CSV, the Importer will produce a warning that more than one record for that value was found and that it used one of them (whichever was first when it did its search, which may not be the correct one).

Make sure to check your processing step report for warnings, and manually ensure the correct “Anonymous” has been added to the relevant fields.

Batch updating record ID values is not possible

Because the CSV Importer uses the record ID to determine what record to update, you cannot use the CSV Importer to change the record ID. If you try to change an object’s Identification Number from “2022.01” to “2022.001” via the CSV Importer, here is what will happen:

  • An object with “2022.001” does not yet exist in your instance, so this will be flagged as a new record when the processing step checks record status.

  • If you proceed with transfer of new records, you will now have records with “2022.01” and “2022.001” in your instance.

You cannot ingest record rows having authority or vocabulary terms that do not exist

If the processing step cannot find a refname URN for a term in a row, an error is recorded for that row, and the row will not be processed further or transferred.

The CSV Importer produces a report of missing terms. This report can be split up into CSVs to ingest the missing terms into the appropriate authority vocabularies if desired.
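
If you want to script that split yourself, a minimal Python sketch might look like the following. It assumes the report has vocabulary and term columns, which is an illustrative guess rather than the report’s documented layout:

  import csv
  from collections import defaultdict

  # Group the missing terms by the vocabulary they belong to, so each
  # group can be turned into its own authority-term ingest CSV.
  # The report file name and its "vocabulary"/"term" column names are
  # assumptions for illustration; match them to your actual report.
  groups = defaultdict(list)
  with open("missing_terms_report.csv", newline="", encoding="utf-8") as f:
      for row in csv.DictReader(f):
          groups[row["vocabulary"]].append(row["term"])

  # Write one single-column CSV per vocabulary; "termdisplayname" is
  # typically the Display name column in the authority templates, but
  # verify against your own template.
  for vocab, terms in groups.items():
      out_name = f"missing_terms_{vocab.replace('/', '_')}.csv"
      with open(out_name, "w", newline="", encoding="utf-8") as out:
          writer = csv.writer(out)
          writer.writerow(["termdisplayname"])
          for term in sorted(set(terms)):
              writer.writerow([term])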

Outlook: This is by design and will not change as the default behavior. Technically it would be possible to add a setting to switch on auto-creation of any authority/vocabulary terms that do not already exist. However, (a) this would significantly slow down the processing step; and (b) it would empower folks to create a lot of difficult-to-clean-up mess in their data. No decision has been made on this, and no timeline exists for making a decision.

History: Early versions of the CSV Importer did allow you to ingest record rows with missing authority/vocabulary terms. The thought was that you could subsequently ingest data from your missing terms report to provide the missing terms. However, (a) some users were not paying attention to their processing reports or doing the subsequent data loads, meaning they were creating bad data in their systems; and (b) we began to suspect we could not 100% guarantee that the authority terms ingested after the fact would have the exact same underlying system identifier values as those recorded in the previously ingested records, which would pollute the data.

Repeatable fields populated by multiple authorities - order of terms from different authorities

When entering data manually in the application, you can create the following in an Acquisition procedure: a Funding field group whose first row has Source: University of Place with a value of 5000 US Dollars, and whose second row has Source: Carmen SanDiego with a value of 50 Euros. University of Place is an Organization; Carmen is a Person.

It is currently not possible to achieve this via a batch import, because of limitations introduced by the fact that Source can be populated from the Organization/Local or Person/Local authorities. Because the CSV Importer needs to know which authority a term belongs to in order to look up the proper refname URN (see the Populating fields in other records with authority terms section above for more details), there are two columns in the CSV template for Source values.

Sample CSV data for Acquisition funding field group
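
A hedged sketch of that data, assuming “|” as the repeating-value delimiter and using illustrative stand-ins for the actual template column names:

  acquisitionreferencenumber,fundingsourceperson,fundingsourceorganization,fundingcurrency,fundingvalue
  TEST1,Carmen SanDiego,University of Place,US Dollar|Euro,5000|50
  TEST2,|Carmen SanDiego,University of Place|,US Dollar|Euro,5000|50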

When there are multiple columns that can be sources for the same field, the data processor/transformer used by the CSV Importer always processes the data columns in left-to-right order. Thus, for the Acquisition Funding > Source field, Person values are handled before Organization values. In the row for TEST1, the person column is processed first, so Carmen SanDiego gets lined up with “US Dollar” and “5000”, and University of Place with “Euro” and “50”. This result is very wrong if Carmen only contributed 50 Euros.

CollectionSpace itself will export CSV data as seen in the row for TEST2. This does not have the desired effect:

The CSV Importer’s data processor/transformer sees this as:

  • the first Source field value is blank

  • the second Source field value is Carmen SanDiego

  • the third Source field value is University of Place

  • the fourth Source field value is also blank, and there are no fourth values for any other fields in this group, so no empty fourth group/row is added
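
A minimal Python sketch of this positional grouping, assuming the Importer conceptually splits each multi-valued cell on “|” and concatenates the Source columns left to right (the helper and variable names are hypothetical, not the Importer’s actual internals):

  from itertools import zip_longest

  # A repeating-value cell is split on "|" (assumed delimiter here);
  # an empty segment stands for a blank position.
  def split_multi(value):
      return value.split("|")

  # The TEST2 row as exported by CollectionSpace (see the sample data above).
  person_sources = split_multi("|Carmen SanDiego")     # ["", "Carmen SanDiego"]
  org_sources = split_multi("University of Place|")    # ["University of Place", ""]
  currencies = split_multi("US Dollar|Euro")
  values = split_multi("5000|50")

  # Columns are handled left to right, so all person values come
  # before all organization values in the combined Source list.
  sources = person_sources + org_sources

  # Field values are then grouped by position; a group in which every
  # field is blank (the fourth one here) is dropped.
  for source, currency, value in zip_longest(sources, currencies, values, fillvalue=""):
      if source or currency or value:
          print(f"Source: {source or '(blank)'}; Currency: {currency}; Value: {value}")

Running this prints a first funding row with a blank Source paired with “US Dollar” and “5000”, Carmen SanDiego paired with “Euro” and “50”, and University of Place with no currency or value, matching the list above.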

Outlook: Given that the CSV Importer cannot correctly import data that is exported from CollectionSpace (round-tripping), this is considered a BUG and needs to be fixed. However, it is a complex thing to solve, and we currently have no timeline for addressing it.

Structured date fields

A number of these are not limitations/issues so much as things to be aware of about how dates are handled.

  • The structured date fields in the CSV templates are typically named ending with “Group”: acquisitionDateGroup, objectProductionDateGroup, etc.

  • In this column, you enter the date string that you would type into the date field in the user interface.

  • The processing step attempts to parse the date string.

    • If date parsing is successful, the date is transformed into the underlying detailed structured date fields you see when you expand the date field’s detail drop-down in the user interface.

    • If date parsing is unsuccessful, the string is retained in the dateDisplayDate field and will be visible in the record, much as happens if you enter an unparseable date manually.
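
For example, here is a hedged illustration of what successful parsing might produce for a single date string; the exact set of structured date fields populated depends on the parser and the input:

  CSV value in acquisitionDateGroup:  1998-07-04
  dateDisplayDate:                    1998-07-04
  dateEarliestSingleYear:             1998
  dateEarliestSingleMonth:            7
  dateEarliestSingleDay:              4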

Osteology inventory details cannot be batch imported

This only affects users of the Anthropology domain profile (and any custom profiles based on it).

The form interface for this section of the Osteology procedure is graphically and technically structured very differently than other CollectionSpace record data. It contains hundreds of cryptically named fields that would be extremely unwieldy to manage in a tabular format.

Outlook: Not feasible to support; will not be added