...

History: Early versions of the CSV Importer allowed you to ingest record rows with missing authority/vocabulary terms. The idea was that you could then ingest data from your missing terms report to supply those terms. However, a) some users were not paying attention to their processing reports or doing the subsequent data loads, so they were creating bad data in their systems; and b) we began to suspect we could not guarantee that authority terms ingested after the fact would have exactly the same underlying system identifier values as those already written into the previously ingested records, which would pollute the data.

Structured date fields

A number of these are not limitations/issues as much as they are things to be aware of about how dates are handled.

  • The structured date fields in the CSV templates are typically named ending with “Group”: acquisitionDateGroup, objectProductionDateGroup, etc.

  • In this column, you enter the date string that you would type into the date field in the user interface.

  • The processing step attempts to parse the date string.

    • If date parsing is successful, the date is transformed into the underlying detailed structured date fields you see in the drop-down interface in the user interface.

    • If date parsing is unsuccessful, the string is retained in the dateDisplayDate field and will be visible in the record, much as happens if you enter an unparseable date manually.

Tip

You can find unparsed dates in your CollectionSpace instance using an advanced search where All of the following conditions must be satisfied:

  • {Date field} - Represented dates computed is (unchecked/blank box)

  • {Date field} - Display date is not blank

Warning

Currently, you cannot import the individual fields within a structured date field group directly. This means you cannot set Period, Association, Note, Certainty, or Qualifier values via the CSV Importer.
We understand these fields to be fairly rarely used, but if you have needs/use cases for this, we’d be interested in hearing them.

Note

The CSV Importer’s date parsing algorithm does not behave exactly the same as the one applied when you type a date manually in the application.
For many common date formats, the CSV Importer runs a fast date parser of its own that does not require making a call to /wiki/spaces/DOC/pages/2930128484 (which is quite slow). If the Importer’s own date parsing can’t figure out the date, it makes the API call, which returns exactly what you would get by typing the date into the user interface.
CollectionSpace’s internal date parser’s behavior is specified here, and it makes some odd/unexpected decisions (at least for someone more used to dealing with standard date formats). One example: two-digit years are taken literally, so an acquisition date of “1/7/19” gets parsed as “0019-01-07” (yyyy-mm-dd). Another: “2001-03”, which I commonly see used to record “March 2001”, is parsed by CollectionSpace as the date range “0003-01-01 - 2001-12-31”.
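The fast-path-then-fallback behavior described in this note can be sketched roughly as follows. This is a minimal illustration, not the Importer's actual code: the list of "common" formats, the function names, and the `slow_parse` callable (standing in for the slow API call) are all assumptions made for the example.

```python
from datetime import datetime
from typing import Callable, Optional

# Hypothetical list of "common" formats; the Importer's real list is not given here.
FAST_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y")


def fast_parse(value: str) -> Optional[str]:
    """Try each known format locally; return an ISO date string or None."""
    for fmt in FAST_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_date(value: str, slow_parse: Callable[[str], Optional[str]]) -> dict:
    """Use the fast parser first; only call the slow parser if it fails.

    The original string is always kept as the display date, so an
    unparseable value is still visible in the record.
    """
    parsed = fast_parse(value)
    via = "fast"
    if parsed is None:
        parsed = slow_parse(value)  # stand-in for the slow API call
        via = "slow"
    return {"dateDisplayDate": value, "parsed": parsed, "via": via}
```

In this sketch, a value such as "3/4/2020" never reaches `slow_parse`, while something like "circa 1950" does; if the slow parser also returns nothing, only the display date is retained.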

Info

The “Config” box on the New Batch creation page allows you to change some aspects of how the CSV Importer behaves and interprets your data. This includes some date parsing settings, such as date format (whether 3/4/2020 is read as March 4, the default, or as April 3) and two-digit year handling (whether 1/21/19 falls in the year 2019, the default, or in the year 19).
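To make the effect of these two settings concrete, here is a small sketch of how the same string can be read differently. The parameter names and values (`date_format`, `two_digit_year`, "coerce", "literal") are hypothetical and are not taken from the Importer's real config syntax. The rule of simply adding 2000 to a two-digit year is also a simplifying assumption, since the actual rule is not described here.

```python
from datetime import date


def interpret(value: str,
              date_format: str = "month day year",
              two_digit_year: str = "coerce") -> str:
    """Interpret a slash-separated date string under the given settings."""
    first, second, year = (int(part) for part in value.split("/"))

    if date_format == "month day year":
        month, day = first, second
    elif date_format == "day month year":
        day, month = first, second
    else:
        raise ValueError(f"unknown date_format: {date_format}")

    # Assumption: "coerce" just adds 2000; "literal" keeps the year as typed.
    if year < 100 and two_digit_year == "coerce":
        year += 2000

    date(year, month, day)  # raises ValueError if the date is not valid
    return f"{year:04d}-{month:02d}-{day:02d}"


print(interpret("3/4/2020"))                                # 2020-03-04
print(interpret("3/4/2020", date_format="day month year"))  # 2020-04-03
print(interpret("1/21/19"))                                 # 2019-01-21
print(interpret("1/21/19", two_digit_year="literal"))       # 0019-01-21
```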

We are working on some more powerful date parsing tools on the backend/migration side (available, mostly-working options here), and would eventually like to make this parser available as a configurable expert user option in the CSV Importer.

Osteology inventory details cannot be batch imported

This only affects users of the Anthropology domain profile (and any custom profiles based on it).

The form interface for this section of the Osteology procedure is graphically and technically structured very differently than other CollectionSpace record data. It contains hundreds of cryptically named fields that would be extremely unwieldy to manage in a tabular format.

Outcome: Not feasible to support; will not be added

...

Repeatable fields populated by multiple authorities - order of terms from different authorities

Info

A full list of fields affected by this issue is available here. Look for the “multi_auth_repeatable_fields_#_#.csv” file for the current version in the file listing and click on it.

You can view a rendering of the CSV in your browser, but if you want to be able to filter the list by profile and record type (recommended), click on the “Raw” link and do “File > Save” from your browser. You should be able to open the resulting file in your spreadsheet application of choice.

When entering data manually in the application, you can create the following in an Acquisition procedure:

...

It is currently not possible to achieve this via a batch import because of limitations introduced by the fact that Source can be populated from Organization/Local or Person/Local authorities. Because the CSV Importer needs to know which authority a term belongs to in order to look up the proper refname URN (see Info box under Populating fields in other records with authority terms section above for more details), there are two columns in the CSV template for Source values.

...

When there are multiple columns that can be sources for the same field, the data processor/transformer used by the CSV Importer always processes the data columns in left-to-right order. Thus, for the Acquisition Funding > Source field, Person values are handled before Organization values. The row for TEST1 results in the following, which is very wrong if Carmen only contributed 50 Euros:

...

The person column is processed first, so it gets lined up with “US Dollar” and “5000”.
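The misalignment can be reproduced with a short sketch. The two Source column names come from the CSV template; the `currency` and `value` keys, the "|" delimiter, and the row contents are assumptions made for illustration, since the actual TEST1 row is not reproduced here.

```python
DELIMITER = "|"  # assumed delimiter for repeating values

# Source columns in processing order (left to right): Person before Organization.
SOURCE_COLUMNS = [
    "acquisitionFundingSourcePersonLocal",
    "acquisitionFundingSourceOrganizationLocal",
]


def funding_rows(row: dict) -> list:
    """Collect Source values column by column, then pair them by position."""
    sources = []
    for column in SOURCE_COLUMNS:
        cell = row.get(column, "")
        if cell:
            sources.extend(cell.split(DELIMITER))
    currencies = row["currency"].split(DELIMITER)
    values = row["value"].split(DELIMITER)
    return list(zip(sources, currencies, values))


# Hypothetical row: the intent is University of Place = 5000 US Dollar,
# Carmen SanDiego = 50 Euro.
test1 = {
    "acquisitionFundingSourcePersonLocal": "Carmen SanDiego",
    "acquisitionFundingSourceOrganizationLocal": "University of Place",
    "currency": "US Dollar|Euro",
    "value": "5000|50",
}

result = funding_rows(test1)
for source, currency, value in result:
    print(source, currency, value)
# Carmen SanDiego US Dollar 5000
# University of Place Euro 50
```

Because all Person values are collected before any Organization values, there is no way in this layout to say that the organization should come first.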

CollectionSpace itself will export CSV data as seen in the row for TEST2. This does not have the desired effect:

...

Note

If you are round-tripping data in the affected fields, the effects could be quite bad.

The CSV Importer’s data processor/transformer sees this as:

  • the first Source field value is blank

  • the second Source field value is Carmen SanDiego

  • the third Source field value is University of Place

  • the fourth Source field value is also blank, and there are no fourth values for any other fields in this group, so no empty group/row is added
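The list above can be traced with a sketch like the following. The "|" delimiter, the `currency` and `value` keys, and the cell contents are assumptions chosen to match the description (a blank placeholder in each Source column); they are not copied from the actual TEST2 row.

```python
from itertools import zip_longest

DELIMITER = "|"  # assumed delimiter for repeating values

# Source columns in processing order: Person before Organization.
SOURCE_COLUMNS = [
    "acquisitionFundingSourcePersonLocal",
    "acquisitionFundingSourceOrganizationLocal",
]


def funding_rows(row: dict) -> list:
    """Concatenate Source values (blanks included), pair by position,
    and drop any group in which every field is blank."""
    sources = []
    for column in SOURCE_COLUMNS:
        cell = row.get(column, "")
        if cell:
            sources.extend(cell.split(DELIMITER))
    currencies = row["currency"].split(DELIMITER)
    values = row["value"].split(DELIMITER)
    groups = zip_longest(sources, currencies, values, fillvalue="")
    return [group for group in groups if any(group)]


# Hypothetical export-style row: blanks mark the position held by the other authority.
test2 = {
    "acquisitionFundingSourcePersonLocal": "|Carmen SanDiego",
    "acquisitionFundingSourceOrganizationLocal": "University of Place|",
    "currency": "US Dollar|Euro",
    "value": "5000|50",
}

result = funding_rows(test2)
for group in result:
    print(group)
# ('', 'US Dollar', '5000')
# ('Carmen SanDiego', 'Euro', '50')
# ('University of Place', '', '')
```

In this sketch the 5000 US Dollar amount ends up attached to a blank Source, and University of Place ends up in a third group with no currency or value.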

Note

The order of columns in the CSV templates indicates the order of processing; it does not control the order of processing.

Both the CSV templates and the data processing/transformation code used by the CSV Importer are based on how fields and values are defined for a record in the CollectionSpace UI code. So, moving the acquisitionFundingSourceOrganizationLocal column so that it comes before acquisitionFundingSourcePersonLocal will not solve this issue. Only changing the application’s UI field definitions would change the order of processing.

Info

The known solutions to this issue require changing how you enter data in the CSV for import. The new way of entering data would be more complex and easy for humans to mess up. Since this issue does not affect most uses of the CSV Importer, we have opted to keep the simpler data entry conventions.

The alternate data preparation/import described in the Info box under the Populating fields in other records with authority terms section above was actually developed and used successfully to address this issue in one institution's complex use case. If this is a blocker for you, please refer to that Info box and get in touch.

Outcome: Given that the CSV Importer cannot correctly import data that is exported from CollectionSpace (round-tripping), this is considered a BUG and needs to be fixed. However, it is a complex thing to solve, and we currently have no timeline for addressing it.

...