CSV Import Tool: Workflows: CSV Import, Validation, and Transfer/Update of New/Existing Records
User stories
- User can upload a CSV file to the CSV Import Tool
- User can receive a validation message from the CSV Import Tool (valid/invalid)
- User can receive a data check message from the CSV Import Tool (new/update/warnings)
- User can export a CSV with warnings (row/field/type) noted
- User can transfer all records, new records only, or updates only to CollectionSpace
Summary
User can create or update CollectionSpace records via the CSV Import Tool
Rationale
Users need a way to bulk create and update records in CollectionSpace.
Users
CollectionSpace power users with permission to use the Tool.
Basic Course of Events
| # | User interface/GUI application | Technical/under the hood | Comments/questions |
|---|---|---|---|
| 1 | User logs in to the CSV Import Tool and creates a new Import/Update Batch | | Delete Batch is a separate workflow. v2 option: implement Projects to group Batches together. If implemented, there would be intermediate steps: create a new Project, or select an existing Project in which to create the Batch. |
| 2 | Batch creation screen: User selects which connection to use | | |
| 3 | Batch creation screen: User selects the record type for the batch from a dropdown | | Constrained by the profile/version set in the connection or group default. |
| 4 | Batch creation screen: OPTIONAL: User may override the default import configuration and specify batch-specific import configuration options.<br>v1: User can paste in a JSON config.<br>v2: GUI form for specifying/displaying default configs that will create the JSON under the hood | v1: global config that can be overridden at the batch level by pasting in JSON.<br>Default import config options include:<br>Batch-specific import config options include: | v2 or beyond: The same default field values may be helpful across multiple batches. For example, the same defaults may be used for any load of Nomenclature concepts or ULAN names. Consider a way to save/reuse configs in the UI. |
| 5 | Batch creation screen: User browses to and selects the CSV file they want to upload | | |
| 6 | Batch creation screen: User clicks the UPLOAD button | Importer Application: | |
| 7 | Unless CSV parsing fails completely, user is shown an indication of progress | | Not sure how overall progress is to be determined when multiple different processes are being run, especially if they happen concurrently. |
| 8 | Post-import screen: If the Importer Application was unable to parse the CSV file at all:<br>User cannot move forward in the workflow from here.<br>If the Importer Application was able to parse the CSV file: | | (1) We typically run into this situation when a CSV contains invalid UTF-8. The average user cannot find one bad UTF-8 character in a CSV, so an informative error would say which row and field the character appears in. Another issue that can arise here is that quote escaping can get mangled such that the CSV is unparseable.<br>(2) Bad quote escaping and/or the inclusion of various control (end-of-line) and whitespace characters within field values can cause a CSV to be parsed successfully but incorrectly: you may end up with fewer or more rows than expected. |
| 9 | User clicks the VALIDATE button | Importer Application:<br>Mapper::DataHandler:<br>Importer Application:<br>If first row is valid | |
| 10 | Post-validation screen: If the Importer Application found a “Required field missing” error:<br>User cannot move forward in the workflow from here.<br>If the Importer Application found any “Required field empty” errors:<br>User cannot move forward in the workflow from here.<br>If the Importer Application found no errors: | | |
| 11 | User clicks the PROCESS button | Importer Application:<br>Mapper::DataHandler:<br>Importer Application: | (1) Transforms currently include:<br>(2) Currently, quality checks result in warnings if:<br>On the roadmap:<br>(3) This handles, for example, putting ownerPerson and ownerOrganization values (which need to be imported and transformed to refnames separately) into one field for transformation to CollectionSpace XML.<br>(4) We may never surface this in the UI, but it will be invaluable for debugging unexpected Mapper results. KS: I may disable this by default but add a debug option to the process method.<br>(5) I.e., the objectNumber value for a CollectionObject, or the shortIdentifier value for a Person.<br>(6) We are able to map even data with lots of warnings to a valid CollectionSpace XML document, illustrating why these warnings should not block the workflow from moving forward.<br>(7) Note that the purpose of this is not to flag duplicate values from the CSV data that got turned into the CollectionSpace identifier (though it will do that). That is easily handled and should be done in the process of preparing the CSV.<br>(8) We discussed whether there should be an option to turn this off to speed up the process if the User is certain all the records are new or existing. |
| 12 | User is shown progress indication while Records are processed | | |
| 13 | Post-processing screen: User is returned a report that clearly indicates:<br>This screen also displays a clear message if any duplicate identifiers were found in the set. | | |
| 14 | Post-processing screen: This CSV contains all rows that were imported. It always adds two columns:<br>If there are warnings, one column is added per warning type. The column header is the warning type value. Each row representing a record with that warning type has the warning message value in that column. (3) | | Rick: Any need to specify file naming/versioning conventions for these?<br>(1) User may want to remove records flagged as new or existing before proceeding, whether there are warnings or not. Given that we won’t be displaying the list of records in the UI, this is also how you would find which 3 of the records in a large set of “new records” are unexpectedly flagged as existing.<br>(2) If the duplicate identifier warning was displayed, User can find the sets of records ending up with duplicate identifiers via conditional formatting or pivoting on this column.<br>(3) This allows easy/clean filtering by warning, which simplifies tasks like:<br>Consider adding the ability to export a CSV for only certain sets of the records for v2: |
| 15 | Post-processing screen:<br>User can check one or both of the boxes. If at least one box is checked, the DO THE THING button is activated | | I (KS) am suggesting here that User can add/update records with warnings! We have discussed whether allowing updates with warnings could be turned on/off at the group level, or possibly managed per user within a group (i.e., maybe admins and managers can update records with warnings, but members cannot?). Warnings were intended as an opportunity to verify that your data is as expected, not to block you from adding/updating records. My use cases: |
| 16 | User clicks the DO THE THING button. If there are warnings (either duplicate identifier or on individual rows), some manner of “ARE YOU SURE?” confirmation happens and the user has to verify they want to proceed. | Importer Application: | |
| 17 | User is shown a progress indicator | | |
| 18 | Post-transfer screen: User is shown:<br>User can click a link to download a CSV of just the failures. The CSV includes all original data columns plus a FailureMessage column. (1)<br>User can also click a link to go back to the Post-processing screen. (2) | | (1) User will need to examine and possibly re-process this data, and only this data.<br>(2) User may prefer to initiate the add-new and update-existing processes separately. Or they may have decided they wanted the CSV after all. |
| 19 | And then… | | Should the workflow for managing/deleting batches be separate? |
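As an illustration of the batch-specific JSON config a user could paste in at step 4, here is a minimal sketch. The option names shown are placeholders for the kinds of options discussed (delimiters, default field values), not a confirmed list of the Importer's actual option set:

```json
{
  "delimiter": "|",
  "default_values": {
    "publishTo": "None",
    "collection": "library-collection"
  }
}
```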
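The informative parse error suggested in step 8, note (1) could be produced by scanning the file before normal parsing. A minimal sketch, assuming nothing about the Importer's actual implementation (the method name is hypothetical); reading in binary lets every field be inspected before an encoding error can abort parsing:

```ruby
require "csv"

# Return [row_number, column_header] pairs for every field whose bytes
# are not valid UTF-8, so the user can be told exactly where the bad
# character lives. Row 1 is the header, so data rows start at 2.
def invalid_utf8_locations(path)
  locations = []
  CSV.read(path, headers: true, encoding: "BINARY").each.with_index(2) do |row, row_num|
    row.each do |header, value|
      next if value.nil?
      # force_encoding does not change bytes; it just lets us ask
      # whether those bytes form valid UTF-8
      locations << [row_num, header] unless value.dup.force_encoding("UTF-8").valid_encoding?
    end
  end
  locations
end
```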
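Step 10 distinguishes two blocking validation errors. A sketch of that distinction, under the assumption (method name hypothetical) that "Required field missing" means the CSV has no column for a required field, while "Required field empty" means the column exists but a row has no value in it:

```ruby
require "csv"

# table: a CSV::Table parsed with headers: true
# required: array of required column header names
def required_field_errors(table, required)
  errors = []
  # Column absent entirely -> "missing"
  missing = required - table.headers
  missing.each { |h| errors << "Required field missing: #{h}" }
  # Column present but blank in some row -> "empty" (data rows start at 2)
  (required - missing).each do |h|
    table.each.with_index(2) do |row, row_num|
      value = row[h]
      errors << "Required field empty: #{h} (row #{row_num})" if value.nil? || value.strip.empty?
    end
  end
  errors
end
```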
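The duplicate-identifier check noted at step 11 (note 7) and surfaced at step 13 amounts to finding identifiers produced by more than one row in the batch. A minimal sketch (the method name is hypothetical), where `ids` is the list of CollectionSpace identifiers derived from the batch, e.g. objectNumber values for CollectionObjects:

```ruby
# Return each identifier that appears more than once in the batch,
# so the post-processing report can name the colliding sets.
def duplicate_identifiers(ids)
  ids.tally.select { |_id, count| count > 1 }.keys
end
```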
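The step 14 export adds one column per warning type, with the warning message in the rows it applies to. A sketch of that shape, with assumed inputs (the record hash structure and the `record_status` column name are placeholders, not the Importer's confirmed format):

```ruby
require "csv"

# records: array of { data: {header => value}, status: "new"/"existing",
#                     warnings: [[warning_type, message], ...] }
# headers: the original CSV column order
def export_with_warnings(records, headers)
  # One extra column per warning type seen anywhere in the batch
  warning_types = records.flat_map { |r| r[:warnings].map(&:first) }.uniq
  CSV.generate do |csv|
    csv << headers + ["record_status"] + warning_types
    records.each do |rec|
      messages = rec[:warnings].to_h
      csv << headers.map { |h| rec[:data][h] } +
             [rec[:status]] +
             warning_types.map { |type| messages[type] }
    end
  end
end
```

Because each warning type is its own column, filtering or pivoting on a single warning in a spreadsheet stays clean even when a row carries several warnings.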
Post-conditions
- Users are able to view new or updated records in CollectionSpace
- Users are able to confirm that they have removed records from CollectionSpace