Objective
Provide CollectionSpace administrators, data migration specialists, and advanced users a means of importing and updating CollectionSpace records using CSV files.
Deliverables
Develop, test, and document CSpace configuration untangler
Develop, test, and document CSpace mapper
Finalize functional requirements for CSV Import Tool
Design, develop, test, and document CSV Import Tool UI
Connect CSpace Mapper to CSV Import Tool UI
Priorities
The focus for the v1 release of the new tool will be:
Connection with the new CSpace Mapper backend
Import functionality at parity with the existing LYRASIS-developed tool, plus features required by the Mapper
New user-related functionality, noted in the list below under the Admin context
User Story Candidates can be found on the page at this link.
Preferences for both contributing organizations are captured in two columns below, where 1 means “high priority” and 3 means “low priority”.
NB: UCB and LYRASIS / the wider CollectionSpace community will share most, but not all, feature requirements.
Workflows
Workflows: System Administration (User management, Group management)
User Accounts - Accounts, Authentication, Groups, Connections
Data Flow Sketch (import/update) - IN PROGRESS
CSV ingest
Options
Set batch-specific import settings
Errors/outcomes
File cannot be read/parsed (everything fails)
Invalid character encoding that does not render the entire file unreadable (report the row/column containing the problem)
Code actions
Split the file into separate rows and assign each a RowID corresponding to its position in the row sequence, so that errors and warnings can reference a specific row of the imported CSV
Display
Count of successfully imported rows
User actions
Initiate initial validation and record status check
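The ingest step above could be sketched roughly as follows. This is a minimal illustration, not the tool's implementation: `IngestResult`, `ingest_csv`, and the sample column names are hypothetical, and only two of the listed behaviors are covered (RowID assignment and whole-file parse failure). Per-cell encoding checks are left out.

```ruby
require 'csv'

# Hypothetical result container; not part of any existing CollectionSpace code.
IngestResult = Struct.new(:rows, :errors)

# Parse CSV text into row hashes and assign a RowID that follows row sequence
# (1 for the first data row after the header, 2 for the second, and so on).
def ingest_csv(text)
  table = CSV.parse(text, headers: true)
  rows = []
  table.each_with_index do |row, index|
    rows << { row_id: index + 1, data: row.to_h }
  end
  IngestResult.new(rows, [])
rescue CSV::MalformedCSVError => e
  # The file cannot be read/parsed, so everything fails.
  IngestResult.new([], ["File cannot be parsed: #{e.message}"])
end

result = ingest_csv("objectnumber,title\n2020.1,Vase\n2020.2,Bowl\n")
puts "Successfully imported rows: #{result.rows.size}"
```

Keeping the RowID alongside each row's data means later validation messages can point back to the original CSV row without re-reading the file.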
Initial validation
Code actions, errors, outcomes
Creates Mapper::DataHandler connection/object for use with the batch
FIRST ROW sent to Mapper for validation (DataHandler.validate(data_hash))
If it gets a “Missing Required Field” error
Do not send any more rows
Tell the user that the CSV is missing a required column, which they need to add before reimporting
WORKFLOW STOPS HERE
If it gets no error or a “Required Field Empty” error, continue sending rows for validation
As additional rows are returned from the Mapper with no errors, tag them as “New” or “Existing”
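The control flow above could look roughly like the sketch below. It rests on several assumptions: `StubDataHandler` is a hypothetical stand-in for the Mapper's DataHandler (the real constructor and the return type of `validate` are not specified here, so the stub simply returns an array of error strings), and `existing_ids` is a hypothetical list standing in for whatever lookup decides whether a record is new or existing.

```ruby
# Hypothetical stand-in for the Mapper's DataHandler. It returns an array of
# error strings from validate(data_hash); the real return type may differ.
class StubDataHandler
  def initialize(required_fields)
    @required_fields = required_fields
  end

  def validate(data_hash)
    @required_fields.map do |field|
      if !data_hash.key?(field)
        'Missing Required Field'
      elsif data_hash[field].to_s.strip.empty?
        'Required Field Empty'
      end
    end.compact
  end
end

MISSING_COLUMN = 'Missing Required Field'

# rows: array of { row_id:, data: } hashes.
# existing_ids: identifiers assumed to be present in CollectionSpace already.
def initial_validation(rows, handler, id_field, existing_ids)
  return { stopped: false, results: [] } if rows.empty?

  # Only the first row is sent at first; a missing column stops the workflow.
  first_errors = handler.validate(rows.first[:data])
  if first_errors.include?(MISSING_COLUMN)
    return { stopped: true, results: [],
             message: 'CSV is missing a required column. Add it and reimport.' }
  end

  results = rows.each_with_index.map do |row, index|
    errors = index.zero? ? first_errors : handler.validate(row[:data])
    valid = errors.empty?
    # Only rows without errors are tagged as New or Existing.
    status = if valid
               existing_ids.include?(row[:data][id_field]) ? 'Existing' : 'New'
             end
    { row_id: row[:row_id], record_id: row[:data][id_field],
      valid: valid, status: status, errors: errors }
  end
  { stopped: false, results: results }
end
```

Validating the first row on its own avoids sending a whole batch that is certain to fail because a required column is absent, while a "Required Field Empty" error affects only the row it occurs in.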
Result display
Count of valid and invalid records
Count of new and existing records
Columns - (f) indicates filterable column
RowID
RecordID value
(f) Valid? (y/n)
(f) Status (new/existing)
User actions
Filter or select records and…
Export selected to CSV (the export runs and the user returns to, or remains on, this view)
Remove selected from batch
Initiate Data Preparation
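The filter, export, and remove actions above might be sketched as follows. The shape of each result (a hash with `:row_id`, `:record_id`, `:valid`, and `:status`) and all three function names are assumptions made for illustration only.

```ruby
require 'csv'

# Keep results matching the filterable columns; nil means "do not filter".
def filter_results(results, valid: nil, status: nil)
  results.select do |r|
    (valid.nil? || r[:valid] == valid) && (status.nil? || r[:status] == status)
  end
end

# Render the selected results as CSV text using the display columns.
def export_to_csv(results)
  CSV.generate do |csv|
    csv << %w[RowID RecordID Valid Status]
    results.each do |r|
      csv << [r[:row_id], r[:record_id], r[:valid] ? 'y' : 'n', r[:status]]
    end
  end
end

# Return the batch without the selected rows; the input array is not modified.
def remove_from_batch(results, selected)
  selected_ids = selected.map { |r| r[:row_id] }
  results.reject { |r| selected_ids.include?(r[:row_id]) }
end
```

Because the export only reads the results and the removal returns a new array, exporting leaves the batch unchanged, which fits the requirement that the user stays on the same view afterwards.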
Data Preparation
Code actions, errors/warnings, outcomes
Multivalued fields are split, transformations are applied to field values, data quality checks are run
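As a rough illustration of the preparation step, the sketch below splits one multivalued field, applies a simple transformation (whitespace trimming), and runs one data quality check. The `;` delimiter, the function name `prepare_value`, and the specific check are all assumptions. The actual delimiter, transformations, and checks are not defined in this document.

```ruby
# Split a raw cell into values, trim each one, and warn about empty entries.
# The delimiter and the warning text are illustrative assumptions.
def prepare_value(raw, delimiter: ';')
  warnings = []
  values = raw.to_s.split(delimiter).map(&:strip)
  warnings << 'Empty value in multivalued field' if values.any?(&:empty?)
  { values: values.reject(&:empty?), warnings: warnings }
end

prepared = prepare_value('Blue; Green;;Red')
puts prepared[:values].inspect
puts prepared[:warnings].inspect
```

Returning warnings alongside the cleaned values, rather than raising, keeps open the option of letting a user proceed with data that only produced warnings.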
User interaction and design
Open Questions
Question | Answer |
---|---|
How will uploaded data merge with or replace existing data, especially for repeatable fields? (e.g. How are duplicates handled? Will new values always replace existing values, or will they in some cases supplement them?) | |
Will there be “undo” and if so how will it work? | |
How will long-running batches be handled, especially a long-running batch that fails partway through? | |
Will there be a limit on the number of records in a batch? The duration of a batch job? A way to cancel? | |
From user story: User can opt to proceed with mapping data that receives a warning. | Need to discuss |
Out of Scope
Creation of stub vocabulary/authority records, with the option to transfer them to CollectionSpace, when non-established terms are used in Object or Procedure records. (This might be revisited in a later release.)