
Objective

Provide CollectionSpace administrators, data migration specialists, and advanced users with a means of importing and updating CollectionSpace records using CSV files.

Deliverables

  • Develop, test, and document CSpace configuration untangler

  • Develop, test, and document CSpace Mapper

  • Finalize functional requirements for CSV Import Tool

  • Design, develop, test, and document CSV Import Tool UI

  • Connect CSpace Mapper to CSV Import Tool UI

Priorities

  • The focus for the v1 release of the new tool will be:

    • Connection with new CSpace Mapper backend

    • Import functionality at “parity” with the existing LYRASIS-developed tool, plus features required by the Mapper

    • New user-related functionality, noted under the Admin context in the list below

  • User Story Candidates can be found on a linked page.

    • Preferences of the two contributing organizations are captured there in two columns, where 1 means “high priority” and 3 means “low priority”

    • NB: UCB and LYRASIS / the wider CollectionSpace community will share most, but not all, feature requirements

Workflows

  • System Administration - User management, Group management

  • User Accounts - Accounts, Authentication, Groups, Connections

  • Add/Update/Delete Records - Upload, Validate, Data Check, Map, Transfer or Delete

Data Flow Sketch (import/update) - IN PROGRESS

  1. CSV ingest

    1. Options

      1. Set batch-specific import settings

    2. Errors/outcomes

      1. File cannot be read/parsed (everything fails)

      2. Invalid character encoding that does not render the entire file unreadable (report the row/column containing the problem)

    3. Code actions

      1. Split the file into separate rows and assign each a RowID corresponding to row sequence, so that errors/warnings can usefully reference a row from the imported CSV (see the ingest sketch after this outline)

    4. Display

      1. Count of successfully imported rows

    5. User actions

      1. Initiate initial validation and record status check

  2. Initial validation

    1. Code actions, errors, outcomes

      1. Create a Mapper::DataHandler connection/object for use with the batch

      2. The FIRST ROW is sent to the Mapper for validation (DataHandler.validate(data_hash))

        1. If it gets a “Missing Required Field” error

          1. Do not send any more rows

          2. Tell the user that the CSV is missing a required column; they need to add it and reimport

          3. WORKFLOW STOPS HERE

        2. If it gets no error or a “Required Field Empty” error, continue sending rows for validation

      3. As additional rows are returned from the Mapper with no errors, tag them as “New” or “Existing” (see the validation sketch after this outline)

    2. Result display

      1. Count of valid and invalid records

      2. Count of new and existing records

      3. Columns - (f) indicates filterable column

        1. RowID

        2. RecordID value

        3. (f) Valid? (y/n)

        4. (f) Status (new/existing)

    3. User actions

      1. Filter or select records and…

      2. Export selected rows to CSV (then return to, or remain on, this view)

      3. Remove selected from batch

      4. Initiate Data Preparation

  3. Data Preparation

    1. Code actions, errors/warnings, outcomes

      1. Multivalued fields are split, transformations are applied to field values, and data quality checks are run (see the data preparation sketch after this outline)
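
The three sketches below make the outline concrete. Since the flow above references Ruby-style Mapper calls (Mapper::DataHandler), they are written in Ruby; method names, return shapes, and anything else not named in the outline are illustrative assumptions, not the tool’s actual API.

First, CSV ingest, assuming UTF-8 input and no embedded newlines inside quoted fields:

  require 'csv'

  # Parse failures abort the whole batch; per-row encoding problems are
  # reported with a RowID; each good row gets a RowID matching its
  # sequence in the file.
  def ingest(path)
    rows, problems = [], []
    lines = File.read(path, encoding: 'UTF-8').lines
    header = CSV.parse_line(lines.shift)
    lines.each.with_index(1) do |line, row_id|
      unless line.valid_encoding?
        # Invalid character encoding that does not render the entire
        # file unreadable: report the row containing the problem.
        problems << { rowid: row_id, error: 'invalid character encoding' }
        next
      end
      rows << header.zip(CSV.parse_line(line)).to_h.merge('rowid' => row_id)
    end
    puts "#{rows.length} rows imported successfully" # display to user
    [rows, problems]
  rescue CSV::MalformedCSVError => e
    abort "File cannot be read/parsed: #{e.message}" # everything fails
  end

Assigning the RowID at ingest means every later display (validation results, data checks) can reference the same row numbers the user sees in their spreadsheet.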

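Second, the initial-validation loop. Mapper::DataHandler and DataHandler.validate(data_hash) are named in the outline; the shape of the validation result (an errors array of strings) and the existing_record? status check are assumptions for illustration:

  # Placeholder for the new/existing status check; the real check would
  # ask CollectionSpace whether a record with the row's RecordID exists.
  def existing_record?(row)
    false
  end

  def initial_validation(rows, handler)
    # handler is the Mapper::DataHandler created for this batch.
    first = handler.validate(rows.first) # FIRST ROW sent to the Mapper
    if first.errors.any? { |e| e.include?('Missing Required Field') }
      # Do not send any more rows: the CSV is missing a required column
      # and must be fixed and reimported. WORKFLOW STOPS HERE.
      return :missing_required_column
    end
    # No error, or only a "Required Field Empty" error: keep sending
    # rows, tagging each as new or existing.
    results = rows.map do |row|
      { rowid:  row['rowid'],
        valid:  handler.validate(row).errors.empty?,
        status: existing_record?(row) ? 'existing' : 'new' }
    end
    puts "valid: #{results.count { |r| r[:valid] }}, " \
         "invalid: #{results.count { |r| !r[:valid] }}"
    puts "new: #{results.count { |r| r[:status] == 'new' }}, " \
         "existing: #{results.count { |r| r[:status] == 'existing' }}"
    results
  end

Validating the first row on its own means a structural problem (a missing required column) is caught with a single Mapper call before the rest of the batch is sent.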

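Third, data preparation. The delimiter, the transform registry, and the quality check here are all illustrative assumptions; in the real tool these would be driven by the record mapping:

  MULTIVALUE_DELIM = '|'

  # Illustrative per-field transformations.
  TRANSFORMS = {
    'title'        => ->(v) { v.strip },
    'objectNumber' => ->(v) { v.strip.upcase }
  }.freeze

  def prepare(row)
    warnings = []
    prepared = row.to_h do |field, value|
      next [field, value] unless value.is_a?(String)
      # Split multivalued fields, then apply any registered transform.
      values = value.split(MULTIVALUE_DELIM).map do |v|
        TRANSFORMS.fetch(field, ->(x) { x }).call(v)
      end
      # Data quality check: warn on empty values left after splitting.
      warnings << "row #{row['rowid']}: empty value in #{field}" if values.any?(&:empty?)
      [field, values.length > 1 ? values : values.first]
    end
    [prepared, warnings]
  end
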
User interaction and design

Open Questions

  • How will uploaded data merge with or replace existing data, especially for repeatable fields? (E.g., duplicates: will new values always replace existing values, or will they in some cases supplement them?)

  • Will there be “undo”, and if so, how will it work?

  • How will long-running batches be handled? Especially the case of a long-running batch that fails in the middle?

  • Will there be a limit on the number of records in a batch? On the duration of a batch job? Will there be a way to cancel?

  • From user story: “User can opt to proceed with mapping data that receives a warning.” (Answer: Need to discuss.)

Out of Scope

  • Creation of stub vocabulary/authority records, with the option to transfer them to CollectionSpace, when non-established terms are used in Object or Procedure records. (This might be revisited in a later release.)
