
Objective

Provide CollectionSpace administrators, data migration specialists, and advanced users with a means of importing and updating CollectionSpace records using CSV files.

Deliverables

  • Develop, test, and document CSpace configuration untangler

  • Develop, test, and document CSpace Mapper

  • Finalize functional requirements for CSV Import Tool

  • Design, develop, test, and document CSV Import Tool UI

  • Connect CSpace Mapper to CSV Import Tool UI

Priorities

  • The focus for the v1 release of the new tool will be:

    • Connection with new CSpace Mapper backend

    • Import functionality at “parity” with the existing LYRASIS-developed tool, plus features required by the Mapper

    • New user-related functionality, noted under the Admin context in the list below

  • User Story Candidates can be found on a linked page.

    • Preferences of the two contributing organizations are captured there in two columns, where 1 means “high priority” and 3 means “low priority”

    • NB: UCB and LYRASIS / the wider CollectionSpace community will share most, but not all, feature requirements

Workflows

  • System Administration - User management, Group management

  • User Accounts - Accounts, Authentication, Groups, Connections

  • Add/Update/Delete Records - Upload, Validate, Data Check, Map, Transfer or Delete

Data Flow Sketch (import/update) - IN PROGRESS

  1. CSV ingest

    1. Options

      1. Set batch-specific import settings

    2. Errors/outcomes

      1. File cannot be read/parsed (everything fails)

      2. Invalid character encoding that does not render the entire file unreadable (report the row/column containing the problem)

    3. Code actions

      1. Split the file into separate rows and assign each a RowID corresponding to row sequence, so that errors/warnings can usefully reference a row from the imported CSV (see the ingest sketch after this outline)

    4. Display

      1. Count of successfully imported rows

    5. User actions

      1. Initiate initial validation and record status check

  2. Initial validation

    1. Code actions, errors, outcomes

      1. Create a Mapper::DataHandler connection/object for use with the batch

      2. The FIRST ROW is sent to the Mapper for validation (DataHandler.validate(data_hash))

        1. If it gets a “Missing Required Field” error

          1. Do not send any more rows

          2. Tell the user that the CSV is missing a required column; they need to add it and reimport

          3. WORKFLOW STOPS HERE

        2. If it gets no error or a “Required Field Empty” error, continue sending rows for validation

      3. As additional rows are returned from the Mapper with no errors, tag them as “New” or “Existing” (see the validation sketch after this outline)

    2. Result display

      1. Count of valid and invalid records

      2. Count of new and existing records

      3. Columns - (f) indicates filterable column

        1. RowID

        2. RecordID value

        3. (f) Valid? (y/n)

        4. (f) Status (new/existing)

    3. User actions

      1. Filter or select records and…

      2. Export selected rows to CSV (then return to, or remain on, this view)

      3. Remove selected from batch

      4. Initiate Data Preparation

  3. Data Preparation

    1. Code actions, errors/warnings, outcomes

      1. Multivalued fields are split, transformations are applied to field values, and data quality checks are run (see the data preparation sketch after this outline)
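
The three sketches below make the outline concrete. Since the flow above references Ruby-style Mapper calls (Mapper::DataHandler), they are written in Ruby; method names, return shapes, and anything else not named in the outline are illustrative assumptions, not the tool’s actual API.

First, CSV ingest, assuming UTF-8 input and no embedded newlines inside quoted fields:

  require 'csv'

  # Parse failures abort the whole batch; per-row encoding problems are
  # reported with a RowID; each good row gets a RowID matching its
  # sequence in the file.
  def ingest(path)
    rows, problems = [], []
    lines = File.read(path, encoding: 'UTF-8').lines
    header = CSV.parse_line(lines.shift)
    lines.each.with_index(1) do |line, row_id|
      unless line.valid_encoding?
        # Invalid character encoding that does not render the entire
        # file unreadable: report the row containing the problem.
        problems << { rowid: row_id, error: 'invalid character encoding' }
        next
      end
      rows << header.zip(CSV.parse_line(line)).to_h.merge('rowid' => row_id)
    end
    puts "#{rows.length} rows imported successfully" # display to user
    [rows, problems]
  rescue CSV::MalformedCSVError => e
    abort "File cannot be read/parsed: #{e.message}" # everything fails
  end

Assigning the RowID at ingest means every later display (validation results, data checks) can reference the same row numbers the user sees in their spreadsheet.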

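Second, the initial-validation loop. Mapper::DataHandler and DataHandler.validate(data_hash) are named in the outline; the shape of the validation result (an errors array of strings) and the existing_record? status check are assumptions for illustration:

  # Placeholder for the new/existing status check; the real check would
  # ask CollectionSpace whether a record with the row's RecordID exists.
  def existing_record?(row)
    false
  end

  def initial_validation(rows, handler)
    # handler is the Mapper::DataHandler created for this batch.
    first = handler.validate(rows.first) # FIRST ROW sent to the Mapper
    if first.errors.any? { |e| e.include?('Missing Required Field') }
      # Do not send any more rows: the CSV is missing a required column
      # and must be fixed and reimported. WORKFLOW STOPS HERE.
      return :missing_required_column
    end
    # No error, or only a "Required Field Empty" error: keep sending
    # rows, tagging each as new or existing.
    results = rows.map do |row|
      { rowid:  row['rowid'],
        valid:  handler.validate(row).errors.empty?,
        status: existing_record?(row) ? 'existing' : 'new' }
    end
    puts "valid: #{results.count { |r| r[:valid] }}, " \
         "invalid: #{results.count { |r| !r[:valid] }}"
    puts "new: #{results.count { |r| r[:status] == 'new' }}, " \
         "existing: #{results.count { |r| r[:status] == 'existing' }}"
    results
  end

Validating the first row on its own means a structural problem (a missing required column) is caught with a single Mapper call before the rest of the batch is sent.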

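Third, data preparation. The delimiter, the transform registry, and the quality check here are all illustrative assumptions; in the real tool these would be driven by the record mapping:

  MULTIVALUE_DELIM = '|'

  # Illustrative per-field transformations.
  TRANSFORMS = {
    'title'        => ->(v) { v.strip },
    'objectNumber' => ->(v) { v.strip.upcase }
  }.freeze

  def prepare(row)
    warnings = []
    prepared = row.to_h do |field, value|
      next [field, value] unless value.is_a?(String)
      # Split multivalued fields, then apply any registered transform.
      values = value.split(MULTIVALUE_DELIM).map do |v|
        TRANSFORMS.fetch(field, ->(x) { x }).call(v)
      end
      # Data quality check: warn on empty values left after splitting.
      warnings << "row #{row['rowid']}: empty value in #{field}" if values.any?(&:empty?)
      [field, values.length > 1 ? values : values.first]
    end
    [prepared, warnings]
  end
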
User interaction and design

Open Questions

  • How will uploaded data merge with or replace existing data, especially for repeatable fields? (E.g., duplicates: will new values always replace existing values, or will they in some cases supplement them?)

  • Will there be “undo”, and if so, how will it work?

  • How will long-running batches be handled? Especially the case of a long-running batch that fails in the middle?

  • Will there be a limit on the number of records in a batch? On the duration of a batch job? Will there be a way to cancel?

  • From user story: “User can opt to proceed with mapping data that receives a warning.” (Answer: Need to discuss.)

Out of Scope

  • Creation of stub vocabulary/authority records, with the option to transfer them to CollectionSpace, when non-established terms are used in Object or Procedure records. (This might be revisited in a later release.)
