CSV Import Tool: User Story Candidates

Please note: these stories are candidates that may be supported by the CSV import tool; this is not a definitive list of what will be supported.

Requirements

  • Preferences for contributing organizations are captured in the two priority columns below, where 1 (UCB) or v1 (LYRASIS/Community) means “high priority” and 3 means “low priority”

  • NB: UCB and LYRASIS / the wider CollectionSpace community will share most but not all feature requirements

Each numbered story below is grouped by context (Admin, Import/Update) and lists the user story, any notes/questions, the UCB priority, and the LYRASIS/Community priority.

Admin

1. Sysadmin can install and maintain a single deployment of the CSV Import Tool that can be used by many CSpace users/instances.
   Notes: JL: Hmmm… considering UCB, would that be one instance for all 5 museums? Not necessary for UCB. MC: Could be 1 instance for UCB, or 5.
   Priority: LYRASIS/Community: v1.

2. Sysadmin can create, update, and inactivate all User accounts.
   Priority: LYRASIS/Community: v1.

3. Sysadmin can create a session as another user in the system, allowing an admin to reset a user’s password, etc.
   Priority: LYRASIS/Community: v1.

4. User can create or edit a single User account.
   Priority: LYRASIS/Community: v1.

5. Sysadmin can create and edit all Group accounts.
   Notes: MC: Can assign users to groups, but there should also be a (configurable) system to assign a user to a group on sign-up.
   Priority: LYRASIS/Community: v1.

6. User can create and edit a single Group account.
   Notes: MC: Only allow admins to create groups (at least initially).
   Priority: LYRASIS/Community: Won’t have.

7. Admin can create a Connection between the Tool and any CSpace instance.
   Priority: LYRASIS/Community: Won’t have.

8. User can create a Connection between the Tool and one or more CSpace instances to which they can authenticate.
   Notes: MC: To use the system a user will need to create at least one “connection” to CSpace, but can create more (e.g. a single user creates staging and prod connections).
   Priority: LYRASIS/Community: v1.

9. Sysadmin can grant a User access to a Connection.
   Notes: MC: Use of connections should be user-specific.
   Priority: LYRASIS/Community: Won’t have.

10. User can grant a single Group access to a single Connection.
   Notes: MC: We probably need to define groups better, and decide if we want/need this. I currently view a group as a way to connect multiple users so they can see each other’s records (batches, etc.). An admin can see/do everything, but a non-admin user is either a “Manager” or a “Member” of a single group. A member can see content from their group, but can only directly create/update their own work. A manager is more like an admin, but scoped to the group. This would allow us (LYRASIS) to have groups like “OHC” and “Breman” with users who can see what their colleagues have done or are doing, and group managers who can enable/disable group users, and so on.
   We could also consider abandoning the group system in favor of a binary authorization model distinguishing only admins (see/do all) from regular users (see/do their own work). This is simpler and seamless in terms of “sign up and start using”, but it would not allow users from the same “group” (organization) to see what each other is doing in the app, if that’s important, as each user would be sandboxed from every other user (except admins). They would have to share logins or show each other what is happening in a browser.
   Priority: LYRASIS/Community: Won’t have.

11. Admin or Manager can upload the mapping configuration for a Connection.
   Priority: LYRASIS/Community: v1.

Import/Update

12. User can create an import/update job for a given record type in a given profile by uploading a properly formatted CSV file.
   Priority: UCB: 1; LYRASIS/Community: v1.

13. User can import new records to a CollectionSpace instance.
   Priority: UCB: 1; LYRASIS/Community: v1.

14. User can update existing records in a CollectionSpace instance.
   Priority: UCB: 1; LYRASIS/Community: v1.

15. User can delete existing records in a CollectionSpace instance.
   Notes: UCB: Not needed by us, and might be a dangerous feature.
   Priority: UCB: 3; LYRASIS/Community: v1.

16. User can ingest a CSV file containing rows representing both records to be newly created and existing records to update.
   Notes: Data will be imported, validated, transformed, and checked for quality issues the same way for all rows, and each row will be flagged in the system as either a new record in CSpace or an update to an existing record.
   Priority: UCB: 3; LYRASIS/Community: v1.
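As a sketch of the flagging described in story 16 (assuming records are matched on a hypothetical `objectnumber` column, and that the set of record IDs already present in the target CSpace instance can be looked up ahead of time), the create-vs-update split might look like:

```python
import csv
from io import StringIO

def flag_rows(csv_text, id_column, existing_ids):
    """Split ingested rows into creates and updates.

    `existing_ids` is assumed to be the set of record IDs already
    present in the target CSpace instance (hypothetical lookup).
    """
    creates, updates = [], []
    for row in csv.DictReader(StringIO(csv_text)):
        if row.get(id_column) in existing_ids:
            updates.append(row)   # flagged: will update an existing record
        else:
            creates.append(row)   # flagged: will become a new record
    return creates, updates

data = "objectnumber,title\n2020.1,Bowl\n2020.2,Vase\n"
new, upd = flag_rows(data, "objectnumber", {"2020.1"})
```

Both kinds of rows pass through the same validation/transformation path; only the flag differs.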

17. User can ingest a CSV file containing rows representing records to delete. (A batch record deletion workflow is treated as a separate process in the system from an import/update workflow.)
   Notes: The deletion workflow is a separate workflow requiring only a record ID match; no data quality checks or mapping will be done. Question: does there need to be some form of validation/messaging that transparently says “No record with ID 123.456 found - nothing deleted”, or should the found records be deleted and the others simply skipped?
   Priority: UCB: 3; LYRASIS/Community: v1.

18. As part of an import/update workflow, User can select one or multiple records and remove them from the batch.
   Notes: This has no effect on existing records in CSpace.
   Priority: UCB: 3; LYRASIS/Community: Won’t have.

19. As part of an import/update workflow, User can select one or multiple records flagged as already existing in CSpace and initiate deletion of those records from CSpace.
   Notes: This process also removes the records from the processing batch.
   Priority: UCB: 3; LYRASIS/Community: Won’t have.

20. After ingesting a CSV file for import/update, User can clearly see how many records are new and how many will update existing records.
   Priority: UCB: 3; LYRASIS/Community: v1.

21. After ingesting a CSV file for import/update, if there are any records in the batch with duplicate record IDs (or Short IDs, for authorities/vocabulary terms), User is warned and can view a duplicate report from which they can select records to remove from the batch.
   Notes: The current design for the Mapper has it knowing about only one row of data at a time, so checking for ID collisions across a batch needs to be handled in the import tool.
   Priority: UCB: 3; LYRASIS/Community: v1.
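The batch-level ID collision check described in story 21 could be sketched as follows (the `shortid` key is illustrative, not the tool’s actual field name):

```python
from collections import Counter

def duplicate_ids(rows, id_key):
    """Report IDs appearing more than once in a batch.

    The Mapper sees only one row at a time, so this batch-level
    check has to run in the import tool before mapping.
    """
    counts = Counter(row[id_key] for row in rows)
    return {rid: n for rid, n in counts.items() if n > 1}

batch = [{"shortid": "oak"}, {"shortid": "elm"}, {"shortid": "oak"}]
dupes = duplicate_ids(batch, "shortid")  # {"oak": 2}
```

The resulting dict could feed the duplicate report from which the user removes records.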

22. After the initial ingest, data checks, and mapping, User can separately initiate just the import of new records into CSpace, just the update of existing records in CSpace, or both in one step.
   Notes: Here I am imagining something like three buttons: Add New, Update Existing, Do Both.
   Priority: UCB: 3; LYRASIS/Community: v1.

23. User can specify (and save) default import batch preferences that will be applied to all their import jobs unless other options are selected. If they do not specify default preferences, the system defaults will be applied.
   Notes: These include: repeating field value delimiter, repeating field-in-subgroup delimiter, and date order (month-day-year or day-month-year). These preferences are not specific to a given record type, profile, or job.
   Priority: LYRASIS/Community: v1.
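A minimal sketch of how the batch preferences in story 23 might be layered over system defaults; the delimiter values and field names here are assumptions for illustration only:

```python
from dataclasses import dataclass, asdict

# Field names and default values are illustrative, not the tool's actual schema.
@dataclass
class BatchPreferences:
    repeat_delimiter: str = "|"           # between repeating field values
    subgroup_delimiter: str = "^^"        # between values in repeating subgroups
    date_order: str = "month-day-year"    # or "day-month-year"

def effective_prefs(user_prefs=None):
    """Apply a user's saved defaults over the system defaults."""
    prefs = asdict(BatchPreferences())
    prefs.update(user_prefs or {})
    return prefs

effective_prefs({"date_order": "day-month-year"})
```

Per-profile/record-type/job preferences (story 24) would layer on top in the same way.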

24. User can specify (and save/reuse) profile/record type/job-level preferences that may override their default preferences.
   Notes: These include default field values and field-specific data transformations, as well as the default batch preferences.
   Priority: LYRASIS/Community: v1.

25. User can validate a CSV file against an existing import Profile.
   Notes: This validation checks that required fields are included in the CSV and populated in all rows. Presence of invalid data means the data cannot be mapped; the workflow stops until the data is remediated.
   Priority: UCB: 1; LYRASIS/Community: v1.

26. User can receive clear error messaging for invalid data.
   Notes: The error specifies what field is missing and, if the field is present but unpopulated, which row is missing the value.
   Priority: UCB: 1; LYRASIS/Community: v1.

27. User can receive clear data quality warnings prior to the mapping process. “Clear” means the user is told what field and what row contains the possible issue. User can opt to proceed with mapping data that receives a warning.
   Notes: These warnings include: use of non-established values in option-list source fields; use of non-established vocabulary or authority terms; data type mismatches; uneven numbers of values in repeating field groups.
   Priority: UCB: 1; LYRASIS/Community: v1.
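The required-field validation and row-level error messaging of stories 25-26 could work roughly like this sketch (column names are hypothetical, and the real tool would draw the required-field list from the import Profile):

```python
import csv
from io import StringIO

def validate_required(csv_text, required):
    """Check that required columns exist and are populated in every row.

    Each message names the field and, where the column exists but a
    value is blank, the row number (row 1 is the header row).
    """
    reader = csv.DictReader(StringIO(csv_text))
    rows = list(reader)
    errors = []
    for field in required:
        if field not in (reader.fieldnames or []):
            errors.append(f"required column missing: {field}")
            continue
        for i, row in enumerate(rows, start=2):  # data starts at row 2
            if not (row[field] or "").strip():
                errors.append(f"row {i}: required field '{field}' is empty")
    return errors

sample = "objectnumber,title\n2020.1,\n"
validate_required(sample, ["objectnumber", "title"])
```

Any error here would stop the workflow until the data is remediated; warnings (story 27) would be reported the same way but remain skippable.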

28. After receiving validation errors and/or data quality warnings, User can export a CSV of the rows having errors or warnings.
   Notes: This CSV contains the original rows/columns, plus two additional columns: errors and warnings.
   Priority: UCB: 1; LYRASIS/Community: v1.

29. User can export a CSV of the rows having NO errors or warnings.
   Priority: UCB: 1; LYRASIS/Community: v1.

30. If the user receives validation errors and/or data quality warnings, they can choose to proceed with processing only the rows without errors, or only the rows without errors or warnings.
   Priority: UCB: 1; LYRASIS/Community: v2.

31. User can generate a CSV import template for each record type supported by the import tool.
   Priority: UCB: 1.

32. User can view a progress indicator for imports or updates in progress.
   Priority: UCB: 1.

33. User can delete a file that has been successfully imported.

34. User can “redo” or “undo” any step in the process.
   Notes: JL: Except to the extent that changes are necessarily permanent??
   Priority: UCB: 1; LYRASIS/Community: v1.

35. User can mark a data import batch as “complete and immutable” (i.e. “archived”).
   Notes: JL: What should happen in that case? Should the batch be hidden somewhere to reduce clutter?
   Priority: UCB: 1.

36. User can obtain stats/analytics about their batch(es): count, type/token counts, report on valid/invalid values, etc.
   Priority: UCB: 1; LYRASIS/Community: v1.

37. User can (directly) add new authority and vocabulary terms in support of adds and updates of records in a batch that use them.
   Priority: UCB: 2; LYRASIS/Community: v2.

38. Users can download missing/invalid terms from a batch.
   Notes: JL: In order to analyze/correct/re-upload them.
   Priority: UCB: 1.

39. Users can download valid terms and other values from a batch.
   Priority: UCB: 1.

40. On every screen where some set of records is listed/displayed, User can click “Select All”, “Select None”, or individual checkboxes to select individual records for operations such as Export to CSV, Remove from this batch, etc.
   Priority: UCB: 3; LYRASIS/Community: Won’t have.