CSpace Deployments System Performance Review, May-June 2013

Background

In May of 2013, Rick Jaffe of the UC Berkeley Deployment team conducted a series of short interview/observation sessions with users of our campus deployments. Participants were each asked to perform several of the processes that they do regularly using CollectionSpace. The goal was to determine which areas of CollectionSpace performance require our greatest attention.

Rick interviewed the following people from these institutions:

UC & Jepson Herbaria (UCJEPS) - Andrew Doran, Ana Maria Penny & Kim Kersh, Clare Loughran
Phoebe A. Hearst Museum of Anthropology (PAHMA) - Natasha Johnson, Linda Waterfield, Ryan Gross
UC Botanical Garden (Botgarden) - Barbara Keller (yet to be held)

The conversations generated a variety of information and insights, such as:

  • approximate timings of page loads, autocomplete searches, etc.
  • examples of issues that frequently arise
  • workflows employed in the use of CollectionSpace
  • compliments and complaints: there is growing satisfaction with the application, but there are major areas that users avoid
  • Rick's notes on behaviors encountered; one example: common difficulties with advanced search functionality.

Notes

Notes from these sessions are attached to this page. Each interview has been transcribed to a Word document. The transcriptions have also been compiled into a single table. The table is broken into four columns, each with a different type of content, for easy scanning. The columns are: "Measurable observations," "Process notes," "User comments" and "Observer comments." This table is available in PDF format. Finally, the "take-aways" from these sessions are attached in Word doc format.

The quality of these notes ranges from clear to cryptic, depending largely upon Rick's understanding of the process being demonstrated. The process notes attempt to provide the context for a given action. However, some descriptions will be hard to understand without knowledge of or access to the specific customizations that UC Berkeley deployers have made. (For example, on the UCJEPS loan out record, we have added a repeatable block titled Loan Out Items, in which herbaria staff list -- one per block -- the items included in the loan. In the course of conversation, it became clear to Rick that as many as 900 items might be listed within a single loan out record.)

Key take-aways

Below are the "take-aways" from these observations, as compiled by Chris Hoffman and Rick Jaffe. These points can serve as a starting point for our UC Berkeley deployment team discussions.

UCJEPS

Ana:

- Adding many loan items is slow. She opens 50 blank ones at a time (it crashes, so she saves frequently). Is there a page load problem here? In some cases there are 500-900 field blocks in these records (though the average number of loan items is lower). Maybe switch to creating related records, even via a web application.

Andrew:

  • Advanced search is very slow
  • Displaying records is kind of slow
  • Media upload of larger scans is slower
  • CRH: Might need to check broader context in taxonomy: it took a long time to enter Asteraceae as a broader term even though it was found in the basionym field right away. But wait! It then found the term in broader taxonomy right away (first tried another term, beloperone, which was found right away; then tried Asteraceae again and it was found right away).
  • These slow/blocked searches exacerbate database performance problems. Might be related to concurrency issues (app layer re-init) and/or resource contention with other processes (image uploading).

Clare:

  • Save takes 5-10 seconds (she opens a second window in order to keep working)
  • Deleting relations takes 10-15 seconds (with only two media records attached to a cataloging record). This sounds similar to a PAHMA problem (for Ryan) that Aron investigated; he found that the time was spent on finding ancestors. ** Look into this.
  • Finding 'names' (probably scientific names) can take 4-5 seconds
  • Check to see whether Yuteh's job now checks scientific name in addition to accession number.

PAHMA

Ryan:

- Generally satisfied with speed but here are some potential areas

- Huge PDF report takes >1hr to "assemble and display".  (Is that running the massive report in iReport and then pulling it up?)

- Create New from Existing: 15-20 seconds

- Note: Computed Current Location not always updating

- Add Related LMI search 15-30 seconds

Natasha:

- Batch image upload would be useful (is on Michael's list and mine)

- Note: PAHMA does have a Primary Display boolean on Media Handling.  Not being used anywhere but should be some day.

- 6 second save on MH -- no complaints though

- Record saving (MH) 15-30 seconds and is variable, based on other use of system

- Advanced search (*Texas) on provenience is slow and occasionally SWODs

- Note: Responding to request "From what sites in Texas does the museum have holdings?" is unwieldy.  This sounds like a web app request.

- Advanced Search for particular collection place > 1 minute, but this was a SWOD

- Packing list report > 2 minutes, for first attempt of day.  Are queries getting pushed out of cached results too quickly?

Linda:

- Deleting storage location (unused term) froze the system (probably related to app re-init/SWOD)

- Past performance issues, but not so bad now.

Post-interview actions/improvements

Yuteh has made improvements to her Duplicate record scripts based on Clare's comments.

Natasha, Linda and Andrew each encountered SWODs ("spinning wheels of death," i.e. the indeterminate progress spinner) while performing particular actions, usually involving advanced search and/or autocomplete matching. Investigation by Ray Lee has uncovered a concurrency problem within the App-layer. (See CSPACE-5988, PAHMA-752.) Since the interviews, a fix has been instituted.

Ryan Gross reported a bug in which the deletion of a Location-Movement-Inventory (LMI) record did not properly cause the computedCurrentStorageLocation field to update. A batch process had been implemented to do this update, but the process had not been initiated on the PAHMA instance. That has since been corrected.
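To make the expected behavior concrete, here is a minimal sketch of recomputing a current location after an LMI record is deleted. It assumes that the computed location is simply the location on the most recent remaining movement record; the Movement class, its field names, and the sample values are hypothetical and are not taken from the CollectionSpace code or the actual batch process.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Movement:
    """Hypothetical stand-in for an LMI record related to one object."""
    csid: str
    location: str
    location_date: date
    deleted: bool = False


def computed_current_location(movements: List[Movement]) -> Optional[str]:
    """Return the location of the latest non-deleted movement, if any.

    Assumption: "current" means the movement with the latest location date.
    """
    live = [m for m in movements if not m.deleted]
    if not live:
        return None
    return max(live, key=lambda m: m.location_date).location


# Hypothetical data: two movements for one object.
history = [
    Movement("lmi-1", "Shelf A", date(2013, 4, 2)),
    Movement("lmi-2", "Shelf B", date(2013, 5, 20)),
]
before_delete = computed_current_location(history)

# Deleting the latest LMI record should move the computed value back.
history[1].deleted = True
after_delete = computed_current_location(history)
```

In this sketch the value is derived on demand, so it cannot go stale; a stored field such as computedCurrentStorageLocation instead depends on some process rerunning a calculation like this after each deletion.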

Ryan Gross and Clare Loughran also reported that deleting and adding related LMI records took 15-30 seconds. Investigation by Aron Roberts has identified a step in the SQL process for deletions that takes up the majority of this time: in one example, some 13.5 seconds out of a total of 13.8 seconds (roughly 98%). The step involves ancestors being identified by the system prior to the DELETE:

SELECT "hierarchy"."id", "hierarchy"."parentid",
       "hierarchy"."primarytype", "hierarchy"."isproperty",
       "proxies"."versionableid", "proxies"."targetid"
FROM "hierarchy"
LEFT JOIN "proxies" ON "hierarchy"."id" = "proxies"."id"
WHERE EXISTS(SELECT 1 FROM ancestors
             WHERE id = "hierarchy"."id" AND ARRAY[$1] <@ ancestors)
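As a reading aid for the query above, the following sketch mimics its filter in plain Python. In PostgreSQL, `<@` is the array "is contained by" operator, so `ARRAY[$1] <@ ancestors` holds for each row whose ancestors array includes the given id. The ids and the parent map below are hypothetical, and the sketch only illustrates the selection logic; it says nothing about why the step is slow on the real tables.

```python
from typing import Dict, List, Optional


def build_ancestors(parent_of: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Map each id to the list of its ancestor ids, root first."""
    ancestors: Dict[str, List[str]] = {}
    for node in parent_of:
        chain: List[str] = []
        current = parent_of[node]
        while current is not None:
            chain.append(current)
            current = parent_of.get(current)
        ancestors[node] = list(reversed(chain))
    return ancestors


def rows_under(target_id: str, ancestors: Dict[str, List[str]]) -> List[str]:
    """Mimic `ARRAY[$1] <@ ancestors`: ids whose ancestor list contains target_id."""
    return sorted(node for node, chain in ancestors.items() if target_id in chain)


# Hypothetical hierarchy: doc-1 has two children, one of which has a child.
parent_of = {
    "doc-1": None,
    "child-a": "doc-1",
    "child-b": "doc-1",
    "grandchild-a": "child-a",
    "doc-2": None,
}
ancestors = build_ancestors(parent_of)
matches = rows_under("doc-1", ancestors)
print(matches)  # ['child-a', 'child-b', 'grandchild-a']
```

Read this way, the filter returns every row beneath the given id, so the cost of the step would plausibly depend on how the ancestors table is populated and indexed; that is an inference from the query text, not a finding from the investigation.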

Also, we have fixed the error with the Cultivated field that Andrew noted. Behind the scenes, that field had been converted in the UC Berkeley master instance from a Boolean to a string field with Yes and No options so it could be searched against. That work wasn't completed on the UCJEPS instance.

Finally, we've added additional RAM to the UCJEPS system.