CineFiles-CSpace gaps
Items include Nancy's initial ranking on a scale of 1-3.
- IN SCOPE: 1 Extract data for CineFiles web site: Nightly process to push new and updated data to web site. Note: It is likely that images and metadata will be made available to the BAMPFA Drupal web site via an XML feed based on denormalized data from CollectionSpace.
- IN SCOPE: 1 Extract images for CineFiles web site: Event handlers and/or batch processes to handle image workflows, including checking for valid files, creating derivatives, moving derivatives, and so on. Note: It is likely that images and metadata will be made available to the BAMPFA Drupal web site via an XML feed based on denormalized data from CollectionSpace.
- IN SCOPE: 1 Pagination of Media Handling records/blobs attached to a Document (collection object) record. This might include some error handling and/or a report to identify pagination and file-naming mistakes. Open question: What happens at the time of CSpace work versus at extraction for the CineFiles web site?
- IN SCOPE: 1 Change the display of related media in secondary tabs and right sidebar so they sort in the right order in CollectionSpace (by page number).
- IN SCOPE: 1 Change the display of related media in the media snapshot (by page number). Remember: PAHMA and UCJEPS want a preferred image checkbox on media handling.
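The page-number ordering and the pagination report mentioned in the items above could be sketched roughly as follows. This is only an illustration, not CSpace code: the function name `order_and_check_pages` and the (page number, identifier) input shape are hypothetical, and it assumes pages are integers numbered from 1.

```python
from collections import Counter

def order_and_check_pages(media):
    """Sort one document's media by page number and flag likely pagination mistakes.

    media: iterable of (page_number, identifier) pairs (hypothetical shape).
    Returns (ordered_media, duplicate_pages, missing_pages).
    """
    ordered = sorted(media, key=lambda m: m[0])  # stable sort by page number
    counts = Counter(page for page, _ in ordered)
    # A page number used more than once is probably a data-entry mistake.
    duplicates = sorted(p for p, n in counts.items() if n > 1)
    # Assumption: pages run from 1 to the highest page number with no gaps.
    missing = sorted(set(range(1, max(counts) + 1)) - set(counts)) if counts else []
    return ordered, duplicates, missing
```

The same check could feed both the display order in CSpace and a report run before extraction to the web site.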
- IN SCOPE: 1 Targets for rights access. Availability on the CineFiles web site is currently determined during the extract of the denorm tables. The access values are more granular than the "post to public" boolean we have used on other deployments (list values: do not publish on the web, publish to UCB only, publish to higher ed, publish to world). Access is set according to publisher (org-corporate or person) but can be overridden by author (person, org-corporate, org-committee) and even by document. We will need to add a field ("default access code" or "publisher access code") to Organization and to Person, since either can be a publisher or an author. An additional field on Document will be needed for the occasional document-level override.
- Note: For Drupal, there will be open access (including what used to be EDU) and UCB IP addresses. Do we still need a PFA-only level?
- How will the user view the default permission based on publisher (document source, which can come from person or organization) since it is over on the Organization or Person vocabulary?
- IN SCOPE: Add access code field to the disambiguation text
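The override order described above (publisher default, overridden by author, overridden by document) can be sketched as below. This is a minimal sketch under stated assumptions: the function name `resolve_access` and its parameters are hypothetical, a single author-level code is assumed, and only the four list values come from the notes above.

```python
from typing import Optional

# The four list values named in the notes; storing them as plain strings is an assumption.
ACCESS_LEVELS = (
    "do not publish on the web",
    "publish to UCB only",
    "publish to higher ed",
    "publish to world",
)

def resolve_access(publisher_code: Optional[str],
                   author_code: Optional[str] = None,
                   document_code: Optional[str] = None) -> Optional[str]:
    """Return the effective access code for a document.

    The document-level override wins, then the author's code,
    then the publisher's default.
    """
    for code in (document_code, author_code, publisher_code):
        if code is not None:
            if code not in ACCESS_LEVELS:
                raise ValueError(f"unknown access code: {code!r}")
            return code
    return None  # no code at any level; how to treat this case is left open
```

For example, `resolve_access("publish to world", author_code="publish to UCB only")` yields the author's code, because no document-level override is set.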
- IN SCOPE: 1 Structured dates: Derive early and late values from what is entered in displaydate. Assumption: Don't change the schema! Question: Should we look for newer algorithms developed in Java or JavaScript?
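As a rough illustration of deriving early and late values from a display date, the sketch below handles three simple patterns only (a year, a year range, and an English month plus year). It is not the algorithm CSpace or XDB uses; the function name `derive_early_late` and the choice of patterns are assumptions, and anything it does not recognize returns None for manual review.

```python
import calendar
import re
from datetime import date
from typing import Optional, Tuple

# English month names, hard-coded to avoid locale dependence.
MONTHS = {name: i for i, name in enumerate(
    ("january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"), start=1)}

def derive_early_late(display_date: str) -> Optional[Tuple[date, date]]:
    """Return (earliest, latest) dates for a display date, or None if unparsed."""
    text = display_date.strip()
    m = re.fullmatch(r"([12]\d{3})", text)  # e.g. "1975"
    if m:
        year = int(m.group(1))
        return date(year, 1, 1), date(year, 12, 31)
    m = re.fullmatch(r"([12]\d{3})\s*-\s*([12]\d{3})", text)  # e.g. "1975-1980"
    if m:
        start, end = int(m.group(1)), int(m.group(2))
        if start > end:
            return None
        return date(start, 1, 1), date(end, 12, 31)
    m = re.fullmatch(r"([A-Za-z]+)\s+([12]\d{3})", text)  # e.g. "February 1976"
    if m and m.group(1).lower() in MONTHS:
        month, year = MONTHS[m.group(1).lower()], int(m.group(2))
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    return None
```

Because the function only computes values for the existing early and late fields, it is consistent with the assumption that the schema does not change.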
- IN SCOPE: 1.5 Article (part of speech) handling in the Film record is a new field and new functionality. Having sort in CSpace use the article is new. We can reuse an existing field in the Work authority for the article ("Qualifier"). Note: If we put the article in the qualifier field, then CSpace will sort correctly, but the displayed title would not include the article unless we create a derived field that concatenates article and title. But then having that derived title in the search results summary will force sort to work incorrectly (i.e., it will include the article), right?
- Term field (Rename to "Title") = Marriage of Figaro (field becomes required)
- Qualifier field (Rename to "Article") = The (should these be tied to a controlled list?)
- Display Name field = The Marriage of Figaro (derived via batch process, make not required, make read-only)
- On search results, display the Display Name but sort by Term field.
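The split between a stored title, a stored article, and a derived display name can be illustrated as follows. This is a sketch only: the class `WorkTerm` and the function `sorted_for_results` are hypothetical names, and joining elided articles such as "L'" without a space is an assumption not stated in the notes.

```python
from dataclasses import dataclass

@dataclass
class WorkTerm:
    title: str         # Term field, renamed "Title"
    article: str = ""  # Qualifier field, renamed "Article"

    @property
    def display_name(self) -> str:
        """Derived, read-only display name: article followed by title."""
        if not self.article:
            return self.title
        # Assumption: elided articles ("L'") attach directly to the title.
        separator = "" if self.article.endswith("'") else " "
        return f"{self.article}{separator}{self.title}"

def sorted_for_results(terms):
    """Sort by the Title field, ignoring the article, as search results would."""
    return sorted(terms, key=lambda t: t.title.casefold())

films = [
    WorkTerm("Marriage of Figaro", "The"),
    WorkTerm("Avventura", "L'"),
    WorkTerm("Citizen Kane"),
]
for term in sorted_for_results(films):
    print(term.display_name)
```

Sorting on the Title field while showing the Display Name keeps the article visible without letting it affect the order.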
- IN SCOPE: 1.5 Article (part of speech) handling in the Document record is new functionality. We cannot add a field to the Title group, so we might have to replace that field group. Having sort in CSpace use the article is new (but we have a model for doing that via a derived field; see the notes about sorting on film title). We would like to sort on the web site; not sure if it is needed in CSpace.
- Note: Looking at CineFiles, sorting by document title does not seem to be working as expected anyway. On Advanced search by document, enter Moon in Doc Title. The default sort is by publication date. If you change the sort to doc title, 34 documents appear before the properly alphabetized list starting at "Across the moon". Interestingly, the 34 documents are ones without a linked title, meaning you have to request access (I was doing this from home so I wasn't getting default campus access). Presumably, we could fix this problem on the CineFiles web site if that were a priority. Nancy reviewed other systems and found mixed results.
- Alternative 1: Live without doc title sorting that handles articles correctly
- Alternative 2: If doc title sorting by article is required, we would hide the existing Title field group on cataloging and replace it with a non-repeating field group containing three fields:
- Document title (without article)
- Document title article (should these be tied to a controlled list?)
- Document title display name (read-only, derived via batch process)
- IN SCOPE: Resolution: Alternative 2
- IN SCOPE: CineFiles/XDB presents a "publisher" subset of Names records in the Document Source field (i.e., any person, corporate, or committee in Names that has a value in the names.code access code field). CSpace does not have the ability to present a subset of a vocabulary in a field.
- Do some data analysis: How many publishers are persons? How many org-corporate are not publishers? Findings: There are 27253 persons that have null access code (only 54 with an access code). There are 4819 org-corporate names with null access code (and about 3400 with non-null access code meaning they are valid publishers).
- Option 1: Maybe duplicate person-publishers into org-corporate vocab and tie doc source pick list only to org-corporate. Note: Even just using org-corporate will have 4800 names that are not valid publishers.
- Option 2: Produce a separate Publisher vocabulary (not ideal either).
- IN SCOPE: Resolution: Use option 1. Take advantage of disambiguation so we know who is a publisher. Write a report that shows documents with sources who are not publishers. Can do option 2 later if this isn't working. Make sure CSpace has a Jira ticket for filtering term completion/vocabulary selector.
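The report of documents whose sources are not publishers could be approached along these lines. This is a minimal sketch, assuming a source counts as a publisher when it has a non-null access code (as in the data analysis above); the function name and the input shapes are hypothetical.

```python
def documents_with_non_publisher_sources(documents, access_codes):
    """List documents whose source has no access code, i.e. is not a valid publisher.

    documents: iterable of (document_id, source_name) pairs (hypothetical shape).
    access_codes: dict mapping a name to its access code, or None when unset.
    """
    return [
        (doc_id, source)
        for doc_id, source in documents
        if not access_codes.get(source)  # missing, None, or empty code
    ]
```

Run periodically, a report like this would show whether option 1 is holding up or whether a separate Publisher vocabulary (option 2) is needed after all.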
- FUTURE Jira: 1.5 Map in Film Notes. Probably not needed because Film Notes will be in Drupal.
- IN SCOPE: 1.5 Workflow to support pre-publishing of MH/Image records (by students) that need to be approved by someone before moving over to the web site. Need a checkbox on Document for "Cleared", defaulting to No. If someone like Michael does imaging, he would need to remember to "clear" his own documents. We would need a report that finds uncleared Documents; the code that publishes to the web site would need to examine this field; and we might want to add this field to the search results summary so it's very visible. Add to Advanced Search on Document too. Maybe add a simpler workflow later.
- IN SCOPE: Add "Cleared" checkbox (or reuse Post to Public boolean) on Document. We decided to use the existing Record Status field for this, rather than creating a new field. Record Status currently defaults to "New", and can be changed to "Approved" when cleared for publication. Chris will confirm with Nancy if these values are acceptable.
- IN SCOPE: Add the Record Status field to Document (collection object) advanced search
- IN SCOPE: 1.5 Themes: Not used in CineFiles-XDB, but might be a good feature if done right in CSpace. Would it be possible to combine with subject headings or genres? Needs more thought, but could help automate something done by hand now.
- Add Themes concept vocabulary (would be hierarchical in order to support sub-themes).
- IN SCOPE: 9/26/2013: Add concept vocabulary for themes, and add field to document as specified, but do not do data work. Make a FUTURE Jira for data work, or BAMPFA can do it by hand.
- IN SCOPE: Document disambiguation. Maybe no work in CSpace is needed, but we need to double-check what happens with docs that have the same name. Document "display name" derivation for disambiguation. This might only be needed in denorm tables and reports (i.e., no work in CSpace). When looking at a Film (works authority) record, related documents will appear in the right frame under "Used By". We can display the Document title and one additional field in the list, maybe the author, source, or year?
- Author and Title would probably be sufficient, but it's possible we would need to create a derived column that holds more information, like XDB does now.
- IN SCOPE: Author and Title should be OK for record ID and summary in search results and related records.
- To discuss with Glen ? 2 Collect more image metadata on ingest (color model, pixel depth, compression): Glen has this information now, but it's not clear it's needed. Is this relevant to future asset management? Can it be added later?
- To discuss with Nancy and Glen: 2 Copyright holder: A set of screens built in XDB that are not currently used. These functions are currently handled in a FileMaker database; would love to integrate with CSpace when possible.
- Glen and Nancy would like to revisit this because the Access Code model is complex. We could have a field on Document that was "Copyright Holder" and the access code for that person or organization record could determine sharing directly. How would we populate this data on migration?
- Mapping in FileMaker data and functionality might be too big for this exercise.
- IN SCOPE: 9/26 idea: Add a repeating field group Copyright Holders to Documents (copyright holder, copyright date structured, notes) to identify documents where the copyright holder is not the source (publisher). Would we migrate data for that? Going forward, would users enter both source and copyright holder? Talk to Glen about the data currently used for overrides.
- FUTURE: To discuss but maybe a 2. On person do we need checkboxes for: Is Director? Is fully indexed? Needs indexing? This is to support http://www.mip.berkeley.edu/cinefiles/Directors.jsp and http://www.mip.berkeley.edu/cinefiles/BrowseBayAreaFilms.jsp and http://www.mip.berkeley.edu/cinefiles/BrowseDirs.jsp Need workflow for maintaining this information. Currently this is maintained by hand. Most important is tracking the info with checkboxes and allowing info to be extracted to web site. Nice to have: Reports and workflows for identifying updates.
- Future probably
- FUTURE: 2 Asset Management procedure and data.
- Need to have a design discussion about access rights. Needed for DAM requirements that CSpace advisory group is interested in.
- NLG has a FileMaker database for rights clearance (sounds like a new procedure)
- Glen: Could have a field for Copyright Holder (on Document), but (per NLG) this changes over time, so it needs to be repeatable. How would we migrate data into this?
- Future probably
- FUTURE: 2 or 3 Need a "Download image with original filename" button or batch process. Original TIFF or derived JPEG? (Done from web site so probably not important)
- Might work from CSpace already
- FUTURE: 3 Object-persistent templates (document, poster)
- FUTURE: 3 Develop an "Archive to Merritt repository" capability
- FUTURE: Stills data entry so they don't go to CineFiles; point to images in Research Hub.
- Don't do: Create a Place authority for City, State, and Country combinations (on Person, Organization, and other records?)
- Done: Work authority (being incorporated as a contribution to 4.0).
- Don't do (use term completion disambiguation feature instead): Authorities: Derive display names from name parts to help with disambiguation (persons, organizations)
- Don't do (use term completion disambiguation feature instead): Film name disambiguation via a derived display name: In CineFiles, an alternate title is included for disambiguation (e.g., primarily if the main title is not in English, then the English translation is included to help with disambiguation). In CSpace, we can use a non-preferred term in the disambiguation hover.