This is the top-level page for documentation about the web applications that UC Berkeley is developing for its CollectionSpace deployments.
Extending CollectionSpace with Django-based webapps
The UC Berkeley CollectionSpace deployment team has extended the functionality of CollectionSpace using lightweight web applications built with the Python-based Django framework. Berkeley has created a reusable Django project that authenticates against CollectionSpace, providing a starting point for further app development. This code, available on GitHub, simplifies the creation of custom CollectionSpace webapps.
At this point, UC Berkeley has produced a dozen such webapps. Screenshots for a few of them are shown below:
- A public search portal for the UC and Jepson herbaria collections. The UCJEPS search portal opens up the herbaria's vast collection of plant specimens to researchers and other interested parties.
- A point-and-click interface to a variety of "non-CSpace" reports for the UC Botanical Garden. (login required)
- A quick-and-easy interface for PAHMA to upload images and associate them to collection objects. (login required)
- A quick-and-easy generalized search interface, to be used initially by PAHMA and UCBG. (login required in both cases)
A few details about these webapps
The UCJEPS search portal queries a Solr datastore that holds a nightly export of data from CollectionSpace. The speed of Solr and the denormalized nature of the datastore allow the portal to return hundreds of records in seconds.
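As a rough illustration of how a portal might query such a Solr datastore, the sketch below builds a request URL for Solr's standard `select` handler. The host, the core name `ucjeps`, and the field name `genus_s` are assumptions made for the example; they are not taken from the actual UCJEPS configuration.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, core, query, start=0, rows=50):
    """Build a URL for Solr's select handler that returns JSON.

    `start` and `rows` are Solr's standard paging parameters.
    The core and field names passed in by callers are hypothetical.
    """
    params = {"q": query, "start": start, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host, core, and field; adjust to the real deployment.
    print(build_solr_select_url("http://localhost:8983/solr", "ucjeps", "genus_s:Phlox"))
```

The webapp would then fetch this URL and render the returned documents. Because each Solr document is already denormalized, no further joins are needed at query time.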
Included within the search portal is a second webapp called imageserver. Imageserver makes authenticated requests to the UCJEPS CSpace instance to access specimen images. For each search result returned, i.e., for each specimen (cataloging) record that matches the query parameters, a REST Services call retrieves images from related media handling records.
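An authenticated image request of this kind might be prepared as sketched below. This is not the imageserver code: the use of HTTP Basic authentication, the `/blobs/<csid>/content` path, the host name, and the credentials are all assumptions made for illustration and should be checked against the CollectionSpace services documentation for the deployed version.

```python
import base64
import urllib.request


def build_image_request(services_url, blob_csid, username, password):
    """Prepare (but do not send) an authenticated request for one image.

    Assumptions: the services layer accepts HTTP Basic authentication,
    and image content is served under <services_url>/blobs/<csid>/content.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    url = f"{services_url.rstrip('/')}/blobs/{blob_csid}/content"
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    # Hypothetical host, CSID, and proxy account.
    req = build_image_request(
        "https://cspace.example.org/cspace-services", "abc-123", "reader@example.org", "secret"
    )
    print(req.full_url)
```

Keeping the credentials on the server side, as imageserver does, means the browser never sees the CSpace login used to fetch the images.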
The Botanical Garden's iReports webapp is a "standard component" which will eventually become part of the basic suite. It provides access to those iReports for an institution that require parameters CSpace proper cannot provide (i.e., non-CSID values such as text input).
The "Bulk Media Upload" webapp addresses a long-standing need to upload batches of image files and connect them to collection objects. While the implementation supporting PAHMA requires adherence to specific conventions (e.g., the image files must be named using the exact value of the object's Museum Number), the actual application is tiny and easy to apply to other deployments.
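The filename convention can be illustrated with a small sketch that derives a Museum Number from each uploaded file name. The accepted suffixes and the sample numbers are assumptions for the example; the actual webapp may apply additional rules.

```python
from pathlib import PurePath

# Assumed set of accepted image suffixes; the real webapp may differ.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".tif", ".tiff", ".png"}


def museum_numbers_from_filenames(filenames):
    """Map each image file name to the Museum Number it is named after.

    The Museum Number is taken to be the file name without its final
    suffix. Files with other suffixes are skipped.
    """
    mapping = {}
    for name in filenames:
        path = PurePath(name)
        if path.suffix.lower() in IMAGE_SUFFIXES:
            mapping[name] = path.stem
    return mapping


if __name__ == "__main__":
    # Hypothetical file names.
    print(museum_numbers_from_filenames(["1-12345.JPG", "notes.txt", "9-876.1.tif"]))
```

Each resulting Museum Number would then be looked up in CollectionSpace so that the new media record can be related to the matching collection object.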
Background
In 2012-2013, John Lowe developed a set of Python CGI applications for PAHMA to meet some rapidly evolving needs related to the major move that the museum is conducting. In about March 2013, the UCB team decided that it was time to select a more enabling framework and build an environment that would provide an excellent platform for web applications that connect to our CollectionSpace instances. The framework selected was Django. Richard Millet then built a project using Django's "authentication backend" to permit apps to authenticate with CollectionSpace servers. That project is called cspace_django_project.
The cspace_django_project serves as the starter project for local, custom CSpace-Django projects. Using Git and GitHub, local CollectionSpace instances can fork the code to their own repository, clone it, and create a custom project (containing one or more web applications) by making modifications to the clone. The cspace_django_project, a fork of which will reside unchanged in each deployer's repository, can serve as the conduit for general bug fixes and enhancements.
A set of wiki pages, linked to in the next section, documents the procedures involved in extending a CollectionSpace instance using Django webapps.
Installation
- Creating and Maintaining Django Webapps for CollectionSpace
  This is the top-level page describing how to manage the entire process. Start reading here.
- Details on installing and using PyCharm for webapp development
  Preparing your computer to be a development platform: installing prerequisites and PyCharm. This will be your first action item.
- CSpace-Django webapps - setting up a code repository in GitHub
  Preparing a Git project for working on a brand-new CollectionSpace UCB Django project.
- CSpace-Django webapps - setting up Python and Django in a development environment
  Preparing your computer to be a development platform: installing prerequisites and PyCharm.
- CSpace-Django webapps - setting up Solr4
  How to set up a Solr4 server for use with some of the UCB Django webapps.
- CSpace-Django webapps - setting up a production environment using Apache
  FOR SYSADMINS ONLY: Preparing a server computer to run production webapps: installing Apache, Django, and Django-Apache integration.
- CSpace-Django webapps - setting up configuration for your project
  Deploying from your Git project to a server computer that can run webapps (has some overlap with the end of the previous document).
(Editor's note: We plan to fold this document into the previous one, "...setting up a production environment using Apache")
Functionality requirements
- Security: Web apps must prevent SQL injection and must run under HTTPS.
- Some applications will require login with CSpace credentials. Others will be public portals that will use a proxy CSpace login with appropriate permissions.
- Searching
  - Will be supported using Solr, Postgres, or NXQL queries as appropriate.
  - Performance will certainly need to be considered in how queries are done.
  - We will need to allow hierarchical searching (e.g., "find all specimens within the genus Phlox", "find all artifacts from Colombia").
  - Term completion or type-ahead will be needed in some search fields.
- Images: images may or may not need to be publicly accessible (depending on the webapp). This issue will require some analysis.
- Save results as data file: list results should be available as a text (.csv) file for download.
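Two of the requirements above, preventing SQL injection and offering results as a .csv download, can be sketched together. The example uses Python's built-in `sqlite3` as a stand-in for Postgres, and the `specimens` table and its columns are hypothetical. The key point is that user input is passed as a bound parameter rather than concatenated into the SQL string. (Postgres drivers such as psycopg2 use `%s` placeholders rather than `?`, but the principle is the same.)

```python
import csv
import io
import sqlite3


def search_to_csv(conn, genus):
    """Run a parameterized search and return the results as CSV text.

    The `?` placeholder lets the driver handle quoting, so a value such
    as "Phlox' OR '1'='1" is treated as data, not as SQL.
    """
    cursor = conn.execute(
        "SELECT catalog_number, genus, country FROM specimens "
        "WHERE genus = ? ORDER BY catalog_number",
        (genus,),
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([column[0] for column in cursor.description])  # header row
    writer.writerows(cursor)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical in-memory data for demonstration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE specimens (catalog_number TEXT, genus TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO specimens VALUES (?, ?, ?)",
        [("UC1", "Phlox", "USA"), ("UC2", "Quercus", "Mexico")],
    )
    print(search_to_csv(conn, "Phlox"))
```

In a Django view, the returned text could be sent back in a response with a `text/csv` content type so that the browser offers it as a download.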
Open questions
- Do web apps run on the CollectionSpace application server or on separate VMs?
- Do we query the Nuxeo database directly or build out a snapshot elsewhere?
- Django has an ORM. Should we use it, or write raw SQL? There seems to be significant discussion of the advantages and disadvantages (e.g., versus SQLAlchemy).
- If performing SQL queries directly, credentials need to be proxied and secured. Postgres views can provide some isolation of the data.
- Do we need to perform pagination of large search results? What do other sites do?
- For our first prototype application, should we demonstrate hierarchical searching, or should we start with the simplest scenario?
- When there are multiple images related to a collection object, should we show only the "preferred" image (PAHMA customization?) or show them in some order with a prev-next widget?
Design and Implementation Links
The following links illustrate some of the efforts to implement Django-based webapps that support CSpace deployments.
- Preliminary wireframes for a prototype UCJEPS portal, with notes about UCBG differences.
- Design and implementation of the Bulk Media Upload facility, currently used by PAHMA.
- "Generalized Web Portal" design and implementation.