Rough notes from Developer Meeting sessions on Friday, 2012-02-10
The following notes for this session are sparse - randomly selective
Review of the Roadmap to 3.0 Development Priorities page on the wiki
Work on vocabulary and authority control
Chris P: need to inform that work so that it incorporates support for multiple languages
Software wizards
Talk with Chris and Nate about scripts used at the Walker
Hovering over a term in term completion - see its context in the tree
edit section on proposed additions for media handling
3rd bullet on thumbnails
to limit scope to cataloging record
At Berkeley, want to propose structured objects soon
Can't just do this at Berkeley; need to consult with, and even have help from, App and UI
Perhaps do some remote paired programming
Challenges on funding and resources for work not Mellon grant funded
In principle, we can spend IMLS funds at Berkeley to work on these
Version 2.2 Place Authority at least should be far enough along
Makes sense that core dev team should be able to do reviews on contributions to core
Overhead on the project
Chris M and Yura can consult, give us feedback on approaches
but we can't assign tasks to them
Perfectly appropriate for the community to suggest substitutions, or reordering
of priorities on the development roadmap
Patrick gives Megan and Angela some idea of difficulty, time estimates
As more people know how to do more across the system, like Chris and Richard, Yura and Amy ...
More and more, Chris is going to be telling us where to go and how to do stuff
Chris will be working on restructuring of internal app code
UISpec stuff
Eliminating at least part of artificial distinction between procedures and authorities
Will help introduce hierarchy into objects, for example
Talk to Chris if you plan to do app work, because of these changes
Taking a contribution in
As an example for Place:
Chris:
I'm usually asked to integrate near the end of the sprint
That means that I'm busier at the end
Better opportunity to review early on
Would require payloads in XML and JSON
Changes to tests
Resources provided for running tests
Wiki page
On what we each need, in our layers, before reviewing a contribution
Things we want ready before we consider this
Yura:
Similar to Chris
UI is entry point for people to identify problems, new features
Even if issue isn't in the UI itself
That means that I'm busier at the end of the sprint
Better opportunity to review early on
Not much impact on code itself
Addition of some templates, message bundle, configuration file for that record
Presence of the 'fake' pseudo-app layer UISpecs and UISchemas in the 'test' directory
so can run UI app locally without any developer
Just requires grabbing it from the running app
Those are the same files that Chris needs, as well
UISpec, UISchema, dummy payloads (XML, JSON)
Code submission process with GitHub
GitHub will help with this
A true queue of pull requests
I submit a pull request, someone makes some changes, do I need to cancel and re-submit pull request?
Just merge into the branch
It's not like submitting a patch
More like a merge path between the branch the pull request was submitted from and the main branch in the repo
Can add in-line comments
If you're not happy as a reviewer, can close it and ask for rework and resubmission
We should let people know, especially outside of our immediate group, that it was received and some idea of when we might look at it
merging into a branch is work
We could treat it as a workflow, 'these six steps need to be checked off ...'
There is an ordering issue
The app can't accept anything that the services haven't committed
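The ordering issue can be sketched as a simple check. This is an illustration only, not project tooling: `MERGE_ORDER`, `can_accept`, and `next_layer` are hypothetical names, and placing the UI after the app is an assumption (the discussion only states that the app depends on the services).

```python
# Hypothetical merge order for a contribution that touches several layers.
# Only "services before app" comes from the discussion; putting the UI last
# is an assumption made for this sketch.
MERGE_ORDER = ["services", "app", "ui"]


def can_accept(layer, committed):
    """Return True if every layer that precedes `layer` is already committed."""
    position = MERGE_ORDER.index(layer)
    return all(earlier in committed for earlier in MERGE_ORDER[:position])


def next_layer(committed):
    """Return the next layer that may be merged, or None when all are done."""
    for layer in MERGE_ORDER:
        if layer not in committed:
            return layer
    return None
```

For example, `can_accept("app", {"services"})` is true, while `can_accept("app", set())` is false, which is the kind of gate a checklist-style review workflow could enforce.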
Functionality test in this contribution review?
My understanding, you start on the Talk list, wireframes, etc.
Different from actually using it.
Scripted test during QA
The stuff running on a server may have a lot of local extensions
In theory create another tenant
Start out by sticking into a sandbox tenancy, play with it a little on Nightly
What if the contributions aren't self contained? If they need contributions from the core, like special searching?
Will need to be raised early - an up-front thing
With the first several contributions, we'll be playing it by ear
Many contributions may be relatively low-risk
If Nightly is effectively where the core dev is going on, may need a second slice for contributions
for internal team to throw up contributions and do first QA
Contributing team has its own server
Assuming they can open up a server for that
First functionality - look at it in core tenant, or a sandbox tenant
Chris would also like to see performance logs, to ensure no fanout
Jesse: generic VM image in which to test out
Chris at Walker: how far out is it that I can give you a directory
If the core is going to own it, it needs to be a pull request with the changed files
on one, two, or three repositories on the collectionspace project
the mini-build could include some convenience targets to merge these files
back into a tree
queue of submitted contributions should be public
so everyone can see it
broader community won't care about pull requests, and instead
look at wireframes, schemas, etc.
announce on Talk list, put up on a wiki - doesn't matter where it is,
can link to it from email, some page
Use git to manage contributions
branch on own repo
If testing an in-process contribution
Will be a contrib area for things we don't own
What does a community contribution area look like?
May be another repo, within collectionspace the project
Or can just be a public repo (anywhere), "just fetch it from here"
Tension between seeing what's been contributed and having them all in one place
and not having a maintenance headache for the core team
If we had a community repo under collectionspace
can we break it down into sections?
Yura: organization concept, like collectionspace,
then repos within the organization, that have teams assigned
If a trusted organization, could all be in the collectionspace org
each in their own repos
Community repo might not need to be a branch of the core repo
Org:CSpace
UI
App
Services
Contrib
Org:CSpace-UCB
UI
App
Services
Contrib
Can't have partial pull requests; it's all or nothing.
And the file structures at some level, no matter how deeply nested, need to match
Contrib can be links to repos
Main master of forked repo won't have any contributions in there
In your best interests to keep it up to date with main collectionspace master
All the rest will be documented in the branch
If it's a heavyweight contribution, you could have a branch in each layer in your repo.
Create a GitHub project
Your master is kept synced with master on the collectionspace repo
On the contributor's repo, there should be a feature branch
in ui, app, svcs
In the main collectionspace repo,
we create corresponding feature branches for each layer
Org:CSpace
UI
master
Place (forked from Org:Contributor's Place branch)
...
Org:Contributor
UI
master
Place
...
Retire feature branches once pulled fully into core and accepted
On the wiki
there's a contribs area
links to contributors' project pages on GitHub
or instructions
Could tag place
Could later contribute an update to Place
Chris from Walker:
If minibuild were part of each tree
Could conceivably use submodule init
other things beyond services/app/UI layer change:
some free structure on this
import files
ETL template
Perl script to do data cleaning on AAT
etc.
Will evolve over time
Wiki page for contributions
Not by organization, necessarily
Each needs minimal metadata
Description
Institution
Contact info (probably at least email)
Version last tested with
Special system requirements, prerequisites, dependencies, install instructions
Tagging would be nice
Feedback comments go in the Comments on the page
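The minimal metadata listed above could be captured in a small record type. The following is a sketch only: the class name, field names, and the `missing_fields` helper are assumptions for illustration, not an agreed format.

```python
from dataclasses import dataclass, field


@dataclass
class Contribution:
    """One entry on a contributions wiki page (field names are illustrative)."""
    description: str
    institution: str
    contact_email: str
    last_tested_version: str
    requirements: str = ""  # prerequisites, dependencies, install instructions
    tags: list = field(default_factory=list)

    def missing_fields(self):
        """Return the names of required fields that were left blank."""
        required = ["description", "institution", "contact_email",
                    "last_tested_version"]
        return [name for name in required if not getattr(self, name).strip()]
```

An entry with placeholder values might look like `Contribution("Place authority", "Example Museum", "dev@example.org", "2.2", tags=["authority"])`; a page template could refuse entries for which `missing_fields()` is non-empty.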
- Patrick has an image of the whiteboard which can be attached to this page
Field level permissions
An idea for field-level perms
issues with it
- at UI layer, being able to hide or expose fields with templates
least secure, but one way to approach it; may require association of roles with templates
- this is the easiest
- can have map of field level perms that the UI could use to hide the data, regardless of template
- here's what this user is allowed to save
- services can enforce whether a record can be saved
- no-read-access fields have to be hidden altogether in the UI
- involves real work, but do-able
- real problem is with search, if I can do a search and some records come back, I know the data is there
- may not scale, particularly with large cataloging record and n extensions
- proposing that fields come in classes
- by default, are in a non-sensitive class
- could have financial, cultural sensitivity classes, entirely user defined, including their names
- would be configured, with no UI management
- would have associated permissions with them
- could say, "this role is allowed to view financially sensitive fields", etc.
- could create keyword indexes for each class
- at search time, we can say, which indexes does this user have the rights to; can search just the indexes to which they have access
- as an alternative, run a post filter on the results, makes searches brutally slow, requires fetching objects or at least their metadata
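The field-class proposal can be sketched roughly as follows. Everything here is an assumption made for illustration: the class names, role names, field names, and the `keywords_` index naming are invented, and a real implementation would live in services configuration rather than in Python dictionaries.

```python
# All class, role, and field names below are made up for illustration.
DEFAULT_CLASS = "general"  # the non-sensitive class fields fall into by default

# Fields explicitly assigned to a sensitivity class; everything else is general.
FIELD_CLASSES = {
    "valuationAmount": "financial",
    "sacredUseNote": "cultural",
}

# Which classes each role may read, in addition to the default class.
ROLE_CLASSES = {
    "registrar": {"financial", "cultural"},
    "student": set(),
}


def readable_classes(roles):
    """Union of the field classes the given roles may read."""
    allowed = {DEFAULT_CLASS}
    for role in roles:
        allowed |= ROLE_CLASSES.get(role, set())
    return allowed


def searchable_indexes(roles):
    """One keyword index per class; search only those the user may read."""
    return sorted("keywords_" + name for name in readable_classes(roles))


def visible_fields(record, roles):
    """Drop fields whose class the user may not read."""
    allowed = readable_classes(roles)
    return {
        name: value
        for name, value in record.items()
        if FIELD_CLASSES.get(name, DEFAULT_CLASS) in allowed
    }
```

Choosing the indexes before the query runs is what avoids the slow post-filter: a user with only the `student` role never searches the financial index, so a hit cannot reveal that hidden data exists.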
Chris M:
Where within the services schema would you envision marking this against fields?
If services derived from app, would be changing core configuration
Another possibility, a separate section where we declare classes, and refer to the paths (XPaths?) to these fields
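A separate declaration section of this kind might look like the sketch below, which maps each class to the paths of its fields and strips denied classes from a payload. The element names and the `CLASS_PATHS` structure are hypothetical, and the paths use only the limited XPath subset that Python's `xml.etree.ElementTree` supports.

```python
import xml.etree.ElementTree as ET

# Hypothetical declaration section: class name -> paths of the fields in it.
# The element names are invented; paths use ElementTree's limited XPath subset.
CLASS_PATHS = {
    "financial": ["./valuation/amount", "./insurance"],
}


def strip_classes(xml_text, denied_classes):
    """Return the payload with every field in a denied class removed."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for class_name in denied_classes:
        for path in CLASS_PATHS.get(class_name, []):
            for element in root.findall(path):
                parent_of[element].remove(element)
    return ET.tostring(root, encoding="unicode")
```

Keeping the paths outside the core schema means the class assignments can change per tenant without touching the field definitions themselves.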
Chris H:
With multiple tenants?
One of the changes we're making for SaaS support is to separate repositories for each tenant
to help facilitate backup and restore
also solves the problem where at the report level, can see each others' data
Susan:
Can you still have field-level write permission?
The idea is to move away from the unit being the field, no read for some roles, read-only for other roles
No access - no read - problem is the hardest
The read-only problem is not as big a problem - the services already have support for filtering fields out of payloads when they come in
In the UI, using the same fields, but applying a disabled decorator
to make read-only templates
could traverse and make selected fields read only
currently, already have templates where a field is disabled via autocomplete if the user doesn't have rights to the authority
Chris H
One of the main use cases is the student who is supposed to enter data for selected fields in a collectionobject
could have a template, with grayed out fields
first step, "you should only use this template"
second step, "associate roles with templates", restricts use on the UI level
TMS doesn't enforce 'no read' access via search
Nate:
if that's what we launched with, at the beginning, we'd be fine
our use case on this is the valuation problem
on valuations or insurance coverage
Chris H:
most important for us is to restrict students to data entry in selected fields
when app layer fetches an object, will also need info from the services on what template to fetch, for the current user based on their role
relatively straightforward
template based on the login status of the individual, which is available first even before the object is fetched
one of the things people do want to be able to say, is that I'd like the 'coins' template for that object
personalization issue, track for each person, which template did they use to enter the object last
will need some means of fetching full template
and that would need to be restriction-checked
if there are multiple templates legal for the user logged in, which they wish to use
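Template selection along these lines can be sketched as below. The role names, template names, and the precedence rule (explicit request, then last-used template, then first allowed) are assumptions for illustration; the discussion did not settle on a rule.

```python
# Role and template names are hypothetical.
ROLE_TEMPLATES = {
    "student": ["student-entry"],
    "cataloger": ["default", "coins"],
}


def allowed_templates(roles):
    """Templates legal for a user, in role order, without duplicates."""
    allowed = []
    for role in roles:
        for template in ROLE_TEMPLATES.get(role, []):
            if template not in allowed:
                allowed.append(template)
    return allowed


def choose_template(roles, requested=None, last_used=None):
    """Pick a template: explicit request, then last used, then first allowed.

    A request for a template the user's roles do not permit is rejected
    rather than silently replaced.
    """
    allowed = allowed_templates(roles)
    if not allowed:
        raise PermissionError("no template is available for these roles")
    if requested is not None:
        if requested not in allowed:
            raise PermissionError("template not permitted: " + requested)
        return requested
    if last_used in allowed:
        return last_used
    return allowed[0]
```

Because the lookup depends only on the user's roles, it can run at login, before any object is fetched; the restriction check on `requested` covers the case where someone asks for the 'coins' template by name.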
parametrized permissions, how to return the correct data
services also don't support setting permissions on individual objects
Nuxeo is capable of doing that now, but it slows search down a lot
Want people to go away and 'grind on' the idea of having classes of fields
rather than per-field permissions
Chris M: think this sounds good
Knock-on issue of whether one of these fields might be a name or number field for search results summary
Chris P: this will meet our needs
Import
Susan:
After Richard's fixes, import is now accepting 5K imports routinely
though Tomcat needs to be restarted from time to time
Would like to get to 10K and perhaps more in imports
Maybe some issues with macro substitution, ampersand substitutions, etc.
Had seen these issues even with the Java Client Library in the past
Chris P:
Using Talend
Data imported directly from a MSSQL database via a Talend module
Go through transforms, merging together
For output part, were using 'Advanced XML'
Problem with not supporting multiple loops in a tree
Instead, using Java, using JAXB-generated classes
if I have repeated fields, turn them into an object list
lots of modules with code, inside of Talend
write code to map objects
originally was using services APIs to import
now using import service
just using JAXB bindings to create the XML
if anything's changed in the schema, I get a compile error
which helps me find out what to change
just import new libraries in, and see what doesn't compile
is on the main CollectionSpace wiki
Yuteh:
Generating individual files for main record and repeatable structures
Previously using Susan's 10K record merge
Doing merging
XMLMerge works for smaller files like Person
CollectionObject is too big
Richard: it probably can be made to work
Am now splitting every 3K (78 MB, not including any repeating)
have hundreds of files
if I can keep my delta in one file
can run this over and over again on the same delta file
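Splitting a large export into fixed-size batches, as described above, can be sketched as follows. The function names and the file-naming pattern are hypothetical; the batch size of 3,000 in the usage note simply mirrors the figure mentioned in the discussion.

```python
from itertools import islice


def batches(records, size):
    """Yield successive lists of at most `size` records from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def batch_filenames(total_records, size, stem="import"):
    """Name one output file per batch, e.g. import-0001.xml."""
    count = (total_records + size - 1) // size  # ceiling division
    return ["%s-%04d.xml" % (stem, number) for number in range(1, count + 1)]
```

With 7,000 records and a batch size of 3,000, `batches` yields chunks of 3,000, 3,000, and 1,000 records, and `batch_filenames(7000, 3000)` names three files.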
Susan:
Have 5K, but my objects at MMI aren't as large as PAHMA's objects
Patrick:
Pulled all 600K records, denormalized into 1 million rows, for Delphi, 270 MB
Yuteh:
Wrote Java code to strip off 'easy' empty records
Chris P:
Last did this in 1.7, don't recall how many batches, wasn't too bad
Have 46K object records
Chris H:
Talend XML generator has a 'create elements even if empty' checkbox, which is checked ('on') by default
Susan:
Required in groups and lists, perhaps in repeatables
Relations are difficult with custom extensions
Patrick:
Should be able to have generic doctype in there
Richard will think about this
Already marked with a tenant
We don't need that in the doctype
- When filtering relations, could do a stem search
Richard:
Nuxeo shouldn't care, due to its derivation model, if derived from the common doctype
Susan:
When custom tenant isn't there
Richard:
The fix will mean that you won't need to re-import the relation records; you can leave the tenant-qualified doctypes in there
Susan:
Display predicate name in relation not used; different in app layer?
Doesn't appear to be used at all
Richard:
Dan asked for this some time ago
Chris at Walker:
Hooking up Talend right now
Nate:
Sending payloads now using the services
The downside is you can't run it again, without querying whether the object already exists
May not necessarily be a bad fit for us
There was a set of tools that could take various data sources, transform, spit out uniform
Kettle
Susan:
I assemble the XML myself in JavaScript
Kettle lets you make fragments and assemble them in JavaScript
Quick and easy
Patrick:
Talend can import a schema and generate XML in that schema
Nate:
Pre-populate CSIDs with GUIDs?
Susan:
Yes
Richard:
Easier for creating relations
Chris H:
Simple Java method to get a GUID/UUID, which you can put in your CSID in Talend
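Pre-populating CSIDs makes it possible to build relation records before anything is imported. The sketch below uses Python's `uuid` module (the Java call referred to is presumably `java.util.UUID.randomUUID()`). The record layout, the `key` field standing in for a source-system identifier, and the relation field names are illustrative assumptions, not checked against the schemas.

```python
import uuid


def new_csid():
    """Generate an identifier up front, before the record is imported."""
    return str(uuid.uuid4())


def assign_csids(records):
    """Give each record a CSID before import; returns a local-key -> CSID map."""
    csids = {}
    for record in records:
        record["csid"] = new_csid()
        csids[record["key"]] = record["csid"]
    return csids


def build_relation(csids, subject_key, object_key, relation_type="affects"):
    """Build a relation row that refers to records by their pre-assigned CSIDs."""
    return {
        "csid": new_csid(),
        "subjectCsid": csids[subject_key],
        "objectCsid": csids[object_key],
        "relationshipType": relation_type,
    }
```

Since every CSID is known in advance, relations can be generated in the same pass as the records they connect, with no need to query the system for identifiers afterwards.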
Nate:
Collection we're importing is 11K objects
Even if we have to do it again, talking to services is appealing to us
Might look again at Talend, Kettle
Our starting data is in CSV files from FileMaker Pro, I can generate good CDWA Lite data from that
Chris P:
Relations, movements ... not just objects
Susan:
By sheer number, the relations are the most
Patrick:
If use import, you can prepopulate with CSIDs, with all relations using those, etc.
If use services, you will need to retrieve what you imported to get their CSIDs
Speed difference using import - close to an order of magnitude advantage in speed over services
If you're fiddling, that speed difference can be important
A Talend script importing from CDWA Lite would be interesting to many people
Can export a job from Talend, and someone else can look at what you've done
Chris H:
Talend is great, but has its own mindset
Not always clear about what should be shared
Yuteh has been creating some great documentation; e.g. on creating relationships
Chris P:
Has a page on the main wiki about what he did in 1.7
What would be really good is a standalone output module
You can do whatever you need on the import side
But the maintenance is quite high on that, while the schemas are changing
Might be a significant benefit in a monthly implementers' call
Problems go out on the Work or Talk list
But successes don't always get reported or discussed
Richard:
It's possible I could get the Nuxeo shell and/or Webapp installed
If you get the Nuxeo DM webapp and configure it to point to the right repository settings used in CollectionSpace now
You can run it in its own container; it doesn't need to be in Tomcat or the same Tomcat
The configuration settings that are in Tomcat might be enough to figure this out
The worst case might be that you need to shut down / undeploy CollectionSpace while using the console or shell for an export, but you might not need to.
Most valuable?
Nate:
When Richard working with Chris
Ray working with me
When Jesse worked with our implementer in the past
Could be done in Skype
with shared screens
Susan:
Badly need to get rid of old docs
Chris M:
Search on the wiki is very difficult - sometimes need exact titles
e.g. services APIs
Chris H:
Peer sessions are vital
Monthly sessions?
Chris M:
Anyone can use the Adobe Connect space
sultan @ caret maintains it
Nate:
IRC logs valuable
searchable on the wiki