...
Chris P: this will meet our needs
Import
Susan:
After Richard's fixes, import is now accepting 5K imports routinely
Still need to restart Tomcat from time to time
Would like to get to 10K and perhaps more in imports
Maybe some issues with macro substitution, ampersand substitutions, etc.
Had seen these issues even with the Java Client Library in the past
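One possible source of ampersand trouble is field text reaching the XML payload unescaped. The notes do not confirm that this is the cause here, so the following is only a minimal sketch of escaping raw text before it is placed in an import file. The class and method names are hypothetical.

```java
final class XmlEscape {
    private XmlEscape() {}

    /**
     * Escapes the five XML-reserved characters in raw text.
     * Assumes the input has not been escaped yet; text that already
     * contains "&amp;" would be escaped a second time.
     */
    static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, `XmlEscape.escape("Smith & Sons")` yields `Smith &amp; Sons`.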
Chris P:
Using Talend
Data imported directly from an MSSQL database via a Talend module
Go through transforms, merging together
For the output part, we were using 'Advanced XML'
Problem with not supporting multiple loops in a tree
Instead, using Java, using JAXB-generated classes
if I have repeated fields, turn them into an object list
lots of modules with code, inside of Talend
write code to map objects
originally was using services APIs to import
now using import service
just using JAXB bindings to create the XML
if anything's changed in the schema, I get a compile error
which helps me find out what to change
just import new libraries in, and see what doesn't compile
is on the main CollectionSpace wiki
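The "repeated fields become an object list" idea can be sketched in plain Java. This is not Chris P's code: JAXB has not shipped with the JDK since Java 11, so the sketch substitutes the built-in StAX writer and therefore does not give the compile-time schema checking described above. The element names (`record`, `title`, `names`, `name`) and the class name are hypothetical.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class RepeatedFieldXml {
    private RepeatedFieldXml() {}

    /** Writes one record whose repeated "name" field is passed in as a list. */
    static String toXml(String title, List<String> names) {
        StringWriter buffer = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            xml.writeStartElement("record");
            xml.writeStartElement("title");
            xml.writeCharacters(title);      // the writer escapes & and < in text
            xml.writeEndElement();
            xml.writeStartElement("names");  // wrapper for the repeating group
            for (String name : names) {      // one child element per list entry
                xml.writeStartElement("name");
                xml.writeCharacters(name);
                xml.writeEndElement();
            }
            xml.writeEndElement();           // </names>
            xml.writeEndElement();           // </record>
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write record XML", e);
        }
        return buffer.toString();
    }
}
```

With JAXB-generated classes the loop would instead add objects to the generated list property, and a schema change would surface as a compile error rather than at run time.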
Yuteh:
Generating individual files for main record and repeatable structures
Previously using Susan's 10K record merge
Doing merging
XMLMerge works for smaller files like Person
CollectionObject is too big
Richard: it probably can be made to work
Am now splitting every 3K (78 MB, not including any repeating)
have hundreds of files
if I can keep my delta in one file
can run this over and over again on the same delta file
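Splitting a large record set into fixed-size batches, as described above, might look like the following. This is a generic sketch, not Yuteh's code. The class name is hypothetical, and 3,000 is simply the batch size mentioned in the notes.

```java
import java.util.ArrayList;
import java.util.List;

final class RecordBatcher {
    private RecordBatcher() {}

    /** Splits records into consecutive batches of at most batchSize items. */
    static <T> List<List<T>> split(List<T> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(records.subList(start, end)));
        }
        return batches;
    }
}
```

Each batch can then be written to its own import file. The last batch holds whatever remains.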
Susan:
Have 5K, but my objects at MMI aren't as large as PAHMA's objects
Patrick:
Pulled all 600K records, denormalized into 1 million rows, for Delphi, 270 MB
Yuteh:
Wrote Java code to strip off 'easy' empty records
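The notes do not record what counted as an 'easy' empty record. As an illustration only, the sketch below treats an element as empty when it has no attributes, no non-blank text, and no non-empty child elements, and removes such elements from a document. The class and method names are hypothetical, and this is not Yuteh's code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class EmptyElementStripper {
    private EmptyElementStripper() {}

    /** Returns the XML with empty descendant elements removed; the root is always kept. */
    static String strip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            prune(doc.getDocumentElement());
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process XML", e);
        }
    }

    /** Removes empty child elements bottom-up; returns true if this element is itself empty. */
    private static boolean prune(Node element) {
        boolean hasContent = element.hasAttributes();
        Node child = element.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before a possible removal
            short type = child.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                if (prune(child)) {
                    element.removeChild(child);
                } else {
                    hasContent = true;
                }
            } else if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                if (!child.getNodeValue().trim().isEmpty()) {
                    hasContent = true;
                }
            }
            child = next;
        }
        return !hasContent;
    }
}
```

Whether dropping empty elements is safe depends on the schema. As noted later in the call, some empty elements may be required in groups and lists.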
Chris P:
Last did this in 1.7, don't recall how many batches, wasn't too bad
Have 46K object records
Chris H:
Talend XML generator has a 'create elements even if empty' checkbox, which is checked ('on') by default
Susan:
Required in groups and lists, perhaps in repeatables
Relations are difficult with custom extensions
Patrick:
Should be able to have generic doctype in there
Richard will think about this
Already marked with a tenant
We don't need that in the doctype
When filtering relations, could do a stem search
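A stem search over doctypes could be as simple as a prefix match. The sketch below assumes that a tenant-qualified doctype is the common doctype name followed by a suffix. That naming pattern, the sample values in the usage note, and the class name are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class DoctypeStemFilter {
    private DoctypeStemFilter() {}

    /** Keeps doctypes that equal the stem or extend it with a suffix. */
    static List<String> matching(List<String> doctypes, String stem) {
        List<String> hits = new ArrayList<>();
        for (String doctype : doctypes) {
            if (doctype.startsWith(stem)) {
                hits.add(doctype);
            }
        }
        return hits;
    }
}
```

With the hypothetical values `CollectionObject`, `CollectionObjectTenant5`, and `Movement`, the stem `CollectionObject` matches the first two. A plain prefix match also accepts any unrelated doctype that happens to share the stem, so a real filter may need a stricter rule.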
Richard:
Nuxeo shouldn't care, due to its derivation model, if derived from the common doctype
Susan:
When custom tenant isn't there
Richard:
The fix will mean that you won't need to re-import the relation records; you can leave the tenant-qualified doctypes in there
Susan:
Display predicate name in relation not used; different in app layer?
Doesn't appear to be used at all
Richard:
Dan asked for this some time ago
Chris at Walker:
Hooking up Talend right now
Nate:
Sending payloads now using the services
The downside is you can't run it again, without querying whether the object already exists
May not necessarily be a bad fit for us
There was a set of tools that could take various data sources, transform them, and spit out uniform output
Kettle
Susan:
I assemble the XML myself in JavaScript
Kettle lets you make fragments and assemble them in JavaScript
Quick and easy
Patrick:
Talend can import a schema and generate XML in that schema
Nate:
Pre-populate CSIDs with GUIDs?
Susan:
Yes
Richard:
Easier for creating relations
Chris H:
Simple Java method to get a GUID/UUID, which you can put in your CSID in Talend
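A minimal sketch of such a method, using the standard `java.util.UUID` class. The wrapper class and method name are hypothetical. In Talend the call would go wherever the job populates the CSID field.

```java
import java.util.UUID;

final class CsidGenerator {
    private CsidGenerator() {}

    /** Returns a random (version 4) UUID in its 36-character string form. */
    static String newCsid() {
        return UUID.randomUUID().toString();
    }
}
```

Because the identifier is known before import, relation records can refer to it without a round trip to the services.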
Nate:
Collection we're importing is 11K objects
Even if we have to do it again, talking to services is appealing to us
Might look again at Talend, Kettle
Our starting data is in CSV files from FileMaker Pro, I can generate good CDWA Lite data from that
Chris P:
Relations, movements ... not just objects
Susan:
By sheer number, the relations are the most numerous
Patrick:
If you use import, you can prepopulate with CSIDs, with all relations using those, etc.
If you use services, you will need to retrieve what you imported to get their CSIDs
Import is close to an order of magnitude faster than services
If you're fiddling, that speed difference can be important
A Talend script importing from CDWA Lite would be interesting to many people
Can export a job from Talend, and someone else can look at what you've done
Chris H:
Talend is great, but has its own mindset
Not always clear about what should be shared
Yuteh has been creating some great documentation; e.g. on creating relationships
Chris P:
Has a page on the main wiki about what he did in 1.7
What would be really good is a standalone output module
You can do whatever you need on the import side
But the maintenance is quite high on that, while the schemas are changing
Might be a significant benefit in a monthly implementers' call
Problems go out on the Work or Talk list
But successes don't always get reported or discussed
Richard:
It's possible I could get the Nuxeo shell and/or Webapp installed
If you get the Nuxeo DM webapp and configure it to point to the repository settings that CollectionSpace uses now, you can run it in its own container; it doesn't need to be in Tomcat, or in the same Tomcat
The configuration settings that are in Tomcat might be enough to figure this out
The worst case might be that you need to shut down / undeploy CollectionSpace while using the console or shell for an export, but you might not need to.