Botanical propagation data analysis
Major Questions
Goals for data parsing:
- If subsequent treatment record is a location only and matches the date of a preceding treatment or activity record, then merge the location into a new field. The new UI has a field for Nursery Location.
- In the new CSpace UI design, concentration from the SAGE treatment table is moving up to CSpace propagation concentration and only being used for hormone treatments. However, there are propagations that have multiple treatments that have different concentration values and also different hormones. Do we need to add hormone and concentration back into the CSpace Event (activity) record? Or maybe the hormone remains the Event Type (which means we can't merge those treatment types). Do we have concentration values for treatments that are not hormones (e.g., fungicide)?
- The new model proposes merging Number Started from Propagation records and Propagule Count in Activity (growh_history) records. It seems like we need a field at the Propagation level to hold the legacy Number Started information (because there are 2884 conflicting records), but data analysis might help make sense of this.
- Can Treatment Types be merged by their Treatment Category so that we don't have so many Event Types in CSpace? We might have to ask UCBG about this.
- Can we derive the new top-level propagation value for the seeds/scar-strat dropdown? Use UDF? Need data analysis.
- Can we derive the new top-level propagation value for the cuttings/cutting type dropdown? Use UDF? Need data analysis.
- Can we derive the new top-level propagation value for the cuttings/hormone dropdown? Use UDF? Need data analysis.
- Can we derive the new top-level propagation value for the cuttings/concentration dropdown? Use UDF? Need data analysis.
Also: Search for "ASKUCBG" in notes.
propType drop down - Susan wants to prune the drop down list to three values: seeds, cuttings, live plants, unsure what the mapping is for the rest of the values, may go to new fields we're adding (cutting type?). FINDING: Several other values might map to seeds, cuttings, live plants. Or we can concatenate into propagation comments. SOme values might go into Cutting type. Also, if we can make inactive terms not appear in drop down lists, then we are in good shape. Â
** = legacy read-only field might be needed
** numberStarted in SAGE propagation table is moving down to CSpace activity propaguleCount (merged where possible). FINDING: There are 2884 situations where propagation.number_started differs from growth_history.propagule_count. What do we do? Presumably we take the growth_history.propagule_count value, but we then lose propagation.number_started.
propaguleCount in SAGE treatment table is moving up to CSpace activity propaguleCount (merged where possible)
order in SAGE treatment table is moving up to CSpace activity order (collated). FINDING: Ignore treatment.propagule_count.
type in SAGE treatment table is moving up to CSpace activity type - though values from both of these might move up to new fields 'scar/stratification' in propagation, 'hormone' in propagation, 'location in nursery' in activity, or 'fungicide' in activity. FINDING: Might merge treatment types by Treatment Category.
notes in SAGE treatment table is moving up to CSpace activity comments. Finding: We're OK since we are getting the union of SAGE activity records and SAGE treatment records.
germination/rooting date in SAGE activity table is moving up to CSpace propagation germination/rooting date - will have to do data analysis to make sure there is one per propagation. FINDING: Always null so can ignore!
successRate in SAGE activity table is moving up to CSpace propagation successRate - will have to do data analysis to make sure there is one per propagation. FINDING: Only 3 propagations have different growth_history.percent_success values (416 for 92.1029; 746 for 81.0829; 2836 for 2002.0330)Â If we can clean those up (set one value), then we'll be OK
treatment dates in SAGE treatment table are moving up to CSpace activity dates. Finding: We're OK since we are getting the union of SAGE activity records and SAGE treatment records..
** concentration in SAGE treatment table is potentially moving up to CSpace propagation concentration - will have to do data analysis to make sure there is one per propagation - or we can keep this as a hidden field in the activity group for historical data. FINDING: There are approximately 20 propagation records that have treatments with different concentration values. Might need a legacy field if these can't be "cleaned up" or if it doesn't make sense to clean them up (examples: 567 for 90.0231; 1551 for 77.0717). Does this break the model? ASKUCBG.
OTHER MAJOR QUESTIONS:
- If treatment type is one of the locations (e.g., lath house), we want to move that to the Location in Nursery field. What are all the location treatment types? Ideally, we would merge these with the matching activity-treatment record, but these might be hard to find.
- How do we derive the new top-level propagation value for the seeds/scar-strat dropdown? Use UDF? Need data analysis.
- How do we derive the new top-level propagation value for the cuttings/cutting type dropdown? Use UDF? Need data analysis.
- How do we derive the new top-level propagation value for the cuttings/hormone dropdown? Use UDF? Need data analysis.
- How do we derive the new top-level propagation value for the cuttings/concentration dropdown? Use UDF? Need data analysis.
Notes from analysis
spent some quality time this afternoon analyzing data in the propagation tables for the Botanical Garden. We have a situation where they have requested a design that is significantly different from what they use now in SAGE. Effectively they are going from a situation where three core entities support the information (propagation efforts can have multiple activities; activities can have multiple treatments) to one where they have two entities (propagation efforts can have multiple activities). So the treatment data is getting moved around in different places. This is certainly a simpler model for new data going into the system (the old one was too complex), but we needed to do some quick data analysis to see what this would mean for legacy data. Â
The simplest approach (though still hard) is to combine records in activity and treatment together and map those fields to new CSpace fields. But we knew of several cases where they wanted to do some more involved mapping to combine multiple legacy activity-treatment records into one CSpace activity record. In my notes below I basically conclude that it is not feasible to clean up data like this, that in fact it does not buy much and is a very expensive operation. Â
Lam and Amy, you should take a look at this and see if you concur or think we should look at anything else. Â
Amy, I am certain we'll have to add some of those fields to hold the legacy treatment_type and such. As discussed I think it might be good to make those read-only and clearly labeled ("SAGE Treatment" or something like that). Â
John, Glen, and Ray, I'd definitely be interested in your thoughts here, though I'm not expecting you to spend serious time on this.
Thanks,
Chris
--------
SAGE schema (all tables have create and update info too)
propagation: 6738 recs
prop_id
accession_number
prop_date
early_prop_date
late_prop_date
prop_type
purpose
number_started
prop_comments
extra_seed_flag
gh_sequence
growth_history: 19929
prop_id
growth_id
activity_date
activity_begin_date
activity_end_date
activity_type_id
propagule_count
pot_type
pot_size_id
percent_success
growth_comments
medium_name
germination_date
treatment: 20451
prop_id
growth_id
treatment_type_id
concentration
begin_date
end_date
propagule_count
treatment_notes
display_sequence
This query gets the activity and treatment display names too:
NOTE: outside joins needed because there are activities that don't have treatments attached
select gh.prop_id, gh.growth_id, gh.activity_type_id, atc.activity_type, t.display_sequence treatment_seq, t.treatment_type_id, tt.treatment_type_name from growth_history gh join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id) left outer join treatment t on (gh.prop_id=t.prop_id and gh.growth_id=t.growth_id) left outer join treatment_type tt on (t.treatment_type_id=tt.treatment_type_id) order by gh.prop_id, gh.growth_id, t.display_sequence
with data like:
25   1   4   started   1   16   benlate
25   1   4   started   2   25   lesan
25   1   4   started   3   15   prop house
25   1   4   started   4   1   bottom heat
25   2   14   results   (null)   (null)   (null)
25   3   2   changed pot   1   15   prop house
25   4   14   results   1   15   prop house
25   5   2   changed pot   1   14   holding area
25   6   3   planted out   (null)   (null)   (null)
showing 6 activities, the first of which has 4 treatments. two activities have no joined treatment records. Without hard work this would end up as 9 activities in CSpace for this activity. And we would be adding the read-only legacy treatment type fields, etc.
How are activity_types distributed?
select atc.activity_type, count\(*) from growth_history gh, activity_type_code atc where gh.activity_type_id=atc.activity_type_id group by activity_type order by count\(*) desc
potted   4359
started   3620
changed pot   3256
results   3139
pretreat   2210
planted out   1120
changed location   1001
applied treatment   544
dead   358
discarded   145
changed medium   62
repotted   61
checked pot   21
divided   10
plant sale   7
moved   4
sow   3
cold strat   2
cutting taken   2
surplus   1
damped off   1
distributed   1
deaccessioned   1
returned to nursery   1
Treatment types, interesting, there is a treatment category field that identifies treatment types in several categories: null, disinfectant, environment, fertilizer, fungicide, hormone, none, pesticide, physical, scarification, stratification
How are treatment types distributed?
select tt.treatment_type_name, count\(*) from treatment t, treatment_type tt where t.treatment_type_id=tt.treatment_type_id group by tt.treatment_type_name order by order by count\(*) desc
lath house   4346
prop house   3119
intermittent mist   1677
cold stratification   1338
bottom heat   1309
prop house 1Â Â Â 1128
holding area   914
benlate   845
physan wash   813
lights   747
dip and grow   630
physan   590
IBAÂ Â Â 543
soak in water   431
benlate & captan   261
IBA talc   250
benlate & lesan   211
warm stratification   162
research greenhouse   135
Hormex   97
sulphuric acid   95
water tray   92
prop house shade   90
smoke plus   79
subdue   58
shade house   51
boiling water on soil surface   50
scarify   40
GA3Â Â Â 32
lesan   31
Rootone   25
Truban   25
prop house 2 shade   24
hormone   18
benlate & physan   17
benlate & subdue   17
glass cover   16
physan soak   12
nursery   11
wardian case   11
lath house shade   10
Nurs   7
wound cuttings   7
boiling water   6
captan   5
fungicide   5
stratification   5
2-4-2Â Â Â 4
freeze   4
microwave oven   4
osmocote   4
pericarp removed   4
outside prop house   3
refrigerator   3
central prop house   2
fire   2
heat   2
liquid fertilizer   2
pericarp trimmed   2
physan dip   2
rear prop house   2
Serenade   2
bleach   1
bright location   1
cactus house   1
chipped   1
clorox soak   1
darkness   1
drenched in hot water   1
dusted with sulphur   1
force dormant   1
gravel mulch   1
green house #2Â Â Â 1
iron sulphate   1
metal ponds   1
NAAÂ Â Â 1
pericarp ruptured   1
prop house 2Â Â Â 1
seed coat removed   1
shade   1
sterilized   1
sulphur   1
under mist bench   1
wash & clean   1
water & let go dry   1
water basin   1
There must be 4 unused treatments (91 in the table). Many of the top ones are related to location (lath house, prop house)
OK, so how are counts of combined activity-treatment records distributed?
select atc.activity_type, tt.treatment_type_name, count\(*) from growth_history gh join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id) left outer join treatment t on (gh.prop_id=t.prop_id and gh.growth_id=t.growth_id) left outer join treatment_type tt on (t.treatment_type_id=tt.treatment_type_id) group by atc.activity_type, tt.treatment_type_name order by count\(*) desc
275 rows, but here they are. See Excel attachment too.
results   (null)   3089
potted   lath house   1822
potted   intermittent mist   1572
changed pot   lath house   1377
planted out   (null)   1116
started   cold stratification   1102
started   prop house   996
potted   prop house   858
changed pot   prop house   857
pretreat   physan wash   792
started   bottom heat   697
started   benlate   613
pretreat   dip and grow   613
started   lath house   567
pretreat   IBA   527
started   physan   526
started   lights   497
changed location   lath house   484
started   prop house 1   473
potted   holding area   447
pretreat   soak in water   428
changed pot   (null)   361
dead   (null)   355
potted   (null)   353
changed location   prop house   320
potted   bottom heat   316
changed pot   holding area   308
potted   prop house 1   255
changed pot   prop house 1   248
pretreat   IBA talc   247
changed location   bottom heat   226
applied treatment   cold stratification   215
discarded   (null)   145
started   benlate & lesan   133
changed location   prop house 1   130
started   warm stratification   128
potted   lights   125
started   holding area   119
pretreat   benlate   111
applied treatment   benlate & captan   108
pretreat   sulphuric acid   95
pretreat   Hormex   94
pretreat   smoke plus   77
changed pot   benlate & captan   68
started   prop house shade   64
started   research greenhouse   64
changed location   lights   58
changed pot   lights   57
started   water tray   56
started   intermittent mist   54
started   (null)   53
pretreat   physan   53
started   benlate & captan   53
potted   benlate   52
applied treatment   bottom heat   50
started   boiling water on soil surface   48
changed pot   research greenhouse   40
changed pot   shade house   38
applied treatment   benlate   37
applied treatment   prop house   37
pretreat   scarify   34
applied treatment   benlate & lesan   33
pretreat   GA3   31
potted   water tray   31
changed location   holding area   30
applied treatment   warm stratification   30
changed pot   benlate   29
applied treatment   subdue   29
pretreat   Rootone   25
changed medium   (null)   24
potted   benlate & captan   24
changed medium   lath house   24
potted   research greenhouse   24
results   lath house   22
potted   prop house shade   22
changed pot   benlate & lesan   21
started   lesan   19
results   prop house   19
changed location   (null)   18
started   Truban   18
pretreat   hormone   18
repotted   lath house   18
changed pot   intermittent mist   18
repotted   prop house   17
started   benlate & physan   17
started   prop house 2 shade   16
pretreat   (null)   15
started   glass cover   15
applied treatment   IBA   15
pretreat   intermittent mist   15
changed pot   subdue   14
changed pot   bottom heat   14
potted   benlate & subdue   13
potted   cold stratification   13
checked pot   (null)   12
potted   shade house   12
pretreat   benlate & lesan   12
repotted   (null)   11
repotted   prop house 1   11
applied treatment   lath house   11
applied treatment   physan wash   11
potted   subdue   10
changed pot   nursery   10
potted   benlate & lesan   10
applied treatment   dip and grow   9
started   physan wash   8
checked pot   lath house   8
applied treatment   lesan   8
changed medium   prop house   8
plant sale   (null)   7
pretreat   lath house   7
pretreat   physan soak   7
changed location   wardian case   7
applied treatment   (null)   6
started   dip and grow   6
potted   lath house shade   6
applied treatment   scarify   6
pretreat   benlate & captan   6
changed location   intermittent mist   6
potted   physan   5
changed pot   Nurs   5
pretreat   stratification   5
applied treatment   lights   5
repotted   intermittent mist   5
changed location   prop house 2 shade   5
changed location   research greenhouse   5
moved   (null)   4
divided   (null)   4
potted   Truban   4
repotted   subdue   4
started   physan soak   4
results   holding area   4
started   boiling water   4
pretreat   microwave oven   4
pretreat   wound cuttings   4
applied treatment   captan   4
applied treatment   osmocote   4
pretreat   cold stratification   4
applied treatment   intermittent mist   4
divided   prop house 1   3
changed pot   water tray   3
started   wound cuttings   3
applied treatment   physan   3
pretreat   pericarp removed   3
potted   warm stratification   3
changed medium   holding area   3
sow   (null)   2
cold strat   (null)   2
cutting taken   (null)   2
potted   lesan   2
pretreat   fire   2
pretreat   heat   2
started   freeze   2
dead   prop house   2
pretreat   lights   2
repotted   benlate   2
changed pot   2-4-2   2
divided   lath house   2
potted   physan wash   2
potted   dip and grow   2
potted   wardian case   2
pretreat   physan dip   2
pretreat   prop house   2
pretreat   bottom heat   2
results   prop house 1   2
started   refrigerator   2
started   wardian case   2
planted out   lath house   2
applied treatment   2-4-2   2
applied treatment   Hormex   2
started   lath house shade   2
pretreat   pericarp trimmed   2
repotted   benlate & captan   2
results   intermittent mist   2
applied treatment   IBA talc   2
applied treatment   Serenade   2
started   outside prop house   2
applied treatment   fungicide   2
changed location   water tray   2
changed medium   prop house 1   2
applied treatment   smoke plus   2
applied treatment   holding area   2
applied treatment   prop house 1   2
changed pot   prop house 2 shade   2
applied treatment   boiling water   2
applied treatment   soak in water   2
changed location   lath house shade   2
changed location   prop house shade   2
applied treatment   benlate & subdue   2
applied treatment   liquid fertilizer   2
applied treatment   central prop house   2
changed location   cold stratification   2
surplus   (null)   1
damped off   (null)   1
distributed   (null)   1
deaccessioned   (null)   1
potted   Nurs   1
results   IBA   1
pretreat   NAA   1
potted   Hormex   1
started   shade   1
divided   lights   1
pretreat   lesan   1
results   lights   1
started   captan   1
started   subdue   1
dead   lath house   1
potted   IBA talc   1
pretreat   Truban   1
pretreat   bleach   1
pretreat   freeze   1
results   benlate   1
started   sulphur   1
dead   bottom heat   1
potted   fungicide   1
pretreat   chipped   1
sow   prop house 1   1
changed pot   lesan   1
changed pot   Truban   1
changed pot   physan   1
potted   metal ponds   1
potted   water basin   1
pretreat   fungicide   1
repotted   fungicide   1
started   sterilized   1
potted   cactus house   1
potted   gravel mulch   1
results   bottom heat   1
started   clorox soak   1
potted   force dormant   1
applied treatment   GA3   1
changed location   Nurs   1
changed medium   lights   1
pretreat   wash & clean   1
repotted   holding area   1
started   soak in water   1
checked pot   prop house   1
dead   intermittent mist   1
planted out   prop house   1
potted   rear prop house   1
pretreat   iron sulphate   1
potted   under mist bench   1
results   benlate & lesan   1
started   bright location   1
started   rear prop house   1
applied treatment   Truban   1
applied treatment   freeze   1
changed location   nursery   1
started   benlate & subdue   1
potted   outside prop house   1
potted   water & let go dry   1
applied treatment   darkness   1
changed medium   bottom heat   1
changed pot   green house #2   1
potted   dusted with sulphur   1
pretreat   pericarp ruptured   1
pretreat   seed coat removed   1
results   prop house 2 shade   1
divided   research greenhouse   1
changed location   shade house   1
changed pot   benlate & subdue   1
changed pot   prop house shade   1
repotted   research greenhouse   1
applied treatment   glass cover   1
applied treatment   physan soak   1
changed location   prop house 2   1
changed location   refrigerator   1
started   drenched in hot water   1
returned to nursery   lath house   1
planted out   cold stratification   1
changed location   benlate & lesan   1
applied treatment   pericarp removed   1
applied treatment   prop house shade   1
pretreat   boiling water on soil surface   1
applied treatment   boiling water on soil surface   1
What we'd be really interested in finding out is if we can identify the situations where an activity has multiple treatment records, but the second is just a location. Then we could fold those together. But what if there are more than two? Use the date? What if there are only two but they have different dates? Then maybe they should be separate CSpace records. Ugh, too hard. Maybe just find all combinations of activity and treatment. Are there activities that don't have treatment records? Yes! Queries adjusted to do outside join to treatment table.
FYI, the number of records produced by getting all combinations of activity and treatment:
select count\(*) from growth_history gh join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id) left outer join treatment t on (gh.prop_id=t.prop_id and gh.growth_id=t.growth_id) left outer join treatment_type tt on (t.treatment_type_id=tt.treatment_type_id)
26027
growth_history has 19.9K and treatment has 20.5K so by loading all combinations we are not growing the number of records by very much beyond the ideal. Maybe you could argue that this is 5500K extra records.
---
Next day:
Propagation type
SELECT prop_type, count(*) FROM dbo.propagation group by prop_type
seed   3630
cutting   1704
living plant   1104
division   111
bulbs   88
seedlings   33
corms   17
bulb   12
transplant   11
corm   6
rooted pieces   5
spores   5
seedling   3
bulblets   3
rooted cuttings   2
bulb scales   2
bare root   1
tuber   1
ASKUCBG: Susan wants to reduce prop type to three values. Can we map these existing values to those three? That is, if we can't figure out how to not display inactive terms?
Germination Date
Good news: Germination date is always null!
Success rate
successRate in SAGE activity table is moving up to CSpace propagation successRate - will have to do data analysis to make sure there is one per propagation
SELECT p.prop_id, accession_number, prop_date, prop_type, gh.growth_id, gh.activity_type_id, atc.activity_type, gh.percent_success FROM dbo.propagation p left outer join growth_history gh on (p.prop_id = gh.prop_id) left outer join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id) order by prop_id, growth_id
When there is a non-null percent_success value it looks like the activity type is results. However, success_value can still be null even if activity type is "results"
The question is: Can there be different activity.percent_success values for a single propagation record?
SELECT p.prop_id, accession_number, atc.activity_type, gh.percent_success FROM dbo.propagation p left outer join growth_history gh on (p.prop_id = gh.prop_id) left outer join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id) where percent_success is not null order by prop_id, growth_id
Only 3 propagations have different results in growth_history records. Â
416   92.1029     results   5
416   92.1029     results   26
746   81.0829     results   71
746   81.0829     changed pot   99
746   81.0829     changed pot   99
2836   2002.0330    potted   5
2836   2002.0330    results   89
Concentration
** concentration in SAGE treatment table is potentially moving up to CSpace propagation concentration - will have to do data analysis to make sure there is one per propagation - or we can keep this as a hidden field in the activity group for historical data
SELECT p.prop_id, accession_number, atc.activity_type, t.display_sequence treatment_seq, tt.treatment_type_name, t.concentration FROM dbo.propagation p left outer join growth_history gh on (p.prop_id = gh.prop_id) left outer join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id) left outer join treatment t on (gh.prop_id=t.prop_id and gh.growth_id=t.growth_id) left outer join treatment_type tt on (t.treatment_type_id=tt.treatment_type_id) where t.concentration is not null order by prop_id, gh.growth_id, display_sequence
NOTE: Might use these queries to create merge fragments to merge with main propagation SQL. Otherwise have to find a way to get the non-null values, etc., in the main query.
NOTE: Might need to have the treatment_type come along in this query
567   90.0231     pretreat   2   dip and grow   1:5
567   90.0231     pretreat   3   IBA   0.8
610   83.1172     pretreat   2   dip and grow   1:3
610   83.1172     potted   2   bottom heat   1:3
610   83.1172     potted   3   intermittent mist   1:3
670   93.1051     pretreat   1   dip and grow   1:5
670   93.1051     pretreat   2   IBA talc   4.5
1402   94.0310     pretreat   2   dip and grow   1:10
1402   94.0310     pretreat   3   IBA talc   4.5
1478   91.1474     pretreat   1   dip and grow   1:5
1478   91.1474     potted   1   dip and grow   1:5
1551   77.0717     pretreat   2   dip and grow   1:5
1551   77.0717     pretreat   3   IBA   4.5
So I'm only about a 3rd through the 1375 records and there are 6 propagations that have different concentration values (and different treatment_types too)
Are the concentrations unique within an activity?
Here's an interesting query:
SELECT
   p.prop_id,
   tt.treatment_type_name,
   t.concentration, count(*)
FROM
   dbo.propagation p
left outer join growth_history gh on (p.prop_id = gh.prop_id)
left outer join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id)
left outer join treatment t on (gh.prop_id=t.prop_id and gh.growth_id=t.growth_id)
left outer join treatment_type tt on (t.treatment_type_id=tt.treatment_type_id)
group by p.prop_id,
   tt.treatment_type_name,
   t.concentration
having count(*) > 1
order by prop_id
---
10/9
Combined activity-treatment records
select gh.prop_id, gh.growth_id, gh.activity_date, gh.activity_begin_date, gh.activity_end_date, gh.activity_type_id, atc.activity_type, gh.propagule_count, gh.pot_type, gh.pot_size_id, gh.percent_success, gh.growth_comments, gh.medium_name, gh.germination_date, gh.date_entered, gh.entered_staff_id, gh.last_change_staff_id, gh.last_change_date, t.display_sequence, t.treatment_type_id, tt.treatment_type_name, t.concentration, t.begin_date, t.end_date, t.propagule_count, t.treatment_notes, t.date_entered, t.entered_staff_id, t.last_change_date, t.last_change_staff_id from growth_history gh join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id) left outer join treatment t on (gh.prop_id=t.prop_id and gh.growth_id=t.growth_id) left outer join treatment_type tt on (t.treatment_type_id=tt.treatment_type_id) order by gh.prop_id, gh.growth_id, t.display_sequence
ASKUCBG: Activity date in CSpace is calendar, but there are fuzzy dates in SAGE. 1058 dates of 18.5K are fuzzy.
Propagule count GH and T
Reconcile propagule count in GH and T:
SELECT p.prop_id, accession_number, atc.activity_type, gh.propagule_count, t.display_sequence treatment_seq, tt.treatment_type_name, t.propagule_count FROM dbo.propagation p left outer join growth_history gh on (p.prop_id = gh.prop_id) left outer join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id) left outer join treatment t on (gh.prop_id=t.prop_id and gh.growth_id=t.growth_id) left outer join treatment_type tt on (t.treatment_type_id=tt.treatment_type_id) where gh.propagule_count is not null or t.propagule_count is not null order by prop_id, gh.growth_id, display_sequence
> 5000 records returned, mostly GH by the looks of it.
where gh.propagule_count is not null or t.propagule_count is not null: 16216
where gh.propagule_count is not null and t.propagule_count is not null: 2
where gh.propagule_count is not null: 16216
where 2.propagule_count is not null: 2
For the 2 where t.propagule_count is not null, the values are identical in GH. Therefore use GH only.
Reconcile propagation.number_started with gh.propagule_count
SELECT p.prop_id, accession_number, prop_date, prop_type, number_started, gh.activity_type_id, atc.activity_type, gh.propagule_count FROM dbo.propagation p left outer join growth_history gh on (p.prop_id = gh.prop_id) left outer join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id) order by prop_id, growth_id
where p.number_started is not null or gh.propagule_count is not null: 13556
where p.number_started is not null and gh.propagule_count is not null: 10136
where p.number_started is not null: 11011
where gh.propagule_count is not null: 12681
where p.number_started = gh.propagule_count: 7252
where p.number_started <> gh.propagule_count: 2884
ASKUCBG: What to do about 2284 where the values are not equal? e.g.,
22   85.1541     November 20 1985   seed   15   2   changed pot   1
22   85.1541     November 20 1985   seed   15   6   changed location   1
22   85.1541     November 20 1985   seed   15   9   dead   1
6771   96.1050     Oct 19, 2010   cutting   20   13   potted   11
6771   96.1050     Oct 19, 2010   cutting   20   13   potted   1
6773   96.1051     Oct 26, 2010   cutting   20   13   potted   16
6773   96.1051     Oct 26, 2010   cutting   20   13   potted   1
Do we need to keep number started in top level propagation record?
Instead? Get the union of activity records and treatment records? Try this out SQL.
March 5, 2013
Here's the SQL we are currently using to bring together the activity and treatment records:
-- activity records with good dates select p.accession_number, p.prop_type, 'activity' record_type, gh.prop_id prop_id, gh.growth_id growth_id, 0 treatment_id, atc.activity_type type_name, null treatment_category, gh.activity_date event_date, convert(char(12), dateadd(dd,(gh.activity_begin_date - 2415021), convert(datetime, '1900-01-01', 103)), 101) begin_date, convert(char(12), dateadd(dd,(gh.activity_end_date - 2415021), convert(datetime, '1900-01-01', 103)), 101) end_date, gh.propagule_count propagule_count, gh.growth_comments comments, null concentration, gh.pot_type pot_type, gh.pot_size_id pot_size_id, gh.percent_success percent_success, gh.medium_name medium_name, gh.germination_date germination_date, gh.date_entered date_entered, gh.entered_staff_id entered_staff_id, gh.last_change_date last_change_date, gh.last_change_staff_id last_change_staff_id from growth_history gh, activity_type_code atc, propagation p where gh.activity_type_id=atc.activity_type_id and gh.prop_id=p.prop_id union -- treatment records look OK too: select p.accession_number, p.prop_type, 'treatment' record_type, t.prop_id prop_id, t.growth_id growth_id, t.display_sequence treatment_id, tt.treatment_type_name type_name, tt.treatment_category treatment_category, null event_date, convert(char(12), t.begin_date, 101) begin_date, convert(char(12), t.end_date, 101) end_date, t.propagule_count propagule_count, t.treatment_notes comments, t.concentration concentration, null pot_type, null pot_size_id, null percent_success, null medium_name, null germination_date, t.date_entered date_entered, t.entered_staff_id entered_staff_id, t.last_change_date last_change_date, t.last_change_staff_id last_change_staff_id from treatment t, treatment_type tt, propagation p where t.treatment_type_id=tt.treatment_type_id and t.prop_id=p.prop_id order by prop_id, growth_id, treatment_id