MMI Data Cleanup to Automate

Data Cleanup that can be automated. See also Objects data cleanup completed by hand and notes on OC objects data cleanup.

  • ARTIFACT CLASS, WORK TYPE, COLLECTION CATEGORY: Many uncategorized objects can be categorized using the attributes "Category" and "Subcategory." See the Classification Walkover page for instructions.
  • DIMENSIONS: In OC, "Dimensions" is a single field containing up to three values. In CS, there will be multiple dimension fields named according to type (e.g., height, width, depth, diameter). We will need to separate each of the three values into its own field.
    • Assigning measurement types:
      • In most cases, each value will go in its own field WITHOUT a measurement type. This is because, while most OC records list the values in a specific order (height, width, then depth), many records do not follow this rule.
      • Fields that contain all three characters (H, W, and D): We can assign a measurement type to the value based on the character.
        • In most cases, there is a space between the value and the letter, e.g.:
          • 10 H x 11 W x 2 D, or
          • 10" H x 11" W x 2" D
        • In catalog records created after June 2011, there is no space between the value and the letter, e.g.:
          • 10h x 11w x 2d
      • Fields that contain the word "Diameter" or "Dia": We can assign a measurement type to the value based on that word. (Given that the letter "D" repeats in both "D" for "Depth," as above, and in "Diameter," will it be possible to automate measurement type?)
      • Data from the attribute field "Other Physical Details" should be moved here with the text incorporated into a note. It's unclear whether there's a way to automate this.
      • There is a CS Field for each dimension called "Dimension Measured By." Is there a way to pull this information from "Object Histories"?
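The measurement-type rules above could be sketched roughly as follows. This is only an illustration, not a tested migration script: the function name `parse_labeled_dimensions` is hypothetical, and it assumes values are separated by "x" and that a label is one of H, W, D, Dia, or Diameter (any case), placed either after or before the number. Trying "Diameter" and "Dia" before the bare letter "D" is one possible way to handle the Depth/Diameter overlap. Unlabeled values come back with no type, and any field with unrecognized text is left for manual review.

```python
import re

NUMBER = r"(?P<value>\d+(?:\.\d+)?|\.\d+)"
# Longer labels are listed first so "Diameter"/"Dia" win over a bare "D".
LABEL = r"(?P<label>diameter|dia|h|w|d)"
# Label after the value, e.g. 10 H, 10" H, 10h
AFTER = re.compile(rf'^{NUMBER}\s*"?\s*{LABEL}?\.?$', re.IGNORECASE)
# Label before the value, e.g. Diameter 5.5
BEFORE = re.compile(rf'^{LABEL}\.?\s*{NUMBER}\s*"?$', re.IGNORECASE)

TYPES = {"h": "height", "w": "width", "d": "depth",
         "dia": "diameter", "diameter": "diameter"}


def parse_labeled_dimensions(text):
    """Return a list of (value, measurement_type) pairs, or None if unparseable.

    measurement_type is None when the value carries no label.
    """
    results = []
    for part in re.split(r"\s*[xX]\s*", text.strip()):
        match = AFTER.match(part) or BEFORE.match(part)
        if match is None:
            # Unrecognized text: leave the whole field for manual review.
            return None
        label = match.group("label")
        mtype = TYPES[label.lower()] if label else None
        results.append((match.group("value"), mtype))
    return results
```

With this sketch, `10 H x 11 W x 2 D`, `10" H x 11" W x 2" D`, and `10h x 11w x 2d` all yield height, width, and depth, while `10 x 11 x 2` yields three values with no type assigned.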
    • Formatting measurement values:
      • In all cases we can convert fractions (1/2) into decimals (.5). If we do so, we will have to delete the space before the decimal, unless the fraction is the first value in the list.
      • Many decimal values are given in increments smaller than .25 (e.g., .125, .375), and many values include fractions in increments smaller than 1/4 (e.g., 3/8 or 5/8). This is the correct formatting for objects smaller than (x); for larger objects, these values should be rounded.
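A minimal sketch of the fraction-to-decimal conversion, assuming fractions are written as `n/d` and mixed numbers as a whole number, a space, then the fraction (e.g., `10 1/2`). The function name `fractions_to_decimals` is hypothetical. It follows the style used above (`.5` rather than `0.5`) and does not attempt the rounding for larger objects, since the size threshold is not yet defined.

```python
import re
from fractions import Fraction

# Optional whole number plus space, then numerator/denominator.
# The lookbehind keeps the match from starting in the middle of a number.
FRACTION = re.compile(
    r"(?<![\d.])(?:(?P<whole>\d+)\s+)?(?P<num>\d+)/(?P<den>\d+)"
)


def fractions_to_decimals(text):
    """Replace fractions such as '10 1/2' or '3/8' with decimals."""
    def convert(match):
        den = int(match.group("den"))
        if den == 0:
            # Not a usable fraction: leave the text untouched.
            return match.group(0)
        value = int(match.group("whole") or 0) + Fraction(int(match.group("num")), den)
        decimal = f"{float(value):.6f}".rstrip("0").rstrip(".")
        # Drop the leading zero to match the '.5' style used in these notes.
        return decimal[1:] if decimal.startswith("0.") else decimal

    return FRACTION.sub(convert, text)
```

Because the whole number and the fraction are matched together, the space between them disappears in the result (`10 1/2` becomes `10.5`), while a fraction that stands alone keeps its position (`1/2 x 3` becomes `.5 x 3`).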
  • REGISTRAR STATUS: Delete this field entirely.
  • PHYS_REMARKS: This is a hidden data field on OC -- it appears in the database and in reporting, but cannot be edited through the OC user interface. Over 5000 records contain information in this field. See report.
    • ALL information in this field could be moved to ATTRIBUTES, except for "Markings." This information could be moved to the field that is currently, in OC, called "Content Remarks."
    • In reporting, Phys_remarks info appears in the same cell. The attribute value appears after the attribute type. This is the complete list of attribute types:
      • Format Gauge:
      • ModelNumber:
      • Material:
      • Markings:
      • Serial Number:
      • Weight:
    • If one of these attribute fields contains multiple values, these are recorded in the report as separated by semicolons. The fields themselves are separated from other data in the field by a single space. Example:
      • Material: Tin; porcelain; wood ModelNumber: 1912
    • ONLY within the attribute field "Material," there are additional cases in which values are separated by commas rather than semicolons. Example:
      • Material: wood, metal, glass
    • "Markings" often contains a comma as a part of the value, rather than as a symbol that denotes two separate values. Example:
      • Markings: 'Nite Lite' Exclusively Distributed by LECO Electric Manufacturing Co., Florida, NY.  Copyright by Kagran Corporation.  Made in Japan.
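The Phys_remarks layout described above could be split apart along these lines. This is a sketch under assumptions, not a finished parser: the function name `parse_phys_remarks` is hypothetical, the six labels are taken from the list above, and it assumes a label followed by a colon never occurs inside a value (a "Markings" text containing, say, "Weight:" would be split incorrectly and should be checked by hand). "Markings" is kept as one value, which is an assumption made here so that its commas are never treated as separators.

```python
import re

ATTRIBUTE_TYPES = ["Format Gauge", "ModelNumber", "Material",
                   "Markings", "Serial Number", "Weight"]
LABEL = re.compile(r"\b(" + "|".join(map(re.escape, ATTRIBUTE_TYPES)) + r"):")


def parse_phys_remarks(cell):
    """Split one Phys_remarks cell into {attribute type: [values]}."""
    matches = list(LABEL.finditer(cell))
    parsed = {}
    for i, match in enumerate(matches):
        # A value runs from the end of its label to the start of the next label.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(cell)
        raw = cell[match.end():end].strip()
        name = match.group(1)
        if name == "Markings":
            # Assumption: keep Markings whole; commas are part of the text.
            values = [raw] if raw else []
        else:
            # Only Material also uses commas as a separator.
            separators = r"[;,]" if name == "Material" else r";"
            values = [v.strip() for v in re.split(separators, raw) if v.strip()]
        parsed.setdefault(name, []).extend(values)
    return parsed
```

For the example cell `Material: Tin; porcelain; wood ModelNumber: 1912`, this returns three Material values and one ModelNumber value.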
  • CONDITION and ARTIFACT NEEDS:
    • Adding value to the "Condition" field:
      • Sometimes data that should be in "Condition" appears in the "Artifact Needs" field. See artifact needs report.
      • Where the exact phrase "Exhibitable/Needs Work" appears in "Artifact Needs," delete the phrase and change the value in "Condition" to 1.
      • Where the exact phrase "Needs No Work" appears, delete the phrase and change the value in "Condition" to 0.
      • Where the exact phrase "In Jeopardy/Unstable" appears, delete the phrase and change the value in "Condition" to 3.
      • Where the exact phrase "Not Exhibitable/Stable" appears, delete the phrase and change the value in "Condition" to 2.
      • These phrases occasionally appear in the field "Artifact Needs" with other text. The other text should not be deleted. If it is impossible to delete some text and retain other text in the field, then no text should be deleted.
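The phrase-to-value rules above could be expressed as a small lookup. This is a hedged sketch: the function name `extract_condition` is hypothetical, matching is case-sensitive because the rules call for exact phrases, and a field containing more than one of the phrases is treated as ambiguous and returned unchanged (an assumption, since that case is not covered above). Only the phrase itself and separators left at the start or end of the field are removed; all other text is kept.

```python
import re

CONDITION_BY_PHRASE = {
    "Needs No Work": 0,
    "Exhibitable/Needs Work": 1,
    "Not Exhibitable/Stable": 2,
    "In Jeopardy/Unstable": 3,
}


def extract_condition(artifact_needs):
    """Return (condition value or None, text to keep in Artifact Needs)."""
    found = [p for p in CONDITION_BY_PHRASE if p in artifact_needs]
    if len(found) != 1:
        # No phrase, or an ambiguous field: change nothing.
        return None, artifact_needs
    phrase = found[0]
    remaining = artifact_needs.replace(phrase, "", 1)
    # Tidy only the separators left dangling at the start or end.
    remaining = re.sub(r"^[\s;,.]+|[\s;,]+$", "", remaining)
    return CONDITION_BY_PHRASE[phrase], remaining
```

For example, `In Jeopardy/Unstable; needs cleaning` would give a Condition of 3 and leave `needs cleaning` in "Artifact Needs."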
    • Adding values for "Marking" and "Tagging"
      • Where the exact phrases "Marking" and "Needs marking" appear, check "needs marking" in Artifact Needs.
      • Where the exact phrases "Tagging" and "Needs tagging" appear, check "needs tagging" in Artifact Needs.
      • Where the exact phrases "Marking and tagging," "Marking & tagging," "Needs marking and tagging," and "Needs marking & tagging" appear, check both "marking" and "tagging" in Artifact needs.
      • There are no other phrasings aside from those listed above, so this task can be automated.
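Since the list of phrasings is closed, the marking and tagging checkboxes could be set from a fixed lookup. In this sketch the function name `marking_tagging_flags` and the output keys are hypothetical, and it assumes the phrase is the entire content of the field, compared without regard to case or extra whitespace.

```python
MARKING = {"marking", "needs marking"}
TAGGING = {"tagging", "needs tagging"}
BOTH = {
    "marking and tagging",
    "marking & tagging",
    "needs marking and tagging",
    "needs marking & tagging",
}


def marking_tagging_flags(artifact_needs):
    """Return which checkboxes to set for one Artifact Needs value."""
    # Normalize case and collapse runs of whitespace before comparing.
    text = " ".join(artifact_needs.split()).lower()
    if text in BOTH:
        return {"needs_marking": True, "needs_tagging": True}
    return {"needs_marking": text in MARKING, "needs_tagging": text in TAGGING}
```

Any value that is not one of the listed phrasings leaves both checkboxes unset, so the original text can simply stay in the field.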
    • Keep remaining values in "Artifact needs" and keep field.
  • ACCESSION DATE: Delete this field entirely.
  • ATTRIBUTES: (Data below comes from looking at each field on this report of all objects with all attribute fields.)
    • CATEGORY: Delete this attribute field after we finish the classification walkover.
    • COMPONENTS: Note: If we are using Excel to store data at any point in migrating this field: Be aware that Excel's auto-formatting will convert all page number values up to 12 p. (1 p. - 12 p.) automatically into times (1:00 PM - 12:00 PM). Simply reformatting the cells after exporting the data will not restore the data to its original form (instead, 12:00 PM becomes "0.5," and so on).
    • CREDIT LINE: Delete this attribute field (contains no data).
    • DIMENSIONS: Move this data over to the "Dimensions" field in "Cataloging" on CS. Since "Dimensions" is repeatable on CS, we do not need to replace the data that is already in the field. (Note: Some of the data contained in this field repeats data in the "Dimensions" field in the Objects > Basic tab. Some of it does not, and instead provides different measurements for an additional component of the object--e.g., the packaging for a toy, where the "Dimensions" field in Basic contains the dimensions of the toy itself.) 
    • EXTENT: Delete this attribute field (contains no data).
    • OTHER PHYSICAL DETAILS: Some data remains that could be moved automatically:
      • Alternate dimensions (e.g., "Bowl: Diameter 5.5 x 2; Cup: 4.25 x 2.25 x 2.25," or "Box: 1 x 7.25 x 7.25")
      • Clothing sizes (e.g., "Adult size medium (38-40)," "Children's large.")
    • PUBLISHER: Delete this attribute field (contains no data).
    • SUBCATEGORY: Delete this field after completing the classification walkover.
    • SUBJECT: In all but a few cases this field is blank or contains irrelevant info. I have already moved any relevant data into other fields in OC, save 1989.26 (in progress). After 1989.26 is completed, delete this attribute field and all the data in it.
    • TECHNIQUE: Can replace all instances of "lithograph" with "lithography."
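One detail to watch in the TECHNIQUE replacement: a plain find-and-replace would also hit values that already read "lithography" and turn them into "lithographyy." A minimal sketch that avoids this by matching only the whole word is shown below. The function name `normalize_technique` is hypothetical, and leaving the plural "lithographs" untouched is an assumption.

```python
import re

# \b on both sides matches "lithograph" only as a complete word.
LITHOGRAPH = re.compile(r"\blithograph\b", re.IGNORECASE)


def normalize_technique(value):
    """Replace the whole word 'lithograph' with 'lithography'."""
    def add_y(match):
        word = match.group(0)
        # Keep all-caps values all-caps.
        return word + ("Y" if word.isupper() else "y")

    return LITHOGRAPH.sub(add_y, value)
```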
  • HISTORY:
    • LOANS: In order to NOT migrate the "Loans" history data, we'll have to delete only the "history" records with a type_id of 1 (Loans).
    • LOCATION: Standardize format.
      • There are 1,827 entries that have no colon between the storage area and the location code, and instead have a space.
        • Example: MST 7:5:2
        • Delete the space between MST and the number string and add in its place a colon.
      • There are nearly 200 entries that have a space between the first colon and the beginning of the numeric location code.
        • Example: MST: 7:5:2
        • Delete the space between MST: and the number string.
      • Another formatting change: In many cases, there are notes listed in parentheses after the location code.
        • BOU:2:2:8 (lamphouse only)
        • MST:6:3:6 (.1-.3)
        • MST:12:5:6 (temporary)
        • In CS, all of these notes should be taken out of their parentheses and recorded as a note or specification on the location code.
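The three location fixes above (missing colon, extra space after the first colon, and parenthetical notes) could be handled in one pass. This is a sketch under assumptions: the function name `normalize_location` is hypothetical, the storage area is assumed to be letters only (as in MST and BOU), and the location code is assumed to be digits separated by colons. Entries that do not fit this pattern return None so they can be reviewed by hand.

```python
import re

LOCATION = re.compile(
    r"^(?P<area>[A-Za-z]+)"         # storage area, e.g. MST
    r"\s*:?\s*"                     # a colon, a space, or both
    r"(?P<code>\d+(?::\d+)*)"       # numeric location code, e.g. 7:5:2
    r"\s*(?:\((?P<note>.*)\))?\s*$"  # optional note in parentheses
)


def normalize_location(entry):
    """Return (standardized location, note or None), or None if unrecognized."""
    match = LOCATION.match(entry.strip())
    if match is None:
        return None
    code = f"{match.group('area')}:{match.group('code')}"
    return code, match.group("note")
```

Both `MST 7:5:2` and `MST: 7:5:2` come out as `MST:7:5:2`, and `BOU:2:2:8 (lamphouse only)` is split into the code `BOU:2:2:8` and the note `lamphouse only`.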