Notes on Data Cleanup, M. Ahern

Key: Red is for areas in which I still need to run preliminary reports. Purple is for bits of data that will need to be moved by hand. The rest of the report describes data migration that can probably be automated. Whether it is worth automating this data migration will depend on how labor-intensive it is to migrate automatically versus by hand.

  • OBJECTS (OC to CS Migration)
    • Objects that are not collection objects (see spreadsheet for number formatting):
      • WA (digital media): 6 object records.
      • mfp (Main Photo File): 164 object records.
      • OH (oral histories): 5 object records.
      • FP (film program photographs): 943 object records.
    • Accession number - All loan numbers for objects should be formatted: "IL0001.0001."
    • Artifact Class/Work Type and Collection Category
      • Artifact Class and Work Type appear in a single cell in each object record row in the Excel reports exported from OC. They cannot be sorted separately.
      • Many objects have an "artifact class" but have not been assigned a work type. When this is the case, only the "artifact class" appears in the "artifact class/work type" cell in the report. For this reason, it's impossible to determine from the reporting function how many objects have a classification for "Artifact Class" and yet lack a classification in "Work Type."
      • Because "artifact class" is a controlled list, it is impossible to not choose a value for artifact class. The default choice is "*Undetermined." Over 13,000 records have this default classification. Most of these records also lack classification for "collection category" (see below).
      • Because "collection category" is a controlled list, it is impossible to not choose a value for collection category. The default choice is "*Not yet assigned." About 13,000 records have this default classification. Most of these records also lack classification for "artifact class" and "work type."
      • CLASSIFICATION:
        • Most of the uncategorized objects can be categorized using the attributes "Category" and "Subcategory." See Classification Walkover (simple), and a more detailed, object-by-object walkover which shows how many objects are in each artifact class/work type. Many of the objects on this list can be classified by hand (where there are too few objects in a category to justify automating their classification).
    • Dimensions - In OC, "Dimensions" is a single field containing up to three values. In CS, there will be multiple fields for dimensions that will be named according to type (e.g., height, width, depth, diameter, etc). We will need to separate each of the three values into its own field.
      • ASSIGNING MEASUREMENT TYPES
        • In most cases, each value will go in its own field WITHOUT a measurement type. This is because, while most OC records list the values in a specific order (height, width, then depth), many records do not follow this rule.
        • Fields that contain all three characters (H, W, and D): We can assign a measurement type to the value based on the character.
        • Fields that contain the word "Diameter" or "Dia": We can assign a measurement type to the value based on that word. (Given that the letter "D" repeats in both "D" for "Depth," as above, and in "Diameter," will it be possible to automate measurement type?)
        • CS Field: Dimension Measured By: Is there a way to pull this information from "Object Histories"?
      • FORMATTING MEASUREMENT VALUES
        • In all cases we can convert fractions (1/2) into decimals (.5). If we do so, we will have to delete the space before the decimal, unless the fraction is the first value in the list of 2-3 lengths listed in the "dimensions" field.
        • Many decimal values are displayed in increments smaller than .25 (e.g., .125, .375), and many values include fractions in increments smaller than 1/4 (e.g., 3/8 or 5/8). Should we round these up or down?
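The fraction-to-decimal step described above can be sketched in Python. This is only an illustration under assumptions not confirmed by these notes: that values in the OC "Dimensions" field are separated by "x" (as in the attribute examples such as "4.25 x 2.25 x 2.25"), and that a fraction preceded by a whole number and a space ("5 1/2") is a mixed number. The function names are hypothetical. No rounding is applied, since the rounding question is still open, and no measurement type is assigned.

```python
import re
from fractions import Fraction

# Optional whole number, whitespace, then a fraction: "5 1/2" or a bare "3/8".
_MIXED = re.compile(r"(?:(\d+)\s+)?(\d+)/(\d+)")


def _fmt(value):
    """Format a number as a plain decimal string without trailing zeros."""
    return f"{float(value):.6f}".rstrip("0").rstrip(".")


def fractions_to_decimals(text):
    """Replace "5 1/2" with "5.5" and a bare "1/2" with "0.25"-style decimals.

    Note: a bare fraction becomes "0.5" here, not ".5" as written in the notes.
    """
    def repl(match):
        whole, num, den = match.group(1), int(match.group(2)), int(match.group(3))
        if den == 0:
            return match.group(0)  # leave malformed fractions untouched
        return _fmt(int(whole or 0) + Fraction(num, den))

    return _MIXED.sub(repl, text)


def split_dimensions(text):
    """Split an "a x b x c" string (assumed format) into separate decimal values."""
    parts = re.split(r"\s*[xX]\s*", fractions_to_decimals(text.strip()))
    return [p for p in parts if p]
```

For example, split_dimensions("5 1/2 x 3 x 2") gives one value per future CS field, leaving the choice of height, width, or depth to a later step.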
    • Extent - Many objects have no extent. (Extent report)
      • For many of these objects, it is obvious from the title that the extent is "1." However, these records number in the thousands.
    • Registrar Status - Delete this field entirely.
    • Phys_remarks - This is a hidden data field on OC -- it appears in the database and in reporting, but cannot be edited through the OC user interface. About 5000 records have information in this field, so it would be difficult to move this information by hand. See report.
      • Without exception the field contains information that can be moved to attributes. In reporting, all this information appears in the same cell. The attribute value appears after the attribute type. These are the attribute types:
        • Format Gauge:
        • ModelNumber:
        • Material:
        • Markings:
        • Serial Number:
        • Weight:
      • Repeated values are recorded in the report separated by semicolons. Example:
        • Material: Tin; porcelain; wood ModelNumber: 1912
      • Within the attribute field "materials," values are often separated by commas. Example:
        • Material: wood, metal, glass
      • "Markings" often contains a comma as a part of the value, rather than as a symbol that denotes two separate values. Example:
        • Markings: 'Nite Lite' Exclusively Distributed by LECO Electric Manufacturing Co., Florida, NY.  Copyright by Kagran Corporation.  Made in Japan.
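One possible way to split a Phys_remarks cell into attributes, sketched in Python. It assumes that each attribute type appears in the cell exactly as listed above and followed by a colon, that semicolons always separate repeated values, and that commas separate values only within "Material." The function name and the returned dictionary shape are hypothetical, and the sketch has not been run against the real report.

```python
import re

# Attribute types as they appear in the Phys_remarks report cell.
ATTRIBUTE_TYPES = [
    "Format Gauge", "ModelNumber", "Material",
    "Markings", "Serial Number", "Weight",
]
_LABEL = re.compile(
    "(" + "|".join(re.escape(t) for t in ATTRIBUTE_TYPES) + r"):\s*"
)


def parse_phys_remarks(cell):
    """Return {attribute type: [values]} for one Phys_remarks cell."""
    # Splitting on a captured group yields:
    # [text before first label, label1, value1, label2, value2, ...]
    parts = _LABEL.split(cell)
    result = {}
    for label, raw in zip(parts[1::2], parts[2::2]):
        values = [v.strip() for v in raw.split(";")]
        if label == "Material":
            # Assumption: within Material, commas also separate values.
            values = [p.strip() for v in values for p in v.split(",")]
        result.setdefault(label, []).extend(v for v in values if v)
    # Any text before the first label (parts[0]) is ignored in this sketch.
    return result
```

Commas inside "Markings" are left alone, so a value such as the LECO example above stays in one piece. A marking that happens to contain one of the attribute labels followed by a colon would be split incorrectly, so the output would need a spot check.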
    • Condition/Artifact Needs - "Artifact Needs" information can be fed into "Condition" where it matches up with one of the four fields in "Condition." See artifact needs report.
      • Adding value to the "Condition" field
        • Where the exact phrase "Exhibitable/Needs Work" appears in "Artifact Needs," delete the phrase and change the value in "Condition" to 1.
        • Where the exact phrase "Needs No Work" appears, delete the phrase and change the value in "Condition" to 0.
        • Where the exact phrase "In Jeopardy/Unstable" appears, delete the phrase and change the value in "Condition" to 3.
        • Where the exact phrase "Not Exhibitable/Stable" appears, delete the phrase and change the value in "Condition" to 2.
        • Keep in mind that these phrases occasionally appear in the field "Artifact Needs" with other text. The other text should not be deleted.
      • Adding values for "Marking" and "Tagging"
        • "Marking," "Tagging," "Needs marking," "Needs tagging," "Marking and tagging," "Marking & tagging," "Needs marking and tagging," "Needs marking & tagging."
      • Keep remaining values in "Artifact needs" and keep field.
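The phrase-to-value rules for "Condition" could be automated roughly as follows. This Python sketch assumes each record carries at most one of the four phrases and that stray separators (semicolons, commas, a leading period) left behind by the deleted phrase can be trimmed; the function name is hypothetical. The "Marking" and "Tagging" phrases are not handled here.

```python
# Exact phrases from "Artifact Needs" and the "Condition" value each one maps to.
CONDITION_BY_PHRASE = {
    "Needs No Work": 0,
    "Exhibitable/Needs Work": 1,
    "Not Exhibitable/Stable": 2,
    "In Jeopardy/Unstable": 3,
}


def extract_condition(artifact_needs):
    """Return (condition value or None, remaining "Artifact Needs" text)."""
    for phrase, value in CONDITION_BY_PHRASE.items():
        if phrase in artifact_needs:
            # Delete only the phrase; any other text in the field is kept.
            remaining = artifact_needs.replace(phrase, "", 1)
            remaining = " ".join(remaining.split()).strip(" ;,").lstrip(". ")
            return value, remaining
    # No phrase found: leave both fields as they are.
    return None, artifact_needs
```

A record with none of the phrases comes back unchanged with a condition of None, so it can be skipped rather than overwritten.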
    • Accession date - Delete this field entirely.
    • Attributes. (Data below comes from looking at each field on this report of all objects with all attribute fields.)
      • The following attribute fields are available for inclusion in reports run on OBJECTS: Category, Components, Copyright date, Copyright holder, Copyright Statement, Creation date, Credit line, Dimensions (in), Display Date, Extent, Format, Manufacturer, Material, Other physical details, Photo credit, Publication date, Publisher, Serial number, Subcategory, Subject, Technique.
      • The following attribute fields are a part of the database structure but are NOT available in reports run on OBJECTS: Color, Color/BW, Country, Creator, Date of birth, Date of death, Director, Distributor, Form, Genre, Historical notes, Key, Label and caption, Language, Licensor, Medium, Network/Cable service, Place name, Principle cast, Production company, Production date, Release date, Running dates, Running time. Most of these attributes are in use in other data sets (Entities, Occurrences, etc). It is unknown whether any object records have data in any of these fields, but it is unlikely (the database structure probably does not allow it).
      • All attribute fields are repeatable. When exporting this data via the OC "reporting" function, however, each object record has only one cell per attribute field. In cases in which a field has been repeated in an object record, both pieces of data will appear in a single cell separated by a semicolon.
        • CATEGORY: Delete this attribute field after we finish the classification walkover.
        • COMPONENTS: If we are using Excel to store data at any point in migrating this field: Be aware that Excel's auto-formatting will convert all page number values up to 12 p. (1 p. - 12 p.) automatically into times (1:00 PM - 12:00 PM). Simply reformatting the cells after exporting the data will not restore the data to its original form (instead, 12:00 PM becomes "0.5," and so on).
        • CREDIT LINE: Delete this attribute field (contains no data).
        • DIMENSIONS: Move this data over to the "Dimensions" field in "Cataloging" on CS. Since "Dimensions" is repeatable on CS, we do not need to replace the data that is already in the field. (Note: Some of the data contained in this field repeats data in the "Dimensions" field in the Objects > Basic tab. Some of it does not, and instead provides different measurements for an additional component of the object (e.g., the packaging for a toy, where the "Dimensions" field in Basic contains the dimensions of the toy itself). Since dimensions is repeatable in OC, we should export the data there.)
        • DISPLAY DATE, COPYRIGHT DATE, CREATION DATE, and PUBLICATION DATE: Many object records contain multiple dates.
        • EXTENT: Delete this attribute field (contains no data).
        • OTHER PHYSICAL DETAILS: Most of this data belongs in other fields (such as dimensions, components, copyright statement, etc), but there is no way to sort it automatically. Move this data by hand. Some data remains that could be moved automatically:
          • Alternate dimensions (e.g., "Bowl: Diameter 5.5 x 2; Cup: 4.25 x 2.25 x 2.25," or "Box: 1 x 7.25 x 7.25")
          • Clothing sizes (e.g., "Adult size medium (38-40)," "Children's large.")
          • See the "Other physical details" objects list for the remaining items. These may need to be moved by hand.
        • PUBLISHER: Delete this attribute field (contains no data).
        • SUBCATEGORY: Delete this field after completing the classification walkover.
        • SUBJECT: In all but a few cases this field is blank or contains irrelevant info. A few lots contain data we would like to preserve. I've listed these lots below along with information on the nature of the data and how we should migrate it. See subject report. I have already moved any relevant data into other fields in OC, save 1989.26 (in progress). After 1989.26 is completed, delete this attribute field and all the data in it.
        • TECHNIQUE: Can replace all instances of "lithograph" with "lithography."
    • History.
      • LOANS: In order to NOT migrate the "Loans" history data, we'll have to delete only the "history" records with a type_id of 1 (Loans).
      • LOCATION: Standardize format for "location."
        • Mis-formatting 1: "MST 7:5:2." There are 1,827 entries that are improperly formatted in this way. Delete the space between MST and the number string and add in its place a colon. 
        • Mis-formatting 2: "MST: 7:5:2." There are nearly 200 entries that are improperly formatted in this way. Delete the space between MST: and the number string. 
        • Another formatting change: Notes in parentheses after the number string should be taken out of parentheses and recorded as a note or specification on the location code.
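The location fixes above could be combined into a single pass, sketched here in Python. It assumes the target format is "MST:7:5:2", that the number string is one or more numbers joined by colons, and that any parenthetical note comes at the end of the entry. The function name and the "(top shelf)" note in the usage line are made up for illustration.

```python
import re

# "MST", an optional colon, optional spaces, the colon-separated number
# string, then an optional note in parentheses.
_LOCATION = re.compile(r"^\s*(MST):?\s*(\d+(?::\d+)*)\s*(?:\((.*)\))?\s*$")


def normalize_location(raw):
    """Return (location code, note or None); unrecognized entries pass through."""
    match = _LOCATION.match(raw)
    if not match:
        return raw, None
    prefix, numbers, note = match.groups()
    return f"{prefix}:{numbers}", (note.strip() if note else None)
```

For instance, normalize_location("MST 7:5:2 (top shelf)") yields the code "MST:7:5:2" and the note "top shelf" as separate values. Entries that do not match the assumed pattern are returned untouched so they can be reviewed by hand.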