Monday, May 10, 2010

Evidence Management Explained

See “Evidence Management” for an overview and links to other articles in this series.

Evidence management is hard to understand from just the diagram (below). Let me give a concrete example of how evidence management should work.

[Image: Evidence management diagram]

At the NGS conference David Rencher and a team from FamilySearch did a demonstration on real-time collaboration. (It was that session that got me thinking about how close genealogy software is getting to good evidence management. But I digress…) I’m going to borrow sources from their scenario for my example of evidence management.

[Image: Example source document with Footnote.com annotation icons]

Red Box: A Source

Sources are the red boxes in the diagram above. A red box contains a citation, a transcript and/or digital image, and sometimes a link to an online source. Of course, a red box only represents a real-world source: a document, a page in a book, an artifact, or a person.

PAF users beware: These are not the definitions of source and citation used by PAF. Instead, go back to your high school and college definitions. A source—or its original—is something or someone you can touch. A citation is something you read that tells where to find the source.

For our example, let’s consider a source document from Rencher’s presentation. The document is the administrators’ final statement in the probate file of Peyton C Clements (shown to the right).

The citation given is

Peyton C. Clements probate file no. 1952, final statement, Greene County Clerk’s Office, Eutaw, Alabama.

Rencher's team provided online links to the document on Footnote.com (accessible to anyone) and on Ancestry.com (accessible to subscribers, libraries, and Family History Library patrons).
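
To make the red box concrete, here is a minimal sketch in Python of what such a record might hold. The class name, field names, image file name, and link placeholders are my own assumptions for illustration; they are not taken from Rencher's demonstration or from any existing genealogy program.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Source:
    """A red box: a stand-in for one real-world source."""
    citation: str                       # tells a reader where to find the source
    transcript: Optional[str] = None    # text keyed from the document, if any
    image: Optional[str] = None         # identifier of a digital image, if any
    links: List[str] = field(default_factory=list)  # online copies, if any

    def has_content(self) -> bool:
        # A red box holds a transcript and/or a digital image.
        return self.transcript is not None or self.image is not None


final_statement = Source(
    citation=(
        "Peyton C. Clements probate file no. 1952, final statement, "
        "Greene County Clerk’s Office, Eutaw, Alabama."
    ),
    image="clements-final-statement.jpg",    # hypothetical file name
    links=["footnote.com", "ancestry.com"],  # placeholders, not the real URLs
)
```

The citation is the only required field because it is the one thing that tells a reader where to find the real-world source.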

Green Box: Evidence

Evidence summaries are the green boxes in the diagram above. From the information in a document, pick out evidence about the subject and key it into the summary.

| Field | Value | Notes |
|---|---|---|
| Summary name | Peyton Clements probate | |
| Subject | Angeline née Clements Goldsmith | Link |
| Source | Peyton C. Clements probate file … | Link; Image of original |
| Created date | November 25th AD 1873 | Primary information |
| Created by | W N Clements, W S Goss, Adm’rs, P. C. Clements | Primary information |

| Attribute | Evidence | Notes |
|---|---|---|
| Name | Angelina | Angelina née Clements Goldsmith? |
| Gender | Female | Primary information |
| Age | Over 21 years | Primary information |
| Residence | Lowndes County, Ala. | Primary information |
| Residence date | November 25th AD 1873 | Primary information |
| Husband | W. H. Goldsmith | Primary information |
| Principal | Peyton C Clements | Father? |

 

Enter the evidence exactly as it appears in the document. Enter only evidence that addresses the subject.

(Whether keyed that way or not, conceptually we are dealing with evidence applicable to the subject. You’ll see why later. For proficient genealogists, the subject may be a research objective rather than an individual.)

Enter notes about the evidences to aid your analyses, as shown in the final column, above.

Give each summary a name to identify it in lists and reports.

Notice in the diagram that each green box has a link going to the left and a link going to the right. These correspond with the two links in the example above.
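
A green box might be sketched in code roughly as follows. This is only an illustration: the class names, field names, and the two identifier strings are hypothetical, and the two link fields stand in for the two links mentioned above (one to the subject, one to the source).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EvidenceItem:
    attribute: str   # e.g. "Age"
    value: str       # keyed exactly as it appears in the document
    note: str = ""   # the analyst's note, kept apart from the evidence itself


@dataclass
class EvidenceSummary:
    name: str         # identifies the summary in lists and reports
    subject_id: str   # link to the subject (hypothetical identifier)
    source_id: str    # link to the source (hypothetical identifier)
    items: List[EvidenceItem] = field(default_factory=list)

    def values_for(self, attribute: str) -> List[str]:
        """Return every value keyed for one attribute, unaltered."""
        return [i.value for i in self.items if i.attribute == attribute]


probate = EvidenceSummary(
    name="Peyton Clements probate",
    subject_id="angeline-goldsmith",
    source_id="clements-probate-1952",
    items=[
        EvidenceItem("Name", "Angelina", "Angelina née Clements Goldsmith?"),
        EvidenceItem("Age", "Over 21 years", "Primary information"),
        EvidenceItem("Residence", "Lowndes County, Ala.", "Primary information"),
        EvidenceItem("Husband", "W. H. Goldsmith", "Primary information"),
    ],
)

print(probate.values_for("Age"))  # ['Over 21 years']
```

Keeping the value and the note in separate fields means the keyed evidence stays exactly as the document gives it, while any guesswork lives in the note.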

Purple Box: A Conclusion

Evidence management helps in reaching conclusions. A purple box links to the evidences pertinent to that conclusion. The purple box prompts the user to analyze each piece of evidence. It captures the conclusion and invites the user to provide sound, coherently written reasoning.

For an example, consider the purple box for the birth date of our example subject, Angeline née Clements Goldsmith:

Automatically Selected Evidence

| Summary Name | Assertion | Evidence | Notes | Created Date | Link | Analysis |
|---|---|---|---|---|---|---|
| 1850 Census Peyton C Clements | Age | 2 | Image copy of federal copy | 1850 | Source | The earliest record; at just two years of age, it is highly likely that the 1850 census correctly implies 1848. |
| 1860 Census P C Clements | Age | 12 | Image copy of federal copy | 1860 | Source | Next earliest record agrees with 1848 |
| 1870 Census P C Clements Jr. | Age | 18 | Image copy of federal copy | 1870 | Source | New orphans with all birth dates wrong suggests a 3rd party supplied the data |
| Peyton Clements probate finalized | Age | Over 21 years | Image of original. Primary information | 25 Nov 1873 | Source | 1848 and 1850 are consistent with father’s probate record |
| 1880 Census W H Goldsmith | Age | 25 | Image copy of federal copy | 1880 | Source | Census ages ending with 0 or 5 are suspect |
| Death certificate A J Goldsmith | Birth date | 5 Feb 1850 | Image copy of original. Secondary information | 1939 | Source | There is no reason to doubt 5 February even though the 1850 is not possible according to the 1850 census |
| Gravestone A J Goldsmith | Birth date | 1850 | Secondary | 1939 | Source | Likely same informant as death certificate |

Manually Selected Evidence

| Summary Name | Assertion | Evidence | Notes | Created Date | Link | Analysis |
|---|---|---|---|---|---|---|
| Marriage W H Goldsmith | Marriage Date | 15 Jan 1873 | Explicit | 1873 | Source | Birth from 1843-1858 is likely. |
| 1850 Census Peyton C Clements | Sibling | Eleanor Age 1 | | 1850 | Source | To have a 1 yr old younger sibling in 1850, Angeline must have been born in 1848. |

Conclusion for Birth date: 5 February 1848

Reasoning: It is clear that the earliest records have the correct birth year. While there is no corroborating evidence for the day and month, there is currently no reason to doubt it.

 

Let me make note of several particulars:

  • A major purpose of a purple box is to gather in one place all the evidence upon which a conclusion is based. The software could be intelligent enough to automatically select all the evidence about the subject’s birth date or other assertion. (I prefer the word assertion over other terms such as event or fact.) The software should allow manual selection of other relevant evidence.
  • Now do you see why evidence must be keyed in exactly as it appears in the source? Wait until the purple box before you start making assumptions or drawing conclusions.
  • Notice the format leads the user to explain conflicting evidence.
  • Notice in this example that the conclusion is different from any of the individual evidences. It is not sufficient just to pick one as the preferred value.
  • The diagram indicates that purple boxes are supposed to link to green boxes, not red boxes. But since real world software doesn’t provide evidence summaries, I linked to sources in the above example.
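
The automatic and manual selection described in the first bullet could be sketched like this. Everything here is an assumption made for illustration: the class and function names, and especially the `RELEVANT` rule that treats keyed ages as bearing on a birth date. Real software would need a richer rule than a lookup table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Evidence:
    summary_name: str
    assertion: str      # e.g. "Age", "Birth date", "Marriage date"
    value: str          # keyed exactly as it appears in the source
    analysis: str = ""  # the user's analysis of this piece of evidence


# Hypothetical rule: which keyed assertions bear on which conclusion.
RELEVANT = {"Birth date": {"Birth date", "Age"}}


def auto_select(evidence: List[Evidence], conclusion: str) -> List[Evidence]:
    """Pick out the evidence whose assertion bears on the conclusion."""
    wanted = RELEVANT.get(conclusion, {conclusion})
    return [e for e in evidence if e.assertion in wanted]


@dataclass
class ConclusionBox:
    assertion: str
    automatic: List[Evidence]
    manual: List[Evidence] = field(default_factory=list)
    value: str = ""
    reasoning: str = ""

    def unanalyzed(self) -> List[Evidence]:
        """Evidence the user has not yet written an analysis for."""
        return [e for e in self.automatic + self.manual if not e.analysis]


evidence = [
    Evidence("1850 Census Peyton C Clements", "Age", "2", "The earliest record"),
    Evidence("Death certificate A J Goldsmith", "Birth date", "5 Feb 1850"),
    Evidence("Marriage W H Goldsmith", "Marriage date", "15 Jan 1873",
             "Birth from 1843-1858 is likely."),
]

box = ConclusionBox(
    assertion="Birth date",
    automatic=auto_select(evidence, "Birth date"),
    manual=[evidence[2]],  # the user adds the marriage record by hand
)
box.value = "5 February 1848"  # need not equal any single piece of evidence
```

Here `box.unanalyzed()` returns only the death certificate entry, which is the kind of prompt that leads a user to explain conflicting evidence before writing the reasoning.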

Blue Box: Individuals

Conclusions roll into the assertions shown about individuals. Entering or changing a birth date or other assertion would bring up the conclusion box, leading users to enter sources and evidence summaries. Complete evidence management would take hardly more effort than is currently required without it.
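
One way to picture this roll-up is a blue box that never stores a bare value, only a conclusion. The sketch below is hypothetical throughout (class names, method names, and the rule that a conclusion needs both evidence and reasoning); it stands in for the behavior of bringing up the conclusion box, not for any vendor's design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Conclusion:
    value: str
    reasoning: str
    evidence_names: List[str]


@dataclass
class Individual:
    name: str
    conclusions: Dict[str, Conclusion] = field(default_factory=dict)

    def set_assertion(self, assertion: str, conclusion: Conclusion) -> None:
        # Stand-in for "bringing up the conclusion box": refuse a bare value.
        if not conclusion.evidence_names or not conclusion.reasoning.strip():
            raise ValueError("a conclusion needs evidence and reasoning")
        self.conclusions[assertion] = conclusion

    def assertion(self, assertion: str) -> Optional[str]:
        # The value shown for the individual comes from the conclusion.
        found = self.conclusions.get(assertion)
        return found.value if found else None


angeline = Individual("Angeline née Clements Goldsmith")
angeline.set_assertion(
    "Birth date",
    Conclusion(
        value="5 February 1848",
        reasoning="The earliest records have the correct birth year.",
        evidence_names=["1850 Census Peyton C Clements",
                        "Death certificate A J Goldsmith"],
    ),
)

print(angeline.assertion("Birth date"))  # 5 February 1848
```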

Come on, genealogy companies! You guys can do this.

I’ve provided the briefest sketch of what evidence management could look like and accomplish. What do you think? What would you change or add to better implement the Genealogical Proof Standard? Assume the big genealogy companies are watching. This is your big chance to shape the future. Leave a comment by clicking the link below.

Next week I’ll talk about how astonishingly close Ancestry.com, FamilySearch, and Footnote.com are to getting evidence management. And I’ll talk about what they lack. But I’m taking the remainder of this week to work on some awesome stuff from my Grandmother that Ancestry.com digitized for me (for free!) at the NGS Conference. Love you, Grandma!

11 comments:

  1. This looks like the notes I keep typing into the software I use. Under birth I commonly type a range, then the note says: 1919 calculated from the 1920 census, 1918 calculated from the 1930 census, and so on through all the evidence that I have that supports birth date in any way. Another way to handle this is to enter "alternate" dates and supporting evidence. Imagine that time line, given 6 or 8 pieces of evidence for each "fact".

    Reading this explanation of evidence analysis should be mandatory for everyone doing genealogical research.

  2. Intriguing and well worth the cup of coffee I had to make to read it through.

    I have 2 major concerns:
    - if this sort of evidence management is implemented in software, it needs to be either optional (in which case, you might ask, what's the point?) or much better at guiding the user through the steps (in fairness, you're explaining principles, not specifying software). The biggest step comes for me with the "green box" that plucks out the evidence from the data in the document. For novices, because there is a degree of abstract analysis going on here, there needs to be some sort of prompting for the attributes, otherwise all sorts of garbage will get written down, e.g. attributes of "What Angelina said". (Conversely, those adept at extracting information from data will find prompts get in the way).

    - if software is upgraded to use evidence management (as I would dearly like), the data in that software must be exportable from one software package and importable into another. Otherwise we are all islands of self-congratulatory intellectual posturing. It is not sufficient to be able to produce a fully compliant Chicago Manual of Style report. There needs to be a data interface. And since it is absurd to produce (say) 30 interfaces for 6 interfacing products (6x5=30), that means a single standard for an interface - in other words, a revised version of GEDCOM that includes evidence management and all the other good things we need.

    Good, thought provoking stuff.

  3. Some questions about the data first --

    1) shouldn't your conclusion be she was born 5 February 1848 rather than 3 February (assuming the date you posted from the death certificate is correct)?

    2) And I think you mean "corroborate" rather than "collaborate" in the "Reasoning" section.

  4. I really like the simplicity and logical layout of the Evidence box and the Conclusion box. If I were using them, I would add the actual dates of the records rather than just the year.

    I'm guessing (I like to do that!) that this type of evidence collection and conclusion generation may be part of the "collaboration" effort that FamilySearch will be adding to New FamilySearch in the future. Bravo!

    I love using the Footnote Pages to collect documents and assertions too! Excellent choice...because they're free.

    I'm looking forward to the next installment of this series.

  5. Thanks for thinking this through and providing the process flow chart, as well as the accompanying narrative and charts. As a relative newbie to citation, I found this very helpful.

    One thing confused me, however: the only evidence for a day and month (5 Feb) in outline of evidence doesn't match the conclusion (3 Feb). Did I miss something, or is one of these a typo? Even the best processes can be compromised by little mistakes.

  6. Good start on this important topic.

    Your example of "evidence", where you create data fields from the document, is more traditionally called an "abstract", and is the second step in Elizabeth Mills's evidence analysis process. The fields you have marked as "calculated" and "implied" are assertions and belong in the conclusion object.

    I don't agree that only the evidence related to a single individual should be abstracted, though each statement should be tagged with the individuals to whom it pertains. That will allow one to easily filter the evidence statements for selection at the next step.

    The "Notes" field alongside each statement needs to be more than just a word ("primary" and presumably "secondary" -- remember that "calculated" and "implied" are assertions, not evidence). It should be a discourse (needn't be long) about how directly the informant of the document (not necessarily the same as the creator) knows about the statement and how "fresh" is that knowledge. For example, Bible records which appear to be written at separate times, implying that they were entered shortly after the actual event, are generally regarded as more reliable than are those which appear to be written in blocks of the same ink and hand, implying that they were written from memory some time after the event. It's also worthwhile to consider whether the actual informant is known (true for a will, not true for a census) and how likely that informant is to have direct knowledge of a statement.

    The Genealogical Proof Standard requires
    * A reasonably exhaustive search
    * Complete and accurate citation of sources
    * Analysis and correlation of collected information
    * Resolution of conflicting evidence
    * A soundly reasoned, coherently written conclusion

    The conclusion block falls short of demonstrating a reasonably exhaustive search or that the analysis and correlation is thorough and complete. The block should lead off with a discussion of the search and an explanation of why expected evidence (e.g., censuses for 1900, 1910, 1920, and 1930) isn't present. All evidence that might be germane should be included with a statement of why it is or is not germane (Is this the right person? Is the informant reliable?) The conclusion should state an hypothesis, summarize the evidence statements, and address germane evidence which conflicts with the conclusion.

    But there's more to evidence management: It's an iterative process, beginning with a research plan to find sources, the search for those sources, abstraction of the evidence, formation of an hypothesis with an analysis of the evidence found so far, and writing up the search so far. That's more or less what you've covered, but we've only started: Make a new research plan, this time looking explicitly for evidence which will refute the hypothesis, and start again. Repeat until you can't find any more evidence. Then you can write up your conclusions.

    This process doesn't mesh well with a tabular "fact" presentation associated with all the genealogy database programs I've tried. The eventual conclusion statement is likely to be compound: That Angelina Goldsmith was born on 5 Feb 1848 to Peyton C and ___ Clements. A single research objective is likely to generate several such conclusions, and the evidence related to all of them should be evaluated together.

    You shouldn't be entering anything in the lineage-linked part of the database until you've finished the evidence analysis, and the "citation" for each "fact" (or "tag", if you prefer the TMG name) should point to the appropriate conclusion in the evidence analysis database.

  7. While I agree with the theory of what John says, I think we need to remember what we are talking about - introduction of evidence management to software used by the normal family historian. Phrases like "formation of an hypothesis" will either go straight over their heads or they'll put in a hypothesis like "Find her birth date" or "Do my family history".

    Further, an instruction like "Repeat until you can't find any more evidence. Then you can write up your conclusions" is way open to misinterpretation. "I can't find any more evidence in the IGI, so I can write it up". Err. No.

    (Of course, if you're a pedantic mathematician like myself, you just carry on round and round this loop because you never convince yourself there isn't some more evidence, somewhere, I just haven't found it yet!)

    Again - I absolutely agree with these processes when applied sensibly but somehow we need software to guide people through those processes - that's the issue, and the more sophisticated the ideas, the more abstract it becomes, and the less likely it will be to appeal to the ordinary hobbyist.

    Somewhere, there is a happy medium but it will depend on both the process of evidence management and the user interface.

  8. Bruce brings up two interesting points.

    I don't disagree that some way of coaching novices along the right path is worthwhile. But once the user has been through the help a couple of times, the coaching will get in the way. Meanwhile, more experienced genealogists will be irritated with it from the start. (I haven't noticed that any extant software except perhaps GenSmarts provides any coaching at all, anyway.) It's better, I think, to encourage novices who are, or have the potential to become, serious to join their local genealogical society, to attend lectures at national and regional conferences, and to pursue coursework like the NGS Home Study Course.

    The second point, about when a reasonably exhaustive search has been conducted, is a judgment that every genealogist has to make on every search. It's not something that any software can do for you. Worse, the judgment itself is a matter of some debate, even among the pros. One noted genealogist is a bit notorious for spending years organizing the records of an entire courthouse in pursuit of evidence. Others take a more pragmatic approach. Most amateurs are going to limit themselves to what they can get at their local FHC and online, because they don't have the budget to travel to some distant courthouse or state archive to look at the originals. That's not the software's problem to solve either.

  9. Ou! You guys are coming up with lots of good ideas.

    I see I've made some mistakes. Some of the ideas I would or wouldn't implement depending on the audience of my product.

    Keep 'em coming. Ancestry.com and FamilySearch, are you listening?

  10. Does FamilySearch have an app?

    And what about Wholly Genes, RootsMagic, Leicester Pro, and the rest?

    OTOH, a highly placed FamilySearch exec told me at the NGS Banquet that he regards a follow on (to GEDCOM) data interchange protocol, and that they're working on something. John Wylie was batting that around to anyone who would listen at his GENTECH booth in the back of the exhibit hall as well. Including evidence management in that standard would provide some nice pressure on the software vendors.

  11. The first statement in the "Analysis" column is inaccurate.

    Since the 1850 US Federal Census enumeration was supposed to be "as of" 1 June, the person age 2 would have been born between 2 Jun 1847 and 1 Jun 1848.

    Another factor to take into account is when the enumerator actually visited the household, which could be even months after the "as of" date.

    Persons giving enumeration data (neighbor or visitor or hired hand are possibilities) may not be aware of ages with any precision. Many people did not know when they themselves were born, and may not have known or may not recall others' birthdates.
