Monday, June 28, 2010

Evidence Management Diagram Revisited

Last week I wrote about evidence management and the New FamilySearch Tree. The plan this week was to write about Ancestry.com Member Trees. I struggled as I wrote. I decided Ancestry.com has a piece of evidence management that isn’t represented in my model. It was time to revisit the evidence management diagram.

Here is how it has looked:

An old Evidence Management Diagram

Since I’m hardly an expert on genealogical methodology, my model draws heavily on experts. Elizabeth Shown Mills’s “Evidence Analysis: Research Process Map” has these basic components:

 The Evidence Analysis Research Process Map

From sources we draw information. From information we choose evidence. The proof of a conclusion lies in a careful analysis of the source, the information, and the evidence.

On this foundation, I drew upon my technical background for my contribution. What interfaces (technically, user interfaces and underlying objects) does a genealogy program need to implement this? Desktop genealogy programs already have interfaces for entering sources and for displaying individuals. I came up with two more: an evidence summary and a conclusion entry interface. Here are the four interfaces juxtaposed beneath the evidence analysis components:

Evidence Analysis plus Evidence Management

The bottom row became the evidence management diagram shown at the beginning of this article. Some of the boxes are displayed more than once to communicate some technical stuff that I won’t bore you with.

While writing about Ancestry.com Member Trees, I realized there is another interface that can play an important role in evidence management. I have included it in the new evidence management diagram:

 The Evidence Management Diagram

The circular nature of the new diagram evolved (revolved ;-) from the addition of the new Compare interface.

  1. To evaluate a potential source, we compare all the “facts” we believe about an individual with information in the source. If the comparison is favorable, we have identified a new source.
  2. Through a source interface, we enter a citation and other information about the source.
  3. We take information from the source and create an evidence summary.
  4. To help us make a conclusion, the analysis interface displays relevant evidence.
  5. Our conclusion becomes one of the “facts” displayed about a person.
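To make the five steps concrete, here is a minimal Python sketch of the cycle. Everything in it is hypothetical: the class names (`Source`, `EvidenceSummary`, `Individual`), the `compare` rule (favorable when at least one fact is shared and none conflict), and the pluggable `conclude` callback standing in for the analysis interface. It illustrates the flow of data around the circle, not how any vendor implements it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Source:
    citation: str                      # entered through the source interface
    information: Dict[str, str]        # information as it appears in the source

@dataclass
class EvidenceSummary:
    source: Source                     # a true summary has exactly one source
    evidence: Dict[str, str]           # assertion type -> abstracted value

@dataclass
class Individual:
    name: str
    facts: Dict[str, str] = field(default_factory=dict)          # displayed conclusions
    summaries: List[EvidenceSummary] = field(default_factory=list)

def compare(person: Individual, information: Dict[str, str]) -> bool:
    """Step 1 (hypothetical rule): favorable if at least one fact is shared
    and none of the shared facts conflict."""
    shared = person.facts.keys() & information.keys()
    return bool(shared) and all(person.facts[k] == information[k] for k in shared)

def run_cycle(person: Individual, source: Source,
              conclude: Callable[[str, List[str]], str]) -> bool:
    # Step 1: evaluate the potential source against what we already believe.
    if not compare(person, source.information):
        return False
    # Steps 2-3: the cited source yields an evidence summary for this person.
    summary = EvidenceSummary(source, dict(source.information))
    person.summaries.append(summary)
    # Step 4: for each assertion type, gather the evidence from every summary.
    for assertion_type in summary.evidence:
        values = [s.evidence[assertion_type]
                  for s in person.summaries if assertion_type in s.evidence]
        # Step 5: the conclusion becomes one of the displayed "facts".
        person.facts[assertion_type] = conclude(assertion_type, values)
    return True
```

With a trivial `conclude` such as `lambda kind, values: values[0]`, a census source that agrees on a person's name is accepted and contributes an age, while a source naming someone else is rejected at step 1 and never becomes a summary.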

While names may have changed, the function of the red, green, purple, and blue boxes remains the same as before.

What do you think?

  • Are the changes an improvement?
  • Is it easier to understand?
  • Does it meet the needs of newer users? Experienced users? Genealogy program software engineers?
  • Is the circular format appropriate?
  • Have I correctly applied industry terminology?
  • Do the interface names accurately reflect the function of the interface as explained here and in previous articles?
  • There are technical inaccuracies, to be sure. (For example, information comes from the source, not the source interface.) Are there inaccuracies that can be corrected such that the usefulness of the diagram increases?

I have come to depend upon your feedback during this series of articles. After you’ve had a chance to respond, I hope to have the stamina to go back and revise all the previous articles with the new diagram and the new terminology.

Thanks in advance.

With this new model, I am ready to take on Ancestry.com… Next time…

Friday, June 25, 2010

Advisory Notices: Evading Emending?

The National Archives and Records Administration (NARA) invites patrons to inform them of quality problems in NARA collections posted on partner websites. Send information to digitization@nara.gov. Officials point out, however, that some problems are “difficult to resolve in a seamless and timely manner.” In such situations, partners are to post advisory notices.

NARA pointed to the Ancestry.com database, “U.S. World War II Draft Registration Cards, 1942,” as an example.

When these draft cards were filmed in the states of Delaware, Maryland, Pennsylvania, and West Virginia, the front of each card was photographed above the back of the previous card. For example, the draft card for John Henry Sullivan is circled in red on the microfilm snippet, below. (For legibility I added larger frame numbers.) The front of his draft card is at the top of frame 1307 and the back is at the bottom of frame 1308.

The Delaware World War II draft registration cards were microfilmed with the front next to the wrong back

As NARA states, Ancestry.com has posted an advisory notice in the database description and one in the record display of a registrant from one of the affected states:

Note regarding the images for the states of PA, MD, WV, and DE. These four states were scanned at the National Archives facility in such a way that the back of one person’s draft card appears on the same image as the front of the next individual. The result is that when you click to view the original image, you will see the correct front side of the draft card, but the back of the previous soldier’s card. Ancestry is aware of this problem, and is working to correct this issue.

To put it simply, if you want to see the correct back of a card, go to the next image. How could I tell which direction to go? Among the Wilmington cards after frame 1308 there are a few Kent cards. Compare the registrant's residence on the front with the draft board stamp on the back. But I digress…
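The rule of thumb above (the correct back is on the next image) can be sketched in Python. The `Frame` fields and the place values used with it are hypothetical. The consistency check compares the residence on a card's front with the draft board stamp at the bottom of the following frame, as suggested above; exact string equality is a simplification of what is really a human judgment call.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Frame:
    number: int              # frame number printed on the microfilm
    front_registrant: str    # card front at the top of this frame
    front_residence: str     # residence written on that front
    back_board_stamp: str    # draft board stamp on the back at the bottom;
                             # on affected rolls that back belongs to the PREVIOUS card

def pair_fronts_and_backs(frames: List[Frame]) -> Dict[str, Tuple[int, int]]:
    """For rolls filmed with each front above the previous card's back,
    map each registrant to (front frame, back frame)."""
    pairs = {}
    for current, following in zip(frames, frames[1:]):
        pairs[current.front_registrant] = (current.number, following.number)
    return pairs

def back_matches_front(current: Frame, following: Frame) -> bool:
    """Sanity check: does the back on the next frame plausibly belong to this front?"""
    return current.front_residence == following.back_board_stamp
```

The last frame of a roll gets no pair, since its front's back would be on the next roll's first image.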

To Fix or Not to Fix…

NARA says the notice is to be used when the problem can’t be fixed in a “timely manner.” I assume they expect the vendor to eventually fix the problem. Ancestry.com published this collection in May 2006. In the four years since then Ancestry.com has published billions of records. Isn’t that long enough to address this problem?

Look at Daniel Sullivan’s draft card from New York. The front of the card and the back are on separate images. Would it be so hard to do the same in Delaware?

Thumbs Up and Thumbs Down

A big green-thumbs up to Ancestry.com for keeping the frame numbers. This practice conforms with industry standards I mentioned in archive-quality digital record repository and respect des fonds.

A green-thumbs down to FamilySearch for clipping off frame numbers in their “United States, World War II Draft Registration Cards, 1942.”

Two big green-thumbs up for Ancestry.com digitizing additional states—IN COLOR! Wow. That’s all I can say. Wow!

Tim Sullivan Draft Registration Card in color
(Sorry, Tim; no disrespect intended to you or your namesake.
This was the first card I came across with a dramatic ink color.)

A tiny green-thumbs down to FamilySearch because their collection is too… down, that is. As I write this, their collection is down on the Pilot RecordSearch site and hasn’t yet been published on the Beta FamilySearch.org site. But the green thumb is just a tiny bit down; pilots and betas aren’t expected to work all the time.

Finally, a green thumb up and a green thumb down for Ancestry.com.

The issue is frame 1309 in the microfilm snippet, above. It’s missing. How do I know? Because Ancestry.com did the right thing and included frame numbers. That is the point, after all: give users the wherewithal to independently detect errors. If you view frame 1308 and click to the next image, you end up at frame 1310.
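Once frame numbers survive, detecting a gap like this is trivial. A minimal Python sketch, assuming the frame numbers of a roll are legible on the images and should run consecutively:

```python
from typing import Iterable, List

def missing_frames(frame_numbers: Iterable[int]) -> List[int]:
    """Return frame numbers absent from an otherwise consecutive run."""
    present = sorted(set(frame_numbers))
    missing = []
    for previous, following in zip(present, present[1:]):
        # Any number strictly between two neighbors was skipped.
        missing.extend(range(previous + 1, following))
    return missing
```

Paging from frame 1307 to 1308 to 1310 gives `missing_frames([1307, 1308, 1310]) == [1309]`. Strip the frame numbers off the images and no such check is possible.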

Missing a frame could lead a user to associate fronts and backs that don’t belong together.

But wait. This situation is not that simple. I consulted the microfilm and found that frame 1309 is a repeat of 1308. Had Ancestry.com left in frame 1309, users might have associated the front of the green card with the back of the red card!

Frame 1309 is a duplicate of frame 1308

Their solution was to silently remove frame 1309 so fronts and backs were associated correctly. But if a user notices a frame is missing, then without an explanation and without access to the original microfilm, a user must assume that one front and one back are missing, leaving two cards compromised.

The correct solution would be to include an advisory notice on frame 1309.

Here’s my advisory notice to Ancestry.com and FamilySearch: Can you see why industry best practices are “best practices?”

Wednesday, June 23, 2010

What About Current Problems?

Ancestry.com added the missing ship, Etruria, to Browse of New York passenger arrivals on 13 November 1893.

Ancestry.com only partially fixed
the misspelling of Throop township.

The National Archives and Records Administration (NARA) recently responded to patron concerns regarding NARA record collections posted by partners such as Ancestry.com.

NARA takes the concerns raised by researchers seriously. We are working with our partners to improve their digital products, including those produced before the partnership agreement, as problems come to our attention. Our partners want to rectify errors and are cooperating in doing so.

Patrons are urged to report specific problems with partner collections by sending e-mail to digitization@nara.gov.

Already Ancestry.com has responded to several cases. In one, Ancestry.com added the ship Etruria, missing from Browse for New York passenger arrivals on 13 November 1893 (see illustration, top-right).

In another case, searches in the 1920 census, Throop township, Lackawanna County, Pennsylvania failed because Throop was misindexed as Throap. Ancestry.com fixed the search index in a timely fashion. However, Ancestry.com has yet to fix the spelling used in the browse menus, district descriptions, and headers above images (see illustration, bottom right).

Officials pointed out that some problems are “difficult to resolve in a seamless and timely manner.” In such situations, partners are to post advisory notices.

Next time, I’ll examine one such advisory notice.

Monday, June 21, 2010

The Evidence Architecture of the New FamilySearch Tree

See “Evidence Management” for an overview of this series and for links to other articles.

Let me show you that the New FamilySearch Tree (NFS) has the architecture needed for evidence management. You may not want to try this at home; I’m a trained professional on a closed course.

Before we dive into New FamilySearch, it will help if you know some NFS technical terminology.[1]

  • An evidence summary (a green box in the evidence management diagram) is called a persona.
  • An individual (blue box) is called a person.
  • A conclusion (purple box) is called an assertion.

If I specify both my term and the NFS technical term, I will put the NFS term in parentheses after mine, like this:

conclusion (assertion)

Example

To illustrate evidence management in NFS, I again use an example from David Rencher. I created an evidence summary (persona) in NFS for Angeline Clements for each of the five sources, below.

Source | Name | Persona ID | Evidence Summary
Image | 1850 Census Peyton C Clements | LWXX-TPN | Thumbnail, Link
Image | Peyton Clements probate finalized | LWXX-5Y3 | Thumbnail, Link
Image | 1880 Census W H Goldsmith | LWXX-BGT | Thumbnail, Link
Image | Death certificate A J Goldsmith | LWXF-M6T | Thumbnail, Link
Image | Marriage W H Goldsmith | LWXF-9ZV | Thumbnail, Link
Source

Click the image link to see a digital copy of the source.

Name (Persona ID)

I included evidence summary names in the table even though NFS doesn’t support them. Instead, NFS uses persona IDs. Remember, a persona in NFS takes the place of an evidence summary.

Evidence Summary and Link

Click the thumbnail to see an image of the evidence summary. Compare a source and summary to see that the two match.

I included the images of the summaries (personas) because once connected—combined as NFS calls it—to an individual (person), NFS provides no way to see just the summary. I included links to the summaries, but once connected (combined), the link brings up the individual rather than the summary.

Creating Evidence Summaries in NFS

With such a perfect architecture under the covers, it is unfortunate that FamilySearch chose to dumb down the interface to the lowest common denominator. I think it is possible to adequately serve patrons of all genealogical maturity levels. Because NFS doesn’t give direct access to personas, and because I know that creating an individual (person) also creates an evidence summary (persona), I clicked on Add Information and then Add an Individual That Is Not Connected to My Family Tree.

Then I added the evidence from the source:

Evidence from the “Peyton Clements probate finalized” source.
Click on this and subsequent images to enlarge.

Since the summary was to be combined with Angeline, I entered only her information. Because it was not obvious how I arrived at a birth date of “Before 25 November 1852,” I clicked on the note icon and entered: “Probate file states Angelina is over 21 years of age.”
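The arithmetic behind “Before 25 November 1852” is simple: someone over 21 years of age on 25 November 1873 was born before the same day 21 years earlier. A small Python sketch of that calculation; the function names are my own, and the leap-day fallback is an arbitrary assumption.

```python
import calendar
from datetime import date

def born_before(record_date: date, years: int) -> date:
    """Latest possible birth date for someone over `years` old on record_date."""
    try:
        return record_date.replace(year=record_date.year - years)
    except ValueError:
        # 29 February landing in a non-leap year: fall back to 28 February (assumption).
        return record_date.replace(year=record_date.year - years, day=28)

def describe(limit: date) -> str:
    """Format the limit the way it was entered in NFS."""
    return f"Before {limit.day} {calendar.month_name[limit.month]} {limit.year}"
```

`describe(born_before(date(1873, 11, 25), 21))` yields the "Before 25 November 1852" entered above.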

For each summary, I clicked on Source details and added a citation. A true evidence summary has but one source for the entire summary. Because NFS doesn’t have true summaries, I clicked the checkbox, Use this source for everything… As mentioned in the past, NFS source templates are inadequate. Consequently, I entered the entire citation in the Comment field, as illustrated below.

Source for the “Marriage W H Goldsmith”
evidence summary

Once I completed an evidence summary, I clicked Continue. For summaries without death information, NFS gave me the opportunity to enter it.

NFS message: Death info must be added

I appreciate the reminder to enter information that might have been inadvertently missed. But it is unfortunate that NFS insists on adding information (a death flag) even when the source did not contain death information. It seems superfluous, given that NFS is already completely convinced that the person is dead.

NFS then let me review the information.

Reviewing the evidence summary, “Angeline’s death certificate”

Notice that all the information appears twice. This is one of the nice features of NFS. I can enter information exactly as it appears in the source document. Below the original text, NFS displays its standardized interpretation so I can see if it understood. Most programs either throw away the original text or silently (and perhaps incorrectly) interpret it.
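Keeping both forms is easy to model. In this Python sketch each entered value stores the text exactly as written plus the program's interpretation, or `None` when the program doesn't understand it, instead of guessing. The tiny `standardize_date` parser is a hypothetical stand-in that only knows one date pattern; it is not how NFS standardizes dates.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

@dataclass(frozen=True)
class RecordedValue:
    original: str                 # exactly as it appears in the source
    standardized: Optional[str]   # program's interpretation (ISO 8601), if any

def standardize_date(text: str) -> Optional[str]:
    """Understand only 'D Mon YYYY' or 'D Month YYYY'; otherwise admit defeat."""
    match = re.fullmatch(r"(\d{1,2}) ([A-Za-z]{3})[A-Za-z]* (\d{4})", text.strip())
    if not match:
        return None
    day, month_name, year = match.groups()
    month = MONTHS.get(month_name.lower())
    if month is None:
        return None
    try:
        return date(int(year), month, int(day)).isoformat()
    except ValueError:            # e.g. 31 Feb
        return None

def record(text: str) -> RecordedValue:
    # The original text is always kept, whether or not it was understood.
    return RecordedValue(original=text, standardized=standardize_date(text))
```

A value like "Before 25 November 1852" is kept verbatim with no interpretation, which is safer than a silent, wrong one.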

If NFS truly supported evidence management I would click Review Possible Duplicates to see if anyone else had already entered an evidence summary for this source and this person. Then I would click Done.

After I clicked Done I saw the completed evidence summary (persona).

The “Peyton Clements probate finalized” evidence summary

Reaching Conclusions

Conclusion entry in NFS for name of Angeline Clements

As I’ve mentioned, the big payoff for evidence management comes next, when making a conclusion. NFS requires that evidence summaries (personas) be connected (combined) to individuals (persons) before the evidence can be used to make conclusions. The process is messy and reflects the duplication corner into which FamilySearch painted itself. Indulge me if I gloss over it and go right to entering conclusions. Suffice it to say, I connected all the summaries to the individual.

I then clicked on the Summary tab. In the context of evidence management, it could be called the Conclusion tab. The Conclusion tab shows the basic vitals: name, gender, birth/christening, and death/burial. Next to each is a down arrow (pointed out by the mouse pointer in the illustration to the right). I clicked the down arrow and NFS displayed all the values entered into the evidence summaries. I pointed to the one representing my conclusion (pointed out by the hand in the illustration) and clicked it.
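The down-arrow list behaves roughly like this Python sketch: gather every distinct value entered for one field across the combined personas. Unlike NFS, the sketch also remembers which persona (and therefore which source) supplied each value. `Persona` and `candidate_values` are hypothetical names, and the sample values are illustrative, taken from the example summaries.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Persona:
    persona_id: str
    values: Dict[str, str]        # field -> value as entered in the summary

def candidate_values(personas: List[Persona], field: str) -> Dict[str, List[str]]:
    """Distinct values offered for `field`, each with the persona IDs that supplied it."""
    candidates: Dict[str, List[str]] = {}
    for persona in personas:
        value = persona.values.get(field)
        if value is not None:
            candidates.setdefault(value, []).append(persona.persona_id)
    return candidates
```

For the birth field, the probate persona (LWXX-5Y3) would offer "Before 25 November 1852" and the death certificate persona (LWXF-M6T) "5 Feb 1850", each still tagged with its origin.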

Below, compare my suggested conclusion entry (left) with that of NFS (right) for the birth date of Angeline Clements.

Suggested conclusion entry for Angeline Clements birth date  NFS conclusion entry for Angeline Clements birth date

Comparing the two you will notice several shortcomings in NFS conclusion entry:

  • Without a summary name, it is difficult to remember where each piece of evidence came from.
  • The notes I entered in the summary are not displayed.
  • Obviously, since NFS has no provision for recording the attributes of the evidence, none can be displayed.
  • There is no place to enter analysis about each piece of evidence, so other users have no way to know if contrary evidence has been handled.
  • There is no place to enter the overall reasoning used to reach the conclusion.

No Benefits

The evidence-management-compliant architecture of NFS has given FamilySearch nothing but problems, so I expect they will discard it. One reason it has proven problematic is FamilySearch’s decision to preload millions of junky, sourceless evidence summaries (IGI patron submissions, Ancestral File, and Pedigree Resource File). But I digress.

One reason that NFS users have derived no benefits from the NFS evidence management architecture is that NFS designers failed to give users a way to see whole evidence summaries. Clicking on the details tab of Angeline Clements doesn’t allow users to see the evidence summaries or even a list of the summaries. Instead, it shows all the evidence interlaced and out of context:

Details of Individual LWFZ-RDZ

Use the Combined records option to see the connected (combined) evidence summaries:

Use the Combined Records option to see NFS evidence summaries

This option gives users the ability to disconnect (separate) summaries (personas) mistakenly connected to the wrong person. When you do this, all the assertions from the source come out in a group. And when the summary is reassigned to the correct person, all the assertions come in as a group.
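Treating each persona as a unit is what makes this work. Here is a minimal Python sketch with hypothetical names (`Person`, `combine`, `separate`, `reassign`), not the NFS API: assertions are stored per persona, so disconnecting a persona removes all of its assertions at once, and reconnecting it elsewhere brings them all along.

```python
from typing import Dict, List, Tuple

Assertion = Tuple[str, str]   # (assertion type, value)

class Person:
    def __init__(self, person_id: str):
        self.person_id = person_id
        self.personas: Dict[str, List[Assertion]] = {}   # persona ID -> its assertions

    def assertions(self) -> List[Assertion]:
        """Everything known about the person, interlaced from all personas."""
        return [a for group in self.personas.values() for a in group]

def combine(person: Person, persona_id: str, assertions: List[Assertion]) -> None:
    person.personas[persona_id] = list(assertions)

def separate(person: Person, persona_id: str) -> List[Assertion]:
    # All of the persona's assertions leave together.
    return person.personas.pop(persona_id)

def reassign(wrong: Person, right: Person, persona_id: str) -> None:
    # ...and arrive together at the correct person.
    combine(right, persona_id, separate(wrong, persona_id))
```

The `assertions()` view is the interlaced, out-of-context display described above; the per-persona dictionary underneath is what the Combined records option exposes.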

Conclusion

Because I understand the NFS architecture, I can tap into the strength of its evidence management. But doing so is painful. FamilySearch users have not benefitted from their architecture for two reasons. First, since FamilySearch preloaded garbage into NFS, garbage is what they’re getting out (GIGO). Unfortunately, user interface and architectural decisions are now being driven by the resulting ripples. Second, FamilySearch chose not to provide a different user experience to immature users and mature users. FamilySearch then had to simplify the user experience by hiding the existence of evidence summaries (personas).

To conclude this presentation of the New FamilySearch Tree architecture, let me say that it is extremely impressive and ideally designed for evidence management. Hopefully, FamilySearch will one day leverage their technical superiority by opening up persona management, or, as I call it, evidence management.


Sources

     1.  Rob Lyons, “Family Tree Combine/Separate,” FamilySearch Developers Conference, 2008; online archive, “Recorded Presentations,” FamilySearch Developer Network: for Software Programmers (http://devnet.familysearch.org : accessed 18 June 2010). Also “Glossary,” FamilySearch Developer Network.

Friday, June 18, 2010

NARA Responds to Ancestry.com Issues

Microfilm Documents Missing

The National Archives and Records Administration (NARA) recently responded to accusations that Ancestry.com posts NARA record collections with numerous quality problems, including missing documents.

A spokesperson for NARA wanted to make it clear that Ancestry.com unilaterally digitized and published more than 300 NARA microfilm collections, with over 70 million images, before entering into a contractual relationship with NARA.

Ancestry.com digitized, indexed, and placed these images online using NARA microfilm publications that are available to anyone by purchase from NARA.  This was strictly the work of Ancestry[.com], with no involvement, oversight, or quality assurance work by NARA.

NARA has posted a list of all their collections online at partner websites, whether the collection was produced before or after their agreements. The list is at http://www.archives.gov/digitization/digitized-by-partners.html and will be updated on a regular basis. But I digress…

At a digitization facility in Silver Spring the two have started digitizing records per the contractual arrangement. NARA preps records for scanning, Ancestry.com does the scanning, and NARA conducts quality control checks. According to NARA,

Both staffs ensure that every page has been imaged. NARA does a page-by-page quality control check on 5% of the boxes scanned. If a problem arises, mistakes are rectified immediately and the percentage of review on that camera operator’s work is increased. An operator must image two consecutive boxes perfectly before the audit returns to the 5% level.

For a recent project, 9 boxes out of 130 were checked. The highest error rate for any one box was 4 pages missing for every 1,000 pages. Missing pages were immediately digitized before processing continued. The overall error rate for all the boxes reviewed was 7 missing pages out of every 10,000. Most missing documents hadn’t actually been skipped, but scanning failed to pick up a light stamp or mark on an otherwise blank page.
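The audit rule NARA describes is a small state machine per camera operator. In this Python sketch, the 5% base rate and the two-perfect-boxes rule come from NARA's description; `ELEVATED_RATE` is a hypothetical placeholder, since NARA does not say how far the review percentage rises. The sketch also assumes every box is reviewed while the rate is elevated, so that "two consecutive perfect boxes" can actually be observed.

```python
BASE_RATE = 0.05        # page-by-page review of 5% of boxes (per NARA)
ELEVATED_RATE = 0.25    # hypothetical: the increased rate is not specified

class OperatorAudit:
    """Tracks the review rate applied to one camera operator's boxes."""

    def __init__(self) -> None:
        self.rate = BASE_RATE
        self.perfect_streak = 0

    def record_reviewed_box(self, errors_found: int) -> float:
        """Update after a reviewed box; return the rate for upcoming boxes."""
        if errors_found:
            # Mistakes are fixed immediately and review is stepped up.
            self.rate = ELEVATED_RATE
            self.perfect_streak = 0
        elif self.rate != BASE_RATE:
            self.perfect_streak += 1
            if self.perfect_streak >= 2:
                # Two consecutive perfect boxes: back to the 5% level.
                self.rate = BASE_RATE
                self.perfect_streak = 0
        return self.rate
```

Note that an error in between resets the streak, so the operator needs two perfect boxes in a row, not two perfect boxes in total.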

“NARA considers that digitizing thousands of documents and having them available online with unprecedented indexing is worth the small percentage of error.” As one attempts to drive the error rate to 0, the cost explodes. In other words, dropping the error rate from 0.07% to 0.007% might cost 100 times as much, and dropping to 0.0007% might increase costs 10,000 times.
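The figures can be checked with exact fractions. This Python sketch reproduces the percentages quoted above and then models the illustration, where each tenfold cut in the error rate multiplies cost by 100. That pattern corresponds to cost growing with the square of the improvement factor (a steep power law); `exponent=2` is read off the illustration, not from any measured cost curve.

```python
from fractions import Fraction

def percent(rate: Fraction) -> float:
    return float(rate * 100)

boxes_reviewed = Fraction(9, 130)        # 9 of 130 boxes checked
worst_box_rate = Fraction(4, 1_000)      # worst single box: 4 missing per 1,000 pages
overall_rate = Fraction(7, 10_000)       # all reviewed boxes: 7 missing per 10,000 pages

def cost_multiplier(current: Fraction, target: Fraction, exponent: int = 2) -> Fraction:
    """Relative cost of moving from `current` to `target` error rate, assuming
    cost is proportional to (1 / error rate) ** exponent."""
    return (current / target) ** exponent
```

Here `percent(overall_rate)` is 0.07 and `percent(worst_box_rate)` is 0.4, and about 6.9% of the boxes were reviewed. Going from 0.07% to 0.007% gives a multiplier of 100, and going to 0.0007% gives 10,000, matching the illustration.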

While NARA seems to feel that the current quality/cost ratio is acceptable, a spokesperson made it clear, “NARA does not want errors.” Next week I’ll tell you what NARA officials recommend you do when you come across problems.

Wednesday, June 16, 2010

Ancestry.com Missing Documents

Microfilm Documents Missing

Staff at the National Archives and Records Administration (NARA) recently responded to accusations that Ancestry.com posts NARA record collections that are missing documents.

Earlier this year, at the Ancestry.com Annual Bloggers Day, Todd Jensen briefed us on their new NARA scanning facility in Washington, D.C. (I alluded to the presentation in one of my articles. I was waiting on Ancestry.com for photographs before I wrote my article. Now I probably can’t remember enough to do the presentation justice. But I digress…)

At that time I asked if Ancestry.com still dropped images from NARA collections when they published them. Andrew Wait assured us that their policy has always been to publish every image. Another Ancestry.com employee in the room (I don’t remember who) leaned over and whispered some circumstance in which they had dropped images.

My hearing isn’t all that sharp, so I didn’t hear the circumstances mentioned, but it is well known by most Ancestry.com subscribers that Ancestry.com has always done so. Ancestry.com seems to feel they are doing everyone a favor by chopping and dicing up census microfilms:

  • Dropping images with no legible names:
    • microfilm headers
    • NARA publication booklets
    • covers
    • census totals
    • blank forms
    • pages that can’t be read because they were imaged too dark, too light, or too blurry
  • Rearranging census districts according to alphabetical jurisdictions
  • Preventing going past either end of a group of images

These changes are perfectly reasonable to decision makers who dabble in genealogy just enough to be dangerous.

And, in fact, these changes might indeed be an improvement if Ancestry.com also allowed unimproved access. The former without the latter has serious repercussions:

  • Removing context removes information
  • Tampering with evidence decreases its evidentiary value
  • The changes rob users of any way to detect documents that were inadvertently dropped
  • Removing illegible images gives NARA staff members no way to know that access to the originals is warranted

For these reasons, members of the Association of Professional Genealogists (APG) have criticized Ancestry.com’s practices. Last year Peggy Reeves pointed out that all but one of the first 25 soldiers from roll 402 are missing from Ancestry.com’s publication of T-288, General Index to Pension Files, 1861-1934. If Ancestry.com allowed unimproved access to this NARA publication, Reeves would have discovered one of two things. Either the images were illegible, or Ancestry.com had inadvertently left out all the index cards from “Charles Roe” to “Allen Rogers.”

That’s a lot of missing documents.

Digital publishers might want to take a lesson from microfilming practice. FamilySearch (as the Genealogical Society of Utah) always filmed every document, but when an original document was illegible, included a label indicating “Illegible Original.”
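The same principle carries over to digital publishing: keep every frame and attach a notice instead of silently dropping it. A minimal Python sketch; the `is_legible` test and the notice texts are hypothetical inputs that a publisher would have to supply.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class PublishedImage:
    frame: int
    image: bytes
    notice: Optional[str] = None   # advisory shown with the image, if any

def publish(frames: List[Tuple[int, bytes]],
            is_legible: Callable[[bytes], bool],
            notices: Optional[Dict[int, str]] = None) -> List[PublishedImage]:
    """Publish every frame; flag problems rather than removing images."""
    notices = notices or {}
    published = []
    for number, image in frames:
        notice = notices.get(number)
        if notice is None and not is_legible(image):
            # Microfilm practice: keep the page, label it.
            notice = "Illegible Original"
        published.append(PublishedImage(number, image, notice))
    return published
```

Because no frame is ever removed, users can still check the frame sequence, and an explicit notice (such as one on a duplicated frame) explains why an image is odd instead of leaving a silent gap.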

Next time I’ll share what NARA had to say about all this.

Monday, June 14, 2010

Genealogist or Gossip

If they don't depend on true evidence, scientists [including genealogists] are no better than gossips.

Penelope Fitzgerald

In the example presented in “Evidence Management Explained” we saw that the big payoff of using evidence management occurs when a conclusion is drawn. Drawing from the example of Angeline née Clements Goldsmith again, suppose a week or more passed between discovering and entering each piece of evidence. As each piece of evidence about her birth is entered, we might well revisit our conclusion about her birth date. Each time we click to edit the birth date, the evidence manager displays a conclusion entry window.

For convenience I have reproduced the conclusion entry window from the example. I have incorporated most of your feedback. If I didn’t get yours, remember that this example is only conceptual.

Summary Name | Assertion | Evidence | Notes | Created Date | Link | Analysis
Automatically Selected Evidence
1850 Census Peyton C Clements | Age | 2 | Image copy of federal copy | 1850 | Source | The earliest record; at just two years of age, it is highly likely that the 1850 census correctly implies 1848.
1860 Census P C Clements | Age | 12 | Image copy of federal copy | 1860 | Source | Next earliest record agrees with 1848
1870 Census P C Clements Jr. | Age | 18 | Image copy of federal copy | 1870 | Source | New orphans with all birth dates wrong suggest a 3rd party supplied the data
Peyton Clements probate finalized | Age | Over 21 years | Image of original. Primary information | 25 Nov 1873 | Source | 1848 and 1850 are consistent with father’s probate record
1880 Census W H Goldsmith | Age | 25 | Image copy of federal copy | 1880 | Source | Census ages ending with 0 or 5 are suspect
Death certificate A J Goldsmith | Birth date | 5 Feb 1850 | Image copy of original. Secondary information | 1939 | Source | There is no reason to doubt 5 February even though the year 1850 is not possible according to the 1850 census
Gravestone A J Goldsmith | Birth date | 1850 | Secondary | 1939 | Source | Likely same informant as death certificate
Manually Selected Evidence
Marriage W H Goldsmith | Marriage Date | 15 Jan 1873 | Explicit | 1873 | Source | Birth from 1843-1858 is likely.
1850 Census Peyton C Clements | Sibling Eleanor | Age 1 | | 1850 | Source | To have a 1 yr old younger sibling in 1850, Angeline must have been born in 1848.
Conclusion for Birth date: 5 February 1848
Reasoning: It is clear that the earliest records have the correct birth year. While there is no corroborating evidence for the day and month, there is currently no reason to doubt it.
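The selection behind this window can be sketched in Python. The mapping from conclusion type to relevant assertion types follows the example (age and birth date for a birth date conclusion). The `Evidence` and `ConclusionEntry` structures and `build_entry` are my own hypothetical design, not any vendor's.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Which assertion types are pulled in automatically for each conclusion type.
RELEVANT: Dict[str, Set[str]] = {"birth date": {"age", "birth date"}}

@dataclass
class Evidence:
    summary_name: str
    assertion: str
    value: str
    analysis: str = ""            # the user's note about this piece of evidence

@dataclass
class ConclusionEntry:
    conclusion_type: str
    automatic: List[Evidence]
    manual: List[Evidence]
    value: str = ""               # the conclusion itself
    reasoning: str = ""           # overall reasoning, stored with the conclusion

def build_entry(conclusion_type: str, evidence: List[Evidence],
                manual_picks: List[Tuple[str, str]]) -> ConclusionEntry:
    """Auto-select by assertion type, then add (summary name, assertion) picks."""
    wanted = RELEVANT.get(conclusion_type, set())
    automatic = [e for e in evidence if e.assertion in wanted]
    picks = set(manual_picks)
    manual = [e for e in evidence
              if (e.summary_name, e.assertion) in picks and e not in automatic]
    return ConclusionEntry(conclusion_type, automatic, manual)
```

Keeping `value`, `reasoning`, and each piece of evidence's `analysis` on the entry keeps the conclusion and its justification together, so a later researcher can see how contrary evidence was handled.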

 

How do the vendors do with conclusion entry? Here is what I found:

Features | Ancestry.com | FamilySearch.org | Footnote.com
Conclusion entry facilitates intelligent conclusions by design. | No | No | No
Pertinent assertions from all relevant evidence summaries are gathered together and displayed in one place. | Yes-ish | Yes | Yes (but not grouped together) if a person page is considered a conclusionary person, but no if the person page is an evidentiary person
To encourage critical thinking, notes can be entered for each piece of evidence and for the conclusion. | Yes-ish | Yes-ish | Yes-ish
Evidence is automatically selected for analysis based upon the conclusion type. For a birth date conclusion, evidence about age and birth date is automatically displayed. | No | Yes | Yes (but not grouped together) if a person page is considered a conclusionary person, but no if the person page is an evidentiary person
The user can manually select other evidence. | No | No | All evidence is displayed in all cases.
Attributes are displayed for each piece of evidence and its source. For the source, these might include: original or derivative, derivative type, recording date, and recorder. For the evidence, these might include: informant, primary or secondary, direct or indirect, and supportive or contradictory. | No | No | No

 

I realize these tables are barely better than nothing. I need to display some screen shots from each of these products to illustrate what I’m talking about. I had planned to start those today, but articles for Wednesday and Friday caused a digression and my weekend is over. Stay tuned…

And just so you (the vendors) know, I’m always open to answer questions you may have regarding evidence management… no extra charge.

Saturday, June 12, 2010

Rrrrr -- SPAM Comments -- Rrrrrr

I apologize for the SPAM comments that are being posted to my website. I know some of you subscribe to comments and have to wade through them.

This has forced me to enable a feature on my website that requires me to review each comment before it is published. Unfortunately, I do much of my Insider work on weekends so that it doesn't infringe on my employment. That means it will be difficult to review your comments in a timely manner.

I am very grateful for your comments. You are an amazing group and happily I learn from what you have to say. Hopefully this will be a temporary change so that you can continue to respond to other commenters in addition to responding to me.

This change is effective immediately.

-- The Ancestry Insider

Wednesday, June 9, 2010

No, You’re Not Traveling Through Time

While it’s true that the Ancestry.com Bloggers Day was held back in January 2010, I never got around to putting together a table of contents to all the articles I wrote about it. I’ve rectified that with the other article you’ve seen published today: “Ancestry.com Bloggers Day 2010.”

If you caught all the articles as I published them, you can ignore this. Otherwise, I hope you find it useful.

Ancestry.com Bloggers Day 2010

Click each of the following links to read about each presentation made at Ancestry.com’s Bloggers Day 2010.

Monday, June 7, 2010

Close, But No Cigar

FamilySearch Places Records Into Folders 
In the New FamilySearch tree, records are combined into folders from which summary values are selected. Image courtesy: FamilySearch International.

See “Evidence Management” for an overview of this series and for links to other articles.

Vendor Support of Evidence Summaries

Each vendor has some features that tantalizingly approach evidence summaries.

The new FamilySearch Tree is close. Evidence summaries are placed into a folder for an individual from which users choose a conclusion (they call it a summary value). FamilySearch is the current technology leader. But then they preloaded evidence summaries with oodles of worthless, source-less, secondary-information, derivative sources: Ancestral File, Pedigree Resource File, and user submissions to the International Genealogical Index. This, in turn, generated IOUSs (individuals of unusual size), which in turn crashed their servers. Further, they surfaced none of an evidence summary’s usefulness to users. Consequently, they see no advantage to their technology leadership.

FamilySearch has announced their intent to separate artifacts from individuals. This moves them in the right direction. But because their technological advance has given them nothing but problems, they will be tempted to abandon it entirely. Keep your fingers crossed.
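The record, folder, and summary-value vocabulary can be made concrete with a small Python sketch. It is a hypothetical model, not FamilySearch's implementation or API, and it assumes that each record contributes at most one value per assertion type, that a value stands on its own when all records agree, and that the user chooses among the values actually offered when records conflict.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Record:
    """One abstracted record (an evidence summary): values keyed by assertion type."""
    source: str
    values: Dict[str, str]


@dataclass
class Folder:
    """Every record combined for one individual, plus the user's chosen summary values."""
    records: List[Record] = field(default_factory=list)
    chosen: Dict[str, str] = field(default_factory=dict)

    def candidates(self, assertion: str) -> List[str]:
        # Distinct values the combined records offer for one assertion type,
        # in the order they first appear.
        seen: List[str] = []
        for record in self.records:
            value = record.values.get(assertion)
            if value is not None and value not in seen:
                seen.append(value)
        return seen

    def choose(self, assertion: str, value: str) -> None:
        # The user may only pick a value that at least one record supports.
        if value not in self.candidates(assertion):
            raise ValueError(f"{value!r} is not supported by any record")
        self.chosen[assertion] = value

    def summary_value(self, assertion: str) -> Optional[str]:
        options = self.candidates(assertion)
        if len(options) == 1:
            return options[0]  # no conflict: the single value stands
        return self.chosen.get(assertion)  # conflict or no data: the user's choice, if any
```

In this sketch the folder never discards a record; a summary value is only a pointer to one of the values the records already hold.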

Footnote is perhaps closest. They have created evidence summaries for individuals mentioned in collections such as the Social Security Death Index and the 1930 Census. They have created person pages that allow users to store conclusions. However, they have positioned the two kinds of pages as equivalent. Consequently, users are frustrated because there is no ability to combine the two.

Footnote should slap an “Evidence Summary” moniker on their evidence summary person pages. And they need the ability to attach an evidence summary to a user-contributed person page in such a way that the person page inherits evidence from the evidence summary. Inheritance would also let users inherit from other users’ person pages, so users could share their contributions without worrying that other users will modify their pages.
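The inheritance proposed above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Footnote's design: the names EvidenceSummary, PersonPage, inherits_from, and all_evidence are hypothetical, evidence items are plain strings, and a page's effective evidence is assumed to be its own items followed by everything it inherits. A user edits only their own page; building on someone else's work means inheriting from it, never changing it.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple, Union


@dataclass(frozen=True)
class EvidenceSummary:
    """A vendor-provided summary of evidence abstracted from one source (read-only)."""
    name: str
    evidence: Tuple[str, ...]


@dataclass(eq=False)
class PersonPage:
    """A user's conclusion page: it owns its own evidence and inherits the rest."""
    owner: str
    own_evidence: List[str] = field(default_factory=list)
    inherits_from: List[Union[EvidenceSummary, "PersonPage"]] = field(default_factory=list)

    def add_evidence(self, user: str, item: str) -> None:
        # Only the owner may change this page; other users inherit from it instead.
        if user != self.owner:
            raise PermissionError(f"{user} cannot edit {self.owner}'s page")
        self.own_evidence.append(item)

    def all_evidence(self) -> List[str]:
        # Own evidence first, then everything inherited, without duplicates.
        result: List[str] = []
        self._collect(result, visited=set())
        return result

    def _collect(self, result: List[str], visited: Set[int]) -> None:
        if id(self) in visited:  # guard against pages that inherit from each other
            return
        visited.add(id(self))
        for item in self.own_evidence:
            if item not in result:
                result.append(item)
        for parent in self.inherits_from:
            if isinstance(parent, EvidenceSummary):
                for item in parent.evidence:
                    if item not in result:
                        result.append(item)
            else:
                parent._collect(result, visited)
```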

| Features | Ancestry.com | FamilySearch.org | Footnote.com |
|---|---|---|---|
| Evidence summary stores abstracted evidence separately from conclusions | No | Yes. The NFS tree documentation uses the terms record, folder, and summary to explain the evidence summary (green box), individual (blue box), and conclusion (purple box), respectively. All a person's records are placed (combined) into their folder and users choose a summary value when conflicts exist. No for user contributions; no distinction is made between records and folders. | Yes-ish. Duplicate person pages come close, but there is no concept of an evidence summary person page versus a conclusion person page. |
| Evidence summary stores abstracted evidence separately from sources | Yes-ish for provided sources. | Yes for preloaded. | Yes |
| Each piece of evidence in a summary is categorized by the assertion type (e.g. name, gender, age, birth date and place, marriage date and place, death date and place, burial date and place, and so forth) | No | Yes for preloaded. | Yes |
| User can view and work with an entire evidence summary | No | Yes when combining or separating records. | No |
| User can give descriptive name to an evidence summary | No | No | No |
| Can generate a list of evidence summaries | No | No | No |
| Can sort and filter lists by name, subject, source, evidence creation date, informant, evidentiary weight, etc. | No | No | No |
| Evidence summary is linked to source (entry) | No | Yes for preloaded. | Yes |
| User can characterize evidence as primary information or secondary information, supporting or conflicting, direct or indirect | No | No | No |
| User can enter notes about a piece of evidence | No | No | Yes-ish |
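The evidence summary rows in the table translate naturally into a data model. A hypothetical Python sketch follows; the enum and field names are assumptions chosen to cover the characteristics the table lists (assertion type, a link to the source entry, primary or secondary information, direct or indirect, supporting or conflicting, informant, and notes), plus a user-chosen descriptive name for the summary. It is not any vendor's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class InformationQuality(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


class EvidenceType(Enum):
    DIRECT = "direct"
    INDIRECT = "indirect"


class Bearing(Enum):
    SUPPORTING = "supporting"
    CONFLICTING = "conflicting"


@dataclass
class Evidence:
    """One piece of abstracted evidence, kept apart from sources and conclusions."""
    assertion_type: str                # e.g. "birth date", "death place"
    value: str                         # the value as the source states it
    source_id: str                     # link back to the source entry, not a copy of it
    informant: str
    information: InformationQuality
    evidence_type: EvidenceType
    bearing: Bearing
    notes: str = ""


@dataclass
class EvidenceSummary:
    """All evidence abstracted from one source about one person, under a descriptive name."""
    name: str
    items: List[Evidence] = field(default_factory=list)

    def by_assertion(self, assertion_type: str) -> List[Evidence]:
        # Filter the summary down to one assertion type.
        return [e for e in self.items if e.assertion_type == assertion_type]
```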

Monday, May 31, 2010

“Evidence: There’s No Better Rule”

See “Evidence Management” for an overview of this series and for links to other articles.

“You [are not] responsible for my mistakes and wrong conclusions,” [said Pip.]

“Not a particle of evidence, Pip,” said Mr. Jaggers, shaking his head and gathering up his skirts. “Take nothing on its looks; take every thing on evidence. There’s no better rule.”1

 

I’ve discussed the immaturity of the source summaries of Ancestry.com and FamilySearch.org. Today, I will examine the embryonic nature of their evidence management and their total lack of evidence summaries (the green boxes in the diagram, below).

The Evidence Management Diagram
Evidence can be managed when separated from sources and conclusions

Current genealogy programs lack separate treatment of evidence. Source summaries and records of individuals are left struggling to fill the gap. Some aspects are picked up by each, as shown in the diagram below. Contrast this with the evidence management diagram (above).

Diagram of genealogy programs that lack evidence management
Current genealogy programs suffer because evidence is not managed separately

One may well ask if it matters that users follow established research processes and proven techniques for evidence analysis. Consider the following:

  • User retention goes up when success goes up. Conversely, research errors are demotivating at best and—when publicly and embarrassingly discovered—can be humiliating.
  • Genealogy practitioners have invested tens of thousands of hours distilling successes and failures into a research process that repeatably yields positive results.
  • Subscribers whose retention is most at risk are the very users least likely to follow successful research practices without guidance.
  • Software that reflects and enables successful research practices will lead to greater research success and greater subscriber retention.

Think back to the example in “Evidence Management Explained” where the month and day for the conclusion came from one source and the year came from several other sources. Compare the two diagrams above while you think about these questions:

  • When evidence and conclusion are one and the same, what do you do when making a conclusion that is different from the evidence in any one source?
    • Do you enter a primary conclusion that is not linked to any source?
    • Do you link the primary conclusion to a source that states something different from the conclusion?
  • When evidence and sources are one and the same, what do you do when the source contains both primary and secondary information?
  • When evidence and conclusion are one and the same, what do you do with contradictory evidence?
    • Do you enter an erroneous, alternate conclusion?
    • Do you leave the contradictory evidence unconnected to the conclusion?
    • Do you enter source notes explaining the contradictory evidence?
    • What if the source also contains supportive evidence?
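One way to avoid those dilemmas is to make a conclusion its own object that points at the evidence it weighed, both supporting and conflicting. The Python sketch below is a hypothetical illustration with made-up class names and example data: the conclusion's value need not match any single piece of evidence, and contradictory evidence stays attached to the conclusion instead of becoming an alternate conclusion.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Evidence:
    source: str
    statement: str


@dataclass
class Conclusion:
    """A conclusion that cites the evidence behind it instead of being that evidence."""
    assertion_type: str
    value: str
    supporting: List[Evidence] = field(default_factory=list)
    conflicting: List[Evidence] = field(default_factory=list)
    reasoning: str = ""

    def sources(self) -> List[str]:
        # Every source weighed in reaching the conclusion, supporting or not,
        # listed once each in the order first cited.
        seen: List[str] = []
        for item in self.supporting + self.conflicting:
            if item.source not in seen:
                seen.append(item.source)
        return seen
```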

Further, when evidence is not managed separately from sources and conclusions:

  • Reports cannot be generated about the evidence.
  • Evidence lists can’t be generated.
  • Evidence can’t be flagged as primary/secondary, direct/indirect, or supporting/contradicting.
  • Lists can’t be sorted by characteristic.
  • Lists can’t be grouped by geography or by time period.
  • Lists can’t be printed showing evidence that requires additional work or special handling.
  • Should you discover you misidentified a person, it takes a lot of manual work to move evidence from one individual to another.
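Once evidence is stored as its own records, each item in the list above becomes a short query. The sketch below is hypothetical: the Evidence fields, the string values "primary" and "secondary", and the helper names are assumptions for illustration, and the sample ordering (primary before secondary, direct before indirect, then by year) is one arbitrary choice among many.

```python
from collections import defaultdict
from dataclasses import dataclass, replace
from typing import Callable, Dict, Hashable, List


@dataclass(frozen=True)
class Evidence:
    person_id: str
    source: str
    assertion_type: str
    place: str
    year: int
    information: str          # "primary" or "secondary"
    direct: bool
    supporting: bool
    needs_work: bool = False


def sort_by_characteristic(items: List[Evidence]) -> List[Evidence]:
    # Primary before secondary, direct before indirect, then chronologically.
    return sorted(items, key=lambda e: (e.information != "primary", not e.direct, e.year))


def group_by(items: List[Evidence], key: Callable[[Evidence], Hashable]) -> Dict[Hashable, List[Evidence]]:
    # Group by any characteristic, e.g. place or decade.
    groups: Dict[Hashable, List[Evidence]] = defaultdict(list)
    for e in items:
        groups[key(e)].append(e)
    return dict(groups)


def needing_work(items: List[Evidence]) -> List[Evidence]:
    # The list to print when planning the next research trip.
    return [e for e in items if e.needs_work]


def reassign_source(items: List[Evidence], source: str, from_person: str, to_person: str) -> List[Evidence]:
    # After a misidentification, move every piece of evidence from one source
    # to the correct individual in one step; the original list is not modified.
    return [
        replace(e, person_id=to_person) if e.source == source and e.person_id == from_person else e
        for e in items
    ]
```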

Evidence. There’s no better rule.


Sources

     1.  Charles Dickens, Great Expectations (Boston: Estes and Lauriat, 1881), 366; digital images, Google Books (http://books.google.com : digitized 19 March 2008).

Wednesday, May 26, 2010

Of Sources and Citations: All Bets Are Off

See “Evidence Management” for an overview and links to other articles in this series.

I recently received this message (which I have edited slightly).

Dear Ancestry Insider,

In "Evidence Management in the Wild" you wrote “As does PAF, Ancestry.com misuses the terms source and citation.” I would find it useful if you could put up the "correct" definition of those two alongside the PAF and Ancestry definitions, so I can understand the differences. I don't use those 2 for recording stuff but do use a GEDCOM based program, which may, or may not, have similar "issues". (I put “issues” in quotes not because I don't believe you, but rather because on my side of the Atlantic this sort of thing doesn't register anywhere in Family History. On the other hand being trained as a mathematician and having done data modeling, I do appreciate robust definitions)

Sincerely,
Adrian Bruce

Dear Adrian,

I would be happy to.

Elizabeth Shown Mills taught me that wise genealogists would do well to recognize the centuries of scholarship that precede them. Thus, for the definitions of source and citation, one need look no further than the dictionary. Back in November I did just that, augmenting the dictionary definitions with those from the writings of leading genealogists. (See "Genealogical Maturity Model Definitions.")

  • source – 1. the origin that supplies information.1 2. “an artifact, book, document, film, person, recording, website, etc., from which information is obtained.”2 

  • citation – 1. “citations are statements in which we identify our source or sources for…particular [information].”3 2. “a citation states where you found [the cited] piece of information.”4

  • information – 1. “knowledge obtained from investigation.”5 2. “the content of a source—that is, its factual statements or its raw data.”6

  • evidence – 1. “something that furnishes proof.”7 2. “information that is relevant to the problem.”8 3. analyzed and correlated information assessed to be of sufficient quality.9 4. “the information that we conclude—after careful evaluation—supports or contradicts the statement we would like to make, or are about to make, about an ancestor.”10

  • conclusion – 1. “a reasoned judgment.”11 2. “a decision [that should be] based on well-reasoned and thoroughly documented evidence gleaned from sound research.”12

Citation Style

For citations, the Board for Certification of Genealogists recommends the humanities style from the Chicago Manual of Style. The humanities citation style uses reference notes (either footnotes or endnotes) and bibliographies. Bibliographies are sometimes called source lists because they summarize all the sources used in an article, report, or work.

Reference note citations must be highly specific, citing the page level in a book or the certificate level in a vital records collection. Befitting their role as a summary, citations in a bibliography are more general. All the pages cited from a book can be summarized by citing the book. Citing a birth certificate collection is an appropriate summary for several certificate citations.

I have my own name for the information dropped from a reference note to create a citation in a bibliography. I call it locator information because it allows a researcher to locate the specific source cited by a reference note within the general source cited in a bibliography. For a book, the locator information is the page number. For a birth certificate, the locator information depends on the clerk's filing system. Are certificates filed by certificate number? Person's name? Birth date? Are new files started each year? Locator information for birth certificates involves one or more of these pieces of information, as appropriate.
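To see how the general citation and the locator information fit together, here is a minimal Python sketch for a single-author printed book. It assumes a simplified, Chicago-like layout with italics omitted, and the BookSource class and its fields are hypothetical: the bibliographic facts are entered once, and each reference note adds only its locator information, here a page number. Certificates, censuses, and other record types would need different fields and layouts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BookSource:
    """What a user types once per book: everything a bibliography entry needs."""
    first_name: str
    last_name: str
    title: str
    place: str
    publisher: str
    year: int

    def bibliography_entry(self) -> str:
        # General citation for the source list: no locator information.
        return (f"{self.last_name}, {self.first_name}. {self.title}. "
                f"{self.place}: {self.publisher}, {self.year}.")

    def reference_note(self, locator: str) -> str:
        # Specific citation: the same facts, reordered, plus the locator (page).
        return (f"{self.first_name} {self.last_name}, {self.title} "
                f"({self.place}: {self.publisher}, {self.year}), {locator}.")
```

With the Dickens book cited earlier, reference_note("366") yields the same layout as that footnote, while bibliography_entry() gives the general citation for a source list.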

PAF Meant Well

For reasons I will explain in a minute, PAF users are surprised to learn that both a reference note and an entry in a bibliography are citations. They may be likewise surprised that both a page of a book and an entire book are sources. You'll see the origin of their confusion in a moment.

Recording citations can be tedious, so programs use several ways to make it easier. Many genealogy programs exploit the fact that citations in bibliographies lack the locator information found in reference note citations. By prompting users separately for the bibliography citation and the locator information, a program spares users from retyping the bibliography citation for each reference note. Unfortunately, when PAF prompted users separately for the bibliography citation and the locator information, it called the former a source and the latter a citation.

Oops.

Let me summarize. PAF calls a bibliography citation a source. And it calls locator information—a portion of a reference note citation—a citation.

Let me say it another way. PAF uses source for something that is not a source and citation for something that is not a citation.

Yes, they meant well. But this error has propagated to subsequent genealogy programs (which also faced the lack of a term for locator information).

Non-genealogists use the terms source and citation and they understand one another. Genealogists use the terms and all bets are off.


     1. Merriam-Webster Online Dictionary, online edition (www.m-w.com : accessed 23 November 2009), “source.”

     2. Elizabeth Shown Mills, CG, CGL, FNGS, FASG, FUGA, Evidence Explained: Citing History Sources from Artifacts to Cyberspace, 2nd ed. [hereinafter, EE2] (Baltimore, Maryland: Genealogical Publishing Company, 2009), 828.

     3. Mills, EE2, 42.

     4. Patricia Law Hatcher, CG, FASG, quoted in The Source, ed. Loretto Dennis Szucs, FUGA, and Sandra Hargreaves Luebking, FUGA, 3rd ed. (Provo, Utah: Ancestry, 2006), 24; citing “How Do You Know?” in Producing a Quality Family History (Salt Lake City: Ancestry, 1996), 117.

     5. Merriam-Webster, “information.”

     6. Mills, EE2, 24.

     7. Merriam-Webster, “evidence.”

     8. Mills, EE2, 822.

     9. Christine Rose, CG, CGL, FASG, Genealogical Proof Standard: Building a Solid Case (San Jose, California: CR Publications, 2005), 2.

     10. The Board for Certification of Genealogists (BCG), The BCG Genealogical Standards Manual, ed. Helen F. M. Leary, CG, CGL, FASG (Provo, Utah: Ancestry, 2000), 8.

     11. Merriam-Webster, “conclusion.”

     12. Mills, EE2, 820.

Tuesday, May 25, 2010

NGS Conference Not the Last SLC Snow

NGS attendee Randy Seaver enjoys a spring snow storm in SLC

Just a quick note on yesterday’s weather in Salt Lake City.

Attendees of the National Genealogical Society Conference in Salt Lake City were treated to some snowy Utah spring weather. Randy Seaver posted a picture (right) and said, “For a San Diego boy who has seen snow fall like twice in his life, this was a really big deal.” (Click on the thumbnail to see the entire photograph.)

But that storm was not to be the last of the season. Yesterday Salt Lake City broke a record for the latest spring snow storm ever.

“Think Warm Thoughts.”

Photo courtesy KSL.com, submitted by D Van Wagoner.
Click the photograph to see the complete photograph and others submitted by KSL viewers.

Monday, May 24, 2010

Evidence Management

A researcher acquires sources, extracts information, identifies evidence, and analyzes it all to reach a defensible conclusion. This is a genealogist’s research process. This is the standard a genealogist uses to “prove” a conclusion.

Evidence Management consists of the methods and tools a researcher uses throughout the research process to gather, track, and apply evidence.

Just as individuals grow in genealogical maturity, so too does software. This series of articles examines the maturity of evidence management on the Ancestry.com and FamilySearch.org websites. Software vendors are the primary audience for this series. It will be quite technical, so some of you will have to bear with me. If I simplify it too much, I risk miscommunicating with the vendors. Hang with me; I have some lighter fare planned for the future.

Evidence Management diagram
The Evidence Management diagram shows the distinctions that should exist among source tracking, evidence summaries, conclusions, and people in a genealogy program.
 

As new articles are published, I will add links to this table of contents:

To learn more about evidence, the research process, and the genealogist’s standard of proof, start with these sources.


Sources

Mills, Elizabeth Shown, CG, CGL, FASG, FNGS, FUGA. “Fundamentals of Evidence Analysis.” Evidence Explained: Citing History Sources from Artifacts to Cyberspace. Second edition. Baltimore, Maryland: Genealogical Publishing Company, 2009. Pages 13-38.

———. Evidence Analysis: A Research Process Map. Laminated study guide. Washington, D.C.: Board for Certification of Genealogists, 2006. I haven’t personally used this source, but I understand it is a separate publication of the diagram inside the front cover of Evidence Explained.

———. “Working with Historical Evidence: Genealogical Principles and Standards.” National Genealogical Society Quarterly [issue titled Evidence: A Special Issue of the National Genealogical Society Quarterly] 87 (September 1999): 165-84.

Rose, Christine, CG, CGL, FASG. Genealogical Proof Standard: Building a Solid Case. San Jose, California: CR Publications, 2005.

Tucker, Mark. “Genealogy Research Process Map.” ThinkGenealogy: Genealogy, Software, Ideas, and Innovation, 10 July 2008. http://www.thinkgenealogy.com : accessed 23 May 2010.

Wednesday, May 19, 2010

Evidence Management in the Wild

Evidence management diagram

See “Evidence Management” for an overview of this series and for links to other articles.

This week I’ll start to evaluate the strengths and weaknesses of evidence management on Ancestry.com and FamilySearch.org.

I outlined my idea of evidence management in two previous articles (“Why Can’t You Get It Right?” and “Evidence Management Explained”) and got some great feedback from you. I had time to incorporate some of it into last week’s article, but if you haven’t read the feedback yet, you should (here and here).

Sorry, Randy. I’m not telegraphing FamilySearch’s plans; I can’t speak for them. I’m writing this series as a challenge to vendors, including FamilySearch. In publicly critiquing products of my employer, I walk the fine line between being helpful and being fired. You’ll notice I never criticize my employer and I never criticize its products without proposing solutions. But I digress… It’s time to walk that line.

Vendor Support for Source Tracking (the Red Boxes)

The table below shows source tracking features that are needed for good evidence management. I’m filling out parts of this table from memory. Let me know if I’ve made mistakes.

A source is a person, document, page, book, web page, or other artifact that supplies information. In the evidence management diagram, each red box represents the information stored by the genealogy website to track a source. It contains information such as a citation, a transcript, a digital image, and a link.
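A source record that looks the same whatever its origin might be sketched as follows. The Origin values mirror the four cases in the source tracking table below (provided, uploaded, linked, referenced); the class, field, and method names are hypothetical. Every record carries the same fields, and display() renders every source the same way while keeping the transcript's own line breaks.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Origin(Enum):
    PROVIDED = "provided"      # digitized by the website itself
    UPLOADED = "uploaded"      # image supplied by the user
    LINKED = "linked"          # lives on another website
    REFERENCED = "referenced"  # offline: a courthouse, archive, or person


@dataclass
class SourceRecord:
    """One red box: the same fields no matter where the source came from."""
    origin: Origin
    citation: str
    transcript: str = ""
    image_path: Optional[str] = None
    link: Optional[str] = None

    def display(self) -> str:
        # Render every source the same way, one labeled field per line,
        # keeping the transcript's own line breaks intact.
        lines = [f"Citation: {self.citation}", f"Origin: {self.origin.value}"]
        if self.link:
            lines.append(f"Link: {self.link}")
        if self.image_path:
            lines.append(f"Image: {self.image_path}")
        if self.transcript:
            lines.append("Transcript:")
            lines.extend(self.transcript.splitlines())
        return "\n".join(lines)
```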

There are two definitions you’ll need to know for this and subsequent articles:

  • assertion – information about a person, relationship, or event. Birth date and birthplace are two examples of assertions. An assertion can exist as evidence or as a conclusion.
  • field – a rectangular area on the computer screen where the user types in information.

| Source Tracking Features | Ancestry.com | FamilySearch.org | Footnote.com |
|---|---|---|---|
| Provides digitized sources online | 4 billion genealogy records | Record Search, IGI, PRF, AF | Indexed images of original documents |
| Can upload digital images | Yes | No | Yes |
| Links are “hot,” that is, can be clicked to reach destination | Yes | Yes | Yes |
| Citation templates for referencing offline sources | One | Two | None? |
| Handles sources uniformly, whether provided, uploaded, linked, or referenced | No. Four different ways: 1. provided, 2. uploaded, 3. linked, 4. referenced | Not supported: 1. provided, 2. uploaded, 3. linked | No. Three different ways: 1. provided and uploaded, 2. linked, 3. referenced |
| Manages sources independently of assertions | No | No | No |
| Supports variety of citation formats/templates for uploaded images | None; put in note field | Not supported | None; put in description field |
| Transcription field | Yes, but inconsistently labeled | Not supported | No |
| Annotate images | Yes | Not supported | Yes |
| Corrections to provided sources | Yes | No | Annotations? |
| Annotations are searchable | No? | Not supported | No? |
| Corrections are searchable | Yes | Not supported | Ditto? |

 

Examples

Below is an example of Ancestry.com’s one and only citation template.

  • Because there is only one citation template, it is not possible to follow industry standards for citations.
  • Because there is only one citation template, it favors published sources even though non-published sources are more important to genealogists.
  • As does PAF, Ancestry.com misuses the terms source and citation.
  • While links to records provided by Ancestry.com are easily available from the assertion on the person page, links to non-Ancestry.com sources are nearly inaccessible. You must open the details about an assertion, then open the source, and only then can you click the link. To Ancestry.com’s credit, it used to be worse. Once unburied, the link used to be dead, requiring a copy and paste into a browser.

Ancestry.com citation format

Ancestry.com handles uploaded sources in a completely different way. Because there is no citation field, one must place the citation in the description field.

Ancestry.com citation for uploaded source

The New FamilySearch Tree (NFS) has a simple citation template for living memory sources and another for all other sources. As with Ancestry.com, having only one template for all other sources forces the vendor to favor published sources. The template looks like this:

New FamilySearch Tree source template

Note that there is no field for a link. Also notice that the transcription field, labeled “Actual Text,” is appallingly small. While Ancestry.com has the annoying practice of ignoring line breaks when displaying transcriptions, FamilySearch takes the annoyance to the extreme. NFS runs together the entire citation template. As a result, the citation above is displayed in this incomprehensible format:

NFS Source Display

I’m out of time. I’m back to genealogy for the rest of the week. Next week I’ll move on to the heart of evidence management, the green “Evidence Box.”