Tuesday, October 8, 2013

Ancestry.com Adds Related Content Pane to Image Viewer

Ancestry.com has added a new feature to their image viewer. Click Related Content to open a pane along the right side of the window that lists records that might pertain to the person named in the record you are viewing.

Ancestry.com's related content pane in the image viewer

The pane also shows users who have linked the record to a person in their tree, along with links to their profiles and links to their trees.

I’ve never heard Ancestry.com explain how they determine related records. The safest method, when the record has been attached to a person in a tree, as in the example above, would be to show other records attached to that person. Ancestry.com doesn’t seem to be doing that. The example above shows the record of Stephen J. Sullivan in the 1940 census. Note that the first suggested related record is for Semion S Sullivan. It is unlikely that Stephen was indexed twice in the 1940 census. That he was indexed once as Stephen and once as Semion is extremely unlikely.

I conclude that Ancestry.com is using a machine-matching algorithm, rather than depending on human-created relationships. I dislike machine-matching algorithms, a lot. Remember FamilySearch’s Ancestral File? Remember Ancestry.com’s One World Tree?

Machine matching has its place. I support its use as a means of suggesting possible matches that users must then manually review and accept or reject. But as we’ve seen in Ancestry.com’s member trees and FamilySearch’s NFS and Family Tree, a few users accept matches without question, wreaking havoc.

Organizations must take all possible measures to discourage bad user behavior. First, matches should not be suggested unless they are almost certain.

Second, products should warn users and impede (as Ron Tanner would say) incorrect merges.

Third, organizations should inform users of the probability that a given match is correct. Ancestry.com and FamilySearch.org have extensive datasets that could be used to calculate these probabilities, by answering questions such as:

  • What is the name frequency for the surname Smith?
  • What is the name frequency for given name John?
  • What percentage of the population is male?
  • What is the demographic probability that a person alive in 1940 was born about 1890?
  • What was the population of Salt Lake City in 1940?
  • How many John Smiths in Salt Lake City have been married to Elizabeth?
  • How many John and Elizabeth Smiths have children named John, James, Elizabeth, and Mary?
  • What percentage of adult males were employed in farming in 1940?
  • How many males from the state of Utah fought in WWII?

Mathematically combining these for two potentially matching records could yield a probability that the two people were the same person. That number should be presented to users so they understand the uncertainty of the match.
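To make the idea concrete, here is a minimal sketch of one way such a combination could work. It is not how Ancestry.com or FamilySearch do it (I don't know their methods). It simply multiplies prior odds by a likelihood ratio for each field on which two records agree, and it assumes the fields are independent, which real data often violates. The function name, the field list, and every number are hypothetical, invented purely for illustration.

```python
from math import prod


def match_probability(prior, fields):
    """Combine per-field agreement evidence into a match probability.

    prior  -- assumed chance that two paired records are the same person
              before looking at any field
    fields -- (m, u) pairs, one per field on which the records agree:
              m = P(field agrees | same person)
              u = P(field agrees | different people), roughly how common
                  the value is in the population
    Assumes the fields are independent of one another (a simplification).
    """
    prior_odds = prior / (1 - prior)
    likelihood_ratio = prod(m / u for m, u in fields)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Every number below is made up for illustration; none is a real statistic.
evidence = [
    (0.95, 0.01),  # surname agrees; pretend 1% of people share it
    (0.90, 0.03),  # given name agrees; pretend 3% share it
    (0.99, 0.50),  # sex agrees; about half the population
    (0.85, 0.02),  # birth year agrees within a year or two
]
prior = 1 / 100_000  # pretend one true match among 100,000 candidates

print(f"Estimated match probability: {match_probability(prior, evidence):.1%}")
```

The point of the sketch is that a common value (a large u) adds little evidence, while a rare one adds a lot, so two records that agree only on a common name would earn a low number. That is exactly the kind of uncertainty users ought to see.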

Then machine matching helps not only the newest subscriber, but the most seasoned researcher. Then machine matching makes sense. Until then, caveat emptor.
