Wednesday, January 26, 2011

We Want Tech: Highlight Flagrantly Erroneous Pedigrees

At the 2010 NGS Conference GenTech Luncheon, David Rencher presented “The Top 10 Areas Where Technology Can Still Make a Real Difference in Genealogy : Could You Please Hurry?” In “We Want Tech and We Want It Now,” I am reviewing Rencher’s requests and the technologies already available. Today I look at:

# 7 Highlight Flagrantly Erroneous Pedigrees

There are a lot of garbage pedigrees floating around the Internet.

Some have subtle mistakes: conclusions that changed over time as more records were discovered. Someone figures out the mistake and publishes an article, proving a new conclusion. In an ideal world, a single, global, common tree would contain all the good conclusions and we would be fine-tuning conclusions of this kind.

In the real world, the garbage floating around the Internet is pathetic: impossible timelines, inconceivable parent-child relationships, and unimaginable fusions of facts. Decades ago, PAF and other pedigree managers became capable of detecting most of the flagrant fallacies we see online.
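To make that concrete, here is a minimal sketch of one such check: comparing a child's birth year against the mother's dates. This is not how PAF or any particular program works. The `Person` record, the `check_mother_child` function, and the minimum age of 12 are all assumptions I made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    """Hypothetical minimal record: a name plus optional birth and death years."""
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None


# Assumed threshold for illustration only; a real tool would make this configurable.
MIN_PARENT_AGE = 12


def check_mother_child(mother: Person, child: Person) -> List[str]:
    """Return a list of human-readable problems for one mother-child link."""
    problems: List[str] = []
    if child.birth_year is None:
        return problems  # nothing to compare against
    # A child cannot be born after the mother's death.
    if mother.death_year is not None and child.birth_year > mother.death_year:
        problems.append(
            f"{child.name} was born in {child.birth_year}, "
            f"after {mother.name} died in {mother.death_year}"
        )
    # A mother younger than the threshold (or not yet born) is implausible.
    if mother.birth_year is not None:
        age = child.birth_year - mother.birth_year
        if age < MIN_PARENT_AGE:
            problems.append(
                f"{mother.name} would have been {age} when {child.name} was born"
            )
    return problems


# Made-up example data: the mother's birth year is invented for the demo.
mother = Person("Mother", birth_year=1550, death_year=1575)
son = Person("Son", birth_year=1580)
print(check_mother_child(mother, son))
# prints ['Son was born in 1580, after Mother died in 1575']
```

The check is deliberately limited to mothers, since a father can die some months before a child is born. Working with years only is also a simplification; full dates would catch more cases.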

David Rencher, FamilySearch Chief Genealogical Officer, asked, “Why can’t websites do the same?”

Many websites host online pedigrees: RootsWeb, Ancestry.com, One Great Family, Geni.com, and the new FamilySearch tree (NFS), to name a few. Rencher recommended that these websites take action to highlight erroneous pedigrees:

Automatically evaluate source citations.

(Image: RootsWeb World Connect flags trees with source citations.)

Personally, I find it immensely helpful just having RootsWeb World Connect’s icon indicating that source citations are present. I don’t bother looking at trees without the source icon.

Imagine how cool it would be if a website would intelligently assess the source quality!

Let users vote on the quality of the data.

This is not ground-breaking technology. This is old hat. Dozens of websites outside the genealogy market already allow this. Why not genealogy sites?

(Image: Yahoo! Answers incorporates user ratings.)

Consider the example from Yahoo! Answers, to the left.

Note the “top contributor” badge underneath Ted Pack. So many people have agreed with Ted’s answers that he’s earned a reputation. Yahoo displays that reputation along with his contributions.

Note that 118 people voted for Ted’s answer and 32 people voted for numbat’s. C-johnson awarded a 5-star rating and left a comment explaining her vote.

If I try to download a bad tree, WARN ME!

I think each tree displayed on the Internet ought to display a count of the number of pedigree errors and warnings present in the tree.
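Here is a rough sketch of how such a count might be produced, assuming a tree is nothing more than a list of names with birth and death years. The `summarize_tree` and `badge` functions, the two rules, and the 110-year cutoff are my own illustrative choices, not any website's actual rules.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record shape: (name, birth_year, death_year); years may be None.
Record = Tuple[str, Optional[int], Optional[int]]

MAX_LIFESPAN = 110  # assumed cutoff for a "suspicious" lifespan


def summarize_tree(records: List[Record]) -> Dict[str, int]:
    """Count errors (impossible) and warnings (merely unlikely) in a tree."""
    counts = {"errors": 0, "warnings": 0}
    for name, birth, death in records:
        if birth is None or death is None:
            continue  # cannot judge a record with missing dates
        if death < birth:
            counts["errors"] += 1      # died before being born
        elif death - birth > MAX_LIFESPAN:
            counts["warnings"] += 1    # possible, but worth a second look
    return counts


def badge(counts: Dict[str, int]) -> str:
    """Format the counts as a short label a website could show beside a tree."""
    return f"errors: {counts['errors']}, warnings: {counts['warnings']}"


# Made-up sample tree for the demo.
tree = [
    ("Ann", 1700, 1765),   # plausible
    ("Ben", 1720, 1710),   # error: death precedes birth
    ("Cal", 1600, 1730),   # warning: 130-year lifespan
    ("Dee", 1650, None),   # skipped: no death year
]
print(badge(summarize_tree(tree)))  # prints "errors: 1, warnings: 1"
```

Separating errors from warnings matters: an impossible date is a different signal from an unlikely one, and a reader deciding whether to download a tree would want to see both numbers.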

These suggestions seem straightforward to me. I think all of them could be implemented with today’s technology. OK, maybe not the automatic evaluation of source citations. But flagging their presence is within the capability of current technology.

Are these features you’d like to see on websites? What do you think? Leave a comment.

14 comments:

  1. I agree. I think what happens is that some of the beginner researchers are looking for a particular person and they find the name and area that fits and they get all excited but don't bother to verify whether it is the exact relative they are looking for. Which can cause major problems for others. I too always check and see if the trees have sources and if they don't then I proceed to another tree that has sources or I try to find the sources myself.

  2. I can't really imagine how your suggestion of "votes" to build a contributor reputation will work. I am currently embroiled in a controversy about incorrect data in an online tree - the first communication I had from one of the negligent tree owners was "I have the same information as 32 other trees - what makes you think they are all wrong?"

    So ..... when those 32 people "vote" ... guess whose credibility factor is going to tank?

  3. AI, a truly provocative presentation.

    I am afraid that Anonymous (January 26, 2011 9:30 AM) presented the crux: a vote need not be contingent upon actual evidence.

    You noted the positive feature of the Rootsweb WorldConnect tree results page: notation whether a source is given. However, most of these "sources" are lifted GEDCOM files, emails and websites. A few are quotations from books (relatively few book source citations give author, publication data and page number).

    Some other tree-hosting sites show whether there is a "source" to a particular tree person (Ancestry.com includes other trees as sources).

    The crux, however, is whether a conclusion is supported by evidence. I may be lacking in imagination, but do not see how a computer algorithm could be constructed that would evaluate whether germane evidence is contained in a source and whether it is accurately applied to matters of relationships and events.

    Some trees are partially accurate but then wander off into erroneous descendancies. Other trees contain mostly accurate conclusions, but their creators do not give the evidence supporting these conclusions.

    Such variation suggests that even in a 'voting' format some refinement would be needed. Would one vote as to existence of a person, the accuracy of their date and place of birth, identity of each parent and spouse, dates and places of marriage and death? or would voting be confined to the matter of adequacy of evidence for these factors?

    Say a site were to attract 10s of thousands of Accuracy Police. What would be their qualifications?

    I certainly would not want denizens or minions of Ancestry.com (organization that brought us a fictitious relationship between the British royal family and Vlad the Impaler) to play such a role.

  4. Dear Readers,

    I overstated my confidence that current technology could implement all of Rencher's suggestions. I've updated the article to indicate that automatically evaluating source citations is not currently possible, but flagging the existence of sources is.

    -- The Insider

  5. Dear Anonymous and Geolover,

    I agree that structured the wrong way, voting would not work. But I'm not willing to dismiss it. Community voting is working on eBay, Amazon, Yahoo! Answers, Alexa, Microsoft, MS-NBC, YouTube, Google, and just about every other site with user contributions.

    Given the right structure, I'm convinced a user community can converge on the best answer. "'We' are smarter than 'me'".

    -- The Insider

  6. "Community voting is working ..."

    Ah, but those sites rely primarily on their users providing subjective responses - good, bad, fast, slow, like, hate, easy, hard - in other words, a Popularity Contest.

    Genealogy is not a contest.

  7. AI, I do believe the response January 26, 2011 7:52 PM by Anonymous is sensible.

    Your statement, "I'm convinced a user community can converge on the best answer," leaves the notion of ~best~ to trolls, to the testosterone-poisoned, to those who change back newFamilySearch Tree items that had been corrected in accord with evidence, and to those who do not understand the nature of records.

    In the last instance, perhaps you have seen the message board posts complaining that birth or death records do not give grandma's *true* name, and the posters want to know how to change them.

    You are not aiming at ~voting~ on specific facts?

    How would voting on particular whole trees be illuminating, given the variety of combinations of accuracy and fantasy that are in nearly all of them?

    If you mean that voting would be aimed only at pinpointing trees that are pure trash, consider that there may be relatively few of these. And there may be viewers who would "like" (say) the number of undocumentable single-named Central Asian entities of the first half-millennium CE.

    For all the rest, the problems are in the particularities of evidence.

    How would you vote on one tree I know of where the most recent 4 or 5 generations are pretty sound, but in one ancestral line the creator omitted several actual children of one couple (these children peskily did not fit the list in a will), omitted birthdate of a target child of that couple, jumped her parents from Charleston, MA in order to fit the target into a Delaware family and somehow also attributed a 1790 MD enumeration to the purported father? Oh, and sources are given for many of the juxtaposed fragments, except that at least one of the sources does not say what the treebie states it says . . . and no evidence whatever is adduced to cement the superficially plausible jumble.

    One would have to do appreciable research to detect the dishonest presentation of these 2 purported generations. Such muddles are not penetrable by a quick look.

  8. Ancestry.com is about profits. If it really cared about accuracy, it would insist that the sources be cited before letting folks vomit their pedigrees onto their pages. But that would work against their goal of corralling every human being on the planet as a subscriber. I use source citations for everything, and most of those sources I digitize, if I can. What amazes me is how people claiming to conduct "genealogic" research just use others' data prima facie and never try to verify it. Amazing!
    It works for companies like Ancestry.com, but not in real research, genealogic or otherwise.

  9. Interesting post and so on point.

  10. Ancestry.Com could be locked in a loop. Most of the family trees I find there are sourced -- to other family trees on Ancestry.Com! We know it is all junk sources. The experts at Ancestry know they are junk sources, so Ancestry would be in the position of admitting most of their trees are junk sources, or coming to allow another type of source, call it "Outside (non-Ancestry) Source." But even that requires a partial admission that their family trees are mostly junk sources. I don't see it happening. Even if it is the right thing AND the best thing.

  11. Absolutely, I'd want to see these things happening! I find erroneous data all the time in my tree on Ancestry.com, and no one else has thought to question them. I've done research for my questions, and if I don't find anything, those particular "ancestors" are deleted from my tree. For instance, one of my ancestors was born in 1580, but his mother died in 1575.

    When I first started, I accepted all the hints from other people's trees, without much question. Now, I make sure it all adds up, and I research it when it doesn't.

  12. I also agree, and I agree with tcollins, who said:
    "they find the name and area that fits and they get all excited but don't bother to verify whether it is the exact relative they are looking for. Which can cause major problems for others."
    The problem happens as a rule, not as an exception, and then we are left to verify (or not) masses of "ancestors" who just don't check out. As you know, it is very time consuming and it becomes labor intensive. I spent about 9 hours yesterday undoing almost an entire tree, wading through nonexistent family ties. I don't know how to explain that to most people because they trust Ancestry and are prone to hostility.
    I hope for red flags or flashing signs that say "NO NO NO" to replace the cute but problematic green leaves. Yes, a highlight, please!
    I would be concerned about the voting, though, for the same reason Anonymous 26 Jan. 2011 09:30 gave:
    "So ..... when those 32 people "vote" ... guess whose credibility factor is going to tank?"

  13. When confronted with 15 trees that all contain the same flawed information, I'd like to be able to tell which tree was uploaded first, to get an idea of which tree was copied by the other 14 people. Then, when I have a question about something, I can skip the 14, because they won't have a clue, and go to the one that just might be responsible for the (perceived) error and have them walk me through their process to determine if the data can be indeed verified or disputed/corrected.

  14. Um, yes. Of course. I have often thought this myself, but you articulated it better than I would have & offered several great suggestions that I hadn't thought of yet. Thanks!
