Friday, July 17, 2009

Top Secret FamilySearch Project

You may have been able to tell from my last two articles (here and here) that I loved Ron Tanner’s presentation last Saturday. But I’ve saved the best for last. Keep in mind Tanner’s humorous presentation style, wrapping serious information in mock-seriousness. (While it plays extremely well live, it can be misinterpreted in written form.)

So when I tell you that Tanner laughed at us immediately after asking if any of us thought New FamilySearch improved genealogy, I don’t want you to write and complain to your son’s grand-father-in-law, the General Authority (high official) at the Church of Jesus Christ of Latter-day Saints, sponsor of FamilySearch. You’ll get both me and Tanner in trouble.

Conclusion Model

In mock-comedic seriousness, Tanner warned us that the proposal he was going to share with us was all theoretical. These were ideas being tossed around. He not only invited us to share our reactions with him, he passed out 3 x 5 cards, invited us to write down our feedback, and at the end of the presentation he conscientiously collected them.

To set the stage, Tanner asked us what might be the capabilities of an ideal family tree system. He suggested we would ask for (contrast these with NFS, if you will),

  • The ability to easily correct information
  • The ability to prove conclusions are accurate with source references and images
  • Invite greater peer review and collaboration
  • Allow for the evolution of a combined human family pedigree

Tanner then proposed ideas that sounded like genius to me. Sure, he was pretty much quoting ideas I published last year, but hey, I recognize genius when I see it.

Tanner announced that we would be the first sentient beings outside FamilySearch to hear the top-secret, internal project name for this proposal. He mockingly made us raise our hands and swore us all to secrecy. Then he revealed the internal code name:

Source Centric, Open Edit (SCOE) model

Hmmm… At least I think he was joking about that solemn oath of secrecy…

(Oops.)

SCOE proposes a new human pedigree that encompasses all the principles that Wikipedia uses to make it such a successful, massively collaborative project. SCOE will be freely editable by any registered user. That’s the “open edit” portion of the SCOE moniker.

And it will establish a community where sources, polite discussion, and dispute resolution procedures lead conclusions to converge towards best possible values as judged by genealogical community standards, backed by proper evidence. That’s the “source centric” part of SCOE.

Wikipedia

I know some of you are anti-Wikipedia ideologues. Tanner had at least one attending his presentation. While academia has been loath to give any nod to a publication that favors “consensus over credentials” (see “Wikipedia”), Tanner shared a 2006 review of Wikipedia by Library Journal. The review concluded that,

While there are still reasons to proceed with caution when using a resource that takes pride in limited professional management, many encouraging signs suggest that (at least for now) Wikipedia may be granted the librarian’s seal of approval.

In its short lifetime, Wikipedia has racked up some awfully amazing stats:

  • Launched just 8 years ago, on 10 January 2001
  • 13 million articles among 262 languages
  • Through Jan 2007 the number of articles doubled every year
  • Thereafter, about 1600 articles were added every day
  • The current size is the equivalent of 784 book volumes

[Figure: English Wikipedia article count graph]

As to the accuracy of Wikipedia, Tanner said,

I did an experiment to see how quickly incorrect information was detected and corrected on Wikipedia. I vandalized… er… I mean, some unnamed individual went out and introduced wrong information on the solar system page. The page was restored to the prior version within 27 seconds.

(I hate to tell you this, Ron, but as you yourself mentioned in your presentation, another precept of Wikipedia is that changes are all logged so that undesirable members of the community can be held accountable. The Wikipedia solar system history page shows every change that has ever been made to the article.)

As a second test, the dragon page was modified. The thinking was that the solar system article might be too mainstream, but a less popular page might be watched less carefully. Not only was the dragon page restored in 26 seconds, but a helpful message was included, discouraging addition of information that cannot be substantiated.

User Testing

FamilySearch tested a basic prototype of an openly-editable tree system with a group of people from a broad range of genealogy and Internet experience. FamilySearch discovered that users can become comfortable with such a system if

  • sources are used as evidence,
  • they can see who changed the data and why they changed it,
  • they are able to contact those making changes,
  • they can optionally be notified when changes occur,
  • they can hook reliable and verified sources to their data, and
  • sources are protected and can be modified only by the contributor.

“We’re thinking about having theories, with reasoning, and one theory can be set as a conclusion,” said Tanner. “We’re looking at making the system more consistent with genealogical best practices.”
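To make the "theories, with reasoning" idea concrete, here is a minimal sketch of how such a data model might look. It is purely illustrative: the class names (`Source`, `Theory`, `Fact`), their fields, and the sample data are all assumptions made up for this example, and nothing here reflects how FamilySearch actually plans to build it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Source:
    """Hypothetical source record, protected from edits by other users."""
    contributor: str
    citation: str

    def amend(self, user: str, citation: str) -> None:
        # Only the original contributor may modify a source.
        if user != self.contributor:
            raise PermissionError("only the contributor may modify a source")
        self.citation = citation


@dataclass
class Theory:
    """Hypothetical theory: a claim, the reasoning behind it, and its sources."""
    author: str
    claim: str
    reasoning: str
    sources: List[Source] = field(default_factory=list)


@dataclass
class Fact:
    """Hypothetical fact about a person, holding competing theories."""
    label: str
    theories: List[Theory] = field(default_factory=list)
    conclusion: Optional[Theory] = None

    def propose(self, theory: Theory) -> None:
        self.theories.append(theory)

    def conclude(self, theory: Theory) -> None:
        # Assumption: a theory must be proposed before it can be the conclusion.
        if theory not in self.theories:
            raise ValueError("theory must be proposed before it can be the conclusion")
        self.conclusion = theory


# Made-up sample data, for illustration only.
census = Source("alice", "1850 U.S. census, household of John Smith")
birth = Fact("birth year of John Smith")
sourced = Theory("alice", "1820", "Age 30 in the 1850 census", [census])
legend = Theory("bob", "1815", "Family tradition", [])
birth.propose(sourced)
birth.propose(legend)
birth.conclude(sourced)
```

Both theories stay on record with their reasoning; setting one as the conclusion does not erase the other, so a later contributor with better evidence could still make the case for it.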

Tanner said that a Source Centric Open Edit system must:

1. Provide a genealogy-structured interface rather than the free-style format of a traditional wiki system

2. Prevent non-registered users from modifying data

Registered users have to provide contact information. We’ll verify e-mail addresses by sending them a message as part of the registration process.

3. Allow restoration of previous values after data is changed

Each person page would have an associated history page that logged and captured every change, as with the history page for the Wikipedia solar system page. Like Wikipedia, it must be simple for a user to restore a previous version of the page.
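A history page of this kind can be sketched in a few lines. The following is only an illustration under my own assumptions: the names `Change` and `PersonPage`, the choice to store a full snapshot per edit, and the sample edits are all hypothetical and do not come from Tanner's presentation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Change:
    """One logged edit: who made it, why, and the resulting page values."""
    editor: str
    reason: str
    values: Dict[str, str]


@dataclass
class PersonPage:
    """Hypothetical person page whose every edit is kept in a history log."""
    values: Dict[str, str] = field(default_factory=dict)
    history: List[Change] = field(default_factory=list)

    def edit(self, editor: str, reason: str, **updates: str) -> None:
        # Apply the updates, then log a full snapshot of the new state.
        self.values = {**self.values, **updates}
        self.history.append(Change(editor, reason, dict(self.values)))

    def restore(self, version: int, editor: str) -> None:
        # Restoring is itself a logged edit, so nothing is ever erased.
        snapshot = self.history[version].values
        self.values = dict(snapshot)
        self.history.append(
            Change(editor, f"restored version {version}", dict(snapshot))
        )


# Made-up sample edits, for illustration only.
page = PersonPage()
page.edit("alice", "from 1850 census", name="John Smith", birth="1820")
page.edit("vandal", "no reason given", birth="1066")
page.restore(0, "bob")
```

Because the restore is appended to the log rather than rewriting it, the bad edit and the person who made it remain visible on the history page.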

4. Separate evidences (sources) from conclusions

I applaud FamilySearch product managers for including this extremely important concept. The necessity seems to elude product managers during their first years of operation in the genealogy industry. Picking on FamilySearch for the moment, witness the treatment of sources during the past 40 years:

  • After 1969, temple submissions were no longer rejected without sources
  • Contributor information was not keyed into electronic systems from submissions
  • Ancestral File was stripped of sources prior to incorporation into the database
  • TempleReady submissions did not require sources and if any were present in the PAF file, the sources were excluded from the submission file
  • Sources in Pedigree Resource File submissions are removed before the data is presented online
  • New FamilySearch (NFS) excludes source information from ordinance data, even when the IGI lists sources that may contain additional data
  • Much of the data in NFS shares a general, common source: “Temple Records, March 2007”
  • Source handling in NFS is widely regarded as inadequate if not counter-productive

5. Notification and collaboration features

As we’ve seen from Wikipedia, these features are critical to facilitate friendly discussion and avoid inadvertent errors or vicious vandalism (not viscous, as my spell checker suggested).

6. Protection of classified or sensitive genealogical information

Administrators need to be able to lock a page from view or edit. This is needed as part of dispute mediation. It’s needed for groups such as the Japanese burakumin, for specific individuals, medieval families, and famous or infamous persons.

7. Mediation of conflicts between contributors.

The system needs moderation or dispute resolution. This would be modeled after Wikipedia’s dispute resolution process.

Conclusions

In conclusion, Tanner reminded us that these are just ideas being thrown around. He asked us to give him our suggestions, as they’re still thinking this all through. He had us think back to the beginning of the presentation, when an open edit system sounded insane. He said that, while we might not be totally convinced, he wanted us to sit on it.

“Now aren’t you thinking that maybe this could work?” he asked. “This is still just research. We may not do any of this.”

Q and A

Tanner had a couple of minutes to answer questions, and attendees didn’t hold back. (These are not quotes.)

Q. Won’t vandalism be a big problem?
A. Sure, it’s a problem in all open systems. We’ll need community administrators. Wikipedia has 2,000 volunteer administrators.

Q. What do the General Authorities think about this proposal?
A. I’ve talked this over with Elder Maynes and Elder Sybrowsky. Their biggest concern is over inappropriate images.

Q. How long will temple reservations last?
A. Forever. Forever. Seriously, we realize this needs to be looked at.

Q. When will SCOE happen?
A. I have preliminary approval to proceed with this project. If I were King, I would have the beginnings of this start in February or May. Maybe the first step will be the addition of discussion pages. The full treatment is a year or two away. I’m trying to get the features added incrementally.

For more information, see Ron’s PowerPoint presentation from the March 2008 BYU Family History and Genealogy Technology Workshop.

6 comments:

  1. It sounds like they are going to set up their own version of WeRelate.

  2. Insider,

    I admire the intentions behind this project. Source and evidence based genealogy and with an arbitration process. Unfortunately I believe the scope of this project is both too broad to be completed properly as a system, or to be executed in practice, especially as to arbitration.

    While perhaps your description is not adequately describing the system, and though is on the right theoretical track, it still seems lacking from the perspective found in Elizabeth Shown Mills' book Evidence and the followup, Evidence Explained, and other professional level books and articles.

    The process and parts of same are given in the following excerpt from the APG list archives found here in one of Ms. Mills' posts:
    http://newsarch.rootsweb.com/th/read/APG/2004-08/1092795113


    "Basic Principle:
    SOURCES give us INFORMATION
    from which we select EVIDENCE.

    SOURCES --> INFORMATION --> EVIDENCE
    are ...     is ...          is ...
    Original    Primary         Direct
    Derivative  Secondary       Indirect

    All of these go through the
    EVALUATION PROCESS
    to produce ...
    "PROOF."

    That is an older discussion, and a fuller one can be found in EE p. 24ff.

    So going off of a chain of modern vital records, or a little bit earlier before such records using clear statements of relationship in estate documents in absence of actual/potential conflicts, then one might be able to craft a fairly short proof argument as imagined in the NFS system. But take the time period back in American genealogy before 1850, then short proof arguments will rarely suffice in the Southern states without very good estate records, though one may still be OK in New England.

    Southern and frontier genealogy, especially in the time period from a little before the Revolution to 1830, is especially difficult, even without the often found poor record keeping or subsequent record loss. There is where brickwalls are found in abundance, including my own. It requires the study and analysis of large family and neighborhood units and all common and allied surnames in the area (cluster genealogy). And if success is found in proving another generation, it usually is by correlation of multiple pieces of indirect evidence producing a sound, albeit circumstantial conclusion. And that proof won't look like a 2 or 3 paragraph proof argument, but rather an article in the NGSQ/TAG/NEHGS/etc.

    The vast majority of persons, either Mormon or non-Mormon, who might use such a new NFS wiki type of family tree, will not be able to produce such an analysis. And without plagiarizing and possibly violating copyright, they also won't be able to dump a long article in the proof section, even if there is space for same. And if such article length arguments are produced by someone (or merely cited as in where a person donates an article to BYU and it shows up in the FHL catalog), and there are disagreements, who exactly will be qualified to referee such a dispute that even competent professionals might disagree on? Only professionals and less than 5% of "amateur genealogists". And the question is not just who, but how many such qualified arbitrators do you think NFS can come up with?

    I love the concept, both here and as envisioned elsewhere by other organizations (like Ancestry's ill-fated and ill-executed OneWorldTree). But it is too complex to execute and most professionals or highly competent amateurs who can produce valid conclusions from exhaustive study and analysis of original sources are not going to be willing to repeatedly waste time arguing with those who lack such skills. I seem to recall that the church only allows experts to submit ordinances and make trees etc. in medieval genealogy. But the fact is that such a bar should also exist between medieval and 1850 or so (maybe even 1900). Weekend internetologists who copy and recopy each other's "work" simply are not going to contribute anything other than further iterations of wrong assertions that they already do.

    MikeF

    P.S. Was that too pessimistic? :)

  3. ditto to "their own version of WeRelate." The similarity is astounding.

  4. I have discovered that no matter how much evidence I provide to disprove the Indian Princess in our family tree, there are those cousins who will continue to cling to their illusions because it makes them feel good regardless of the mounting pile of evidence to the contrary.
    Marilyn

  5. Applause and dittos for MikeF's last 3 paragraphs, and !!! emphasis on his 2nd one.

    The basic database for NFS is quite contaminated with non-evidenced genealogical errors. One could wonder how many years it would take such a Project to clean up most of them, given MikeF's points.

  6. I don't think there's any doubt that conjecture could be taken as fact. And facts could be overshadowed by bad sources. But in my opinion, a universal tree where anyone can contribute, debate and scrutinize its content is the future of genealogy. It will never be perfect or without error (just like its users). But it would be more accurate than most databases where there is no real scrutiny. Once people start to get over the sense of ownership of information, we should see something very interesting.
