Friday, May 13, 2011

To Infinity and Beyond

Craig Miller at NGS 2011 Conference

“For the Family Tree, the whole effort this next year is to be able to make it public,” said Craig Miller, “and to be able to allow people to preserve their information and to share it in a collaborative way.” Miller made the remarks in his session “FamilySearch 2011 and Beyond” at the 2011 NGS Conference.

Product Management

Craig Miller is Sr. Vice President of Product Management at FamilySearch. Miller explained what product managers do. “We analyze how people do family history. We watch what you are doing and why are you doing it.” Then they work with software engineers to build products that meet users’ needs.

Miller explained why FamilySearch.org may not be as easy to use as you would like. “At FamilySearch over the past few years we have been developing things out in the open. I apologize for that, but it provides really valuable information. I apologize, but not really.”

“Some of the things we do may prove confusing,” he said. “I’m going to try to explain a little of that today.”

Search Changes

Miller started with the recent changes in Search, which have drawn lots (and lots and lots) of criticism. “We really goofed up,” he said. He noted that in the Record Search Pilot, filtering to Utah or Wisconsin took just a couple of clicks. He counted the clicks in the latest release and found that today it takes 40!

FamilySearch can’t go to the local store and buy a search engine that serves the needs of genealogy. What is available out there is technology like unto Google. Miller pointed out that genealogists need to be able to search and match name variants, time ranges, geographies, and various events.
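
To make that list concrete, here is a minimal sketch in Python of what genealogy-aware matching might look like. It is not FamilySearch's engine; the `Record` fields, the `VARIANT_GROUPS` table, the `year_slop` parameter, and the sample records are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Illustrative variant groups (an assumption); a real table would be far larger.
VARIANT_GROUPS = [{"lathrop", "lathrup"}]


@dataclass
class Record:
    surname: str
    event: str   # e.g. "birth", "death"
    year: int
    place: str


def variants_of(surname):
    """Return every known spelling of a surname, lowercased."""
    name = surname.lower()
    for group in VARIANT_GROUPS:
        if name in group:
            return group
    return {name}


def search(records, surname, event, year, place, year_slop=2):
    """Match on name variants, event type, a range of years, and place."""
    names = variants_of(surname)
    return [
        r for r in records
        if r.surname.lower() in names
        and r.event == event
        and abs(r.year - year) <= year_slop
        and place.lower() in r.place.lower()
    ]


# Made-up sample records, for illustration only.
records = [
    Record("Lathrup", "death", 1653, "Barnstable, Massachusetts"),
    Record("Lathrop", "death", 1700, "Barnstable, Massachusetts"),
    Record("Smith", "death", 1653, "Barnstable, Massachusetts"),
]
hits = search(records, "Lathrop", "death", 1652, "Massachusetts")
print([r.surname for r in hits])  # ['Lathrup']
```

An exact-string match would treat "Lathrup" and "Lathrop" as unrelated names and would miss a record that is one year off, which is why each of the four dimensions needs its own kind of fuzziness.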

FamilySearch spent 4.5 years customizing the search engine used in Record Search, only to find that it could not scale. That means, as the planned thousands of record collections were added, the number of computer servers needed was growing explosively.

Fortunately, FamilySearch located another search engine that was twenty times better. Unfortunately, FamilySearch now has to re-customize the search engine with all the special capabilities of the Record Search engine.

Miller assured me after his presentation that improving search was one of the top two points he would like me to take away from his presentation. Over the next several months FamilySearch will be working on this. “We’re revamping it and constantly refining it and adding the features that are needed.”

Family Tree

The biggest point was FamilySearch’s plans for Family Tree. “We’re trying to preserve the heritage of mankind through a common family tree.” And they are working hard to make it available to the public.

Here again FamilySearch made some bad decisions. Miller said that at the outset of the FamilySearch Family Tree (nFS) project, they failed to appreciate how extreme the problem of duplication would be. When FamilySearch loaded all their separate trees into the common tree, they had 1,000 variations of John Lathrup. (Miller didn’t tell us the total that existed prior to collapsing exact duplicates.)
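
A small Python sketch shows why collapsing exact duplicates still leaves variations behind. The function name and the sample entries (including the years) are invented for illustration and are not taken from FamilySearch data.

```python
def collapse_exact(people):
    """Drop only entries identical in every field, keeping first-seen order."""
    return list(dict.fromkeys(people))


# Hypothetical submissions of (name, birth year) from separate trees.
submitted = [
    ("John Lathrop", 1584),
    ("John Lathrop", 1584),  # exact duplicate: removed
    ("John Lathrup", 1584),  # spelling variant: survives
    ("John Lathrop", 1585),  # date variant: survives
]

remaining = collapse_exact(submitted)
print(len(submitted), "->", len(remaining))  # 4 -> 3
```

Every submitter who spells a name or records a date slightly differently adds another surviving entry, so the count of variations grows with the number of contributing trees.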

Didn’t FamilySearch learn that sort of thing when they built Ancestral File? (Search for John Lathrop of Massachusetts in Ancestral File. Look at the variants in the list. Look at the number of submitters for each variation…) I’ve spoken before about the need for product managers to preserve “corporate memory” of things learned in the past. But I digress…

Darn it. Even a little math would tell you… Oops. I’m still digressing…

(As long as I’m hopelessly digressing, may I just say that I don’t understand how anyone in the south can weigh less than 400 pounds? Wow! The food here is so good. And Goo Goo Clusters!? Mmmmm…)

Miller went through the flaws of the Family Tree to demonstrate why it wasn’t ready for the public. As I had guessed they would, FamilySearch blamed the architecture (conclusion model versus wiki model) rather than the data (garbage in, garbage out). To be fair, there were (are) some design problems:

  • Redundancy – I believe it’s now public knowledge that when viewing a person in the tree, every redundant copy was explicitly and simultaneously loaded into a computer’s memory, crashing that computer. As you crash more and more of the computers (servers) that constitute a website, the website gets  s l o w e r   a n d    s   l    o     w      e       r……………
  • Disputes – Miller showed the infamous dispute symbol. Whose idea was it to create a feature that informed the contributor that they had a fact wrong and simultaneously locked the fact so no one could fix it?
  • Source citations – Since the tree doesn’t have evidence management, sources had to be multiplied, duplicated, and redundantly replicated across every event on every person on every duplicate associated with that source. Miller showed one example list of sources that went on for 26 pages. (Don’t even get me started on the substandard nature of the citations…)
  • Errors could only be fixed by contributors, most of whom were unidentified and some of whom were dead. He didn’t mention it, but no one could fix gender errors except for Salt Lake.

Miller reviewed the wiki architecture that FamilySearch is moving to. I’ve covered it before, so I won’t repeat it here. Miller presented four goals they have for the Tree this year:

  1. Provide the ability to correct and improve data.
  2. Integrate the Tree (currently at new.familysearch.org) with www.familysearch.org.
  3. Improve site navigation.
  4. Create meaningful sources.

Miller thought that if this all goes well, they’ll have the tree rolled out to the general public by next spring.

Stay tuned… (but don’t hold your breath…)

4 comments:

  1. I'm a product manager and lead a team of product managers. Go Product Management. Have fun building solutions for us! Thank you for all the hard work figuring out what problem we need solved and then solving it!

  2. Sadly, no amount of tinkering with architecture, even by the most genealogically savvy developers, can fix the fundamentally GooGooCluster database (full of repetitive lumps, laden with submitters' sweet dreams, and prone to fall apart if approached too closely).

    A wiki approach would need a big red unrestricted-use "Delete" button, a big green unrestricted "Disconnect" button, unrestricted ability to merge duplicate individuals and a team of, say, 15,000 volunteers on board at all times (for 24/7 that means at least 50,000) to monitor that all new entries are supported by correctly cited evidence, and that the evidence is correctly applied to the case.

  3. I tried making this post a few days ago, but apparently it was during the time that the posts weren't being received, so I'm trying again in the hopes that somebody can explain this to me. I know that there has been a lot of disgruntlement about the changes to the familysearch site and I'll admit I'm not the most technologically savvy person, but I don't get it. I just did a presentation on navigating the site and in preparing to do that I visited the blog there (and here) nearly every day for over 3 weeks. I tried to reconstruct the problems that were mentioned (although admittedly that was more difficult when the comment consisted of "change it back," "I hate it," and "why did you guys mess this up?", etc.). However, the ones that I could reconstruct I did. I must be missing something because I had problems reconstructing the problems. Maybe they were being addressed as soon as the feedback was posted or something. But here's the one that really surprised and baffled me (and it came from this very post and was attributed to Miller): "Miller started with the recent changes in Search, which have drawn lots (and lots and lots) of criticism. “We really goofed up,” he said. He noted that in the Record Search Pilot, filtering to Utah or Wisconsin took just a couple of clicks. He counted on the latest release and found that today it takes 40!"

    I went immediately to the site. Operating on the assumption that he was talking about reaching the Utah databases that were available I clicked on the "Canada, United States and Mexico" link on the home page under the browse location feature. I clicked on the "United States" filter to narrow the results further. And then scrolled down the resulting list to the Utah databases. By my count that's only three clicks. I went back to the home page and just searched for "Utah" -- admittedly there were several hundred thousand hits since it brought up every record that had a Utah connection, but again, it took no time to filter down to specific Utah databases. What am I missing? And, no, for those who may read this and wonder, I am not new to internet genealogy by any means.

    I'm not trying to start anything. I just really am trying to understand the difficulties that everyone else seems to be experiencing. Especially since it looks as if I'll be doing more presentation of this type and I would like to be able to intelligently answer questions that arise.

  4. Dear Cathleen,

    I believe it is for searches such as John Smith, 1920.

    You're right. Most complaints do not include enough information to reproduce the behavior. It is impossible for product managers to change the behavior of the website when they can't see at least one example.

    -- The Insider
