Wednesday, August 26, 2015

FamilySearch Should Increase Indexing Efficiency and Utilize Partnerships

Jake Gehring presenting at the 2015 BYU Conference on Family History and GenealogyFamilySearch is not keeping up with indexing the records it digitizes and improvements in three ways could help fix this, according to FamilySearch director of content development, Jake Gehring. Yesterday I presented the first part of my remarks about his presentation at the 2015 BYU Conference on Family History and Genealogy (#BYUFHGC). Today I’ll present the second part, covering the first two of the three ways, increasing efficiency and partnering. Tomorrow I’ll present the third way, increased use of computerization.

Today’s FamilySearch Indexing (FSI) system is somewhat inefficient. FSI primarily utilizes a double-blind indexing methodology, sometimes described as A+B+arbitrate. Two indexers independently index a batch of records. If there are any differences, even one letter in one record, the entire batch is sent to a third person to arbitrate between the two values, or supply a value of their own. It turns out that 97% of all batches have at least one difference, even though what is keyed is the same for 70% of the fields. As a result, almost all records are looked at by three people. There’s a good argument that that is wasteful. For certain kinds of records and certain kinds of people [and certain kinds of fields, I might add], only one keyer is sufficient. The accuracy doesn’t get any better when involving two more people. FamilySearch has recently switched to single keying for newspapers in the last year since reading typeset material can usually be done without error. You wouldn’t want to do this for certain types of records or for beginning indexers.

A more efficient methodology is referred to as A+review. One person keys the information and a second person reviews what is keyed. All the reviewer does is indicate whether the information is correct or not. This could easily be done, even on a cell phone. This method is about 40% more efficient than the double-blind methodology because FamilySearch knows when a record needs to be keyed a second time. FamilySearch is actively working on this kind of methodology to increase the efficiency of indexing.

Jake showed three, entirely new, experimental types of indexing. Some do not even have working prototypes: keyboardless indexing, free-form indexing, and casual “micro-indexing.”

Jake showed an indexing system that allows productive use of devices without keyboards, such as smart phones. If you’ve used photo recognition in Photoshop, you have seen the paradigm before. He showed a slide showing 12 snippets of a name, such as “Henry.” (See my version, below.) These had been read from documents by a computerized handwriting recognition system. But since computers aren’t too good at reading handwriting, it presents its results to a person for verification. The person marks any that the computer got wrong. Where the computer had a good second guess, it could present that as well, allowing the person to select an alternate name, such as “Kerry.” For pre-printed forms, this works great and allows easy indexing on devices without keyboards, such as cell phones.

Snippet of name indexed as Henry

Shippet of a name that was indexed as Henry or Kerry Snippet of name indexed as Kerry
Snippet of a name that was indexed as Kerry Snippet from a page wherein one name was indexed as Kerry Snippet of a name that was indexed as Kerry
Snippet of a name that was indexed as Kerry Snippet from a page wherein one name was indexed as Kerry Snippet from a page wherein one name was indexed as Kerry

Snippet of a name that was indexed as Kerry

Snippet of name indexed as Henry Snippet of a name indexed as Kerry

Jake showed the FamilySearch Pilot Tool, another indexing system for free-form indexing. It is currently live, as a pilot. A large portion of the screen is a browser showing a record on FamilySearch.org. Along the right side is a pane where an indexer can enter names, dates, and places extracted from the document. (See the screen shot, below.) A person would use the tool to index any record that they care about and a short time later the record would be searchable. You wouldn’t have to ask for anyone’s permission. You wouldn’t have to index all the names. Anyone could take any collection desired and do some indexing. This tool is in pilot right now. FamilySearch is very interested in tools that let you index as you go. To join the pilot, send Jake an email. (I see someone has also posted the link online. See “FamilySearch Pilots Web-Based Indexing Extension” on the Tennessee GenWeb website.) There is no arbitration. If you care enough to index the image, you probably care enough to be accurate. But that supposition is something yet to be validated.

The FamilySearch Pilot Tool for indexing - Click to englarge

“Micro-indexing” could be used to make images more usable. It would be nice to be able to browse unindexed images easier. FamilySearch is very interested in an upgrade to the current browse experience. Jake showed an animated artist’s rendition of a tool, reminding us that this is just a research and development idea.

FamilySearch is interested in making it easier to find records in images that have not yet been indexed.

In micro-indexing the system might ask you really simple questions, like, “What kind of record is this?” and have you click the record type. By asking volunteers to do tiny tasks, FamilySearch might be able to gather information to make browsing images easier to find my record type, locality, and time. Just because FamilySearch doesn’t have the time to index the images, doesn’t mean they can’t be made easy to browse.

This is a mock-up of what a micro-indexing tool might look like.

In addition to talking about increasing the efficiency of indexing, Jake talked about partnering. FamilySearch is fine with the concept of trading data with other companies. FamilySearch provides images and the partner creates indexes. They may even get exclusive use of the indexes for awhile. For example, a lot of Mexico church and civil records are being indexed right now by Ancestry.com. We all get the value of it eventually. FamilySearch has similar projects going on with Findmypast (I didn’t catch the projects names) and MyHeritage (Danish census and church records, and Swedish household names). This increases the rate of indexing by bringing more indexers to the table.

4 comments:

  1. Something very similar to the free-form indexing is already very successfully used to improve upon the OCR of Australian newspapers on the Trove site. Many of us make a habit of correcting the text of the sections before and after as well as the one of interest. It is not a big step to imagine being able to record names and dates, etc in some sort of simple form instead.

    ReplyDelete
  2. Great article. Ever since seeing Argus Search at Rootstech2015 I've wondered what level of software FS uses.

    ReplyDelete
  3. Does anyone know when Family Search will digitize more of it's microfilms. I have hundreds of relatives from Sant' Angelo dei lombardi, Italy. Family Search put a few of that regions films online, not indexed. But the rest of the films for earlier dates have not been put on line. While indexing is nice to have, my main concern is that they put the rest of Sant' Angelo dei Lombardi's films online. Many of us cannot access a family history center. The films are just sitting there...Please Family Search put the films online.

    ReplyDelete
  4. I like the idea of the micro indexing... "What kind of record is this?" sort of thing. I'm not to crazy about the Photo recognition snippets unless there is a way to expand the document. Often, it's hard to understand what letters are used unless you look at a variety of words that the original writer wrote. To look at a snippet out of context could lead to more problems then it solves.

    Thanks for providing the inside scoop.

    ReplyDelete