Tuesday, August 25, 2015

FamilySearch Indexing Not Keeping Up – #BYUFHGC

Jake Gehring presenting at the 2015 BYU Conference on Family History and Genealogy

“FamilySearch just isn’t indexing records fast enough,” said Jake Gehring. “If that is the case,…then what do we do about it?” Jake is director of content development for FamilySearch and presented at the BYU Conference on Family History and Genealogy last month. Jake’s presentation was titled “FamilySearch Indexing, Robo-keying, and Partnering, Oh My!”

In the last little while Jake has been involved in some research and development, which is really rewarding. It’s fun to work on some things that may become real someday. He emphasized to me that these things may never become real, so keep that in mind as you read.

Back in the old days an index was that thing in the back of the book, not some multi-billion name index you can search from your home. “We index records so that they get used more,” he said. We gather records for the same purpose and have been doing so since 1938, he said. FamilySearch has about 280 cameras, roughly 40 in the United States and the rest abroad.

There have been huge improvements in the technology for capturing records and making them available. Jake showed an example, a Weber County, Utah marriage license. It is one of the rare collections that FamilySearch has captured twice. A scan from microfilm looks like this:

Weber County, Utah marriage license scanned from microfilm

FamilySearch went back recently and captured the records digitally, in color.

A Weber County, Utah marriage license that was digitized in color

Granted, viewing a record scanned from microfilm is often less clear than viewing it on a microfilm reader, but you can see the huge improvement.

FamilySearch takes steps to make the captured images easier to use. One thing they have done from early on with books and microfilm is catalog them. A catalog entry can specify locations, authors, subjects, and so forth. For family histories, they might put in a list of surnames, but that was about it. You had to know what you were looking for to find the records you needed.

When you think about what we do now, things are quite a bit different. Indexes contain full names and direct you to individual images. We index (FamilySearch's term for extracting) more than names: we also capture dates, places, and relationships. Because of this, not only can you search for records, but FamilySearch can recommend records to you. FamilySearch calls these hints.

There is a range of things that FamilySearch can do to make records more accessible. Some cost less than others. Jake showed a diagram of the treatments that can be applied to a collection. With added accessibility comes added cost. Here is my version of his diagram, including my own definitions:

Increasing the usability of a digitized genealogy record increases the cost of publishing it.

Definitions:

  • catalog entry: a single entry for an entire collection
  • film notes: individual notes for each film
  • light waypointing: dividing the images of an entire collection into a few groups containing a large number of images
  • heavy waypointing: dividing the images into more specific groups with fewer images
  • light indexing: extracting a few, basic pieces of information from an image, perhaps just a name or a name and date
  • heavy indexing: extracting most genealogically significant information
  • lineage-linkage: using the record extracts to reconstruct families with links between parents, spouses, and children

Resources are limited. The more work invested in collections, the easier it is to use them, but the number of collections that can be published decreases. “The truth is that it is quite expensive to make collections very, conveniently searchable,” Jake said. “But it is still worth doing. In fact, we want to do it faster.”

The Church of Jesus Christ of Latter-day Saints, which owns FamilySearch, has been indexing for a long time in one way or another.

  • 1922 – Church employees started extracting information for the TIB, an early predecessor of the IGI. [I added this bullet point.]
  • 1961 – Church employees started extracting names from historical records at Church headquarters.
  • 1977 – Church members started extracting records at Church buildings via the stake records extraction program.
  • 1986 – Church members began the family records extraction program (FREP), in which members entered data on home computers.
  • 1994 – The stake and family record extraction programs were consolidated.
  • 2006 – FamilySearch began using the current FamilySearch Indexing tool, utilizing Church members and the general public.

Jake showed the current application, FamilySearch Indexing. He then showed the new, browser-based tool that is now being rolled out. The tool allows the data entry pane to be positioned in various places, such as the left of the screen or the top. One data entry mode, available when field positions are well defined, allows data to be entered directly over the image itself.

Jake showed statistics of the number of indexing volunteers since 2006.

FamilySearch Indexing Volunteers, by Year

This graph should be encouraging to everyone. Compare this to the number of people—probably 15,000—indexing the 1880 census some 20 to 25 years ago. The big explosion in indexers in 2012 was because of the 1940 census. This year FamilySearch is on track to have more volunteers than for the 1940 census project.

“It is a wonderful, exciting program to be a part of. You have the satisfaction of knowing that you and 350,000 of your closest friends are all working together to make documents more usable,” Jake said.

He compared the indexing project of the 1880 census to that of the 1940.

1880 US Census Index | 1940 US Census Index
Index only           | Index and images
56 CD-ROMs           | Web
50 million records   | 132 million records
17 years to complete | 150 days

But this amount of indexing is not good enough.

“Do the math,” Jake said. FamilySearch captures about 150 million images in the field each year. FamilySearch is also scanning the microfilm out of the Granite Mountain Record Vault. This year FamilySearch expects to scan about 300 million images. On average there are four to five records per image. That amounts to about 2 billion records digitized just this year. And Jake expects to have the same amount next year. But we are only indexing about 250 million per year. That’s only 12% of the records “brought in the door.”

“Now do you see why I say, we’re not going fast enough?”
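For anyone who wants to check the math, here is a minimal sketch in Python using only the approximate figures quoted above. The constant and function names are my own, made up for illustration; they are not anything FamilySearch uses.

```python
# Approximate figures as reported in the talk; names are hypothetical.
FIELD_IMAGES_PER_YEAR = 150_000_000     # images captured by cameras in the field
VAULT_IMAGES_PER_YEAR = 300_000_000     # images scanned from vault microfilm
RECORDS_PER_IMAGE = (4, 5)              # "four to five records per image"
INDEXED_RECORDS_PER_YEAR = 250_000_000  # records indexed per year


def digitized_records(records_per_image):
    """Estimate records digitized in a year for a given records-per-image figure."""
    images = FIELD_IMAGES_PER_YEAR + VAULT_IMAGES_PER_YEAR
    return images * records_per_image


def indexed_share(records_per_image):
    """Fraction of the year's digitized records that get indexed."""
    return INDEXED_RECORDS_PER_YEAR / digitized_records(records_per_image)


low, high = (digitized_records(n) for n in RECORDS_PER_IMAGE)
print(f"Digitized this year: {low / 1e9:.2f} to {high / 1e9:.2f} billion records")
print(f"Indexed share: {indexed_share(5):.1%} to {indexed_share(4):.1%}")
```

With these inputs the estimate runs from 1.8 to 2.25 billion records digitized, and the indexed share from about 11% to about 14%. Rounding to 2 billion records gives 12.5%, in line with the 12% Jake quoted.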

Additionally, FamilySearch is trying to increase the number of cameras, which will put them even further in the hole. Since 90% of indexing is done in English, the situation is far worse for non-English records.

“If we want genealogy records to be more helpful more quickly to more people, we need to look at other ways of indexing,” Jake said. He spoke about three ways that might accelerate the number of indexed records: efficiency, collaboration, and computerized assistance.

Tomorrow I’ll report on what Jake said about increasing efficiency and using collaboration. Thursday I’ll finish reporting about his presentation.

6 comments:

  1. Obviously Family Search is going to need help from non LDS members - but it is discouraging to index for free and then go back and see that all the images are on Ancestry or FMP and you have to pay to see them. I realize that these are not the same files , but the source says they are from an LDS film so one gets the feeling that you are indexing for Ancestry's benefit.

    Replies
    1. Marie,

      I mention it briefly in Wednesday's article, but Jake said that FamilySearch trades data with partners. That means sometimes you will find on Ancestry.com the very keystrokes that you typed. In exchange, you get something back. You get (or will get after an "embargo" delay) access for free on FamilySearch.org to data that previously required an Ancestry.com subscription. It's like a two-for-one sale. For every one name you index, FamilySearch.org users get free access to two names.

    2. Non-LDS users who do not subscribe to ancestry.com do not get free access to ancestry-owned images through family search.org . That arrangement was negotiated for people who sign in with LDS Church member userids. Granted, there is much benefit for everyone in the increased access to index data when non-LDS people contribute to the indexing effort.

  2. This comment has been removed by the author.

  3. FamilySearch currently has many non-LDS indexers. There are many genealogical societies (the vast majority of which have few members of the LDS Church) that regularly index projects. Indexing in general is not well-advertised outside of genealogical circles. It's not as universal of a thing as using the records that the indices help populate for consumption.

  4. I bet there would be a lot more indexing if they let people choose which films to index rather than assigning them films at random. People will index a film just so they can get the information they need.
