The Ancestry Insider: FamilySearch Indexing Not Keeping Up

Tuesday, August 25, 2015

FamilySearch Indexing Not Keeping Up – #BYUFHGC

Jake Gehring presenting at the 2015 BYU Conference on Family History and Genealogy

“FamilySearch just isn’t indexing records fast enough,” said Jake Gehring. “If that is the case,…then what do we do about it?” Jake is director of content development for FamilySearch and presented at the BYU Conference on Family History and Genealogy last month. Jake’s presentation was titled “FamilySearch Indexing, Robo-keying, and Partnering, Oh My!”

Jake said that in the last little while he has been involved in some research and development, which he finds really rewarding. It’s fun, he said, to work on things that may become real someday. He emphasized to me that these things may never become real, so keep that in mind as you read.

Back in the old days an index was that thing in the back of the book, not some multi-billion name index you can search from your home. “We index records so that they get used more,” he said. We gather records for the same purpose and have been doing so since 1938, he said. FamilySearch has about 280 cameras, roughly 40 in the United States and the rest abroad.

There have been huge improvements in the technology for capturing records and making them available. Jake showed an example, a Weber County, Utah marriage license. It is one of the rare collections that FamilySearch has captured twice. A scan from microfilm looks like this:

Weber County, Utah marriage license scanned from microfilm

FamilySearch went back recently and captured the records digitally, in color.

A Weber County, Utah marriage license that was digitized in color

Granted, viewing a record scanned from microfilm is often less clear than viewing it on a microfilm reader, but you can see the huge improvement.

FamilySearch takes steps to make the captured images easier to use. One thing they have done from early on with books and microfilm is catalog them. A catalog entry can specify locations, authors, subjects, and so forth. For family histories, they might put in a list of surnames, but that was about it. You had to know what you were looking for to find the records you needed.

When you think about what we do now, things are quite a bit different. Indexes contain full names and direct you to individual images. We index (FamilySearch’s term for extracting) more than names: we also capture dates, places, and relationships. Because of this, not only can you search for records, but FamilySearch can also recommend records to you. FamilySearch calls these recommendations hints.

There is a range of things that FamilySearch can do to make records more accessible. Some can be done with less cost than others. Jake showed a diagram showing treatments that can be made to a collection. With added accessibility comes added cost. Here is my version of his diagram, including my own definitions:

Definitions:

catalog entry: a single entry for an entire collection
film notes: individual notes for each film
light waypointing: dividing the images of an entire collection into a few groups containing a large number of images
heavy waypointing: dividing the images into more specific groups with fewer images
light indexing: extracting a few, basic pieces of information from an image, perhaps just a name or a name and date
heavy indexing: extracting most genealogically significant information
lineage-linkage: using the record extracts to reconstruct families with links between parents, spouses, and children

Resources are limited. The more work invested in collections, the easier it is to use them, but the number of collections that can be published decreases. “The truth is that it is quite expensive to make collections very, conveniently searchable,” Jake said. “But it is still worth doing. In fact, we want to do it faster.”

The Church of Jesus Christ of Latter-day Saints, which owns FamilySearch, has been indexing in one way or another for a long time.

1922 – Church employees started extracting information for the TIB, an early predecessor of the IGI. [I added this bullet point.]
1961 – Church employees started extracting names from historical records at Church headquarters.
1977 – Church members started extracting records at Church buildings via the stake records extraction program.
1986 – Church members began the family records extraction program (FREP), in which members entered data on home computers.
1994 – The stake and family record extraction programs were consolidated.
2006 – FamilySearch began using the current FamilySearch Indexing tool, utilizing Church members and the general public.

Jake showed the current application, FamilySearch Indexing. He then showed the new, browser-based tool that is now being rolled out. The tool allows the data entry pane to be positioned in various places, such as the left of the screen or the top. One data entry mode, available when field positions are well defined, allows data entry directly on top of the image itself.

Jake showed statistics of the number of indexing volunteers since 2006.

This graph should be encouraging to everyone. Compare this to the number of people—probably 15,000—indexing the 1880 census some 20 to 25 years ago. The big explosion in indexers in 2012 was because of the 1940 census. This year FamilySearch is on track to have more volunteers than for the 1940 census project.

“It is a wonderful, exciting program to be a part of. You have the satisfaction of knowing that you and 350,000 of your closest friends are all working together to make documents more usable,” Jake said.

He compared the indexing project of the 1880 census to that of the 1940.

1880 US Census Index	1940 US Census Index
Index only	Index and images
56 CD-ROMs	Web
50 million records	132 million records
17 years to complete	150 days
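
To put the two projects on a common scale, here is a rough sketch that converts each into records indexed per day. The record counts and durations come from the comparison above; the function name and the 365-day year are my own assumptions for illustration, and the 1880 figure averages over the whole 17-year effort rather than active keying time.

```python
DAYS_PER_YEAR = 365  # assumption: ignore leap years for a rough estimate


def records_per_day(records: int, days: float) -> float:
    """Average number of records indexed per calendar day."""
    return records / days


# 1880 census index: 50 million records over 17 years.
rate_1880 = records_per_day(50_000_000, 17 * DAYS_PER_YEAR)
# 1940 census index: 132 million records over 150 days.
rate_1940 = records_per_day(132_000_000, 150)

print(f"1880 census: about {rate_1880:,.0f} records per day")
print(f"1940 census: about {rate_1940:,.0f} records per day")
print(f"Speed-up: roughly {rate_1940 / rate_1880:.0f}x")
```

Under these assumptions the 1940 project ran on the order of a hundred times faster per day than the 1880 project.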

But this amount of indexing is not good enough.

“Do the math,” Jake said. FamilySearch captures about 150 million images in the field each year. FamilySearch is also scanning the microfilm out of the Granite Mountain Record Vault, and this year expects to scan about 300 million images. Together that is about 450 million images. On average there are four to five records per image, which amounts to about 2 billion records digitized just this year. And Jake expects to have the same amount next year. But FamilySearch is only indexing about 250 million records per year. That’s only about 12% of the records “brought in the door.”

“Now do you see why I say, we’re not going fast enough?”
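
Jake’s arithmetic can be checked with a short sketch. The image and indexing figures are the ones he quoted; the names below, and the choice to try 4, 4.5, and 5 records per image to cover “four to five,” are my own assumptions for illustration.

```python
# Yearly figures quoted in the talk; the names are illustrative only.
FIELD_IMAGES = 150_000_000     # images captured by cameras in the field
VAULT_IMAGES = 300_000_000     # microfilm images scanned from the vault
RECORDS_INDEXED = 250_000_000  # records indexed per year


def indexed_share(records_per_image: float) -> float:
    """Return the fraction of newly digitized records that get indexed."""
    total_images = FIELD_IMAGES + VAULT_IMAGES
    total_records = total_images * records_per_image
    return RECORDS_INDEXED / total_records


# Assumption: low end, midpoint, and high end of "four to five".
for rpi in (4.0, 4.5, 5.0):
    total_records = (FIELD_IMAGES + VAULT_IMAGES) * rpi
    print(f"{rpi} records/image: {total_records / 1e9:.2f} billion records, "
          f"{indexed_share(rpi):.1%} indexed")
```

Across that range the indexed share comes out between roughly 11% and 14%, which is consistent with the 12% Jake quoted.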

Additionally, FamilySearch is trying to increase the number of cameras, which will put them even further in the hole. Since 90% of indexing is in English, the situation is far worse for non-English records.

“If we want genealogy records to be more helpful more quickly to more people, we need to look at other ways of indexing,” Jake said. He spoke about three ways that might accelerate indexing: efficiency, collaboration, and computerized assistance.

Tomorrow I’ll report on what Jake said about increasing efficiency and using collaboration. Thursday I’ll finish reporting about his presentation.

6 comments:

Unknown, August 25, 2015 at 11:53 AM
Obviously Family Search is going to need help from non LDS members - but it is discouraging to index for free and then go back and see that all the images are on Ancestry or FMP and you have to pay to see them. I realize that these are not the same files , but the source says they are from an LDS film so one gets the feeling that you are indexing for Ancestry's benefit.
Amanda H Jensen, August 25, 2015 at 10:05 PM
FamilySearch currently has many non-LDS indexers. There are many genealogical societies (the vast majority of which have few members of the LDS Church) that regularly index projects. Indexing in general is not well-advertised outside of genealogical circles. It's not as universal of a thing as using the records that the indices help populate for consumption.
Stuart Gourd, August 27, 2015 at 10:38 AM
I bet there would be a lot more indexing if they let people choose which films to index rather than assigning them films at random. People will index a film just so they can get the information they need.

Biography

The Ancestry Insider was a readers’ choice for the top four genealogy news and resources blogs, part of Family Tree Magazine’s “40 Best Genealogy Blogs” for 2010. He reports on the two big genealogy organizations, Ancestry.com and FamilySearch. His blog was named one of the “Most Popular Genealogy Blogs” by ProGenealogists, and has received Family Tree Magazine’s “101 Best Web Sites” award every year since 2008. A genealogical technologist, the Insider has a post-graduate technology degree and holds a dozen technology patents in the United States and abroad. He has done genealogy since 1972 and has worked in the computer industry since 1978. He was Time Magazine Man of the Year in both 1966 and 2006. And he really is descended from an Indian princess.
