“FamilySearch just isn’t indexing records fast enough,” said Jake Gehring. “If that is the case,…then what do we do about it?” Jake is director of content development for FamilySearch and presented at the BYU Conference on Family History and Genealogy last month. Jake’s presentation was titled “FamilySearch Indexing, Robo-keying, and Partnering, Oh My!”
Lately Jake has been involved in some research and development, which he finds rewarding; it's fun to work on things that may become real someday. He emphasized to me that these things may never become real, so keep that in mind as you read.
Back in the old days an index was that thing in the back of the book, not some multi-billion name index you can search from your home. “We index records so that they get used more,” he said. We gather records for the same purpose and have been doing so since 1938, he said. FamilySearch has about 280 cameras, roughly 40 in the United States and the rest abroad.
There have been huge improvements in the technology for capturing records and making them available. Jake showed an example: a Weber County, Utah, marriage license. It is one of the rare collections that FamilySearch has captured twice. A scan from microfilm looks like this:
FamilySearch went back recently and captured the records digitally, in color.
Granted, viewing a record scanned from microfilm is often less clear than viewing it on a microfilm reader, but you can see the huge improvement.
FamilySearch also works to make the captured images easier to use. From early on, one thing they did with books and microfilm was catalog them. A catalog entry can specify locations, authors, subjects, and so forth. For family histories, they might add a list of surnames, but that was about it. You had to know what you were looking for to find the records you needed.
Things are quite a bit different now. Indexes contain full names and direct you to individual images. We index (FamilySearch's term for extracting) more than names; we also capture dates, places, and relationships. This not only makes that information searchable but also lets FamilySearch recommend records to you, which FamilySearch calls hints.
There is a range of things that FamilySearch can do to make records more accessible, and some cost less than others. Jake showed a diagram of the treatments that can be applied to a collection: with added accessibility comes added cost. Here is my version of his diagram, including my own definitions:
Definitions:
- catalog entry: a single entry for an entire collection
- film notes: individual notes for each film
- light waypointing: dividing the images of an entire collection into a few groups containing a large number of images
- heavy waypointing: dividing the images into more specific groups with fewer images
- light indexing: extracting a few basic pieces of information from an image, perhaps just a name or a name and date
- heavy indexing: extracting most genealogically significant information
- lineage-linkage: using the record extracts to reconstruct families with links between parents, spouses, and children
Resources are limited. The more work invested in a collection, the easier it is to use, but the fewer collections can be published. “The truth is that it is quite expensive to make collections very, conveniently searchable,” Jake said. “But it is still worth doing. In fact, we want to do it faster.”
The Church of Jesus Christ of Latter-day Saints, FamilySearch's owner, has been indexing for a long time in one way or another.
- 1922 – Church employees started extracting information for the TIB, an early predecessor of the IGI. [I added this bullet point.]
- 1961 – Church employees started extracting names from historical records at Church headquarters.
- 1977 – Church members started extracting records at Church buildings via the stake records extraction program.
- 1986 – Church members began the family record extraction program (FREP), in which members entered data on their home computers.
- 1994 – The stake and family record extraction programs were consolidated.
- 2006 – FamilySearch began using the current FamilySearch Indexing tool, drawing on both Church members and the general public.
Jake showed the current application, FamilySearch Indexing, and then the new browser-based tool that is now being rolled out. The new tool lets you position the data entry pane in various places, such as the left side or the top of the screen. One data entry mode, available when field positions are well defined, lets you enter data directly on top of the image itself.
Jake showed statistics on the number of indexing volunteers since 2006.
This graph should be encouraging to everyone. Compare it to the number of people (probably 15,000) who indexed the 1880 census some 20 to 25 years ago. The big explosion in indexers in 2012 was due to the 1940 census. This year FamilySearch is on track to have more volunteers than it had for the 1940 census project.
“It is a wonderful, exciting program to be a part of. You have the satisfaction of knowing that you and 350,000 of your closest friends are all working together to make documents more usable,” Jake said.
He compared the indexing project for the 1880 census to that for the 1940 census.
| 1880 US Census Index | 1940 US Census Index |
| --- | --- |
| Index only | Index and images |
| 56 CD-ROMs | Web |
| 50 million records | 132 million records |
| 17 years to complete | 150 days to complete |
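To put the table in per-day terms, here is a rough back-of-the-envelope sketch. It assumes "17 years" means 17 × 365 days (ignoring leap years), and the names `PROJECTS` and `records_per_day` are my own, not anything from FamilySearch.

```python
# Rough throughput comparison of the two census indexing projects,
# using the figures from the table above. Assumption: "17 years" is
# treated as 17 * 365 days, ignoring leap years.

PROJECTS = {
    "1880 US Census": {"records": 50_000_000, "days": 17 * 365},
    "1940 US Census": {"records": 132_000_000, "days": 150},
}


def records_per_day(records: int, days: int) -> float:
    """Average number of records indexed per day over a whole project."""
    return records / days


# Average daily rate for each project.
rates = {name: records_per_day(**p) for name, p in PROJECTS.items()}
for name, rate in rates.items():
    print(f"{name}: about {rate:,.0f} records per day")

# How many times faster the 1940 project moved, on average.
speedup = rates["1940 US Census"] / rates["1880 US Census"]
print(f"The 1940 project ran roughly {speedup:.0f} times faster per day.")
```

By this rough measure the 1940 project moved a little over 100 times faster per day. These are averages over each whole project, so they say nothing about how the pace varied along the way.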
But this amount of indexing is not good enough.
“Do the math,” Jake said. FamilySearch captures about 150 million images in the field each year. It is also scanning the microfilm from the Granite Mountain Record Vault and expects to scan about 300 million images that way this year. On average there are four to five records per image, which amounts to about 2 billion records digitized this year alone. Jake expects about the same next year. But FamilySearch is indexing only about 250 million records per year. That’s only 12% of the records “brought in the door.”
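Here is a quick sketch of Jake's arithmetic, using only the approximate figures he quoted. Because he gave a range of four to five records per image, the sketch computes both ends. The constant and function names are my own illustrative choices.

```python
# Back-of-the-envelope check of the yearly figures Jake quoted.
# All inputs are approximate numbers from the talk; records per
# image is given as a range (four to five), so both ends are computed.

FIELD_IMAGES_PER_YEAR = 150_000_000   # images captured by field cameras
VAULT_IMAGES_PER_YEAR = 300_000_000   # images scanned from vault microfilm
RECORDS_PER_IMAGE = (4, 5)            # low and high estimates
INDEXED_PER_YEAR = 250_000_000        # records indexed per year


def records_digitized(records_per_image: int) -> int:
    """Total records 'brought in the door' in one year."""
    return (FIELD_IMAGES_PER_YEAR + VAULT_IMAGES_PER_YEAR) * records_per_image


def indexed_share(records_per_image: int) -> float:
    """Fraction of one year's digitized records that get indexed."""
    return INDEXED_PER_YEAR / records_digitized(records_per_image)


for rpi in RECORDS_PER_IMAGE:
    total = records_digitized(rpi)
    unindexed = total - INDEXED_PER_YEAR  # records left unindexed that year
    print(f"{rpi} records/image: {total / 1e9:.2f} billion digitized, "
          f"{indexed_share(rpi):.0%} indexed, "
          f"{unindexed / 1e9:.2f} billion left unindexed")
```

The two ends come out to 1.8 and 2.25 billion records a year, with roughly 11% to 14% of them indexed, which brackets Jake's rounded figures of about 2 billion and 12%.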
“Now do you see why I say, we’re not going fast enough?”
Additionally, FamilySearch is trying to increase the number of cameras, which will put it even further behind. And since 90% of indexing is of English-language records, the situation is far worse for non-English records.
“If we want genealogy records to be more helpful more quickly to more people, we need to look at other ways of indexing,” Jake said. He spoke about three ways that might accelerate the number of indexed records: efficiency, collaboration, and computerized assistance.
Tomorrow I’ll report on what Jake said about increasing efficiency and using collaboration. Thursday I’ll finish reporting about his presentation.