Sunday, August 19, 2007

BYU Conference: The Digital Pipeline - Chapter 3

Stages of the Digital Pipeline

This is the third installment of our report on Bill Mangum's presentation, "The Opening of the Digital Pipeline," from the final day of the BYU Genealogy and Family History Conference. In the first part, we presented the analogy of a pipeline and introduced the stages in the digital pipeline. In the second part, we talked about handling current gaps in the pipeline and the processes in the pipeline up through the scanning step. This week we move down the pipeline through the Infobahn stage. Next week we'll finish the presentation by talking about FamilySearch Labs and Record Search.

Scanning Stats

The Granite Mountain Record Vault contains about 2.5 million rolls of microfilm, estimated to hold about 20 billion names. Five years ago, when scanning began, converting all the records in the vault would have taken over 120 years. Thanks to advances in scanning technology, they have since increased output more than fourfold using four scanners. An employee in the audience told us that they are now up to eight scanners and that the scanner manufacturer is about to double the scanners' speed.

I suppose one can take the original 120-year estimate and shorten it to 30 years, given the fourfold improvement over the last five years. But I doubt that doubling the number of scanners and doubling the scanning rate translates completely into another fourfold improvement. Pulling a number out of the air, scanning might be complete in ten years without adding any more scanners. But remember that the plan is to work up to 15 scanners. Depending on how soon the scanners are added, it's conceivable that scanning could be finished in 5 to 10 years.
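To make that arithmetic explicit, here is a back-of-the-envelope sketch in Python. It assumes, optimistically, that throughput scales linearly with the number of scanners times the speed of each scanner, and it measures everything against the original 120-year estimate, ignoring the years already spent. The function name years_to_finish is just for illustration. Because real gains rarely scale this perfectly, these figures are best read as idealized lower bounds, which is why the 5-to-10-year guess is the more cautious one.

```python
# Back-of-the-envelope scanning estimates for the Granite Mountain vault.
# Assumption: throughput scales linearly with (number of scanners) x (per-scanner speed).
# Real gains are usually smaller, so treat these as optimistic lower bounds.

ORIGINAL_ESTIMATE_YEARS = 120.0  # estimate when scanning began, per the talk


def years_to_finish(improvement: float) -> float:
    """Total years to scan the vault if throughput is `improvement` times the original rate."""
    return ORIGINAL_ESTIMATE_YEARS / improvement


# Four scanners gave roughly a 4x improvement over the original rate.
after_first_gain = years_to_finish(4)  # 30.0

# Ideal case: double the scanners (4 -> 8) and double their speed, another 2 x 2 = 4x.
ideal_next_gain = years_to_finish(4 * 2 * 2)  # 7.5

# Hypothetical: 15 scanners at double speed, scaled from the 4-scanner baseline.
fifteen_scanners = years_to_finish(4 * (15 / 4) * 2)  # 4.0

for label, years in [
    ("4 scanners", after_first_gain),
    ("8 scanners, double speed", ideal_next_gain),
    ("15 scanners, double speed", fifteen_scanners),
]:
    print(f"{label}: about {years:.1f} years")
```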

Describe - Waypoints

The describe team creates guidance information for users of a record collection, describing why the records were created, how to use them, and what information they contain. The team also prepares catalog descriptions for the Family History Library Catalog.

To make record collections useful before the records are indexed, the waypoints team creates image groups. These groups are called waypoints, as we'll see later when we look at Record Search. Waypoints might be set up by location, date, name, or some other grouping, depending on how the original records are organized. Waypoints are subdivided until each image group is small enough for users to browse through its images and find the records they want.
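The talk didn't describe how the waypoints team actually divides a collection, but the general idea can be sketched in a few lines of Python. This is a hypothetical illustration only: the make_waypoints function, the sample image records, and the size limit are all assumptions, not FamilySearch's process. It groups consecutive images that share a label (here, a place and year) and then splits any group that is too large into numbered parts.

```python
from itertools import groupby


def make_waypoints(images, key, max_size):
    """Hypothetical sketch: group consecutive images by `key`, then split large groups.

    images   -- list of dicts in the order they appear on the film
    key      -- function returning the grouping label for an image
    max_size -- largest number of images we want in one browsable group
    """
    waypoints = []
    # groupby merges only *consecutive* images with the same label,
    # which keeps each waypoint in the original film order.
    for label, run in groupby(images, key=key):
        run = list(run)
        parts = [run[i:i + max_size] for i in range(0, len(run), max_size)]
        for n, part in enumerate(parts, start=1):
            name = label if len(parts) == 1 else f"{label}, part {n}"
            waypoints.append((name, [img["id"] for img in part]))
    return waypoints


# Made-up sample data: five images from one parish, two from another.
images = [{"id": i, "place": "Parish A", "year": 1850} for i in range(1, 6)]
images += [{"id": i, "place": "Parish B", "year": 1850} for i in range(6, 8)]

for name, ids in make_waypoints(images, key=lambda img: f"{img['place']}, {img['year']}", max_size=3):
    print(name, ids)
```

With a limit of three images per group, the five Parish A images become "Parish A, 1850, part 1" and "part 2", while the two Parish B images fit in a single waypoint.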

Indexing

While browsing digital images has some value, the records are much, much more accessible when one can type in a name and search for the desired records. To make this possible, the handwriting on the digital images must be converted to text. Because technology cannot perform this conversion, people must view the images and manually type (or transcribe) the information. A searchable list of text is called an index, so FamilySearch calls the transcription process "indexing" (www.FamilySearchIndexing.org).

FamilySearch Indexing is an Internet-based system that allows volunteers to transcribe information from digital images; the transcriptions are then used to build searchable indexes of those images. Because FamilySearch Indexing uses the Internet to deliver images to volunteers and return the transcribed text to FamilySearch, volunteers can work anywhere they can connect to the Internet.

FamilySearch Indexing
Photograph by Robert Casey
© 2007 IRI

Once a batch of images is downloaded to your computer, the FamilySearch Indexing application lets you work without being connected to the Internet. One batch takes about 30 to 60 minutes to complete. You type the text into a table, and highlighting on the image shows which part you should be transcribing.

Mangum reported that there were 55,000 volunteer indexers and that the number was growing fast. After an article in the Ensign magazine, 7,000 people signed up in the previous week, temporarily overloading the servers. In a single day, over 16,000 batches were completed, amounting to more than 782,000 names. Over the past year, 67 million names were indexed. FamilySearch expected to reach 125,000 volunteers soon. They are on the verge of releasing versions of FamilySearch Indexing in multiple languages, which should allow them to scale up to millions of names every day. However, everyone's help is needed to make the program successful. To sign up, visit www.familysearchindexing.org.
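Those figures also let us estimate the size of a typical batch and see how fast the program was growing. This quick Python sketch simply divides the numbers Mangum reported. Since several of them were given as "over" some amount, the results are rough approximations, not official statistics.

```python
# Figures reported in the talk ("over" values, so results are approximate).
batches_per_day = 16_000
names_per_day = 782_000
names_past_year = 67_000_000

# Average number of names transcribed per batch.
names_per_batch = names_per_day / batches_per_day

# Average daily output over the past year, for comparison with the current daily rate.
avg_names_per_day_last_year = names_past_year / 365

print(f"About {names_per_batch:.0f} names per batch")
print(f"Past-year average: about {avg_names_per_day_last_year:,.0f} names per day")
print(f"Current rate is roughly {names_per_day / avg_names_per_day_last_year:.1f}x that average")
```

In other words, a batch holds roughly 49 names, and the reported daily rate was several times the average for the preceding year.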

Final Assembly and Infobahn

In final assembly, miracles happen. All of the data and images come together. A large number of behind-the-scenes operations must be performed on the data and the indexes to make them easy to search. These operations include standardization, normalization, and the addition of derived information.

Once everything is assembled, the Infobahn team imports the information into a test environment called staging, where the team verifies that it all works. It is then published to production, that is, the live website.

1 comment:

  1. Last Tuesday (14-Aug-2007), 1.2 million names were indexed by 70,000 indexers!

    Keep up the good work!

    (Source: Renee's Genealogy Blog)
