Thursday, February 4, 2010

Ancestry.com Bloggers Day: Lunch with Tim Sullivan

This is another in a series of reports about Ancestry.com Bloggers Day 2010.

We had lunch with Ancestry.com CEO Tim Sullivan and general manager Andrew Wait. Here are my brief notes:

Andrew Wait told us that feedback from their My Story ads said the ads didn’t explain enough about what the genealogy experience was like. In fact, the life-changing stories set the bar so high that average people couldn’t identify with the experiences.

As a result, five days earlier Ancestry.com started a new advertising campaign that goes back to the previous style a bit.

Tim Sullivan asked us if we had any questions for him. When there was a half-second pause, he said if we didn’t have any questions, he had questions for us. Then he asked us…  uh…  …something. I don’t actually remember what it was. My notes are devoid of anything Sullivan said. Sorry, Tim! I did jot down some comments from my fellow writers:

DearMYRTLE said, “Genealogy is a winter sport.” Does that mean Tim asked if we were seeing an upswing in genealogical interest?

At some point Andrew said, “Try a Twitter search of ‘Ancestry.com.’ You’ll see lots of positive feedback.” I think that means several of us expressed appreciation that Ancestry.com had taken the time to meet with us and said our opinions of Ancestry.com were much improved. I think someone even contrasted the day with the infamous “Internet Biographical Database” fiasco. [To read more on that subject, I recommend the series of articles by fellow attendee, Craig Manson.]

I can’t remember what led to my favorite comment of the day. Thomas MacEntee said, “I’ve always thought of genealogy as CSI without the icky bodies.” Mysteries. Dead people. Detective work. Yup; I think he nailed that one pretty well.

The final note I have on lunch was Andrew’s announcement that Ancestry.com had submitted “Tree to Go,” an iPhone application which would be available soon in the iPhone store. [Ancestry.com announced the application to the public on 19 January 2010.]

 

Who did Ancestry.com throw at us right after lunch? We were hoping it would be someone who could keep us awake. We were not disappointed. Stay tuned…

 

Tim Sullivan is the CEO of Ancestry.com, Inc. He was previously CEO of Match.com. Under Tim’s leadership, Match.com expanded globally into 29 local languages and grew paid subscribers from 189,500 to nearly one million while growing revenue more than six-fold. Prior to joining Match.com, Tim was vice president of e-commerce for Ticketmaster Online-Citysearch, Inc. Before that he spent seven years at the Walt Disney Company where he was vice president and managing director for Buena Vista Home Entertainment Asia Pacific. Tim is a graduate of Harvard Business School and was a Morehead Scholar at the University of North Carolina at Chapel Hill.

Wednesday, February 3, 2010

Vault Vednesday: Open House

Public tours at the GMRV

The public were invited to tour the awesome caverns of the Granite Mountain Record Vault (GMRV) starting 4 December 1963. After the open house, the vault would be closed to the public.

Storage vaults were constructed between about 120 and 350 feet into the mountain. Each of six vaults is about 200 feet long, extending 27 feet wide, and reaching over 15 feet high. The tunnels were lined with heavy corrugated steel and concrete was pumped in to fill the space between the steel and the granite tunnel walls.

NGS Conference Church Library Open House

The Church History Library is a state-of-the-art archival library for the Church of Jesus Christ of Latter-day Saints. It just opened in June 2009. See fascinating demonstrations of the latest conservation methods for photographs, sound recordings, and aging books. The archive uses high-density, climate controlled storage vaults for old manuscripts, photographs, maps, books, Church records, and other artifacts.

The tour is included in your conference registration at no extra cost. For a sneak peek of what you will see on the tour, click this link and then click the play button.

Early bird registration must be postmarked by 8 March 2010. There are just 36 days left.
Pre-registration must be postmarked by 12 April 2010. There are just 71 days left.
The conference begins 28 April 2010. There are just 87 days left.

This is another in a series highlighting the Granite Mountain Record Vault (GMRV) and the NGS Family History Conference coming to Salt Lake City, 28 April—1 May 2010.


Sources

      Dexter Ellis, "Inspection Tours Set for Records Vaults in Canyon," Deseret News (Salt Lake City, Utah), 30 November 1963, Church News section, p. 3, cols. 2-5; digital images (http://news.google.com/newspapers : accessed 25 December 2009).
     "Church Invites Public To Visit Cottonwood Genealogy Vaults," Deseret News (Salt Lake City, Utah), 2 December 1963, p. B 5, cols. 6-8; digital images (http://news.google.com/newspapers : accessed 25 December 2009). Also see “Deep Vaults to Protect Church Files,” Los Angeles Times, 2 December 1963, p. b15; and “Plan to Show Record Vault of Mormons,” Chicago Tribune, 2 December 1963, p. C16.
     "Vault Toured By Church, Civic Leaders," Deseret News (Salt Lake City, Utah), 3 December 1963, p. 12 B, col. 1; digital images (http://news.google.com/newspapers : accessed 25 December 2009).
     The Genealogical Society of the Church of Jesus Christ of Latter-day Saints, Records Protection in an Uncertain World, 16 p. brochure (Salt Lake City, Utah: self-published, 1973).

Tuesday, February 2, 2010

Ancestry.com Bloggers Day: Technology (Part 2)

Last year I intended to do stupendously rich articles about Ancestry.com Bloggers Day presentations. Since I never got around to it, this year you’re getting my stupidously poor notes.

Mike Wolfgramm and Jonathan Young gave us the last presentation prior to lunch. Yesterday we talked about Dexter, the flexible content digitization pipeline. Today we will talk about:

  • Named entity extraction
  • Vertical [unique to Ancestry.com] search engine
  • Record linking
  • Hint engine – technology behind the shaky leaf
  • PersonRank – Search engine that powers Mundia (pronounced, “Moon-dia”)

Named entity extraction

Named entity extraction derives facts from unstructured data using advanced algorithms to find names, dates, and places. As I mentioned yesterday, computers are very stupid. Ancestry.com uses machine learning to train the system to identify names, dates, and places.

Having these facts separate makes the records searchable.
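
For the programmers in the audience, here is a toy sketch of what "pulling facts out of unstructured text" means. It is not Ancestry.com's method (they described machine learning); it uses hand-written regular expressions instead, and the sample obituary text, the function name `extract_facts`, and the field names are all invented for illustration.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# A date such as "January 12, 2009" or "12 January 2009".
DATE_RE = re.compile(
    rf"\b(?:(?:{MONTHS}) \d{{1,2}}, \d{{4}}|\d{{1,2}} (?:{MONTHS}) \d{{4}})\b"
)
# An age such as "age 84" or "aged 84".
AGE_RE = re.compile(r"\baged? (\d{1,3})\b")
# A place written as "City, State" after "in", "of", or "from" (very naive).
PLACE_RE = re.compile(
    r"\b(?:in|of|from) "
    r"([A-Z][a-z]+(?: [A-Z][a-z]+)*, [A-Z][a-z]+(?: [A-Z][a-z]+)*)"
)


def extract_facts(text):
    """Pull dates, ages, and places out of free text into separate fields."""
    return {
        "dates": DATE_RE.findall(text),
        "ages": [int(age) for age in AGE_RE.findall(text)],
        "places": PLACE_RE.findall(text),
    }


# Made-up obituary text, not a real record.
sample = ("Mary Example, age 84, of Provo, Utah, died January 12, 2009. "
          "She was born 3 March 1924 in Des Plaines, Illinois.")
facts = extract_facts(sample)
```

Rules like these break as soon as the wording changes, which is presumably why a trained system is used on real obituaries.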

Wolfgramm and Young showed us the example below. I’ve circled items in these colors:

  • Name of Deceased: Lime green
  • Age at Death: Yellow
  • Death Date: Orange
  • Obituary Date: Red
  • Locations Mentioned: Purple and pink (we’ll see why I used two colors in a moment)
  • Other Persons Mentioned: Green

Jean Hess obituary

Below I’ve included the corresponding record from the Ancestry.com U.S. Obituary Collection. I’ve circled items with the same colors as above so you can easily compare the two. As you can see, the algorithm did pretty darn well, for a stupid computer. It got the name of the deceased wrong, but did pick it up in the list of others mentioned. It got the obituary publication date wrong. The algorithm missed three locations (circled in purple): California, San Bernardino County, and Deplaines, although that last one is probably a misspelling of Des Plaines. It got the seven locations circled in pink. Lastly, it picked up all six names of other people.

Jean Hess Obituary Record from Victorville Daily Press

Interestingly, this same, exact obituary also appeared in another newspaper and was picked up by Ancestry.com a year earlier. Back then, their named entity extraction technology apparently didn’t work as well. Notice in the record, below, that no names were picked up.

Jean Hess Obituary Record from Barstow Desert Dispatch

I asked why the dates were displayed ambiguously, rather than spelling the month out. Wolfgramm explained that they received the data from a third party in that format. He told us that they could fix the problem. Sure enough, within a couple of days, Ancestry.com had the problem fixed. Wow! I wish I could get all bugs fixed that fast!

Vertical Search engine

The problem:

  • Variations in names, dates and places
  • Need to apply name authority (name alternatives)
  • In the 1841 UK census, ages of those over 15 were usually rounded down to the nearest multiple of 5
  • Rogers, 1985 study found 15% of birth places differ between 1851 and 1861 censuses
  • Significant number of recording and transcription errors
  • Searching 4+ billion records quickly is a challenge
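
The 1841 age-rounding bullet above is a good example of why an exact-match search fails: a recorded age has to be widened into a range of possible birth years. A minimal sketch, assuming the rounding rule exactly as stated in the bullet and allowing one extra year because the birthday may fall after census night. The function names are hypothetical.

```python
CENSUS_YEAR = 1841


def possible_ages(recorded_age):
    """Return (youngest, oldest) true age consistent with a recorded 1841 age."""
    if recorded_age >= 15 and recorded_age % 5 == 0:
        # Rounded down to a multiple of five: 40 may mean 40 through 44.
        return recorded_age, recorded_age + 4
    # Children, and adults whose age was written down exactly.
    return recorded_age, recorded_age


def birth_year_range(recorded_age):
    """Return (earliest, latest) birth year to search for."""
    youngest, oldest = possible_ages(recorded_age)
    # One extra year of slack: the birthday may fall after census night.
    return CENSUS_YEAR - oldest - 1, CENSUS_YEAR - youngest
```

With this rule, a person recorded as 40 could have been born anywhere from 1796 to 1801, so a search for "born 1798" should still find them.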

The solution is a vertical search engine that can measure closeness:

  • typographically
  • phonetically
  • date proximity
  • place proximity
  • fuzzy matching
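
A rough sketch of how such closeness measures might be combined. Everything here is a stand-in, not Ancestry.com's engine: typographic closeness uses Python's `difflib.SequenceMatcher`, phonetic closeness uses the classic Soundex code, date proximity is a simple sliding window, and the 0.7/0.3 weights and five-year window are invented. Place proximity is left out to keep it short.

```python
from difflib import SequenceMatcher

# Soundex digit for each consonant; vowels, H, W, and Y have no digit.
SOUNDEX_CODES = {
    letter: digit
    for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                           "4": "L", "5": "MN", "6": "R"}.items()
    for letter in letters
}


def soundex(name):
    """Classic four-character Soundex code, e.g. Robert -> R163."""
    name = "".join(ch for ch in name.upper() if ch.isalpha())
    if not name:
        return ""
    code = name[0]
    prev = SOUNDEX_CODES.get(name[0], "")
    for ch in name[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate equal codes
            prev = digit
    return (code + "000")[:4]


def name_closeness(a, b):
    """1.0 for a phonetic match, otherwise the typographic similarity ratio."""
    typographic = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    phonetic = 1.0 if soundex(a) == soundex(b) else 0.0
    return max(typographic, phonetic)


def year_closeness(year_a, year_b, window=5):
    """1.0 for the same year, falling to 0.0 at `window` years apart."""
    return max(0.0, 1.0 - abs(year_a - year_b) / window)


def record_closeness(query, record):
    """Weighted blend of name and date closeness (weights are made up)."""
    return (0.7 * name_closeness(query["surname"], record["surname"])
            + 0.3 * year_closeness(query["birth_year"], record["birth_year"]))
```

A query for Smith born 1800 would then still score well against a record for Smyth born 1802, where an exact-match search would find nothing.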

Record Linking

  • Example: How do 3 tree records relate to each other?
  • [I can’t remember how this differed from PersonRank, below.]

Hint Engine

  • Leverages search technology and record linking
  • Computationally expensive – built with a scalable architecture
  • Key collaborative networking technology – don’t have to do brute force compare between all people in all trees when users establish links between trees
  • Acceptance vs. rejection of hints allows algorithmic improvements.
  • Slightly over 80% of hints are accepted.
  • Hint-originated searches are usually more effective because of the additional search information taken from the tree

PersonRank

  • PersonRank is the algorithm used to determine if two individuals in different trees are the same person
  • Q. Is PersonRank used only between tree individuals?
    A. It was initially, but it is used now for all tree hints.
  • Q. Is it used for regular searches?
    A. No. Perhaps in the future.

Finally! We made it to lunch time! Lunch was with Tim Sullivan and Andrew Wait.

Monday, February 1, 2010

Ancestry.com Bloggers Day: Technology

Last year I intended to do stupendously rich articles about Ancestry.com Bloggers Day presentations. Since I never got around to it, this year you’re getting my stupidously poor notes.

Mike Wolfgramm, senior vice president, and Jonathan Young, vice president, of development gave us the last presentation prior to lunch. Their opening slide listed the six key agenda items below. In this article I’ll cover the first item. Tomorrow I’ll finish the remaining five items.

Key technologies we will talk about:

* Flexible content digitization pipeline

* Named entity extraction

* Vertical [unique to Ancestry.com] search engine

* Record linking

* Hint engine – technology behind the shaky leaf

* PersonRank – Search engine that powers Mundia (pronounced, “Moon-dia”)

It is a challenge to handle different document characteristics

* Documents are very different: census, Chinese family history, manuscript, newspaper, German directories

Ancestry.com works with a large range of sources

Flexible Content Pipeline “Dexter” – applies only the appropriate steps from a plug-in toolset:

Dexter, the Ancestry.com content pipeline has many optional tools

Scan manager* – handles alternate multispectral scans.

-  Showed the examples that I included last Monday.

Kimberly Powell suggested including the spectrum type in the source metadata. I was not able to locate any alternate spectrum images online. I suppose these are yet to be published.

Auto de-skew*

Auto crop*

Image QA

Watermarking* – intelligent algorithms for placement of watermark
MyFamily.com watermark

Table of contents tool
Table of Contents of a book on Ancestry.com

Auto leveling

Image conversion

Binarization* – converting a grayscale image to black and white [also called thresholding]

Niblack’s method has 0.67% OCR error rate

Ancestry.com method has 0.04%

Auto Classification

-  Automatically classifies document types per page

-  Routes desired page to next stage

-  Wolfgramm mentioned an example from the British Army WWI Pension Records 1914-1920. Only the first page—the Attestation page—of a soldier’s pension file required indexing. Automatic classification distinguished these pages (below left) from other pages (below right).
Auto classification identified the pages that required indexing
Auto classification could tell the difference between attestation pages vs. others

Auto name fielding* is necessary because computers are extremely stupid and must be taught to recognize names. This works best for consistently formatted pages. Yearbooks are an example. The computer must be taught that the words to the left of the pictures are names, family name first, and then given names (below).
Name fielding
Even though the computer is taught how to look for names, it still has a hard time. Because OCR is used instead of indexers, not all names are recognized. I tested the example above and found Eileen Draper and Joan Clark were recognized, but not John Earl or Mary Jo Ellis.

Keying tool*

Fact extraction

Inference engine – advanced algorithms to create inferences from existing data. Infer birth year from age, infer mother from wife, infer father from son-in-law

Field normalization - [I think this is the same as “Named entity extraction,” below.]

And more…
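
For readers curious what the binarization (thresholding) step in the list above looks like in code, here is a minimal sketch. It puts a single global threshold next to Niblack's published local rule, threshold = local mean + k × local standard deviation. The window radius, k = -0.2 (a commonly used value), the function names, and the tiny one-row "image" are assumptions for illustration. Ancestry.com did not describe how its own method works, so none of this represents it.

```python
from statistics import mean, pstdev


def global_binarize(image, threshold=128):
    """One threshold for the whole page: at or above it becomes white (1)."""
    return [[1 if px >= threshold else 0 for px in row] for row in image]


def niblack_binarize(image, radius=1, k=-0.2):
    """Niblack's rule: a separate threshold for each pixel's neighborhood."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Gather the pixels in the window around (x, y), clipped at edges.
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            threshold = mean(window) + k * pstdev(window)
            row.append(1 if image[y][x] > threshold else 0)
        result.append(row)
    return result


# A made-up strip of page: bright paper (200) with ink (120) on the left,
# fading to stained paper (100) with ink (40) on the right.
strip = [[200, 120, 200, 150, 100, 40, 100]]
global_result = global_binarize(strip)
niblack_result = niblack_binarize(strip)
```

On this strip the global threshold turns the stained paper on the right black along with the ink, while the local rule keeps paper white and ink black on both sides. Niblack's rule has well-known weaknesses of its own (for example, noise in blank areas), which is the kind of thing an improved method would target.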

Dexter tools, above, that are followed by an asterisk * are technologies with patents or patent applications.
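
The "applies only the appropriate steps from a plug-in toolset" idea can be sketched roughly as follows. This is a guess at the general shape, not Ancestry.com's code: the registry, the `RECIPES` table, and which steps go with which document type are all hypothetical, and each step is a stub that only records that it ran.

```python
TOOLSET = {}  # step name -> function; the "plug-in toolset"


def tool(name):
    """Decorator that registers a step in the toolset under a name."""
    def register(func):
        TOOLSET[name] = func
        return func
    return register


def _ran(page, step_name):
    # Stand-in for real image processing: just note that the step ran.
    return {**page, "history": page.get("history", []) + [step_name]}


@tool("auto de-skew")
def auto_deskew(page):
    return _ran(page, "auto de-skew")


@tool("auto crop")
def auto_crop(page):
    return _ran(page, "auto crop")


@tool("binarization")
def binarization(page):
    return _ran(page, "binarization")


@tool("auto name fielding")
def auto_name_fielding(page):
    return _ran(page, "auto name fielding")


@tool("keying tool")
def keying_tool(page):
    return _ran(page, "keying tool")


# Hypothetical recipes: which steps apply to which kind of document.
RECIPES = {
    "yearbook": ["auto de-skew", "auto crop", "binarization",
                 "auto name fielding"],
    "manuscript": ["auto de-skew", "auto crop", "keying tool"],
}


def run_pipeline(page):
    """Apply only the steps listed for this page's document type."""
    for step_name in RECIPES[page["type"]]:
        page = TOOLSET[step_name](page)
    return page


yearbook_page = run_pipeline({"type": "yearbook"})
manuscript_page = run_pipeline({"type": "manuscript"})
```

The appeal of this arrangement is that adding a new tool or a new kind of document means adding one function or one recipe line, without touching the rest.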

Tomorrow we’ll finish up the technology presentation.

Mike Wolfgramm serves as senior vice president of development and is responsible for the development and delivery of Ancestry.com (the website). Wolfgramm has over fifteen years of experience in technology and product development. Prior to Ancestry.com, he served as senior architect and senior director of development at Open Market, Inc., where he managed the overall development of the Infobase technology application which had more than 35 million customers worldwide. Wolfgramm began his career at WordPerfect in Orem, Utah. He is a graduate of Brigham Young University where he received a bachelor's degree in computer science.

Jonathan Young is vice president of development for Ancestry.com. He joined Ancestry.com from Earthlink, where he served as vice president of development and was responsible for development, testing, subscription, and billing platforms across multiple sites. Prior to Earthlink, Young spent ten years at Turner Internet Technologies, where he served as vice president of product development for Turner’s internet properties. Young earned his bachelor’s degree in astrophysics and Asian studies from Williams College.

Friday, January 29, 2010

Visiting the Family History Library? Dine at the JSMB

Starting Monday, you’ll have to use a new ramp to enter the Joseph Smith Memorial Building (JSMB) parking garage. More on that later. Right now, I’m hungry.

Construction of the City Creek project has severely limited dining choices for Family History Library (FHL) visitors. Several choices not too far from the FHL are the restaurants in the Joseph Smith Memorial Building:

The Roof Restaurant

The Roof Restaurant is Utah's premier gourmet buffet. With a selection of international and domestic cusine [I think they meant cuisine] prepared daily by our head chef, and an inspiring view of Temple Square that can't be beat. Open nightly for dinner. [Located in the northwest corner of the 10th floor of the JSMB. This is the most expensive “cusine” in Salt Lake City.]

 

The Garden Restaurant

The Garden Restaurant is open for lunch and dinner with American cuisine that includes pasta, gourmet salads, hambugers [I hope they meant hamburgers], and our chef's special entrees. The casual garden atmosphere and affordable pricing [as opposed to the Roof] make it a great dining location for groups or families! [Also on the 10th floor, but in the southwest corner. Avoid sunset and ask for a table with a view of the Temple.]

 

The Nauvoo Café

The Nauvoo Café is a downtown Salt Lake City, Utah hotspot for hot breakfast, lunch, and dinner. Known for its famous hot-carved sandwiches and succulent pot pies, The Nauvoo Café is sure to satisfy your taste buds. And the affordable pricing will satisfy your budget. [“Affordable” is true, relative to the other three. Utilizing a cafeteria style line, this place has the fastest service, if the line isn’t too long. First floor of the JSMB, on the west side.] (Source)

 

The Lion House Pantry Restaurant

Enjoy exceptional home-style fare as you dine amidst the history that surrounds you, in Brigham Young’s personal residence, The Lion House, in Downtown Salt Lake City. Offering a selection of entrees that rival the best home cooking, The Pantry Restaurant features authentic recipes that have been passed down through generations. [Located east of the JSMB, go down the alley just past the Church Administration Building. The entrance is towards the back of the building. The home-style cooking is accentuated by the slightest suggestion of the smell in your grandparent’s basement. I’m not saying that’s a bad thing; just how many people can say they’ve eaten in Brigham Young’s root cellar?] (Source)

It’s a good idea to drop by the FamilySearch Center in the Joseph Smith Memorial Building and pick up a 10% discount coupon—good at any of these four restaurants.

Important: Parking Entrance Change

Now back to the parking garage changes.

Beginning Monday, 1 February 2010, the old entrance to the JSMB parking garage will close and a new entrance will open. The new entrance ramp will be in the middle of the street, like the entrance to the Conference Center parking. (They call it an "in-street parking ramp.") To enter the parking garage, you must be going west. The entrance ramp begins at or near the intersection of State Street and South Temple Street.

New entrance to the JSMB parking garage
The Church Administration building backdrops the sign for the new entry ramp

New JSMB parking entry ramp
A four-wheeler temporarily blocks the new entry ramp to the parking garage of the Joseph Smith Memorial Building (visible on the right). The old ramp will close permanently.

At the bottom of the ramp, you'll make a right-hand turn to enter the parking garage. (Once the City Creek underground parking is open, a left-hand turn will take you into it.)

When exiting, you'll leave the garage and make a right-hand turn to go up the exit ramp, again heading westbound. The exit ramp surfaces (like a theater vomitorium) towards the intersection of Main Street and South Temple Street.

Parking under the JSMB is limited, and is among the more expensive parking downtown. However, if you eat at one of the aforementioned restaurants, you can get your parking ticket validated.

Thursday, January 28, 2010

Ancestry.com Bloggers Day: DPS Tour

We were treated to a tour of Document Preservation Services (DPS) after Laryn Brown’s presentation. We were told this was the only place we could use our cameras, so of course I forgot and left my good camera behind. Fortunately, I still had my glasses and umbrella (wink, wink).

Document Preservation Services (DPS) occupies a half floor in one of the two buildings at Ancestry.com. Workers were sandwiched into small cubicles with no sound barriers. It was like a hive of activity (right).

Microfilm scanning is only done at the DPS facility in Provo, Utah. Any film that needs to be scanned is shipped here. Laryn Brown told us that they keep a high speed film scanner busy around the clock (left).

Images whirl by on the operator’s computer screen (right). While the scanner is capable of higher speeds, Ancestry.com limits the speed so the operator can perform a quick quality check on every image. Others perform more extensive checks (below).

Ancestry.com uses a planetary camera to digitize documents too fragile to run through a sheet-fed scanner. An operator places the documents on a flat surface underneath the camera (left). The camera is mounted straight above the documents. The operator takes a picture, which is transferred directly into the computer (below).

Ancestry.com uses a Kirtas book scanner for high speed scanning of books. Two cameras are employed to photograph the left and right pages simultaneously (below). The scanner automatically turns pages (right).

While we were there, Ancestry.com proudly showed off some valuable records they saved from destruction. It hurt to see they were cutting off the spines so the pages could be fed through a sheet scanner. But it was good to realize that as a result, lots of people could get access. After we left, they asked us not to mention the records. They weren’t supposed to show us because the record set hasn’t been announced.

An Ancestry.com employee explains stuff about some sort of project development board. My memory fails me, but I think this room was used to track projects during imaging and keying? Maybe? (Left)

Another employee explains another project board, the purpose of which has again eluded me (above). I’m guessing that the board shows projects that are nearing publication. Each project has a flag and a photograph associated with it (above).

Canon DR-6050C scanner

Somehow I didn’t get a picture of the sheet-fed scanner Ancestry.com was using to scan the records that we weren’t supposed to see. The scanner is able to scan both sides of a page at once. It is the same scanner they take out to do free scanning for people. (More on that later.)

               

That’s it for our tour. Next week I’ll give you a report on the technology presentation by Mike Wolfgramm and Jonathan Young.

Wednesday, January 27, 2010

Vault Vednesday: Vault Toured by Leaders

Openings into the GMRV
Four openings into the face of the mountain are easily seen in this photo, plus the access building at the right.

It’s Vault Vednesday! This is another in a series highlighting the Granite Mountain Record Vault (GMRV) and the NGS Family History Conference coming to Salt Lake City, 28 April—1 May 2010.

Vault Toured by Leaders

After three years of construction, by December 1963 the Granite Mountain Record Vault was virtually completed. Tours were given to top authorities of the Church of Jesus Christ of Latter-day Saints on the first Monday in December. A luncheon and film showing construction work preceded the tours. Hugh B. Brown and N. Eldon Tanner, counselors in the Church presidency, spoke briefly. The following day, business leaders and educators toured the vast tunnel complex.

Brown paid tribute to those who conceived, planned, and built the vault. Tanner said that the natural humidity and temperature are most ideal and the vault would provide maximum protection for irreplaceable microfilm. The cost for the project, at the time, ran under $2 million.

The 2010 NGS Family History Conference

If you come to NGS this year, you’ll get to attend “An Evening Celebration of Family History,” a special event Thursday evening.

Don’t miss this historic evening of entertainment and celebration! FamilySearch and the Utah Genealogical Association join to bring you a unique experience—a memorable evening held at the LDS Conference Center at Temple Square. The evening will include a multi-media tribute to family history, special guest speaker, and mini-concert by the Mormon Tabernacle Choir.

All conference attendees will receive a free ticket to this special event.

Tune in next week for public tour information, same vault time, same vault channel!

Early bird registration must be postmarked by 8 March 2010. There are just 41 days left.
Pre-registration must be postmarked by 12 April 2010. There are just 76 days left.
The conference begins 28 April 2010. There are just 92 days left.


Sources

      Dexter Ellis, "Inspection Tours Set for Records Vaults in Canyon," Deseret News (Salt Lake City, Utah), 30 November 1963, Church News section, p. 3, cols. 2-5; digital images (http://news.google.com/newspapers : accessed 25 December 2009).
     "Vault Toured By Church, Civic Leaders," Deseret News (Salt Lake City, Utah), 2 December 1963, p. 12 B, col. 1; digital images (http://news.google.com/newspapers : accessed 25 December 2009).
     James B. Allen, et al., “Hearts Turned to the Fathers,” BYU Studies Vol. 34 No. 2 (1994-1995).

Tuesday, January 26, 2010

Ancestry.com Bloggers Day: DPS (Part 2)

Last year I intended to do stupendously rich articles about Ancestry.com Bloggers Day presentations. Since I never got around to it, this year you’re getting my stupidously poor notes.

This is the second half of the presentation from Laryn Brown, Ancestry.com senior director, Document Preservation Services (DPS).

Indexing

Indexing is not transcribing. It is the process of creating a finding aid for the image. The indexes help narrow your search.

Here is an example of the information in an index:

Example index entry

-  For that example, the image below shows the additional information not available in the index:

Additional information is available from the image

Ancestry.com must work with a large range of sources: manuscript and printed sources, both in all states of legibility.

Ancestry.com works with a large range of sources

One of the toughest jobs in an indexing project is writing the instructions to indexers, precisely communicating what to do with exceptions. This is true whether indexers are English speaking community indexers or paid workers.

Paleography and Indexing accuracy

20-30% of records are indeterminate, even by paleographic experts. 

   [I think this is a little high, but maybe the statistic holds over a wide range of records. From my experience, I certainly agree with the point that “unaided interpretation” is much more difficult than “aided interpretation.”

   The biggest complaints about the quality of indexes come from genealogists who do genealogical lookups (“aided interpretation”), but haven’t done much indexing (“unaided interpretation”). For example, “Samuel” and “Lemuel” are often indistinguishable when indexing. But if you are looking for one in particular and all the other identifying information about the person, his relatives, and such, are as expected, it is pretty easy to give a proper interpretation.]

Ancestry.com has found that professional Chinese indexers have better character accuracy, and [native-speaking] community indexers have better word accuracy.

   [That sounded impressive at the time. In retrospect, for both to be true, Chinese indexers outshine native speakers only for characters that don’t occur in words, such as initials.]
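
The two statements can also both hold for ordinary names, because a single wrong character spoils a whole word. A small sketch of the arithmetic follows. The names, the keyed values, and the two function names are invented for illustration and are not Ancestry.com data.

```python
def character_accuracy(truth, keyed):
    """Share of characters keyed correctly (same-length words assumed)."""
    pairs = [(t, k)
             for true_word, keyed_word in zip(truth, keyed)
             for t, k in zip(true_word, keyed_word)]
    return sum(t == k for t, k in pairs) / len(pairs)


def word_accuracy(truth, keyed):
    """Share of whole words keyed with every character correct."""
    return sum(t == k for t, k in zip(truth, keyed)) / len(truth)


truth = ["Samuel", "Hogan", "Terence", "Draper"]
# Indexer A reads letter shapes well but slips once in three of the words.
indexer_a = ["Lamuel", "Hogam", "Terence", "Drapor"]
# Indexer B recognizes three names outright and misreads one completely.
indexer_b = ["Samuel", "Hogan", "Terence", "Bunyan"]

a_chars = character_accuracy(truth, indexer_a)  # 21 of 24 characters
a_words = word_accuracy(truth, indexer_a)       # 1 of 4 words
b_chars = character_accuracy(truth, indexer_b)  # 18 of 24 characters
b_words = word_accuracy(truth, indexer_b)       # 3 of 4 words
```

Indexer A wins on characters (87.5% versus 75%) while indexer B wins on words (75% versus 25%). That pattern would fit someone who reads letter shapes carefully without knowing the language, compared with someone who recognizes whole names.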

Audit, arbitrators, and final reviewers ultimately determine the accuracy of an index.

Professional Indexing

Ancestry.com uses 2 or 3 firms that specialize in old handwriting. They are very, very fast. The best English paleographers are surpassed by the work of these firms.

The Chinese ability at character recognition is very good. They learn 2,000+ characters to read a newspaper. Learning 26 to 30 more is not difficult.

   [The Chinese in Taiwan use traditional Chinese characters, for which it takes about 4,000 characters to read a newspaper. Communist China simplified its character set to increase literacy. Adding to the difficulty of learning several thousand characters, each character must be learned in two or more forms, such as standard script, semi-cursive script, grass script, and simplified. As new characters are added, the size of a standard dictionary has grown, from 48,000 characters a century ago to over 100,000 today.

   It should be little wonder, then, that professional Chinese indexers can quickly adapt to unfamiliar handwriting.]

When it comes to unstructured documents, Ancestry.com often uses a firm in Uganda. Since the people there speak English as their native language, they can read narrative English better.

[The Drouin Collection is a good example of narrative records. In the example below, the indexer read “Hogan, Terence Married” in the margin, then scanned the text for the event type and came upon “born.”]

Example from the Drouin Collection index

Record from the Drouin Collection

-  Infrastructure can be an issue when working with foreign firms. Ancestry.com lost connectivity to a partner for a day because of an earthquake.

Healing Indexes

Users are allowed to make corrections and index fields that weren’t indexed by Ancestry.com.

Ancestry.com has seen a huge increase in corrections since the change to the new record viewer. Andrew thinks they are doing tens of thousands of corrections per week. They are now doing per day what they used to do per month.

If you index a field that is not in the search form, use Keyword on the search form to search for it.

World Archives Project

Maybe 30,000 registered volunteers

Why Document Preservation Matters

-  In March last year, Cologne’s historic archive collapsed into a subway construction site. The archive was one of the three largest in the country, holding 65,000 priceless documents, thousands of maps, and a half million photos. The oldest document dated from 922 A.D.

  An archivist looks at debris of Cologne's archive

  [It is estimated that the collapse tore apart one-quarter of the archive’s documents. In a weird twist, plans are underway to piece many back together using software developed by the former East German secret police to spy on citizens by restoring shredded documents. (Source)]

-  A month later, an earthquake in L’Aquila, Italy caused the collapse of the cupola of the 18th-century Baroque church of St Augustine, completely flattening the adjoining Palazzo del Governo that housed the state archives.

  Aquila State Archive

  [Officials are attempting to recover around four kilometers of shelves of manuscripts, books, and rare documents. (Source)]

*  The digitizing priorities we set are not unlike your experience scanning your aunt’s records. You may start with the intention of scanning everything, but after a while you decide what is most important and you scan it first.

 

After Vault Wednesday we’ll return to Ancestry.com Bloggers Day with pictures from our tour of DPS.

Monday, January 25, 2010

Ancestry.com Bloggers Day: DPS

After the data center, Ancestry.com loaded us back into black vans and whisked us down south of Salt Lake City, past Stonehenge, to Ancestry.com headquarters in Provo, Utah. I was hoping for fake motorcycle cops ahead and behind us, but no luck.

When we arrived, Laryn Brown, senior director, document preservation operations, gave us a presentation about Document Preservation Services (DPS), followed by a tour. DPS is the Ancestry.com equivalent of the FamilySearch Digital Pipeline, but with a less imaginative name.

Document Preservation Services (DPS) - Our aim is to preserve family history records across the globe and to make them searchable online.

A map of Ancestry.com locations from May 2009

DPS operates globally

20 locations around the world 

Permanent offices in Provo, Washington DC, and London

DPS branch operations in Ancestry.com offices in: Sydney, Paris, Munich, China

Domestic DPS operations are in: Worcester MA, Albany NY, Atlanta GA, Montgomery AL, Savannah GA, Topeka KS, and Kansas City MO.

The DPS process consists of

1.  Discovery & Licensing – the process of finding archives and libraries willing to share their material.

2.  Acquisition

Even public domain material takes permission to physically access.

It took 4 years to negotiate permission to digitize at one British archive.

3.  Preservation/stabilization

Some material in lesser archives needs to be stabilized prior to imaging


In 2008 Ancestry.com arrived at an archive and found the records soaked and piled in a restroom. “If we hadn’t been there at that moment,” says Brown, “about 1.2 million pages of birth and death records would have disappeared.”

4.  Imaging

Document forensics are required for special problems. As we got to see last year, Ancestry.com developed a special digitization camera employing a technique usually called multispectral imaging (although Brown didn’t use that term). Below are examples from the 1851 England Census in the Manchester area that were water damaged during storage, but were restored using lighting outside the visible spectrum. The examples show how three documents appear under normal lighting conditions, and how they appear when digitized using multispectral imaging.


Visible light | Alternate spectrum light
Sample 1 in visible light | Sample 1 in infrared light
Sample 2 in visible light | Sample 2, probably IR light
Sample 3 in visible light | Sample 3 in UV light

[It’s worth noting that prior to Ancestry.com, the Manchester & Lancashire Family History Society used volunteers and multispectral transcription to produce legible transcriptions of these same records. Read more on the project’s website where some of the information is available at no cost.]

5.  Indexing – more about Indexing tomorrow.

6.  Result is posted online, live to customers.

 

Next time we’ll finish the remainder of Brown’s presentation.

Laryn Brown, senior director of document preservation, is a ten-year veteran at Ancestry.com, working as a product manager, development manager, and now in Document Preservation Services. He has a background in imaging and spent two years in London establishing the global imaging group that is currently photographing records in nine countries. He graduated from Brigham Young University with a master's degree in business management. An avid genealogist, Laryn spends most of his time doing Scottish research.