Tuesday, January 26, 2010

Ancestry.com Bloggers Day: DPS (Part 2)

Last year I intended to do stupendously rich articles about Ancestry.com Bloggers Day presentations. Since I never got around to it, this year you’re getting my stupidously poor notes.

This is the second half of the presentation from Laryn Brown, Ancestry.com senior director, Document Preservation Services (DPS).


Indexing is not transcribing. It is the process of creating a finding aid for the image. The indexes help narrow your search.

Here is an example of the information in an index:

Example index entry

-  For that example, the image below shows the additional information not available in the index:

Additional information is available from the image

Ancestry.com must work with a large range of sources: manuscript and printed sources, both in all states of legibility.

Ancestry.com works with a large range of sources

One of the toughest jobs in an indexing project is writing the instructions to indexers, precisely communicating what to do with exceptions. This is true whether the indexers are English-speaking community indexers or paid workers.

Paleography and Indexing accuracy

20-30% of records are indeterminate, even to paleographic experts.

   [I think this is a little high, but maybe the statistic holds over a wide range of records. From my experience, I certainly agree with the point that “unaided interpretation” is much more difficult than “aided interpretation.”

   The biggest complaints about the quality of indexes come from genealogists who do genealogical lookups (“aided interpretation”), but haven’t done much indexing (“unaided interpretation”). For example, “Samuel” and “Lemuel” are often indistinguishable when indexing. But if you are looking for one in particular, and all the other identifying information (about the person, his relatives, and such) is as expected, it is pretty easy to give a proper interpretation.]

Ancestry.com has found that professional Chinese indexers have better character accuracy, and [native-speaking] community indexers have better word accuracy.

   [That sounded impressive at the time. In retrospect, for both to be true, Chinese indexers outshine native speakers only for characters that don’t occur in words, such as initials.]
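
Another way both findings could hold at once is if the two groups make different kinds of mistakes. Here is a minimal sketch of that idea. The names and keyed values are made up for illustration and are not Ancestry.com data, and the scoring assumes an indexer only substitutes letters (never adds or drops one), which is a simplification. Indexer A stands in for someone who makes isolated one-letter slips; indexer B stands in for someone who gets most words right but occasionally misreads a whole word.

```python
def accuracies(truth_words, keyed_words):
    """Return (character accuracy, word accuracy) for two aligned word lists.

    Assumption: every keyed word is the same length as the true word,
    so errors are letter substitutions only.
    """
    total_chars = correct_chars = correct_words = 0
    for truth, keyed in zip(truth_words, keyed_words):
        if len(truth) != len(keyed):
            raise ValueError("words must be the same length")
        total_chars += len(truth)
        # Count letters that match position by position.
        correct_chars += sum(t == k for t, k in zip(truth, keyed))
        # A word counts as correct only if every letter matches.
        correct_words += truth == keyed
    return correct_chars / total_chars, correct_words / len(truth_words)


# Hypothetical ground truth and two hypothetical indexers.
truth = ["Samuel", "Terence", "Lemuel", "Hogan"]
indexer_a = ["Samuel", "Terenoe", "Lemnel", "Hogan"]  # two one-letter slips
indexer_b = ["Samuel", "Terence", "Lemuel", "Wuyen"]  # one word badly misread

a_char, a_word = accuracies(truth, indexer_a)
b_char, b_word = accuracies(truth, indexer_b)
print(f"A: characters {a_char:.2f}, words {a_word:.2f}")
print(f"B: characters {b_char:.2f}, words {b_word:.2f}")
```

In this toy data, A gets 22 of 24 letters right but only 2 of 4 words, while B gets 20 of 24 letters right and 3 of 4 words. So A wins on characters and B wins on words, without initials entering into it.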

Auditors, arbitrators, and final reviewers ultimately determine the accuracy of an index.

Professional Indexing

Ancestry.com uses 2 or 3 firms that specialize in old handwriting. They are very, very fast. The best English paleographers are surpassed by the work of these firms.

The Chinese ability at character recognition is very good. They learn 2,000+ characters to read a newspaper. Learning 26 to 30 more is not difficult.

   [The Chinese in Taiwan use traditional Chinese characters, for which it takes about 4,000 characters to read a newspaper. Communist China simplified its character set to increase literacy. Adding to the difficulty of learning several thousand characters, each character must be learned in two or more forms, such as standard script, semi-cursive script, grass script, and simplified. As new characters are added, the size of a standard dictionary has grown, from 48,000 characters a century ago to over 100,000 today.

   It should be little wonder, then, that professional Chinese indexers can quickly adapt to unfamiliar handwriting.]

When it comes to unstructured documents, Ancestry.com often uses a firm in Uganda. Since the people there speak English as their native language, they can read narrative English better.

[The Drouin Collection is a good example of narrative records. In the example below, the indexer read “Hogan, Terence Married” in the margin, then scanned the text for the event type and came upon “born.”]

Example from the Drouin Collection index

Record from the Drouin Collection

-  Infrastructure can be an issue when working with foreign firms. Ancestry.com lost connectivity to a partner for a day because of an earthquake.

Healing Indexes

Users are allowed to make corrections and index fields that weren’t indexed by Ancestry.com.

Ancestry.com has seen a huge increase in corrections since the change to the new record viewer. Andrew thinks they are doing tens of thousands of corrections per week. They are now doing per day what they used to do per month.

If you index a field that is not in the search form, use Keyword on the search form to search for it.

World Archives Project

Maybe 30,000 registered volunteers

Why Document Preservation Matters

-  In March last year, Cologne’s historic archive collapsed into a subway construction site. The archive was one of the three largest in the country, holding 65,000 priceless documents, thousands of maps, and a half million photos. The oldest document dated from 922 A.D.

  An archivist looks at debris of Cologne's archive

  [It is estimated that the collapse tore apart one-quarter of the archive’s documents. In a weird twist, plans are underway to piece many back together using software developed to reassemble documents torn up by the former East German secret police. (Source)]

-  A month later, an earthquake in L’Aquila, Italy caused the collapse of the cupola of the 18th-century Baroque church of St Augustine, completely flattening the adjoining Palazzo del Governo that housed the state archives.

  Aquila State Archive

  [Officials are attempting to recover around four kilometers of shelves of manuscripts, books, and rare documents. (Source)]

-  The digitizing priorities we set are not unlike your experience scanning your aunt’s records. You may start with the intention of scanning everything, but after a while you decide what is most important and you scan it first.


After Vault Wednesday we’ll return to Ancestry.com Bloggers Day with pictures from our tour of DPS.


  1. Dear Insider:

    With respect to your comment about transcribing narrative sources like the Drouin Collection - note that the example you gave had a further complication: the margin note said married, but it was really a christening record "I baptized Terence Glynn born July the seventeenth". I speak French, so I have been helping index the Canadian Quebec parish registers, which are narrative form, and have found instances like this.

  2. Dear Charles,

    Oops! Silly me! Thanks for catching that. I noticed "baptized" and, without reading carefully, wondered why he was talking about baptism in a marriage record! Oh, well.

    -- The Insider

  3. I must compliment Ancestry on the ease of making corrections and annotations. I also like the way that they are appreciative of our efforts to improve accuracy and make the record findable for the next searcher. By way of contrast, FindMyPast seems to almost take umbrage at corrections, even when they recognize the error and make the change. The message that they send back is along the lines of, "well, in this rare instance it appears that you are correct, so we have reluctantly made the change in the record."

  4. AI, you said:

    "- Users are allowed to make corrections and index fields that weren’t indexed by Ancestry.com."

    Users are not allowed to make corrections (delete the erroneous Ancestry so-called-record entry from record and index). Users can add suggestions that usually will be added to the index and noted in brackets on the purported 'record'.

    What users can make suggestions about for the purpose of indexing and 'record' entries is very limited. For example the recently-added Delaware Marriages database purporting to begin in 1806 actually begins in 1861: the entries indexed for all earlier years were erroneously indexed. Users are allowed to add suggestions only for names (for indexing and 'record' purposes). There is an option to add a "comment," but it will only be visible if someone clicks to view the 'record' for the entry for *that person* (so for marriages, a comment correcting a wrongly entered date must be entered for each party to the marriage). This is one way that idiotic data gets put into trees -- when people click on the 'record' and link it into a tree person, but don't look at the actual record image to verify what it says.

    In US Federal Census enumerations for 1880 and later, often a grandchild is living with a grandparent. In these instances the Ancestry.com indexer/abstractor usually invented who they thought the parent or parents were of the grandchild, if someone else with the same surname as the grandchild was living in the household. Ancestry.com added fields for the grandchild stating who the father and/or mother was. Since this information was *not* given in the enumerations (only the grandchild's relationship with the head of household), this is very often wrong. There is no way in the Ancestry.com forms to 'correct' this or add alternate information. The background coding only allows linking the invented parent to the grandchild in making a tree connection, not another person in the household who the viewer may know was the actual parent. Another set of junk for trees.

    "- Ancestry.com has seen a huge increase in corrections since the change to the new record viewer. Andrew thinks they are doing tens of thousands of corrections per week. They are now doing per day what they used to do per month."

    Ancestry.com has also added a large number of badly indexed databases, such as the aforesaid Delaware Marriages (also the equally badly indexed, recently added Delaware Births and Delaware Deaths databases, and another Delaware Marriages database, including Marriage Bonds, added several months back).

  5. Geolover points out the hazard of accepting someone else's conclusions and what can happen when you "...don't look at the actual record image to verify what it says."

    That is exactly what all searchers need to remember. I cannot imagine that any serious researcher would accept someone else's conclusions about relationships and other information in a record without looking at the actual image.

    The indexing is nothing more than a means to help us find the actual record, and Ancestry could not possibly even think of verifying the accuracy of comments, or attempting to mediate differences of opinion. The beauty of their correction process is that added opinions also become searchable, but are never represented by Ancestry as being the final word.

    I cannot assume responsibility for the garbage that others put into their tree, even when it involves people on my own tree. I can only do my best to make my own record as nearly accurate as possible. I do not worry about "corrections" that others (including the original indexers) may make. I am just glad that the records are there and that someone has indexed them to make them searchable. In fact, when I cannot find a record that I think should be there, I try to think of all the wrong ways that it might have been indexed and search for those. It often pays off handsomely.

