Last year I intended to write stupendously rich articles about Ancestry.com Bloggers Day presentations. Since I never got around to it, this year you’re getting my stupidously poor notes.
This is the second half of the presentation from Laryn Brown, Ancestry.com senior director, Document Preservation Services (DPS).
* Indexing is not transcribing. It is the process of creating a finding aid for the image. The indexes help narrow your search.
- Here is an example of the information in an index:
  - For the same record, the image below shows additional information that is not in the index:
* Ancestry.com must work with a wide range of sources: manuscript and printed, in every state of legibility.
* One of the toughest jobs in an indexing project is writing instructions for indexers that communicate precisely what to do with exceptions. This is true whether the indexers are English-speaking community indexers or paid workers.
* Paleography and Indexing Accuracy
  - 20 to 30% of records are indeterminate, even to paleographic experts.
[I think this is a little high, but maybe the statistic holds over a wide range of records. From my experience, I certainly agree with the point that “unaided interpretation” is much more difficult than “aided interpretation.”
The biggest complaints about the quality of indexes come from genealogists who do genealogical lookups (“aided interpretation”) but haven’t done much indexing (“unaided interpretation”). For example, “Samuel” and “Lemuel” are often indistinguishable to an indexer. But if you are looking for one in particular, and all the other identifying information about the person, his relatives, and so on is as expected, it is pretty easy to choose the right reading.]
  - Ancestry.com has found that professional Chinese indexers have better character accuracy, while [native-speaking] community indexers have better word accuracy.
[That sounded impressive at the time. In retrospect, for both to be true, Chinese indexers must outshine native speakers only on characters that don’t occur in words, such as initials.]
  - Auditors, arbitrators, and final reviewers ultimately determine the accuracy of an index.
* Professional Indexing
  - Ancestry.com uses 2 or 3 firms that specialize in old handwriting. They are very, very fast, and their work surpasses that of the best English paleographers.
  - Chinese readers are very good at character recognition. They learn 2,000+ characters just to read a newspaper, so learning 26 to 30 more is not difficult.
[Taiwan uses traditional Chinese characters, and reading a newspaper there takes about 4,000 of them. Communist China simplified its character set to increase literacy. Adding to the difficulty of learning several thousand characters, each one must be learned in two or more forms, such as standard script, semi-cursive script, grass script, and simplified form. As characters have been added, the standard dictionary has grown from 48,000 characters a century ago to over 100,000 today.
It is little wonder, then, that professional Chinese indexers can quickly adapt to unfamiliar handwriting.]
  - For unstructured documents, Ancestry.com often uses a firm in Uganda. Because the indexers there speak English (an official language of Uganda), they can read narrative English better.
[The Drouin Collection is a good example of narrative records. In the example below, the indexer read “Hogan, Terence Married” in the margin, then scanned the text for the event type and came upon “born.”]
- Infrastructure can be an issue when working with foreign firms. Ancestry.com lost connectivity to a partner for a day because of an earthquake.
* Healing Indexes
  - Users can submit corrections and add values for fields that Ancestry.com didn’t index.
  - Ancestry.com has seen a huge increase in corrections since switching to the new record viewer. Andrew thinks users are making tens of thousands of corrections per week. They now make in a day what they used to make in a month.
  - If you add a value for a field that doesn’t appear on the search form, use the Keyword field on the search form to search for it.
* World Archives Project
- Maybe 30,000 registered volunteers
* Why Document Preservation Matters
  - In March last year, Cologne’s historic archive collapsed into a subway construction site. The archive was one of the three largest in Germany, holding 65,000 priceless documents, thousands of maps, and half a million photos. The oldest document dated from 922 A.D.
[It is estimated that the collapse tore apart one-quarter of the archive’s documents. In a weird twist, plans are underway to piece many back together using software originally developed to reconstruct documents shredded by the former East German secret police. (Source)]
  - A month later, an earthquake in L’Aquila, Italy, caused the cupola of the 18th-century Baroque church of St. Augustine to collapse, completely flattening the adjoining Palazzo del Governo, which housed the state archives.
[Officials are attempting to recover around four kilometers of shelving holding manuscripts, books, and rare documents. (Source)]
* The digitizing priorities we set are not unlike your experience scanning your aunt’s records. You may start with the intention of scanning everything, but after a while you decide what is most important and you scan it first.
After Vault Wednesday we’ll return to Ancestry.com Bloggers Day with pictures from our tour of DPS.