Tuesday, July 31, 2012

Ancestry.com Versus FamilySearch Indexing Quality

I’ve warned before that no matter what index you use, you’re going to find your relatives misindexed. You have better context than cold indexers. (See the Indexing Illustration in “Indexing Errors: Test, Check the Boxes.”)

To demonstrate the point, I thought I would compare the 1940 U.S. Census Indexes of Ancestry.com and FamilySearch for the state of Utah. I figured FamilySearch’s large Utah indexing workforce would have a big advantage over Ancestry’s offshore workforce. I searched for all people named Alonzo. I chose the name because it is unusual, so it would not return too many matches, and because offshore indexers were likely to be unfamiliar with it. I didn’t think about it at the time, but it can be challenging to recognize the z and to differentiate o from a.

The exact search on Ancestry.com returned 124 results.
The exact search on FamilySearch.org returned 163 results.

Ancestry had seven results that FamilySearch did not, giving a sample size of 170. Four of the Ancestry results were for people who did not live in Utah as requested (only their 1935 addresses were in Utah). However, all four lived in states that FamilySearch had already published, so I was able to include them in the sample set.

Here are the results:

                     Given Name(s) Correct   Surname Correct   Both Names Correct
Ancestry.com         125 (74%)               150 (88%)         114 (67%)
FamilySearch.org     159 (94%)               167 (98%)         157 (92%)
Both websites wrong  3 (2%)                  2 (1%)            4 (2%)
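
For anyone who wants to check the arithmetic, the percentages in the table can be re-derived from the raw counts and the sample size of 170 (the 163 FamilySearch results plus the 7 found only on Ancestry). This is only a minimal Python sketch; the variable and function names are illustrative and are not part of the original test.

```python
# Sample size: 163 FamilySearch results plus 7 found only on Ancestry.
SAMPLE_SIZE = 163 + 7

# Counts copied from the table: (given name, surname, both names).
# The last row counts records that BOTH sites got wrong.
counts = {
    "Ancestry.com": (125, 150, 114),
    "FamilySearch.org": (159, 167, 157),
    "Both websites wrong": (3, 2, 4),
}


def percent(count, total=SAMPLE_SIZE):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


# Convert every count in every row to a rounded percentage.
percentages = {
    site: tuple(percent(c) for c in row) for site, row in counts.items()
}

for site, row in percentages.items():
    print(site, row)
```

The rounded values match the figures reported in the table.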

As I mentioned, the results for the given name Alonzo were stacked against Ancestry. Ancestry’s keyers made some egregious errors: Alanna, Alenae, Alomo, Aloms, Alorysw, Alorze, Donzo, and Hanzo. Ancestry also had several errors caused by combining Alonzo with a middle initial (Alonzob, Alonzoe, Alonzor, and Alonzos). It made me wonder if one or more of their keyers were not following instructions.

However, the sample set contained a random sampling of surnames, so the results for Ancestry keyers should be given some consideration. Here, Ancestry suffered a 12% error rate.

FamilySearch’s 2% error rate for surnames should be given less consideration, since Utah is FamilySearch Indexing’s “home court.”

It has been said many times that there is value in having more than one index. This test shows that to be true. The FamilySearch index got the full name correct 92% of the time. But if one checks both the FamilySearch and the Ancestry indexes, the success rate goes up to 98%.
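
The 98% figure follows from the counts already given: if a person counts as findable whenever at least one index has the full name right, the only misses are the 4 records that both sites got wrong. A small sketch of that calculation, with variable names of my own choosing:

```python
SAMPLE_SIZE = 170

# Full-name ("both names correct") counts from the results table.
familysearch_full_name_correct = 157
both_sites_full_name_wrong = 4

# Success rate when only the FamilySearch index is checked.
familysearch_rate = familysearch_full_name_correct / SAMPLE_SIZE

# Success rate when both indexes are checked: a record is missed
# only if neither site indexed the full name correctly.
combined_rate = (SAMPLE_SIZE - both_sites_full_name_wrong) / SAMPLE_SIZE

print(f"FamilySearch alone: {familysearch_rate:.0%}")
print(f"Either index:       {combined_rate:.0%}")
```

In other words, checking the second index recovers 9 of the 13 full names that FamilySearch alone missed.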

 


Notes:

  • Judging the difference between a and o in Alonzo was difficult, so the results for given names should be taken with great caution.
  • Even though I cross checked some values against the 1930 census or other collections, I considered the “correct value” to be what the image indicated, whether that was truly right or wrong. Otherwise it becomes too painful to try to differentiate enumerator error and enumerator handwriting.
  • Where letters were illegible, I ignored them when scoring.

16 comments:

  1. Did you take the time to put in corrections for the incorrect transcriptions on Ancestry?

  2. As a follow-up to Karen's question - if someone else had already entered corrections, did those corrections have an impact on the results returned in your search? And, were the member-entered corrections RIGHT? One of my pet peeves is finding that someone else has entered incorrect alternate information. There is no arbitrator for member corrections.

    Replies
    1. I completely agree with AnnieB that member entered corrections are often incorrect. I suspect that someone is doing this to "complete or find" a missing person in their research. This is especially prevalent in Ancestry.com.

  3. Anyone who has used both databases has seen the egregious errors and rate of errors in Ancestry's "Made in China" database. Frankly, I'm surprised it's not higher having seen the number of errors some have encountered within a single search. Since I am a FS indexer I can see that no index will be perfect, all enumerators are not perfect either, it's not an easy job. Still, in the rush to provide a cheap index, volunteer is certainly less expensive than Chinese labor and in today's economy I see it as almost immoral to outsource work that would be better done by paid US workers. I'm a supporter of Ancestry but not of their outsourced indexing and this shows that their public philosophy that outsourced letter by letter transcription without translation is actually better than translation by English speakers falls on its face. Go Family Search !!

    Replies
    1. True. Sad that Ancestry felt like this was a horse race. I think quality should be job one. I'm not saying that doing it slower would improve off-shore indexing. I AM saying that common surnames like Leslie should not be transcribed as Leclie, that should be easy to spot by anyone that speaks English. Now we get to live with it forever, because even if we submit corrections, the search engines don't seem to find the corrections, just the Leclie's.

  4. There didn't seem to be any corrections.

    --The Insider

  5. The protocol used to compare the two indexes shows quite clearly the differences in quality.
    What is clear to all of us USING Ancestry's 1940 census index, is that we have to go through a lot of hoops to find OUR target person or family. I may start with the full name, but I'm using a lot of iterations with the minimum 3 letters and wild cards. I'm trying to maximize age and location in the search parameters, and grudgingly accepting I need to page through longer match lists (which have to be copied to a spreadsheet to be filtered or sorted). Metaphone variants are often useless when letters (even first) are substituted or omitted. Spelling variations by the enumerators just add another layer of uncertainty.
    75% of the 2000+ individuals on my search list are in Pennsylvania or Ohio, so I'm currently stuck using Ancestry only. For various reasons, I didn't waste my time trying to hunt for them without the index. Thankfully, my planning included probable household composition as well as probable location. I'm adding corrections for my benefit as well as others', but wow, there are a lot needed.
    If Ancestry is listening, "Europe" and "USA" in the birth location field do not work for 1940. In searches of earlier censuses, they return appropriate multiple variants, helpful to distinguish the immigrants from the native-born.

  6. Can anyone figure how, in Ancestry's 1940 renderings, "Ohio" clearly written out, even quite legibly ~printed~, gets converted to "Oklahoma" in the index/extract? This is even worse an error than the still-not-fixed index/extract spell-out of "Iowa" in the 1850 enumeration (substituted for the abbreviation "Ia") which nearly always was actually the abbreviation for **Indiana** at that time. So more than 60,000 residents in 1850 for Indiana were born in Iowa???? Ridiculous on all counts.

    Replies
    1. Exactamentally!!!! KNEW Charles Phillips was born in Ohio and original page CLEARLY says Ohio but index says Oklahoma............ I guessed an isolated error ........ but you're saying that same goof happened often?????? Awful to contemplate!

  7. Thanks for spending the time to do this comparison. Your results appear to be consistent with those of Randy Seaver. That and the numerous smaller samples given by your readers suggest that the comment “We are confident that our index, delivered in record time and optimized as it is to work with our proprietary system, provides the best and most powerful 1940 experience on the market,” by ancestry's Todd Jensen is wishful thinking on ancestry's part. It is surely clear to anyone not associated with ancestry (and probably to many who are associated with ancestry) that their indexing quality for the US 1940 census is way below par. It's too bad they can't just admit this and take steps to correct their substandard results. Perhaps they could make a deal to purchase FamilySearch's US 1940 census index.

  8. I haven't checked Ancestry, but for my most unusual surname, Wurts, I was ecstatic to see that it was indexed 100% correctly in every instance in Colorado's Family Search database. That, to me, is a major accomplishment.

  9. There have been several of these articles about the "competitive" indexing efforts. Some do seem to encourage us to stand up and cheer for one "side" or the "other." Go Ancestry! Go My Heritage! Go FamilySearch--or should I say, Go Tom, Sue, Joe, and yes Go Jose, Chang and Sven, too.

    Genealogy has a deep volunteer base and a long tradition of volunteerism. The valuable FamilySearch indices that it owns and makes available have benefited from this spirit of working together on volunteer projects. Just because an index is made freely available to the consumer today does not mean that index is free of commercial dealings, though.

    One reader wrote, "Sad that Ancestry felt like this was a horse race." and "anyone that speaks English." Another wrote about a "Made in China" database.

    I'm not sure where some of the impressions originate that motivated well meaning souls to post.

    Genealogy is a global community. In my own genealogical journey, I've learned more about the cultures from which my family descends. If the "community project" has contributed to making us less open to others, especially those from other cultures, then I gladly would have waited 6 years for the 1940 census indices.

    I think it's great that there ARE multiple indices. I celebrate every hour of time and nickel and dime that has gone into bringing the 1940 census to our computers.

    GeneJ--Nobody's fan-boy. My close family is multi-cultural, and I'm darn proud of it.

    P.S. Might you update me on changes in the administration of the FamilySearch owned indexes? Have they yet figured out how to fix "egregious" indexing errors? The last time I submitted corrections to FamilySearch about egregious errors in its indexing, I was told that after all this time, FamilySearch didn't even have a way of keeping track of such errors, much less a protocol for entering corrected information to the index. The note I received a year ago read as follows:

    begin quote
    Thank you for contacting FamilySearch about the error in our records for Lorenzo Preston. We looked at the record you described, and find you are correct.
    FamilySearch does not, at the present time, have the functionality in place to accept corrections or additions to individual database entries without reloading the entire collection.
    However, a future feature is under consideration that would accept corrections or additions to the searchable index, so we encourage you to keep a list of the corrections you feel need to be made. Both the original index and the correction to the index would be searchable, thus preserving the ability to locate original indexes and images as well as the corrected or added patron entries. We appreciate your patience in this matter.
    End quote

  10. thank you for this. i have often noted that i find some things more easily on one search versus another but its nice to see a bit of data to support that. i agree, it's best to have multiple indices - gives the best chances of finding what we're looking for.

  11. I ran a small comparison as well and FS always won the contest.

  12. Dear Ancestry Insider,

    I am a school of information resources and library science graduate student at the University of Arizona and am working on a paper covering this same topic (Ancestry.com Versus FamilySearch Indexing Quality). I plan to find a group of records (i.e. 1870 US Federal Census - one small town) that are on both sites that have been indexed independently and compare the two for errors to see if indexing done by volunteers and indexing done by paid workers affects the quality of the index.

    I called Ancestry.com to get some information, but was unable to find out much from the call center. I am going to call FamilySearch.org as well. I wondered if you have time, would you be able to 1) refer me to papers/research discussing this same topic, 2) tell me a little bit more about how Ancestry.com indexes its records, and 3) let me know if you know of any one set of records that can be found on both sites and that was indexed independently?

    I appreciate any help on the matter.

    Sincerely,
    Margaret - megches at google dot com

    Replies
    1. Meg,

      The email address you left doesn't work, so I'll reply here.

      1) I don't know of any papers or research. 2) Ancestry.com pays offshore keyers to index their records. I think most form-based records have been done in China where a 3rd party vendor has an established workforce of indexers keying English records for Ancestry.com. 3) That could be a challenge. Ancestry.com and FamilySearch have participated in many record swaps, so many record collections found on both sites are not independent. The WWI and WWII draft indexes might be independent. Try finding a corrected record on Ancestry.com and check the same name on FamilySearch. With any luck it will have been indexed correctly there. If those collections don't pan out, let me know and I can do some poking.

      --The Ancestry Insider
