The Ancestry Insider: Indexing Errors: Test, Check the Boxes

Wednesday, November 17, 2010

Last week I talked about Elizabeth Shown Mills's lecture on boxes we trap ourselves in. And I asked that everyone come prepared today to work more boxes.

Did everyone remember to bring your #2 pencil? Good; I’m glad you remembered…

…that you didn’t need to bring one. (Marking a computer screen with a pencil… Well, that’s just silly.)

Remember that Elizabeth Shown Mills illustrated a point in her class with two individuals with the same name, living in the same place, at the same time. When she mentioned that the two were both listed in the census, I opened up Ancestry.com to see for myself. It was a little difficult to find them because…

BOTH WERE MISINDEXED!

Sorry; I didn’t mean to shout. But it just seems like every time I search I find indexing errors.

Then it occurred to me that this would make a good test case. Are the Ancestry.com indexes inferior because they were done by non-English speakers? Will the FamilySearch volunteer indexers do a better job?

The problem may not be non-English indexers. Another possibility to consider is that reading a record cold is not nearly as easy as targeted searching. Contrast the indexer who comes at a record cold with the searcher who examines the record armed with information about the target individual and family members. The targeted searcher has the liberty to ask, "with so many other legible bits of information matching my guy, is the shape of that miserable ink blot—masquerading as handwriting—consistent with the name I am looking for?"

Indexing Illustration

Consider the following illustration. Try to cold-index the following eight names, written by an enumerator who has the worst handwriting in the entire world. I’ll publish the answers tomorrow.

Now try targeted searching. Here’s the context:

A long-lived census employee has enumerated the White House for over 200 years, enumerating presidents from George Washington to Barack Obama. This sample shows eight of the better known presidents.

After writing each character, he drew a box around it and colored it in—perhaps a misguided attempt at security. As you check the boxes, notice some letters descend below the base line (like g, j, p, …), and some ascend higher than others (b, d, f, …). It is really easy to pick out dotted letters (i and j).

Check the boxes again and see how many you can read—despite the atrocious handwriting.

This illustration (hopefully) shows why cold indexers cannot match your ability to read the names of your ancestors.
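For readers who like to see an idea in code, here is a minimal Python sketch of the box exercise. It is only an illustration under stated assumptions, not how any indexing system actually works: it assumes a blot reveals nothing except whether a letter is tall, descends, or carries a dot, and it treats every capital as tall. The function names and the four-name candidate list are hypothetical and are not the answer key.

```python
from collections import Counter
from math import prod
import string

# Assumed letter classes for the blacked-out boxes (a simplification).
ASCENDERS = set("bdfhklt")   # rise above the other lowercase letters
DESCENDERS = set("gpqy")     # drop below the base line


def box_shape(ch: str) -> str:
    """Reduce one character to the outline of its inked-in box."""
    if ch == " ":
        return " "
    if ch == "i":
        return "i"           # short letter with a dot
    if ch == "j":
        return "j"           # dotted and descending
    if ch.isupper() or ch in ASCENDERS:
        return "T"           # tall box (capitals assumed tall)
    if ch in DESCENDERS:
        return "d"
    return "x"               # plain short box


def signature(name: str) -> str:
    """The row of box shapes a blotted-out name would leave behind."""
    return "".join(box_shape(ch) for ch in name)


def targeted_search(blot: str, candidates: list[str]) -> list[str]:
    """Targeted searching: keep only known names whose shapes fit the blot."""
    return [name for name in candidates if signature(name) == blot]


def cold_readings(blot: str) -> int:
    """Cold indexing: count every letter string consistent with the blot."""
    sizes = Counter(box_shape(ch) for ch in string.ascii_letters)
    return prod(sizes[shape] for shape in blot if shape != " ")


# Hypothetical candidate list, chosen only for illustration.
candidates = ["George Washington", "Barack Obama",
              "Abraham Lincoln", "Thomas Jefferson"]
blot = signature("Barack Obama")
matches = targeted_search(blot, candidates)
```

With a short list of candidates, `targeted_search` narrows the blot to a single name, while `cold_readings(signature("Obama"))` runs into the millions of possible letter strings. That gap is the cold indexer's handicap.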

Can FamilySearch indexers do a better job than Ancestry.com indexers? Is the cold indexing handicap sufficient to account for the problems in Ancestry.com’s indexes? Or does the language of the indexer also affect the quality?

What if a native English speaker in Uganda who had never learned anything about U.S. presidents tried the illustration? Perhaps the problem with offshore indexing is not one of language but of historical and cultural knowledge.

Back to My Test Case

That brings us back to my little test case. I didn’t tell you the misindexed name from Elizabeth Shown Mills's lecture because I don’t want anyone entering a correction. I think FamilySearch is incorporating corrections made on Ancestry.com into its indexes. After FamilySearch publishes the relevant index, I’ll check to see whether they did any better.

Stay tuned…

9 comments:

Tess, November 17, 2010 at 6:16 AM
The example is illegible--printed by Ugandans? BTW, I work with a Nigerian. His British-taught English is better than mine. So is the British-taught English of my husband's Indian university department chairman. The problem is coming in cold to index.
John, November 17, 2010 at 7:10 AM
I'm more than a little confused. Is there a joke I'm missing? All I can see in the figure readers are supposed to cold index are solid black boxes. Talk about bad handwriting!

And where are the examples from the long-lived census employee? What am I missing here?
Anonymous, November 17, 2010 at 9:35 AM
I believe that the indexer's language has a great impact on the quality of the index. My example is a very complete set of parish registers from Quebec commonly referred to as the Drouin, which has been indexed on Ancestry. The index is so bad that at least half the records cannot be found through the index. Even worse is the census. Fortunately, 7 million Quebecois descend from only 2600 families, so most of the records can be accessed through family trees. Although these are usually unsourced, they give hints as to date and place, which allow discovery by browsing. Once one finds the record, the indexed name is often totally unrecognizable. It is clear that the indexers had no sense at all about French Canadian names.
The Ancestry Insider, November 17, 2010 at 10:57 AM
Dear John,

No joke is intended. Consider the black boxes to be worst case ink splotches. You may be amazed, but many of your fellow readers will pull names of presidents from them.

HINT

If you don't want any hints, skip the rest of this comment.

Remember, the exercise is an illustration of targeted search. Pick the name of the best known president. Write it down as it would appear printed in a book. Compare it to each line of blotches until you find a match. You KNOW that the name is one of those listed. All you have to do is pick the line that best matches the printed name. Repeat for the president you feel is next best known. Continue until you have "indexed" every name.

Have fun. Look for the answer tomorrow.

-- The Insider
Anonymous, November 17, 2010 at 12:51 PM
Got all 8. Had to look up a list of presidents for a clue on #7, but the rest I could do just by thinking about either famous or more recent presidents.

I won't spoil the fun by publishing the list of 8 though.

Bonnie M.
John, November 18, 2010 at 10:28 AM
Ah, I see! Well, you were just a bit too subtle for me. Very clever. Thanks for pointing out what I was missing. Now it all makes sense.
LefflerResearcher, November 19, 2010 at 10:49 AM
It always helps to be familiar with the names you are indexing. The name I often look for is Leffler, and I know the indexer may have written it a dozen ways, including Sefler and Tuffner (to name a couple I have found). I feel it is very important to put in a correction when I find an error, and I have made many on the Ancestry.com census. This is especially true when I really had to dig to find what I was looking for. Obviously, someone whose first language was not English could have problems with the English, but lack of familiarity with the names is even more of a problem.
LefflerResearcher
Jo, November 22, 2010 at 2:34 AM
I think familiarity with the language and local names and places is important. I'd like to volunteer to do some indexing, but am waiting till a Scottish project comes up as I don't think my indexing would be nearly as good for an unfamiliar area.
Shelina, November 26, 2010 at 12:52 PM
The biggest part of the problem is the atrocious handwriting. The indexers do the best they can. Having someone familiar with the names of the regions would certainly help, but I have also seen that backfire, where an indexer will type in what they think it is at first glance, when if you look at each letter carefully, it is clearly another name or spelling.

Biography

The Ancestry Insider was a readers’ choice for the top four genealogy news and resources blogs, part of Family Tree Magazine’s “40 Best Genealogy Blogs” for 2010. He reports on the two big genealogy organizations, Ancestry.com and FamilySearch. His blog was named one of the “Most Popular Genealogy Blogs” by ProGenealogists, and has received Family Tree Magazine’s “101 Best Web Sites” award every year since 2008. A genealogical technologist, the Insider has a post-graduate technology degree and holds a dozen technology patents in the United States and abroad. He has done genealogy since 1972 and has worked in the computer industry since 1978. He was Time Magazine Man of the Year in both 1966 and 2006. And he really is descended from an Indian princess.
