Tuesday, July 31, 2012

Ancestry.com Versus FamilySearch Indexing Quality

I’ve warned before that no matter what index you use, you’re going to find your relatives misindexed. You have better context than cold indexers. (See the Indexing Illustration in “Indexing Errors: Test, Check the Boxes.”)

To demonstrate the point, I thought I would compare the 1940 U.S. Census Indexes of Ancestry.com and FamilySearch for the state of Utah. I figured FamilySearch’s large Utah indexing workforce would have a big advantage over Ancestry’s offshore workforce. I searched for all people named Alonzo. The name is unusual (because I didn’t want a name with too many matches), and offshore indexers were likely to be unfamiliar with it. I didn’t think about it at the time, but it can be challenging to recognize the z and to differentiate o from a.

The exact search on Ancestry.com returned 124 results.
The exact search on FamilySearch.org returned 163 results.

Ancestry had seven results that FamilySearch did not, giving a sample size of 170. Four of the Ancestry results did not live in Utah as requested (their 1935 addresses were in Utah). However, the four were in states published by FamilySearch, so I was able to include them in the sample set.

Here are the results:

Index | Given Name(s) Correct | Surname Correct | Both Names Correct
Ancestry.com | 125 (74%) | 150 (88%) | 114 (67%)
FamilySearch.org | 159 (94%) | 167 (98%) | 157 (92%)
Both websites wrong | 3 (2%) | 2 (1%) | 4 (2%)

As I mentioned, the results for the given name Alonzo were stacked against Ancestry. Ancestry’s keyers made some egregious errors: Alanna, Alenae, Alomo, Aloms, Alorysw, Alorze, Donzo, and Hanzo. Ancestry also had several errors caused by combining Alonzo with a middle initial (Alonzob, Alonzoe, Alonzor, and Alonzos). It made me wonder if one or more of their keyers were not following instructions.

However, the sample set contained a random sampling of surnames, so the results for Ancestry keyers should be given some consideration. Here, Ancestry suffered a 12% error rate.

FamilySearch’s 2% error rate for surnames should be given less consideration, since Utah is FamilySearch Indexing’s “home court.”

It has been said many times that there is value in having more than one index. This test shows that to be true. The FamilySearch index got the full name correct 92% of the time. But if one checks both the FamilySearch and the Ancestry indexes, the success rate goes up to 98%.
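The arithmetic behind the 92% and 98% figures can be sketched in a few lines of Python. This is only an illustration: the counts come from the table above, and the variable and function names are made up for the example.

```python
# Counts taken from the results table above; the sample is 170 records.
SAMPLE_SIZE = 170
FAMILYSEARCH_BOTH_CORRECT = 157  # full name correct in the FamilySearch index
BOTH_SITES_WRONG = 4             # full name wrong in both indexes


def percent(count: int, total: int) -> int:
    """Return count/total as a whole-number percentage."""
    return round(100 * count / total)


# Success rate when only the FamilySearch index is checked.
single_index = percent(FAMILYSEARCH_BOTH_CORRECT, SAMPLE_SIZE)

# A record can be found if at least one index has the full name right,
# so only records that are wrong in both indexes count as failures.
either_index = percent(SAMPLE_SIZE - BOTH_SITES_WRONG, SAMPLE_SIZE)

print(f"FamilySearch alone: {single_index}%")
print(f"Either index: {either_index}%")
```

The combined rate depends only on the "Both websites wrong" row: every other record is right in at least one of the two indexes.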

 


Notes:

  • Judging the difference between a and o in Alonzo was difficult, so the results for given names should be taken with great caution.
  • Even though I cross-checked some values against the 1930 census or other collections, I considered the “correct value” to be what the image indicated, whether that was truly right or wrong. Otherwise it becomes too painful to try to differentiate enumerator error and enumerator handwriting.
  • Where letters were illegible, I ignored them when scoring.

Monday, July 30, 2012

Monday Mailbox: Olympics

Dear Ancestry Insider,

Do you think the Ancestry Insider might be interested in republishing a blog post?

“In celebration of the London 2012 Olympics starting this week, FamilySearch is pleased to announce…”

http://search.ancestry.com/cgi-bin/sse.dll?indiv=1&MS_AdvCB=1&db=1940usfedcen&rank=1&new=1&MSAV=2&msT=1&gss=angs-

Thank you for your consideration!
Nathan

Dear Nathan,

Sorry. The Insider does not republish other people’s blog posts.

--The Insider

Monday Mailbox: Catalog Corrections

Dear Ancestry Insider,

How does one get changes made in incorrect titles to some Historical Records in the FamilySearch Card Catalog? This affects both the process of ordering useful microfilm and creating accurate source citations.

Signed,
Geolover *

Dear Geolover,

To submit catalog corrections, contact FamilySearch Support or use the Feedback function of FamilySearch.org. The feedback button is located on the right side of the website window.

The options for contacting FamilySearch support can be found at http://contact.familysearch.org. Use Live Chat, send a message, or call. I assume support@familysearch.org still works. I prefer it so I have a copy of my request in my e-mail system.

Signed,
--The Insider

Friday, July 27, 2012

Ancestry.com Laps FamilySearch in Indexing Horse Race


The FamilySearch 1940 census indexing status map for 26 July 2012
While the FamilySearch indexing map looks good,
Ancestry.com has published twice as much.
Amidst growing reports that the Ancestry.com index has large numbers of errors, Ancestry’s release of 12 additional states on Thursday vaulted their position in the horse race to twice that of FamilySearch (as of 11:00 AM, Thursday). Ancestry has published indexes for about 70% of the 1940 U.S. Census compared to FamilySearch’s 35%.

Since my last update, Ancestry has published indexes for these states: Alaska, Arkansas, Idaho, Massachusetts, Minnesota, Missouri, New Mexico, North Dakota, Oklahoma, Rhode Island, South Dakota, and Utah. During the same time period, FamilySearch did not publish any, but did finish indexing Connecticut, Illinois, Kentucky, New York, Pennsylvania, Texas, West Virginia, and Wisconsin.

In terms of number of states, Ancestry has only published six more states than FamilySearch (38 to 32, respectively). The size of their lead is a result of publishing bigger states than FamilySearch. Of the ten biggest states, Ancestry has published seven. FamilySearch has finished indexing six of the top ten states but has only published one.

How quickly FamilySearch can clear its publication backlog will decide the winner of this horse race.

Quality

Meanwhile, the question of quality looms large in users’ minds. Many users are reporting problems in the Ancestry index. I contacted Ancestry for comment and got answers to some of your questions.

“We are confident that our index, delivered in record time and optimized as it is to work with our proprietary system, provides the best and most powerful 1940 experience on the market,” said Todd Jensen. Jensen is senior director of document preservation services at Ancestry.com.

Several of you asked where Ancestry’s keying vendors are located. “We used four vendors to key the 1940 Census,” said Jensen. “Two were located in China and have been involved in Family History record transcription for many years. Another was located in Bangladesh and the fourth in the Philippines.”

While Ancestry doesn’t share details about their quality and audit methods, Jensen calls them “rigorous” and explained the process generally. If the quality tolerance is not met for a batch, it is sent back to the vendor for rework followed by another, separately sampled audit. “We can say that throughout this process we have taken every effort to ensure accuracy by holding our keying partners to high quality thresholds and implementing new and advanced quality assurance processes.”

Ancestry’s search system takes indexing errors into account. “Once batches are passed,” said Jensen, “there is extensive post production work which occurs. Index data is further augmented to maximize its chances of being ‘found’ in a search or through hints. Even names which have difficult handwriting have a chance of being found with our proprietary systems.”

Jensen acknowledged the comparisons being made between Ancestry’s and FamilySearch’s indexes. As reader AnnieB has pointed out, Randy Seaver has done one such comparison. I plan to do my own as soon as time permits. “Whilst we don’t discount such reviews,” said Jensen, “evaluation of indexes of this size is problematic with even large samples being statistically unrepresentative of overall quality.”

Jensen remembers statistics a little differently than I do. Large samples can be quite representative of overall quality. The problem with most reviews—including the one I will do—is that the samples are not random. That, not the size of the 1940 Census, makes it unwise to generalize results to the entire index.
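To illustrate the point about sample size, here is a minimal sketch using the standard normal-approximation margin of error for a proportion. It assumes a simple random sample, which is exactly what most index reviews lack. The 90% accuracy figure and the sample sizes are hypothetical, chosen only for the example.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical observed accuracy of 90% at several sample sizes.
for n in (100, 1_000, 10_000):
    moe = margin_of_error(0.9, n)
    print(f"n={n:>6}: +/- {100 * moe:.1f} percentage points")
```

Nothing in this approximation depends on the size of the index being sampled. The margin shrinks as the sample grows, but only if the sample is random.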

Still, such reviews, as well as your individual experiences, have meaning and value in their own sphere. Leave a comment and tell us what you’ve found. Is anyone having positive experiences with Ancestry’s index? Or dare I ask, negative ones with FamilySearch?

Wednesday, July 25, 2012

BYU 2012 Family History and Genealogy Conference

2012 BYU Family History and Genealogy Conference

In times of distraction we might describe ourselves as being “half there.” I confess that for the first half of the month, I was only “half here.” The posts I made during that time were written before I left for a two-week Mediterranean cruise. But I wasn’t able to escape completely. Out sailing on the Mediterranean I ran into none other than John Best, organizer of the 2012 Brigham Young University (BYU) Family History and Genealogy Conference. I am presenting at the conference and he said he was there to check on my preparations. The conference is scheduled for 31 July to 3 August 2012.

It’s truly a small world after all.

One of the keynote speakers is Rod DeGiulio, director of FamilySearch Data Operations. DeGiulio’s division “manufactures” the record collections found on FamilySearch.org, so it should be pretty interesting. Also scheduled is Ron Tanner, product manager for new.FamilySearch and the new Family Tree, slated for public release later “this year.” (Sorry; I couldn’t resist placing the quotation marks, given FamilySearch’s intention of releasing the tree to the public every year since… well, since I can remember.) Anyway, Tanner will be filling us in on Family Tree.

If you’re in the Provo area, it may be well worth your while to attend. If you are a family history consultant, you are eligible for a $25 discount on the $180 price. To learn more or to register, visit http://ce.byu.edu/cw/cwgen/.

Tuesday, July 24, 2012

Fold3 Discount for NGS Members

The National Genealogical Society (NGS) recently announced a discount to its members on the purchase of a Fold3 membership. Fold3 is a subscription website owned by Ancestry.com. According to the announcement, “Fold3 features over 94 million historical records from US institutions including the National Archives. Military records, naturalization records, and city directories are just a few of the different collections found on Fold3.”

NGS members can subscribe for the special price of $39.95, which is half the normal price. Additionally, Ancestry.com will donate 30% of the sale back to NGS.

To obtain the discount, members should go to http://www.ngsgenealogy.org/cs/fold3 after logging into the www.ngsgenealogy.org website.

Read the full text of the announcement on the NGS website.

Monday, July 23, 2012

Monday Mailbox: Ancestry.com Indexing Inaccuracy

Dear Ancestry Insider,

I hope that the accuracy rate for the FamilySearch 1940 Census Indexing is better than what I have experienced with Ancestry.com.

Today I looked up about 20 families in Michigan and found errors on five of them, including wrong places. One was listed in Briley, Montmorency Co, Michigan and should have been Evergreen Twp, Montcalm Co, Michigan. Another had Pine Twp, Montcalm Co and should have been Day Twp. They had the name Meek for Mark and Lealia for a man named Leslie and Shurn instead of Shurr. All were clearly legible in their correct form.

Happy they are being speedy, but I'd rather have more time taken to proofread the transcriptions...

Cherie
Orange County, California

Dear Cherie,

Your experience echoes that of many. See, for example, the comments on my article from last Thursday.

Signed,
--The Insider

Friday, July 20, 2012

Kehrer Webinar: Future of the New Catalog

I’m sharing some of my notes from the 21 June 2012 webinar by Robert Kehrer, senior product manager, search technologies. The webinar was titled “FamilySearch Historical Records and Library Catalog.”


Robert Kehrer presented a glimpse of future catalog features.


(Genealogists have a litmus test that we use to determine whether a product manager understands the old catalog. Someone who really understands the old catalog knows the importance of showing, for a place, what place it is within and what places are within it. From the “Upcoming Features” slides, I can see that Kehrer understands the old catalog.)

When asked the time frame for these new features, he said he can’t really give a date. (These things are very dependent on competing features, competing products, backlogs, and bug fixes. There are lots of good reasons why a product manager cannot give an expected release date.)

 

This concludes my series on Kehrer’s webinar. For more information, see the entire webinar.

Thursday, July 19, 2012

Ancestry.com Takes Lead in 1940 Census Race

Ancestry.com 1940 Indexing Status Map for 16 July 2012

For Ancestry.com, Friday the 13th was a lucky day. With the release on that day of indexes for 15 additional states, Ancestry took the lead in the 1940 U.S. Census horse race.

“We are working hard here at Ancestry.com to bring the 1940 US Census to the public,” said Ancestry spokesperson, Matthew Deighton, “and are now very well ahead of schedule from our initial completion predictions.”

Ancestry has published indexes for 26 states, accounting for 2.1 million pages, or 55% of the entire census.

Since my last update, and as of 18 July 2012, the FamilySearch coalition released the District of Columbia (Washington, DC), Minnesota, and Rhode Island. This brings their state count to 32 and their page count to 1.3 million. That is 35%, 20 points behind Ancestry.

MyHeritage is still working on their second state, New York. According to calculations based on record count, they have published just 7% of the census.

Ancestry’s move into the lead is unexpected considering FamilySearch’s commanding lead in indexing, where it has finished 3.4 million pages. One FamilySearch manager used a vending machine analogy to explain how FamilySearch can be so far ahead in indexing but behind in publication: You can have enough coins to buy a Coke, but if you divide them among multiple vending machines, you’ll end up thirsty.

FamilySearch indexers have indexed considerable portions of states that aren’t yet 100% complete. That work is reflected in indexing totals but won’t be reflected in publication totals until the states are complete.

This raises the question as to whether or not Ancestry can retain its lead. Have they increased their indexing capacity? Or is this a temporary situation that the FamilySearch juggernaut will overwhelm?

Stay tuned…

Wednesday, July 18, 2012

Kehrer Webinar: New FamilySearch Catalog

I’m sharing some of my notes from the 21 June 2012 webinar by Robert Kehrer, senior product manager, search technologies. The webinar was titled “FamilySearch Historical Records and Library Catalog.” Look for my comments in parentheses.


The holdings listed in the Family History Library Catalog (FHLC) are expanding beyond those included at the Family History Library in Salt Lake City. So it is only natural that the new catalog is called the FamilySearch Catalog.

The new catalog is still called a beta. There are key features missing. “If you don’t need those missing features,” said Robert Kehrer, “then I highly recommend you use the new catalog.” The data is the same between the old and the new. And some things are better in the new catalog.

FamilySearch has tried to optimize the number of pages you must pass through to get what you want. (I care less about the number of pages and more about the number of mouse clicks.) Kehrer demonstrated the page savings. What used to take five or more pages is now done in two. (Here are my measurements, including clicks and keystrokes.)

Action | Old Catalog | New Catalog
Select search type (Place name) | 1 click, 1 page | 2 clicks, 0 pages
Select location (Texas) | 5 keystrokes (depending), 2 clicks, 2 pages | 3 keystrokes (depending), 2 clicks, 1 page
Select record type (Vital records) | 1 click (minimum), 1 page (minimum) | 1 click (minimum), 0 pages
Select title (Texas death records) | 1 click (minimum), 1 page (minimum) | 1 click (minimum), 1 page
TOTAL | 5 keystrokes (depending), 5 clicks (minimum), 5 pages (minimum) | 3 keystrokes (depending), 6 clicks (minimum), 2 pages

What used to take five (or more) pages now is done in two. It is not necessary to type in the entire place name. As you type, a drop down list presents matching names. As soon as your desired place shows up, click on it.

(The old catalog has a slight edge on the minimum number of mouse clicks. However, the actual number of clicks is highly dependent on the amount of scrolling needed through long lists of subjects and titles. What the new catalog gains in reduced page count it loses in the long length of the pages.)

View the webinar to see a demonstration of the new catalog, or to get more information about all the features I’ve mentioned in this series of articles.

Tuesday, July 17, 2012

Kehrer Webinar: Refine Your Search Results

I’m sharing some of my notes from the 21 June 2012 webinar by Robert Kehrer, senior product manager, search technologies. The webinar was titled “FamilySearch Historical Records and Library Catalog.”


When last we left you, Robert Kehrer had just done a parent search for father Samuel Martin and mother Lovina in Vermont.

Along the edge of the window, to the left of the search results, is the Refine Form. Here you can refine your search terms.

“Below the Refine Form we have some filters which are tremendously powerful,” said Kehrer.

One feature offered by filters is result counts. Clicking on a filter name reveals a list that indicates result counts. For example, after the parent search above, click on Collections. The little “fly out” square indicates there are 33 results from collections that record births, marriages, and deaths. It also indicates there are 11 results in census collections.

Clicking on Births, Marriages and Deaths immediately filters the results to those of that collection type.

Clicking on the resulting collection type activates another fly out, which can be used to further filter the results. For collection types, individual collections are listed. In the example above, Kehrer noticed that in addition to eleven children in Vermont, there are an additional two in Wisconsin.

Filtering can be performed on a variety of things like birth, marriage, and death information, as shown in the image to the left.

 

Stay tuned…

Thursday, July 12, 2012

Kehrer Webinar: Parent Search

I’m sharing some of my notes from the 21 June 2012 webinar by Robert Kehrer, senior product manager, search technologies. The webinar was titled “FamilySearch Historical Records and Library Catalog.”


Robert Kehrer said, “One thing that people often ask is, ‘How do I do a parent search?’” With a parent search, you enter the names of the parents to get search results with all the children.

To conduct a parent search, start by clicking the word “Parents” on the home page.

To perform a parent search, click the word Parents

This opens an area on the search form where you can enter the parents’ names.

Enter parents’ names to perform a parent search

Kehrer searched for father Samuel Martin and mother Lovina in Vermont. The results return their children, with birth dates spaced as expected.

Stay tuned…

Tuesday, July 10, 2012

Kehrer Webinar: Wildcards and Exact Search

I’m sharing some of my notes from the 21 June 2012 webinar by Robert Kehrer, senior product manager, search technologies. The webinar was titled “FamilySearch Historical Records and Library Catalog.”


Robert Kehrer started his search demonstration by searching for his great grandfather, Franklin Bernard Allor. In the live presentation he demonstrated use of a wildcard. For the first name he typed in “Frank*”. Notice the asterisk on the end. This will match Frank, Franklin, Frankie, and so forth.
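As a rough illustration of how a trailing asterisk behaves, here is a short sketch using Python’s shell-style pattern matching. This is not FamilySearch’s search system, only an analogy for the wildcard idea. The names “Francis” and “Fran” are added here as made-up non-matching examples.

```python
from fnmatch import fnmatchcase

# Candidate first names; the last two are hypothetical non-matches.
names = ["Frank", "Franklin", "Frankie", "Francis", "Fran"]

# The asterisk stands for any run of characters, including none at all.
pattern = "Frank*"

matches = [name for name in names if fnmatchcase(name, pattern)]
print(matches)
```

The asterisk can match zero characters, which is why plain “Frank” is included along with the longer names.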

FamilySearch’s search system also employs a name matching system, matching Allor to equivalent names like Euellar, Ellare, Ehelhar, and so forth.

Kehrer demonstrated how to turn this matching off. The box next to the name turns on “exact matching.” With the box checked, “Allor” is the only name that will match. Searching the 1910 census for “Franklin Bernard Allor,” without the box checked, returns 46 results. With the exact match box checked, searching returns four results and the top result is Kehrer’s ancestor.

The box next to a name turns on Exact Search

Stay tuned…

Friday, July 6, 2012

Mr. Darnedest Street

Records say the darnedest things

We depend upon records to reveal the “truth” about our pasts.

Yet sometimes records have anomalies. Some are amusing or humorous. Some are interesting or weird. Some are peculiar or suspicious. Some are infuriating, even downright laughable.

Yes, “Records Say the Darnedest Things.”

Records Say the Darnedest Things: Mr. Darnedest Street

It is exciting to see indexes to the 1940 census coming online. It is also funny to see therein the names that uncaring parents give or unlucky brides inherit. Elmer Zink shared one on the FamilySearch blog that some may find hilarious and some may find horrific.

Mind you, each record is indexed twice. If the two indexers differ, then the record is sent to a third person, an arbiter, who can change any value in the batch to anything he chooses.

Two bad indexers or one bad arbiter can do a lot of damage.

Observe:

image

I’m actually 2nd cousins with the Streets, so I know a little about this crazy family. He was named Here at the hospital on Main Street, after his four grandparents. His Main line ends End, but not Here. Got that? A dead End left his grandmother a widow. To hear Here, you’d think Here shares his name here with the numerically named 500 great-grandparents on his mother’s side. They were poor and lived on the Streets. In dire Streets, the Ends justified the Mains.

Don’t even ask who was on second…

Thursday, July 5, 2012

Kehrer Webinar: International Genealogical Index

I’m sharing some of my notes from the 21 June 2012 webinar by Robert Kehrer, senior product manager, search technologies. The webinar was titled “FamilySearch Historical Records and Library Catalog.” Look for my commentary in parentheses.


The International Genealogical Index (IGI) mixes two types of records: community indexed records and community contributed records (formerly called patron submissions). There are about 461 million names in the former and 205 million names in the latter.

On the new website (which I call “the current website” to avoid confusion with the “new FamilySearch” website—boy won’t we be glad when new.familysearch.org goes away) on the new website the two types of records are separated. (I’m very glad to see that. The evidentiary value of the two types is too different to leave together in one database.)

The community indexed records, also called extracted records, have been divided into 185 historical record collections and are available alongside the 1005 other historical record collections on FamilySearch.org.

For compatibility with the old website, it is possible to search just the 185 extracted IGI collections. In the record collection list, scroll down to the special “International Genealogical Index” collection.

Kehrer questioned why anyone would want to search just the 185 collections instead of all 1190 collections. “I recommend strongly that anyone searching for their ancestor search not just the IGI,” he said. Search all historical record collections from the FamilySearch.org home page.

In the transition to the new website (no animals were harmed and) no records were “lost in transition.” (Actually, the community contributed records have been temporarily lost.) The community contributed records will be available shortly.

From the special IGI collection page, it will soon be possible to search either the extracted IGI records or the community contributed IGI records.

Stay tuned for more from Robert Kehrer’s webinar…

Wednesday, July 4, 2012

Census Indexing Horse Race Update for 30 June 2012

The big news this week is that Ancestry.com has released six new states!

Pennsylvania 9,900,180 records
Ohio 6,907,612 records
Tennessee 2,915,841 records
Virginia 2,677,773 records
Colorado 1,123,296 records
Vermont 359,231 records

“These states will join four other searchable states and Washington, D.C.,” said Ancestry.com spokesperson, Matthew Deighton. The previously searchable states are Delaware, Maine, Nevada, and (the biggy) New York. These eleven make up more than 39 million records of the 132 million total records.

Ancestry.com 1940 Indexing Status Map for 30 June 2012

According to Deighton, Ancestry.com will finish the project this year and the records will remain free through 2013 on Ancestry.com.

Tuesday, July 3, 2012

Features of the FamilySearch Collection List

This week I’m sharing some of my notes from the 21 June 2012 webinar by Robert Kehrer, senior product manager, search technologies. The webinar was titled “FamilySearch Historical Records and Library Catalog.”

I’ve diagrammed some of the features Robert Kehrer showed on the collection list page.

Features of the FamilySearch Collection List Page

Stay tuned…

Monday, July 2, 2012

Monday Mailbox: Fantastic Webinar

Dear Ancestry Insider,

A fantastic webinar a couple of weeks ago revealed this gem: both old and new catalogs now access the same database, and thus there is no difference in frequency of update.

The webinar was made available live via the FamilySearch Blog and has now been made available for viewing at your pleasure at http://broadcast.lds.org/eLearning/fhd/Community/en/FamilySearch/Product_Webinars/Robert_Kehrer_-_June_21_2012/Player.html. In it Robert Kehrer also gave some good information on some in-process changes at FamilySearch! It's a "must watch" for all serious users of FamilySearch; well worth the hour it takes. I highly recommend it!

And perhaps, if you are one of those folks who have been continuing to use the old catalog for all your work, it will motivate you to give the new catalog another look.

Signed,
Mike St. Clair*

Dear Mike,

I too enjoyed the webinar very much. Robert Kehrer is one of my favorite product managers. I echo your endorsement. Actually, he’s redone the webinar. (He showed his own tree, with information about his immediate family, in the original broadcast. I recommended he avoid that.)

For those that can’t budget an hour, this week I’ll share with you some of my notes. Here’s the first snippet:

Historical Record Collections Update

FamilySearch now has

  • 1190 collections
  • 800 million images
  • 1.3 billion records
  • 2.875 billion names

Stay tuned…

 

Remember: Letters to the Insider are edited for length, style, and content.