Sunday, February 3, 2008

Unbelievable Name Count Claims

What are we to understand by genealogy vendors' name count claims? When a vendor claims "872,278,874 Names in 5,389 Databases," aren't we led to believe these are counts of people's names? But the same vendor claims 337,484 names in Lippincott's Gazetteer of the World, 1895. This is a gazetteer! Yeah, yeah; many places are named after people. But the database information notes there are only 125,000 place names on 2,895 pages. The claim of 337,484 names amounts to, on average, 2.7 names per place and 117 names per page! By contrast, Ancestry claims 2,112 names in Lippincott’s Gazetteer of the World, 1913. A little experimentation shows the book has grown to 2,115 pages, which means Ancestry claims about 1 name per page.
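
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The figures are the ones quoted above; the variable names are just labels chosen for this illustration.

```python
# Figures quoted in the post for the 1895 edition of the gazetteer.
claimed_names_1895 = 337_484
place_names_1895 = 125_000
pages_1895 = 2_895

names_per_place = claimed_names_1895 / place_names_1895
names_per_page = claimed_names_1895 / pages_1895

print(f"1895: {names_per_place:.1f} names per place")  # 2.7
print(f"1895: {names_per_page:.0f} names per page")    # 117

# Figures quoted in the post for the 1913 edition.
claimed_names_1913 = 2_112
pages_1913 = 2_115

names_per_page_1913 = claimed_names_1913 / pages_1913
print(f"1913: {names_per_page_1913:.2f} names per page")  # 1.00
```

The two per-page rates differ by a factor of more than a hundred for what is essentially the same kind of book.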

Page 1000 from Lippincott, 1913

I picked a page from the 1913 edition at random to examine. Page 1000 is about halfway through the book. I pulled it up and started looking for names. I ignored people-place names such as Baltimore, St. Louis and Clay County. The only name I found was Albus Dumbledore. Oops, Albertus Magnus. A sample of one is hardly scientific, but I find the claim of 117 names per page in a gazetteer quite incredible.

I don't think this vendor is all alone. I know Ancestry has some isolated problems as well. I call upon genealogy vendors to provide transparency with published name counts. Consumers have a right to know when counts are exact and when they are estimates. Vendors should disclose basic definitions and methodologies. Only transparency will provide consumers the information necessary to make intelligent purchase decisions with their limited funds in an increasingly competitive market.


  1. This is a good point. I found similar exaggerated claims for the number of pages available on a digitized newspaper website.

  2. The real underlying issue here is whether such name counts, even if fairly accurate, are even useful to evaluating the offerings of a provider, and if so to what degree.

    World Connect trees on Rootsweb, newspaper databases, and any databases that contain lots of records for the same year or a close range of years like a consecutive run of city directories, are simply going to be bloated with repetitious entries.

    I myself would rather evaluate the likely value of a product I am contemplating purchasing/subscribing to, on the basis of coverage of geographical areas I am interested in.

    The latter issue, extent of geographical coverage, is perhaps something else you could speak to in the future. It seems the strategy of Ancestry and other providers is to have something for every state, instead of concentrating on providing a lot of content on a smaller group of states before moving on to others. But I wonder if it wouldn't be more effective marketing-wise to first really provide some in-depth coverage of a few states at a time, and really be able to target potential customers in those areas. I realize that a lot depends on the availability of records from repositories, but what good is it to us customers if a newspaper database covers one city in every state, but not the city in my state I need?


  3. But I wonder if it wouldn't be more effective marketing-wise to first really provide some in-depth coverage of a few states at a time, and really be able to target potential customers in those areas.

    Certainly not. I believe that, both marketing- and business-wise, reaching for the whole country is a better way to do it, even if records for some locations are sporadic. The customer base is simply larger from "day one".

  4. Radix,

    The question isn't one of whether you reach for the whole country, but the order in which you do it. While sporadic coverage might be a neat marketing trick to have a larger customer base from day one (or today after so many years), that only works if customers either have an interest that the sporadic coverage fulfills, or if they aren't astute enough to figure out that the coverage is only sporadic.

    There is also another sub-issue which is important: some states are more important than others. The original colonies plus Kentucky and Tennessee are where most lines will reach back to if your ancestors weren't late immigrants. And then there would be a couple more tiers below.

    I am of course assuming that customers are back far enough for that to be important. If they are still stuck in the 1900s in most of their lines, then what I said wouldn't be as relevant.


  5. Mike,

    I think your point is especially relevant to newspaper coverage. A case in point is the Utah newspaper, the Salt Lake Tribune. From a business and marketing perspective, it gives the impression of fair coverage for the state. Closer examination shows that Ancestry's collection consists of 2 issues. It's little wonder that Joe Public is left with little trust in Corporate America.

    -- The Ancestry Insider

  6. Dear jdr,

    Thanks for your comment.

    -- The A.I.

  7. Oh, I forgot to tell Mike and Radix that lately Ancestry has approached the topic of coverage by addressing record types. For example, last year you saw lots of military records, state censuses, maps and African-American records. This year the company intends to concentrate on vital records while it continues to expand its immigration and military collections.

    -- The Ancestry Insider

  8. Insider,

    Thanks for your responses. Regarding Ancestry's strategy of concentrating each year on selected record groups, such as on national or statewide record groups that will appeal to the broadest customer base, that also makes sense. But that is only so IMO if those additional databases are complete for a given timeframe.

    Regarding military, for example, I asked a question in a message board thread about the apparent discrepancy between the depth of the Civil War databases provided by Ancestry from a 3rd party provider, whose own website seems to indicate that Ancestry only has gotten half of what is available. Maybe there is some explanation in this particular case, but naturally, in keeping with Ancestry's tacit policy of refusing to answer such questions on a forum dedicated to such, I never got a reply.

    Regarding that survey, I have participated in that one and a more recent one. One thing I mentioned in my suggestions in one of them was that a good, and I would think moderately inexpensive, class of databases that Ancestry could offer and which would be hugely helpful, is indexes to land and other such records at the county level. A lot of the time there are already such on microfilm for the earliest years if not to the present which could be used instead of making an original index. Although you wouldn't be able to get the actual record itself, just knowing that such existed could be a *huge* help when you don't know all the places an ancestor formerly lived. At least you would know that you could get such a record in a county courthouse yourself.

    This brings to mind a larger issue which perhaps you could address in a future blog entry, which is the response to Family Search's request last year for commercial firms to give them proposals on serving up digitized records. I myself am somewhat dubious that a commercial entity can serve up monster amounts of county level records like complete deed books in an affordable manner. However, perhaps that isn't so. So I was wondering if the commercial firms like Ancestry and others are in fact interested in serving up a broad array of such records online, or only in cherry-picking the ones most likely to appeal to the broadest base of customers.


  9. Insider and all,

    I agree it's a more competitive field these days. We see more and more alliances, agreements, etc. between those who have content and those who wish to digitize, index, publish, distribute, etc., in both free and commercial arenas. There's room for all and I welcome having options. You're so right though, that it raises the question: how do we know who really has what so we can spend both time and dollars most effectively? (Case in point: your name counts.)

    Re: strategies (record types, etc)?

    Personally, I'd like to know what's planned, and then whether it's going to be finished or not. A good example is the state census records you mentioned. I kept checking, but the NY state census collection hasn't moved in a long time now. Will it be?

    General comment - whether Ancestry, FamilySearch, WorldVitalRecords, Footnote, etc., they all tell us what exists to one degree or another. As new things are added, I'd like to see lists with percent finished or some such. If they don't intend to do a full "set", say that too. Say xxxx records in yyyy location for years zzzz. We might agree or disagree on strategy directions, but most of all, I want to clearly understand what's coming and not keep checking and wondering. :)

    Enjoy your blog. Glad you're here. ;)


  10. You make a great point, Ancestry Insider, and it's one that we can and will address.

    Of the large number of databases that we put online in the last year, some have an exact name count, because we have the full name index. For others, like newspapers, we calculate the name count based on the names-per-page in a sample dataset. But clearly, we goofed by using a name-per-page estimate with the Lippincott Gazetteer.

    With more than 5,000 databases online and thousands more coming, we haven't had the time or manpower to be perfectly accurate in our name estimations or in our database descriptions.

    Fortunately, we are having a major upgrade tomorrow in our company's genealogical and editorial expertise when Matt Wright, senior book editor at Ancestry for 8 1/2 years, joins World Vital Records. He will, in fact, take responsibility for our database descriptions, name counts, and all of our other editorial content.

    Since you pointed out the silliness of our gazetteer name count, we will address this immediately.

    Also, with our World Collection, which launched yesterday, we have done what one of your readers suggested, and are making it clear what percentage of each database is actually online, since many projects happen in stages. So you can see, for example, that we are currently 12% complete in putting the GPC content online and 33% of the Archive CD Books Australia is online.

  11. I forgot to sign my name on the above comment.

    Paul Allen
    CEO, /

  12. Mike, Deb and Paul,

    Thank you for your excellent comments. They deserve a longer response than I have time for at the moment. I'll try and get to them soon. Paul, the fact that you responded, and did so personally, speaks well of your understanding of the digital marketplace. Again, thank you all.

    -- The Ancestry Insider
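
The names-per-page estimate described in comment 10 can be sketched roughly as follows. This is only an illustration under stated assumptions, not any vendor's actual method: the function name `estimate_name_count` is made up for this example, and apart from the single name found on page 1000, the sample counts are hypothetical placeholders, not measurements.

```python
import statistics


def estimate_name_count(sample_counts, total_pages):
    """Extrapolate a total name count from names counted on sampled pages.

    Returns (estimate, margin), where margin is a rough 95% margin of error
    based on the normal approximation. With a single sampled page the
    margin is infinite, since one page says nothing about page-to-page
    variation.
    """
    mean_per_page = statistics.mean(sample_counts)
    estimate = mean_per_page * total_pages
    if len(sample_counts) > 1:
        std_error = statistics.stdev(sample_counts) / len(sample_counts) ** 0.5
        margin = 1.96 * std_error * total_pages
    else:
        margin = float("inf")
    return estimate, margin


# Hypothetical per-page counts; only the first value (page 1000, one name)
# comes from the post. The rest are invented for illustration.
sample_counts = [1, 0, 2, 0, 1]
total_pages = 2_115  # page count quoted for the 1913 edition

estimate, margin = estimate_name_count(sample_counts, total_pages)
print(f"Estimated names: {estimate:.0f} (+/- {margin:.0f})")
```

A sketch like this also shows why the choice of sample matters: a per-page rate taken from a name-dense source such as a newspaper will badly overstate the count when applied to a gazetteer.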

