Monday, November 16, 2015

Monday Mailbox: What is a Record?

Dear Ancestry Insider,

When Ancestry or FamilySearch says they added "100 million new 'records'" what are they really describing?

As an example, say you have one census sheet. It has six households comprising twenty-four rows of names, with eight columns of personal background per row, for a total of one hundred ninety-two cells of raw data.

So does that census sheet represent 1, 6, 24, or 192 "records" according to Ancestry and FamilySearch?

Signed,
amiable 160

Dear amiable 160,

You’ve hit upon an amazingly complex question and a particularly confusing case.

When FamilySearch first published its U.S. census collections, the published record counts (found on https://familysearch.org/search/collection/list) were quite a bit smaller than its competitors’. That raised questions: had they published the entire census, were they still indexing it, or had they accidentally missed some of the microfilms? A comparison showed that the counts were similar to those of the old Ancestry.com censuses from way back when Ancestry published just the names of the heads of households. So had FamilySearch published just the heads of households? I helped index for FamilySearch, so I was very much aware that we had indexed all the names.

Apparently, FamilySearch defined a census record as a household, while Ancestry defined it as a single row.

Well, overnight the FamilySearch numbers all jumped into the same neighborhood as Ancestry’s. So for your example, the answer is 24 (if every row is used). As a side question, why are the record counts on the two sites not exactly the same? How does a website go about losing persons? I can only imagine that both organizations do it. If so, some records on each site are not present on the other.
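To make the competing definitions concrete, here is a minimal Python sketch that counts your census sheet under each candidate meaning of “record.” The `CensusSheet` class, the `count_records` function, and the household sizes (5, 3, 6, 2, 4, 4, chosen only so they total 24 rows) are hypothetical illustrations, not how Ancestry or FamilySearch actually store or count their data.

```python
from dataclasses import dataclass


@dataclass
class CensusSheet:
    """Hypothetical model of one census page: households made of person rows."""
    households: list[list[str]]   # each inner list holds one household's rows (names)
    columns_per_row: int = 8      # columns of personal background per row


def count_records(sheet: CensusSheet, unit: str) -> int:
    """Count 'records' on one sheet under a chosen definition of a record."""
    if unit == "sheet":           # the whole page (one image) counts as one record
        return 1
    if unit == "household":       # one record per household
        return len(sheet.households)
    rows = sum(len(household) for household in sheet.households)
    if unit == "row":             # one record per person row
        return rows
    if unit == "cell":            # every cell of raw data (rows x columns)
        return rows * sheet.columns_per_row
    raise ValueError(f"unknown unit: {unit!r}")


# Hypothetical household sizes, chosen only so they total 24 rows as in the letter.
sizes = [5, 3, 6, 2, 4, 4]
sheet = CensusSheet(
    households=[[f"person {h}.{i}" for i in range(n)] for h, n in enumerate(sizes)]
)

for unit in ("sheet", "household", "row", "cell"):
    print(unit, count_records(sheet, unit))
```

Run as-is, it prints 1, 6, 24, and 192: one sheet, four different totals, depending only on which unit a site decides to call a record.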

Another counting anomaly practiced by FamilySearch: when they announce the total size of their collection, the number is about two billion larger than the sum of all the record counts on the collection list. You have to listen very closely to what they say to understand the discrepancy. When they want the size of their collection to sound big, they announce the number of names, not the number of records.

I’ve spoken before about my dislike for name counts. (See “Unbelievable Name Count Claims” and “Name counts in table-style databases” for examples.) Name counts can miscommunicate in so many ways. After several of my editorials against Ancestry’s use of name counts, Ancestry stopped using them prior to going public. Bravo, Ancestry.

Someone recently noticed that Ancestry’s record counts for its “Select” series of databases, obtained from FamilySearch, are quite a bit higher than the record counts reported by FamilySearch. Presumably, Ancestry has reverted to name counts for these databases. Or perhaps Ancestry or FamilySearch created a database record for each indexed name.

That brings me back to your question. What is a record?

I’ve explained before how Ancestry defines them. It varies by collection type. See “What is a record?” for details.

For FamilySearch, you must read the announcement wording carefully. The most recent example is “New FamilySearch Collections Update: November 9, 2015.” It has a column titled “indexed records” and one titled “digital records.” I believe the latter actually counts images of genealogical records; early announcements labeled that column as images. (See an example on the FamilySearch website.) An image is a digital record, but it isn’t a record of a single genealogical event. Sometimes there are multiple records on one image (such as two marriage licenses on a page).

I don’t know for certain what “indexed records” on that announcement means, but I have a theory. FamilySearch’s announcement for the week of 13 July 2015 (see “New FamilySearch Collections: Week of July 13, 2015” on Dick Eastman’s blog) stated that they added 47 million indexed records to the “United States GenealogyBank Obituaries 1980-2014” collection. Yet the collection list states that the collection has just 16 million records. I don’t have internal knowledge explaining the discrepancy, but I think it lies in the odd way that obituaries are indexed. When you index an obituary, you index each name in a separate row, which I think results in a separate database record. My theory is that the number on the announcement is just what it says it is, “indexed records,” while the number on the collection list is what you would expect it to be: the number of obituaries.
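Here is a minimal Python sketch of that theory, assuming (as the theory does) that each name indexed from an obituary becomes its own database row. The obituary IDs and name roles below are made up for illustration. Counting rows gives the “indexed records” figure; counting distinct obituaries gives the collection-list figure. If the theory holds, 47 million rows over 16 million obituaries would work out to roughly three indexed names per obituary.

```python
# Hypothetical index rows, one per indexed name: (obituary_id, role_of_name).
# Assumption (the theory above): every name in an obituary is indexed as its own row.
index_rows = [
    ("obit-1", "deceased"),
    ("obit-1", "spouse"),
    ("obit-1", "child"),
    ("obit-2", "deceased"),
    ("obit-2", "parent"),
    ("obit-3", "deceased"),
]

# Counting every row: what an announcement of "indexed records" might report.
indexed_records = len(index_rows)

# Counting distinct obituaries: what the collection list might report.
obituaries = len({obit_id for obit_id, _ in index_rows})

print("indexed records:", indexed_records)
print("obituaries:", obituaries)
print("names per obituary:", indexed_records / obituaries)
```

With these made-up rows it prints 6 indexed records, 3 obituaries, and 2.0 names per obituary: the same source material, reported as two different totals.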

Short question. Long answer.

Signed,
--tai

2 comments:

  1. My immediate thought on this is, "Who cares what the actual number is?" Of course the genealogical web sites care, because people may compare the numbers from different sites and conclude that the one with the greatest number is the best, or at least the most complete. But as you point out in your article, the way the counts are made is not consistent from site to site or even among different types of records on a single site. Thus comparisons are not really a reliable indicator of anything of significance. When I see those numbers, they indicate to me only that there are lots of records available. I don't really care if a census image is counted as 1 record or 24 records or any other number. What I care about is that the information from the census is available for research.

  2. Bravo! This is important. I want to spend my time researching databases where I might find my ancestors. I want to spend my dollars where it pays off. Thanks for exposing the marketing strategies we all suspected were behind the numbers. Can't blame 'em for tryin'.
