Tuesday, May 20, 2008

Are Incomplete Databases Ancestry's Policy?

Ancestry Insider reader, Mike, posted a question recently and I thought you might all like to hear the answer.

I have a question about a comment you made where you said, "Ancestry's practice is to release at least one database every workday". Obviously that is a marketing driven thing, but my question is how significant it really is. That is, is a brand new database released every day, or do "updated" databases count too? And if the answer to that question is yes, then as a marketing scheme, is it Ancestry's policy to intentionally release incomplete databases so as to be able to tout updates?

Don't misunderstand me.

While it's Ancestry's practice to release a new database every day, it's not a policy, to my knowledge. I've never seen Ancestry claim they do so. I wouldn't be shocked if they've missed one or more days. I wouldn't be surprised if there are high-ranking managers who don't know that it occurs. It just happens. Call it corporate memory. Call it bureaucracy. Call it a legacy, maybe even a tribute, to Paul Allen. It survives as a practice of a former policy.

The practice, as I've observed it, is to release a new database each business day. (Take a look at my list for the last 60 days and let me know if I'm wrong.) When you release thousands of new databases each year, it's not difficult to schedule 250 of them to cover each business day of the year.

Why Does Ancestry Release Incomplete Databases?

It is almost always more expensive when Ancestry posts a database piecemeal. So if Ancestry doesn't need to release incomplete databases so it can tout updates, why update databases or release incomplete databases? There are several situations where databases are updated or incomplete databases are released:

  • Additional records regularly become available. This is the case for the SSDI and can happen for vital records where states annually release records of a legislated age.
  • Additional records become available from the original data source. For example, a national archive microfilms additional records in a series. I believe last week's update to California Passenger and Crew Lists, 1893-1957 is an example of this happening.
  • An important database is so large that it will take weeks, months or even years to complete. A U.S. Federal census is an example. The 30-April update of the U.S. School Yearbooks is another.
  • Portions of a database are coming from different sources. In the case of state censuses, this might happen when different years and counties are coming from individual counties, multiple university libraries or private vs. public historical organizations.
  • Source media for a database are entering Ancestry's digital factory at widely spaced times. This might happen when Ancestry places a large microfilm order that overwhelms an institution's capacity to speedily duplicate all the films ordered. This can also happen when problems in Ancestry's production process cause part of a job to be sent back for rework at an earlier factory stage.
  • Ancestry feels that posting the images for a database before the creation of an index gives the customer enough value to warrant the extra costs. The Canadian Drouin Collection is an example where this occurred.
  • Ancestry is performing maintenance (fixing problems) in a database. I'm guessing that was the situation with the 30-April update of the 1851 and 1871 England Censuses.
  • Ancestry is combining two or more closely related databases. If I recall correctly, an example is the Florida Marriage Collection, 1822-1875 and 1927-2001, which is a combination of a database for 1822-1875 and a database for 1927-2001.

Databases can remain incomplete when:

  • Historical records have been lost. If you scroll down to the bottom of the 1790 Census database page and select "Click Here", you'll see that the returns for two states have been lost! Many censuses taken by individual states have been lost as well.
  • Agreements can't be reached with some of the record custodians.
  • Production costs are too great for a portion of a record set. For example, some of the records might be index cards that can be scanned with auto-feed scanners while another part is a solid clump of water-damaged, irregularly sized manuscripts.

As you can see, there are many reasons why Ancestry releases incomplete databases or updates databases. Rest assured that tricking you is not one of them.

3 comments:

  1. Insider,

    Thanks again for a thorough response. However I still have a couple bones to pick :). While I understand the difficulty with some sub-sets of various record groups, as in more fragile ones that don't lend themselves to auto page turners, and thus necessitating hand turning and focusing for every page, I would hope that Ancestry would nonetheless make the effort later when such sub-sets are a small part of the overall database or class of records.

    Also, if that cannot be done for whatever reason, then I think Ancestry should state that in the info on the database. As an example I mentioned in an earlier post, Sullivan County, Tennessee's marriage records are not included in the TN marriage database. If there is either no intention for financial reasons, or an unlikely-to-change difficulty like inability to reach an agreement with records custodians for same, then stating that explicitly and the general reason would be helpful. And if there is an intention to later add such a sub-set, then stating that would be helpful as well.

    While I recognize that Ancestry and other providers need to keep on good terms with records custodians, I also believe it needs to be made publicly known which ones are refusing to cooperate with digitization efforts, whether out of some desire to promote "genealogical tourism" or whatever. That is the only way the genealogical community can work to effect a change in those policies.

    Back to incomplete databases, I would again say that Ancestry should explicitly note what is not included and why, and whether there is an intention or ability to complete the database in the future where the records are extant and a non-dynamic dataset is involved (which is obviously not the case with the SSDI and such which continue to grow).

    For another example, I posted to the Ancestry Comments message board last year asking questions about the Civil War Research database:
    http://boards.ancestry.com/topics.ancestry.ancsite/9421/mb.ashx

    As I note there, Ancestry apparently obtained those records from another commercial provider, but the provider's own website seems to indicate, via a comparison of record totals, that Ancestry only obtained about half of what is available. I got no response from Ancestry there to my questions about same or about original source citations for listings.

    Again, thank you for taking the time to listen and respond.

    Mike

  2. Insider,

    One other comment on something you said about offering up unindexed records. You seem to indicate that is more expensive for the company than waiting for indexing to be completed. I am surprised at that, although no doubt because I don't understand the process. I would think that a table of links to such static images would just involve a couple other columns in the table for indexing and pointers to the images which could be added later. Unless of course the preferred arrangement of images is based partly on the index.

    And FWIW, I would welcome more records being offered unindexed if it meant much quicker access to same for customers. It would be no different than searching unindexed original records which any competent amateur genealogist is eventually going to have to do for many lines if brick walls are to be overcome. While many customers may not be interested in paging through without an index, many of us would be.

    Mike

  3. I agree with the lady regarding the incomplete data. The number one and two databases that come to my mind are the Tennessee Marriage Records and the Missouri Birth Records. The Tennessee Marriage Records are missing a lot more than Sullivan County. As best I can tell, they are missing the largest county, Shelby; the second largest county, Davidson; and the third largest county, Knox; as well as Sevier County and maybe some more. The Missouri Birth Records are very incomplete, and an explanation with both of these databases would be very helpful so we would know if these records will ever be added or if the database is as complete as it is going to be.
