Wednesday, June 16, 2010

Ancestry.com Missing Documents

Staff at the National Archives and Records Administration (NARA) recently responded to accusations that Ancestry.com posts NARA record collections that are missing documents.

Earlier this year at the Ancestry.com Annual Bloggers Day, Todd Jensen briefed us on their new NARA scanning facility in Washington, D.C. (I alluded to the presentation in one of my articles. I was waiting on Ancestry.com for photographs before I wrote my article. Now I probably can’t remember enough to do the presentation justice. But I digress…)

At that time I asked if Ancestry.com still dropped images from NARA collections when they published them. Andrew Wait assured us that their policy has always been to publish every image. Another Ancestry.com employee in the room (I don’t remember who) leaned over and whispered some circumstance in which they had dropped images.

My hearing isn’t all that sharp, so I didn’t hear the circumstances mentioned, but it is well known by most Ancestry.com subscribers that Ancestry.com has always done so. Ancestry.com seems to feel they are doing everyone a favor by chopping and dicing up census microfilms:

  • Dropping images with no legible names:
    • microfilm headers
    • NARA publication booklets
    • covers
    • census totals
    • blank forms
    • pages that can’t be read because they were imaged too dark, too light, or too blurry
  • Rearranging census districts according to alphabetical jurisdictions
  • Preventing going past either end of a group of images

These changes are perfectly reasonable to decision makers who dabble in genealogy just enough to be dangerous.

And, in fact, these changes might indeed be an improvement if Ancestry.com also allowed unimproved access. The former without the latter has serious repercussions:

  • Removing context removes information
  • Tampering with evidence decreases its evidentiary value
  • The changes rob users of any way to detect documents that were inadvertently dropped
  • Removing illegible images gives NARA staff members no way to know that access to the originals is warranted

For these reasons, members of the Association of Professional Genealogists (APG) have criticized Ancestry.com’s practices. Last year Peggy Reeves pointed out that all but one of the first 25 soldiers from roll 402 are missing from Ancestry.com’s publication of T-288, General Index to Pension Files, 1861-1934. If Ancestry.com allowed unimproved access to this NARA publication, Reeves would have discovered one of two things. Either the images were illegible, or Ancestry.com had inadvertently left out all the index cards from “Charles Roe” to “Allen Rogers.”

That’s a lot of missing documents.

Digital publishers might want to take a lesson from microfilming practice. FamilySearch (as the Genealogical Society of Utah) always filmed every document, but when an original document was illegible, it included a label indicating “Illegible Original.”

Next time I’ll share what NARA had to say about all this.

8 comments:

  1. I am sure we have all looked for online documents and assumed our family was missed by the census enumerator, or a pension file was lost or a page was missing. It is a lot more troubling to think that Ancestry, Family Search, Heritage or Footnote simply skipped an image or left it out because it was imaged improperly or was too hard to read. If they didn't include it, maybe they figured no one would request a better image (requiring more work) or they could claim more of their images were readable than the competition.
    As you said the real danger is in NARA pointing people online and not allowing access to originals or even destroying microfilm copies and indexes.

  2. I encountered this situation a few months ago. I discovered a text copy of everyone in a county in KY. By matching names to the Ancestry search, I realized 2 pages were missing. I found the two missing pages by manually browsing through the Heritage Quest census records.
    Part of the page was unreadable, but the information I needed was intact.

    Donna H.

  3. Thanks for your article - As an information systems security professional, I have to agree that missing data can be as big a problem as incorrect info.

    As a fairly new genealogist I am someone who probably falls into the "knows enough to be dangerous" category and there have been at least a few times where I have had to retrace my search methodically to figure out if I'd missed something or if something was missing.

    Multiply the time spent determining there is missing data by the number of users trying to access that data and that can add up.

    It seems to me that making sure all images are included, or at least noting what is missing would minimize customer service inquiries leading to less support costs and more revenue for the company.

    Win, win.

    Anyway, good to know I'm not nuts. In this particular instance, anyway.

  4. There is more than one way Ancestry omits images.

    In the 1820 US Federal Census enumeration for Monongalia County, (West) Virginia, the images for the Western part of the County had 2 to 4 folios per image.

    Last year someone went through and cropped out all but the uppermost image. Whoever did this knew what they were doing: they also removed the indexed links for the removed pages. The remaining index for the remainder often linked to the wrong URL.

    After some pointed outcry and a ~subscriber's~ listing what was missing (after Ancestry.com request), most of the deleted images were replaced after several months. They could not compare their backup images to what was on the website?

    The removed indexing was not replaced, so about half of that part of the County's enumeration for 1820 still is not indexed.

  5. The more the merrier is definitely helpful for context purposes in genealogy, although the practicality of digitizing all of that can be difficult at times.

    I used to scan the digital images and help with indexing etc. at Ancestry.com. If there is missing content due to the light density (not talking about the no-name images, since the decision to include those or not is often determined by the content acquisitions project managers), sometimes this will cause the scanning machine itself to not pick up the image depending on the settings. The quality of the microfilm and the settings of the scanner can make it difficult to ensure that everything is captured adequately. The entry level operator must be very careful to ensure that everything is captured, but human mistakes can be made in the process unfortunately.

    On the upside, other technologies like "ribbon scanning" have been implemented, as well as other methods and measures taken to provide quality images. The ribbon scanning system captures the content by digitizing the entire width of the microfilm in one long image (rather than detecting the light of the content then capturing it page by page into multiple images), which helps to avoid that oversight in hardware setup or user errors.

    Maybe more information than you wanted to know, but there you have a little more insight :)

  6. I looked (browsed) for two years for a census image with an ancestor's name, in a tiny township in Kalkaska County, Michigan. The township was so small, I could read the entire thing in just an hour or so. No ancestor. Districts/Townships at the top of the pages not matching the descriptive title of the section. It was 1880, and the names and pages showed on the FamilySearch index. Frustrating, since the page just wasn't digitized, but it must have been filmed, or it would not have been in the index, right? I finally found it after a couple of years. Images were added and there they were. Hurray or Boo!

  7. Insider,

    Thanks for raising this important issue again. Although it has been discussed in the past both on your blog and by many of us in comments on the Ancestry blog, Ancestry still refuses to acknowledge the problem and to implement best practices. Those simply are:

    1) image every page including blank and spoiled pages;
    2) leave intact the original order of the documents.

    Re #2, if they feel that neophytes will benefit from some other order, then that should be accomplished through a link chain for that purpose that leaves the original ordering as is for browsing purposes by knowledgeable genealogy researchers.

    In this matter as in so many others, Ancestry.com is not only wrong on the issue at hand, they are wrong-headed in their approach to such issues in general.

    These types of issues raise questions not only of best practices, but possibly of integrity. Ancestry has on its staff many well known and respected professional genealogists. And these persons presumably feel as we do and have advised Ancestry to adopt something similar to the simple practices above, but that advice was rebuffed. I say this because I would not think that a professional could in good conscience advise otherwise, or put a cost objective above one of providing customers full and complete access to the advertised product.

    In the Standards Manual (Millennium Edition c2000, p. 5) of the Board for Certification of Genealogists, standard 10 reads:

    "Scanned images of photographs, graphics, or text include the entire document or item of interest."

    While this is for an individual researcher dealing with an individual document, surely it is fair to apply the same to a paid provider such as Ancestry further up the chain. Otherwise how could a professional researcher using Ancestry for client work purport to have made an exhaustive search, the first element of the Genealogical Proof Standard, or to provide images of all of a document of interest when one cannot be sure it is complete (and note that failure to provide original ordering frustrates an effort to determine if the record is complete)?


    This to my mind raises a few questions.

    1) Is there a gap in the standards of the APG and BCG as applies to professionals engaged by companies such as Ancestry?

    2) Knowing that so many collections have various levels of incompleteness on Ancestry, can professionals using Ancestry continue to do so without at least always giving clients the caveat that an exhaustive search only applies to Ancestry's version of the originals which they refuse to image completely?

    3) At what point, or what level of failure to observe genealogical standards and best practices, would such a professional be obliged to cease working for such a company when sound professional genealogical advice is ignored?

    Ancestry's image is much enhanced by the presence of professional genealogists on its staff. Perhaps that image is undeserved.

    MikeF

  8. OOhhh!!!Sooooo that's why!!!!!!!!!!!!
    I was on Ancestry.com looking on the 1900 Census for my ancestor. Couldn't find her so I decided to try Family Search Records pilot site. I did find my ancestor but the Census was hard to read but I made it out and found my ancestor.
