Thursday, September 20, 2012

Evidence Explained and the Index Derivative Type

I read Elizabeth Shown Mills’s latest online lesson, “QuickLesson 10: Original Records, Image Copies, and Derivatives,” when it was recently released. While we often speak of sources as being originals or derivatives, real life is not always that tidy. Mills presents three caveats to consider when classifying a source. One useful approach is to distinguish between formats that preserve the original content and those that process the content and its form. Mills lists about 10 of each type.

I want to emphasize the characteristics of one of the derivative types she presents: indexes.

I regard indexes (called databases on some sites) as nearly the worst of all derivative types. Indexes are used as finding aids. To that end, publishers apply all sorts of treatments to the information found in the original records. The information originally in the records is interpreted and transformed, and conclusions are drawn. Some of these treatments are made by keyers and indexers. Some are applied en masse by computer algorithms.

  • Names. Name parts are divided into given names and surnames, sometimes incorrectly or even swapped. Indexers might be instructed to interpret abbreviations. Keyers and indexers misread names. To increase findability, publishers also standardize names behind the scenes: “Jack” becomes “John,” and so forth.
  • Dates. Dates are often assumed to be Gregorian, regardless of the calendar actually used in the record. Or dates from other calendar systems may be forced into Western calendar format.
  • Birth Dates. Birth dates may be inferred from age. Birth years calculated this way for a June 1 census are wrong over half of the time, because anyone born later in the year than June 1 had not yet had a birthday when the census was taken.
  • Places. Abbreviations may be interpreted by indexers or computers. Indiana’s “Ia” may become Iowa and its “In” may become India. Place names may be forced into a hierarchy of three jurisdictions (town, county, state for the U.S.) regardless of reality. As with names, publishers standardize place names behind the scenes, sometimes using pick lists, making it impossible to find some records.
  • Race. Race, color, nationality, and ethnicity may be confused, standardized, and reduced to pick lists that exclude and confuse many values.
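
The birth-date arithmetic in the list above is easy to check. Here is a minimal sketch in Python, assuming that ages were reported accurately as of census day, that birthdays are spread evenly through the year, and that the index simply subtracts age from the census year. The function names and the 1880 and 1850 years are my own, chosen only for illustration.

```python
from datetime import date, timedelta


def age_on(birth: date, on: date) -> int:
    """Age in completed years on a given day."""
    years = on.year - birth.year
    # No birthday yet this year: one year younger.
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1
    return years


def naive_birth_year(census_year: int, age: int) -> int:
    """The shortcut an index might apply (an assumption, not any publisher's documented rule)."""
    return census_year - age


def naive_error_rate(census_day: date, birth_year: int) -> float:
    """Share of birthdays in birth_year for which the shortcut gives the wrong year."""
    day = date(birth_year, 1, 1)
    wrong = total = 0
    while day.year == birth_year:
        reported_age = age_on(day, census_day)
        if naive_birth_year(census_day.year, reported_age) != birth_year:
            wrong += 1
        total += 1
        day += timedelta(days=1)
    return wrong / total


# Everyone actually born in 1850, enumerated on a June 1, 1880 census day.
rate = naive_error_rate(date(1880, 6, 1), 1850)
print(f"{rate:.1%}")
```

Under these assumptions the shortcut misses for everyone born from June 2 through December 31, which is 213 of the 365 days in the year.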

The next time you use an index, remember these shortcomings. Indexes should be considered finding aids. When available, always view the image. When images are not available, always use the index information to obtain copies of the original record.

The other nine lessons can be found on the website Evidence Explained: Historical Analysis, Citation & Source Usage.


  1. Dear Ancestry Insider,

    I totally agree with what you have said about Indexes, no matter the source.

    I agree that an Index is a Finding Aid, but it may be some time between seeing an Index for someone I am researching and actually viewing the documentation that the Index refers to.

    My question is, Do You Cite an "index" based on what you have posted?

    Thank you,


  2. I keep files of all sources of metadata and notes/comments -- scans or downloaded images -- sequence numbered by source type, in hard and soft copy. In addition to labeling the source by title, I code the file number, distinguishing between indexes or any other derivative source and the "original" -- e.g. between an index of births and the actual birth certificate. That having been said, I'm curious to know how ESM would treat, say, the BMD records ordered from the GRO in England. In principle, they are at least secondary info, since they aren't the original handwritten entries in the church records, but they're closer than the entries in the BMD indexes.

  3. You said "...publishers apply all sorts of treatments to the information found in the original records..."
    Two additional examples might be helpful to other readers:
    Zip codes may include several municipalities; programming may select a preferred location from the list or the boundaries of the zip code may have changed since the record was created. In either case, the location may be misleading.
    Programming errors may lead to index errors: Ancestry's "Cuyahoga County, Ohio, Marriage Records and Indexes, 1810-1973" repeats the Father's Name as the Spouse's Father's Name; luckily the database includes attached images.

  4. Good warnings, AI. In addition to the items you pointed to, at least some databases include totally invented material in the extracts in addition to invented place designations. In post-1870 US Census enumerations, where a child is in a grandparent's household, the extractors invented which other person in the household is the child's parent (who may not be present at all), although the actual enumeration hardly ever supplies this information. This occurs even when the picked parent is too young to be the parent and the surnames are wrong. In one instance the invented parent was a son of the head of household who was present, instead of the actual parent, the unmarried daughter who was also in the household.

    There is no need at all for this gratuitous fictionalizing, since these enumerations do not state intra-family relationships other than to the head of household. The publisher actually added this data field, for its own inexplicable reasons.

  5. Mccainkw writes: "I'm curious to know how ESM would treat, say, the BMD records ordered from the GRO in England ... secondary info but closer than the entries in the BMD indexes."

    Mccainkw, I suspect ESM would cite whatever it was she was using. If she could not get an image copy of those "original handwritten entries in the church records," then she'd have to settle for using the GRO-issued copies. If the data in that next-best-thing presented conflicts or if she had some other serious research problem in that generation and she knew that "originals" existed, she'd likely try a few time-honored tactics to break through the barrier keeping her from those originals. But if that GRO-issued copy dealt with an 8th cousin 13 times removed, she'd likely settle for common sense over perfection.

