Last Friday I was excited to see Ancestry.com had added a whole lot of new newspapers to its collection. I took especial notice of the Bozeman Daily Chronicle of Bozeman, Montana and immediately checked it out.
You’ll be happy to know that if you need the Daily Chronicle for 14 October 1926, you’re in luck because that is the only issue in Ancestry.com’s collection. I previously complained about Ancestry.com’s Salt Lake Tribune database. Number of issues at the time? Two. (I’m pleased to find as I write this article that they’ve beefed up the Tribune considerably. However, according to the card catalog, the database has never been updated. That may be why the additions were never announced on the New and Updated Databases page. But I digress…)
The new card catalog can be used to identify the leanest newspapers in the Ancestry.com collection. Sort by record count and page through to the end of the list.
Interestingly, all the sizes are multiples of 60. If one investigates the last newspaper in the list, one finds it is composed of a single page from a single issue. One can also verify that the papers with a size of 120 have but two pages. Continuing, one finds that the number of pages is equal to the reported size divided by 60.
Title | Size | Pages |
Chicago Daily News (Chicago, Illinois) | 360 | 6 |
Weekly Evening Gazette (Reno, Nevada) | 360 | 6 |
Decatur Daily Review (Decatur, Illinois) | 360 | 6 |
The Daily Mail (Charleroi, Pennsylvania) | 300 | 5 |
The Southern Immigrant (Cullman, Alabama) | 300 | 5 |
The Hancock News (Hancock, Wisconsin) | 240 | 4 |
The State Sentinel (Decatur, Illinois) | 240 | 4 |
East St Louis Journal (East Saint Louis, Illinois) | 240 | 4 |
Californian (Monterey, California) | 240 | 4 |
Bulletin Sentinel (Decatur, Illinois) | 120 | 2 |
Bridgeport Sunday Post (Bridgeport, Connecticut) | 120 | 2 |
Weekly Decatur Magnet (Decatur, Illinois) | 60 | 1 |
New York Times (April 15th, 1865) | 60 | 1 |
The Mountain Democrat and Placerville Times (Placerville, California) | 60 | 1 |
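The divide-by-60 relationship is easy to check mechanically. Here is a minimal Python sketch using the sizes and page counts from the table above; the names `ROWS` and `names_per_page` are my own, chosen only for illustration.

```python
# (title, reported size, pages) copied from the table above
ROWS = [
    ("Chicago Daily News (Chicago, Illinois)", 360, 6),
    ("Weekly Evening Gazette (Reno, Nevada)", 360, 6),
    ("Decatur Daily Review (Decatur, Illinois)", 360, 6),
    ("The Daily Mail (Charleroi, Pennsylvania)", 300, 5),
    ("The Southern Immigrant (Cullman, Alabama)", 300, 5),
    ("The Hancock News (Hancock, Wisconsin)", 240, 4),
    ("The State Sentinel (Decatur, Illinois)", 240, 4),
    ("East St Louis Journal (East Saint Louis, Illinois)", 240, 4),
    ("Californian (Monterey, California)", 240, 4),
    ("Bulletin Sentinel (Decatur, Illinois)", 120, 2),
    ("Bridgeport Sunday Post (Bridgeport, Connecticut)", 120, 2),
    ("Weekly Decatur Magnet (Decatur, Illinois)", 60, 1),
    ("New York Times (April 15th, 1865)", 60, 1),
    ("The Mountain Democrat and Placerville Times (Placerville, California)", 60, 1),
]


def names_per_page(size, pages):
    """Return the reported size divided by the observed page count."""
    return size / pages


# Collect the distinct ratios; a single value means every row agrees.
ratios = {names_per_page(size, pages) for _, size, pages in ROWS}
print(ratios)
```

Every row in the table yields the same ratio, so the set contains the single value 60.0.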
Why does Ancestry.com claim newspaper sizes 60 times larger than their actual size?
You may recall we previously saw an anomaly in the new card catalog that led us to the assumption that the size displayed by the card catalog is a name count rather than a record count. (See “Extras in the Ancestry.com New Card Catalog.”) Ancestry.com needs to identify what they mean by “Size.”
So what is the probability that each page of these newspapers has exactly 60 names?
Not very likely, is it? In response to my article “Unbelievable Name Count Claims,” Paul Allen of WorldVitalRecords.com commented that vendors sometimes estimate the number of names in a database.
Name counts in OCR databases
Estimation is necessary for non-table-style databases such as books and newspapers because the index is an every-word index, obtained via OCR. OCR, optical character recognition, is a software process wherein a computer program attempts to read the images and create a matching document with all the words found on the image.
After the task of recognizing words, the computer still doesn’t know what the words mean. What you and I easily recognize as a person’s name is beyond the computer’s ability to identify with any degree of certainty. Thus, newspaper vendors must estimate the number of names present.
Obviously, some pages will have more names and some fewer. To be sure, the subject of a news article will likely be named multiple times. Should a vendor attempt to count unique names? That will undercount the actual number of names. Should a vendor attempt to count unique people? Equating and differentiating people requires intelligence well beyond that of machines.
It would seem from our table above that Ancestry.com assumes an average of 60 names per page across all the pages of a newspaper. In my career, I’ve seen samples justifying higher numbers, when repeated names are included. Overall, I think 60 is very reasonable.
My recommendations to Ancestry.com and FamilySearch (and other vendors) are to be completely transparent in your size claims:
- Don’t report a size number without identifying if it is a record count or a name count
- Report record counts, name counts, and image counts
- For table-style databases, report the actual number of names present
- When an estimated count is published, designate it as such (I favor use of the approximation symbol (≈) to identify estimated numbers)
- For estimated counts, document how the estimate was obtained
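To illustrate the last two recommendations, here is a small Python sketch of how an estimated count might be produced and labeled. The 60-names-per-page figure is the assumption inferred from the table earlier in this article, not a documented Ancestry.com method, and the function names `estimate_names` and `format_count` are hypothetical.

```python
def estimate_names(pages, names_per_page=60):
    """Estimate a name count as pages times an assumed average per page.

    The default of 60 is inferred from the card catalog sizes discussed
    in this article; it is an assumption, not a published figure.
    """
    return pages * names_per_page


def format_count(count, estimated):
    """Format a count, prefixing the approximation symbol when estimated."""
    text = f"{count:,}"
    return f"≈{text}" if estimated else text


# A six-page newspaper, reported as an estimate
print(format_count(estimate_names(6), estimated=True))

# An actual count from a table-style database (made-up number for illustration)
print(format_count(1234, estimated=False))
```

Documenting the estimate then amounts to publishing the per-page figure and the page count alongside the labeled number.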
Ancestry.com’s New Newspapers
Here’s the entire list of new newspapers posted Friday:
Genealogy Database Title | Posted |
Deming Headlight, The (Deming, New Mexico) | 5/29/2009 |
Star Herald (Scottsbluff, Nebraska) | 5/29/2009 |
Bozeman Daily Chronicle (Bozeman, Montana) | 5/29/2009 |
Dundee Record (Dundee, New York) | 5/29/2009 |
Daily Journal (Herrin, Illinois) | 5/29/2009 |
Boone Svenska Herald (Boone, Iowa) | 5/29/2009 |
Cherokee Daily Times (Cherokee, Iowa) | 5/29/2009 |
Dyersville Commerrial (Dyersville, Iowa) | 5/29/2009 |
Evening Journal Farm Edition (Washington, Iowa) | 5/29/2009 |
Gladbrook Tama Northern (Gladbrook, Iowa) | 5/29/2009 |
Grinnell Herald Register (Grinnell, Iowa) | 5/29/2009 |
Hopkinton Leader (Hopkinton, Iowa) | 5/29/2009 |
Humboldt Republican (Humboldt, Iowa) | 5/29/2009 |
Jewell Record (Jewell, Iowa) | 5/29/2009 |
Lake City Graphic (Lake City, Iowa) | 5/29/2009 |
Leon Journal Record (Leon, Iowa) | 5/29/2009 |
Lone Tree Reporter (Johnson, Iowa) | 5/29/2009 |
Manchester Democrat Radio (Manchester, Iowa) | 5/29/2009 |
Manning Monitor (Manning, Iowa) | 5/29/2009 |
Marshall County Time (Marshall, Iowa) | 5/29/2009 |
North Iowa Times (McGregor, Iowa) | 5/29/2009 |
Sac Sun (Sac County, Iowa) | 5/29/2009 |
Schaller Herald (Schaller, Iowa) | 5/29/2009 |
Story City Herald (Story City, Iowa) | 5/29/2009 |
The Alabama Courier (Athens, Alabama) | 5/29/2009 |
Bruce News-Letter (Bruce, Wisconsin) | 5/29/2009 |
Green-Bay Intelligencer (Navarino, Wisconsin) | 5/29/2009 |
The Fairfield Daily Ledger (Fairfield, Iowa) | 5/29/2009 |
Indiana Progress (Abbeville, Alabama) | 5/29/2009 |
The Western Telegraph (Rossville, Ohio) | 5/29/2009 |
Chicago Heights Star Sports (Chicago Heights, Illinois) | 5/29/2009 |
Belmont Gazette (Belmont, Wisconsin) | 5/29/2009 |
Oconomowoc Democrat (Oconomowoc, Wisconsin) | 5/29/2009 |
Yellow River Lumberman (Necedah, Wisconsin) | 5/29/2009 |
Oconto County Enterprise (Oconto, Wisconsin) | 5/29/2009 |
Du Buque Visitor (Du Buque, Wisconsin) | 5/29/2009 |
Winnebago Anzeiger (Menasha, Wisconsin) | 5/29/2009 |
The Maiden Rock Press (Maiden Rock, Wisconsin) | 5/29/2009 |
The Rice Lake Times (Rice Lake, Wisconsin) | 5/29/2009 |
The Benton Advocate (Benton, Wisconsin) | 5/29/2009 |
The Revealer (Bloomington, Wisconsin) | 5/29/2009 |
The People's Champion (Ellsworth, Wisconsin) | 5/29/2009 |
Prison Press (Waupun, Wisconsin) | 5/29/2009 |
The Fairchild Graphic (Fairchild, Wisconsin) | 5/29/2009 |
Fox Lake Representative (Fox Lake, Wisconsin) | 5/29/2009 |
The Wisconsin Standard (Geneva, Wisconsin) | 5/29/2009 |
Voice-Herald (Hales Corners, Wisconsin) | 5/29/2009 |
The Breeze (Pardeeville, Wisconsin) | 5/29/2009 |
Beloit Journal, of Politics, Literature, and General Intelligence (Beloit, Wisconsin) | 5/29/2009 |
Agitator (Wellsborough, Pennsylvania) | 5/29/2009 |
Arizona Silver Belt (Miami, Arizona) | 5/29/2009 |
Charleston Daily Mail (Charlestown, West Virginia) | 5/29/2009 |
Fort Dodge Messenger And Chronicle (Fort Dodge, Iowa) | 5/29/2009 |
Horicon Argus (Horicon, Wisconsin) | 5/29/2009 |
Lockhart Post-Register (Lockhart, Texas) | 5/29/2009 |
Marble Rock Journal (Marble Rock, Iowa) | 5/29/2009 |
Monroe Sentinel (Monroe, Wisconsin) | 5/29/2009 |
Mt Pleasant News (Mt Pleasant, Iowa) | 5/29/2009 |
Natrona County Tribune (Casper, Wyoming) | 5/29/2009 |
Neenah Bulletin (Neenah, Wisconsin) | 5/29/2009 |
Newcastle News-Journal (Newcastle, Wyoming) | 5/29/2009 |
Northwestern Record (Sheboygan Falls, Wisconsin) | 5/29/2009 |
Reedsburg Herald (Reedsburg, Wisconsin) | 5/29/2009 |
Richland County Observer (Richland Center, Wisconsin) | 5/29/2009 |
Mt Ayr Journal (Mount Ayr, Iowa) | 5/29/2009 |
Fayette Journal (Fayetteville, West Virginia) | 5/29/2009 |
Lebanon Daily News (Lebanon, Pennsylvania) | 5/29/2009 |
West Eau Claire Argus (West Eau Claire, Wisconsin) | 5/29/2009 |
Milford Mail (Milford, Iowa) | 5/29/2009 |
Terril Record (Terril, Iowa) | 5/29/2009 |
Lake Park News (Lake Park, Iowa) | 5/29/2009 |
Spirit Lake Beacon (Spirit Lake, Iowa) | 5/29/2009 |
Adams County Free Press (Corning, Iowa) | 5/29/2009 |
Desert Hot Springs Sentinel (Desert Hot Springs, California) | 5/29/2009 |
Aiken Standard (Aiken, South Carolina) | 5/29/2009 |
Tyrone Daily Herald (Tyrone, Pennsylvania) | 5/29/2009 |
Daily News (Huntingdon, Pennsylvania) | 5/29/2009 |
Tyrone Star (Tyrone City, Pennsylvania) | 5/29/2009 |
Evening Chronicle (Marshall, Michigan) | 5/29/2009 |
Sioux County Capital (Orange City, Iowa) | 5/29/2009 |
Alton Democrat (Alton, Iowa) | 5/29/2009 |
Boyden Reporter (Boyden, Iowa) | 5/29/2009 |
Sioux Center News (Sioux Center, Iowa) | 5/29/2009 |
Hospers Tribune (Hospers, Iowa) | 5/29/2009 |
Ireton Ledger (Ireton, Iowa) | 5/29/2009 |
Maurice Times (Maurice, Iowa) | 5/29/2009 |
Sioux County Index (Hull, Iowa) | 5/29/2009 |
Avalanche (Lubbock, Texas) | 5/29/2009 |
Laredo Times (Laredo, Texas) | 5/29/2009 |
Insider,
Very good general topic, i.e. newspaper databases. All of the major providers reek of a lack of integrity in advertising. Ancestry, along with Newspaper Archive, from whom it gets a lot (most?) of its newspaper material, is following a familiar pattern:
1) Hyping numbers of names or titles added. When what is needed is broad coverage of a locality for decades, instead of scattered odd papers with less than 10 issues (or even only one full year). GenealogyBank (part of Newsbank), is the title count hyper sine qua non.
2) Deceptive timespans claimed. On Newspaper Archive's site if one clicks on a title, it is not rare to find a claimed run of decades and then looking deeper finding only issues at each end of that span. GenealogyBank does this too with their "America's Obituaries", in claiming for example to have obits for a newspaper from 1977 to current when based on multiple searches for known obits/surnames they can only have sparse coverage. Ancestry's obit coverage isn't good, but they don't make the claims of obituary completeness that GB does.
Newspapers potentially have a lot of promise, but one needs so much coverage to find a little. Kind of like panning for gold. Especially for early papers. On GB's blog a while back found here:
http://blog.genealogybank.com/2009/01/genealogybank-adds-170-newspapers-from.html
I commented and questioned TK on these issues, but without getting a straight answer for most of them. I asked why not really push to cover a state or two a year instead of a scattershot approach everywhere that the major sites seem to think is a best practice. And I asked about the costs of approaching say the Indiana State Library and seeking to digitize the state's newspaper microfilm collection (created through grants from the US Newspaper Project). Is it too expensive for the potential rewards? I don't know but am interested in the answer.
In summary give me long continuous runs of newspapers in areas I research, not scattered issues here and there while claiming huge title and name counts.
Mike
Mike,
Thanks for your thoughts.
Obtaining newspapers from microfilm is much, much cheaper than scanning actual pages.
The bug-a-boo comes in the form of intellectual property rights. Possession of the microfilm doesn't give the possessor the rights to permit copying it. The party that sold the microfilm to the library has contractual rights, since there is a purchase agreement, explicit or implied. The party that produced the microfilm might claim copyrights, to the extent that it can be argued that producing a good microfilm image is more art than science. And the party that possessed the originals has the contractual rights agreed to between itself and the filming company; that agreement probably places boundaries around what the microfilming company can do.
The contracts involved were usually written early and don't address the question of digital imaging. Contracts have been lost, so companies don't always know what their working boundaries are. Libraries, the poorest of these organizations, are at the end of the food chain, so face the greatest risk of lawsuit.
For all these reasons, holders of newspaper microfilm collections are sometimes reluctant to allow digitization of their collections.
-- The Ancestry Insider
Insider,
Thanks for the response. I am sure what you said is valid, and the legal flip side of the cost side. But I specifically mentioned the IN State Library's holdings for a reason. They filmed papers through federal grants of the US Newspaper Project from what I understand (though I by no means know all the details). So the ISL produced the microfilm. Then for all such holdings not still under copyright, why can't Ancestry/GB/whoever, approach them and say they will digitize the collection for their commercial site, and in return, allow free access at all Indiana public libraries or something like that (but without home access through the libraries to retain potential for in-state customers). And if there are indeed any other dangling legal issues, then as part of the state government the ISL can invoke sovereign immunity (note that I am not talking about even coming close to breaching copyrights).
My main point is the question of whether such a plan is feasible within a certain timeframe and then can be applied to other states. I don't live in IN but just am familiar with the newspaper holdings there from investigating same (which are mostly available through inter-library loan to other states).
Mike
Good sleuth work on figuring out the name count issue. Ancestry.com does have its issues, but it's the best thing we've got going for online research at the moment. I met some Ancestry.com executives a few years ago when I filmed a testimonial for them for an infomercial (something I got invited to do after submitting a written testimonial), and I was really surprised that most of them didn't seem to know exactly how their own website worked! My husband and I were teaching THEM about some of the functions of the site that they didn't even know about!
Stephanie at the Irish Genealogical Research blog