Tuesday, May 24, 2011

Ancestry.com launches Web Search

Eric Shoup at Ancestry.com Reception
Eric Shoup of Ancestry.com

Eric Shoup of Ancestry.com announced the release of Ancestry Web Search at a reception Thursday night during the 2011 NGS Conference. Once burnt by their Internet Biographical Collection, Ancestry.com is not shy about explaining how things have changed this time.

The Internet Biographical Collection

The Internet Biographical Collection (IBC) was a collection Ancestry.com released on 26 August 2007. It consisted of copies of web pages containing genealogically relevant information. The copies were made without warning to or permission from page owners. To see how the IBC worked, see Becky Wiseman’s “Is this Fair Use?” and another example from USGenNet.

Hundreds of people blasted Ancestry.com. Dozens of bloggers flogged them. Page owners objected. And Ancestry.com capitulated. Just three days after launch, Ancestry.com pulled the plug on the collection.

My favorite flogging was done visually:

Susan K. Kitches, image composer, “Ancestry.com Scrapes Websites,” Family Oral History Using Digital Tools (http://familyoralhistory.us : dated 28 August 2007, accessed 21 May 2011).


Web Search versus IBC

Ancestry.com says they have addressed the complaints made about the original version of the IBC. Here’s my comparison:

Internet Biographical Collection: Made copies of owners’ web pages without their knowledge or approval.
Web Search: Sites can be added or removed by owner request. I think I asked and Ancestry.com said that they had the permission of the three sites that they have currently indexed.

Internet Biographical Collection: Required subscriptions. (On the 28th, Ancestry.com opened the collection to registered users.)
Web Search: Anyone can use it.

Internet Biographical Collection: Result lists contained links to Ancestry.com’s copies of the pages, not the owners’. This deprived owners of several benefits, including advertisement revenues.
Web Search: No improvement. Still no links to owners’ pages.

Internet Biographical Collection: Result pages contained information abstracted from the owners’ pages.
Web Search: Same. Ancestry.com’s stated intent is to limit the information shown so users are incented to click through to the owners’ pages.

Internet Biographical Collection: Result pages contained links to Ancestry.com’s copies of the pages, not the owners’. (On the 28th, Ancestry.com supplemented the links to their copies with links to the owners’ originals.)
Web Search: Result pages contain links to the owners’ websites.

Internet Biographical Collection: Ancestry.com made nearly complete copies of owners’ web pages, including text and graphic design. Many felt this was a clear violation of the law. I felt that they had, indeed, crossed the line, but that they were playing in a gray area and had not wantonly violated the law.
Web Search: Does not copy others’ pages. I believe this is the key difference between the IBC and Web Search.

Internet Biographical Collection: Citations did not specify the original sources. Thus, citations did not give credit to the owners.
Web Search: Citations do not specify the derivative sources. Thus, citations do not disclose Ancestry.com’s involvement. They need to read my series on citation principles so they understand why a citation needs to specify both the derivative and the original.

Internet Biographical Collection: Links went directly to the pages with the indexed information.
Web Search: Links do not go directly. Links go to the search page and users must retype the search parameters. This might be owners’ preferred behavior, but as a user I’d like the links to take me directly to the results.

Internet Biographical Collection: Indexed the same content as search engines like Google.
Web Search: According to Ancestry.com’s Brian Edwards, Web Search indexes deep web content, stuff you can’t find doing a Google search.


Ancestry.com seems to have followed the advice of fellow blogger Randy Seaver:

I wish that Ancestry would carefully consider the reaction to adding a database like this before they do it.

I spoke with Web Search product manager Brian Edwards. He said, “We’ve spoken with many members within the genealogical community to try to make sure we approach this in the right way.” He said Ancestry.com believes it’s important to respect the wishes of the owners of the content indexed by this new product.

Really? No Caching?

Caching others’ web pages (copying, really) was at the heart of the IBC controversy. If you know where to look, Google.com has links to cached copies of indexed pages. As I wrote this on Saturday, I experienced the downside of Ancestry.com not caching pages. IndyGov.org, the website indexed by “Web: Marion County, Indiana Marriages since 1925,” was down. Ancestry.com's misuse of caching in the IBC may have poisoned the possibility of using it now.

Google caches indexed pages

I think web page owners will be more amenable to Web Search than to the Internet Biographical Collection. And I think Ancestry.com haters and conspiralists will like it in their own way; it gives them more fodder. But I’m not so certain about the rest of you. What do you think?



Private message to Brian Edwards: In our interview I mentioned I wondered if Google.com might be doing deep web searches on some websites. While researching this article, I came across an example of why I think this might be so. I did a Google search for ["internet biographical collection" (source OR citation)]. I clicked on one of the results, Kimberly Powell’s “Cache 22” article. The page came up with my search terms already in the About.com search box. Interesting, huh?

Does Google perform deep web searches?
