Wednesday, October 29, 2008

Opinion piece: Ancestry.com / USGenWeb squabble

The well-publicized squabble between Ancestry.com and the U.S. GenWeb Project (USGenWeb) has, in my opinion, hurt both. But perhaps the greatest damage has been suffered by USGenWeb and has been of its own doing.

USGenWeb is an unincorporated non-profit association of volunteers who maintain a set of geographically organized web sites. Separate, but linked, web sites exist for every county and state in the country. The binding philosophy among all these non-commercial web sites is, "Keeping Internet Genealogy Free." Many had made use of RootsWeb's free genealogy web site hosting service. When Ancestry.com acquired RootsWeb, it continued the program, despite dire predictions by some that Ancestry.com would discontinue it.

The squabble arose when Ancestry.com announced that the RootsWeb.com address was being automatically replaced with RootsWeb.Ancestry.com and that mandatory headers would be automatically added to the free genealogical web sites hosted by RootsWeb. For some sites, the headers were merely a change from the mandatory top and bottom advertisements that Ancestry.com added to the sites. For USGenWeb sites, the headers were new.

While the organization's bylaws allowed "a website [to] acknowledge any entities who may host their website (i.e., provide server space at no cost)" (Article IX, Section 2.), some web site coordinators feared the worst. (See this post or this for a couple of examples.) USGenWeb sites contain genealogical data gathered through thousands of hours of volunteer work. The mere specter of Ancestry.com assimilating these contributions led some web site coordinators to move their sites off RootsWeb. Even the national site made a quick decision to move off RootsWeb, temporarily using a private server donated by a member before moving the site to IX web hosting.

"After many years at RootsWeb, we made a quick move to another option for web hosting," Mike St. Clair, USGenWeb Advisory Board member, later reported. He advised the board that "a more organized evaluation of the options available would be useful before we decide to confirm that quick decision for the longer term."

Those sites that have moved have spent focus and time on the task, and many are still not finished. (See, for example, ILGenWeb, Town of Essex and the Kidz Project.) Changed URLs have produced broken links, upset easy navigation among sites, and cut off some outside traffic.

I just experienced a case in point

The Phillips Library of the Peabody Essex Museum

Visiting the Peabody Essex Museum's web site, I found that the Phillips Library page on featured collections highlighted Essex County (Massachusetts) genealogy. The web site referred interested persons to "RootsWeb" for more information. Don't bother clicking the link; it points to www.rootsweb.com/~maessex, a dead URL. I know because I clicked the link.

When I found the link was dead, I assumed the link was to the RootsWeb resource page for Essex County, so I searched RootsWeb and noticed a link to www.rootsweb.ancestry.com/~macessex. That URL, differing by just the letter "c," surely was related, so I followed the link.

The address was for the USGenWeb Project's Essex City, Essex County site, so the Peabody's bad link must have been to a USGenWeb site. According to the Internet Archive, it was. The site was active from as far back as 18 August 2000, when it was part of the RootsWeb Genealogical Data Cooperative or GenConnect, until as recently as 24 December 2007, when it was part of USGenWeb.

Well, I was sitting on the Essex City web site. It should have been a simple matter to get to the county. I just clicked on the link to the county and...

...I was back to the dead URL www.rootsweb.com/~maessex. I used a search engine to locate the Essex County site at http://essexcountymagenweb.com, although http://essexcountyma.net will work as well. There, I found the address of the Massachusetts state web site had changed from www.rootsweb.com/~magenweb to http://magenweb.bettysgenealogy.org.

What a mess. And so I suppose it goes across the length and breadth of the U.S. GenWeb Project.

Pages Spurned, Lessons Learned

From what I think I've learned from this experience, I would offer the following advice to the U.S. GenWeb Project:

  1. Domain names should be uniform (http://cc.ss.usgenweb.org) and centrally controlled. State and county coordinators would still arrange for their own web hosting and the national organization would set the DNS address to resolve to the current host. A site could change web hosting services and one DNS change by the national organization would heal all links to the site.
  2. Keeping data free is easier than preventing commercial exploitation. Richard Stallman, founder of the free software movement, learned this the hard way when firms commercialized free software he developed. This led to the development of copyleft licenses such as the GPL and Creative Commons. Scientists in the Creative Commons project have abandoned attempts to prevent commercial exploitation in order to achieve their primary goal of keeping scientific data free. USGenWeb should likewise reexamine the relative importance of making data available for free versus preventing commercial exploitation of that data.
  3. Copyright provides very little protection to USGenWeb data. While the documents as a whole on USGenWeb web sites and in the archives are copyrighted, it is by no means clear whether the data in those documents are protected. There are plenty of legal justifications for anyone who wanted to "harvest" that data. The U.S. Copyright Office says, "What is not protected? ... Information that is common property [such as] lists or tables taken from public documents or other common sources." (Circular #1, p. 3.) See also, "Can You Copyright Your [Genealogy] Data," and "7th Circuit Rules that Extraction of Public Domain Data from Copyright-Protected Database Is Not Copyright Infringement." Ultimately, the decision would require judicial interpretation. An unfunded volunteer cooperative would be no legal match for a determined, cash-rich corporation. If USGenWeb is intent on preventing commercial exploitation of its data, it should seek the advice of a nationally recognized Intellectual Property (IP) lawyer. Law schools may be the place to find individuals sympathetic to its cause.
  4. The transition away from RootsWeb would have been a great time to convert the USGenWeb Project to wiki format. Site coordinators who were moving their sites anyway could have moved the content into wiki pages. Other coordinators who had to update links to the sites that moved could have moved their sites or simply changed the links to point to the appropriate wiki pages. A consistent page naming scheme would allow all coordinators to know what the wiki page URL would be. For sites that didn't move, wiki pages could be created with links out to the appropriate web site. Site copyrights would become page copyrights. Or members could entertain placing the copyrights in the national organization. Editing rights could be restricted to current coordinators, or opened up to any registered member. Templates could be used to encourage uniform layouts by desired groups of coordinators.
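
To make the first suggestion a little more concrete, here is a minimal Python sketch of the idea, not a real DNS setup. It assumes a uniform name of the form cc.ss.usgenweb.org and a single table, kept by the national organization, that records where each site currently lives. The county and state codes ("essex", "ma"), the function name, and the class name are my own illustrative assumptions; the two host names are simply the Essex County addresses mentioned above.

```python
# Hypothetical sketch: one centrally kept table maps each uniform name
# (cc.ss.usgenweb.org) to whatever host the coordinator currently uses.
# This only models the idea; real DNS records are managed elsewhere.


def uniform_name(county: str, state: str) -> str:
    """Build a name following the cc.ss.usgenweb.org pattern."""
    return f"{county.lower()}.{state.lower()}.usgenweb.org"


class SiteRegistry:
    """Stand-in for the national organization's DNS records."""

    def __init__(self) -> None:
        self._targets: dict[str, str] = {}

    def set_host(self, county: str, state: str, host: str) -> None:
        # One change here is all that is needed when a site moves.
        self._targets[uniform_name(county, state)] = host

    def resolve(self, name: str) -> str:
        # Raises KeyError if no site is registered under that name.
        return self._targets[name.lower()]


registry = SiteRegistry()
registry.set_host("essex", "ma", "essexcountymagenweb.com")
before_move = registry.resolve("essex.ma.usgenweb.org")

# The coordinator changes hosting; published links stay the same.
registry.set_host("essex", "ma", "essexcountyma.net")
after_move = registry.resolve("essex.ma.usgenweb.org")
```

The point of the sketch is that anyone linking to the uniform name never has to update a link; only the one central entry changes when a coordinator switches hosts.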

In fairness, I should write about Ancestry.com's mistakes in its relationship with the USGenWeb Project. I envision a piece outlining how they should have engaged the entire free genealogy community from the moment they bought RootsWeb. That's going to take hours to write. And they still don't have it right. And it's late. And I'm off to bed, so if you have an opinion, leave a comment.

9 comments:

  1. Insider,

    You said "The mere specter of Ancestry.com assimilating these contributions led some web site coordinators to move their sites off RootsWeb."

    This was not a 'specter' [spectre].

    A couple of years ago Ancestry began linking to USGenWeb genealogical content pages as if the content were now part of its subscriber-available content.

    The people who had worked long and hard researching, writing and posting that content for free access on USGenWeb pages were horrified, and saw it as a Taking.

    Some immediately moved the sites that they individually co-ordinated, and some in addition deleted the content to which Ancestry had linked.

    No few USGenWeb site co-ordinators foresaw the trend of Ancestry's mucking about with Rootsweb functions such as Message Boards, the WorldConnect pages and the Mailing Lists. The trepidations have come true (while there has not been major interference with the Mailing List functions themselves, Ancestry has installed obtrusive ads on the List Archive pages).

    Your suggestion to move to 'wiki' pages would have been a nightmare. No one wants USGenWeb users or unconcerned idiots to be able to change content or insert malicious code.

  2. I have found the easiest way to find a broken link's new home is to Google search for parts of the end of the broken link.

  3. Dear geolover,

    Regarding the wiki concept: When Wikipedia first appeared, detractors also prophesied a nightmare. Who wants unconcerned or malicious idiots writing an encyclopedia? Understanding how and why Wikipedia succeeded despite your fears is important for any community such as USGenWeb or New FamilySearch.

    In my opinion, New FamilySearch Family Tree doesn't yet have what it will take to survive. I haven't seen any reason yet to believe they've figured out what (I think) has made Wikipedia successful.

    I'm not certain what your point was with the specter/spectre sentence. Were you asking me what the proper spelling is? According to Merriam-Webster, "specter" is the preferred (main entry) spelling while "spectre" is a variant (also acceptable).

    As far as links to USGenWeb sites a couple of years ago, are you speaking of the infamous Biographical Collection debacle? Excellent point. I forgot how that played out. What began as a "Google for Biographies" evolved into something that appeared extremely insidious. In that instance, they copied pages from USGenWeb sites and elsewhere, so it didn't make any difference if the original site moved or was deleted. The stuff they took remained on Ancestry.com until they removed the collection.

    Instead of funneling extra traffic to your websites, as Google does, their implementation eliminated the need for subscribers to traffic your websites.

    You are right. That was "taking."

    -- The Insider

  4. Dear kmduff,

    Thanks for the suggestion. Does it work to type in actual parts of the URL, such as "/s400/presscensus.jpg"? Or do you have to translate them into English?

    -- The Ancestry Insider

  5. Yes, I've just been searching for actual parts of the url. It works best when you are looking for specific things like a cemetery transcription. But I copy/paste part of the end of the url and shorten as necessary until it shows up on the first page of results. It has worked well for my needs so far.

  6. "An unfunded volunteer cooperative would be no legal match for a determined, cash-rich corporation." I am glad you recognize what Ancestry.com is all about - money. It is fine to charge for what they research, extract and purchase, but it is not ok to post information they have "taken" from other genealogical sites and then charge for it. It is basically plagiarism: using the work of someone else without asking permission.

    "In my opinion, New FamilySearch Family Tree doesn't yet have what it will take to survive. I haven't seen any reason yet to believe they've figured out what (I think) has made Wikipedia successful." In my opinion, they do. So do at least 13,193,999 members and more who use New Family Search.

    "As far as links to USGenWeb sites a couple of years ago, are you speaking of the infamous Biographical Collection debacle? Excellent point. I forgot how that played out. What began as a "Google for Biographies" evolved into something that appeared extremely insidious. In that instance, they copied pages from USGenWeb sites and elsewhere, so it didn't make any difference if the original site moved or was deleted. The stuff they took remained on Ancestry.com until they removed the collection." It didn't just appear to be "extremely insidious"; it was extremely insidious. No matter that it was just biographies - again, plagiarism.

  7. Insider,

    You say "In my opinion, New FamilySearch Family Tree doesn't yet have what it will take to survive. I haven't seen any reason yet to believe they've figured out what (I think) has made Wikipedia successful."

    You don't say why you think Wikipedia is successful. One attractive feature is the strong inclination to at least cite sources for fact-based articles.

    The New FamilySearch Trees do not give evidence or even sources for anything.

    In this they are like the majority of trees on the web, and follow the horrendous trend in the items added to the IGI as well as the innumerable, largely erroneous, family group sheets that have been so widely copied into error-ridden family trees on the web.

    While some of the IGI data does come from records sources that are cited (albeit with many typographical errors and misreadings), the rest of the material has been good reason for some to call LDS the biggest recycler of genealogical error and poor genealogical research method.

    The Labs site is great concerning actual records, but there are many typos and misreadings there - and no way to alert the site operator as to these errors.

    Ancestry is catching up in volume in these respects, preferring to enter out-of-copyright books rather than records images. But at least Ancestry.com has a way to notify *volunteers* about misreadings and typographical errors in indexes.

  8. THE ANCESTRY INSIDER SAID: "What a mess. And so I suppose it goes across the width and breadth of the U.S. GenWeb Project."

    Why didn't you just go to the USGenWeb Home page and find the Essex County, Mass from there? That would make more sense than slamming USGenWeb with your previous statement.

    Obviously, you didn't go to the USGenWeb website first b/c you didn't even create a link until 3/4 of the way down the page.

    If you were supportive or a supporter of USGenWeb, you would #1 know the proper URL and #2 also take the time to update your OLD USGenWeb links to the appropriate new USGenWeb URLs.

    Updating old links has never been a favored task of any webmaster.


    It is not USGenWeb's responsibility to see to it that everyone who has OLD links *to them* become updated. I believe you are being unfair about the move - sour grapes if you will...

  9. No, it is not plagiarism to use content that is public record anyway. You can argue for the format of the transcription, the corollary content (remarks, addenda, footnotes and the like) but not the actual records themselves.

    I call them 'screaming freebies' - people who want everything free.

    Simply keep adding content available free. Don't worry about commercial exploitation. So what? As long as material is freely available at the rootsweb site & other volunteer sites (and that means getting out from under any site control by Ancestry and their ilk) and the free information continues to expand, I reiterate: so what if Ancestry and WVR and others harvest the info? It's a shame Ancestry moved to commercialize free access sites with ads, and it's a shame the 'screaming freebies' can't see that both sides are vital to genealogy. There is no way in blue blazes free volunteer work is going to put a tiny dent in digitizing records. Even with the huge dollar contributions of Ancestry and Family Search and all the hard working volunteers from GenWeb and private sites and blogs there are probably 10,000 sources out there for every one online. We need each other, people!
