Thursday, March 24, 2016

Full URLs in Citations?

Dorothea Lange, “Destitute pea pickers in California. Mother of seven children. Age thirty-two. Nipomo, California,” 1936

When citing a web page, one must decide whether to use the full URL of the page or the URL of the website’s home page. One usually cites the home page and includes the additional information necessary to guide users to the target page. Citing the full URL is a good alternative under two conditions: 1. The URL is long lived. 2. The URL is not too long. The longer the URL, the harder it is for a user to enter it without making a typographical error.1

How do you know if a URL is long lived?

URLs suffer from a process called link rot. For various reasons, they cease to work. Companies cease to exist or rename or reorganize websites. Some URLs are set to expire within minutes. Others never work anywhere but on your computer in your current browser. How might you know? Try copying and pasting the URL into a different browser. If it fails, you know right away it is not long lived.

For example, NARA included descriptive pamphlets at the beginning of their microfilm publications that sometimes contain rich information. Some are available online only through the NARA microfilm store. In the store, the product page for each microfilm contains a link (“View Important Publication Details”) to download the pamphlet. Unfortunately, the URL of the pamphlet (such as the one for M1328) is nearly impossible to obtain, is seven lines long, and won’t work again. And the URL of a product page expires immediately; even refreshing the page sends you back to the welcome page. The only way to access a pamphlet in the store is through a lengthy set of instructions.

Another class of URLs that fail to work is the URLs of records found using databases at your public library or through its website.

Some publishers provide URLs that they intend to work for a considerable amount of time. How long? Let’s say they will work almost to eternity. However, remember that “Internet time” runs much faster than regular time. “Eternity” is no more than 30 years away.

What are some of the systems and websites providing long lived URLs?

PURL and the GPO

The U.S. Government Publishing Office utilizes a system called PURL (persistent uniform resource locator) for some online publications.

As part of the online dissemination of Federal information, the FDLP uses persistent uniform resource locators (PURLs) to provide stable URLs to online Federal information. When a user clicks on a PURL, the request is routed to the Federal publication. As Federal agencies redesign and remove information from their sites, GPO staff reroute PURL entries to the appropriate location.2

For example, the PURL for the tri-fold brochure, USCIS Genealogy Program, is http://purl.fdlp.gov/GPO/gpo64668. When you enter that URL into your browser, the GPO server reroutes you to the current location of the brochure, wherever that might be. Similarly, http://purl.fdlp.gov/GPO/gpo26239 sends you to Guide to Tracing Your American Indian Ancestry. Apparently, GPO even supports some government publications on non-government websites. http://purl.fdlp.gov/GPO/gpo43102 sends you to a poster, National Atlas of the United States of America. Presidential Elections, 1789-2008, on the University of Iowa’s website.

The GPO PURL system will work if resources haven’t been altogether removed from the Internet and if GPO personnel have the time to update the links. I would use PURL links in a citation unless they were too long.

ARK and FamilySearch

FamilySearch provides long-lived URLs for its historical records, record images, IGI, and personal genealogies. Any URL containing “ark:” (archival resource key) or “pal:” (persistent archival link) is expected to work for a long time. I consider these safe to use in citations. Also, I think it is safe to remove the question mark and everything past it.
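As a rough illustration of the two checks just described (looking for “ark:” or “pal:” and dropping the question mark and everything past it), here is a minimal Python sketch. The function names and the sample URL in the usage note are made up for illustration; they are not part of any FamilySearch tool.

```python
from urllib.parse import urlsplit, urlunsplit

# Markers the article treats as signs of a long-lived FamilySearch URL.
PERSISTENT_MARKERS = ("ark:", "pal:")


def has_persistent_marker(url: str) -> bool:
    """Return True if the URL path contains 'ark:' or 'pal:'."""
    path = urlsplit(url).path
    return any(marker in path for marker in PERSISTENT_MARKERS)


def strip_query(url: str) -> str:
    """Drop the question mark and everything after it."""
    parts = urlsplit(url)
    # Keep scheme, host, and path; discard the query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

Given a hypothetical URL such as https://familysearch.org/ark:/61903/1:1:ABCD-123?cc=1234567, `strip_query` would return the same address without `?cc=1234567`, and `has_persistent_marker` would report that it contains “ark:”.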

URLs to collections, persons in Family Tree, photos, user uploaded documents, wiki articles, and other pages don’t contain the “ark:” characters so I don’t consider them long lived.

LOC Digital IDs and Handles

Online items on the Library of Congress website often have a permanent URL containing a digital ID.

To find a permanent URL for an item first look at the bottom of the item record. In some collections, you will find shorter permanent addresses in the "Digital ID" field of the item record. The URLs begin with "http://hdl..." and are called "handles" or "handle addresses."3

The URL https://www.loc.gov/item/mfd.45004/ currently leads to three death certificates from the Frederick Douglass family. But that URL may not work in the future. On that page one can find a digital ID in URL form: http://hdl.loc.gov/loc.mss/mfd.45004. If you use the digital ID URL, the LOC computers will interpret it and generate a URL that currently works. Go to it and you find yourself back at https://www.loc.gov/resource/mfd.45004. LOC has the latitude of changing the latter URL, but the digital ID URL is longer lived. I consider it safe to use in a citation.

The URL https://www.loc.gov/item/fsa1998021539/PP/ used to point to an instance of a famous Dorothea Lange photograph (shown at the top of this article).4 That link is broken now and I don’t know the digital ID, so I am unable to return to that webpage. You can see the original, unretouched photograph using digital ID URL http://hdl.loc.gov/loc.pnp/ppmsca.12883.

CONTENTdm and Reference URLs

CONTENTdm is software many universities use to display their digital collections. It has a reputation for links that fail. Let’s say I search the Robert Hawley Milne papers from Lewis University on the CARLI digital collections website and find the birth certificate of Flora Jane Putnam. The URL displayed by my browser is http://collections.carli.illinois.edu/cdm/singleitem/collection/lew_rhm/id/311/rec/4. It is not guaranteed to work if I change browsers or clear my cookies or use it tomorrow. If I poke around, I find a link labeled “Reference URL.” I click it and am rewarded with this URL: http://collections.carli.illinois.edu/cdm/ref/collection/lew_rhm/id/311. If you wish to share a URL to Flora’s birth certificate, share this one. But I wouldn’t use it in a citation. Why?

If an institution switches from CONTENTdm to another software solution, CONTENTdm reference links will break. This is true for the software systems employed by most universities and small- to medium-sized archives. That brings us full circle.
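For sharing (as opposed to citing), the relationship between the two CARLI URLs above can be sketched in a few lines of Python. The pattern is inferred only from that single pair of URLs; it is an assumption, not a documented CONTENTdm rule, and the “Reference URL” link on the page remains the authoritative source. The function name is hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Path shape seen in the browser URL of the CARLI example (an assumption
# that other CONTENTdm sites of the same vintage look the same).
_SINGLEITEM = re.compile(r"^/cdm/singleitem/collection/([^/]+)/id/(\d+)(?:/|$)")


def guess_reference_url(browser_url: str) -> Optional[str]:
    """Guess the Reference URL from a 'singleitem' browser URL.

    Returns None when the URL does not fit the assumed pattern.
    """
    parts = urlsplit(browser_url)
    match = _SINGLEITEM.match(parts.path)
    if match is None:
        return None
    alias, item_id = match.groups()
    return f"{parts.scheme}://{parts.netloc}/cdm/ref/collection/{alias}/id/{item_id}"
```

Applied to the browser URL for Flora’s birth certificate, this produces the same address the “Reference URL” link supplied. It does nothing to make that address survive a change of software.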

Conclusion

It is often better to cite a homepage and include information inherent to a digital artifact—information that is likely to survive a switch from one software solution to another. That information can then be used with the search function. Digital identifiers, titles, and author/creators are information likely to survive.

In the Flora Jane Putnam example, the digital artifact title is “Birth Certificate for Flora Jane Putnam” and the identifier is “Flora Jane Putnam Birth Certificate 1893.tif.” One or both of these are likely to survive. I could cite the certificate and the digital artifact like this:

Illinois Department of Public Health, certified copy of delayed record of birth no. 201472, Flora Jane Putnam (1893); Robert Hawley Milne Papers; Canal and Regional History Collection; Lewis University Library, Romeoville, IL; digital image, (http://www.lewisu.edu : accessed 18 March 2016), search the library’s Milne digital collection for "Flora Jane Putnam Birth Certificate 1893".

My personal practice is to use complete URLs sparingly and, if there is any doubt as to their persistence, include enough other information that a user can find the webpage even after the URL has rotted. 
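The decision described in this article can be summarized as a small Python sketch: use the full URL only when it looks long lived and is not too long; otherwise fall back to the site address and rely on descriptive information. The list of recognized hosts covers only the systems discussed here, the 80-character limit is an arbitrary number chosen for illustration, and the function names are hypothetical.

```python
from urllib.parse import urlsplit

MAX_URL_LENGTH = 80  # arbitrary illustrative limit for "not too long"


def looks_long_lived(url: str) -> bool:
    """Recognize only the persistent URL systems discussed in this article."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in ("purl.fdlp.gov", "hdl.loc.gov"):
        return True  # GPO PURLs and Library of Congress handles
    if host == "familysearch.org" or host.endswith(".familysearch.org"):
        return "ark:" in parts.path or "pal:" in parts.path
    return False


def url_to_cite(url: str, max_length: int = MAX_URL_LENGTH) -> str:
    """Return the full URL if it seems safe to cite, else the site root."""
    if looks_long_lived(url) and len(url) <= max_length:
        return url
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"
```

With these assumptions, `url_to_cite` keeps http://hdl.loc.gov/loc.mss/mfd.45004 intact but reduces the CONTENTdm reference URL to http://collections.carli.illinois.edu. The site root is only a starting point: as the Flora Jane Putnam citation shows, the best home page to cite may belong to the holding institution, and the citation still needs a title or identifier to search for.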

 


Portions of this article were adapted, with permission, from a post made on the BCG ACTION mailing list.

SOURCES

     1.  Elizabeth Shown Mills, Evidence Explained: Citing History Sources from Artifacts to Cyberspace, third edition, Adobe Digital Edition, (Baltimore, Maryland: Genealogical Publishing, 2015), 59, 269, 283, 597, 626, 767.
     2.  Federal Depository Library Program Persistent URL Home Page (http://purl.access.gpo.gov : accessed 19 March 2016).
     3.  “Frequently Asked Questions,” The Library of Congress: American Memory (https://memory.loc.gov/ammem : accessed 19 March 2016), Bookmarking [and] Linking.
     4.  Dorothea Lange, “Destitute pea pickers in California. Mother of seven children. Age thirty-two. Nipomo, California,” 1936; retouched photograph of Florence Thompson with left thumb removed, LC-USF34-T01-009058-C (b&w film dup. neg.); Farm Security Administration/Office of War Information Black-and-White Negatives collection; Prints and Photographs Division; Library of Congress, Washington, D.C.; digital image (http://hdl.loc.gov/loc.pnp/fsa.8b29516 : accessed 19 March 2016).
