Thursday, September 3, 2015

Artifact Citations

Sarah A. Skillin's Sampler

I came across a beautiful sampler on the Smithsonian’s website. I thought it would make a great example for an artifact citation. One possible format for an artifact reference note[a] is

     1.  Creator, title or description, artifact type, creation date, archival identification; archive name, archive location. Optional explanatory notes.

For the Smithsonian sampler the corresponding citation is

     2.  Sarah A. Skillen, “Sarah A. Skillin's Sampler,” sampler, 1835, id number 1983.0617.03, American Samplers collection; Smithsonian National Museum of American History, Washington, D.C. Gift of Mrs. Robert B. Stephens.

One principle of citation writing is that redundant information can be eliminated if the citation remains clear. Notice in note 2 that the title identifies the artifact as a sampler. There is no need to repeat it as the artifact type.

      3.  Sarah A. Skillen, “Sarah A. Skillin's Sampler,” 1835, id number 1983.0617.03, American Samplers collection; Smithsonian National Museum of American History, Washington, D.C. Gift of Mrs. Robert B. Stephens.

But this sampler is not available for public examination at the Smithsonian. Most of us can access it only via the high-quality image on the Smithsonian website. Derivative copies (be they images or textual) of artifacts accessed via a website require a layered citation.[b] In the case of a textual derivative, the citation layer for the online item precedes the citation layer of the original. In the case of a high-quality image derivative, the citation layer for the original typically precedes that of the derivative:

     4.  Citation to the original; citation to the online item.

The citation to the online item, devoid of provenance, would follow the pattern for a separately-authored chapter of a published book.[c]

     5.  Item creator, item title, item type, website title (URL : accessed date), navigation instructions.

For our sampler in particular, the citation to the online item could look like this:

     6.  Smithsonian Museum of American History, “Sarah A. Skillin’s Sampler,” digital image, Smithsonian: The National Museum of American History (http://americanhistory.si.edu/collections/search/object/nmah_1096007 : accessed 22 August 2015), click the thumbnail.

Put together, the citation looks as follows. I’ve color coded the layer containing the citation to the online item.

     7.  Sarah A. Skillen, “Sarah A. Skillin's Sampler,” 1835, id number 1983.0617.03, American Samplers collection; Smithsonian National Museum of American History, Washington, D.C.; Smithsonian Museum of American History, “Sarah A. Skillin’s Sampler,” digital image, Smithsonian: The National Museum of American History (http://americanhistory.si.edu/collections/search/object/nmah_1096007 : accessed 22 August 2015), click the thumbnail. Gift of Mrs. Robert B. Stephens.

Here we again invoke the principle of eliminating redundancy. The title of the artifact and the title of the online item are both “Sarah A. Skillin’s Sampler.” The latter can be dropped. The item creator, Smithsonian Museum of American History, is redundant with the title of the website, so the former can be dropped. That leaves:

     8.  Sarah A. Skillen, “Sarah A. Skillin's Sampler,” 1835, id number 1983.0617.03, American Samplers collection; Smithsonian National Museum of American History, Washington, D.C.; digital image, Smithsonian: The National Museum of American History (http://americanhistory.si.edu/collections/search/object/nmah_1096007 : accessed 22 August 2015), click the thumbnail. Gift of Mrs. Robert B. Stephens.

You’ll notice I didn’t drop the name of the artifact creator, Sarah A. Skillen, even though it appears in the artifact title. That was a judgment call. Does “Sarah A. Skillin's Sampler” mean Sarah created the sampler or merely owned it? I thought it ambiguous enough to warrant leaving Sarah’s name as artifact creator.
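For readers who keep citations in a spreadsheet or database, here is a minimal Python sketch of how the layers of note 8 could be assembled from separate fields. The function names `join_fields` and `layered_citation` are my own inventions for illustration, not part of any citation standard or software. A field set to `None` stands for an element dropped as redundant.

```python
def join_fields(fields):
    """Join the non-empty fields of one citation part with commas.

    A field that is None or "" is treated as dropped (for example, redundant).
    American style places the comma inside a closing curly quote.
    """
    kept = [f for f in fields if f]
    parts = []
    for field in kept[:-1]:
        if field.endswith("”"):
            parts.append(field[:-1] + ",”")
        else:
            parts.append(field + ",")
    parts.extend(kept[-1:])
    return " ".join(parts)


def layered_citation(layers, notes=""):
    """Join citation layers with semicolons, then append explanatory notes."""
    body = "; ".join(layer for layer in layers if layer)
    if not body.endswith("."):
        body += "."
    return f"{body} {notes}".strip()


artifact = join_fields([
    "Sarah A. Skillen",
    "“Sarah A. Skillin's Sampler”",
    None,  # artifact type dropped: the title already says "sampler"
    "1835",
    "id number 1983.0617.03",
    "American Samplers collection",
])
archive = join_fields([
    "Smithsonian National Museum of American History",
    "Washington, D.C.",
])
online = join_fields([
    None,  # item creator dropped: redundant with the website title
    None,  # item title dropped: same as the artifact title
    "digital image",
    "Smithsonian: The National Museum of American History "
    "(http://americanhistory.si.edu/collections/search/object/nmah_1096007"
    " : accessed 22 August 2015)",
    "click the thumbnail",
])

note_8 = layered_citation(
    [artifact, archive, online], "Gift of Mrs. Robert B. Stephens."
)
print(note_8)
```

Leaving the online layer out of the list gives the single-layer form of note 3. The judgment calls (what counts as redundant) still belong to the person writing the citation; the sketch only handles the punctuation.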

Reference note 8 uses the complete URL of the sampler. Citing the URL of the item versus citing the home page URL is another judgment call. Item URLs are often short-lived. Home page URLs typically remain valid longer. Another way to cite the image is to use the home page and instructions on how to navigate from the stated URL to the image. Here are some alternatives:

     9.  Sarah A. Skillen, “Sarah A. Skillin's Sampler,” 1835, id number 1983.0617.03, American Samplers collection; Smithsonian National Museum of American History, Washington, D.C.; digital image, Smithsonian: The National Museum of American History (http://americanhistory.si.edu/ : accessed 22 August 2015), search for “Sarah A. Skillin's Sampler.” Gift of Mrs. Robert B. Stephens.

   10.  Sarah A. Skillen, “Sarah A. Skillin's Sampler,” 1835, id number 1983.0617.03; Smithsonian National Museum of American History, Washington, D.C.; digital image, Smithsonian: The National Museum of American History (http://americanhistory.si.edu/ : accessed 22 August 2015), path: Collections > Object Groups > American Samplers > Sarah A. Skillin's Sampler. Gift of Mrs. Robert B. Stephens.

   11.  Sarah A. Skillen, “Sarah A. Skillin's Sampler,” 1835, id number 1983.0617.03; Smithsonian National Museum of American History, Washington, D.C.; digital image, Smithsonian: Seriously Amazing (http://www.si.edu/ : accessed 22 August 2015), search for “American Samplers.” Gift of Mrs. Robert B. Stephens.

Note 9 cites the home page of the National Museum of American History and navigates via a search. Note 10 navigates from that same page, but using a click path. Note 11 cites the home page of the Smithsonian itself and navigates via a search. In this last case, interestingly, the title of the website changes. In all three notes, I dropped the instruction to “click the thumbnail.” This is another judgment call. Another principle of citation creation allows you to leave out information that is common knowledge. I decided that everyone knows that you click a thumbnail to get the full-size image.

One of the lessons to be learned here is that there is leeway in citation format.

What if this sampler were in a private collection? The basic citation format of note 1 can be adapted for private ownership. Without archival identification, the item must be described in greater detail. And describing the provenance is at least as important as it is for an artifact under the control of a trusted archive. Since artifacts under private ownership are transitory, it is helpful to know what year the artifact was at the stated location.[d]

   12.  Creator, generic description, artifact type, creation date, collection identification; privately held by owner, owner’s location, access year. Explanatory notes including greater description and provenance.

If I owned the sampler and gave you access to it sometime in 2015, the note could look like this:

   13.  Sarah A. Skillen, Simeon and Nancy (Adams) Skillen family sampler, 1835, artifact collection; privately held by Ancestry Insider, [address for private use,] Salt Lake City, Utah, 2015. The names and birth dates of Simeon, Nancy, and children, and the death date of Silas are embroidered, framed by flowering vines wrapped around columns, on a linen cloth measuring 21 1/8 x 17 5/8 inches. The present owner obtained the sampler under dubious circumstances from the Smithsonian National Museum of American History, which received it as a gift in 1982 from Mrs. Robert B. Stephens, Potomac, Maryland. It is unknown how Mrs. Stephens obtained it.

So, what do you think? Questions? Comments? How would you have cited it?


Sources

     a.  Elizabeth Shown Mills, Evidence Explained: Citing History Sources from Artifacts to Cyberspace, 3d ed. (Baltimore, Maryland: Genealogical Publishing, 2015), 124-5.
     b.  Mills, Evidence Explained: Citing History Sources, 58. Elizabeth Shown Mills, “QuickLesson 19: Layered Citations Work like Layered Clothing,” Evidence Explained: Historical Analysis, Citation & Source Usage (https://www.evidenceexplained.com/content/quicklesson-19-layered-citations-work-layered-clothing : accessed 22 August 2015).
     c.  Mills, Evidence Explained: Citing History Sources, 57.
     d.  Ibid., 138-9.

Wednesday, September 2, 2015

Guiding Principles for Cleaning Up Messes in Family Tree – #BYUFHGC

Ben Baker gave guiding principles about cleaning messes in FamilySearch Family Tree

This is the second of two articles about Ben Baker’s presentation at the 2015 BYU Conference on Family History and Genealogy. Ben’s topic was “Help! My Family is all Messed Up on FamilySearch Family Tree.” His slides and syllabus are available at http://www.slideshare.net/bakers84/help-my-family-is-all-messed-up-on-familysearch-family-tree and http://www.slideshare.net/bakers84/help-my-family-is-all-messed-up-on-familysearch-family-tree-handout, respectively.

Ben presented a list of guiding principles to use when cleaning up messes in Family Tree.

Play Nice With Others

Remember this is a shared tree. Don’t be too bullheaded. Apologize when you’ve messed up. Be nice in how you approach people. When people mess up, it’s generally because they don’t realize what they are doing. Some users delete people thinking they are operating in a private tree.

Watch out for mytreeitus. Ron Tanner came up with the term; Ben Baker gave a dictionary-like definition:

mytreeitus \mī-trē-ˈī-təs\ (noun)
An inflammation common to many genealogists,
particularly heavy users of PAF. Symptoms include
extreme anxiety over others modifying their extensive
genealogical research, possessiveness of ancestors,
unwillingness to work in collaborative family trees and
disregard for others when removing erroneous
persons from their family. Usually occurring in more
mature adults and rarely seen in those under 40.
[Ouch! Ben didn’t score any points with his largely older-than-40 audience.]
Learning to use FamilySearch Family Tree has been
shown to be an effective treatment for this affliction.

Make your email address public. To do so, click on your name in the upper-right corner of the screen. Click Settings. Click Contact. Enter your email address and check the Public box next to it. There is a messaging system coming soon that will allow you to send messages to others, even if their email address is not public. [Since the conference, that feature has been released.]

Draw Pictures and Take Notes

Most of the problems that Ben runs into are messed up families. To help sort things out, draw a picture showing the relationships as they should be. Here’s a diagram with a father who fathered his first child with his first wife and his second child with his second wife:

One of Ben's diagrams showing relationships

Pay attention to the PIDs. Each record has a PID. If a person has two different PIDs, then there are two different records that need to be merged. If two different persons have the same PID, then they aren’t really two at all. They are merely showing up twice in the same diagram. I’ve created an example, below. While Imaginary Child (LKPR-R95) and Imaginary Child (LKPR-R9N) are the same person, there are two PIDs. That means there are two records that need to be merged. Also notice that there are two of Imaginary Child (LKPR-R9N). By paying attention to the PIDs, we see that there are not really two; it is the same record showing up twice.

An imaginary family showing 1 person with two PIDs and one person shown in two places
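The PID check can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not a FamilySearch tool. The child's PIDs come from my example; the parents' PIDs and the function name `review_pids` are made up.

```python
from collections import Counter, defaultdict

# Entries as they might be read off a family diagram: (displayed name, PID).
diagram = [
    ("Imaginary Father", "XXXX-001"),  # hypothetical PID
    ("Imaginary Mother", "XXXX-002"),  # hypothetical PID
    ("Imaginary Child", "LKPR-R95"),
    ("Imaginary Child", "LKPR-R9N"),
    ("Imaginary Child", "LKPR-R9N"),   # same record, shown a second time
]


def review_pids(entries):
    """Return (names that have several PIDs, PIDs that appear more than once)."""
    pids_by_name = defaultdict(set)
    appearances = Counter()
    for name, pid in entries:
        pids_by_name[name].add(pid)
        appearances[pid] += 1
    merge_candidates = {
        name: sorted(pids) for name, pids in pids_by_name.items() if len(pids) > 1
    }
    shown_twice = sorted(pid for pid, count in appearances.items() if count > 1)
    return merge_candidates, shown_twice


merge_candidates, shown_twice = review_pids(diagram)
print(merge_candidates)  # {'Imaginary Child': ['LKPR-R95', 'LKPR-R9N']}
print(shown_twice)       # ['LKPR-R9N']
```

A shared name only makes two records candidates for merging. You still have to compare the records yourself to decide whether they describe the same person.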

To keep track of things, open up multiple browser tabs. To open a new tab or window when clicking a link, use a middle click or a right click of your mouse [or hold down the control-key while clicking].

If you are really worried about how to do things, try things out on http://beta.familysearch.org. Beta has almost the same information as the real Family Tree, but changing things on beta doesn’t change the real tree. If you are uncertain how to go about making a change, go over there and try things out. FamilySearch also tests new features there. To see features that might be coming, you can go over there every once in a while and see what looks different.

Family Tree has two relationship types: parent-child and couple. FamilySearch developers call a parent-child relationship a tertiary relationship because there are three people involved: a father, a mother, and a child. Family Tree uses the same innards for a single parent situation, but leaves one parent empty.

Two relationship types in Family Tree

A married couple with one child is represented in Family Tree with two relationships: a couple relationship (because of the marriage) and a parent-child relationship. Ben showed the screen snippet, below, with little icons overlaid showing the couple relationship and the parent-child relationship. To edit or delete the couple relationship, click the pencil icon to their right. To edit or delete the parent-child relationship, click the pencil icon to the child’s right.

Parents and child with relationship icons overlaid

Let me make an aside here. A nuance sometimes lost on people is that there can be a parent-child relationship with parents who don’t have a couple relationship with each other. The biological father might be nothing more than a sperm-donor, for example. In the Imaginary family, above, there is no couple relationship between Imaginary Father and Imaginary Mother. Instead of showing a marriage date between them, Family Tree shows a link to “Add Couple Relationship.”

We return now to Ben’s presentation, already in progress...

“Let me reiterate! Above all! DO NOT CLICK THERE!” [Oops. Makes me wish I had been listening. Oh well.]

Ben showed a family not unlike the imaginary family I showed previously. Imaginary Child (LKPR-R9N) is shown once with both parents and once with just his father. This is a common scenario. Ben asked attendees how to fix it. One suggestion was to add the missing mother. That was not the correct answer. The child is part of two parent-child relationships. The first parent-child relationship has both parents. The second parent-child relationship has just the father. It is incomplete and unnecessary; delete the extra relationship.
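Here is a rough Python sketch of that scenario. The class `ParentChild` and the function `redundant_relationships` are my own illustrations and say nothing about how Family Tree actually stores its data. The sketch models a parent-child relationship with two parent slots, either of which may be empty, and flags a relationship as redundant when a more complete relationship for the same child already covers it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParentChild:
    """One parent-child relationship; either parent slot may be empty."""
    child: str
    father: Optional[str] = None
    mother: Optional[str] = None

    def filled(self):
        """Number of parent slots that are not empty."""
        return sum(p is not None for p in (self.father, self.mother))


def redundant_relationships(relationships):
    """Find relationships fully covered by a more complete one for the same child."""
    redundant = []
    for rel in relationships:
        for other in relationships:
            if rel is other or rel.child != other.child:
                continue
            father_ok = rel.father is None or rel.father == other.father
            mother_ok = rel.mother is None or rel.mother == other.mother
            if father_ok and mother_ok and rel.filled() < other.filled():
                redundant.append(rel)
                break
    return redundant


# Names stand in for PIDs to keep the example readable.
both_parents = ParentChild("Imaginary Child", "Imaginary Father", "Imaginary Mother")
father_only = ParentChild("Imaginary Child", "Imaginary Father")

print(redundant_relationships([both_parents, father_only]) == [father_only])  # True
```

The sketch flags the relationship, never the person, which matches the advice that follows. A father-only relationship is not always a mistake, so treat the result as a prompt to look, not an instruction to delete.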

Deleting a person, on the other hand, is rarely the right thing to do. When there is an extraneous person in a family, don’t delete him, delete the relationship. (This makes sense when you think about it. Family Tree is intended to be the family tree of all mankind. Everyone who ever lived needs to be in there. Keep the person, just get him out of the family.)

There are probably only two times when you should delete a person: If you find a fictitious person such as the god Odin or Mickey Mouse, you should delete him. Or if you have just barely added a person and realize that was a dumb thing to do, delete him. In fact, FamilySearch will soon make changes so the latter condition is the only one in which you can delete a person. For a fictitious person, you’ll have to call support and ask them to delete him.

“I think delete person is evil, personally,” Ben said only half-jokingly. “It’s doing really bad stuff in the tree.” Deleting a real person can be a double-whammy (my description, not Ben’s). When you search for a person in the tree, including spouse and parent names is very powerful. When you delete a person’s spouse or parent, that person becomes harder to find. If the person is left with absolutely no relationships, they may never be found again. FamilySearch employees call such persons “dark matter.”

I had to leave early, so I didn’t get to hear the remainder of Ben’s presentation. I’m guessing he didn’t have time to finish all the material he prepared, but it is covered in his slides and syllabus. Let me call out a few more guiding principles:

  • Base your actions on verifiable sources.
  • Provide good reason statements.
  • Act on icons to achieve regular, small successes with the possibility of adding new persons to the tree.
  • Contact support when you need to and ask to escalate if necessary.
  • Report abuse if you believe someone is purposefully destroying data.
  • Use the Watch List more effectively.
  • Learn to understand and use the Change Log better.
  • Read, maybe even subscribe to, the blog.
  • Embrace change.
  • Realize that some things are not fixable yet.

Well, that’s it for this year’s BYU Conference on Family History and Genealogy! It only took me a month to cover the small part of it that I attended. I leave you with this photo of conference bloggers, Jana Last, the Ancestry Insider, and Lynn Broderick.

2015 BYU conference bloggers, Jana Last, Ancestry Insider, and Lynn Broderick
Photo credit: random passerby.

Tuesday, September 1, 2015

My Family is all Messed Up on FamilySearch Family Tree – #BYUFHGC

Ben Baker addressing the 2015 BYU Conference on Family History and Genealogy

Ben Baker spoke at the 2015 BYU Conference on Family History and Genealogy. His topic was “Help! My Family is all Messed Up on FamilySearch Family Tree.” Ben’s presentations are always packed with useful information and this was no exception. Fortunately, he posts his slides. You can see them for yourself at http://www.slideshare.net/bakers84/help-my-family-is-all-messed-up-on-familysearch-family-tree. This is the first of two articles recounting his remarks.

FamilySearch Family Tree is somewhat like a wiki. Anyone can make a change. Everyone sees the changes. It is maintained by volunteers. It’s free. It reduces duplication and encourages collaboration. Your research outlives you. You can link to photos, stories, and sources.

Ben posed the question, “If collaborative family trees are so great, how come everything is so messed up?” To begin with, Family Tree was created from multiple kinds of sources. And Family Tree has imperfect patrons. It astounds him how “creative” people are when they make changes. “People do really crazy things. It never ceases to amaze me,” he said. The third factor is that FamilySearch has done things in the past to try to clean things up, and has sometimes made them worse.

There are three special usernames that frustrate users when they show up as a contributor in Family Tree. They sometimes introduce or re-introduce errors.

FamilySearch

This value means that a FamilySearch administrator, or an automated FamilySearch tool, has changed the information. This happens when someone at FamilySearch is fixing problems that can’t be fixed in any other way.

unknown4470317

This value indicates that Family Tree doesn’t know who the contributor was. On the slides Ben gave Pedigree Resource File contributions as an example. In his presentation, he mentioned the old four generation program (by which, I suppose, he meant Ancestral File). I don’t think either of those is correct. I think Family Tree doesn’t know the identity of some contributors to the International Genealogical Index. When FamilySearch keyed in paper submissions to the IGI, they didn’t key in contributor or source information. This value exists for original contributors only; current contributors are all known.

LDS Church Membership

This value means that FamilySearch brought the information into Family Tree from the Church membership system. FamilySearch synchronizes Family Tree with the Church’s membership database on a regular basis.

When you call support, you get different tiers. The first tier consists of volunteer missionaries. They can escalate to higher tiers. One of the higher levels is the Data Quality team. They can escalate bugs to the software developers; that’s when Ben would get involved. Ask support to escalate if the first tier is not able to solve your issue.

But things are getting better. There are hundreds of millions of sources attached to Family Tree. That is stabilizing things because people are less likely to make changes when there are lots of sources. People are merging duplicates; there are 40,000 merges per day and it has been as high as 50,000. Another sign that things are getting better is the reduced number of times that people undo merges. In the New FamilySearch tree, for every four combines, there was one separate. That was probably a sign that people were making incorrect combines. Today, the ratio of merges to restores is about 30 to 1. Ben takes that as evidence that users feel like most merges are correct. And there are few reports of “edit wars.” That’s when two people disagree about a fact and constantly change it back and forth. There are some. Click the report abuse button if it is happening.
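To put those ratios in perspective, here is a quick back-of-the-envelope calculation in Python. It assumes the ratios mean "one undo for every N merges" and that the 30-to-1 ratio applies to the 40,000 merges per day. Both are my reading of Ben's numbers, not something he stated.

```python
def undo_share(merges_per_undo):
    """Fraction of merges later undone, given N merges for every one undo."""
    return 1 / merges_per_undo


old_share = undo_share(4)    # New FamilySearch: four combines per separate
new_share = undo_share(30)   # Family Tree: about thirty merges per restore

print(f"{old_share:.0%}")    # 25%
print(f"{new_share:.1%}")    # 3.3%

# If the 30-to-1 ratio holds at 40,000 merges per day:
restores_per_day = 40_000 / 30
print(round(restores_per_day))  # 1333
```

In other words, the share of merges that get undone has dropped from roughly one in four to roughly one in thirty.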

We’ll know Family Tree has “arrived” when it is the first place to go to find out about a historical person. “That’s not true today,” Ben admitted. We want people to say, “Wow, this is amazing. Why would I want to go make my own tree somewhere else?”

Stay tuned for more…

Powerful FamilySearch Partner Apps – #BYUFHGC

Jimmy Zimmerman presenting at RootsTech 2015

“Have you ever said to yourself, ‘If only FamilySearch would do this one thing?’ ” asked Jimmy Zimmerman, product manager for FamilySearch Family Tree. Jimmy spoke to the topic “Powerful Partner Apps for FamilySearch” at the 2015 BYU Conference on Family History and Genealogy.

“There are an infinite number of ideas out there,” Jimmy said, “and FamilySearch has finite resources.” But what if others could add features? Well, FamilySearch has something called an API which allows that.

Diagram showing arrows between apps, through the Internet, to the FamilySearch API

[Insider’s note: An API is like a wall with holes in it set aside for particular actions. An app or website writes information on a piece of paper and, holding the paper in hand, sticks its hand through a specific hole in the wall. On the other side FamilySearch notices the hand sticking through the wall, reads the information on the piece of paper, writes a reply, and shoves the hand back through the wall. For example, an app might write a person identifier (PID) on a piece of paper and stick it through a hole labeled “fetch information about a person in Family Tree.” FamilySearch writes the information on the piece of paper and shoves the hand back through the hole.]
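For the programmers in the audience, the wall analogy can be acted out in a few lines of Python. Everything here is a toy: the `Wall` class, the hole label, and the one-person "tree" are invented for illustration and bear no relation to the real FamilySearch API or its rules.

```python
class Wall:
    """A toy stand-in for an API: a wall with labeled holes."""

    def __init__(self):
        self._holes = {}

    def add_hole(self, label, handler):
        """The side behind the wall decides which holes exist and what each does."""
        self._holes[label] = handler

    def reach_through(self, label, paper):
        """An app passes a piece of paper through a hole and gets a reply back."""
        if label not in self._holes:
            raise KeyError(f"no hole labeled {label!r}")
        return self._holes[label](paper)


# Made-up data standing in for Family Tree.
toy_tree = {"LKPR-R9N": {"name": "Imaginary Child"}}


def fetch_person(pid):
    """Read the PID on the paper and write back what is known about it."""
    return toy_tree.get(pid, {"error": "not found"})


FETCH = "fetch information about a person in Family Tree"
wall = Wall()
wall.add_hole(FETCH, fetch_person)

reply = wall.reach_through(FETCH, "LKPR-R9N")
print(reply)  # {'name': 'Imaginary Child'}
```

The app never sees what is behind the wall. It can only use the holes that have been provided, which is how an API lets outsiders add features without touching the data directly.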

To use the API, companies must adhere to a strict set of rules. These are designed to protect the integrity of data in FamilySearch Family Tree and to guarantee best security practices. The rules are so voluminous they are jokingly referred to as “the tax code.” In the FamilySearch App Gallery, each app page indicates capabilities that the app can exercise within the information at FamilySearch.org. Writing and modifying Family Tree requires far more rules than just reading Family Tree.

Jimmy talked about finding available apps in the App Gallery. If you can’t find a way to get to the App Gallery, you can always go to FamilySearch.org/apps. Find apps by searching for the name or description, specifying category, filtering by platform (Windows, iPhone, web, etc.), price option (free, purchase, or subscription), free trial availability, language, FamilySearch capability (read-only, update), and if a FamilySearch login is required.

Some apps are listed without any certification. According to Jimmy, these have been found to be so helpful, FamilySearch lists them despite the lack of certification. He pointed out Ancestry.com’s Family Tree Maker as one example. An audience member asked when MyHeritage will be interacting with FamilySearch Family Tree. Jimmy said that while he couldn’t say, he could tell us it is in progress.

Users can rate apps and write reviews. Please leave reviews. It helps others find the really good apps and it encourages the developers to improve. If you find problems with an app, first contact the company. App reviews may not be a fair place to report problems, as the problems might actually be a FamilySearch API issue.

Some apps with high ratings are:

Jimmy demonstrated a few of the apps. Kinpoint was one that I had not seen before.

Explore Chart of Kinpoint.com

Kinpoint.com displays a fan chart, or Explorer Chart as they call it. Dots on the Explorer Chart are like a to-do list. They mark things like missing vital information, timeline issues, duplicates, lacking sources, and record hints, although some of these are available only with a subscription. The pane on the left displays information about the focus person. A summary pane on the right-hand side shows interesting facts about the persons displayed in the Explore chart, such as the number of countries of origin, number of children per family, youngest and oldest ages, and range of birth dates. Most of these are available only by subscription. Facts can be used to highlight persons on the chart according to available filters. For example, you could see all ancestors highlighted who were 25-30 years of age at the time of their death. The chart can show ancestors or descendants. The subscription features are available for free in a Family History Center.

Jimmy showed MooseRoots.com, a website with census and vital records. MooseRoots is a new company in the family history space, but has its roots in the ability to pull together lots of information. [Insider’s note: The parent company is the newly named Graphiq, a data visualization company, with many vertical search engines.] For example, their census records are married to aggregate census statistics, name origins and meanings, historical stock performance, historical place information, and economic data. [Insider’s note: Some of their data looks pretty rough, like the WWII army enlistment records. The first five names from Cache County, Utah are Edson Bcnson, On Roy Pehr, Meroill W Glevn, Grant C Jarsvn, and Eewzp Thompkwo. If I had to guess, I would say they used OCR on a typed or printed source. No images were available.]

Jimmy wanted to show us their Civil War Soldiers collection, but couldn’t find the link to it. I stumbled across it at http://civil-war-soldiers.findthebest.com/ after a lot of poking around. Graphiq has married the standard Civil War Soldiers database with information about the infantry, battles, and casualty counts.

The same section of the Graphiq website contains information about battles, generals, sailors, and war statistics. They credit the National Park Service for the data and Hal Jespersen (www.cwmaps.com) for the maps.

Monday, August 31, 2015

Monday Mailbox: How Fast Was the 1860 Census Indexed

Howland Davis sent a question in response to my article, “FamilySearch Indexing Not Keeping Up.”

Dear Ancestry Insider,

Interesting article, thank you.  I have a question about the comparison of the indexing the 1860 and the 1940 censuses.  I am fairly sure that the 1940 index was completed 1650 days after its release in 2012.  Was the 1860 census indexed 17 years after its release in 1932(?) or did the work start some years after that?

Just curious, not important.

Howland Davis

Dear Howland,

Ooooh. Something shiny.

It took Ancestry.com four months and one day to finish its 1940 index. (See my article of 6 August 2012, “Census Indexing Update: And It’s Over.”) FamilySearch published the 50 states a while later, but I think it took them a considerable amount of time to finish the territories.

I believe the first large-scale effort to index the U.S. censuses was made by Ronald Vern Jackson and Accelerated Indexing Systems (AIS) in the late 1970s through the early 1990s. I believe he indexed heads-of-households only, and just the names, so the amount of work was more manageable. These were true indexes, not the census databases we use today. Where did he get his keyers? Does anyone know? He published the indexes as bound books of computer printouts.

A page from the 1976 AIS index to the Louisiana 1820 census
Ronald Vern Jackson, et al., eds., Louisiana 1820 Census Index (Bountiful, Utah: Accelerated Indexing Systems, 1976), 1.

According to Thomas Jay Kemp’s The American Census Handbook (Wilmington, Delaware: Scholarly Resources, 2001), here are the publication years for a sampling of states:

Census   State: Publication year
1790     New York: 1990
         Ohio: 1984
1800     Ohio: 1986
         Vermont: 1976
1810     Virginia: 1978
1820     Iowa: 1977
         Indiana: 1976
1830     Indiana: 1976
1840     Iowa: 1979
1850     Iowa: 1976
1860     Iowa: 1987
         North Dakota: 1980
         Virginia: 1988
         Washington: 1979
1870     Iowa: 1990

Notice all were done after the widespread availability of computers.

In 1984 AIS published on microfiche what it had completed. Ancestry.com published AIS indexes online in 1999.

Some limited scope indexes were published earlier. For example, in 1964 the Ohio Library Foundation published an index of the 1830 Ohio census. This, too, was a computer printout. Volunteer family historians extracted the names of heads of households onto index cards. The cards were keyed onto punch cards, which were then sorted by an IBM mainframe computer.

A page from the Ohio Library Foundation's 1964 index of the 1830 Ohio census
Ohio Library Foundation, ed., 1830 Federal Population Census Index, vol. 1 (Columbus, Ohio: Ohio Library Foundation, 1964), 1.

So the answer to your question is that indexing the 1860 census took about a decade and was finished around 1990.

Signed,
---tai

Thursday, August 27, 2015

The Future Will Bring Automated Indexing Tools – #BYUFHGC

Jake Gehring presenting at the 2015 BYU Conference on Family History and Genealogy

“It’s not that we don’t like our [indexing] volunteers,” said Jake Gehring. “We would just rather have them work on things that only [humans] can do.” Jake is director of content development for FamilySearch and presented at the BYU Conference on Family History and Genealogy last month. This article is the third and last article about his presentation. In the first article I reported on Jake’s premise that FamilySearch Indexing is not keeping up with the number of records FamilySearch is acquiring and additional means are needed. In the second article I reported about two of those means: increasing the efficiency of human indexers and working with commercial partners. In today’s article I will report on the third means: increased automation via computers.

In the third part of his presentation, Jake spoke about “the really far-out stuff, HAL9000 kind of stuff.”

Jake showed a screen shot that we saw in Robert Kehrer’s keynote. (See “Kehrer Talks FamilySearch Transformations” on my blog.) The screen showed a color-coded obituary.

Obituary with parts of speech color coded by FamilySearch automated obituary indexing system

FamilySearch trained a computer to identify the different parts of speech. They trained the computer how to discern meaning out of a bunch of words. Notice in the example above that names of people are identified in dark green, places in brown, dates in dark blue, relationships in salmon, events in pale green, clock times in a steel blue (or would you call that a dark sky blue?), organizations in red, and buildings in goldenrod (or would you call that a mustard?).

They basically teach the computer to read. The computer is willing to extract a lot more detail from an obituary than a volunteer can easily do. And it can work really, really fast. For obituaries, computers can do in about a week and a half what it takes all of FamilySearch’s volunteers three and a half years to do. This is why in a few weeks FamilySearch is going to stop having volunteers index the current obituary project. In fact, FamilySearch has already published about 37 million obituaries this way. You may already have found and used an obituary that was indexed by a smart computer.
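How much faster is that? A rough calculation, assuming 52 weeks per year and taking Jake's round figures at face value:

```python
WEEKS_PER_YEAR = 52  # rounded; close enough for a rough comparison

volunteer_weeks = 3.5 * WEEKS_PER_YEAR  # three and a half years of volunteer work
computer_weeks = 1.5                    # about a week and a half of computer time

speedup = volunteer_weeks / computer_weeks
print(round(speedup))  # 121
```

So for obituaries the computers are on the order of 120 times faster than the entire volunteer workforce.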

This applies to obituaries published since about 1977. Since that time, most obituaries have been published and stored digitally. Before 1977, things look a lot different. Because the obituaries are not already digital, it is a pretty nasty OCR problem. [OCR converts the printed page to text so that the computer can subsequently try to make sense of it.] The problem is so severe, computers can recognize only about half of the words in pre-1900 newspapers.

If you were at RootsTech you may have seen the last thing Jake showed. A company named Planet entered its ArgusSearch into the Innovator Challenge. ArgusSearch is a system that reads the handwriting of documents that have not been indexed. You type in something like “Steinberg” and the program shows some records that might match that name. It won’t find all the matches. And it may return some results that aren’t matches. But this is still useful. This technology is still young, but an application like this is likely to hit real life in the next ten years.

Planet's ArgusSearch automatically read handwritten names in census records without an index.

Jake summarized by saying that while indexing is going really well, never better, it is unfortunately just not good enough to give us all the records we need. [FamilySearch does not index all the records they acquire.] “We need to do much better. It’s not that we are not quite there; we are way behind and getting further behind every year,” he said. There are three areas that FamilySearch needs to utilize. FamilySearch needs to increase the efficiency of its indexing volunteers. FamilySearch needs more help from for-profit publishers who can bring more resources to the table. And FamilySearch needs to use computer technology to make images searchable with little or no human intervention.

“It’s an exciting time to be alive. Can you imagine the explosion of document availability once we make a bit more headway in a few of these areas?”

Jake took a couple of questions:

Q. How easy is it to use tools like Google Translate to translate Spanish records?

A. Google Translate is better at modern, generic words. If you type in the text of a letter, you would be able to get the gist of it, but it may not handle archaic words or words specific to a vital record. As long as you know a small set of terms, you can usually get by without a computerized translator. There is no magic tool currently available.

Q. Why do we sometimes key so very little from a record? While we have someone looking at the document, shouldn’t they be extracting more?

A. Because we publish both indexes and images, we index the minimal amount necessary to find the image. Why index something that no one will ever use in a search? Cook County, Illinois death certificates are an example where we indexed something that didn’t need to be. We indexed the deceased’s address, but who will ever search using the address? Sometimes we don’t get it quite right, but that’s the general principle.

Q. When will we be able to correct published indexes?

A. After ten years of this being among the top three requested features, we’re starting to implement the feature that will allow you to contribute corrections. We are rapidly approaching the point when this will be available. I’m not really authorized to say “soon,” but we have our eyes on that feature.