The Ancestry Insider: Robert Kehrer’s Industry Trends and Outlook

Thursday, February 16, 2017

Robert Kehrer’s Industry Trends and Outlook – #RootsTech

Robert Kehrer, product manager at FamilySearch, took part in a panel discussion titled “Industry Trends and Outlook” at the Innovators Summit portion of RootsTech 2017. Robert wrestles with big data technology problems at FamilySearch.

One of the hardest things Robert faced in preparing his presentation was narrowing down the areas that he wanted to talk about. He narrowed things down to three categories of innovation: technology, process, and data.

The first technology innovation he sees coming is automated transcription, the ability of a computer to transcribe a document. There have been some recent advances, particularly in the area of handwriting recognition. Today automated transcription works well on typescript documents and pretty well on print handwriting. The ability to recognize cursive writing is showing promise. However, some documents are so messy that automated transcription is not likely to handle them.

Another area where technology innovation is happening is named entity recognition. A computer takes transcribed text and, using a process called natural language processing, picks out the names, dates, locations, relationships, and so forth. Progress is being made in this area.
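To make the idea concrete, here is a toy sketch of pulling names, dates, and places out of transcribed text. It uses hand-written patterns as a stand-in for a real natural language processing model, so it is far cruder than what Robert described. The sample sentence, the patterns, and the function name extract_entities are all assumptions made up for illustration.

```python
import re

# Hypothetical patterns; a real system would use a trained NLP model instead.
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE_RE = re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")
# Two or more capitalized words in a row are treated as a personal name.
NAME_RE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")
# A capitalized word (or run of them) right after "in" is treated as a place.
PLACE_RE = re.compile(r"\bin ([A-Z][a-z]+(?: [A-Z][a-z]+)*)")


def extract_entities(text):
    """Return the names, dates, and places found by the toy patterns."""
    return {
        "names": NAME_RE.findall(text),
        "dates": DATE_RE.findall(text),
        "places": PLACE_RE.findall(text),
    }


if __name__ == "__main__":
    # Made-up sentence, not taken from any real record.
    sample = "William Smith was born on 4 March 1851 in Boston to John Smith."
    print(extract_entities(sample))
```

Patterns like these break down quickly (a two-word place such as a city name would be mistaken for a person), which is one reason the field relies on statistical models rather than fixed rules.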

Innovation is happening in neural networks and machine learning, which are important in combination with automated transcription and named entity recognition. Machine learning is not difficult to understand when demonstrated with a simple example. Suppose the machine is shown many images of the name William. Subsequently, when other names are shown to the machine, it can pick out those that are William.
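The William example can be sketched in a few lines. This is a minimal illustration, not how FamilySearch does it: it assumes each image has already been reduced to a short list of numbers (a hypothetical feature vector), and it uses a nearest-centroid classifier rather than a neural network. The numbers and labels below are invented, with the labels standing in for what human indexers would supply.

```python
import math


def centroid(vectors):
    """Average a list of equal-length feature vectors, column by column."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(labeled_examples):
    """Build a model mapping each label to the centroid of its examples."""
    groups = {}
    for vector, label in labeled_examples:
        groups.setdefault(label, []).append(vector)
    return {label: centroid(vectors) for label, vectors in groups.items()}


def predict(model, vector):
    """Return the label whose centroid is closest to the given vector."""
    return min(model, key=lambda label: math.dist(model[label], vector))


if __name__ == "__main__":
    # Invented feature vectors standing in for name images.
    training = [
        ([0.9, 0.8, 0.1], "William"),
        ([0.8, 0.9, 0.2], "William"),
        ([1.0, 0.7, 0.1], "William"),
        ([0.1, 0.2, 0.9], "other"),
        ([0.2, 0.1, 0.8], "other"),
        ([0.3, 0.3, 0.7], "other"),
    ]
    model = train(training)
    print(predict(model, [0.85, 0.75, 0.15]))
    print(predict(model, [0.2, 0.2, 0.9]))
```

The "training" step is nothing more than averaging labeled examples, which shows why the quality of the labels matters so much.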

Don’t think that these technologies are going to replace human indexers. These technologies must be trained using data indexed by people. And these technologies free up people to do what only people can do.

Innovation is happening in fuzzy search advancements. Fuzzy is a funny word that he used to refer to non-exact search results. This is familiar stuff like wildcards and name variants. Robert feels like there could be some innovation here less complicated than an artificial intelligence hint matching system but more sophisticated than the search engines of today.
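For readers who have not seen how non-exact search works, here is a small sketch of the two familiar techniques mentioned above, wildcards and name variants, using only Python's standard library. The list of names, the function names, and the 0.75 similarity cutoff are assumptions chosen for illustration. No genealogy site is claimed to work this way.

```python
import difflib
import fnmatch

# Made-up index of names, not real data from any site.
NAMES = ["William Smith", "Wiliam Smyth", "Willem Smit",
         "Walter Wood", "Bonnie Jones"]


def wildcard_search(pattern, names):
    """Return names matching a wildcard pattern (* and ?), ignoring case."""
    lowered = pattern.lower()
    return [n for n in names if fnmatch.fnmatchcase(n.lower(), lowered)]


def variant_search(query, names, cutoff=0.75):
    """Return names similar to the query, most similar first."""
    scored = []
    for name in names:
        ratio = difflib.SequenceMatcher(None, query.lower(), name.lower()).ratio()
        if ratio >= cutoff:
            scored.append((ratio, name))
    return [name for ratio, name in sorted(scored, reverse=True)]


if __name__ == "__main__":
    print(wildcard_search("wil*m sm*", NAMES))
    print(variant_search("William Smith", NAMES))
```

A wildcard makes the searcher guess where the spelling varies, while a similarity score finds variants without that guess. The middle ground Robert describes would presumably sit somewhere beyond both.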

DNA is having, and will continue to have, a massive impact on genealogy.

Process innovations are going to be important as well. Today, organizations have a centralized process for determining what records to acquire. Robert thinks we will see more distributed decision making on what collections to digitize. He envisions a world where local archives, libraries, church congregations (like LDS stakes and wards), and individuals take the responsibility to identify, digitize, and index collections. We see this a little already with apps like FamilySearch Memories or BillionGraves.

Data innovation was Robert’s final category. There is a lot of data out there that is highly valuable, but there is a risk that it will be lost. Records can be at risk because of poor archival conditions, political instability, natural disaster, or scheduled destruction. For example, India destroys its censuses before the decade is over. Lastly, there are hundreds of millions of “records” stored in memorized genealogies in certain cultures, many throughout Africa. FamilySearch has an active and growing program to capture these “oral genealogies.”

The last data innovation is one of Robert’s hopes. There is so much good genealogy data locked up in the record managers on genealogists’ computers. It is not shared freely. Robert envisions a world where tree data is more readily available and shared more freely among all the different sites. Websites could compete on best features, user experience, and records rather than on availability of member-submitted trees.

6 comments:

Bonnie, February 16, 2017 at 10:45 AM
I recently read "Humans Need Not Apply" by Jerry Kaplan on the subject of artificial intelligence. I highly recommend it. The days of initial transcription are numbered...
walterwood44, February 16, 2017 at 12:13 PM
I enjoy your blog but sometimes have trouble reading it. On this post, which I got via email, I had a problem reading it because it did not wrap the text. I had to widen my email program, Outlook, almost full screen to see from the beginning of the line on the left to the end on the right. On my phone using Gmail, I was not able to see the right side of the text unless I turned it sideways. Please figure out how to get the text to flow regardless of the size of the screen it is viewed on. Thanks
Unknown, February 16, 2017 at 12:41 PM
I like your newsletter and use what I learn from it. I believe DNA will help in family genealogy. I had my DNA done when Ancestry pushed it for Family Tree Maker users, and they let me add DNA information to my tree site. Now they have removed my DNA info and tell me they have never allowed any outside DNA, only theirs. I have been an Ancestry user since the early 1990s but am now looking at other options, as I have both FTMDNA and 23andMe DNA tests that Ancestry won't let me use because they want more money for their testing and overcharge for their site!
Unknown, February 20, 2017 at 8:29 AM
The last paragraph on data innovation really struck home with me. I am 68 years old and have 5 family trees with the largest one having over 90,800 names, 15,400 obits, and numerous wedding announcements and anniversaries, etc. No one in my family has any interest in genealogy. I can see my 17 years of research being “flushed down the toilet” when I am incapable of maintaining my trees. I have basically quit my research because it seems pointless.

Couldn’t FamilySearch create something so that trees in programs such as Legacy could be donated to them, so that if in the future they have a use for that data it is there and hasn’t been destroyed?

I would gladly pay for a website that allowed my tree to be updated and stored online with sharing opportunities. I have an Ancestry tree but find their program not well thought out and pretty much useless for maintaining an online tree.