Thursday, February 16, 2017

Robert Kehrer’s Industry Trends and Outlook – #RootsTech

Robert Kehrer, product manager at FamilySearch, took part in a panel discussion titled “Industry Trends and Outlook” at the Innovators Summit portion of RootsTech 2017. Robert wrestles with big data technology problems at FamilySearch.

One of the hardest things Robert faced in preparing his presentation was narrowing down the areas that he wanted to talk about. He narrowed things down to three categories of innovation: technology, process, and data.

The first technology innovation he sees coming is automated transcription: the ability of a computer to transcribe a document. There have been some recent advances, particularly in the area of handwriting recognition. Today automated transcription works well on typescript documents and pretty well on print handwriting. The ability to recognize cursive writing is showing promise. However, there are really messy documents that automated transcription is not likely to handle.

Robert Kehrer says automated transcription of some documents is harder based on handwriting style

Another area where technology innovation is happening is named entity recognition. A computer takes transcribed text and, using a process called natural language processing, picks out the names, dates, locations, relationships, and so forth. Progress is being made in this area.
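To give a rough feel for what "picking out" entities means, here is a minimal Python sketch. It is not how Robert described it working, and it is not natural language processing in the full sense: it swaps in a few hand-written patterns as a much simpler stand-in. The patterns, the `extract_entities` function, and the sample sentence are all made up for illustration.

```python
import re

# Hand-written patterns; every rule here is an illustrative assumption.
# Real named entity recognition relies on trained language models instead.
NAME = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")   # two or more capitalized words
YEAR = re.compile(r"\b(?:1[5-9]\d{2}|20\d{2})\b")         # years from 1500 to 2099
RELATIONSHIP = re.compile(r"\b(?:son|daughter|wife|husband|father|mother)\b")

def extract_entities(text):
    """Return the names, years, and relationship words found in the text."""
    return {
        "names": NAME.findall(text),
        "years": YEAR.findall(text),
        "relationships": RELATIONSHIP.findall(text),
    }

# Made-up sentence standing in for a transcribed record.
sample = "William Carter, son of Thomas Carter, was born in 1842."
entities = extract_entities(sample)
print(entities)
```

Rules like these break quickly on real records (abbreviations, unusual spellings, capitalized words that are not names), which is part of why this area depends on the machine learning approaches discussed next.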

Innovation is happening in neural networks and machine learning and is important in combination with automated transcription and named entity recognition. Machine learning is not difficult to understand when demonstrated with a simple example. Machine learning could make it possible to show the machine many images of the name William. Subsequently, when names are shown to the machine, it can pick out those that are William.
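The William example can be sketched in a few lines of Python. This is a toy under loud assumptions: each "image" is reduced to two invented numeric features, and the learner is a simple nearest-centroid classifier rather than the neural networks mentioned in the talk. The names `train`, `predict`, and `TRAINING` are hypothetical.

```python
def centroid(vectors):
    """Average a list of equal-length feature vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train(examples):
    """Learn one centroid per label from labeled example vectors."""
    return {label: centroid(vectors) for label, vectors in examples.items()}

def predict(model, vector):
    """Return the label whose centroid lies closest to the vector."""
    return min(model, key=lambda label: squared_distance(model[label], vector))

# Invented feature vectors standing in for images that people already indexed.
TRAINING = {
    "William": [[0.9, 0.8], [1.0, 0.7], [0.8, 0.9]],
    "other": [[0.2, 0.1], [0.3, 0.3], [0.1, 0.2]],
}

model = train(TRAINING)
print(predict(model, [0.85, 0.75]))  # a new "image" that resembles the William examples
print(predict(model, [0.25, 0.15]))  # a new "image" that does not
```

The point of the sketch is the workflow, not the method: the machine is shown labeled examples first, and only then asked to label something new.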

Robert Kehrer demystifies machine learning

Don’t think that these technologies are going to replace human indexers. These technologies must be trained using data indexed by people. And these technologies free up people to do what only people can do.

Innovation is happening in fuzzy search advancements. Fuzzy is a funny word that he used to refer to non-exact search results. This is familiar stuff like wildcards and name variants. Robert feels like there could be some innovation here less complicated than an artificial intelligence hint matching system but more sophisticated than the search engines of today.
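As a small illustration of the familiar end of fuzzy search, the Python sketch below shows a wildcard search and a spelling-similarity search using only the standard library. The name list, the function names, and the 0.7 similarity cutoff are assumptions for the example; this is not how FamilySearch or any other site implements search.

```python
import difflib
import fnmatch

# Hypothetical index of given names; not taken from any real database.
NAMES = ["William", "Willem", "Wilhelm", "Willie", "Walter", "Wilma"]

def wildcard_search(pattern, names):
    """Return names matching a wildcard pattern such as 'Wil*m', ignoring case."""
    return [n for n in names if fnmatch.fnmatchcase(n.lower(), pattern.lower())]

def variant_search(query, names, cutoff=0.7):
    """Return names spelled similarly to the query, best matches first."""
    return difflib.get_close_matches(query, names, n=len(names), cutoff=cutoff)

print(wildcard_search("Wil*m", NAMES))
print(variant_search("William", NAMES))
```

A wildcard search only finds what the pattern literally allows, while the similarity search can surface variants the searcher did not think to type. Spelling similarity alone still misses variants that look nothing alike, which hints at the gap between today's search engines and a full hint matching system.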

DNA is having, and will continue to have, a massive impact on genealogy.

Process innovations are going to be important as well. Today, organizations have a centralized process for determining what records to acquire. Robert thinks we will see more distributed decision making on what collections to digitize. He envisions a world where local archives, libraries, church congregations (like LDS stakes and wards), and individuals take the responsibility to identify, digitize, and index collections. We see this a little already with apps like FamilySearch Memories or BillionGraves.

Data innovation was Robert’s final category. There is a lot of data out there that is highly valuable, but there is a risk that it will be lost. Records can be at risk because of poor archival conditions, political instability, natural disaster, or scheduled destruction. India, for example, destroys its censuses before the decade is over. Lastly, there are hundreds of millions of “records” stored in memorized genealogies in certain cultures, many throughout Africa. FamilySearch has an active and growing program to capture these “oral genealogies.”

Robert Kehrer says some records are at risk because of poor archival condition, political instability, natural disaster, or scheduled destruction.

The last data innovation is one of Robert’s hopes. There is so much good genealogy data locked up in the record managers on genealogists’ computers. It is not shared freely. Robert envisions a world where tree data is more readily available and shared more freely among all the different sites. Websites could compete on best features, user experience, and records rather than on availability of member-submitted trees.

5 comments:

  1. I recently read "Humans Need Not Apply" by Jerry Kaplan on the subject of artificial intelligence. I highly recommend it. The days of initial transcription are numbered...

  2. I enjoy your blog but sometimes have trouble reading it. On this post, which I got via email, I had a problem reading it because it did not wrap the text. I had to widen my email program, Outlook, almost full screen to see from the beginning of the line on the left to the end on the right. On my phone using Gmail, I was not able to see the right side of the text unless I turned it sideways. Please figure out how to get the text to flow regardless of the size of the screen it is viewed on. Thanks.

  3. I like your newsletter and use what I learn from it. I believe DNA will help in family genealogy. I had my DNA done when Ancestry pushed it for Family Tree Maker users, and they let me add DNA information to my tree site. Now they have removed my DNA info and tell me they have never allowed any outside DNA, only theirs. I have been an Ancestry user since the early 1990s but am now looking at other options, as I have both FTMDNA and 23andMe DNA tests that Ancestry won't let me use because they want more money for their testing and overcharge for their site!

  4. The last paragraph on data innovation really struck home with me. I am 68 years old and have 5 family trees with the largest one having over 90,800 names, 15,400 obits, and numerous wedding announcements and anniversaries, etc. No one in my family has any interest in genealogy. I can see my 17 years of research being “flushed down the toilet” when I am incapable of maintaining my trees. I have basically quit my research because it seems pointless.

    Couldn’t FamilySearch create something so that trees kept in programs such as Legacy could be donated to them, so that if in the future they have a use for that data, it is there and hasn’t been destroyed?

    I would gladly pay for a website that allowed my tree to be updated and stored online with sharing opportunities. I have an Ancestry tree but find their program not well thought out and pretty much useless for maintaining an online tree.

    Replies
    1. Regarding Mr. Blanchard's concern about storing 'private' trees post-death or incapacity: while my trees are much smaller than his, I am using Ancestry (and will use FamilySearch once full trees can easily be uploaded there) as the inevitable depository for my detailed trees. I could determine no other mechanism for dealing with this issue. Presently I maintain them as "public" on all levels, and I presume that after my demise, others will be able to use and enjoy my years of research work. From above, I'll be watching!
