Robert Kehrer, product manager at FamilySearch, took part in a panel discussion titled “Industry Trends and Outlook” at the Innovators Summit portion of RootsTech 2017. Robert wrestles with big data technology problems at FamilySearch.
One of the hardest things Robert faced in preparing his presentation was narrowing down the areas that he wanted to talk about. He settled on three categories of innovation: technology, process, and data.
The first technology innovation he sees coming is automated transcription, the ability of a computer to transcribe a document. There have been some recent advances, particularly in the area of handwriting recognition. Today automated transcription works well on typescript documents and pretty well on print handwriting. The ability to recognize cursive writing is showing promise. However, there are really messy documents that automated transcription is not likely to handle.
Another area where technology innovation is happening is named entity recognition. A computer takes transcribed text and, using a process called natural language processing, picks out the names, dates, locations, relationships, and so forth. Progress is being made in this area.
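To make the idea concrete, here is a minimal sketch of entity extraction in Python. It is a simple rule-based stand-in (lookup lists plus regular expressions), not the natural language processing Robert described, and the name list, place list, sample sentence, and function name are all invented for illustration.

```python
import re

# Hypothetical lookup lists, invented for this illustration.
GIVEN_NAMES = ["John", "Mary", "William"]
PLACES = ["Boston", "Ohio"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# A known given name followed by one capitalized word is treated as a person.
PERSON_RE = re.compile(r"\b(?:%s) [A-Z][a-z]+\b" % "|".join(GIVEN_NAMES))
# Dates written as day, month name, four-digit year, e.g. "12 March 1854".
DATE_RE = re.compile(r"\b\d{1,2} (?:%s) \d{4}\b" % "|".join(MONTHS))
# Any place from the lookup list.
PLACE_RE = re.compile(r"\b(?:%s)\b" % "|".join(PLACES))


def extract_entities(text):
    """Return the people, dates, and places found in a transcribed sentence."""
    return {
        "people": PERSON_RE.findall(text),
        "dates": DATE_RE.findall(text),
        "places": PLACE_RE.findall(text),
    }


if __name__ == "__main__":
    sample = "William Carter married Mary Ellis on 12 March 1854 in Boston."
    print(extract_entities(sample))
```

A real system would learn these patterns from indexed records rather than rely on hand-written lists, which is why the sketch breaks down as soon as a name or place is missing from its lists.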
Innovation is also happening in neural networks and machine learning, which are important in combination with automated transcription and named entity recognition. Machine learning is not difficult to understand when demonstrated with a simple example. Show the machine many images of the name William. Subsequently, when names are shown to the machine, it can pick out those that are William.
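The William example can be sketched in a few lines of Python. This is a toy under loud assumptions: the "images" are invented nine-pixel grids, and the learner is a nearest-centroid classifier standing in for the neural networks mentioned in the talk. The function names are hypothetical as well.

```python
# Toy "images": 3x3 grids flattened to 9 pixels (1 = ink, 0 = blank).
# All of this data is invented for illustration.
TRAINING = {
    "William": [
        [1, 0, 1, 1, 0, 1, 1, 1, 0],
        [1, 0, 1, 1, 0, 1, 1, 0, 0],
        [1, 0, 1, 0, 0, 1, 1, 1, 0],
    ],
    "other": [
        [0, 1, 0, 0, 1, 0, 0, 0, 1],
        [0, 1, 0, 1, 1, 0, 0, 0, 1],
        [0, 1, 1, 0, 1, 0, 0, 0, 1],
    ],
}


def centroid(samples):
    """Average the samples pixel by pixel."""
    return [sum(column) / len(samples) for column in zip(*samples)]


def squared_distance(a, b):
    """Sum of squared pixel differences between two images."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(labeled_samples):
    """Learn one average image per label from human-labeled examples."""
    return {label: centroid(samples) for label, samples in labeled_samples.items()}


def predict(model, image):
    """Pick the label whose average image is closest to the new image."""
    return min(model, key=lambda label: squared_distance(model[label], image))


if __name__ == "__main__":
    model = train(TRAINING)
    print(predict(model, [1, 0, 1, 1, 0, 1, 0, 1, 0]))
    print(predict(model, [0, 1, 0, 0, 1, 0, 0, 0, 1]))
```

The labeled examples in `TRAINING` play the role of human indexing: the machine only learns what William looks like because people labeled the samples first.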
Don’t think that these technologies are going to replace human indexers. These technologies must be trained using data indexed by people. And these technologies free up people to do what only people can do.
Innovation is also happening in fuzzy search. Fuzzy is a funny word that he used to refer to non-exact search results. This is familiar stuff like wildcards and name variants. Robert feels there could be some innovation here that is less complicated than an artificial intelligence hint matching system but more sophisticated than the search engines of today.
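As a rough illustration of what non-exact searching looks like, the sketch below shows three flavors in Python: wildcards, a name-variant table, and approximate spelling matches. The name index, the variant table, and the function names are invented for this example, and the similarity cutoff is an arbitrary assumption. It uses only the standard library modules `fnmatch` and `difflib`, and it is not how any particular genealogy site implements search.

```python
import difflib
import fnmatch

# Hypothetical index of names, invented for illustration.
NAMES = ["William", "Wiliam", "Willem", "Wilhelm", "Bill", "Walter", "Wilma"]

# Hypothetical table of known variants for a name.
VARIANTS = {"william": {"bill", "willem", "wilhelm"}}


def wildcard_search(pattern, names):
    """Match names against a pattern where * stands for any run of letters."""
    lowered = pattern.lower()
    return [n for n in names if fnmatch.fnmatchcase(n.lower(), lowered)]


def variant_search(query, names, variants=VARIANTS):
    """Match the query itself plus any variants listed for it."""
    key = query.lower()
    wanted = {key} | variants.get(key, set())
    return [n for n in names if n.lower() in wanted]


def approximate_search(query, names, cutoff=0.8):
    """Match names whose spelling is close to the query, best match first."""
    by_lower = {n.lower(): n for n in names}
    close = difflib.get_close_matches(
        query.lower(), list(by_lower), n=len(by_lower), cutoff=cutoff
    )
    return [by_lower[c] for c in close]


if __name__ == "__main__":
    print(wildcard_search("Wil*m", NAMES))
    print(variant_search("William", NAMES))
    print(approximate_search("William", NAMES))
```

Each approach catches something the others miss: the wildcard finds Wilhelm but not Bill, the variant table finds Bill but not the misspelled Wiliam, and the approximate match finds Wiliam without anyone having listed it in advance.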
DNA is having, and will continue to have, a massive impact on genealogy.
Process innovations are going to be important as well. Today, organizations have a centralized process for determining what records to acquire. Robert thinks we will see more distributed decision making on what collections to digitize. He envisions a world where local archives, libraries, church congregations (like LDS stakes and wards), and individuals take the responsibility to identify, digitize, and index collections. We see this a little already with apps like FamilySearch Memories or BillionGraves.
Data innovation was Robert’s final category. There is a lot of data out there that is highly valuable, but there is a risk that it will be lost. Records can be at risk because of poor archival conditions, political instability, natural disaster, or scheduled destruction. India, for example, destroys its censuses before the decade is over. Lastly, there are hundreds of millions of “records” stored in memorized genealogies in certain cultures, many throughout Africa. FamilySearch has an active and growing program to capture these “oral genealogies.”
The last data innovation is one of Robert’s hopes. There is so much good genealogy data locked up in the record managers on genealogists’ computers. It is not shared freely. Robert envisions a world where tree data is more readily available and shared more freely among all the different sites. Websites could compete on best features, user experience, and records rather than on availability of member-submitted trees.