Wednesday, February 17, 2016

Genealogy Life Blood at Kendall Hulet #RootsTech Luncheon

Part 2 of 3

Kendall Hulet is Ancestry.com’s senior vice president of product management. He spoke at a Saturday luncheon titled “Things to Look Forward to on Ancestry in 2016.”

Ancestry is expanding internationally. About a third of Americans claim German heritage. Ancestry is relaunching its German-language website in a big way and is doing television advertising in Germany for the first time. Increased involvement by users in Germany, who add content and upload photos and other material, helps everyone. Ancestry has 300 million German records and will have 400 million by the end of the year.

“Content unlocks the family history story,” Kendall said. “Content is the lifeblood of the things we do.”

Recent German civil registry deals at Ancestry

Ancestry is adding millions of German civil registration records each month. They have deals with the state archives of Berlin and Hessen, including records from East and West Prussia, Silesia, Bohemia, and Moravia.

They have a German national directories project that will add 500 million records from 5.5 million images of 30,000 volumes of the German Reich from 1910 to 1955.

In collaboration with FamilySearch, they are publishing Lutheran Church records. They have published 19.2 million with another 100 million on the way.

They will soon publish the World War II young-men’s draft cards.

They are also launching a full index of the Irish Catholic parish records. It has 10 million records from 1740 to 1900. An official announcement will be coming out in the weeks after RootsTech. It will give more details.

Add New People to Index Feature for US Probates on Ancestry.com in 2016

Kendall announced that they are working on the ability to let users add additional names to the US Probate Records collection. They’ve already released that collection, which was done in collaboration with FamilySearch. It required adding a new user experience because of the packet nature of probate records. Ancestry didn’t index all individuals named in the records, so allowing users to add names and relationships will make it possible to search for others in the records, receive hints about them, and attach them to your tree.

“Mobile is taking over the world,” Kendall said. “At Ancestry 50% of our visitors come in on a mobile device the first time they are visiting.” Over time Ancestry has been pulling features available on the web into the mobile app. They are going to improve the search experience. They are going to provide a better way to capture content and put it online. They are adding the ability to capture audio and video of stories, both in the app and on the website. They will incorporate audio and video into Life Story. “Go interview people before they pass on,” Kendall recommended.

Venn diagram which may help understand precision and recall

Making search results and hinting (Shaky Leaf) results right is difficult. Kendall showed a diagram similar to the one I’ve put together to the right. The circle represents all the Shaky Leaves Ancestry returned about your ancestor. Some were good and some were bad. The percentage that were good is defined as precision. “Precision is finding the right stuff,” he said. The rectangle represents all the records about your ancestor. Ancestry missed some of them (the portion of the rectangle outside the circle). The percentage that Ancestry found is defined as recall. The challenge is that if you try to increase one, it makes the other worse. “[If] you cast a wide enough net [to] catch all the good fish…you’re going to bring back a lot of other weird stuff with it,” he said.
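For readers who like to see the arithmetic, here is a minimal sketch of the two definitions in Python. The record names are made up for illustration and the function is my own, not anything Ancestry showed; it simply treats the circle (hints returned) and the rectangle (records about your ancestor) as two sets.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a set of returned hints.

    returned: everything the system handed back (the circle)
    relevant: every record truly about the ancestor (the rectangle)
    """
    returned, relevant = set(returned), set(relevant)
    good = returned & relevant  # hints that really are about the ancestor
    precision = len(good) / len(returned) if returned else 0.0
    recall = len(good) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical record names, invented for this example.
hints = {"census_1900", "census_1910", "draft_card", "wrong_person_obit"}
records_about_ancestor = {
    "census_1900", "census_1910", "draft_card", "probate", "marriage",
}

precision, recall = precision_recall(hints, records_about_ancestor)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.75 recall=0.60
```

Three of the four hints are good (precision 75%), but only three of the five records about the ancestor were found (recall 60%).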

For hints, Ancestry concentrates on keeping the precision high. For search, Ancestry concentrates on recall. They want you to be able to find “the needle in the haystack.” Some people are frustrated by the number of search results not about their ancestor. They want higher precision. “This is the constant challenge we’re dealing with,” he said.
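One way to picture the tradeoff is a match score with a cutoff. The sketch below is only an illustration under my own assumptions: the scores, the labels, and the two cutoff values are invented, and Kendall did not describe how Ancestry actually scores candidates. A strict cutoff behaves like hints (few results, mostly right); a loose cutoff behaves like search (nothing missed, more noise).

```python
# (match score, is the record really about the ancestor?) pairs.
# All values are invented for illustration.
candidates = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.50, False), (0.40, True), (0.30, False), (0.20, False),
]


def precision_recall_at(threshold, scored):
    """Precision and recall when only scores >= threshold are returned."""
    returned = [good for score, good in scored if score >= threshold]
    total_good = sum(good for _, good in scored)
    hits = sum(returned)  # True counts as 1
    precision = hits / len(returned) if returned else 0.0
    recall = hits / total_good if total_good else 0.0
    return precision, recall


# Hypothetical cutoffs: strict for hints, loose for search.
hint_p, hint_r = precision_recall_at(0.85, candidates)
search_p, search_r = precision_recall_at(0.35, candidates)

print(f"hints:  precision={hint_p:.2f} recall={hint_r:.2f}")
print(f"search: precision={search_p:.2f} recall={search_r:.2f}")
# prints:
# hints:  precision=1.00 recall=0.50
# search: precision=0.67 recall=1.00
```

Lowering the cutoff finds every good record but lets in two that are not about the ancestor, which is the "weird stuff" that comes back with a wide net.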

“How are we going to go and make search and hints better?” Kendall asked rhetorically. “This is a big focus that I want to go after in 2016.” There is a concept in computer science called machine learning. If you can supply enough examples to the machine of results that you want and results you don’t want, the machine can learn how to return just the results you want. The more “training data” you can give the machine, the smarter it will become. “We have been working on a bunch of machine learning algorithms and we’re excited because we’re seeing higher precision and better recall from these machine learned algorithms,” Kendall said. This has an interesting side-effect. “What you’ll see over time is subtle changes to the results that you’ll get. They will be better.”

There is a problem with hints: users are receiving too many! It would help if they didn’t show you as many hints and if they made it easier to find the most valuable ones. They are focused on this challenge. They have a machine learned algorithm that increases both precision and recall, that anticipates where you were working in your tree, what your interests are, which hints are new, and which ones might add new information or people to your tree.

Stay tuned. Next time I will cover the last part of Kendall’s presentation.

3 comments:

  1. 1) Ancestry should substitute the word "relative" for "ancestor" in search boxes and narratives about using search engines.

    2) Simple logic indicates that my ancestor born in 1859 could not have served in the Revolutionary War or the US Civil War. Yet for years the search engine comes up with such ridiculous search results. Can they not teach the machines to do the math?

    3) They still have not fixed the drop-down place lists for pre-1870 US Census enumerations to show the presently-in-WV Counties in VA through 1860. They have known about this glitch for years. It needs to be fixed.

  2. But since hints are only as good as the transcriptions, I shudder to think of any narrowing based on Ancestry transcriptions. However, I agree that having hints that are decades--and even whole countries!--off from any possible involvement by the person seem as if they could readily be fixed. I am so sick of getting US Revolutionary War hints for female relatives born in Canada in the 1920's, seemingly based on something like a middle initial.

  3. While bad hints are a pain in the neck to be sure, I wish they would teach their computers NOT to give me hints of things that are already IN my tree.

    The system is certainly capable, with some programming corrections, NOT to show HINTS that are already IN your tree and attached to the very person these HINTS show up for!

    I have one cousin who cannot seem to figure out how to use the ancestry FTM properly and she will copy a hint or photo 10 times--OFTEN THESE ARE FROM MY TREE AND I AM THE ORIGINATOR OF THE DOCUMENT WHICH IS ALREADY IN MY TREE--then I get 10 hints on a person for a photo or document that I added to begin with...It is a waste of my time to have to go look at these and delete--not to mention I think "aha ancestry has found some info for me" only to see it is something that is not only in my tree, but the hint is the same thing 10 times!

    I also wish they would stop calling the "family data" SOURCES--they are not sources, they are amalgamations of what various other people have put on their trees-and are often incorrect although sometimes useful to look at to see if any of it is correct BUT they are not and never will be SOURCES.

    I do glance at other trees that show up in hints BUT I only look at trees that have sources---I don't know how many times I have seen a tree that SUPPOSEDLY has 20 sources--only to find out that person has added the Family Data "sources" many times over...a waste of my time to look at that stuff...

    PLEASE ancestry stop calling these things sources, they are not. Find something else to call them, and find a different way to index them into the system so they do not show up when looking at "tree hints" as a source! They are not and never will be sources and it is misleading to call them such.

    Those of us who have been around a while know enough to ignore these but new users really think of them as sources...so that is misleading for them, but it is especially annoying to look at other family trees, thinking they have a ton of sources, only to find out they have attached these databases (often several times) that are not SOURCES at all.

    Joyce
