I was affected again by a bad merge in FamilySearch Family Tree. Someone had merged a Tilford in Virginia with a Telford in Ireland to produce a monstrosity of a person with a gaggle of children. Mysteriously, this man bounced back and forth between Virginia and Ireland, using the Tilford spelling for all children born in Virginia and the Telford spelling for all kids born in Ireland. Hmmm. What an odd fellow.
I can’t watch all my relatives, so I didn’t know the merge had occurred until it was too late. Rather than undoing the merge, people had come in and made various repairs on top of it. It was no longer possible to undo.
I have often spoken of the indiscretions of FamilySearch’s past, accusing them of bad automated merges. Perhaps because of this, I received a message on the topic from one of FamilySearch’s engineers, Randy Wilson.
“Machine merging was done very carefully [when the New FamilySearch tree was created],” he said. “[There is] a lot of empirical evidence showing that errors were less than 0.5%. There have been a lot of bad merges done in the system, to be sure. But almost all of the bad merges I have seen (and I've seen a lot of them) have been caused by users, not the machine.”
I hope he’s right. He’s one of the brightest fellows I know. (And he’s a relative and I’d like to believe intelligence runs in the family.) But there’s something that gives me pause: Ancestral File.
My perception was that Ancestral File was a mess, and that the first release was especially messy. Users couldn’t make any changes, good or bad, so no one but FamilySearch can be blamed. I think many genealogists shared my perception.
“Yes, AF had some bad merges, too,” Randy said. “The actual number wasn't as bad as the user perception, because common ancestors happened more and got merged more and common ancestors also get seen by users more, because, well, they're common. So you get a ‘squaring effect’ of the bad merges being especially visible there.”
Are the software engineers of today somehow smarter than the engineers of the 1990s? Randy Wilson said:
There was, in fact, a big difference between the merging algorithms used for Ancestral File…and those used in New FamilySearch. The former used "probabilistic record linkage" with simple field-based features, using parameters derived from statistics on a few hundred labeled pairs of records. The latter used far more advanced neural networks with 80,000 labeled pairs of records, with 20,000 more used to verify accuracy and select thresholds. I don't know if the engineers are "smarter" than in the 1990s, but the algorithms used sure are, and the engineers did have more extensive training in machine learning than before. In particular, we had a few Ph.Ds with a background in machine learning and neural networks working on it this time (me, Dallan Quass and Spencer Kohler).
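The “probabilistic record linkage” Randy mentions is commonly a Fellegi-Sunter-style score: each field that agrees between two records adds a positive log-likelihood weight, each field that disagrees subtracts one, and pairs scoring above a threshold are treated as the same person. Here is a minimal sketch of the idea; the fields, weights, and records are made up for illustration and are not FamilySearch’s actual parameters or code.

```python
import math

# Hypothetical per-field probabilities (illustrative only):
# m = P(field agrees | same person), u = P(field agrees | different people).
# Agreement contributes log2(m/u); disagreement contributes log2((1-m)/(1-u)).
WEIGHTS = {
    "surname":     (0.95, 0.02),
    "given_name":  (0.90, 0.05),
    "birth_year":  (0.85, 0.01),
    "birth_place": (0.80, 0.03),
}

def match_score(rec_a, rec_b):
    """Sum of log-likelihood-ratio weights over the compared fields."""
    score = 0.0
    for field, (m, u) in WEIGHTS.items():
        if rec_a.get(field) == rec_b.get(field):
            score += math.log2(m / u)
        else:
            score += math.log2((1 - m) / (1 - u))
    return score

tilford = {"surname": "Tilford", "given_name": "John",
           "birth_year": 1820, "birth_place": "Virginia"}
telford = {"surname": "Telford", "given_name": "John",
           "birth_year": 1820, "birth_place": "Ireland"}

print(match_score(tilford, tilford))  # identical records: large positive score
print(match_score(tilford, telford))  # two fields disagree: much lower score
```

Note what this toy model makes visible: agreements on common fields like a given name and a plausible birth year can partially offset a surname mismatch, so a poorly chosen threshold merges a Tilford with a Telford. The neural-network approach Randy describes replaces these hand-set per-field weights with parameters learned from tens of thousands of labeled record pairs.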
I can imagine two identical pedigrees, each sprinkled with Frankenstein monsters or extra generations or random pedigree errors. (See my article, “Frankenstein Genealogy.”) A machine algorithm is let loose on the two pedigrees. I can imagine the machine could become confused or react in unexpected ways.
At the very best, the machine cocks the gun. Then some naïve genealogist comes along and pulls the trigger.
shuets udono, photograph of auto accident of two cars in an intersection in Japan, Flickr (https://www.flickr.com/photos/63522147@N00/408633225 : accessed 18 October 2015). Used under CC BY-SA 2.0 license.
“Frankenstein Genealogy,” The Ancestry Insider (http://ancestryinsider.org : 23 March 2011).