Monday, February 21, 2011

Monday Mailbox: Bulk Merge

Dear Ancestry Insider,

I am very frustrated, and think you might be in a position to help, or at least to understand the problem.

The Church [of Jesus Christ of Latter-day Saints] has digitized and indexed millions of records, which is a wonderful thing.

What is not so wonderful is that these names are being bulk merged by computer to existing family lines and more often than not, fouling them up.  I am suddenly finding names attached to my family lines that are in a completely different geographical area than where my family lived, and with dates and other information that is clearly not theirs.

My concern is that information cannot and should not be merged by computer.  I have worked hard and carefully, paying attention to sources, etc., and to now have my data corrupted by computer merges is very unsettling.

I wonder if those who okayed this merging are aware of the problems it is causing. I would be very interested in your opinion, since I know you understand both technology and genealogy.  I enjoy reading your blog, and am a fan.

Suzanne Johnston

Dear Suzanne,

Thank you. I’m glad you enjoy it.

I have good news and I have bad news.

The good news is the bulk merging ceased when the new FamilySearch Tree rollout began.

FamilySearch seeded the tree with bad data, some from computer merging, some from human error. The ground-breaking, evidence-centric design of the Tree was totally inadequate for dealing with the glut of pre-loaded bad evidence. FamilySearch had to do something.

They opted to keep the bad data and replace the system.

For many months FamilySearch has gradually been replacing the system with a standard, source-centric design. Once the replacement is complete, users will be able to clean up the data. (Ironically, once clean, the old system would have been able to handle it.)

Now for the bad news.

If machines are not doing the bad merges, it is pretty clear who is. Once again we see evidence that genealogy is deceptively difficult.

-- The Insider

7 comments:

  1. Very nice post, and I have to smile at the last statement. I frequently have to remind myself that I am not the owner of my family; I have to play with the others who are interested too.

  2. I was just at my local FHC last night, and several of the volunteers there were very angry because their family lines have become messed up - when they used to be clean and correct. They told me the church allowed some company to come in and add information to the trees, and that is messing it up. I didn't believe that though, lol. I just figured more and more people are using NFS and they are the ones messing it up. There seem to be several stories going around, thinking the church is messing stuff up, and that just ain't true.

    I have just kind of backed off on trying to fix my family tree until new versions of NFS are released that allow you to better deal with bad info. But at the same time I figure as long as everyone can just change info there are always going to be errors. Even Wikipedia will lock articles and only allow a few pre-authorized people to make changes. Maybe FamilySearch should look into something like that.

  3. I just wanted to add something about the "messed up" families in NFS that I haven't seen addressed. When combining people, you only have the option of saying they are the same person. And I want to emphasize "same person." For instance, if you can tell they are the same person but one piece of information is outlandish, they are still the same person, but now will have a piece of very wrong information included. If at some point NFS allows us to mark the correct date, place or other piece of information as correct (with our included documentation), then we will be able to clean it up and just ignore the information that is incorrect. So, perhaps people are not "messing up" a clean family line, but just identifying unique individuals. It is obvious that many changes in NFS are ongoing and some still in the works. It is still a work in progress.
    Best

  4. DJCummins has clarified a huge part of this problem. I, too, see entries for people who are the same person, but with some absurd data (such as a marriage in North Carolina for someone who lived his entire life in England 100 years earlier). I have left these individuals separate to try to keep the information correct on one of them, but that means that duplicate entries remain.

    There will eventually need to be a way to combine individuals without importing bad data that is there for one of the entries.

  5. I totally agree, djcummins. That is a huge problem in NFS. In my line I have several ancestors who were named after their parents, and sometimes those parents were named after their parents, and that is where I see this problem the most.

    A lot of times a record will have all the information for the child, their date of birth, etc., but it will have their mother as the spouse or something. Then you have to decide: should this record be combined with the child because all the info, except the spouse, is for the child, or should it be combined with the parent that was married to the spouse?

    There is even one spot in my family tree where there is an "eternal loop", because the parents are listed as children and the children as parents (because they have the same names). It will be so nice when we are able to fix this stuff, because currently wrong info has been submitted and there is no way to fix this problem just by un-combining records.

  6. AI, interesting report and commentary.

    Some genealogical researchers have been intending to upload a gedcom to newFamilySearch if the tree is opened to the public. I realize this may not be a settled issue, but . . . .

    Will they be allowed to do this? Will there be a transparent and workable way to combine uploaded individuals with those pre-existing in the nFS tree? If workable combination-of-individuals would be possible, would the uploader be required to allow such pre-loaded facts in the existing nFS tree that go with such individuals, including those proven wrong or unproven in the opinion of the uploader?

  7. I wish people could NOT upload a gedcom, but could only upload a document, or a link to a document, that supports evidence in a person's life. I think we need to be forced to slow down and rethink the information we publish on public websites.
