Thursday, August 25, 2016

Jim Ericson and FamilySearch Indexing (Part 2) – #BYUFHGC

Jim Ericson of FamilySearch addressed the 2016 BYU Conference on Family History and Genealogy. This is the second of two articles about Jim’s presentation.

Jim Ericson of FamilySearch gave a presentation titled “Straight Talk about the State of Indexing” at the 2016 BYU Conference on Family History and Genealogy. His purpose was to “answer several key questions related to FamilySearch indexing and the program’s future in a direct, no nonsense way.”

Where is indexing headed in the future?

FamilySearch is preparing a new indexing system. Jim said the new system is up and running but FamilySearch is still testing and figuring things out. It will probably be after the beginning of 2017 before it is available.

[At this point I have to poke fun at FamilySearch, not at you, Jim. FamilySearch has been saying “this year” or “next year” for a long time. Here’s what they’ve said at several dates in the past:

My first career was as a software engineer and my managers were always asking, “How long will it take you to do this thing that no one has ever done before?” And I was always thinking, “Are you listening to what you are saying?” I would dutifully try to figure out how long it would take me. Then I would tell my boss twice that long. Without my knowing it, he would double the number before telling the director, who would double it before reporting to the vice president. In the end, the project would take twice that long.

What moving target will I make light of after FamilySearch really releases this program? Hmmm. I guess there is always: “We will be done scanning the vault in five years.”]

In the new indexing program FamilySearch will not always use double keying. There are a lot of projects that are simple forms and it doesn’t make sense to have 3 people key them. So when it is appropriate, FamilySearch may have single-key indexing for an entire record, or for just select fields. A field like gender is probably okay having just one person key the field, while the name should be indexed by two indexers. A qualified volunteer might be able to produce a better index than 3 people.
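Here is a rough sketch of how a per-field keying policy like this might look in code. It is only an illustration under my own assumptions: the field names, the `KEYINGS_REQUIRED` table, and the `resolve_field` function are hypothetical and do not come from FamilySearch.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical policy: how many independent keyings each field needs.
# The field names and counts are illustrative assumptions only.
KEYINGS_REQUIRED = {"gender": 1, "name": 2}


@dataclass
class FieldResult:
    field: str
    value: Optional[str]  # agreed value, or None when the keyers disagree
    needs_review: bool    # True when a person has to resolve the field


def resolve_field(field: str, keyed_values: List[str]) -> FieldResult:
    """Combine the values keyed for one field under the policy above."""
    required = KEYINGS_REQUIRED.get(field, 2)  # unknown fields default to double keying
    if len(keyed_values) < required:
        raise ValueError(f"{field!r} needs {required} keying(s), got {len(keyed_values)}")
    considered = keyed_values[:required]
    # Compare ignoring case and surrounding whitespace.
    normalized = {value.strip().casefold() for value in considered}
    if len(normalized) == 1:
        return FieldResult(field, considered[0].strip(), needs_review=False)
    return FieldResult(field, None, needs_review=True)
```

With this sketch, a gender field keyed once is accepted as is, while a name keyed as “Mary Smith” by one person and “Mary Smyth” by another is flagged for a person to resolve.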

Another model FamilySearch will use is single-key indexing plus peer review. One person keys the work, but another person reviews it for correctness. This eliminates the problem of arbitrators working in isolation. This is not another name for arbitration. The reviewer doesn’t have to have more competence than the original indexers. It’s like checking a classmate’s homework. It eliminates the adversarial relationship between volunteers.

Coming in the future is the deployment of new technologies.

For things that are typewritten it is really easy for the computer to read those characters. Another technology is something FamilySearch calls robokeying. It reads and interprets text and “indexes” it. It goes beyond OCR. The results are audited. FamilySearch has done extensive testing of the results. There are technologies for recognizing all alphabets.

FamilySearch is testing handwriting recognition with Kanji. That is the holy grail of the future.

However, we will always need volunteers, Jim said, not just for indexing, but other tasks like zoning areas of a news page for indexing to work with.

Microtasking is something FamilySearch could employ in the future. There would be specialized tasks like zoning, blocking fields in a form, or recognizing where names are in a record. A microtask could be to identify data types. A microtask could be keying specific fields, like just the name. A microtask could be verifying names. The microtasking system could use a personalized page that directs efforts towards currently needed tasks. This is the direction we are trying to go, he said.
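To make the idea of a personalized page concrete, here is a minimal sketch of how such a page might choose what to show a volunteer. Everything in it is my own assumption: the `Microtask` fields, the `suggest_tasks` function, and the rule of ranking by backlog size are hypothetical, not anything Jim described.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Microtask:
    kind: str     # e.g. "zoning", "key_name", "verify_name" (illustrative names)
    project: str
    backlog: int  # units of this task still waiting for a volunteer


def suggest_tasks(tasks: Iterable[Microtask], skills: Set[str], limit: int = 3) -> List[Microtask]:
    """Return the most-needed tasks that this volunteer is able to do."""
    eligible = [t for t in tasks if t.kind in skills and t.backlog > 0]
    # Assumption: "currently needed" is approximated by the largest backlog.
    return sorted(eligible, key=lambda t: t.backlog, reverse=True)[:limit]
```

A volunteer who has only learned zoning would see zoning work, ordered by where the backlog is largest, and would never be offered keying tasks.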

In the future, we are headed towards more difficult projects, Jim said. The biggest factor for indexing volume is currently how easy or interesting the project is. “We’ve done a lot of the easy ones,” he said. The U.S. census only comes once per decade. The Freedmen’s Bureau project is an example of the really difficult record types that the future holds. These records are going to be increasingly complex. About 60% of all the really valuable U.S. collections have been completed, and about 40% in the UK. That leaves us with spotty coverage for the rest of the world, so we have huge needs when it comes to indexing in other languages, he said.

Jim took a number of questions.

Q. Will you allow people to be signed in for more than one day at a time?

Yes. That is one of the things we are working on. The Church of Jesus Christ of Latter-day Saints is very sensitive when it comes to security. The online version will allow two weeks like Family Tree.

Q. When can I do indexing on my smart phone?

“No time soon.” The new online indexing program can be used on a tablet, but it requires more screen real estate than is available on a phone. We want to do it. We are evaluating doing it. But it would be irresponsible for me to give a date.

Q. When will the indexing effort be done?

Never. Only about 30% of published records on FamilySearch.org are indexed. And we are still going to be acquiring records. And we have ongoing partnerships with organizations with projects for records we want access to. And new records are created every day. A big problem we have today is getting records imaged before they are destroyed.

Q. From the time a project is indexed, how long does it take before the collection is published?

The 1940 census was the best we had ever done. Within days we were putting up states. Most projects are more complex and require more auditing and review. A project can get stuck in arbitration, quality assurance, or reindexing. We have some projects that have been hanging at 99% for more than a year. The model is to shorten that time.

Q. Once in a while you find a record that was misindexed, but there is no way to go back and correct it.

The number one question is, by far, “How do I fix a record that has been indexed incorrectly?” One solution we are considering is in the indexing step: preserve both the A and B keys. The other side is post-publication. That is the holy grail that we want to fix.

Q. Ancestry has had it for years.

Q. I’ve been arbitrating Kentucky marriage records. No one is following the rules. Should I do the job for them or send it back?

If they are done incorrectly, it depends on how diligent you are. If you want to send it back, that would be fine. If they are missing records from part of the image, send it back and indexers can see what they are missing.

Let me finish off with some recent indexing numbers. I received an email recently with this information:

FamilySearch Indexing English records indexed

And the FamilySearch Indexing page has this information as of 13 August 2016:

FamilySearch Indexing Statistics

5 comments:

  1. Are any records going to be every-name indexed, such as (say) partitions in Chancery, petitions for administration listing (perhaps dozens of) heirs, wills, or deeds?

  2. I tried to get Family Search to correct an error on the 1940 Census. Well, I was pretty much informed that even if it was wrong it would stay because 3 people had looked at it. Never mind that it was my aunt and uncle, whom I knew and whose names I knew; the error is still there.

  3. I think that Family Search should let volunteers pick projects that they are familiar with, such as transcribing foreign countries where they are familiar with surnames. The Croatian church is one example where I am researching. I don't care if 3 people looked at it, they have all butchered the names.

  4. I believe you missed part of the answer to the question about Kentucky Marriages missing records, and whether the arbitrator "Should...do the job for them or send it back?" (I asked the question). He said it was fine to send them back, but he preferred that we arbitrate them so that the indexer who missed half of the records would see that they missed part of the records. If it is sent back for redo, the indexer has no idea why it was sent back, [and will therefore not learn.--my statement not his]
