Monday, August 29, 2016

Monday Mailbox: FamilySearch Indexing

In response to my article about Jim Ericson’s frank talk about FamilySearch Indexing, several readers posed some frank questions. In the spirit of Jim’s talk, I’m going to give some frank answers.

Dear Ancestry Insider,

Are any records going to be every-name indexed, such as (say) partitions in Chancery, petitions for administration listing (perhaps dozens of) heirs, wills, or deeds?

Signed,
Geolover

Dear Geolover,

I noticed this morning in the Kentucky marriage record project in FamilySearch Indexing that FamilySearch is not indexing the birth places of the bride, her parents, the groom, or his parents. Because it is cheaper to leave out some of the vital information, FamilySearch volunteers are able to achieve the big numbers Jim showed. Picking out all the names from a free-form record is even more expensive than indexing all the birthplaces from a form.

Does that answer your question?

Signed,
The Ancestry Insider


Dear Ancestry Insider,

I tried to get FamilySearch to correct an error on the 1940 Census. Well I was pretty much informed that even if it was wrong it would stay because 3 people had looked at it. Never mind that it was my aunt and uncle that I had been aware of and knew their names the error is still there.

Signed,
Gale Nash

Dear Gale,

Whoever told you that names could not be corrected in the 1940 census because three people had already looked at them was unauthorized and incorrect (and was, frankly, a little “up in the night”). The real reason is that FamilySearch has no mechanism (like Ancestry.com does) allowing error corrections. FamilySearch has said publicly that they will provide that mechanism someday, but haven’t said whether or not they are currently working on it. One can imagine that preventing their website from pulling a Hindenburg pulled their attention elsewhere.

Signed,
The Ancestry Insider


Dear Ancestry Insider,

I think that FamilySearch should let volunteers pick projects that they are familiar with, such as transcribing foreign countries where they are familiar with surnames. The Croatian church is one example where I am researching. I don't care if 3 people looked at it, they have all butchered the names.

Signed,
Alojzija

Dear Alojzija,

You are absolutely right. People do a terrible job indexing unfamiliar names. In 2010 I wrote “Indexing Errors: Test, Check the Boxes” about “cold indexing.” Frankly, I would expect a 5th generation Utahn of English extraction to butcher Croatian names worse than a highly trained Chinese keyer.

However, FamilySearch does allow volunteers to pick projects. But to be frank, most non-English language speakers aren’t indexing. (If you are one of the few, good on ya, mate.) FamilySearch isn’t going to provide lots of non-English FamilySearch Indexing projects to choose from if they are just going to sit there glacially indexed.

I think the solution is “Laissez Faire Indexing,” as I called it back in 2011. FamilySearch should scan everything in the vault and take everything they are currently photographing and throw it immediately, unindexed, on their website. Then let anyone index anything, anytime. Don’t require any involvement from FamilySearch, or they become the bottleneck. Don’t require them to set up projects or write indexing instructions or block images or anything else. Sure, they can organize formal projects like they do now; but don’t require it. There are downsides, to be sure. See the referenced article for more information.

Signed,
The Ancestry Insider


One reader gave me a friendly jab over a typo in the first article about Jim’s talk: “Jim provided some tips for success. Work with a fried or get some training.”

Dear Insider

I hope we don't all have to work "fried." ;-) I sincerely appreciate all of your messages -- THANKS for all you do !!!!!

Signed,
Phil Besselievre

Dear Phil,

That was on purpose. It’s state fair time. Everything is served up fried. ;-)

Signed,
The Ancestry Insider

Thursday, August 25, 2016

Jim Ericson and FamilySearch Indexing (Part 2) – #BYUFHGC

This is the second of two articles about Jim’s presentation.

Jim Ericson of FamilySearch gave a presentation titled “Straight Talk about the State of Indexing” at the 2016 BYU Conference on Family History and Genealogy. His purpose was to “answer several key questions related to FamilySearch indexing and the program’s future in a direct, no nonsense way.”

Where is indexing headed in the future?

FamilySearch is preparing a new indexing system. Jim said the new system is up and running but FamilySearch is still testing and figuring things out. It will probably be after the beginning of 2017 before it is available.

[At this point I have to poke fun at FamilySearch, not at you, Jim. FamilySearch has been saying “this year” or “next year” for a long time. Here’s what they’ve said at several dates in the past:

My first career was as a software engineer and my managers were always asking, “How long will it take you to do this thing that no one has ever done before?” And I was always thinking, “Are you listening to what you are saying?” I would dutifully try to figure out how long it would take me. Then I would tell my boss twice that long. Without me knowing it, he would double the number before telling the director, who would double it before reporting to the vice president. In the end, the project would take twice that long.

What moving target will I make light of after FamilySearch really releases this program? Hmmm. I guess there is always: “We will be done scanning the vault in five years.”]

In the new indexing program FamilySearch will not always use double keying. There are a lot of projects that are simple forms and it doesn’t make sense to have 3 people key them. So when it is appropriate, FamilySearch may have single-key indexing for an entire record, or for just select fields. A field like gender is probably okay having just one person key the field, while the name should be indexed by two indexers. A qualified volunteer might be able to produce a better index than 3 people.
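The field-by-field idea Jim described can be illustrated with a minimal Python sketch. It assumes a hypothetical policy table in which gender needs one keying and a name needs two matching keyings. The names `KEYINGS_REQUIRED`, `FieldResult`, and `reconcile` are invented for illustration and are not taken from FamilySearch's software.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical per-field policy: how many independent keyings a field needs.
# The field names and counts are illustrative assumptions only.
KEYINGS_REQUIRED = {"gender": 1, "name": 2}


@dataclass
class FieldResult:
    value: Optional[str]  # agreed value, or None when nothing can be accepted
    needs_review: bool    # True when a person should look at the field


def reconcile(field: str, keyings: List[str]) -> FieldResult:
    """Accept a field only when the required number of keyings agree."""
    required = KEYINGS_REQUIRED.get(field, 2)  # unknown fields default to double keying
    if len(keyings) < required:
        return FieldResult(None, True)  # not enough keyings yet
    # Compare the required keyings, ignoring case and surrounding whitespace.
    normalized = {k.strip().casefold() for k in keyings[:required]}
    if len(normalized) == 1:
        return FieldResult(keyings[0].strip(), False)
    return FieldResult(None, True)  # keyers disagree, so flag for review
```

Under these assumptions, `reconcile("gender", ["F"])` is accepted after a single keying, while `reconcile("name", ["Betsy Gilson", "Betsy Gibson"])` is flagged for a reviewer because the two keyings differ.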

Another model FamilySearch will use is single-key indexing plus peer review. One person keys the work, but another person reviews it for correctness. This eliminates the problem of arbitrators working in isolation. This is not another name for arbitration. The reviewer doesn’t have to have more competence than the original indexers. It’s like checking a classmate’s homework. It eliminates the adversarial relationship between volunteers.

Coming in the future is the deployment of new technologies.

For things that are typewritten it is really easy for the computer to read those characters. Another technology is something FamilySearch calls robokeying. It reads and interprets text and “indexes” it. It goes beyond OCR. The results are audited. FamilySearch has done extensive testing of the results. There are technologies for recognizing all alphabets.

FamilySearch is testing handwriting recognition with Kanji. That is the holy grail of the future.

However, we will always need volunteers, Jim said, not just for indexing, but other tasks like zoning areas of a news page for indexing to work with.

Microtasking is something FamilySearch could employ in the future. There would be specialized tasks like zoning, blocking fields in a form, or recognizing where names are in a record. A microtask could be to identify data types. A microtask could be keying specific fields, like just the name. A microtask could be verifying names. The microtasking system could use a personalized page that directs efforts towards currently needed tasks. This is the direction we are trying to go, he said.

In the future, we are headed towards more difficult projects, Jim said. The biggest factor for indexing volume is currently how easy or interesting the project is. “We’ve done a lot of the easy ones,” he said. The U.S. census only comes once per decade. The Freedmen’s Bureau project is an example of a really difficult record type that the future holds. These records are going to be increasingly complex. About 60% of all the really valuable US collections have been completed, and about 40% in the UK. That leaves us with spotty coverage for the rest of the world, so we have huge needs when it comes to indexing in other languages, he said.

Jim took a number of questions.

Q. Will you allow people to be signed in for more than one day at a time?

Yes. That is one of the things we are working on. The Church of Jesus Christ of Latter-day Saints is very sensitive when it comes to security. The online version will allow two weeks like Family Tree.

Q. When can I do indexing on my smart phone?

“No time soon.” The new online indexing program can be done on a tablet, but requires more real estate than available on a phone. We want to do it. We are evaluating doing it. But it would be irresponsible for me to give a date.

Q. When will the indexing effort be done?

Never. Only about 30% of published records on FamilySearch.org are indexed. And we are still going to be acquiring records. And we have ongoing partnerships with organizations with projects for records we want access to. And new records are created every day. A big problem we have today is getting records imaged before they are destroyed.

Q. From the time a project is indexed, how long does it take before the collection is published?

The 1940 census was the best we had ever done. Within days we were putting up states. Most projects are more complex and require more auditing and review. A project can get stuck in arbitration, quality assurance, or reindexing. We have some projects that have been hanging at 99% for more than a year. The model is to shorten that time.

Q. Once in a while you find a record that was misindexed, but there is no way to go back and correct it.

The number one question is, by far, “How do I fix a record that has been indexed incorrectly?” One solution we are considering is in the indexing step: preserve both the A and B keys. The other side is post-publication. That is the holy grail that we want to fix.

Q. Ancestry has had it for years.

Q. I’ve been arbitrating Kentucky marriage records. No one is following the rules. Should I do the job for them or send it back?

If they are done incorrectly, it depends on how diligent you are. If you want to send it back, that would be fine. If they are missing records from part of the image, send it back and indexers can see what they are missing.

Let me finish off with some recent indexing numbers. I received an email recently with this information:

[Image: FamilySearch Indexing English records indexed]

And the FamilySearch Indexing page has this information as of 13 August 2016:

[Image: FamilySearch Indexing Statistics]

Wednesday, August 24, 2016

Jim Ericson and FamilySearch Indexing (Part 1) – #BYUFHGC

Jim Ericson of FamilySearch gave a presentation titled “Straight Talk about the State of Indexing” at the 2016 BYU Conference on Family History and Genealogy. His purpose was to “answer several key questions related to FamilySearch indexing and the program’s future in a direct, no nonsense way.” [It’s been so long since the conference, I’m starting to forget things that aren’t in my notes. Hopefully I don’t mess it up too badly. This will be the first of two articles about Jim’s presentation. Here goes…]

To lead off, Jim thanked those who have indexed. There have been 3 billion names indexed in 1.4 billion records through the FamilySearch indexing program. There have been nearly 250,000 indexers so far in 2016. [Since Jim’s presentation, that number has grown to 262,868 according to the FamilySearch Indexing website.]

For the recent world-wide indexing event 116,000 people indexed 10 million records. Participants represented 110 different countries. While some, like Tonga and Samoa, had only a few participants, this is amazing.

FamilySearch's Jim Ericson talks about the world-wide indexing event.

There were 10,000 youth ages 8 to 17 who participated. FamilySearch likes to get youth involved. Youth indexers come and go, Jim said.

More than 23,000 (19%) participants were not members of The Church of Jesus Christ of Latter-day Saints. On Facebook there was huge interest by the general public. First time indexers composed 23% of participants. Jim said that is why they do these events. It extends the number of indexers.

Jim told his indexing story. He searched for hours and hours to find the maiden name of William Worley’s wife, Betsy G. He finally found their marriage record and learned it was Gilson.

Jim Ericson spent hours and hours searching for the marriage record of William Worley and Betsy Gilson.

Since then, FamilySearch volunteers have indexed that record and Jim has attached it to Family Tree. “Now people don’t have to go through the process I went through to find Betsy G.,” he said.

Jim said indexing helps us all personally. We learn about family history and learn how to read handwriting. We serve others. We belong to an amazing volunteer community. We improve data entry skills. We increase unity with family and friends and we gain a deeper appreciation for the worth of all men. FamilySearch doesn’t recommend that children start indexing records on their own, but it is a way to collaborate and build family unity, Jim said.

What are the biggest challenges of indexing?

Indexing can be really challenging, especially for beginners. It has an unintuitive software interface. People’s expectation is that you should be able to get started without helps or hints. The handwriting is difficult to read and sometimes has poor legibility. The last few batches often take a long time until researchers buckle down and do the last, hard batches. Instructions vary by project, which is a problem if arbitrators don’t read the instructions and change batches that had been done right. There can be a variety of records, even within the same project.

The software FamilySearch is using can be a challenge. It has had a long, miraculous journey, Jim said. There was a small company called iArchives that was providing software for commercial offshore keying companies. FamilySearch took that software, meant for a trained workforce working on a few projects, and deployed it to a large, diverse workforce. Even though FamilySearch is coming out with web-based indexing, the current software will be used for a long, long time. Some projects have to be offline. But it is now an amazing effort to keep this legacy system running. During the world-wide indexing event an engineer was restarting the server every 10 minutes to prevent it from crashing.

A big challenge of indexing involves human factors. For example, the indexing program used to have a screen showing the percentage of an indexer’s work that was not changed by arbitrators. We’ve removed that because it was causing friction, Jim said. (See “What’s New with Indexing—June 2016” on the FamilySearch blog for more information.) If the indexer has really studied and the arbitrator hasn’t and overrides the correct information, it is really frustrating. You have to remember that indexers and arbitrators are volunteers, Jim said. “We can’t fire them for not doing a good job.” They are doing their best and FamilySearch Indexing is achieving accuracy in the mid-to-high 90 percent range.

Jim provided some tips for success. Work with a fried or get some training. Focus on a single project at a time for quality and efficiency. Follow the directions. Reach out and help others. Be patient. And stretch yourself into harder projects. “That which we persist in doing becomes easier for us to do—not that the nature of the thing is changed, but that our power to do is increased.” (Attributed to Ralph Waldo Emerson, quoted by Heber J. Grant.)

Tune in next time to learn what is coming in the future and to read the answers to attendees’ questions.

Tuesday, August 23, 2016

Ancestry Insider in Family Tree Magazine Top 101

I recently received this message from Diane Haddad, editor, Family Tree Magazine.

Congratulations! 

Your genealogy website has been named one of our annual 101 best family history websites in the September 2016 issue of Family Tree Magazine. This issue is being mailed to subscribers and is available at ShopFamilyTree.com. It goes on sale August 16 at newsstands. 

Each year, Family Tree Magazine publishes the 101 Best Websites for family history to guide genealogists to the top websites where they can make family history research progress, and to honor the individuals and organizations who create those sites. This year, we took a fresh look at the list, adding more than 30 new, innovative and overlooked sites. For the "old favorites" on the list, we've highlighted new content and features.

The full list of 101 Best Websites for family history, including your site, can also be found using the category links at http://www.familytreemagazine.com/article/101-best-websites-2016 .

Thank you, Diane, David A. Fryxell, and Family Tree Magazine. I am constantly amazed and overwhelmed by the number of quality, awesome websites out there. More are being added every day. It’s more than I can keep up with. It is an honor to have Diane and David take notice of my small contribution. Their annual list is a great way to keep up with some of the best.

Websites were recognized in one of 16 categories:

101 Best Websites for 2016 main page
2016 Best Big Genealogy Websites
2016 Best Websites for Exploring Your Ancestors' Lives
2016 Best US Genealogy Websites
2016 Best Sites for Sharing Your Genealogy
2016 Best Websites for Putting Ancestors on the Map
2016 Best Genealogy Library Websites
2016 Best Websites for Finding Ancestors in Old Newspapers
2016 Best African-American Genealogy Websites
2016 Best Cemetery and Directory Sites for Genealogy
2016 Best Tech Tools for Genealogy in 2016
2016 Best Immigrant Ancestors Websites
2016 Best British & Irish Genealogy Websites
2016 Best International Genealogy Websites
2016 Best Genetic Genealogy Websites
2016 Best Genealogy News & Help Websites

Monday, August 22, 2016

RootsWeb Update for 20 August 2016

Here is the latest I know about the RootsWeb website.

As of 20 August 2016

  • Freepages FTP service seems to be down still.
  • Mailing lists seem to have miscellaneous problems with archives and admin tools.
  • I was able to browse mailing list archives. I understand that was recently broken.
  • The mailing list archive search doesn’t return any emails since sometime in April.
  • I was able to subscribe to a mailing list.
  • I hear reports that emails are being sent, but spam filters are not working, so a lot of the email is spam.
  • User contributed data stats haven’t been updated since 24 February 2016. I don’t know if RootsWeb is currently accepting new data.
  • There are currently 15,297 web pages in the freepages genealogy community index. I haven’t monitored it for change, but there it is.
  • The freepages file manager, http://freepages.rootsweb.ancestry.com/fileman/, is still missing.

DonFT wrote on 19 August 2016:

I heard from somebody at RW Help whose reply included the words "decisions are being made as to the future availability of this feature." My impression was that the person was referring to the free pages generally. Suggests to me that they may be abandoning the whole thing. Thoughts?

BKip wrote on 18 August 2016:

Having been unable to access the Freepages File Manager since sometime in July I’ve been mostly in the dark about what is going on. My site is fully available for viewing, but I am unable to make any updates. An email to the help desk gave me an ambiguous reply leaving me just as confused. This page is the first place I’ve found where there is at least a bit of information.

Is there another place where there is more information on the status of Freepages?

Is Freepages expected to continue?

Is there a different URL to log-in to the File Manager? http://freepages.rootsweb.ancestry.com/fileman/file_manager.cgi

Any further information would be truly appreciated.

BKip, I’m afraid I have very little information you don’t already have. There is a status page (http://helpdesk.rootsweb.com/ or http://rootsweb.custhelp.com/), but Ancestry.com is not using it.

Tim received this message from RootsWeb on 15 August 2016:

Dear Tim,
Thank you for contacting RootsWeb in regard to Mailing List spam.
We are sorry that you are encountering a problem with spam. We will do all that we can to assist you. The Mailing Lists are undergoing maintenance. Spam filters have temporarily been turned off during this process. Other tools including those for subscribing and unsubscribing are also not available at this time. We expect the spam filters to be re-enabled soon. We apologize for the inconvenience and appreciate your patience.

Bobango2 sent this query to RootsWeb:

Checking in once again on the repairs to the Freepages file manager. RW has over 15000 sites listed in this category. It would be nice to know that they are still working on the issue and have a completion date in mind. It is very frustrating for those of us who have devoted hundreds of hours to these pages not to be able to upload new material or make corrections. Surely, someone in IT can throw light on this matter.

He received this reply on 15 August 2016:

Thank you for contacting RootsWeb in regard to maintenance to the site.
We sincerely apologize for the length of time the maintenance is taking. We had hoped it would be completed by now. Our development team is working on getting this completed as quickly as possible, it is just taking longer than expected. We appreciate your patience and understanding during this time.
If there is anything else with which we might assist you, please let us know

I received this message from the RootsWeb product manager on 15 August 2016:

Right now we are dealing with getting the spam filters working on the mailing lists again. I have nothing new to report other than we are trying to fix problems as we find them.

So, there’s what I know. Post comments as the situation evolves and any of you learn more.

P.S. I got to thinking. How long will Ancestry.com keep the mailing lists running? How much are the mailing lists being used nowadays? Here’s the historical picture for the number of messages sent during the month of July, since 1995. (Note I skipped some years, as indicated by the dots.) Writing on the wall, guys. Writing on the wall.

[Image: Historical graph of the number of RootsWeb mailing list messages during the month of July]