Wednesday, January 19, 2011

We Want Tech: OCR Handwriting

DoHistory.org invites you to explore the process of piecing together the lives of ordinary people in the past.

At the 2010 NGS Conference GenTech Luncheon, David Rencher presented “The Top 10 Areas Where Technology Can Still Make a Real Difference in Genealogy: Could You Please Hurry?” In “We Want Tech and We Want It Now” I review technologies already available, at least in infant form. Today I look at:

# 9 OCR Handwriting

David Rencher would like a viewing lens that would read handwritten documents, like the “magic lens” shown to the right from www.DoHistory.org. Drag the magic lens over a handwritten document and magically the text is shown clearly.

Well, David, let me give you a status report on this technology.

Some limited applications are already available. A2iA can do envelopes and checks. It can classify document types, and it can read form fields.

A2iA FieldReader is designed to read handwritten forms

The now-defunct BYU Family History Technology Workshop showcased student research for several years, including free-form handwriting recognition. For example,

  • Douglas J Kennard and William A Barrett, “Progress with Searchable Indexes for Handwritten Documents,” PDF, _BYU Family History Technology Workshop_ (fht.byu.edu : accessed 19 Jan 2011).
  • “Interactive Smoothing of Handwritten Text Images Using a Bilateral Filter” (PDF)
  • “Handwriting Recognition for Genealogical Records” (PPT)
  • “Using a Hidden-Markov Model in Semi-Automatic Indexing of Historical Handwritten Records” (PDF)
  • “Towards Searchable Indexes for Handwritten Documents” (PDF)
  • “Thresholding of Text Documents” (PDF)

Sorry, David. Technology is still a long way from providing a “magic lens.”

Do you know of other work being done to read old handwriting automatically? Share your discoveries here by leaving a comment.

4 comments:

  1. At the National Archives in the Netherlands, we're participating in an academic study to do OCR on handwritten archival documents. See the Scratch project page for more information.

  2. This would be one of the holy grails for digitization, research, and search. I've also not found any usable tools yet - but this post is a good collection of efforts in this area.

    One suggestion is that the OCR doesn't need to be 100% perfect (even typed documents don't get OCRd perfectly). Getting any significant recognition of old handwriting would be valuable (10% or more).

  3. Back in the 1990s I scanned a bibliography with some handwritten notes on it. I used whatever the best of breed OCR program was at the time - can't remember the name of it. I was surprised to see that my handwritten notes were correctly read by the software.

  4. This post reminds me of a long time ago when, in school, we used to have a subject (with proper lectures, although I don’t remember about the exams :P) called "handwriting improvement". It sounds like quite a funny but helpful course now :)
