At the 2010 NGS Conference GenTech Luncheon, David Rencher presented “The Top 10 Areas Where Technology Can Still Make a Real Difference in Genealogy: Could You Please Hurry?” In “We Want Tech and We Want It Now,” I review technologies that are already available, at least in infant form. Today I look at:
# 9 OCR Handwriting
David Rencher would like a viewing lens that reads handwritten documents, like the “magic lens” from www.DoHistory.org. Drag the magic lens over a handwritten document, and the text is magically displayed in legible form.
Well, David, let me give you a status report on this technology.
Some limited applications are already available. A2iA can read handwriting on envelopes and checks, classify document types, and read handwritten form fields.
The now-defunct BYU Family History Technology Workshop showcased student research for several years, including free-form handwriting recognition. For example:
- “Progress with Searchable Indexes for Handwritten Documents” (PDF)
- “Interactive Smoothing of Handwritten Text Images Using a Bilateral filter” (PDF)
- “Handwriting Recognition for Genealogical Records” (PPT)
- “Using a Hidden-Markov Model in Semi-Automatic Indexing of Historical Handwritten Records” (PDF)
- “Towards Searchable Indexes for Handwritten Documents” (PDF)
- “Thresholding of Text Documents” (PDF)
Sorry, David. Technology is still a long way from providing a “magic lens.”
Do you know of other work being done to read old handwriting automatically? Share your discoveries here by leaving a comment.
At the National Archives in the Netherlands, we're participating in an academic study to do OCR on handwritten archival documents. See the Scratch project page for more information.
This would be one of the holy grails for digitization, research, and search. I haven't found any usable tools yet either, but this post is a good collection of efforts in this area.
One suggestion: the OCR doesn't need to be 100% perfect (even typed documents don't get OCR'd perfectly). Any significant recognition of old handwriting, even 10% or more, would be valuable.
Back in the 1990s, I scanned a bibliography with some handwritten notes on it, using whatever the best-of-breed OCR program was at the time (I can't remember its name). I was surprised to see that the software correctly read my handwritten notes.
This post reminds me of my school days, when we had a subject called “handwriting improvement,” with proper lectures (although I don't remember any exams :P). It sounds like a funny but helpful course now :)