Friday, March 4, 2011

Laissez Faire Indexing

FamilySearch has been immensely successful in assembling a large, volunteer workforce of indexers. Yet digitizing microfilm images still outpaces indexing by many orders of magnitude. Many, many more indexers are needed. How long FamilySearch can continue to grow its indexing workforce is unknown.

I think there is a better way to harness the capabilities of the genealogical community. I call it laissez faire indexing. It uses what I call the Amazon model.

The Amazon Model

The Amazon model is to leverage normal user actions to gain information that can be given back, enhancing the user experience. Almost from the first day Amazon.com did business, it added value for customers with a feature that told you “People who bought this book also bought…”

Ancestry.com is successfully using the Amazon model to give users additional value by gleaning information from users attaching records to their trees. If some other user attaches a record to a person in a tree, then you get notified that there is a record available for that same person in your tree. Similarly, if you view a record, you are alerted to other records that are associated with that record because both are attached to the same person in someone’s tree.
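
To make the idea concrete, here is a minimal sketch of how "records attached to the same person" could be turned into related-record hints. It is not how Ancestry.com implements the feature; the function name, the identifiers, and the sample data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def related_records(attachments):
    """Map each record to the other records attached to the same tree person.

    attachments: iterable of (person_id, record_id) pairs (hypothetical data).
    """
    # Group record IDs by the tree person they are attached to.
    by_person = defaultdict(set)
    for person_id, record_id in attachments:
        by_person[person_id].add(record_id)

    # Any two records attached to the same person are related to each other.
    related = defaultdict(set)
    for records in by_person.values():
        for a, b in combinations(sorted(records), 2):
            related[a].add(b)
            related[b].add(a)
    return related


# Hypothetical attachments made by two users in the course of normal research.
attachments = [
    ("tree1:john", "census-1880"),
    ("tree1:john", "marriage-1875"),
    ("tree2:mary", "census-1880"),
    ("tree2:mary", "death-1901"),
]
hints = related_records(attachments)
print(sorted(hints["census-1880"]))  # ['death-1901', 'marriage-1875']
```

Nobody in this example set out to build a recommendation list. The hints fall out of attachments users were making anyway.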

Microfilm

Think about microfilm. What do you do normally? You crank through the film, figuring out the lay of the land. You look for indexes. You figure out how the records are arranged. You find indexes (or lack thereof). You find page numbers, or ranges of pages for dates or letters of the alphabet or record types. You locate pages of interest. You finagle meaning out of bad handwriting, mold, mildew, book worms, blurs, and ink spots.

Altruism aside, if you could leave yourself some breadcrumbs, wouldn’t you? How cool would it be if you could bookmark the start of a volume, the start of the index, the start of the index for the letter “N” (you’re researching the N’Siders, of course), and each page where that surname appears? If some page numbers were illegible, but you figured them out by going back and forth through nearby pages, wouldn’t you want to save yourself, and other researchers (we don’t have to be totally non-altruistic), some trouble in the future? If you figured out some names that were tough to read, wouldn’t it be handy to tag the name to remind yourself what you figured out? Wouldn’t you bookmark pages with names you wanted to revisit later?

Sure you would! I know I would. (I’m OCD, so I might get sucked into bookmarking and tagging the five pages intermixed with the 20 pages of interest to me. Forget helping others; it would drive me crazy if I didn’t. But I digress…)

Laissez Faire Indexing

While bookmarking pages and tagging information isn’t possible on microfilm, it is possible with online digital images. As the name suggests, laissez faire indexing lets indexers choose when and what they index without the control of or dependence on a central authority. Indexers choose which collections, which records from the collection, and which fields from the record to index.

I include bookmarking in the concept of laissez faire indexing. Bookmarking is the establishment of a browse structure into a collection.
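
As a rough illustration of what "a browse structure" could mean, the sketch below treats each bookmark as an image number plus a label and looks up which bookmarked section any given image falls in. The labels, image numbers, and the `section_for` helper are assumptions made for the example, not part of any existing viewer.

```python
from bisect import bisect_right

# Hypothetical bookmarks left by researchers: (image number, label).
bookmarks = [
    (1, "Volume 1: start"),
    (4, "Volume 1: index"),
    (9, "Volume 1: index, letter N"),
    (15, "Volume 1: records"),
    (210, "Volume 2: start"),
]


def section_for(image_number, bookmarks):
    """Return the label of the nearest bookmark at or before image_number."""
    ordered = sorted(bookmarks)
    starts = [start for start, _ in ordered]
    # Count how many bookmarks start at or before the requested image.
    pos = bisect_right(starts, image_number)
    return ordered[pos - 1][1] if pos else None


print(section_for(12, bookmarks))   # Volume 1: index, letter N
print(section_for(300, bookmarks))  # Volume 2: start
```

With enough bookmarks, a later researcher could jump straight to the letter “N” of the index instead of cranking through the whole film.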

Ancestry.com, Inc. offers tagging on its Footnote.com website, but calls the tags annotations.

Footnote supports tagging of names, places, dates, and other text

Build laissez faire indexing around the Amazon model and I think you have advantages over the current practice of bulk indexing.

  • It captures the normal work of researchers that otherwise goes to waste. Judging from the scratches I find on microfilm at the Family History Library, a lot of people have already used the films I use. Each subsequent user redoes some of the work done by previous users, finding beginnings of volumes, locations of indexes, and deciphering information.
  • That last point is worth expounding. Normal collection users don’t have to “index.” They continue doing what they do now with the added convenience of going to the Internet instead of going to Salt Lake or Germany. And with the added convenience of bookmarking and tagging, indexing just happens.
  • I dismissed altruism earlier, but the truth is that the genealogical community is rife with it. The community constantly and consistently “pays it forward.” Laissez faire indexing lowers barriers, catalyzing indexing throughout the community, if only a little bit, as researchers go about their research.
  • Just as a free market economy self-optimizes better than central planning, laissez faire indexing prioritizes work better than a central authority. Indexing occurs in the collections, records, and fields that are used the most. This returns the greatest value to the genealogical community.
  • Dedicated indexers choose the collections that interest them the most. High interest motivates indexers to work more, and to work more productively.
  • Societies and historical organizations can self-organize indexing projects. Indeed, ad hoc groups can self-organize around indexing projects, forming new societies and strengthening existing ones.
  • The same infrastructure needed for laissez faire could be used for user corrections and entry of alternate opinions, name variations, and maiden names.
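
The way tags could accumulate into an index, with alternate readings kept side by side, can be sketched in a few lines. This is only an illustration under assumed data: the tag fields (`image`, `region`, `value`, `user`) and the `build_index` function are hypothetical, not an existing FamilySearch or Footnote format.

```python
from collections import defaultdict


def build_index(tags):
    """Group user tags into a searchable index: name -> set of image numbers."""
    index = defaultdict(set)
    for tag in tags:
        # Case-insensitive key, so "N'Sider" and "n'sider" land together.
        index[tag["value"].casefold()].add(tag["image"])
    return index


# Hypothetical tags left by two researchers. They read the same line on
# image 17 differently, and both readings are kept as searchable variants.
tags = [
    {"image": 17, "region": "line 3", "value": "N'Sider", "user": "ann"},
    {"image": 17, "region": "line 3", "value": "Insider", "user": "bob"},
    {"image": 42, "region": "line 9", "value": "N'Sider", "user": "ann"},
]
index = build_index(tags)
print(sorted(index["n'sider"]))  # [17, 42]
print(sorted(index["insider"]))  # [17]
```

The index grows only where researchers actually tag, which is the prioritization described in the list above.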

Laissez faire indexing would have disadvantages too, of course.

  • Collections, names, and fields that are seldom or never used might never get indexed. Tools would need to be developed to allow dedicated indexers to identify and fill gaps.
  • Creating a search experience that effectively integrated indexed results with non-indexed possibilities would be challenging.
  • Novices, in particular, might be demotivated by search failure and the need to dig into and understand records. Effective training would need to be woven into the user experience. (On the other hand, records needed by the most novices would get indexed sooner, providing a better first experience than currently provided.)
  • Relinquishing control is difficult for an organization and introduces considerable risks: Implementing laissez faire is expensive. The benefits are unproven. The results are non-deterministic.
  • History has shown that successful online communities depend on small, subtle factors and features. Building a successful community requires agility, flexibility, and plenty of trial and error. I wonder if we are up to the challenge. FamilySearch is not known (yet) for its agility. And while genealogists are flexible, some of us are averse to trial and error when it comes to website changes.
  • Indexing templates created by users would be inconsistent, and building consensus among users could be divisive. Building tools to solve this problem would be expensive.

What do you think? Continue indexing as present? Make the leap of faith into laissez faire indexing? Fine tune what we have? Or do something entirely different?

Next week I’ll give you my recommendation.

17 comments:

  1. I vote for Laissez Faire Indexing Amazon Style. I am willing to index what I don't need, as long as I could index what I need once in a while.

  2. Why do we have to choose? Do both!

    I think laissez faire indexing is a great idea, especially the bookmarks. Perhaps a developer's challenge for RootsTech 2012...

  3. Do both. Adapting other features of Footnote.com's viewer such as improved image and collection navigation would also be a great help to the current FamilySearch.org
    I am pretty sure Footnote.com has the best genealogy image viewer in the known universe.
    Keep the current indexing system, but add a Footnote.com like viewer with annotations.
    Right now on FamilySearch Indexing there are 2 indexers and an arbitrator for differences. Create a flexible system where 2 or more annotations are kicked to an arbitrator for any necessary correction, but make the system flexible enough to allow the arbitrator to save both if one is simply a variant which may improve search reliability (like Ancestry.com allows alternate names to improve search reliability). Also sync the current Indexing Software with the new annotation capable viewer so that posting of indexes is more automatic and more instant.
    Also make it possible for the 1st indexer to be a FamilySearch Indexing software user and the 2nd indexer to be an annotator. Let the system kick any differences instantly to an arbitrator. Then the batch the 2nd indexer downloads on FamilySearch Indexing software will have the data field already checked against an annotation blocked out like so "" or let the existing annotation value be present in the slot. For example, when you download the indexes of Ancestry.com as one of the 2 indexers several slots are already filled in. A synced system could have slots already filled in for one of the indexers for any of the fields already annotated. It is important that the viewer and software be live synced so that the most recent annotations are always included in any downloaded batch. And this would provide a way for the missing non-annotated data to be completed which is an issue you mentioned.
    Also I am a big supporter of using social media to its fullest. I have discussed on FamilySearch Indexing's Facebook wall recently the opportunities to share how much a person is indexing by linking FamilySearch Accounts to social media like Facebook and Twitter according to a user's preferences. I believe this info with a link would attract more indexers who see it in their friends status updates. The new indexing synced annotatable FamilySearch Viewer which would be part of FamilySearch.org's way to view historic documents should also be social media linked through logging into your FamilySearch Account and according to your preferences update your status feed(s) when you "like" a new record. You could click a like button for records that deal with important figures or your ancestors which will automatically then be posted in your social media feeds as set in preferences. This is only one small way social media could be used.

  4. Check out Waypointing in the Wiki, it is on the way to what you suggest.

  5. This model of indexing documents is brilliant.

    Community-controlled data that is created and curated from the bottom up makes using genealogy sites better.

    The power of the community is something we at Geni believe in tremendously.

  6. Sorry if my last comment was too long. I see it did not get posted. If it was lost let me know and I will send it again.

    I forgot to say about LDSTech.
    That website could surely be used to alleviate the burden of creating the necessary technology and programming. If volunteers can do half of the work for the 4 official church mobile apps then why can't programming volunteers do half of the work to make your idea happen? :)

  7. The Ancestry version is Laissez Faire based on material that is already indexed. And creating the Laissez Faire only works in conjunction with Ancestry family trees. It is helpful but can be incorrect. And it doesn't include the arbitration phase which is critical.

    I would hate to lose the current Family Search indexing method. I have found records in places where I would never have looked. And when there are two people with similar names and birthdates, it brings up both of them.

    We do need more Family Search Indexers. Or perhaps when we get past the census records, that will free up some indexers to work on other projects.

  8. Great ideas! But we don't need to wait for RootsTech 2012 or any company to do this for us. All it takes is some creativity and technical know-how. (Unfortunately, I'm lacking in both areas.) I expanded on this in a blog post today.

  9. The Australian Newspaper archive (http://trove.nla.gov.au/newspaper) lets you correct the OCR text. It's analogous to what you suggest with Laissez Faire indexing. If I find a Fitzhenry reference, then I correct the text and the next article too as a thank you.

  10. The big problem here is that the laissez faire method produces a collection of data un peu ici, un peu là (a little here, a little there).

    Without the guidance of indexing assigned batches of related data, you'll just end up with random pages being indexed, with no knowledge of what has been done or is yet to be done. The only thing worse than a dataset not being indexed is a partial index, with no inkling of what has been done and what hasn't. I don't have the time to check the same dataset every five minutes for five years, on the outside chance that somebody has indexed another paragraph. Nor do I want to wait five years for the entire work to be haphazardly indexed by the luck of the draw.

    Start an index project on a specific document and finish it. Then move on.

  11. Do both. Adapting other features of Footnote.com's viewer such as improved image and collection navigation would also be a great help to the current FamilySearch.org
    I understand a new HTML version of the viewer is soon to be released. I hope it has some improvements besides not being flash reliant
    I am pretty sure Footnote.com has the best genealogy image viewer in the known universe.
    Keep the current indexing system, but add a Footnote.com like viewer with annotations.

    Right now on FamilySearch Indexing there are 2 indexers and an arbitrator for differences.
    Sync the current Indexing software with the new annotation capable viewer so that posting of indexes is more automatic and more instant.

    One of the copies of the batch that an indexer downloads on FamilySearch Indexing software will have the existing annotation value present in the slot. For example, when you download the indexes of Ancestry.com as one of the 2 indexers several slots are already filled in.

    A synced system could have slots already filled in for one of the indexers for any of the fields already annotated. It is important that the viewer and software be live synced so that the most recent annotations are always included in any downloaded batch. This would provide a way for the missing non-annotated data to be completed which is an issue you mentioned.

    Also I am a big supporter of using social media to its fullest. I have discussed on FamilySearch Indexing's Facebook wall recently the opportunities to share how much a person is indexing by linking FamilySearch Accounts to social media like Facebook and Twitter according to a user's preferences. I believe this info with a link would attract more indexers who see it in their friends status updates. The new indexing synced annotation capable FamilySearch Viewer which would be part of FamilySearch.org's way to view historic documents should also be social media linked through logging into your FamilySearch Account and according to your preferences update your status feed(s) when you "like" a new record. You could click a like button for records that deal with important figures or your ancestors which will automatically then be posted in your social media feeds as set in preferences. This is only one small way social media could be used. The sharing buttons in the current viewer seem not to work for me.

    I forgot to say about LDSTech.
    That website could surely be used to alleviate the burden of creating the necessary technology and programming. If volunteers can do half of the work for the 4 official church mobile apps then why can't programming volunteers do half of the work to make your idea happen? :)

    I am sending this again in Firefox because when I sent it in IE I assume it got lost since my comment never appeared.

  12. There are some great ideas here, both in the original post and in the comments. In the relatively short time that I've been family tree-climbing, I've seen lots of evidence of altruism in the online genealogy community and I'm sure that if a bookmarking / tagging / laissez faire indexing system could be devised and implemented, lots of people would use it and help to open up previously unindexed records. The system might even appeal to those who think that altruism is fine as long as there's something in it for them....

  13. Interestingly enough, this indexing approach is highly compatible with the Gentech data model (for those that are familiar with it). The indexers would create a set of "personas" along with their events and characteristics found in the source.

    Then the user of a gentech-based genealogy application could simply import that source and its personas, and then indicate that those personas match existing ones in the user's genealogy. If it ever turns out that the source was in fact invalid or incorrectly indexed, one can simply break that link between the two personas, and the user's genealogy is back to where it was before the import.

    I have started experimenting with that gentech model (see http://briot.github.com/geneapro/ for those interested), and I know it is possible to use it for a real-world application, in particular showing pedigrees,...

    I really like the proposed approach

  14. I am certainly with those who have commented "let's have both". I would also second Jo Fitz's comment about the Australian Newspaper Archive on Trove. While many of the records on microfilm may not be readable by OCR, the principle remains the same.

    I can also see Anon's point about "a little here, a little there". Perhaps the Laissez Faire style could be there as a precursor to official indexing projects, and be fed into the moderation process when an indexing project officially starts.

  15. This is an interesting idea. Crowd sourcing is gaining a foothold in the academic community. The NY Times (12/27/10), in an article titled “Scholars Recruit Public For Project,” examined The Bentham Project, which has volunteers transcribe the philosopher’s papers. Each volunteer can choose which papers he wants to work on by subject. George Mason University is launching a similar project to transcribe the War Department documents from the early days of the republic. Their experiences might offer some insights about managing a crowd sourced project.

  16. I wonder if what is being suggested here could apply to my recent experience of creating my own database of all the Liebensteins mentioned in a 16th century Zurich church record. I would love a way to show on the microfilm or somewhere that I've done this and would share it. If it were digital, that would have been easier. I've done quite a bit of annotating for Footnote.com and found it very rewarding to see my community's record name searchable. I vote for some system of annotation/indexing so our efforts help others and create value-added indexing.

  17. Aren't commercial indexers indexing things which have already been indexed (e.g., the census)?

    Surely that's the place to organize people.
