Thursday, February 19, 2009

FTM 2009

Arnie Krauise has done a biting benchmark of FTM 2009 performance. Read the whole thing at "FTM 2009: A Comparison."

Among other performance issues, what really jumped off the page at me was how slow GEDCOM file operations were using the new, 32-bit code versions of Family Tree Maker (FTM). The old FTM 2006/16, the last 16-bit code version of FTM, read a test GEDCOM in 21 minutes. Yes, it was a big file. The new FTM 2008 and 2009 releases, which use the new 32-bit code, took a whopping 342 minutes and 312 minutes, respectively. That's between 5 and 6 hours!

Programmers usually approach file import in straightforward, easy-to-program ways. A GEDCOM file is a text file; open it with Notepad and you can read the contents. It may make little or no sense to you, but you can read it. But I digress. Since it is a text file, the programmers are reading it one line at a time. How can I say so with such confidence? Read my lips: "that's between 5 and 6 hours!"

Read, then process. Read a little more, then process. Read a little, then process. Easy to program; killer on performance. The same sector of data is redundantly read over and over from the disk.

Somebody at Ancestry.com/The Generations Network needs to tell the developers that that kind of inefficient programming isn't going to hack it with consumers. Windows is a modern operating system with features like memory-mapped files. The test system had 2 GB of RAM, if I recall correctly. They could read the entire file into memory in a minute or two, process 100,000+ individuals from the memory buffer in five minutes or so, and be done with the whole thing in under 10 minutes.
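
For the curious, here is a minimal sketch of the memory-mapped idea in Python. It is not FTM's code, and I have no knowledge of how their importer is actually written. The function names (count_individuals, write_sample) are made up for illustration, and the sketch only counts individual records instead of doing a real import. It assumes a non-empty file, since an empty file cannot be mapped.

```python
import mmap
import os
import tempfile


def count_individuals(path):
    """Count level-0 INDI records in a GEDCOM file via a memory-mapped view.

    Hypothetical helper for illustration only; a real importer would build
    records instead of counting them.
    """
    count = 0
    with open(path, "rb") as f:
        # Map the whole file read-only; the OS pages data in as it is touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
            # readline() returns b"" once the end of the mapping is reached.
            for line in iter(buf.readline, b""):
                parts = line.split()
                # An individual record starts with a line like: 0 @I1@ INDI
                if len(parts) >= 3 and parts[0] == b"0" and parts[2] == b"INDI":
                    count += 1
    return count


def write_sample(path):
    """Write a tiny made-up GEDCOM file with two individuals and one family."""
    sample = (
        "0 HEAD\n"
        "0 @I1@ INDI\n"
        "1 NAME John /Doe/\n"
        "0 @I2@ INDI\n"
        "1 NAME Jane /Doe/\n"
        "0 @F1@ FAM\n"
        "0 TRLR\n"
    )
    with open(path, "w", encoding="ascii", newline="\n") as f:
        f.write(sample)


if __name__ == "__main__":
    sample_path = os.path.join(tempfile.mkdtemp(), "sample.ged")
    write_sample(sample_path)
    print(count_individuals(sample_path))
```

The parsing loop still walks the file one line at a time, but every line comes out of the mapped buffer rather than a separate trip through the file API. Whether that alone would close the gap Krauise measured is a guess on my part.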

Thanks to Randy Seaver and Hugh Watkins for pointing out the update to Krauise's review.

5 comments:

  1. Wow. THANK YOU for printing this column AND for putting up the link to the full review. I was eager to update to the new 2009 version, but after reading your column & this review, I'll stick w/my FTM version 16! I may grumble a bit with the time it takes to back up my files (both using the BACKUP feature & the Exit feature), but a couple of minutes is far better than who knows how many hours!

  2. I'm Michelle Pfister, Sr. Product Manager and Business Owner for Family Tree Maker. We're aware that GEDCOM import of large files is in need of a speed improvement. It is an area we are actively improving, along with several other performance enhancements.

    Here are a few things to keep in mind: 1) Arnie Krauise's tests were done on a very large file of 162,221 individuals. Perhaps 1% of FTM users have a file that large. Importing smaller files is much faster, so most FTM users will not see a big performance problem like Arnie saw; 2) Having said that, we take the needs of large-file users very seriously, and as I mentioned, we're actively working to improve performance in FTM on large and smaller files; 3) GEDCOM imports occur once, and are not part of a normal FTM backup, so pnyswingal (previous comment) won't see slower backups in FTM 2009.

  3. First for the record, I used to use FTM and thankfully changed before 2008 to Genbox. Even besides the horrible problems with the 2008 version I would never go back because it lacks professional level citation and sourcing abilities.

    But on this situation it seems to me that there are 3 possibilities:

    1) FTM's programmers are not the best and brightest;

    2) FTM's programmers are very good but management is not and the failure to optimize performance is on said management;

    3) FTM's programmers and management are fully competent but either that level or a higher level of management intentionally chose not to optimize nor test such optimization. This would include intentionally using inefficient programming which they knew would not be an issue with most users and the file sizes they typically have, in line with the poor excuse given by Michelle above.

    So which is it?

    If someone thinks I am giving a false choice and there is another that should be listed, then please give it.

    I mean is there a document at Ancestry on performance/programming issues for FTM that notes the performance issues for small vs. large files, as in Ancestry already knew this? Or rather did Ancestry not already know it because it used, intentionally or otherwise, poor programming techniques and then also did not test for performance on its own?

  4. Mike,

    How about this one?

    4) Programmers and management understand the law of diminishing returns and make judgment calls with more information than either you or I have.

    In my opinion, since I fall into the broad spectrum of users doing ancestral genealogy, I'm going to fall into the group with less than 8,192 individuals in my database. Once they've optimized my case, I'd rather the programmers move on to the additional features I need. Let the university doing genetic studies of Iceland buy one of the other programs.

    -- The Ancestry Insider

  5. Insider,

    Well that could too be another possibility. But we are talking about programming algorithms here aren't we? I mean how much more code (or less?) would it take to do it the other way, and how much difference in server performance in doing so?

    Also again my question is did Ancestry's FTM management team in fact know and state internally in writing all this *in advance*?
