I have been away on vacation for quite a long time: a one-month trip around China with my wife. I wanted to write a blog post, but I still haven't found the time for it... However, I have something cooking here that is worth talking about. I have started to write a small command-line tool that shows the internals of .doc files. Thanks to Apache POI and the file format specs, it currently prints the paragraphs and their runs quite nicely, as well as the field descriptors. This helps me a lot in understanding what is going on in the OpenOffice.org WW8 import filter, but the tool is not complete yet.
The sources are available under the GPL v3 in this Gitorious repository: http://gitorious.org/ww8dumper/ww8dumper. Feel free to propose patches to help it progress!