Importing the text inside shapes from docx files is a complex problem in OpenOffice.org. Most of the difficulty comes from the fact that OOo, like older versions of Word, has two text engines:

  • the Writer text engine, used in text documents and text frames
  • the edit engine, used in Draw / Impress and for shape content in Writer

The edit engine supports only simply formatted text, whereas the Writer text engine also supports other objects such as tables and pictures. In Word 2007, any shape can contain any kind of content, not only text. The import problem is then: how should shapes containing complex text be imported? A short-term solution is to create a text frame for every rectangle shape containing text and to keep shapes for everything else. Some data can then be lost, for example if an ellipse shape contains a table. A long-term solution would be to get rid of the edit engine and use the Writer engine everywhere...
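The short-term rule can be summed up in a few lines. This is only a sketch of the decision, not the actual import filter code: the `DocxShape` class, its fields and the two function names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DocxShape:
    """Hypothetical, simplified view of a shape read from a docx file."""
    geometry: str                       # e.g. "rect", "ellipse"
    has_text: bool                      # the shape contains some text
    has_complex_content: bool = False   # tables, pictures, ...


def choose_import_target(shape):
    """Short-term rule: rectangles with text become text frames,
    everything else stays a shape (edit engine)."""
    if shape.geometry == "rect" and shape.has_text:
        return "text_frame"
    return "shape"


def loses_content(shape):
    """Complex content survives only in a text frame, so it is lost
    whenever the rule above keeps the object as a shape."""
    return choose_import_target(shape) == "shape" and shape.has_complex_content
```

With this rule, a rectangle containing a table is imported as a text frame and keeps its table, while an ellipse containing a table stays a shape and `loses_content` returns `True` for it.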

I managed to create the shape or frame depending on the docx input and to set the text in it. There is one really annoying thing about text frames and shapes: their UNO APIs are completely different, so all the work done on the shape properties has to be done again for the frame properties... Couldn't at least the fill / line properties be the same for both the TextFrame and Shape services?
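To give an idea of the duplication, here is a rough sketch of the kind of translation needed for fill and line. It works on plain dictionaries rather than real UNO objects, and the function name is hypothetical. The property names are assumptions based on the documented services (`FillStyle`, `FillColor`, `LineStyle`, `LineColor`, `LineWidth` on the shape side; `BackColor`, `BackTransparent` and the four `*Border` properties on the frame side) and should be checked against the API reference.

```python
def shape_to_frame_properties(shape_props):
    """Translate shape fill/line properties (a plain dict) into the
    closest text frame properties. Illustrative only."""
    frame_props = {}

    # Fill: a frame only has a background color and a transparency flag,
    # so anything other than a solid fill is treated as transparent here.
    if shape_props.get("FillStyle") == "SOLID":
        frame_props["BackColor"] = shape_props.get("FillColor", 0xFFFFFF)
        frame_props["BackTransparent"] = False
    else:
        frame_props["BackTransparent"] = True

    # Line: a shape has a single outline, a frame has four separate borders.
    if shape_props.get("LineStyle", "NONE") != "NONE":
        border = {
            "Color": shape_props.get("LineColor", 0x000000),
            "OuterLineWidth": shape_props.get("LineWidth", 0),
        }
        for side in ("LeftBorder", "RightBorder", "TopBorder", "BottomBorder"):
            frame_props[side] = dict(border)  # one copy per side

    return frame_props
```

Even in this reduced form the mapping is lossy: gradients, hatches and dashed lines have no direct counterpart in this sketch, which is part of why a shared set of fill / line properties would help.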

Here are some screenshots showing the import of a docx file with shapes containing text. There is no screenshot for the OdfConverter, as the ODF document it generates makes OOo crash!

The original document

Imported with a non fixed OOo

Imported with the fixed OOo