I needed to compare the FlatXML of two OpenOffice.org files. I usually reformat the XML files with xmllint and then diff the formatted output. This time, however, I couldn't see the differences properly because the attributes were on the same line as their element. To compare the attributes one by one, I decided to hack xmllint (and then libxml2 too), as Tor and I had discussed recently.

All I have added is a --diff argument to the xmllint program, and I changed how libxml2 uses the format parameter: when its value equals 2, each attribute and each piece of text is written on its own line. Here are two sample XML files (many thanks to Zvon for their nice XML samples).

Here is file foo.xml:

<AAA>
     <BBB aaa = "111" bbb = "222">
          <CCC/>
          <CCC xxx = "555" yyy = "666" zzz = "777"/>
     </BBB>
     <BBB aaa = "999">
          <CCC xxx = "qq"/>
          <DDD xxx = "ww"/>
          <EEE xxx = "oo"/>
     </BBB>
     <BBB>
          <DDD xxx = "oo"/>
     </BBB>
</AAA>

Here is file foo2.xml:

<AAA>
  <BBB aaa="111" bbb="222">
    <CCC/>
    <CCC xxx="565" zzz="777" ddd="new"/>
  </BBB>
  <BBB aaa="999">
    <CCC xxx="qq"/>
    <DDD xxx="ww"/>
    <EEE xxx="oo"/>
  </BBB>
  <NEW attr="newtoo">Change</NEW>
  <BBB>
    <DDD xxx="oo">CONTENT</DDD>
  </BBB>
</AAA>

I formatted both files with xmllint --format and with xmllint --diff, then diffed each pair of outputs to show what the new option changes.

Here is the output using --format:

--- format-foo.xml  2009-05-18 17:41:33.000000000 +0200
+++ format-foo2.xml 2009-05-18 17:41:42.000000000 +0200
@@ -2,14 +2,15 @@
 <AAA>
   <BBB aaa="111" bbb="222">
     <CCC/>
-    <CCC xxx="555" yyy="666" zzz="777"/>
+    <CCC xxx="565" zzz="777" ddd="new"/>
   </BBB>
   <BBB aaa="999">
     <CCC xxx="qq"/>
     <DDD xxx="ww"/>
     <EEE xxx="oo"/>
   </BBB>
+  <NEW attr="newtoo">Change</NEW>
   <BBB>
-    <DDD xxx="oo"/>
+    <DDD xxx="oo">CONTENT</DDD>
   </BBB>
 </AAA>

Here is the output using --diff:

--- diff-foo.xml    2009-05-18 17:40:06.000000000 +0200
+++ diff-foo2.xml   2009-05-18 17:40:13.000000000 +0200
@@ -5,9 +5,9 @@
     bbb="222">
     <CCC/>
     <CCC
-      xxx="555"
-      yyy="666"
-      zzz="777"/>
+      xxx="565"
+      zzz="777"
+      ddd="new"/>
   </BBB>
   <BBB
     aaa="999">
@@ -18,8 +18,14 @@
     <EEE
       xxx="oo"/>
   </BBB>
+  <NEW
+    attr="newtoo">
+Change
+  </NEW>
   <BBB>
     <DDD
-      xxx="oo"/>
+      xxx="oo">
+CONTENT
+    </DDD>
   </BBB>
 </AAA>
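
For those who want a similar layout without patching libxml2, here is a minimal Python sketch of the same idea. It is not the patch itself and does not reproduce xmllint's output byte for byte; the function names dump_one_attr_per_line and diff_xml are made up for this illustration. It assumes plain, namespace-free XML: namespaced names would come out in ElementTree's {uri}name notation, and comments, processing instructions and whitespace-only text are dropped.

```python
import difflib
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def dump_one_attr_per_line(elem, depth=0, indent="  "):
    """Return a list of lines with one attribute or text run per line.

    Hypothetical helper: it mimics the layout shown above, not libxml2's code.
    """
    pad = indent * depth
    attrs = list(elem.attrib.items())
    text = (elem.text or "").strip()
    children = list(elem)
    empty = not text and not children
    end = "/>" if empty else ">"  # how the start tag is closed

    lines = []
    if not attrs:
        lines.append(f"{pad}<{elem.tag}{end}")
    else:
        lines.append(f"{pad}<{elem.tag}")
        for i, (name, value) in enumerate(attrs):
            # Only the last attribute carries the closing ">" or "/>".
            tail = end if i == len(attrs) - 1 else ""
            lines.append(f"{pad}{indent}{name}={quoteattr(value)}{tail}")
    if empty:
        return lines

    if text:
        lines.append(escape(text))  # text goes on its own, unindented line
    for child in children:
        lines.extend(dump_one_attr_per_line(child, depth + 1, indent))
        tail_text = (child.tail or "").strip()
        if tail_text:
            lines.append(escape(tail_text))
    lines.append(f"{pad}</{elem.tag}>")
    return lines


def diff_xml(old_xml, new_xml):
    """Unified diff of two XML strings after the one-attribute-per-line dump."""
    old = dump_one_attr_per_line(ET.fromstring(old_xml))
    new = dump_one_attr_per_line(ET.fromstring(new_xml))
    return list(difflib.unified_diff(old, new, "old", "new", lineterm=""))


if __name__ == "__main__":
    before = '<AAA><CCC xxx="555" yyy="666" zzz="777"/></AAA>'
    after = '<AAA><CCC xxx="565" zzz="777" ddd="new"/></AAA>'
    print("\n".join(diff_xml(before, after)))
```

Because every attribute sits on its own line, a changed value shows up as a single -/+ pair instead of a whole rewritten element line, which is the point of the --diff layout.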

All the test files are available in this archive. Displaying them on a web page caused some significant formatting problems.

The patch for libxml2 hasn't been submitted to Daniel Veillard yet, but it is available here for those who want to try it now.