I needed to compare the FlatXML of two OpenOffice.org files. I usually reformat the XML files using xmllint and then diff the formatted files. However, I couldn't properly see the differences between those files because the attributes were on the same line as their element. In order to compare the attributes one by one, I decided to hack xmllint (and then libxml2 too), as Tor and I discussed recently.
All I have added is a --diff argument to the xmllint program, and I changed how the format parameter is used in libxml2: when its value equals 2, each attribute and each piece of text is written on its own line.
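To give an idea of the layout without building the patched library, here is a rough Python emulation of it. This is not the libxml2 patch itself: the function name `diff_layout`, the two-space indent and the handling of whitespace are my own assumptions, and text that follows a child element (tail text) is ignored to keep the sketch short.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def diff_layout(elem, depth=0, indent="  "):
    """Return output lines with one attribute per line and text on its own line.

    Hypothetical helper that only imitates the layout described above;
    the indent width is an assumption.
    """
    pad = indent * depth
    attrs = [f"{pad}{indent}{name}={quoteattr(value)}"
             for name, value in elem.attrib.items()]
    text = (elem.text or "").strip()
    children = list(elem)
    # Empty elements are closed in place, others get a separate closing tag.
    closing = ">" if (text or children) else "/>"

    if attrs:
        lines = [f"{pad}<{elem.tag}"] + attrs
        lines[-1] += closing          # the tag ends after the last attribute
    else:
        lines = [f"{pad}<{elem.tag}{closing}"]

    if text:
        lines.append(escape(text))    # text sits on its own, unindented line
    for child in children:
        lines.extend(diff_layout(child, depth + 1, indent))
    if text or children:
        lines.append(f"{pad}</{elem.tag}>")
    return lines


sample = '<BBB aaa="111" bbb="222"><CCC/><DDD xxx="oo">CONTENT</DDD></BBB>'
layout = diff_layout(ET.fromstring(sample))
print("\n".join(layout))
```

Because every attribute ends up on its own line, a line-based diff can point at the single attribute that changed instead of flagging the whole element.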
Here are two sample XML files (many thanks to Zvon for their nice XML samples).
Here is file foo.xml:
<AAA>
<BBB aaa = "111" bbb = "222">
<CCC/>
<CCC xxx = "555" yyy = "666" zzz = "777"/>
</BBB>
<BBB aaa = "999">
<CCC xxx = "qq"/>
<DDD xxx = "ww"/>
<EEE xxx = "oo"/>
</BBB>
<BBB>
<DDD xxx = "oo"/>
</BBB>
</AAA>
Here is file foo2.xml:
<AAA>
<BBB aaa="111" bbb="222">
<CCC/>
<CCC xxx="565" zzz="777" ddd="new"/>
</BBB>
<BBB aaa="999">
<CCC xxx="qq"/>
<DDD xxx="ww"/>
<EEE xxx="oo"/>
</BBB>
<NEW attr="newtoo">Change</NEW>
<BBB>
<DDD xxx="oo">CONTENT</DDD>
</BBB>
</AAA>
I have diff'ed those two files after running them through xmllint --format and through xmllint --diff, to better show the differences.
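The comparison workflow can be sketched in Python as follows. This is only an illustration under some assumptions: `xmllint_lines` and `unified` are hypothetical helper names, `xmllint_lines` needs xmllint on the PATH (and a patched build if you pass `--diff`), and the two short line lists are hand-written stand-ins for the reformatted files so that the example runs without xmllint.

```python
import difflib
import subprocess


def xmllint_lines(path, flag="--format"):
    """Run xmllint on `path` and return its output as a list of lines.

    `--format` is a stock xmllint option; `--diff` only exists in a build
    that carries the patch discussed in this post.
    """
    result = subprocess.run(["xmllint", flag, path],
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def unified(old_lines, new_lines, old_name, new_name):
    """Return a unified diff of two line lists, as `diff -u` would."""
    return list(difflib.unified_diff(old_lines, new_lines,
                                     old_name, new_name, lineterm=""))


# Stand-ins for one element of the reformatted foo.xml and foo2.xml.
old = ['<CCC', '  xxx="555"', '  yyy="666"', '  zzz="777"/>']
new = ['<CCC', '  xxx="565"', '  zzz="777"', '  ddd="new"/>']
patch = unified(old, new, "diff-foo.xml", "diff-foo2.xml")
print("\n".join(patch))
```

With real files, the two lists would come from calls such as `xmllint_lines("foo.xml")` and `xmllint_lines("foo2.xml")`.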
Here is the output using --format:
--- format-foo.xml 2009-05-18 17:41:33.000000000 +0200
+++ format-foo2.xml 2009-05-18 17:41:42.000000000 +0200
@@ -2,14 +2,15 @@
<AAA>
<BBB aaa="111" bbb="222">
<CCC/>
- <CCC xxx="555" yyy="666" zzz="777"/>
+ <CCC xxx="565" zzz="777" ddd="new"/>
</BBB>
<BBB aaa="999">
<CCC xxx="qq"/>
<DDD xxx="ww"/>
<EEE xxx="oo"/>
</BBB>
+ <NEW attr="newtoo">Change</NEW>
<BBB>
-
<DDD xxx="oo"/>
+
<DDD xxx="oo">CONTENT</DDD>
</BBB>
</AAA>
Here is the output using --diff:
--- diff-foo.xml 2009-05-18 17:40:06.000000000 +0200
+++ diff-foo2.xml 2009-05-18 17:40:13.000000000 +0200
@@ -5,9 +5,9 @@
bbb="222">
<CCC/>
<CCC
- xxx="555"
- yyy="666"
- zzz="777"/>
+ xxx="565"
+ zzz="777"
+ ddd="new"/>
</BBB>
<BBB
aaa="999">
@@ -18,8 +18,14 @@
<EEE
xxx="oo"/>
</BBB>
+ <NEW
+ attr="newtoo">
+Change
+ </NEW>
<BBB>
<DDD
- xxx="oo"/>
+ xxx="oo">
+CONTENT
+ </DDD>
</BBB>
</AAA>
All the test files are available in this archive, since displaying them on a web page caused some significant formatting problems.
The patch for libxml2 hasn't been submitted to Daniel Veillard yet, but it is available here for those who want to try it now.