10

Is there a command line tool that can remove comments from an XML file? Or do I need to write a small program that makes use of an XML parser to do this?

Update: I'm not interested in solutions that only handle a subset of all possible XML files.

For instance, a regular expression cannot parse XML in the general case:

https://stackoverflow.com/questions/6751105/why-its-not-possible-to-use-regex-to-parse-html-xml-a-formal-explanation-in-la
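A small made-up case illustrates the concern: text inside a CDATA section that merely looks like a comment is character data, yet a pattern-based filter removes it. The sample string and the `sed` pattern below are assumptions chosen for illustration only.

```shell
# Made-up document: the CDATA content only looks like a comment.
xml='<a><![CDATA[<!-- not a comment -->]]></a>'

# A pattern-based filter cannot tell the difference and deletes the data,
# leaving an empty CDATA section behind.
result=$(printf '%s\n' "$xml" | sed -E 's/<!--([^-]|-[^-])*-->//g')
printf '%s\n' "$result"
```

A tool that actually parses the document would treat that text as content and leave it alone.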

3 Answers

18

I would do it in this way:

sed -e '/<!--.*-->/d' -e '/<!--/,/-->/d' myfile.xml > cleaned.xml

Or:

awk 'in_comment&&/-->/{sub(/([^-]|-[^-])*--+>/,"");in_comment=0}
 in_comment{next}
 {gsub(/<!--+([^-]|-[^-])*--+>/,"");
  in_comment=sub(/<!--+.*/,"");
  print}' myfile.xml > cleaned.xml

Or:

xmlstarlet ed -d '//comment()' file.xml
Frantique
  • 8,673
0

To expand on the top answer: if you only want to delete the comment itself rather than the entire line, you should probably use:

sed -E 's/<!--([^-]|-[^-])*-->//g'

In my case, I had a minified XML file whose entire content was on a single line. Since the previous solution deletes every line that contains a comment, it emptied my file completely.
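A quick check on a made-up one-line document shows how line deletion, a greedy `.*` substitution, and a per-comment pattern differ. The sample string is hypothetical, and the per-comment pattern assumes well-formed comments, which cannot contain `--`.

```shell
# Hypothetical one-line document with two comments.
xml='<a><!-- one --><b/><!-- two --></a>'

# Line-based deletion: the only line contains a comment, so nothing is left.
deleted=$(printf '%s\n' "$xml" | sed '/<!--.*-->/d')

# Greedy substitution: removes from the first <!-- to the last -->,
# which also swallows <b/>.
greedy=$(printf '%s\n' "$xml" | sed 's/<!--.*-->//')

# Per-comment pattern: the body is matched as ([^-]|-[^-])*, so each
# comment ends at its own -->.
each=$(printf '%s\n' "$xml" | sed -E 's/<!--([^-]|-[^-])*-->//g')

printf 'deleted=[%s]\ngreedy=[%s]\neach=[%s]\n' "$deleted" "$greedy" "$each"
```

None of these handles a comment that spans several lines, because `sed` processes its input one line at a time.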

bezbos.
  • 101
0

This works well for stripping multiline comments (such as failed tests) from an XML file, except for the ones you hand-picked with a HELP or TODO marker because they are helpful to the end user:
perl -i -w -0777pe 's/<!--(.(?<!(HELP|TODO)))*?-->//sg' somefile.xml

More about the related regex: https://stackoverflow.com/a/1240293/1422630

If there is a way to get the same result with xmlstarlet, I would prefer it, since there may be edge cases that a regex cannot handle, but for now this is what I have to use.
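One possible xmlstarlet counterpart, offered as a sketch rather than a tested drop-in replacement: XPath 1.0 has a `contains()` function, so a predicate on `//comment()` can restrict the deletion to comments whose text mentions neither HELP nor TODO. The file names `sample.xml` and `cleaned.xml` and the sample content are made up for illustration, and the block assumes `xmlstarlet` is installed under that name.

```shell
# Hypothetical sample file for illustration.
cat > sample.xml <<'EOF'
<?xml version="1.0"?>
<root>
  <!-- HELP: keep this note -->
  <item>1</item>
  <!--
    <item>2</item>
  -->
</root>
EOF

# Delete every comment whose text contains neither HELP nor TODO.
# The result goes to stdout, so it is redirected to a new file here.
xmlstarlet ed -d '//comment()[not(contains(., "HELP") or contains(., "TODO"))]' \
  sample.xml > cleaned.xml

cat cleaned.xml
```

Because the document is parsed first, multiline comments and comment-like text inside CDATA sections are handled by the parser and not by pattern matching.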