I am just beginning to learn sed and awk. I have to submit a homework assignment tomorrow, which is a copy-paste from Wikipedia. Just the opportunity to practice some sed scripting!

So I have the document in HTML format. Now I need to replace every [<number>] with nothing. How would I do this?

This is what I tried, but I think it does not even match the pattern I want:

cat content.xml | sed 's/\[\d+\]/ /g' > content2.xml

As a next step, I want to replace hyperlinks like the following with their plain text, but even the simple pattern above is not being matched yet:

<a href="https://en.wikipedia.org/wiki/Immune_system">immune system</a>

and then remove the citations:

<a name="cite_ref-Gleeson2007_27-0"/><a href="https://en.wikipedia.org/wiki/Physical_exercise#cite_note-Gleeson2007-27">[27]</a>
daltonfury42
  • 5,559

1 Answer

You went in the wrong direction; you should learn XML/XSLT (XSL stylesheets) instead :) It can be used with either ODT or XHTML. For ODT, a macro may be a better fit, but I don't know how to write one.

Have a look at this accepted answer: RegEx match open tags except XHTML self-contained tags
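That warning concerns tags. The plain [<number>] markers are ordinary text, so sed can handle them. The attempt in the question most likely matches nothing because sed has no `\d` shorthand, and in its default (basic) regex mode an unescaped `+` is a literal plus sign. A minimal sketch of two working forms, assuming the markers contain only digits; the function names and the sample sentence are made up for illustration:

```shell
#!/bin/sh
# Hypothetical sample text, not taken from the real document.
sample='Exercise affects the immune system.[27] It also helps sleep.[3]'

# Basic regex (sed's default): spell out "one or more digits" as [0-9][0-9]*.
strip_refs_bre() {
  sed 's/\[[0-9][0-9]*\]//g'
}

# Extended regex (-E): here + means "one or more", so [0-9]+ works.
strip_refs_ere() {
  sed -E 's/\[[0-9]+\]//g'
}

printf '%s\n' "$sample" | strip_refs_bre
printf '%s\n' "$sample" | strip_refs_ere
# Both print: Exercise affects the immune system. It also helps sleep.
```

Applied to the file from the question, that would be something like `sed -E 's/\[[0-9]+\]//g' content.xml > content2.xml`; the `cat` is not needed.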

The solution in this answer to "How to replace all images in Libreoffice with their description" should work for you too, with a little modification.
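If you still want to try sed on the two link patterns from the question, here is a rough sketch. It assumes each anchor sits entirely on one line, has exactly the attribute layout shown in the question, and contains no nested tags; real markup that deviates from this will slip through, which is why a proper XML tool is the safer choice. The function name `clean_links` is hypothetical.

```shell
#!/bin/sh
# Assumption: anchors look exactly like the two examples in the question.
clean_links() {
  # 1st expression: drop the whole citation pair (named anchor + "[n]" link).
  # 2nd expression: unwrap remaining links, keeping only the link text.
  # The citation rule must run first, or the unwrap rule would leave "[27]" behind.
  sed -E \
    -e 's|<a name="cite_ref-[^"]*"/><a href="[^"]*#cite_note-[^"]*">\[[0-9]+\]</a>||g' \
    -e 's|<a href="[^"]*">([^<]*)</a>|\1|g'
}

printf '%s\n' 'the <a href="https://en.wikipedia.org/wiki/Immune_system">immune system</a> adapts.<a name="cite_ref-Gleeson2007_27-0"/><a href="https://en.wikipedia.org/wiki/Physical_exercise#cite_note-Gleeson2007-27">[27]</a>' | clean_links
# Prints: the immune system adapts.
```

The `|` delimiter is used only so that the slashes and the `#` in the URLs do not need escaping.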

user.dz
  • 49,176