Links
- github: lxml/lxml
- Depends on libxml2
- Tutorials
- XPath Support in ElementTree
- lxml Module Docs
- lxml.etree Module
- ElementBase
- remove()
- e.g. elem.getparent().remove(elem)
- ElementBase
- lxml.html Module
- lxml.etree Module
- lxml.etree
- lxml.html
- lxml.objectify
Snippets
Getting all text from inside an element
From: ElementTree: Bits and Pieces
The text attribute contains the text immediately inside an element, but it does not include text inside subelements. To get all text, you can use something like:
    def gettext(elem):
        text = elem.text or ""
        for e in elem:
            text += gettext(e)
            if e.tail:
                text += e.tail
        return text
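For comparison, Element.itertext() (available in the standard-library ElementTree since Python 2.7/3.2, and in lxml) yields the same text pieces in document order. A minimal sketch using xml.etree.ElementTree; the sample markup and the name all_text are illustrative and not part of the original snippet:

```python
import xml.etree.ElementTree as ET

# Sample markup (illustrative): text is spread over nested subelements and tails.
root = ET.fromstring("<p>Hello <b>bold <i>and italic</i></b> world</p>")

# .text alone stops at the first subelement.
print(repr(root.text))

# itertext() walks the text and tail values in document order.
all_text = "".join(root.itertext())
print(all_text)
```

The recursive gettext() above remains useful on older ElementTree versions that lack itertext().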
Removing elements
From: ElementTree: Bits and Pieces
If you're using ElementTree 1.3, then the serialization code will leave out the tags for elements that have their tag attribute set to None.
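A small sketch of that serialization behavior, assuming the standard-library xml.etree.ElementTree (which implements ElementTree 1.3); the sample markup and the name serialized are illustrative, and lxml is not assumed to accept a tag of None:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<p>Hello <b>bold</b> world</p>")

# Mark the <b> element: the serializer skips its tags but keeps its text and tail.
root.find("b").tag = None

serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

The marked element is still a child of its parent at this point; only the serialized output omits its tags.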
To remove an element from a tree, you have to replace the element with its contents. This includes not only the subelements, but also the text and tail attributes.
The following function takes a tree and a filter function, and replaces every subelement for which the filter returns false with that subelement's contents.
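    def cleanup(elem, filter):
        out = []
        for e in elem:
            cleanup(e, filter)
            if not filter(e):
                # Drop e but keep its contents: text, children, then tail.
                # text and tail may be None, so fall back to "" before appending.
                if e.text:
                    if out:
                        out[-1].tail = (out[-1].tail or "") + e.text
                    else:
                        elem.text = (elem.text or "") + e.text
                out.extend(e)
                if e.tail:
                    if out:
                        out[-1].tail = (out[-1].tail or "") + e.tail
                    else:
                        elem.text = (elem.text or "") + e.tail
            else:
                out.append(e)
        elem[:] = out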
Note that the top element itself isn’t checked; if you need to remove that, you have to do that at the application level.
Instead of writing a filter function, you can iterate over the tree and set the tag to None for the elements you want to remove. When you’ve checked all elements, call the cleanup function as follows:
cleanup(elem, lambda e: e.tag)
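The whole workflow can be sketched end to end as follows. This is a self-contained variant for the standard-library xml.etree.ElementTree, not the original author's exact code: the helper _append_text and the sample markup are hypothetical, and the text/tail handling guards against None values.

```python
import xml.etree.ElementTree as ET


def _append_text(parent, out, chunk):
    """Attach chunk after the last kept sibling, or to parent.text if none yet."""
    if not chunk:
        return
    if out:
        out[-1].tail = (out[-1].tail or "") + chunk
    else:
        parent.text = (parent.text or "") + chunk


def cleanup(elem, keep):
    """Replace each descendant for which keep(e) is false with its contents."""
    out = []
    for e in elem:
        cleanup(e, keep)
        if keep(e):
            out.append(e)
        else:
            _append_text(elem, out, e.text)  # text before e's children
            out.extend(e)                    # e's (already cleaned) children
            _append_text(elem, out, e.tail)  # text that followed e
    elem[:] = out


root = ET.fromstring("<p>Hello <b>bold <i>and italic</i></b> world</p>")

# Mark every <b> element for removal by clearing its tag.
for e in list(root.iter("b")):
    e.tag = None

cleanup(root, lambda e: e.tag)
result = ET.tostring(root, encoding="unicode")
print(result)
```

Unlike the serialization-only trick, the marked elements are actually gone from the tree afterwards, so later traversals no longer see them.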