=========
lxml.html
=========

:Author: Ian Bicking

Since version 2.0, lxml comes with a dedicated Python package for
dealing with HTML: ``lxml.html``.  It is based on lxml's HTML parser,
but provides a special Element API for HTML elements, as well as a
number of utilities for common HTML processing tasks.

.. contents::
..
   1  Parsing HTML
     1.1  Parsing HTML fragments
     1.2  Really broken pages
   2  HTML Element Methods
   3  Running HTML doctests
   4  Creating HTML with the E-factory
     4.1  Viewing your HTML
   5  Working with links
     5.1  Functions
   6  Forms
     6.1  Form Filling Example
     6.2  Form Submission
   7  Cleaning up HTML
     7.1  autolink
     7.2  wordwrap
   8  HTML Diff
   9  Examples
     9.1  Microformat Example

The main API is based on the `lxml.etree`_ API, and thus, on the
ElementTree_ API.

.. _`lxml.etree`: tutorial.html
.. _ElementTree: http://effbot.org/zone/element-index.htm


Parsing HTML
============

Parsing HTML fragments
----------------------

There are several functions available to parse HTML:

``parse(filename_url_or_file)``:
    Parses the named file or url, or if the object has a ``.read()``
    method, parses from that.

    If you give a URL, or if the object has a ``.geturl()`` method (as
    file-like objects from ``urllib.urlopen()`` have), then that URL
    is used as the base URL.  You can also provide an explicit
    ``base_url`` keyword argument.

``document_fromstring(string)``:
    Parses a document from the given string.  This always creates a
    correct HTML document, which means the parent node is ``<html>``,
    and there is a body and possibly a head.

``fragment_fromstring(string, create_parent=False)``:
    Returns an HTML fragment from a string.  The fragment must contain
    just a single element, unless ``create_parent`` is given;
    e.g., ``fragment_fromstring(string, create_parent='div')`` will
    wrap the element in a ``