Parsing incorrect HTML

HTML Tidy provides a command-line tool, tidy, and a library, TidyLib, that turn badly formed HTML/XHTML pages into standards-compliant ones.

Besides helping webmasters make sure their sites are well-formed, the package is also useful for developers of web clients who want to clean up invalid HTML before feeding it to their parsers.

There’s even a Python interface called uTidyLib.
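Here is a minimal sketch of that clean-then-parse workflow in Python. It does not use uTidyLib. Instead it assumes the tidy command-line tool is installed and on the PATH, pipes the markup through it with the -q, -asxhtml, -numeric and -utf8 options, and hands the result to the standard library's ElementTree XML parser. The helper names tidy_html and parse_messy_html are hypothetical, chosen for illustration.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET

# Options for the tidy command-line tool (assumed to be on the PATH):
#   -q        quiet: suppress non-essential output
#   -asxhtml  convert the input to well-formed XHTML
#   -numeric  write numeric character references (&#160;) instead of named
#             entities (&nbsp;), which a plain XML parser would reject
#   -utf8     read and write UTF-8
TIDY_ARGS = ["-q", "-asxhtml", "-numeric", "-utf8"]


def tidy_html(html: str, tidy_path: str = "tidy") -> str:
    """Hypothetical helper: pipe an HTML string through tidy, return XHTML."""
    result = subprocess.run(
        [tidy_path, *TIDY_ARGS],
        input=html,
        capture_output=True,
        encoding="utf-8",
    )
    # tidy exits with 0 (no problems), 1 (warnings only) or 2 (errors);
    # by default it writes no cleaned output when it reports errors.
    if result.returncode not in (0, 1):
        raise RuntimeError(f"tidy failed: {result.stderr.strip()}")
    return result.stdout


def parse_messy_html(html: str) -> ET.Element:
    """Hypothetical helper: tidy the markup, then parse it as XML."""
    return ET.fromstring(tidy_html(html))


# Small demo, run only when the tidy binary is actually available.
if __name__ == "__main__" and shutil.which("tidy"):
    xhtml = "{http://www.w3.org/1999/xhtml}"
    root = parse_messy_html("<p>Unclosed <b>bold<p>Second&nbsp;paragraph")
    for p in root.iter(f"{xhtml}p"):
        print("".join(p.itertext()))
```

Going through the command-line tool keeps the sketch free of third-party Python packages; a binding such as uTidyLib would let you call TidyLib in-process instead of spawning a subprocess for every document.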
