Powered by Ninja Monkeys!

Computational linguistics helps uncover government secrets

This article from yesterday’s news describes a program for revealing blacked-out words in documents. It was run against a memorandum from the US Department of Defense, with interesting results:

They said that although the name of a country had been blacked out in that memorandum, their software showed that it was highly likely the document named South Korea as having helped the Iraqis.
Posted at 1am on 11/05/04 | no comments | Filed Under: science, security

Parsing incorrect HTML

TidyLib provides a command-line tool and a library to turn badly formed HTML/XHTML pages into standards-compliant ones.

Besides helping webmasters who want to make sure their sites are well-formed, this package is also useful for web client developers who want to sanitize invalid HTML before feeding it to their parsers.

There’s even a Python interface called uTidyLib.
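
For the curious, here is a rough sketch of what sanitizing markup through uTidyLib might look like. It assumes the uTidylib package (imported as `tidy`) and the underlying libtidy library are installed. The `sanitize` helper and the particular option values are my own choices for illustration, not anything the package prescribes.

```python
# Sketch only: assumes uTidylib (module name `tidy`) and the libtidy
# shared library are installed on the system.

# Option names mirror HTML Tidy's configuration settings, written with
# underscores instead of hyphens. These values are illustrative choices.
TIDY_OPTIONS = {
    "output_xhtml": 1,  # ask Tidy to emit well-formed XHTML
    "tidy_mark": 0,     # do not add a "generator" meta tag
    "indent": 1,        # indent the cleaned-up output
}


def sanitize(markup, options=TIDY_OPTIONS):
    """Return a cleaned-up copy of `markup` (hypothetical helper)."""
    import tidy  # imported lazily so this file loads without uTidylib

    document = tidy.parseString(markup, **options)
    return str(document)


if __name__ == "__main__":
    # Unclosed tags go in; Tidy should close them on the way out.
    print(sanitize("<h1>Unclosed heading<p>Stray paragraph"))
```

The cleaned string can then be handed to whatever parser the client uses, so the parser itself never has to deal with the malformed input.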

Posted at 1am on 10/05/04 | no comments | Filed Under: programming, python