[twill] general query re form parsing.

Titus Brown titus at caltech.edu
Tue Jan 24 11:18:59 PST 2006

Hey all,

Peri's problem with badly formatted pages has raised the question of how
robust or tolerant twill should be to really cruddy HTML.

I could stick with the mechanize/ClientForm approach, which is to deal
badly with outright errors that exist in the semantics of the page.
('tidy' will not fix this!)

I could switch to using BeautifulSoup if BS is installed, as well as
continuing to do tidy preprocessing if tidy is installed.
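
To make the "use it if it is installed" idea concrete, here is a rough
sketch of the detection step only. It is not twill's actual code: the
function name available_cleaners and its two probe parameters are made
up for illustration, and "bs4" is an assumed import name for
BeautifulSoup (the real module name depends on the version installed).

```python
import importlib.util
import shutil


def available_cleaners(has_module=None, has_program=None):
    """Return the optional HTML cleanup steps usable on this machine.

    Hypothetical helper: both probes can be overridden, so the selection
    logic can be exercised without installing anything.
    """
    if has_module is None:
        # True if the named top-level module can be imported.
        has_module = lambda name: importlib.util.find_spec(name) is not None
    if has_program is None:
        # True if the named executable is on PATH.
        has_program = lambda name: shutil.which(name) is not None

    steps = []
    if has_program("tidy"):
        steps.append("tidy")           # preprocess with the tidy command-line tool
    if has_module("bs4"):              # assumed module name for BeautifulSoup
        steps.append("BeautifulSoup")  # then parse with the tolerant parser
    return steps
```

Calling available_cleaners() with no arguments probes the local machine;
an empty list would mean falling back to the plain mechanize/ClientForm
behaviour.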

I could include BeautifulSoup with twill, too.

I could also modify ClientForm to be tolerant of ParseErrors of the sort
that Peri encounters.

Right now I'm leaning towards including BS and modifying ClientForm.

