[twill] general query re form parsing.

Titus Brown titus at caltech.edu
Tue Jan 24 11:18:59 PST 2006


Hey all,

Peri's problem with badly formatted pages has raised the question of how
robust or tolerant twill should be to really cruddy HTML.

I could stick with the mechanize/ClientForm approach, which copes badly
with outright errors in the semantics of the page. (Running 'tidy' first
will not fix these!)

I could switch to using BeautifulSoup if BS is installed, as well as
continuing to do tidy preprocessing if tidy is installed.
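
To make the "if installed" part concrete, here is a rough sketch of how
the detection could work. None of this is twill code: the function names
are made up for illustration, `bs4` is assumed as the import name for
BeautifulSoup, tidy is assumed to be a `tidy` executable on the PATH, and
the ordering (tidy, then BeautifulSoup, then ClientForm) is only a guess.

```python
import importlib.util
import shutil


def module_available(name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def choose_cleanup_steps(have_tidy, have_bs):
    """Build the list of processing steps from what is available.

    The step names are plain labels; the order is an assumption.
    """
    steps = []
    if have_tidy:
        steps.append("tidy")
    if have_bs:
        steps.append("BeautifulSoup")
    # Form parsing itself always runs, whatever cleanup preceded it.
    steps.append("ClientForm")
    return steps


def detect_cleanup_steps():
    """Probe the environment and return the steps that would be used."""
    return choose_cleanup_steps(
        have_tidy=shutil.which("tidy") is not None,  # assumes a 'tidy' binary
        have_bs=module_available("bs4"),             # assumed import name
    )
```

Calling `detect_cleanup_steps()` on a machine with neither tool would
just give the ClientForm step, so nothing becomes a hard dependency.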

I could include BeautifulSoup with twill, too.

I could also modify ClientForm to be tolerant of ParseErrors of the sort
that Peri encounters.
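
By "tolerant" I mean something like the sketch below: skip the form that
fails to parse and keep going, rather than giving up on the whole page.
This is not ClientForm's actual code or API. The `ParseError` class, the
`strict_parse_form` function and its "missing </form>" check are all
hypothetical stand-ins, there only to show the shape of the idea.

```python
class ParseError(Exception):
    """Hypothetical stand-in for the parser's error type."""


def strict_parse_form(fragment):
    """Hypothetical strict parser: rejects a form with no closing tag."""
    if "</form>" not in fragment:
        raise ParseError("unclosed <form>")
    return fragment.strip()


def tolerant_parse_forms(fragments):
    """Parse each fragment; collect failures instead of aborting.

    Returns (forms, errors), where errors is a list of
    (fragment index, message) pairs for the fragments that were skipped.
    """
    forms, errors = [], []
    for index, fragment in enumerate(fragments):
        try:
            forms.append(strict_parse_form(fragment))
        except ParseError as exc:
            # Record the problem and move on to the next form.
            errors.append((index, str(exc)))
    return forms, errors
```

The caller still gets the list of errors back, so a bad page can be
reported without making the good forms on it unusable.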

Right now I'm leaning towards including BS and modifying ClientForm.
Thoughts?

--titus


