[twill] general query re form parsing.
titus at caltech.edu
Tue Jan 24 11:18:59 PST 2006
Peri's problem with badly formatted pages has raised the question of how
robust or tolerant twill should be to really cruddy HTML.
I could stick with the mechanize/ClientForm approach, which deals
badly with outright errors in the semantics of the page.
('tidy' will not fix this!)
I could switch to using BeautifulSoup if BS is installed, as well as
continuing to do tidy preprocessing if tidy is installed.
I could include BeautifulSoup with twill, too.
I could also modify ClientForm to tolerate ParseErrors of the sort
that Peri encounters.
Right now I'm leaning towards including BS and modifying ClientForm.
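To make the "use it if it is installed" and "be tolerant of parse errors" options concrete, here is a rough sketch in modern Python. It is not twill or ClientForm code. The function names, the "bs4" module name, and the use of ValueError as a stand-in for ClientForm's ParseError are all assumptions made for illustration.

```python
import importlib.util
import shutil


def available_cleanup_steps(find_module=importlib.util.find_spec,
                            find_program=shutil.which):
    """Return the optional HTML cleanup steps usable on this machine.

    Both lookups are injectable so the detection logic can be tested
    without actually installing anything.
    """
    steps = []
    # tidy is an external program, so look for it on PATH.
    if find_program("tidy") is not None:
        steps.append("tidy")
    # BeautifulSoup is a Python package. "bs4" is the present-day module
    # name (an assumption; older releases used a different import name).
    if find_module("bs4") is not None:
        steps.append("beautifulsoup")
    return steps


def parse_forms_tolerantly(html, strict_parser, cleanup_funcs):
    """Try the strict parser first; on failure, clean up and retry.

    strict_parser is a hypothetical callable that raises ValueError on
    bad markup (standing in for a real form parser and its parse error).
    cleanup_funcs is a list of html -> html callables, tried in order.
    """
    try:
        return strict_parser(html)
    except ValueError as first_error:
        for cleanup in cleanup_funcs:
            try:
                return strict_parser(cleanup(html))
            except ValueError:
                continue  # this cleanup did not help, try the next one
        # Nothing worked, so report the original problem.
        raise first_error
```

The point of the second function is that well-formed pages never pay for the cleanup: the extra passes only run once the strict parse has already failed, and the original error still surfaces if no cleanup rescues the page.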