[twill] general query re form parsing.

Ed Rahn ed at hfd.com
Tue Jan 24 11:36:54 PST 2006


If you do make these changes, it would be nice to be able to choose how
lenient the parsing code is. For some users, the error messages about bad HTML
are just as important as failing tests.
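
To make the request concrete, here is a rough sketch of a strict/lenient
switch. It uses only Python's standard html.parser; the names FormChecker,
FormCheckError and check_forms are made up for illustration and are not part
of twill, mechanize or ClientForm. The checks shown (nested forms, inputs
outside a form, stray closing tags) are just examples of "bad HTML".

```python
from html.parser import HTMLParser


class FormCheckError(Exception):
    """Raised in strict mode when form markup looks malformed."""


class FormChecker(HTMLParser):
    """Hypothetical checker, not twill API: flags a few form-markup problems."""

    def __init__(self, strict=True):
        super().__init__()
        self.strict = strict      # True: raise on first problem; False: collect
        self.form_depth = 0       # number of currently open <form> tags
        self.warnings = []        # problems seen in lenient mode

    def _problem(self, message):
        # The leniency switch: fail loudly or just record the message.
        if self.strict:
            raise FormCheckError(message)
        self.warnings.append(message)

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            if self.form_depth:
                self._problem("nested <form>")
            self.form_depth += 1
        elif tag == "input" and not self.form_depth:
            self._problem("<input> outside <form>")

    def handle_endtag(self, tag):
        if tag == "form":
            if not self.form_depth:
                self._problem("stray </form>")
            else:
                self.form_depth -= 1


def check_forms(html, strict=True):
    """Return the list of warnings (always empty in strict mode, which raises)."""
    checker = FormChecker(strict=strict)
    checker.feed(html)
    checker.close()
    return checker.warnings
```

With strict=True a test fails as soon as the page has a problem; with
strict=False the same problems come back as a list that can be printed as
warnings while the test carries on.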

- Ed

On Tue, 24 Jan 2006 11:18:59 -0800
Titus Brown <titus at caltech.edu> wrote:

> Hey all,
> 
> Peri's problem with badly formatted pages has raised the question of how
> robust or tolerant twill should be to really cruddy HTML.
> 
> I could stick with the mechanize/ClientForm approach, which is to deal
> badly with outright errors that exist in the semantics of the page.
> ('tidy' will not fix this!)
> 
> I could switch to using BeautifulSoup if BS is installed, as well as
> continuing to do tidy preprocessing if tidy is installed.
> 
> I could include BeautifulSoup with twill, too.
> 
> I could also modify ClientForm to be tolerant to ParseErrors of the sort
> that Peri encounters.
> 
> Right now I'm leaning towards including BS and modifying ClientForm.
> Thoughts?
> 
> --titus
> 
> _______________________________________________
> twill mailing list
> twill at lists.idyll.org
> http://lists.idyll.org/listinfo/twill
> 


