[twill] general query re form parsing.

William K. Volkman wkvsf at users.sourceforge.net
Tue Jan 24 11:30:06 PST 2006


Hi Titus,
On Tue, 2006-01-24 at 12:18, Titus Brown wrote:
> Peri's problem with badly formatted pages has raised the question of how
> robust or tolerant twill should be to really cruddy HTML.
<snip>
> I could include BeautifulSoup with twill, too.
> 
> I could also modify ClientForm to be tolerant to ParseErrors of the sort
> that Peri encounters.
> 
> Right now I'm leaning towards including BS and modifying ClientForm.
> Thoughts?

This is the gist of the code I sent you.  Sorry I never got
back to putting it into twill (and I haven't checked to see
whether you got it in there).

Basically you have two objectives.  For those testing their
own site, they need to know they have broken HTML so that
they can fix it.  For those attempting to automate access
to other web sites, they can't fix the HTML, so using
BeautifulSoup to fix up the poor HTML and then feeding the
result to ClientForm makes it possible to use bad pages.

Perhaps a "strict" or "relaxed" parsing flag to choose
the behavior?
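
A minimal sketch of what such a flag might look like.  It is
only an illustration of the idea, not the code mentioned
above: it swaps in Python's standard html.parser for
BeautifulSoup and ClientForm, and the names find_forms,
FormCollector and BrokenHTMLError are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical names throughout; none of this is twill or ClientForm API.
VOID_TAGS = {"input", "br", "img", "hr", "meta", "link"}


class BrokenHTMLError(Exception):
    """Raised in strict mode when the tag structure is inconsistent."""


class FormCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open non-void tags
        self.forms = []      # one list of input names per <form>
        self.problems = []   # descriptions of the bad markup seen so far

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.forms.append([])
        elif tag == "input" and "form" in self.open_tags:
            self.forms[-1].append(dict(attrs).get("name"))
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            return
        self.problems.append("unexpected </%s>" % tag)
        # Recover: if the tag is open further up, close everything above it.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def find_forms(html, strict=True):
    """Return the input names of each form; complain only if strict."""
    parser = FormCollector()
    parser.feed(html)
    parser.close()
    # Simplification: any unclosed non-void tag counts as a problem,
    # which is stricter than HTML itself requires.
    problems = parser.problems + ["unclosed <%s>" % t for t in parser.open_tags]
    if strict and problems:
        raise BrokenHTMLError("; ".join(problems))
    return parser.forms
```

With strict=True someone testing their own site gets an
error that names the broken markup; with strict=False
someone scripting another site still gets the form fields.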

HTH,
William.




