[twill] general query re form parsing.
titus at caltech.edu
Tue Jan 24 11:34:33 PST 2006
-> Hi Titus,
-> On Tue, 2006-01-24 at 12:18, Titus Brown wrote:
-> > Peri's problem with badly formatted pages has raised the question of how
-> > robust or tolerant twill should be to really cruddy HTML.
-> > I could include BeautifulSoup with twill, too.
-> > I could also modify ClientForm to be tolerant to ParseErrors of the sort
-> > that Peri encounters.
-> > Right now I'm leaning towards including BS and modifying ClientForm.
-> > Thoughts?
-> This is the gist of the code I sent you. Sorry I never got
-> back to putting it into twill (and I haven't checked to see
-> whether you got it in there).
Actually, John Lee (mechanize author) beat me to the punch. It's now
available directly through mechanize!
-> Basically you have two objectives. Those testing their
-> own site need to know they have broken HTML so that
-> they can fix it. Those automating access to other web
-> sites can't fix the HTML, so using BeautifulSoup to fix
-> up the poor HTML and then feeding the result to ClientForm
-> lets them work with bad pages.
-> Perhaps a "strict" or "relaxed" parsing flag to choose
-> the behavior?
Good points. Unless I can get a patch to toggle behavior like this into
ClientForm, though, I'll be left maintaining an unofficial branch of
ClientForm -- not complaining, just pointing it out ;).
So, lessee, the proposed options would be:
* toggle tidy use (currently possible);
* toggle BeautifulSoup use;
* toggle "relaxed" ClientForm parsing.
Goodness, that would solve all sorts of problems, I think ;).
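
For illustration only, here is a rough sketch of what a strict/relaxed toggle
could look like. It uses nothing but Python's standard-library html.parser and
is not ClientForm, mechanize, or twill code; the names FormCollector,
FormParseError, and parse_forms are made up for this example. In strict mode,
malformed form markup raises an error so a site author can fix it; in relaxed
mode the same problems are tolerated so the forms can still be used.

```python
from html.parser import HTMLParser


class FormParseError(Exception):
    """Raised in strict mode when form markup is malformed (hypothetical)."""


class FormCollector(HTMLParser):
    """Collect the input field names of each <form> on a page."""

    def __init__(self, strict=True):
        super().__init__()
        self.strict = strict
        self.forms = []       # one list of field names per form
        self._current = None  # fields of the form being parsed, if any

    def _problem(self, message):
        # Strict mode reports the problem; relaxed mode carries on.
        if self.strict:
            raise FormParseError(message)

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            if self._current is not None:
                self._problem("nested <form>")
                self.forms.append(self._current)  # relaxed: close implicitly
            self._current = []
        elif tag == "input" and self._current is not None:
            name = dict(attrs).get("name")
            if name:
                self._current.append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            if self._current is None:
                self._problem("stray </form>")
                return  # relaxed: ignore it
            self.forms.append(self._current)
            self._current = None

    def close(self):
        super().close()
        if self._current is not None:
            self._problem("unclosed <form>")
            self.forms.append(self._current)  # relaxed: close at end of input
            self._current = None


def parse_forms(html, strict=True):
    """Return a list of field-name lists, one per form found in html."""
    parser = FormCollector(strict=strict)
    parser.feed(html)
    parser.close()
    return parser.forms
```

Someone testing their own site would call parse_forms(page) and get an
exception on broken markup, while someone scripting a third-party site would
call parse_forms(page, strict=False) and work with whatever forms can be
recovered.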