[twill] Using tidy to fix broken forms

Titus Brown titus at caltech.edu
Sun Apr 23 16:34:34 PDT 2006


-> I'm having a problem with forms not showing up. My goal is to automate 
-> the renewal of library loans with python & twill but after I login to the 
-> library system with twill I do not get any forms at all:

[ ... ]

-> This is the source code for the page:
-> http://www.ee.oulu.fi/~thautako/library_page.txt

Hi, Tomo,

I've narrowed the problem down to this bit of HTML:

"""
<font>
<INPUT>

<FORM>
<input type="blah">
</form>
"""

Removing either the <font> line or the <INPUT> line makes the form
visible.  Running tidy doesn't fix it.

If you turn off 'allow_parse_errors', you get the error "INPUT before
start of FORM."  The way the HTML parsing code is constructed, failing
to handle *this* error means the parser never parses anything after it,
the form included.  (I could explain why, but it's easier to just look
at the code: basically, if a ParseError is raised in 'finish_starttag'
then the parser fails to advance past the erroneous HTML and gets stuck
on it.)
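
To make the "gets stuck" part concrete, here is a toy sketch of the
mechanism.  It is NOT ClientForm's code: ToyFormParser, count_forms,
and the advance_first flag are all made up for illustration, and the
toy does not reproduce the dependence on the <font> line.  It only
shows what happens when a handler raises before the parse position
has been moved past the offending tag.

```python
import re

class ParseError(Exception):
    pass

# Very loose tag matcher; good enough for the toy example only.
TAG = re.compile(r"<\s*(/?)\s*([a-zA-Z]+)[^>]*>")

SAMPLE = '<font>\n<INPUT>\n\n<FORM>\n<input type="blah">\n</form>\n'

class ToyFormParser:
    """Hypothetical stand-in for a form parser (not ClientForm)."""

    def __init__(self, html, advance_first=False):
        self.html = html
        self.pos = 0
        self.in_form = False
        self.forms = 0
        self.advance_first = advance_first

    def step(self):
        """Handle the next tag; return False when no tags remain."""
        m = TAG.search(self.html, self.pos)
        if m is None:
            self.pos = len(self.html)
            return False
        closing, name = m.group(1), m.group(2).lower()
        if self.advance_first:
            # Move past the tag first, so an error cannot pin us here.
            self.pos = m.end()
            self.handle(closing, name)
        else:
            # If handle() raises, pos is never updated: the next call
            # to step() finds the very same tag again.
            self.handle(closing, name)
            self.pos = m.end()
        return True

    def handle(self, closing, name):
        if name == "form":
            self.in_form = not closing
            if not closing:
                self.forms += 1
        elif name == "input" and not closing and not self.in_form:
            raise ParseError("INPUT before start of FORM")

def count_forms(html, advance_first=False, max_errors=3):
    """Return (forms found, parse errors seen), ignoring ParseErrors.

    Gives up after max_errors errors so a stuck parser cannot loop
    forever.
    """
    parser = ToyFormParser(html, advance_first)
    errors = 0
    while parser.pos < len(html):
        try:
            if not parser.step():
                break
        except ParseError:
            errors += 1
            if errors >= max_errors:
                break
    return parser.forms, errors

print(count_forms(SAMPLE))                      # (0, 3)
print(count_forms(SAMPLE, advance_first=True))  # (1, 1)
```

With the default ordering the toy parser hits the stray <INPUT> three
times in a row and never reaches the <FORM>.  Advancing before calling
the handler lets it skip the bad tag and find the form, but that is
only what the toy does; I am not claiming the same change is safe or
sufficient in the real parser.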

I'm not sure how best to handle this without making wholesale changes to
ClientForm, which seems Bad, or at least not very maintainable.

John, any ideas?

cheers,
--titus


