[twill] Form parsing error - using Python's own HTMLParser.py and not BeautifulSoup/tidy?

Boris Benlon boris.benlon at googlemail.com
Sat Jan 12 11:21:22 PST 2008


Hello to all on the mailing list from a new member.

I am trying to use twill to automate the use of my mobile phone operator's SMS
web portal, to allow me to send text messages from the command line of my
laptop, using its nice, big keyboard, rather than the tiny, fiddly keypad of my
mobile.

Using twill, I can successfully log in and follow the link to the SMS-sending
page, but then twill crashes when it attempts to parse the forms on that page.
When it crashes, the error seems to come from Python's own HTMLParser.py module.
That puzzles me, because I have BeautifulSoup and tidy installed, and I believe
both are being used: after I set require_tidy and require_BeautifulSoup in the
config, commands run without raising any exceptions. If these superior
HTML-parsing modules are in use, why is Python's HTML parser being called at
all?

twill has successfully parsed all other HTML pages (with forms) that I have
thrown at it. There seems to be something particularly nasty about the HTML on
this particular page (perhaps inserted deliberately by the mobile provider to
prevent just this sort of automation). If twill simply can't handle it, then
I'm happy to accept that. My concern is that there might be something wrong with
my (pretty new) Python or twill installation, which is causing an avoidable
exception to occur.

Could anyone please suggest what is going wrong?

(As the SMS-sending page is only accessible after logging in, for the purposes
of illustration I have copied the HTML of that page and have saved it to a file
on my own server. This copy still causes twill to crash in the same manner as
when using the live version.)

--- Start of text dump ---

  -= Welcome to twill! =-

current page:  *empty page*
>> config require_tidy 1
current page:  *empty page*
>> config require_BeautifulSoup 1
current page:  *empty page*
>> config
current configuration:
        acknowledge_equiv_refresh : True
        allow_parse_errors : True
        readonly_controls_writeable : False
        require_BeautifulSoup : True
        require_tidy : True
        use_BeautifulSoup : True
        use_tidy : True
        with_default_realm : False

current page:  *empty page*
>> go http://www.saytheword.org.uk/send-text-preparing.htm
==> at http://www.saytheword.org.uk/send-text-preparing.htm
current page: http://www.saytheword.org.uk/send-text-preparing.htm
>> showlinks
8>< - - - SNIP! I've cut this bit out to save space, but no exceptions
are raised. - - - ><8
>> showforms
Traceback (most recent call last):
  File "/usr/bin/twill-sh", line 8, in <module>
    load_entry_point('twill==0.9', 'console_scripts', 'twill-sh')()
  File "/usr/lib/python2.5/site-packages/twill-0.9-py2.5.egg/twill/shell.py", line 383, in main
    shell.cmdloop(welcome_msg)
  File "/usr/lib/python2.5/cmd.py", line 142, in cmdloop
    stop = self.onecmd(line)
  File "/usr/lib/python2.5/cmd.py", line 219, in onecmd
    return func(arg)
  File "/usr/lib/python2.5/site-packages/twill-0.9-py2.5.egg/twill/shell.py", line 42, in do_cmd
    print '\nERROR: %s\n' % (str(e),)
  File "/usr/lib/python2.5/HTMLParser.py", line 59, in __str__
    result = self.msg
AttributeError: 'ParseError' object has no attribute 'msg'

--- End of text dump ---
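
If I read the last two frames of the traceback correctly, the AttributeError
is raised while twill is trying to print an earlier error: str(e) in shell.py
ends up in a __str__ method in HTMLParser.py that reads self.msg, and the
ParseError object has no such attribute. Below is a minimal sketch of how that
could happen. It assumes (I have not checked the twill or form-parsing sources)
that ParseError inherits that __str__ from one base class while its constructor
comes from another base that never sets msg. StrictParseError and
PlainParseError are stand-in names of my own, not the real library classes.

```python
class PlainParseError(Exception):
    """Stand-in base whose constructor stores only the usual args tuple."""

    def __init__(self, *args):
        Exception.__init__(self, *args)


class StrictParseError(Exception):
    """Stand-in base whose __str__ expects self.msg to have been set."""

    def __init__(self, msg):
        Exception.__init__(self)
        self.msg = msg

    def __str__(self):
        return self.msg


class ParseError(PlainParseError, StrictParseError):
    """Hypothetical subclass: __init__ resolves to PlainParseError (msg is
    never set), while __str__ resolves to StrictParseError (msg is read)."""


err = ParseError("bad markup")

# Formatting the error fails in the same way as the traceback above:
# __str__ looks up self.msg, which the constructor never assigned.
try:
    message = str(err)
except AttributeError as attr_err:
    message = "formatting failed: %s" % attr_err

print(message)
```

If that reading is right, the AttributeError is only a side effect of printing,
and the original parse error (whatever its message was) is being hidden by it.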


