[twill] Form parsing error - using Python's own HTMLParser.py and not BeautifulSoup/tidy?
Boris Benlon
boris.benlon at googlemail.com
Sat Jan 12 11:21:22 PST 2008
Hello to all on the mailing list from a new member.
I am trying to use twill to automate the use of my mobile phone operator's SMS
web portal, to allow me to send text messages from the command line of my
laptop, using its nice, big keyboard, rather than the tiny, fiddly keypad of my
mobile.
Using twill, I can successfully log in and follow the link to the SMS-sending
page, but then twill crashes when it attempts to parse the forms on that page.
When it crashes, the error seems to come from Python's own HTMLParser.py module.
That puzzles me, because I have BeautifulSoup and tidy installed, and I believe
(though I may be wrong) that both are being used: after I set
require_BeautifulSoup and require_tidy in the config, twill raises no exceptions
when I issue further commands. If these superior HTML-parsing modules are being
used, why is Python's own HTML parser being called at all?
twill has successfully parsed all other HTML pages (with forms) that I have
thrown at it. There seems to be something particularly nasty about the HTML on
this particular page (perhaps inserted deliberately by the mobile provider to
prevent just this sort of automation). If twill simply can't handle it, then
I'm happy to accept that. My concern is that there might be something wrong with
my (pretty new) Python or twill installation, which is causing an avoidable
exception to occur.
Could anyone please suggest what is going wrong?
(As the SMS-sending page is only accessible after logging in, for the purposes
of illustration I have copied the HTML of that page and have saved it to a file
on my own server. This copy still causes twill to crash in the same manner as
when using the live version.)
--- Start of text dump ---
-= Welcome to twill! =-
current page: *empty page*
>> config require_tidy 1
current page: *empty page*
>> config require_BeautifulSoup 1
current page: *empty page*
>> config
current configuration:
acknowledge_equiv_refresh : True
allow_parse_errors : True
readonly_controls_writeable : False
require_BeautifulSoup : True
require_tidy : True
use_BeautifulSoup : True
use_tidy : True
with_default_realm : False
current page: *empty page*
>> go http://www.saytheword.org.uk/send-text-preparing.htm
==> at http://www.saytheword.org.uk/send-text-preparing.htm
current page: http://www.saytheword.org.uk/send-text-preparing.htm
>> showlinks
8>< - - - SNIP! I've cut this bit out to save space, but no exceptions
are raised. - - - ><8
>> showforms
Traceback (most recent call last):
  File "/usr/bin/twill-sh", line 8, in <module>
    load_entry_point('twill==0.9', 'console_scripts', 'twill-sh')()
  File "/usr/lib/python2.5/site-packages/twill-0.9-py2.5.egg/twill/shell.py", line 383, in main
    shell.cmdloop(welcome_msg)
  File "/usr/lib/python2.5/cmd.py", line 142, in cmdloop
    stop = self.onecmd(line)
  File "/usr/lib/python2.5/cmd.py", line 219, in onecmd
    return func(arg)
  File "/usr/lib/python2.5/site-packages/twill-0.9-py2.5.egg/twill/shell.py", line 42, in do_cmd
    print '\nERROR: %s\n' % (str(e),)
  File "/usr/lib/python2.5/HTMLParser.py", line 59, in __str__
    result = self.msg
AttributeError: 'ParseError' object has no attribute 'msg'
--- End of text dump ---
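Looking at the last few frames, the AttributeError seems to be raised while twill's shell is printing the error (the `str(e)` call in `do_cmd`), not while the page itself is being parsed, so the real parse error may be getting hidden. Below is a minimal sketch of what I suspect might be happening, assuming that the `ParseError` in the traceback is a subclass of an HTMLParser exception class whose `__str__` reads `self.msg`, but whose own constructor never sets `msg`. The class names `BaseParseError` and `ParseError` and the helper `describe` are my own hypothetical stand-ins, not the real library code, and the sketch is plain Python 3 rather than the 2.5 on my laptop:

```python
class BaseParseError(Exception):
    """Hypothetical stand-in for the HTMLParser exception class (assumed shape)."""

    def __init__(self, msg, position=(None, None)):
        Exception.__init__(self, msg)
        self.msg = msg
        self.lineno, self.offset = position

    def __str__(self):
        result = self.msg  # AttributeError here if __init__ above never ran
        if self.lineno is not None:
            result += ", at line %d" % self.lineno
        if self.offset is not None:
            result += ", column %d" % (self.offset + 1)
        return result


class ParseError(BaseParseError):
    """Hypothetical subclass that bypasses BaseParseError.__init__."""

    def __init__(self, *args):
        Exception.__init__(self, *args)  # self.msg is never assigned


def describe(exc):
    """Format an exception without trusting its __str__."""
    try:
        return str(exc)
    except AttributeError:
        # Fall back to the raw constructor arguments so the real message survives.
        return "%s: %s" % (type(exc).__name__, ", ".join(map(str, exc.args)))


if __name__ == "__main__":
    ok = BaseParseError("bad tag", (3, 4))
    print(describe(ok))  # bad tag, at line 3, column 5

    broken = ParseError("bad tag")
    try:
        # Mirrors the print in twill's shell.py shown in the traceback.
        print("ERROR: %s" % (str(broken),))
    except AttributeError as err:
        print("masked by:", err)
    print(describe(broken))  # ParseError: bad tag
```

If that guess is right, printing `e.args` rather than `str(e)` should reveal the original parse message that is currently being swallowed.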