[twill] html entities and latin-1 problem

gabor gabor at nekomancer.net
Sun Mar 5 14:04:57 PST 2006


hi,

i'm using twill-0.8.3, and it works fine, except for the following problem:

1. have the following html file: "&rsaquo;" (yes, that entity is the
whole file)
2. start twill-sh, go to that page, and do a "showforms"
3. you get the following error:


UnicodeEncodeError: 'latin-1' codec can't encode character u'\u203a' in position 0: ordinal not in range(256)
(full stacktrace at the end of the mail)

it makes sense that this character can't be encoded in latin-1, and
that's fine. but why is it trying to represent it in latin-1 at all?
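
to see why it fails: the entity &rsaquo; resolves to U+203A, and latin-1
can only represent code points 0 to 255. a minimal reproduction, written
for Python 3 (not the Python 2.4 in the traceback) and using the
standard-library html.entities table as a stand-in for ClientForm's own
entity table:

```python
from html.entities import name2codepoint

# &rsaquo; from the test page resolves to U+203A
char = chr(name2codepoint["rsaquo"])
print(hex(ord(char)))  # 0x203a

message = None
try:
    # latin-1 only covers U+0000..U+00FF, so this raises
    char.encode("latin-1")
except UnicodeEncodeError as exc:
    message = str(exc)
print(message)
```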

as a quick fix, changing "latin-1" to "utf-8" in form_parser_args in
twill/other_packages/mechanize/_html.py makes the error go away,
but i don't think that's the cleanest solution.
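
a small sketch (Python 3) of why the quick fix works, plus one
alternative that keeps latin-1: utf-8 can encode every code point, so the
encode call succeeds, but the entity then presumably ends up as utf-8
bytes even on a latin-1 page. python's built-in "xmlcharrefreplace" error
handler instead writes unencodable characters as a numeric character
reference. this is only an illustration; it is not what twill or
ClientForm currently does.

```python
char = "\u203a"  # the character &rsaquo; decodes to

# the quick fix: utf-8 can encode any code point
utf8_bytes = char.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x80\xba'

# alternative that keeps latin-1: fall back to a numeric
# character reference for anything latin-1 cannot represent
latin1_bytes = char.encode("latin-1", errors="xmlcharrefreplace")
print(latin1_bytes)  # b'&#8250;'
```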


any better ideas?

gabor


=========================
>>> b.showforms()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "twill/browser.py", line 223, in showforms
    for n, f in enumerate(self._browser.forms()):
  File "twill/other_packages/mechanize/_mechanize.py", line 244, in forms
    return self._factory.forms()
  File "twill/utils.py", line 307, in forms
    self._forms = parse_fn(response, self._encoding)
  File "twill/other_packages/mechanize/_html.py", line 218, in parse_response
    ignore_errors=self.ignore_errors
  File "twill/other_packages/ClientForm.py", line 870, in ParseResponse
    encoding,
  File "twill/other_packages/ClientForm.py", line 906, in ParseFile
    fp.feed(ch)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sgmllib.py", line 95, in feed
    self.goahead(0)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sgmllib.py", line 184, in goahead
    self.handle_entityref(name)
  File "twill/other_packages/ClientForm.py", line 667, in handle_entityref
    self.handle_data(table[fullname].encode(self._encoding))
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u203a' in position 0: ordinal not in range(256)
=========================
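
the last frame shows the failing step: handle_entityref looks the entity
up in a table and encodes the result with self._encoding. below is a
standalone sketch of that step with a fallback instead of a crash. the
function name entity_to_bytes is hypothetical, it uses Python 3's
html.entities rather than ClientForm's table, and it is not ClientForm's
actual code.

```python
from html.entities import name2codepoint


def entity_to_bytes(name, encoding):
    """hypothetical stand-in for the failing line in handle_entityref:
    turn a named entity into bytes in the page's encoding without
    raising on characters that encoding cannot represent."""
    codepoint = name2codepoint.get(name)
    if codepoint is None:
        # unknown entity: pass the reference through unchanged
        return ("&%s;" % name).encode(encoding, errors="xmlcharrefreplace")
    # xmlcharrefreplace emits &#NNNN; instead of raising UnicodeEncodeError
    return chr(codepoint).encode(encoding, errors="xmlcharrefreplace")


print(entity_to_bytes("rsaquo", "latin-1"))  # b'&#8250;'
print(entity_to_bytes("eacute", "latin-1"))  # b'\xe9'
print(entity_to_bytes("rsaquo", "utf-8"))    # b'\xe2\x80\xba'
```

the trade-off: on a latin-1 page the form data would contain the literal
text "&#8250;" rather than the character itself, which is harmless for
showforms but could matter if that text ends up in a submitted value.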
