[twill] html entities and latin-1 problem
gabor
gabor at nekomancer.net
Sun Mar 5 14:04:57 PST 2006
hi,
i'm using twill-0.8.3, and it works fine, except for the following problem:
1. create an html file containing just "&rsaquo;" (yes, that entity is
the whole file)
2. start twill-sh, go to that page, and run "showforms"
3. you get the following error:
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u203a' in
position 0: ordinal not in range(256)
(full stacktrace at the end of the mail)
it's logical that this character can't be encoded in latin-1, and
that's fine. but why is it trying to represent it in latin-1 at all?
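Going by the last frame of the traceback (ClientForm's handle_entityref calling `.encode(self._encoding)`), the mechanism seems to be: the entity name is mapped to a unicode character, and that character is then encoded with the parser's encoding, which is latin-1 here. A minimal Python 3 sketch of just that step, under stated assumptions: `entity_to_bytes` is a hypothetical, simplified stand-in (not ClientForm code), and the standard library's `html.entities.name2codepoint` takes the place of ClientForm's own entity `table`.

```python
# Python 3 sketch; ClientForm itself ran on Python 2 and used its own entity table.
from html.entities import name2codepoint


def entity_to_bytes(name, encoding):
    """Hypothetical, simplified stand-in for ClientForm's handle_entityref:
    look up the entity's character, then encode it in the parser's encoding."""
    char = chr(name2codepoint[name])  # "rsaquo" -> "\u203a"
    return char.encode(encoding)      # raises if `encoding` can't represent it


print(hex(name2codepoint["rsaquo"]))       # 0x203a
print(entity_to_bytes("rsaquo", "utf-8"))  # b'\xe2\x80\xba'

try:
    entity_to_bytes("rsaquo", "latin-1")
except UnicodeEncodeError as exc:
    # same complaint as in the traceback: U+203A is outside range(256)
    print(exc)
```

Latin-1 maps each byte to one of the first 256 code points, so any entity above U+00FF (like `&rsaquo;`, U+203A) fails this way, while UTF-8 can encode every code point. That is why switching the string to "utf-8" makes the error go away.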
as a quick fix, changing "latin-1" to "utf-8" in form_parser_args in
twill/other_packages/mechanize/_html.py helps, but i don't think that's
the cleanest solution.
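Two possible alternatives to hard-coding "utf-8", sketched in Python 3 under stated assumptions. (a) Choose the encoding per response from the `charset` parameter of the Content-Type header, falling back to latin-1; HTTP/1.1 (RFC 2616, section 3.7.1) makes ISO-8859-1 the default charset for `text/*` types, which may be why latin-1 is the default in the first place. (b) Keep the encoding but ask the codec to emit numeric character references instead of raising. Both `charset_from_content_type` and `entity_to_bytes_safe` are hypothetical helpers, not part of the twill, mechanize, or ClientForm API, and how they would be wired into ClientForm is not shown.

```python
# Python 3 sketch of two alternatives to hard-coding "utf-8".
from email.message import Message
from html.entities import name2codepoint


def charset_from_content_type(value, default="latin-1"):
    """Hypothetical helper: return the charset parameter of a Content-Type
    header value (lowercased), or `default` if the header names none."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset(default)


def entity_to_bytes_safe(name, encoding):
    """Hypothetical entity handler: characters the target encoding can't
    represent become numeric references (&#NNNN;) instead of raising."""
    char = chr(name2codepoint[name])
    return char.encode(encoding, "xmlcharrefreplace")


print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/html"))                 # latin-1
print(entity_to_bytes_safe("rsaquo", "latin-1"))              # b'&#8250;'
print(entity_to_bytes_safe("eacute", "latin-1"))              # b'\xe9'
```

The trade-off with (b) is that the parsed data then contains the literal text `&#8250;` rather than the character itself; (a) avoids that whenever the server actually declares its charset.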
any better ideas?
gabor
=========================
>>> b.showforms()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "twill/browser.py", line 223, in showforms
    for n, f in enumerate(self._browser.forms()):
  File "twill/other_packages/mechanize/_mechanize.py", line 244, in forms
    return self._factory.forms()
  File "twill/utils.py", line 307, in forms
    self._forms = parse_fn(response, self._encoding)
  File "twill/other_packages/mechanize/_html.py", line 218, in parse_response
    ignore_errors=self.ignore_errors
  File "twill/other_packages/ClientForm.py", line 870, in ParseResponse
    encoding,
  File "twill/other_packages/ClientForm.py", line 906, in ParseFile
    fp.feed(ch)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sgmllib.py", line 95, in feed
    self.goahead(0)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sgmllib.py", line 184, in goahead
    self.handle_entityref(name)
  File "twill/other_packages/ClientForm.py", line 667, in handle_entityref
    self.handle_data(table[fullname].encode(self._encoding))
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u203a' in position 0: ordinal not in range(256)
=========================