[twill] html entities and latin-1 problem
Michele Simionato
michele.simionato at gmail.com
Thu Mar 9 00:48:24 PST 2006
On 3/5/06, Titus Brown <titus at caltech.edu> wrote:
> Short answer -- unicode support in mechanize is still young ;(.
>
> I have one or two other unicode issues to look at today, too.
Something is wrong with the 0.8.3 release. I was testing a Plone site with
previous versions of twill and everything was fine. Now, however, I get
a Unicode error when I run 'formvalue' on one of its pages. The page
starts as follows:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"
lang="en">
<head>
<meta http-equiv="Content-Type"
content="text/html;charset=utf-8" />
<title>
Portal
&mdash;
Portal
</title>
and contains the Plone login form.
I get the following traceback:
  File "/usr/lib/python2.4/site-packages/twill-0.8.3-py2.4.egg/twill/commands.py", line 386, in formvalue
    form = browser.get_form(formname)
  File "/usr/lib/python2.4/site-packages/twill-0.8.3-py2.4.egg/twill/browser.py", line 254, in get_form
    forms = self._browser.forms()
  File "/usr/lib/python2.4/site-packages/twill-0.8.3-py2.4.egg/twill/other_packages/mechanize/_mechanize.py", line 244, in forms
    return self._factory.forms()
  File "/usr/lib/python2.4/site-packages/twill-0.8.3-py2.4.egg/twill/utils.py", line 307, in forms
    self._forms = parse_fn(response, self._encoding)
  File "/usr/lib/python2.4/site-packages/twill-0.8.3-py2.4.egg/twill/other_packages/mechanize/_html.py", line 218, in parse_response
    ignore_errors=self.ignore_errors
  File "/usr/lib/python2.4/site-packages/twill-0.8.3-py2.4.egg/twill/other_packages/ClientForm.py", line 870, in ParseResponse
    encoding,
  File "/usr/lib/python2.4/site-packages/twill-0.8.3-py2.4.egg/twill/other_packages/ClientForm.py", line 906, in ParseFile
    fp.feed(ch)
  File "/usr/lib/python2.4/sgmllib.py", line 95, in feed
    self.goahead(0)
  File "/usr/lib/python2.4/sgmllib.py", line 184, in goahead
    self.handle_entityref(name)
  File "/usr/lib/python2.4/site-packages/twill-0.8.3-py2.4.egg/twill/other_packages/ClientForm.py", line 667, in handle_entityref
    self.handle_data(table[fullname].encode(self._encoding))
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2014' in position 0: ordinal not in range(256)
Notice that \u2014 is the em-dash character, and that twill is using Latin-1
even though the page declares its content type as utf-8.
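
The failure can be reproduced outside twill. The following is only a
minimal sketch in modern Python 3, not the code in ClientForm.py: the
helper name entity_to_bytes is made up for illustration, and I am assuming
the em-dash reached the parser as the named entity "mdash", which is what
the handle_entityref frame in the traceback suggests.

```python
from html.entities import name2codepoint


def entity_to_bytes(name, encoding):
    """Resolve a named HTML entity and encode it with the given codec.

    Hypothetical helper: it mimics the resolve-then-encode step seen in
    the traceback, it is not taken from twill or ClientForm.
    """
    char = chr(name2codepoint[name])  # "mdash" maps to U+2014
    return char.encode(encoding)


# With the encoding the page declares, the em-dash encodes to three bytes.
utf8_bytes = entity_to_bytes("mdash", "utf-8")

# With Latin-1, U+2014 is outside the codec's 0-255 range, so encoding fails.
try:
    entity_to_bytes("mdash", "latin-1")
    latin1_failed = False
except UnicodeEncodeError:
    latin1_failed = True
```

If the parser had been given utf-8, as declared in the meta tag, the encode
step would have succeeded. The error only appears because Latin-1 is passed
as the encoding.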