[twill] html entities and latin-1 problem

Titus Brown titus at caltech.edu
Sun Mar 5 14:29:09 PST 2006


On Sun, Mar 05, 2006 at 11:04:57PM +0100, gabor wrote:
-> hi,
-> 
-> i'm using twill-0.8.3, and it works fine, except for the following problem:
-> 
-> 1. have the following html file: "&rsaquo;" (yes, that entity is the
-> whole file)
-> 2. start twill-sh, go to that page, and do a "showforms"
-> 3. you get the following error:
-> 
-> 
-> UnicodeEncodeError: 'latin-1' codec can't encode character u'\u203a' in 
-> position 0: ordinal not in range(256)
-> (full stacktrace at the end of the mail)
-> 
-> it's logical that that character is un-encodable in latin-1, and
-> that's fine. but why is it trying to represent it in latin-1?
-> 
-> as a quick-fix, changing "latin-1" to "utf-8" in
-> twill/other_packages/mechanize/_html.py/form_parser_args helps,
-> but i don't think that's the cleanest solution..
-> 
-> 
-> any better ideas?

Short answer -- unicode support in mechanize is still young ;(.

I have one or two other unicode issues to look at today, too.

--titus
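
For reference, the error quoted above can be reproduced without twill or
mechanize. The following is only a minimal sketch using the modern Python 3
standard library (an assumption; the thread itself concerns Python 2 and
twill-0.8.3). It does not show mechanize's internals, and the variable names
are purely illustrative. It shows why U+203A cannot be encoded as latin-1,
why utf-8 works, and one possible fallback that stays within latin-1.

```python
import html

# Decode the entity the way an HTML parser would: "&rsaquo;" is U+203A.
text = html.unescape("&rsaquo;")

# latin-1 only covers code points 0-255, so U+203A cannot be encoded.
try:
    text.encode("latin-1")
    error = None
except UnicodeEncodeError as exc:
    error = str(exc)

# utf-8 can represent every Unicode code point.
as_utf8 = text.encode("utf-8")

# Possible fallback when latin-1 output is required: emit a numeric
# character reference instead of raising.
as_charref = text.encode("latin-1", "xmlcharrefreplace")

print(error)
print(as_utf8)
print(as_charref)
```

Whether switching the codec or using a character-reference fallback is
appropriate inside mechanize depends on how the encoded form data is used
later, which this sketch does not address.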


