[twill] read unicode form result

Tue Jul 15 03:42:07 PDT 2008

Hi,

Twill, mechanize, Beautiful Soup and Python are really neat.
It is really easy to write twill programs to test web
applications.  I have a problem trying to read Unicode, though.
I wrote a little program, included below.  It works fine
if I change "en_zh" to "en_de": the result is then
in German.  Is there some way to persuade it to read
Unicode?

The returned page starts with:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
<title>Yahoo! Babel Fish - Text Translation and Web Page Translation</title>
<meta name="description" content="Yahoo! Babel Fish provides free online text and web page language translation tools!">
<meta name="keywords" content="translation, translator, language, machine translation, automatic translation">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">

I notice there are two lines that specify the charset
(ISO-8859-1 first, then UTF-8), and I was wondering if that
might be confusing mechanize.
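
To check whether the two declarations really disagree, a minimal
sketch like the one below lists every charset a page declares and
decodes the raw bytes with the last one.  It uses only the standard
library.  The helper names, the "last declaration wins" policy and
the sample result div are my own assumptions for illustration; I do
not know which declaration mechanize actually honours.

```python
# -*- coding: utf-8 -*-
import re

# Matches charset=... inside a single <meta ...> tag, on raw bytes.
CHARSET_RE = re.compile(
    br'<meta[^>]*charset=["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)


def declared_charsets(raw_html):
    """Return every charset named in a <meta> tag, in document order."""
    return [m.decode('ascii').lower() for m in CHARSET_RE.findall(raw_html)]


def decode_with_last_charset(raw_html, default='utf-8'):
    """Decode raw bytes with the last declared charset (assumed policy)."""
    charsets = declared_charsets(raw_html)
    encoding = charsets[-1] if charsets else default
    return raw_html.decode(encoding, 'replace')


# Stand-in for the returned page: the two meta lines from the real
# page, plus a made-up result div holding two Chinese characters.
sample = (
    u'<html><head>'
    u'<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">'
    u'<meta http-equiv="content-type" content="text/html; charset=UTF-8">'
    u'</head><body><div id="result">\u5929\u6c14</div></body></html>'
).encode('utf-8')

charsets = declared_charsets(sample)
text = decode_with_last_charset(sample)
```

If the page bytes are UTF-8 but get decoded as ISO-8859-1 (the
first declaration), the Chinese characters are lost, which would
be consistent with German still working.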

Thanks, Mark

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import codecs
import sys

from twill import get_browser
from twill.commands import formvalue
from BeautifulSoup import BeautifulSoup

# Make print encode unicode output as UTF-8.
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)

from_and_to_languages = 'en_zh'  # 'en_de' works and gives German
from_text = 'Some very simple principles are behind what creates our weather.'

# Fill in and submit the translation form (form 2 on the page).
b = get_browser()
b.go('http://au.babelfish.yahoo.com/')
formvalue('2', 'lp', from_and_to_languages)
formvalue('2', 'trtext', from_text)
b.submit('btnTrTxt')

# Parse the returned page and print the first nested node of the result div.
result = b.get_html()
soup = BeautifulSoup(result, fromEncoding="utf-8")
div_id_result = soup.find(id="result")
print div_id_result.contents[0].contents[0]

--