[twill] Error with showforms again
William K. Volkman
wkvsf at users.sourceforge.net
Tue Dec 13 22:28:19 PST 2005
Hi Titus,
On Tue, 2005-12-13 at 15:28 -0800, Titus Brown wrote:
> If you just send me the relevant source (not even a functioning patch is
> required...) it would help me overcome my inertia. So, whatever you can
> send would be great.
You really make it easy for a guy to be lazy :-D Well as usual I'm
pressed for time so here is the (albeit) hacky version I use, if you
want I can clean it up. If you can't wait well...here you have it.
import re
from mechanize import Browser, Link, LinkNotFoundError
import ClientForm
from BeautifulSoup import BeautifulSoup, Null, Tag
class MyBrowser(Browser):
def _parse_html(self, response):
self.form = None
self._title = None
if not self.viewing_html():
# nothing to see here
return
# set ._forms, ._links
try:
self._forms = ClientForm.ParseResponse(response)
except ClientForm.ParseError, err:
print str(err)
self._forms = []
url = response.geturl()
response.seek(0)
p = BeautifulSoup(response)
forms = p('form')
for form in forms:
tmp = ClientForm.StringIO()
tmp.write(str(form))
tmp.seek(0)
try:
f = ClientForm.ParseFile(tmp, url)
except ClientForm.ParseError:
continue
self._forms.extend(f)
response.seek(0)
p = BeautifulSoup(response)
b = p('base')
if len(b) == 0:
base = response.geturl()
else:
base = b[0].get('href')
if not base:
base = response.geturl()
self._links = []
for l in p('a', {'href':re.compile('.+')}):
url = l.get('href')
if not url:
# Strange we filtered for it
continue
if l.string != Null:
text = l.string
else:
text = ''
for t in l.contents:
if isinstance(t, str):
text = text + t.strip()
elif isinstance(t, Tag):
if t.name == 'img':
if t.get('alt'):
text = text + t.get('alt')+'[IMG]'
else:
text = text + '[IMG]'
text = text.strip()
link = Link(base, url, text, 'a', l.attrs)
self._links.append(link)
response.seek(0)
Note that the above contains a implementation of handling the "BASE"
tag, which is the reason links are done here instead of letting
mechanize take care of them. I made a small patch to BeautifulSoup to
recognize the "BASE" tag as an unterminated tag and sent it in but
haven't since gone back and checked on it. If needed I can supply
that also. Also the "BASE" tag handling above should apply to the
form actions, it just wasn't necessary in my case. A more straight
forward implementation might be to use BeautifulSoup always instead of
trying ClientForm first however I was in a hurry at the time.
HTH,
William.
More information about the twill
mailing list