[twill] Error with showforms again

Titus Brown titus at caltech.edu
Tue Dec 13 22:49:03 PST 2005


thanks -- I'll probably go through this & check it in this weekend.

--titus

On Tue, Dec 13, 2005 at 11:28:19PM -0700, William K. Volkman wrote:
-> Hi Titus,
-> On Tue, 2005-12-13 at 15:28 -0800, Titus Brown wrote:
-> > If you just send me the relevant source (not even a functioning patch is
-> > required...) it would help me overcome my inertia.  So, whatever you can
-> > send would be great.
-> You really make it easy for a guy to be lazy :-D  Well as usual I'm
-> pressed for time so here is the (albeit) hacky version I use, if you
-> want I can clean it up.  If you can't wait well...here you have it.
-> 
-> import re
-> from mechanize import Browser, Link, LinkNotFoundError
-> import ClientForm
-> from BeautifulSoup import BeautifulSoup, Null, Tag
-> 
-> class MyBrowser(Browser):
->     def _parse_html(self, response):
->         self.form = None
->         self._title = None
->         if not self.viewing_html():
->             # nothing to see here
->             return
->         # set ._forms, ._links
->         try:
->             self._forms = ClientForm.ParseResponse(response)
->         except ClientForm.ParseError, err:
->             print str(err)
->             self._forms = []
->             url = response.geturl()
->             response.seek(0)
->             p = BeautifulSoup(response)
->             forms = p('form')
->             for form in forms:
->                 tmp = ClientForm.StringIO()
->                 tmp.write(str(form))
->                 tmp.seek(0)
->                 try:
->                     f = ClientForm.ParseFile(tmp, url)
->                 except ClientForm.ParseError:
->                     continue
->                 self._forms.extend(f)
->         response.seek(0)
->         p = BeautifulSoup(response)
->         b = p('base')
->         if len(b) == 0:
->             base = response.geturl()
->         else:
->             base = b[0].get('href')
->             if not base:
->                 base = response.geturl()
->         self._links = []
->         for l in p('a', {'href':re.compile('.+')}):
->             url = l.get('href')
->             if not url:
->                 # Strange we filtered for it
->                 continue
->             if l.string != Null:
->                 text = l.string
->             else:
->                 text = ''
->                 for t in l.contents:
->                     if isinstance(t, str):
->                         text = text + t.strip()
->                     elif isinstance(t, Tag):
->                         if t.name == 'img':
->                             if t.get('alt'):
->                                 text = text + t.get('alt')+'[IMG]'
->                             else:
->                                 text = text + '[IMG]'
->                 text = text.strip()
->             link = Link(base, url, text, 'a', l.attrs)
->             self._links.append(link)
->         response.seek(0)
-> 
-> Note that the above contains a implementation of handling the "BASE"
-> tag, which is the reason links are done here instead of letting
-> mechanize take care of them.  I made a small patch to BeautifulSoup to
-> recognize the "BASE" tag as an unterminated tag and sent it in but
-> haven't since gone back and checked on it.  If needed I can supply
-> that also.  Also the "BASE" tag handling above should apply to the
-> form actions, it just wasn't necessary in my case.  A more straight
-> forward implementation might be to use BeautifulSoup always instead of
-> trying ClientForm first however I was in a hurry at the time.
-> 
-> HTH,
-> William.
-> 



More information about the twill mailing list