[twill] beautifulSoup via twill?

montara at sonic.net
Tue Jul 24 12:53:30 PDT 2007


>
> Hey Mike,
>
> this is about what you have to do, yes.  It seems like a good idea to
> make a standardized API for it, though; do you have any thoughts on what
> that should be?
>
> Perhaps:
>
> import twill
> ...
> soup = twill.get_soup()
>
> is simple enough?
>
> --titus
>
>

thanks Stephan and Titus.

regarding a standardized API: yes, let's encourage that.  When I see
objects whose names start with an underscore ("_beautifulsoup"), I read
them as "private" access, and that's where I suspect I'm not fully
understanding what I'm using.

for the basic features I need, twill seems like a good fit:
 * straightforward install
 * settable timeout on requests
 * timing all requests (easy enough)
 * pull and parse html, ideally using a document object model
 * form processing
 * session management via cookie management
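
To make the timeout, timing, and cookie items concrete, here is a rough
sketch of what I mean.  It uses only the Python standard library rather
than twill itself, and "TimedSession" and "fetch" are names I made up for
illustration, so treat it as a statement of the requirement, not as how
twill does it.

```python
import time
import urllib.request
from http.cookiejar import CookieJar


class TimedSession:
    """Hypothetical helper: one cookie jar, one timeout, a timing per request."""

    def __init__(self, timeout=10.0):
        self.timeout = timeout            # seconds, applied to every request
        self.cookies = CookieJar()        # cookies persist across fetch() calls
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.cookies))
        self.timings = []                 # (url, elapsed_seconds) per request

    def fetch(self, url):
        """Fetch url, record how long it took, and return the raw body bytes."""
        start = time.perf_counter()
        with self.opener.open(url, timeout=self.timeout) as response:
            body = response.read()
        self.timings.append((url, time.perf_counter() - start))
        return body
```

Every fetch() on the same TimedSession shares the cookie jar, which is the
"session" part; the timings list is what I would report on afterwards.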

I've not bridged BeautifulSoup with PyParser yet.  I would think more
code samples for the various pieces that make up the framework would be
valuable: for example, pulling out the inner HTML of a div, or the image
URL from an image tag.  Simple stuff, but not obvious to figure out the
first time through.
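
As an illustration of the kind of sample I have in mind, here is a
minimal sketch that pulls the inner HTML of a div and the image URLs out
of a page.  It uses the standard library's html.parser as a stand-in
rather than BeautifulSoup or twill, and the names (DivAndImageExtractor,
extract, SAMPLE) are made up for the example.  It skips HTML comments and
assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DivAndImageExtractor(HTMLParser):
    """Collect the inner HTML of one <div id=...> and the src of every <img>."""

    def __init__(self, div_id):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.div_id = div_id
        self.depth = 0          # > 0 while inside the target div
        self.parts = []         # raw markup collected inside the div
        self.image_urls = []    # src values in document order

    def _note_image(self, tag, attrs):
        src = dict(attrs).get("src")
        if tag == "img" and src is not None:
            self.image_urls.append(src)

    def handle_starttag(self, tag, attrs):
        self._note_image(tag, attrs)
        if self.depth:
            self.parts.append(self.get_starttag_text())
            if tag not in VOID_TAGS:
                self.depth += 1
        elif tag == "div" and dict(attrs).get("id") == self.div_id:
            self.depth = 1      # entering the target div; its own tag is not kept

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <img ... />: record it, depth is unchanged.
        self._note_image(tag, attrs)
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth or tag in VOID_TAGS:
            return
        self.depth -= 1
        if self.depth:          # depth 0 means this was the target div's own </div>
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        if self.depth:
            self.parts.append("&#%s;" % name)


def extract(html, div_id):
    """Return (inner_html_of_div, list_of_image_urls) for the given page text."""
    parser = DivAndImageExtractor(div_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts), parser.image_urls


SAMPLE = ('<html><body><div id="main"><p>Hello <b>world</b> &amp; all</p>'
          '<img src="/img/logo.png" alt="logo"></div>'
          '<img src="/img/footer.gif"/></body></html>')

inner_html, image_urls = extract(SAMPLE, "main")
```

With a real soup object the same two lookups should be much shorter; the
point is only that a worked example of each would save newcomers some
digging.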

the only other thing I'd add is that mechanize and twill are both out of
date with respect to BeautifulSoup; I'm wondering how difficult it is to
keep them current.

I'm enjoying using twill and the underlying packages.  This is good stuff,
thank you!

- mike



