[twill] Using Beautiful Soup to Find Images

Titus Brown titus at caltech.edu
Wed Jul 12 23:57:30 PDT 2006


On Wed, Jul 12, 2006 at 02:23:16PM -0500, Terry Peppers wrote:
-> Had a question for the group related to Beautiful Soup that is
-> packaged with Twill.
-> 
-> I'm trying to get away from using a regex to pull out all of the
-> images in an HTML page. I figured I would use Beautiful Soup, since
-> it's included with Twill and it's made for parsing HTML, but I'm
-> getting some seriously weird results.

[ ... ]

-> So I'm not sure if Twill comes with a scaled back version of
-> BeautifulSoup or if I'm just approaching the problem incorrectly. (If
-> I were a productive member of the OS community I would offer Titus a
-> patch that would just pull all the images in....).
-> 
-> Anyone?

I bet William is right -- that you're using BS 3.0 terminology with BS
2.0.

I can do the following:

soup('img')

to get all of the image tags, for example.
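
For anyone who wants to see the same idea (walk the parsed tags instead
of using a regex) without depending on a particular Beautiful Soup
version, here is a minimal sketch using only the Python standard
library's html.parser module. This is a stand-in technique, not Beautiful
Soup's API and not what twill does internally; the names ImageCollector,
find_images and sample are made up for illustration.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the attributes of every <img> tag seen while parsing.

    Hypothetical helper for illustration; not part of twill or
    Beautiful Soup.
    """

    def __init__(self):
        super().__init__()
        self.images = []  # one dict of attributes per <img> tag

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG SRC=...>
        # and <img src=...> are handled the same way. Self-closing tags
        # such as <img ... /> also end up here via handle_startendtag.
        if tag == "img":
            self.images.append(dict(attrs))


def find_images(html):
    """Return a list of attribute dicts, one per <img> tag in html."""
    parser = ImageCollector()
    parser.feed(html)
    parser.close()
    return parser.images


# Made-up sample page, just to exercise the sketch.
sample = (
    '<html><body><IMG SRC="a.png"><p>text</p>'
    '<img src="b.gif" alt="B"/></body></html>'
)

for attrs in find_images(sample):
    print(attrs.get("src"))
```

Each entry in the returned list is a plain dict, so attrs.get("src")
gives the image URL, or None if the tag has no src attribute.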

I don't know BS 3.0 well enough to say how findAll differs from
__call__ in BS 2.0, though.

cheers,
--titus


