[twill] Newbie question on approach to following links

Terry Peppers peppers at gmail.com
Wed Jul 26 14:55:03 PDT 2006


Hi Hal -

This probably isn't the correct answer, but when I need to get at
certain elements of a page, I've been using Beautiful Soup with decent
results.

Something like this, maybe:

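from twill.commands import *
from twill import get_browser
from BeautifulSoup import BeautifulSoup

go("http://www.someplace.com")

# HTML of the page twill is currently on
page = get_browser().get_html()

soup = BeautifulSoup(page)

# All <td> cells with a given class ('classname' is a placeholder;
# if the cell has a class name, it helps)
cells = soup.findAll('td', {'class': 'classname'})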

Maybe?
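Carried through to Hal's whole check, the idea looks roughly like the sketch below. It assumes the HTML string comes from get_browser().get_html() as above, and it swaps the standard library's html.parser (Python 3) in for Beautiful Soup so it runs without extra packages. The names CellParser and failing_junit_link are hypothetical, nested tables are not handled, and "the 2nd link prior to this position" is read as the second-to-last link in the cell.

```python
from html.parser import HTMLParser


class CellParser(HTMLParser):
    """Collect every <td> as a (text, hrefs) pair, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []     # finished cells as (text, [href, ...])
        self._text = None   # text pieces of the currently open <td>
        self._hrefs = None  # hrefs of <a> tags inside the open <td>

    def _flush(self):
        # Close the current cell, if one is open.
        if self._text is not None:
            self.cells.append(("".join(self._text), self._hrefs))
            self._text = self._hrefs = None

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._flush()  # tolerate a missing </td>
            self._text, self._hrefs = [], []
        elif tag == "a" and self._hrefs is not None:
            href = dict(attrs).get("href")
            if href:
                self._hrefs.append(href)

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self._flush()


def failing_junit_link(html, keyword="junit", marker="errs:", back=2):
    """Return the href `back` links before the text of the first cell that
    contains both `keyword` and `marker`, or None if there is no such cell."""
    parser = CellParser()
    parser.feed(html)
    parser.close()
    for text, hrefs in parser.cells:
        if keyword in text and marker in text and len(hrefs) >= back:
            # The links precede the text, so counting back from the text
            # means counting back from the end of the cell's link list.
            return hrefs[-back]
    return None


page = ('<table><tr><td><a href="/log">L</a><a href="/full">F</a>'
        '<a href="/brief">B</a> junit errs: 4</td></tr></table>')
print(failing_junit_link(page))  # /full
```

The returned href may be relative; urllib.parse.urljoin can make it absolute before handing it to twill's go command.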

t.

On 7/26/06, twill-request at lists.idyll.org <twill-request at lists.idyll.org> wrote:
> Send twill mailing list submissions to
>         twill at lists.idyll.org
>
> Today's Topics:
>
>    1.  Newbie question on approach to following links relative to
>       text (Hal Wine)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 25 Jul 2006 12:06:26 -0700
> From: "Hal Wine" <hal.wine at gmail.com>
> Subject: [twill] Newbie question on approach to following links
>         relative to text
> To: twill at lists.idyll.org
> Message-ID:
>         <523453750607251206n450e8973rc2fca25599fa430b at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> I'm trying to learn twill with a "simple" example of scraping a web
> page and sending an email based on content.
>
> I'm not fully grokking what to do in Python and what to have twill do.
> Suggestions welcome, or pointers to any similar example welcome.
>
> Basically, I want to scrape a tinderbox status page, which is a big
> table. Cell data consists of 3 links followed by text. If the text
> contains 'errs:', then I want to follow one of the preceding links. In
> pseudocode:
>   find cell with 'junit' in it
>   if cell also has 'errs:' in it, then follow the 2nd link prior to
> this position.
>
> I can use 'find' to get the cell data. What I'm confused about is
> whether twill has any support for parsing the preceding links, or
> whether I should do that in Python and then pass the extracted URL to
> 'follow'. Or should I just write an extension to add that? Or is there
> a way to know that my 'find' matched between links X and Y, so I can
> just access showlinks()[X-1]?
>
> Thanks
>
>
>
> ------------------------------
>
> _______________________________________________
> twill mailing list
> twill at lists.idyll.org
> http://lists.idyll.org/listinfo/twill
>
>
> End of twill Digest, Vol 12, Issue 13
> *************************************
>
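Hal's last question, whether a 'find' match can be located between links X and Y so that a link can be picked by index, can be approximated by counting the links that appear before the match. A minimal sketch, assuming the page HTML is available as a string and that the pattern is a regular expression like the one twill's find takes; the names HREF_RE and link_index_before are hypothetical, and the regex-based link scan is a rough stand-in for a real parser. Whether the returned index lines up with the numbering twill's showlinks prints (which may also count other link-like tags) is an unverified assumption to check against real output.

```python
import re

# Rough pattern for the href of an <a> tag; fine for a quick scrape,
# but a real HTML parser is more robust.
HREF_RE = re.compile(r"""<a\s[^>]*?href\s*=\s*["']([^"']*)["']""", re.IGNORECASE)


def link_index_before(html, pattern, back=2):
    """Search `html` for `pattern` and return (index, href) of the
    `back`-th link before the match, where index counts links from the
    top of the page. Returns None if nothing fits."""
    match = re.search(pattern, html)
    if match is None:
        return None
    # Every link that appears before the match, in document order.
    before = [m.group(1) for m in HREF_RE.finditer(html, 0, match.start())]
    if len(before) < back:
        return None
    index = len(before) - back
    return index, before[index]


page = ('<a href="/home">home</a><table><tr><td>'
        '<a href="/log">L</a> <a href="/full">F</a> <a href="/brief">B</a>'
        ' junit errs: 4</td></tr></table>')
# 'junit' followed by 'errs:' with no tag in between, i.e. in the same cell text
print(link_index_before(page, r"junit[^<]*errs:"))  # (2, '/full')
```

Counting by position keeps the link numbering page-wide, which is what an index into a list of all links needs; picking links per cell, as in the Beautiful Soup approach, avoids depending on that numbering at all.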


