[twill] Working around <br/> issue?

Howard B. Golden hgolden at socal.rr.com
Wed Dec 30 14:15:42 PST 2009

On Wednesday December 30, 2009, Misha Koshelev wrote:

> I obviously am not able to alter this website. However, I was
>  wondering is there either: (a) a way to query-replace all <br/> with
>  <br /> _before_ twill parses a website or, alternately,
> (b) I can do this from a shell script fairly easily. Is there a
>  mechanism, analogous to save_html, to _input_ an HTML file into
>  twill, say like load_html?

If you trace how twill's commands work, you will see that they use a 
browser.get_html() call to get the page's data. This is then used for 
all the rest of the processing. For example, look at the code for find() 
in commands.py.

The browser object also has a "result" attribute. This is a 
ResultWrapper object (see utils.py) created by _journey() in browser.py 
when it reads a page. Inside this object, there is a "page" attribute, 
which you can play with (and modify).

In summary, you can create a browser object (call it "b"). Then, after 
you do a "b.go(url)", you can assign a new value to "b.result.page" (for 
example, the page text with every <br/> replaced by <br />) before 
calling other functions.
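
A minimal sketch of that summary in Python, offered under the same 
"untested against twill" caveat as the rest of this message. The helper 
names fix_br_tags and patch_page are made up for illustration, and the 
sketch assumes that "b.result.page" is a plain, writable string. Whether 
every twill command (form parsing in particular) picks up the modified 
text is not something verified here.

```python
def fix_br_tags(html):
    """Return html with every literal <br/> rewritten as <br />."""
    return html.replace("<br/>", "<br />")


def patch_page(browser):
    """Rewrite browser.result.page in place and return the new text.

    Assumption: `browser` exposes a writable `result.page` attribute,
    as described above for twill's browser object.
    """
    browser.result.page = fix_br_tags(browser.result.page)
    return browser.result.page


# Intended use with twill (untested; the URL is a placeholder):
#
#     import twill
#     b = twill.get_browser()
#     b.go("http://example.com/")
#     patch_page(b)
#     # ...then call the other functions on the patched page
```

The replacement is kept in a separate function so it can be checked on 
plain strings without a network connection or a twill install.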

Note: All of this can be done using a "run" command. See my previous 
message (http://lists.idyll.org/pipermail/twill/2009-March/000962.html) 
for how this works.

(I haven't tested this, so it may have some syntax errors. Let me know 
if you have any questions.)

