[twill] Ignoring markup when finding text

Tue Jul 11 09:06:32 PDT 2006

I've used the html2text module before, and was grateful not to have to
worry about entity references and other niceties that I would have had
to dummy out by trial and error:

	http://www.aaronsw.com/2002/html2text/

It may be overkill for simply getting the viewable text out of an HTML
document, but it'll work right out of the box.
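
If pulling in html2text does feel like overkill, a minimal sketch using only Python's standard library might look like the following. It assumes Python 3's html.parser module; the names TextExtractor and visible_text are made up for illustration and are not part of twill or html2text. Entity references are decoded by the parser, and script/style contents are skipped since a user never sees them.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect only the text a user would see."""

    SKIP = {"script", "style"}  # contents of these tags are never displayed

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; for us
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped tag
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html):
    """Return the page text with all markup removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


page = '<p><a href="/login">Log in</a> to post comments &amp; replies</p>'
print(visible_text(page))
```

This is far cruder than html2text (no handling of block layout, lists, or links), but it may be enough when all a test needs is the plain sentence.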

Cheers!
--
David Hancock | dhancock at arinc.com | 410-266-4384 

-----Original Message-----
From: twill-bounces at lists.idyll.org
[mailto:twill-bounces at lists.idyll.org] On Behalf Of Titus Brown
Sent: Tuesday, July 11, 2006 11:22 AM
To: Michael Hope
Cc: twill at lists.idyll.org
Subject: Re: [twill] Ignoring markup when finding text

On Tue, Jul 11, 2006 at 08:08:15AM +0000, Michael Hope wrote:
-> I've just started using twill to test a Django-based app, and was
-> wondering how to handle basic text tests.
-> 
-> I'd like to assert on what the user sees, not the HTML.  For example,
-> the app has a 'Log in to post comments' sentence with an anchor around
-> the words 'Log in'.  The test will be more robust and easier to read
-> if I can search for the plain text instead of the text with markup.
-> 
-> I made a quick change to 'find' to search a version of the page with
-> all tags and newlines stripped out, and it worked well.  I was
-> thinking about making this an option just like the current regex
-> options.  It's a bit messy as you'd be mixing option classes, but not
-> too bad.
-> 
-> How do you people handle this?

Hi, Michael,

good question ;).  I've been thinking about adding an option to 'show'
that strips all of the tags, e.g.

	show --text

and this idea of yours fits pretty well.  I'd worry about handling
things like newlines -- what if 'show --text' wraps the line "Log in\nto
post comments"?  Any thoughts?
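
One possible way around the wrapping problem, sketched here as a rough idea and not as anything twill does today: collapse every run of whitespace (newlines included) to a single space in both the search string and the page text before comparing. The function names normalize and find_text are hypothetical.

```python
def normalize(text):
    """Collapse any run of whitespace, including newlines, to one space."""
    return " ".join(text.split())


def find_text(needle, page_text):
    """Return True if needle occurs in page_text, ignoring line wrapping."""
    return normalize(needle) in normalize(page_text)


wrapped = "Log in\nto   post comments"
print(find_text("Log in to post comments", wrapped))
```

With this approach it would not matter where 'show --text' chose to break the line, at the cost of no longer being able to assert on exact spacing.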

cheers,
--titus

_______________________________________________
twill mailing list
twill at lists.idyll.org
http://lists.idyll.org/listinfo/twill