[twill] tidy_ok is nice, but I need something more strict
Matthew Wilson
matt at tplus1.com
Thu Dec 27 10:45:24 PST 2007
On Wed 26 Dec 2007 05:07:13 PM EST, William K. Volkman wrote:
> Hello
> On Mon, 2007-12-24 at 14:13, Matthew Wilson wrote:
>> tidy_ok seems to let a lot of unholy html pass through. For example, I
>> would love to have my twill tests catch instances of img tags lacking
>> alt attributes.
>>
>> Is it possible to run some other, much more strict XHTML parser over the
>> pages?
>
> IIRC img tags with alt attributes is not strictly an XHTML
> requirement. It is part of the accessibility initiative.
> It has been awhile; however, I recall using tidy to clean
> up just such constructs. Perhaps there is a way to add
> the "-access" switch to the tidy call from twill.
I went in a slightly different direction. What I was really after was a
way to enforce some in-house standards on code. I wrote this plugin in
about an hour. I'm curious what people think about it.
# file tweed.py
"""
Twill extension module that checks for HTML idioms that I don't like.
"""
from twill.errors import TwillAssertionError
from twill.commands import get_browser
from BeautifulSoup import BeautifulSoup

def img_has_alt():
    "Verify every img tag has an alt attribute."
    page_guts = get_browser().get_html()
    x = BeautifulSoup(page_guts)
    for img_tag in x.findAll('img'):
        # "'alt' in img_tag" tests the tag's children, not its
        # attributes, so look the attribute up with get() instead.
        if img_tag.get('alt') is None:
            raise TwillAssertionError("No alt attribute in %s."
                                      % img_tag)

def no_nested_tables():
    "Verify no table has a table inside."
    page_guts = get_browser().get_html()
    x = BeautifulSoup(page_guts)
    for t in x.findAll('table'):
        if t.find('table'):
            raise TwillAssertionError("1998 called and it wants "
                                      "its web design back")

Here it is in the wild:
current page: http://localhost/scratch/html/untidy.html
>> show
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>simple document</title>
</head>
<body>
<p>a simple paragraph
<ul>
<li><img src="a.png"></li>
<li><img src="b.png"></li>
</ul>
<table>
<tr>
<!-- column one of the page -->
<td>
<table>
<tr>
<td>hi there</td>
</tr>
<tr>
<td>nice to see you</td>
</tr>
</table>
</td>
<td>
<table>
<tr>
<td>hi there</td>
<td>nice to see you</td>
</tr>
</table>
</td>
</tr>
</table>
</body>
</html>
current page: http://localhost/scratch/html/untidy.html
>> extend_with tweed
Imported extension module 'tweed'.
(at /home/matt/scratch/python/tweed.py)
Description:
Twill extension module that checks for lots of stuff.
current page: http://localhost/scratch/html/untidy.html
>> img_has_alt
ERROR: No alt attribute in <img src="a.png" />.
current page: http://localhost/scratch/html/untidy.html
>> no
no_nested_tables notfind
current page: http://localhost/scratch/html/untidy.html
>> no_nested_tables
ERROR: 1998 called and it wants its web design back.
current page: http://localhost/scratch/html/untidy.html
>>
I tried to print the line number of the nested table, but I couldn't
figure it out.
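
One possible way to get at a line number, as a rough sketch only: skip BeautifulSoup for this check and walk the page with the standard library's HTMLParser, which can report its position through getpos(). This assumes Python 3's html.parser module (in Python 2 the module is named HTMLParser). The names NestedTableFinder and find_nested_tables are made up for illustration and are not part of twill or tweed.

```python
from html.parser import HTMLParser


class NestedTableFinder(HTMLParser):
    """Record the position of every <table> opened inside another <table>."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.depth = 0              # number of tables currently open
        self.nested_positions = []  # (line, column) of each nested table

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth > 0:
                # getpos() returns (line, column); lines count from 1,
                # columns from 0.
                self.nested_positions.append(self.getpos())
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def find_nested_tables(html):
    """Return a list of (line, column) pairs, one per nested table."""
    finder = NestedTableFinder()
    finder.feed(html)
    finder.close()
    return finder.nested_positions
```

Inside a twill extension, the string passed to find_nested_tables would presumably be get_browser().get_html(), and the first position could go into the TwillAssertionError message. The line numbers refer to the HTML as twill received it, which may not match the template that produced the page.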
I have made no effort to make this nice. Maybe if I end up using this,
I will figure out how to avoid the redundant calls to get_html.
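
A minimal sketch of one way to cut the duplication, under some assumptions: both checks share a small helper that remembers the last HTML string it saw and reparses only when the page changes. ParsedPageCache is a hypothetical name, and the fetch and parse functions are passed in so the sketch does not depend on twill or BeautifulSoup being installed.

```python
class ParsedPageCache(object):
    """Parse the current page at most once per distinct HTML string."""

    def __init__(self, fetch_html, parse):
        # fetch_html: callable returning the current page's HTML,
        #   for example: lambda: get_browser().get_html()
        # parse: callable turning HTML into a tree, for example BeautifulSoup
        self._fetch_html = fetch_html
        self._parse = parse
        self._last_html = None
        self._last_tree = None

    def current(self):
        """Return the parsed tree, reparsing only if the HTML changed."""
        html = self._fetch_html()
        if html != self._last_html:
            self._last_html = html
            self._last_tree = self._parse(html)
        return self._last_tree
```

With something like this at module level in tweed.py, each check would start with a single call to current() in place of the two repeated lines. get_html is still called every time, but the parsing step, which is the more expensive one, runs once per page.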
I was thrilled with how easy it was to write this twill plugin. I'd
like to make this play nice with require.
Matt
--
Programming, economics, gardening, life in Cleveland.
http://blog.tplus1.com