[twill] tidy_ok is nice, but I need something more strict

Matthew Wilson matt at tplus1.com
Thu Dec 27 10:45:24 PST 2007


On Wed 26 Dec 2007 05:07:13 PM EST, William K. Volkman wrote:
> Hello
> On Mon, 2007-12-24 at 14:13, Matthew Wilson wrote:
>> tidy_ok seems to let a lot of unholy html pass through.  For example, I
>> would love to have my twill tests catch instances of img tags lacking
>> alt attributes.
>> 
>> Is it possible to run some other, much more strict XHTML parser over the
>> pages?
>
> IIRC img tags with alt attributes is not strictly an XHTML
> requirement.  It is part of the accessibility initiative.
> It has been awhile; however, I recall using tidy to clean
> up just such constructs.  Perhaps there is a way to add
> the "-access" switch to the tidy call from twill.

I went in a slightly different direction.  What I was really after was a
way to enforce some in-house standards on code.  I wrote this plugin in
about an hour.  I'm curious what people think about it.

    # file tweed.py

    """
    Twill extension module that checks for HTML idioms that I don't like.
    """

    from twill.errors import TwillAssertionError
    from twill.commands import get_browser
    from BeautifulSoup import BeautifulSoup

    def img_has_alt():
        "Verify every img tag has an alt attribute."

        page_guts = get_browser().get_html()
        x = BeautifulSoup(page_guts)

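        for img_tag in x.findAll('img'):
            # "in" on a BeautifulSoup Tag tests the tag's children, not
            # its attributes, so look the attribute up with get().
            if img_tag.get('alt') is None:
                raise TwillAssertionError("No alt attribute in %s."
                                          % img_tag)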

    def no_nested_tables():
        "Verify no table has a table inside."

        page_guts = get_browser().get_html()
        x = BeautifulSoup(page_guts)

        for t in x.findAll('table'):
            if t.find('table'):
                raise TwillAssertionError("1998 called and it wants "
                                          "its web design back")

Here it is in the wild:

    current page: http://localhost/scratch/html/untidy.html
    >> show
    <!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html>
    <head>
    <title>simple document</title>
    </head>
    <body>
    <p>a simple paragraph

    <ul>
        <li><img src="a.png"></li>
        <li><img src="b.png"></li>
    </ul>

    <table>
        <tr>
            <!-- column one of the page -->
            <td>

                <table>
                    <tr>
                        <td>hi there</td>
                    </tr>
                    <tr>
                        <td>nice to see you</td>
                    </tr>
                </table>

            </td>

            <td>
                <table>
                    <tr>
                        <td>hi there</td>
                        <td>nice to see you</td>
                    </tr>
                </table>
            </td>
        </tr>
    </table>

    </body>
    </html>

    current page: http://localhost/scratch/html/untidy.html
    >> extend_with tweed
    Imported extension module 'tweed'.
    (at /home/matt/scratch/python/tweed.py)

    Description:

    Twill extension module that checks for lots of stuff.

    current page: http://localhost/scratch/html/untidy.html
    >> img_has_alt

    ERROR: No alt attribute in <img src="a.png" />.

    current page: http://localhost/scratch/html/untidy.html
    >> no
    no_nested_tables  notfind           
    current page: http://localhost/scratch/html/untidy.html
    >> no_nested_tables

    ERROR: 1998 called and it wants its web design back.

    current page: http://localhost/scratch/html/untidy.html
    >> 

I tried to print the line number of the nested table, but I couldn't
figure it out.
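
One possible way to get at line numbers, sketched here with the standard library's html.parser from current Python 3 instead of BeautifulSoup (so this is a swapped-in technique, not what tweed.py does). HTMLParser.getpos() reports the line and column of the tag being handled. The names NestedTableFinder and find_nested_tables are made up for illustration, and the depth counter assumes the table tags are reasonably balanced.

```python
from html.parser import HTMLParser


class NestedTableFinder(HTMLParser):
    """Record where every <table> that opens inside another <table> starts."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # number of <table> elements currently open
        self.nested_at = []   # (line, column) of each nested <table>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth > 0:
                # getpos() returns (lineno, offset); lineno starts at 1.
                self.nested_at.append(self.getpos())
            self.depth += 1

    def handle_endtag(self, tag):
        # Ignore stray </table> tags so the depth never goes negative.
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def find_nested_tables(html):
    """Return a list of (line, column) positions of nested tables."""
    finder = NestedTableFinder()
    finder.feed(html)
    finder.close()
    return finder.nested_at
```

Inside a twill command, the HTML string would presumably come from get_browser().get_html(), and the first position in the list could go into the TwillAssertionError message.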

I have made no effort to make this nice.  Maybe if I end up using this,
I will figure out how to avoid the redundant calls to get_html.
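
A minimal sketch of one way around the repeated fetch-and-parse, assuming each check is rewritten to take an already-parsed page and return an error message (or None) instead of fetching the page itself. The name run_all and that check signature are hypothetical, not part of twill or of tweed.py above.

```python
def run_all(html, parse, checks):
    """Parse html once and run every check against the same parsed page.

    Each check takes the parsed page and returns an error message,
    or None if the page passes.  Returns the list of error messages.
    """
    page = parse(html)  # the only parse, shared by all checks
    failures = []
    for check in checks:
        message = check(page)
        if message is not None:
            failures.append(message)
    return failures
```

In tweed.py this could back a single command that calls run_all(get_browser().get_html(), BeautifulSoup, [...]) and raises TwillAssertionError if the returned list is not empty. The trade-off is that the checks stop being separate twill commands unless thin wrappers are kept around them.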

I was thrilled with how easy it was to write this twill plugin.  I'd
like to make this play nice with require.

Matt

-- 
Programming, economics, gardening, life in Cleveland.
http://blog.tplus1.com



