[twill] check_links doesn't follow

Lars Stavholm stava at telcotec.se
Mon Jan 29 10:46:42 PST 2007


Titus Brown wrote:
> On Mon, Dec 18, 2006 at 12:33:18PM +0100, Lars Stavholm wrote:
> -> Titus Brown wrote:
> -> > On Sat, Dec 16, 2006 at 05:31:43PM +0100, Lars Stavholm wrote:
> -> > -> Lars Stavholm wrote:
> -> > -> > I'm doing a simple all-links check on a site
> -> > -> > using the following simple twill script:
> -> > -> > 
> -> > -> > go http://www.jonitec.se
> -> > -> > extend_with check_links
> -> > -> > check_links www\.jonitec\.se
> -> > -> > 
> -> > -> > But for some reason, it doesn't follow the links.
> -> > -> > 
> -> > -> > Does anyone have any idea why?
> -> > -> 
> -> > -> Here's another one: <http://milleteknik.se>.
> -> > -> Some of the links are followed, some not.
> -> > -> 
> -> > -> Any ideas?
> -> > -> /L
> -> > 
> -> > Hi, Lars,
> -> > 
> -> > short answer: it's not finding them!  'showlinks' isn't seeing anything
> -> > on jonitec.  I have no idea why; I have to dig into the mechanize code
> -> > to find out.
> -> 
> -> OK, fine. The only thing I can add is that the site is a PHP
> -> CMS-based site (joomla.org), and that I'll leave it as is for now.
> -> The links in the generated HTML look fine to me.
> 
> Hi, Lars,
> 
> neither tidy nor BeautifulSoup likes the conditional comments in the
> HTML on this page; the culprit on jonitec.se appears to be this:
> 
> ===
> 
> <!-- CorrectPNG! Module : compliance patch for microsoft browsers -->
> <!--[if gte IE 5.5000]>
> <!--[if lte IE 7]><script language="JavaScript" src="http://www.jonitec.se/mambots/system/botcorrectpng/correctpng.js"></script><![endif]-->
> <![endif]-->
> 
> ===
> 
> That is, if I remove that from the page, showlinks works fine. The
> <![endif]--> is specifically what's causing the problem; if you replace
> it with <!--[endif]-->, link parsing works.
> 
> Do you have any thoughts on how to deal with this?  It's obviously
> incorrect HTML but it shouldn't be breaking things this badly ;).

Thanks, Titus! Problem solved, in the sense that I fixed the parsing
problem by making sure the site produces correct HTML, following your
advice and findings. I'll remember to check that the HTML is correct
before running twill again.
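
For pages where the HTML can't be corrected at the source, one possible
workaround is to pre-process the markup before it reaches the link
parser. What follows is only a rough Python sketch, not anything twill
or mechanize provides: the function name neutralize_stray_endif is made
up for illustration, and it assumes that the first "-->" ends a comment,
so any <![endif]--> left outside a comment is a stray one. It rewrites
those strays to <!--[endif]-->, the form Titus found to parse cleanly.

```python
import re

# A comment runs from "<!--" to the first "-->" that follows it.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
STRAY = "<![endif]-->"
HARMLESS = "<!--[endif]-->"


def neutralize_stray_endif(html):
    """Rewrite <![endif]--> closers that sit outside any comment.

    Hypothetical helper, not part of twill or mechanize.
    """
    pieces = []
    pos = 0
    for match in COMMENT_RE.finditer(html):
        # Text between comments: an endif closer here is stray.
        pieces.append(html[pos:match.start()].replace(STRAY, HARMLESS))
        # Comments themselves are copied through untouched.
        pieces.append(match.group(0))
        pos = match.end()
    pieces.append(html[pos:].replace(STRAY, HARMLESS))
    return "".join(pieces)


# The snippet from jonitec.se quoted above.
sample = (
    "<!-- CorrectPNG! Module : compliance patch for microsoft browsers -->\n"
    "<!--[if gte IE 5.5000]>\n"
    '<!--[if lte IE 7]><script language="JavaScript" '
    'src="http://www.jonitec.se/mambots/system/botcorrectpng/correctpng.js">'
    "</script><![endif]-->\n"
    "<![endif]-->\n"
)

fixed = neutralize_stray_endif(sample)
print(fixed)
```

In the sample, the nested conditional means the outer comment already
ends at the first <![endif]-->, so only the second one is rewritten.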


