[twill] check_links doesn't follow

Titus Brown titus at caltech.edu
Mon Jan 29 10:23:41 PST 2007


On Mon, Dec 18, 2006 at 12:33:18PM +0100, Lars Stavholm wrote:
-> Titus Brown wrote:
-> > On Sat, Dec 16, 2006 at 05:31:43PM +0100, Lars Stavholm wrote:
-> > -> Lars Stavholm wrote:
-> > -> > I'm doing a simple all-links check on a site
-> > -> > using the following simple twill script:
-> > -> > 
-> > -> > go http://www.jonitec.se
-> > -> > extend_with check_links
-> > -> > check_links www\.jonitec\.se
-> > -> > 
-> > -> > But for some reason, it doesn't follow the links.
-> > -> > 
-> > -> > Does anyone have any idea why?
-> > -> 
-> > -> Here's another one: <http://milleteknik.se>.
-> > -> Some of the links are followed, some not.
-> > -> 
-> > -> Any ideas?
-> > -> /L
-> > 
-> > Hi, Lars,
-> > 
-> > short answer: it's not finding them!  'showlinks' isn't seeing anything
-> > on jonitec.  I have no idea why; I have to dig into the mechanize code
-> > to find out.
-> 
-> OK, fine so, the only thing I can add is that the site is a php
-> CMS based site (joomla.org) and that I'll leave it as is for now.
-> The links in the generated HTML look fine to me.

Hi, Lars,

Neither tidy nor BeautifulSoup likes the IE conditional comments in the
HTML on this page; the culprit on jonitec.se appears to be this:

===

<!-- CorrectPNG! Module : compliance patch for microsoft browsers -->
<!--[if gte IE 5.5000]>
<!--[if lte IE 7]><script language="JavaScript" src="http://www.jonitec.se/mambots/system/botcorrectpng/correctpng.js"></script><![endif]-->
<![endif]-->

===

That is, if I remove that block from the page, showlinks works fine. The
closing <![endif]--> is specifically what's causing the problem; if you
change it to <!--[endif]-->, link parsing works.
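As a rough sketch of that workaround (not a fix inside twill or mechanize,
whose parsing code isn't shown here), one could rewrite the <![endif]-->
closers before handing the HTML to a parser. The names
patch_conditional_comments, LinkCollector and extract_links are
hypothetical, and the link extraction uses Python's standard-library
html.parser as a stand-in for showlinks, not tidy or BeautifulSoup:

```python
import re
from html.parser import HTMLParser

# Matches the "<![endif]-->" closer that broke link parsing in this thread.
# Assumption: rewriting it to the plain-comment form "<!--[endif]-->" is
# harmless for link extraction (the whole IE block then stays inside comments).
ENDIF_CLOSER = re.compile(r"<!\[endif\]-->", re.IGNORECASE)


def patch_conditional_comments(html: str) -> str:
    """Replace each <![endif]--> with <!--[endif]-->; already-patched text is unchanged."""
    return ENDIF_CLOSER.sub("<!--[endif]-->", html)


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (hypothetical stand-in for showlinks)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> list[str]:
    """Patch the conditional comments, then return every <a href> in document order."""
    parser = LinkCollector()
    parser.feed(patch_conditional_comments(html))
    parser.close()
    return parser.links


# Illustrative page: the CorrectPNG block from jonitec.se plus two made-up links.
SAMPLE = """<html><body>
<!--[if gte IE 5.5000]>
<!--[if lte IE 7]><script language="JavaScript" src="http://www.jonitec.se/mambots/system/botcorrectpng/correctpng.js"></script><![endif]-->
<![endif]-->
<a href="/index.php?option=com_content">Home</a>
<a href="http://www.jonitec.se/contact">Contact</a>
</body></html>"""

if __name__ == "__main__":
    print(extract_links(SAMPLE))
    # ['/index.php?option=com_content', 'http://www.jonitec.se/contact']
```

After the rewrite, the first comment runs up to the first "-->" (end of
the inner block) and the leftover closer is its own empty comment, so the
anchors that follow are parsed normally.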

Do you have any thoughts on how to deal with this?  It's obviously
incorrect HTML, but it shouldn't be breaking things this badly ;).

cheers,
--titus