[twill] Can't login to MySpace

twill.overbored at spamgourmet.com
Wed Aug 9 23:32:59 PDT 2006


Here's a simple function tracer along with the traces. The working
version executes far more code than the failing one, even before the
first call to open(). Why?
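The tracer itself was in the scrubbed attachment, so it is not reproduced here. As a rough stand-in, here is a minimal call tracer built on the standard sys.settrace hook. The helper names make_tracer, trace_calls and format_log are hypothetical and are not the attached code. It records every Python-level call (C functions are not reported by sys.settrace) in the current thread, indented by call depth. Because go() raises in the failing case, trace_calls keeps the log and hands the exception back instead of re-raising it. Diffing the two formatted logs should show where the working and failing paths diverge.

```python
import sys


def make_tracer(log):
    """Build a sys.settrace hook that appends (depth, filename, function) per call."""
    state = {'depth': 0}

    def local(frame, event, arg):
        # Per-frame hook: only used to notice when a traced frame returns
        # (a 'return' event also fires when the frame exits via an exception).
        if event == 'return':
            state['depth'] -= 1
        return local

    def tracer(frame, event, arg):
        # The global hook is invoked for 'call' events on new Python frames.
        if event == 'call':
            code = frame.f_code
            log.append((state['depth'], code.co_filename, code.co_name))
            state['depth'] += 1
            return local
        return None

    return tracer


def trace_calls(func, *args, **kwargs):
    """Run func under the tracer; return (result, error, log) without re-raising."""
    log = []
    result = error = None
    sys.settrace(make_tracer(log))
    try:
        result = func(*args, **kwargs)
    except Exception as e:  # keep the trace even when the call fails
        error = e
    finally:
        sys.settrace(None)
    return result, error, log


def format_log(log):
    """One indented line per call, suitable for diffing two runs."""
    return '\n'.join('%s%s (%s)' % ('  ' * depth, name, filename)
                     for depth, filename, name in log)

# Hypothetical usage, one run per script, then diff the two files:
#   from twill.commands import go
#   _, err, log = trace_calls(go, 'http://myspace.com')
#   open('twill.trace', 'w').write(format_log(log))
```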

On 8/10/06, I wrote:
> So sorry - that was totally wrong.
>
> In fact, both versions of mechanize work (I was wrong because of yet
> more Python import confusion). I.e., using the latest darcs checkout
> of twill, this works:
>
> from twill.other_packages.mechanize import *
> b=Browser()
> b.open('http://myspace.com')
> b.select_form('theForm')
> b['email']='myname at myhost'
> b['password']='mypass'
> print b.submit().read()
>
> Yet the following fails!
>
> from twill.commands import *
> go('http://myspace.com')
>
> This is most bizarre, given that twill simply invokes open(). I
> made sure that no other copies of these packages were involved: I
> cd'd to a directory other than the twill darcs directory and made sure
> that importing BeautifulSoup, twill, ClientForm, and mechanize all
> fail. The exception is:
>
> /export/home/bob/tmp/work/twill2/twill/commands.py in go(url)
>     102     Visit the URL given.
>     103     """
> --> 104     browser.go(url)
>     105     return browser.get_url()
>     106
>
> /export/home/bob/tmp/work/twill2/twill/browser.py in go(self, url)
>     112         for u in try_urls:
>     113             try:
> --> 114                 self._journey('open', u)
>     115                 success
>     116                 break
>
> /export/home/bob/tmp/work/twill2/twill/browser.py in _journey(self,
> func_name, *args, **kwargs)
>     499         func
>     500         try:
> --> 501             r
>     502         except urllib2.HTTPError, e:
>     503             r
>
> /export/home/bob/tmp/work/twill2/twill/other_packages/mechanize/_mechanize.py
> in open(self, url, data)
>     128         if self._response is not None:
>     129             self._response.close()
> --> 130         return self._mech_open(url, data)
>     131
>     132     def _mech_open(self, url, data=None, update_history=True):
>
> /export/home/bob/tmp/work/twill2/twill/other_packages/mechanize/_mechanize.py
> in _mech_open(self, url, data, update_history)
>     168 ##             # acceptable.
>     169 ##             raise
> --> 170         self.set_response(response)
>     171         if not success:
>     172             raise error
>
> /export/home/bob/tmp/work/twill2/twill/other_packages/mechanize/_mechanize.py
> in set_response(self, response)
>     211
>     212         self._response
> --> 213         self._factory.set_response(self._response)
>     214
>     215     def geturl(self):
>
>
> /export/home/bob/tmp/work/twill2/twill/utils.py in set_response(self, response)
>     390         else:
>     391             self.factory
> --> 392         self._cleanup_html(response)
>     393
>     394     def links(self):
>
> /export/home/bob/tmp/work/twill2/twill/utils.py in _cleanup_html(self, response)
>     426
>     427         self.factory.set_response(FakeResponse(self._html, self._url,
> --> 428                                                response.info()))
>     429
>     430     def use_BS(self):
>
> /export/home/bob/tmp/work/twill2/twill/other_packages/mechanize/_html.py
> in set_response(self, response)
>     576         if response is not None:
>     577             data
> --> 578             soup
>     579             self._forms_factory.set_response(response, self.encoding)
>     580             self._links_factory.set_soup(
>
> /export/home/bob/tmp/work/twill2/twill/other_packages/mechanize/_html.py
> in __init__(self, encoding, text)
>     319             self._encoding
>     320             ### @CTB
> --> 321             BeautifulSoup.BeautifulSoup.__init__(self, text)
>     322
>     323         def handle_charref(self, ref):
>
> /export/home/bob/tmp/work/twill2/twill/other_packages/BeautifulSoup.py
> in __init__(self, *args, **kwargs)
>    1324         if not kwargs.has_key('smartQuotesTo'):
>    1325             kwargs['smartQuotesTo']
> -> 1326         BeautifulStoneSoup.__init__(self, *args, **kwargs)
>    1327
>    1328     SELF_CLOSING_TAGS
>
> /export/home/bob/tmp/work/twill2/twill/other_packages/BeautifulSoup.py
> in __init__(self, markup, parseOnlyThese, fromEncoding, markupMassage,
> smartQuotesTo, convertEntities, selfClosingTags)
>     971         self.markupMassage
>     972         try:
> --> 973             self._feed()
>     974         except StopParsing:
>     975             pass
>
> /export/home/bob/tmp/work/twill2/twill/other_packages/BeautifulSoup.py
> in _feed(self, inDocumentEncoding)
>     996         self.reset()
>     997
> --> 998         SGMLParser.feed(self, markup or "")
>     999         SGMLParser.close(self)
>    1000         # Close out any unfinished strings and close all the open tags.
>
> /usr/lib/python2.4/sgmllib.py in feed(self, data)
>      93
>      94         self.rawdata
> ---> 95         self.goahead(0)
>      96
>      97     def close(self):
>
> /usr/lib/python2.4/sgmllib.py in goahead(self, end)
>     127                         i
>     128                         continue
> --> 129                     k
>     130                     if k < 0: break
>     131                     i
>
> /usr/lib/python2.4/sgmllib.py in parse_starttag(self, i)
>     278             j
>     279         self.__starttag_text
> --> 280         self.finish_starttag(tag, attrs)
>     281         return j
>     282
>
> /usr/lib/python2.4/sgmllib.py in finish_starttag(self, tag, attrs)
>     309                 method
>     310             except AttributeError:
> --> 311                 self.unknown_starttag(tag, attrs)
>     312                 return -1
>     313             else:
>
> /export/home/bob/tmp/work/twill2/twill/other_packages/BeautifulSoup.py
> in unknown_starttag(self, name, attrs, selfClosing)
>    1153             self.currentData.append('<%s%s>' % (name, attrs))
>    1154             return
> -> 1155         self.endData()
>    1156
>    1157         if not self.isSelfClosingTag(name) and not selfClosing:
>
> /export/home/bob/tmp/work/twill2/twill/other_packages/BeautifulSoup.py
> in endData(self, containerClass)
>    1055     def endData(self, containerClass=NavigableString):
>    1056         if self.currentData:
> -> 1057             currentData
>    1058             if currentData.endswith('<') and self.convertHTMLEntities:
>    1059                 currentData
>
> I spent a good couple of hours trying to figure out what's going on,
> but I give up. So close! Details: the first script never even
> executes endData(), while the second one does and finds 0xc2 in
> self.currentData[-1][0]. Does anybody know what's up?
>
> On 8/9/06, I wrote:
> > OK, the problem was an outdated version of mechanize packaged with
> > twill. I tested both the mechanize from its own subversion repository
> > and the one included in the latest (darcs) version of twill, and only
> > the latter worked.
> >
> > Titus, do you plan to update the components used by twill (in
> > particular, the mechanize library, so that this problem is fixed)?
> > Thanks.
> >
> > On 8/8/06, John J Lee - jjl at pobox.com wrote:
> > > On Tue, 8 Aug 2006, twill.overbored at spamgourmet.com wrote:
> > >
> > > > Would you mind sharing the code snippet to make this happen? What
> > > > version of mechanize did you use? (The one included with twill?)
> > > > Thanks.
> > > [...]
> > >
> > > I didn't do anything special, and used mechanize SVN with Python 2.5.
> > >
> > > I won't post/email the ten-line script: I don't want to encourage scraping
> > > against terms of use (it seems the no-scrape clause is rather a legal
> > > reflex action these days).  Simply logging in doesn't appear to be against
> > > their terms, but I assume in your case logging in would be a prelude to
> > > some automated action, which they do prohibit.
> > >
> > >
> > > John
> > >
> > >
> >
>
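A note on the 0xc2 byte found in self.currentData above: in UTF-8, 0xc2 is the lead byte of the two-byte sequences for U+0080 to U+00BF (a non-breaking space, U+00A0, is C2 A0), so it suggests the page text reached BeautifulSoup still in undecoded byte form. The traceback is cut off before the exception message, so this is only one plausible reading, not a confirmed diagnosis: under Python 2, combining a byte string containing 0xc2 with a unicode string triggers an implicit ASCII decode, which fails with UnicodeDecodeError. The sketch below (Python 2.6+/3 compatible) only shows what the byte is and why ASCII decoding rejects it; the sample bytes and the try_ascii helper are made up for illustration.

```python
# U+00A0 (NO-BREAK SPACE) encoded as UTF-8 is the two bytes C2 A0.
nbsp_utf8 = u'\u00a0'.encode('utf-8')
assert bytearray(nbsp_utf8) == bytearray([0xc2, 0xa0])

# Hypothetical chunk of page text containing that character, still as raw bytes.
chunk = b'Sign\xc2\xa0In'


def try_ascii(data):
    """Return the ASCII-decoded text, or a note on the first offending byte."""
    try:
        return data.decode('ascii')
    except UnicodeDecodeError as e:
        # e.start is the offset of the first byte the ASCII codec rejected.
        return 'byte 0x%02x at offset %d is not ASCII' % (bytearray(data)[e.start], e.start)


print(try_ascii(chunk))                          # byte 0xc2 at offset 4 is not ASCII
print(chunk.decode('utf-8') == u'Sign\u00a0In')  # True
```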
-------------- next part --------------
A non-text attachment was scrubbed...
Name: files.tar.bz2
Type: application/x-bzip2
Size: 17437 bytes
Desc: not available
Url : http://lists.idyll.org/pipermail/twill/attachments/20060810/a90d3fdf/attachment-0001.bin 

