[twill] Loading cookies (was Re: cookie client)

Dolf Andringa dolf.andringa at elcyion.nl
Fri Jan 26 01:55:30 PST 2007


Hi everybody,

This question has been quiet for a while, so I guess it has been solved.
I just found the solution myself for Scopus, so for the archives, here it
is:

Scopus protects itself against crawlers by checking the User-Agent
header. If it is Python-urllib (the urllib2 default), Scopus redirects
you to a crawlerprotection.url page. Just set the User-Agent header to
Mozilla/5.0 and the problem is solved:

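import cookielib
import urllib2

# url is the Scopus page you want to fetch
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Replace the default Python-urllib User-Agent with a browser-like one
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
f = opener.open(url)
print f.read()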
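
For anyone reading this on Python 3, where cookielib and urllib2 became
http.cookiejar and urllib.request, the same idea can be sketched roughly
as below. The helper name make_browser_like_opener is made up for this
example and is not part of twill or the standard library.

```python
import http.cookiejar
import urllib.request


def make_browser_like_opener(user_agent="Mozilla/5.0"):
    """Build an opener that keeps cookies and sends a custom User-Agent.

    Hypothetical helper for illustration; returns (opener, cookie_jar).
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    # Overwrite the default ("User-agent", "Python-urllib/x.y") header.
    opener.addheaders = [("User-agent", user_agent)]
    return opener, jar
```

With url as in the snippet above, opener.open(url).geturl() gives the
final URL after redirects, which should make a bounce to the crawler
protection page easy to spot.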


Dolf.

irina nudelman wrote:
> Hi John,
> 
> Thank you for your suggestions! I didn't have a chance to try them today
> at work. I will try them on Mon. I'm on Windows and I'm using the
> Mozilla Firefox browser, but I could use the explorer as well.
> 
> Thank you!
> Irina.
> 
> 
> John J Lee <jjl at pobox.com> wrote:
> 
>     On Thu, 3 Aug 2006, John J Lee wrote:
>     [...]
>     > 2. Tell twill where to find your "cookie jar" with the "load_cookies"
>     > command.
>     [...]
> 
>     If you don't have the original password, this is what you have to do,
>     of course.
> 
>     Unfortunately, it looks like twill doesn't support loading Firefox or
>     Internet Explorer cookies yet (it can load and save cookies, but only
>     in a special format not used by browsers). ClientCookie and mechanize
>     can load Firefox and IE cookies, however. I'll try and point you in
>     the right direction, but first: Are you on Windows? Which browser are
>     you using?
> 
> 
>     Titus -- why not have twill's load_cookies load the IE cookies if
>     you're running on Windows and there's no cookie jar argument (by
>     instantiating an MSIECookieJar and using .load_from_registry())?
>     That's the preferred way to load IE cookies (always assuming that
>     MSIECookieJar still works -- it parses an undocumented MS file
>     format...). If there's an argument, I think twill should attempt to
>     .load() using each class in turn -- LoadError tells you it's the
>     wrong format for that class (MSIECookieJar, MozillaCookieJar,
>     LWPCookieJar -- order shouldn't matter). If you add support for
>     Firefox cookie saving, you should also add the warnings about this
>     from the mechanize docs (IIRC, that a running Firefox may clobber
>     your changes, and you should back up any cookie files that contain
>     important cookies). I think save_cookies without an argument should
>     use the filename and file format that were used on the previous
>     load_cookies. Finally, note MSIECookieJar does not support saving,
>     and iteration won't cause loading of cookies unless .delayload is
>     true. The delayload thing can be significant for MSIECookieJar
>     because each cookie is in its own little file (or something similar,
>     I forget), so I suggest having delayload=False until somebody asks
>     for show_cookies, at which point you should call .load_all_cookies()
>     before iterating.
> 
> 
>     John
> 
> 
>     _______________________________________________
>     wwwsearch-general mailing list
>     wwwsearch-general at lists.sourceforge.net
>     https://lists.sourceforge.net/lists/listinfo/wwwsearch-general
> 
> 

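John's suggestion in the quoted message (try .load() with each cookie jar
class in turn and let LoadError signal a wrong format) can be sketched
with the standard library alone. This is only an illustration under a few
assumptions: it uses Python 3's http.cookiejar rather than mechanize or
ClientCookie, it leaves out MSIECookieJar because the standard library
does not ship one, and load_any_cookie_file is a made-up name, not a twill
command.

```python
from http.cookiejar import LoadError, LWPCookieJar, MozillaCookieJar

# Candidate formats from the standard library only; MSIECookieJar is not
# part of http.cookiejar, so it is left out of this sketch.
JAR_CLASSES = (MozillaCookieJar, LWPCookieJar)


def load_any_cookie_file(path):
    """Return a jar loaded from path, trying each known format in turn.

    Hypothetical helper for illustration only.
    """
    for jar_class in JAR_CLASSES:
        jar = jar_class(path)  # the jar remembers path for a later save()
        try:
            jar.load(ignore_discard=True, ignore_expires=True)
        except LoadError:
            continue  # wrong format for this class, try the next one
        return jar
    raise LoadError("%r is not in a recognised cookie file format" % path)
```

Because each jar is constructed with the path, calling jar.save() later
with no filename writes back to the same file in the same format, which
is close to what John proposes for save_cookies without an argument.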


