From liangw.wang at gmail.com Tue Feb 15 19:40:44 2011
From: liangw.wang at gmail.com (Liang Wang)
Date: Tue, 15 Feb 2011 22:40:44 -0500
Subject: [twill] A rookie question about twill?
Message-ID:

Hi all,

I am new to this mailing list. Hopefully my question is appropriate.

I have some trouble trying to log in to the following website:

> go https://www.traderydex.com/login/index.rails
> showforms

It didn't show all the fields -- especially "UserId" and "Password".

> fv 1 UserId guest
ERROR: no field matches "UserId"

Thanks for your help
Liang

From barmassada at wisc.edu Wed Feb 16 07:20:04 2011
From: barmassada at wisc.edu (Avi Bar Massada)
Date: Wed, 16 Feb 2011 09:20:04 -0600
Subject: [twill] Downloading a .zip file
Message-ID: <4D5BEB24.2060306@wisc.edu>

Hi,

I've been using twill with a python script to automate downloads from
web-based databases. Until now, I only needed to fetch text files, so it was
pretty simple. I've been using:

import twill
from twill.commands import *
go("web address")
b = twill.get_browser()
data = b.result.get_page()

Now, I'm trying to fetch data from a different website, which generates a
link to a .zip file. Given that I know the direct URL to the zip file, would
it be possible to download it directly using twill? Clicking on the link in
the actual page opens a download dialogue box. Is there any way to bypass it
and just get the file directly?

Thanks!
Avi

From brandizzi at gmail.com Wed Feb 16 07:45:41 2011
From: brandizzi at gmail.com (Adam Victor Nazareth Brandizzi)
Date: Wed, 16 Feb 2011 13:45:41 -0200
Subject: [twill] A rookie question about twill?
In-Reply-To:
References:
Message-ID:

On Wed, Feb 16, 2011 at 1:40 AM, Liang Wang wrote:
> Hi all,

Hi, Liang!

> I am new to this mailing list. Hopefully my question is appropriate.
> I have some trouble trying to log in to the following website:
>> go https://www.traderydex.com/login/index.rails
>> showforms
> It didn't show all the fields -- especially "UserId" and "Password".
>> fv 1 UserId guest
> ERROR: no field matches "UserId"
> Thanks for your help

I am not sure, but I would bet the problem is that this page is
declared as XHTML 1.0 Transitional but its content is in no way a
valid XHTML document, as you can see here:

http://validator.w3.org/check?uri=https%3A%2F%2Fwww.traderydex.com%2Flogin%2Findex.rails&charset=%28detect+automatically%29&doctype=Inline&group=0&user-agent=W3C_Validator%2F1.2

Maybe - just maybe - it is confusing the twill parser. Would it be
possible to correct the code of the page?

Good luck!

--
Adam Victor Nazareth Brandizzi
http://brandizzi.googlepages.com/

From brandizzi at gmail.com Wed Feb 16 07:50:08 2011
From: brandizzi at gmail.com (Adam Victor Nazareth Brandizzi)
Date: Wed, 16 Feb 2011 13:50:08 -0200
Subject: [twill] Downloading a .zip file
In-Reply-To: <4D5BEB24.2060306@wisc.edu>
References: <4D5BEB24.2060306@wisc.edu>
Message-ID:

On Wed, Feb 16, 2011 at 1:20 PM, Avi Bar Massada wrote:
> Hi,

Hi, Avi!

> I've been using twill with a python script to automate downloads from
> web-based databases. Until now, I only needed to fetch text files, so it was
> pretty simple. I've been using:
>
> go("web address")
> b = twill.get_browser()
> data = b.result.get_page()
>
> Now, I'm trying to fetch data from a different website, which generates a
> link to a .zip file. Given that I know the direct URL to the zip file, would
> it be possible to download it directly using twill? Clicking on the link in
> the actual page opens a download dialogue box. Is there any way to bypass it
> and just get the file directly?
Here I got a ZIP file with "go"

>> go http://jsfcompref.appspot.com/faces/chapter04.zip

and wrote it to a file using "save_html"

>> save_html chapter04.zip

It worked flawlessly:

Diderot:sandbox brandizzi$ unzip chapter04.zip
Archive: chapter04.zip
creating: chapter04/web/
[...]
inflating: build.properties.sample

Have you tried to do it?

> Thanks!
> Avi

Good luck!

--
Adam Victor Nazareth Brandizzi
http://brandizzi.googlepages.com/

From barmassada at wisc.edu Wed Feb 16 08:06:00 2011
From: barmassada at wisc.edu (Avi Bar Massada)
Date: Wed, 16 Feb 2011 10:06:00 -0600
Subject: [twill] Downloading a .zip file
In-Reply-To:
References: <4D5BEB24.2060306@wisc.edu>
Message-ID: <4D5BF5E8.3070504@wisc.edu>

Hi Adam,

I tried to do that, but the .zip file is corrupted. I think that this is
because the URL points to a web page with a link to the file, not to the
file itself. Thus, the save_html just saves the web page.
In case you want to have a look, the page is here:
http://data.gbif.org/download/downloadReady.htm?downloadFile=occurrence-search-12978055989365071032693658999911.zip
(notice that it will expire in about seven hours from now).

Thanks a lot,
Avi

From brandizzi at gmail.com Wed Feb 16 08:14:35 2011
From: brandizzi at gmail.com (Adam Victor Nazareth Brandizzi)
Date: Wed, 16 Feb 2011 14:14:35 -0200
Subject: [twill] Downloading a .zip file
In-Reply-To: <4D5BF5E8.3070504@wisc.edu>
References: <4D5BEB24.2060306@wisc.edu> <4D5BF5E8.3070504@wisc.edu>
Message-ID:

On Wed, Feb 16, 2011 at 2:06 PM, Avi Bar Massada wrote:
> Hi Adam,

Hi, Avi.

> I tried to do that, but the .zip file is corrupted. I think that this is
> because the URL points to a web page with a link to the file, not to the
> file itself. Thus, the save_html just saves the web page.
> In case you want to have a look, the page is here:
> http://data.gbif.org/download/downloadReady.htm?downloadFile=occurrence-search-12978055989365071032693658999911.zip
> (notice that it will expire in about seven hours from now).

Well, if you know the name of the file, you can follow the link
after going to the page:

>> go http://data.gbif.org/download/downloadReady.htm?downloadFile=occurrence-search-12978055989365071032693658999911.zip
>> showlinks
Links:

0. Skip to Content ==> #mainContent
[...]
11. Original search ==> /occurrences/search.htm?c[0].s=19&c[0].p=0&c[0].o=121.8W,44.9N,121.7W,45.0N&c[1].s=5&c[1].p=0&c[1].o=US
12. occurrence-search-12978055989365071032693658999911.zip (approx file size 1 KB) ==> /download/occurrence-search-12978055989365071032693658999911.zip
13. GBIF ==> http://www.gbif.org
14. Contact us ==> mailto:portal at gbif.org

>> follow occurrence-search-12978055989365071032693658999911.zip
>> save_html occur.zip

Is that possible?

I usually give names to all links in my apps just to access them
through twill and other automated tools. Maybe you can ease your work
doing it if you are the creator of this site :)

> Thanks a lot,

You're welcome!

--
Adam Victor Nazareth Brandizzi
http://brandizzi.googlepages.com/

From barmassada at wisc.edu Wed Feb 16 09:03:25 2011
From: barmassada at wisc.edu (Avi Bar Massada)
Date: Wed, 16 Feb 2011 11:03:25 -0600
Subject: [twill] Downloading a .zip file
In-Reply-To:
References: <4D5BEB24.2060306@wisc.edu> <4D5BF5E8.3070504@wisc.edu>
Message-ID: <4D5C035D.6000607@wisc.edu>

Thanks Adam,

I followed your suggestion, but when I try to open the downloaded zip file
it gives me an "unsupported compression method" error. BTW, if I save the
file without the .zip suffix, and open it from its new location, it prompts
me straight to the download dialogue box.

> Well, if you know the name of the file, you can follow the link
> after going to the page:
>
>>> go http://data.gbif.org/download/downloadReady.htm?downloadFile=occurrence-search-12978055989365071032693658999911.zip
>>> showlinks
> Links:
>
> 0. Skip to Content ==> #mainContent
> [...]
> 11. Original search ==> /occurrences/search.htm?c[0].s=19&c[0].p=0&c[0].o=121.8W,44.9N,121.7W,45.0N&c[1].s=5&c[1].p=0&c[1].o=US
> 12. occurrence-search-12978055989365071032693658999911.zip (approx file size 1 KB) ==> /download/occurrence-search-12978055989365071032693658999911.zip
> 13. GBIF ==> http://www.gbif.org
> 14. Contact us ==> mailto:portal at gbif.org
>>> follow occurrence-search-12978055989365071032693658999911.zip
>>> save_html occur.zip
> Is that possible?
>
> I usually give names to all links in my apps just to access them
> through twill and other automated tools.
> Maybe you can ease your work
> doing it if you are the creator of this site :)

I wish I was, but I'm not...

>> Thanks a lot,
> You're welcome!

From brandizzi at gmail.com Wed Feb 16 09:12:16 2011
From: brandizzi at gmail.com (Adam Victor Nazareth Brandizzi)
Date: Wed, 16 Feb 2011 15:12:16 -0200
Subject: [twill] Downloading a .zip file
In-Reply-To: <4D5C035D.6000607@wisc.edu>
References: <4D5BEB24.2060306@wisc.edu> <4D5BF5E8.3070504@wisc.edu> <4D5C035D.6000607@wisc.edu>
Message-ID:

On Wed, Feb 16, 2011 at 3:03 PM, Avi Bar Massada wrote:
> Thanks Adam,
>
> I followed your suggestion, but when I try to open the downloaded zip file
> it gives me an "unsupported compression method" error.

That is pretty strange... It can be a problem with the compression
software you are using. Could you send us the script you are using?

> BTW, if I save the
> file without the .zip suffix, and open it from its new location, it prompts
> me straight to the download dialogue box.

Do you mean the browser download dialog box? That is way stranger :)

--
Adam Victor Nazareth Brandizzi
http://brandizzi.googlepages.com/

From barmassada at wisc.edu Wed Feb 16 09:23:04 2011
From: barmassada at wisc.edu (Avi Bar Massada)
Date: Wed, 16 Feb 2011 11:23:04 -0600
Subject: [twill] Downloading a .zip file
In-Reply-To:
References: <4D5BEB24.2060306@wisc.edu> <4D5BF5E8.3070504@wisc.edu> <4D5C035D.6000607@wisc.edu>
Message-ID: <4D5C07F8.5020905@wisc.edu>

Thanks again Adam, see my comments below.

> That is pretty strange... It can be a problem with the compression
> software you are using. Could you send us the script you are using?

Right now I'm using the python command line in Windows XP.
I used the following commands:

import twill
from twill.commands import *
go("http://data.gbif.org/download/downloadReady.htm?downloadFile=occurrence-search-12978055989365071032693658999911.zip")
follow("occurrence-search-12978055989365071032693658999911.zip")
save_html("occurrence.zip")

By the way, I tried to follow your original code (the one that points to
http://jsfcompref.appspot.com/faces/chapter04.zip) and got exactly the same
errors...

>> BTW, if I save the
>> file without the .zip suffix, and open it from its new location, it prompts
>> me straight to the download dialogue box.
> Do you mean the browser download dialog box? That is way stranger :)

Yup, this is the browser download dialogue box...

Avi

From brandizzi at gmail.com Wed Feb 16 09:43:00 2011
From: brandizzi at gmail.com (Adam Victor Nazareth Brandizzi)
Date: Wed, 16 Feb 2011 15:43:00 -0200
Subject: [twill] Downloading a .zip file
In-Reply-To: <4D5C07F8.5020905@wisc.edu>
References: <4D5BEB24.2060306@wisc.edu> <4D5BF5E8.3070504@wisc.edu> <4D5C035D.6000607@wisc.edu> <4D5C07F8.5020905@wisc.edu>
Message-ID:

On Wed, Feb 16, 2011 at 3:23 PM, Avi Bar Massada wrote:
> Thanks again Adam, see my comments below.
>>
>> That is pretty strange... It can be a problem with the compression
>> software you are using. Could you send us the script you are using?
>>
> Right now I'm using the python command line in Windows XP. I used the
> following commands:
>
> import twill
> from twill.commands import *
> go("http://data.gbif.org/download/downloadReady.htm?downloadFile=occurrence-search-12978055989365071032693658999911.zip")
> follow("occurrence-search-12978055989365071032693658999911.zip")
> save_html("occurrence.zip")
>
> By the way, I tried to follow your original code (the one that points to
> http://jsfcompref.appspot.com/faces/chapter04.zip) and got exactly the same
> errors...

Well, I'm trying it on a Mac so I cannot emulate your environment... Sorry.
It seems to be either a problem with your environment or (more probably) a
bug in twill. A not-so-buggy bug, anyway, because the "save_html"
method is used mainly for, well, saving HTML. If I remember well,
Windows differentiates between binary (such as ZIP) and text files
(such as HTML) and saves them in different ways. Since you are using a
function made for saving HTML to save a ZIP file, that can be the
problem.

>>> BTW, if I save the
>>> file without the .zip suffix, and open it from its new location, it
>>> prompts
>>> me straight to the download dialogue box.
>>
>> Do you mean the browser download dialog box? That is way stranger :)
>>
> Yup, this is the browser download dialogue box...

Well, good luck! I hope you solve your problem :)

--
Adam Victor Nazareth Brandizzi
http://brandizzi.googlepages.com/

From barmassada at wisc.edu Wed Feb 16 09:46:42 2011
From: barmassada at wisc.edu (Avi Bar Massada)
Date: Wed, 16 Feb 2011 11:46:42 -0600
Subject: [twill] Downloading a .zip file
In-Reply-To:
References: <4D5BEB24.2060306@wisc.edu> <4D5BF5E8.3070504@wisc.edu> <4D5C035D.6000607@wisc.edu> <4D5C07F8.5020905@wisc.edu>
Message-ID: <4D5C0D82.3070208@wisc.edu>

Thanks a lot Adam! I will figure this out eventually :)

From d.rothe at semantics.de Wed Feb 16 13:13:12 2011
From: d.rothe at semantics.de (Dirk Rothe)
Date: Wed, 16 Feb 2011 22:13:12 +0100
Subject: [twill] Downloading a .zip file
In-Reply-To: <4D5C0D82.3070208@wisc.edu>
References: <4D5BEB24.2060306@wisc.edu> <4D5BF5E8.3070504@wisc.edu> <4D5C035D.6000607@wisc.edu> <4D5C07F8.5020905@wisc.edu> <4D5C0D82.3070208@wisc.edu>
Message-ID:

Try patching twill/commands.py, line 311, from

f = open(filename, 'w')

to

f = open(filename, 'wb')

Under Windows, a file opened in text mode has every newline \n expanded
to \r\n when written, which corrupts binary data such as a ZIP file.

--dirk

From liangw.wang at gmail.com Wed Feb 16 17:46:15 2011
From: liangw.wang at gmail.com (Liang Wang)
Date: Wed, 16 Feb 2011 20:46:15 -0500
Subject: [twill] A rookie question about twill?
In-Reply-To:
References:
Message-ID:

Hi Adam,

Thanks a lot for your reply. Unfortunately, I don't have any power to test
your proposed solution since I am just an end user of the
website: http://www.rydex-sgi.com/.

Just for fun, I used the website you mentioned, http://validator.w3.org, to
test www.ebay.com and www.google.com. Each one was flagged with more than
200 errors. :)

thanks
Liang

On Wed, Feb 16, 2011 at 10:45 AM, Adam Victor Nazareth Brandizzi <brandizzi at gmail.com> wrote:
> On Wed, Feb 16, 2011 at 1:40 AM, Liang Wang wrote:
> > Hi all,
>
> Hi, Liang!
>
> > I am new to this mailing list. Hopefully my question is appropriate.
> > I have some trouble trying to log in to the following website:
> >> go https://www.traderydex.com/login/index.rails
> >> showforms
> > It didn't show all the fields -- especially "UserId" and "Password".
> >> fv 1 UserId guest
> > ERROR: no field matches "UserId"
> > Thanks for your help
>
> I am not sure, but I would bet the problem is that this page is
> declared as XHTML 1.0 Transitional
>
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
>
> but its content is in no way a valid XHTML document, as you can see
> here:
> http://validator.w3.org/check?uri=https%3A%2F%2Fwww.traderydex.com%2Flogin%2Findex.rails&charset=%28detect+automatically%29&doctype=Inline&group=0&user-agent=W3C_Validator%2F1.2
>
> Maybe - just maybe - it is confusing the twill parser. Would it be
> possible to correct the code of the page?
>
> Good luck!
>
> --
> Adam Victor Nazareth Brandizzi
> http://brandizzi.googlepages.com/
>

From brandizzi at gmail.com Thu Feb 17 06:24:57 2011
From: brandizzi at gmail.com (Adam Victor Nazareth Brandizzi)
Date: Thu, 17 Feb 2011 12:24:57 -0200
Subject: [twill] A rookie question about twill?
In-Reply-To:
References:
Message-ID:

On Wed, Feb 16, 2011 at 11:46 PM, Liang Wang wrote:
> Hi Adam,

Hello!

> Thanks a lot for your reply. Unfortunately, I don't have any power to test
> your proposed solution since I am just an end user of the
> website: http://www.rydex-sgi.com/.

I understand...

> Just for fun, I used the website you mentioned, http://validator.w3.org, to
> test www.ebay.com and www.google.com. Each one was flagged with more than
> 200 errors. :)

In fact the validator is pretty strict. Also, it may not be a problem
with the site; I was just speculating... It can be a bug in twill as well.

Good luck!

--
Adam Victor Nazareth Brandizzi
http://brandizzi.googlepages.com/
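
A note on the "Downloading a .zip file" thread above: the 'w' versus 'wb'
difference matters because text mode on Windows rewrites every "\n" as
"\r\n" on write, which changes the bytes of a ZIP archive. The sketch
below is only an illustration in modern Python 3, not twill's actual
code. The names payload, save_text_mode and save_binary_mode are
hypothetical, the payload is made-up bytes rather than a real archive,
and newline="\r\n" is passed only so that the Windows translation can
be reproduced on any platform.

```python
import os
import tempfile

# Hypothetical stand-in for a downloaded ZIP body: arbitrary bytes that
# happen to contain the newline byte 0x0A twice.
payload = b"PK\x03\x04\x0a\x00\x00\x00data\nmore"


def save_text_mode(path, data):
    """Write through a text-mode file that expands "\n" to "\r\n".

    This imitates what a text-mode open(path, 'w') does on Windows;
    latin-1 maps every byte to one character, so only the newline
    translation alters the output.
    """
    with open(path, "w", encoding="latin-1", newline="\r\n") as f:
        f.write(data.decode("latin-1"))


def save_binary_mode(path, data):
    """Write the bytes unchanged, as open(path, 'wb') does everywhere."""
    with open(path, "wb") as f:
        f.write(data)


def read_back(path):
    """Return the exact bytes stored on disk."""
    with open(path, "rb") as f:
        return f.read()


tmpdir = tempfile.mkdtemp()
text_path = os.path.join(tmpdir, "text_mode.zip")
bin_path = os.path.join(tmpdir, "binary_mode.zip")

save_text_mode(text_path, payload)
save_binary_mode(bin_path, payload)

text_result = read_back(text_path)
binary_result = read_back(bin_path)

print("text mode preserved the bytes:  ", text_result == payload)
print("binary mode preserved the bytes:", binary_result == payload)
```

In the thread's own terms, the same idea would be to write the result of
b.result.get_page() to a file opened with 'wb', assuming that call
returns the raw response body as the earlier snippet suggests.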