[twill] Web log-in intelligence

Jamal Mazrui Jamal.Mazrui at fcc.gov
Thu Jul 15 08:19:50 PDT 2010


I am interested in developing a cross-platform, batch web download
application with a GUI, implementing recursion and filters similar to
the Wget utility.  An important feature (which Wget does not support in
an automated fashion) is the ability to log in to a web site via an HTML
form.

The GUI would include edit boxes for a user name and password.  The
program would then log in and begin downloading files, according to
criteria specified, using twill and other Python modules.

Since web forms are not standardized regarding the HTML code that
prompts for a user name and password, I would like the program to use
heuristics that work in most cases (knowing that perfection is elusive,
especially if the form uses JavaScript in a way meant to thwart bots).  I
am seeking suggestions on how to implement such heuristics.  For
example, the code might look for words like "Name" and "Password" in id
or name attributes of text input fields.  Naturally, if anyone has code
they can share, that would be appreciated.
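
To make the idea concrete, here is a minimal sketch of the kind of
heuristic I have in mind.  It uses only the standard library's
html.parser, and the hint words and the names InputCollector and
guess_login_fields are illustrative assumptions, not part of twill.
It treats the first named input of type "password" as the password
field, and the first named text-like input whose name or id contains
a hint word as the user name field.

```python
# Sketch only: guess the user-name and password fields on an HTML page.
# USER_HINTS and both helper names are hypothetical, chosen for illustration.
from html.parser import HTMLParser

USER_HINTS = ("user", "login", "email", "name")


class InputCollector(HTMLParser):
    """Collect the attributes of every <input> element on a page."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            # attrs is a list of (name, value) pairs; value may be None.
            self.inputs.append({k: (v or "") for k, v in attrs})


def guess_login_fields(html):
    """Return (user_field, password_field); either may be None."""
    parser = InputCollector()
    parser.feed(html)

    user_field = None
    password_field = None
    for attrs in parser.inputs:
        field_type = attrs.get("type", "text").lower()
        name = attrs.get("name", "")
        if not name:
            continue  # inputs without a name are not submitted
        label = (name + " " + attrs.get("id", "")).lower()
        if field_type == "password":
            if password_field is None:
                password_field = name
        elif field_type in ("text", "email") and user_field is None:
            if any(hint in label for hint in USER_HINTS):
                user_field = name
    return user_field, password_field
```

The two names returned could then be handed to whatever fills in and
submits the form, for example twill's formvalue command.  The sketch
ignores which form each input belongs to, and a hint such as "name"
will also match fields like "firstname", so a real implementation
would likely need to restrict the search to the form that contains
the password field.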

Jamal



