Solving the problem of logging in to websites from Python

-- xyb [2004-08-29 17:49:27]


Author: Xie Yanbo
License: Creative Commons (cc 1.0)
Original article: http://xie.freezope.org/blog/2004/08/python.html

ClientCookie

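Python 2.3 does ship a Cookie module, but it is of little practical use for handling cookies on the client side: it can parse and format cookie headers, but it does not keep track of which cookies to send back to which site. We could maintain the HTTP headers ourselves, but that is very tedious. Fortunately, others have written good modules for this, and ClientCookie is a fine choice. Below is an example script I wrote with ClientCookie to access linuxforum: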

    #!/usr/bin/env python
    # -*- coding: GB2312 -*-
    # xyb at linuxforum.net

    import sys
    import ClientCookie
    from urllib import urlencode

    # build an opener that keeps cookies in a jar and also handles
    # Referer, HTTP-EQUIV and Refresh, then install it globally
    cookies = ClientCookie.LWPCookieJar()
    opener = ClientCookie.build_opener(
            ClientCookie.HTTPCookieProcessor(cookies),
            ClientCookie.HTTPRefererProcessor,
            ClientCookie.HTTPEquivProcessor,
            ClientCookie.HTTPRefreshProcessor,
            ClientCookie.SeekableProcessor)
    opener.addheaders = [
            ("User-agent", "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031107 Debian/1.5-3"),
            ("Accept", "text/html, image/jpeg, image/png, text/*, image/*, */*")]
    ClientCookie.install_opener(opener)

    # check args
    if len(sys.argv) <= 2:
        print "Usage: %s USERNAME PASSWORD" % (sys.argv[0])
        sys.exit(1)
    else:
        UserName = sys.argv[1]
        Password = sys.argv[2]

    # login; 'option' is the label of the forum's login button
    # ("Log in to forum") and is sent as GB2312 bytes
    data = {
            'Loginname': UserName,
            'Loginpass': Password,
            'firstlogin': 1,
            'option': '登入论坛'
            }
    urldata = urlencode(data)
    r = ClientCookie.urlopen("http://www.linuxforum.net/forum/start_page.php", urldata)

    # save the page returned after logging in
    results = r.read()
    open('start.html', 'w').write(results)

    # then, user page
    r = ClientCookie.urlopen("http://www.linuxforum.net/forum/login.php?Cat=")
    results = r.read()
    open('user.html', 'w').write(results)

    # read messages
    r = ClientCookie.urlopen("http://www.linuxforum.net/forum/viewmessages.php?Cat=&box=received")
    results = r.read()
    open('inbox.html', 'w').write(results)

    # save cookies to file; ignore_discard=True also writes session
    # cookies, which save() skips by default
    cookies.save("./cookies", ignore_discard=True)

Download the source code: lfcookie-1.py
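To see what "maintaining the HTTP headers ourselves" involves, here is a minimal sketch in Python 3, where the Cookie module is called http.cookies. It parses Set-Cookie values and then builds the request's Cookie header by hand. The cookie names, values and paths are made up, and the path check is deliberately simplified (a plain prefix test, with no domain, expiry or Secure handling); that missing bookkeeping is what ClientCookie does for you.

```python
from http.cookies import SimpleCookie

# Set-Cookie values a server might send after a login (made-up examples).
SET_COOKIE_HEADERS = [
    "sid=abc123; Path=/",
    "lang=zh; Path=/forum",
]

jar = SimpleCookie()
for header in SET_COOKIE_HEADERS:
    jar.load(header)  # parse one Set-Cookie value into a Morsel


def cookie_header_for(path):
    """Build the Cookie request header for a URL path by hand.

    Simplified on purpose: a plain prefix test stands in for real path
    matching, and domain, expiry and the Secure flag are ignored.
    """
    pairs = []
    for name, morsel in sorted(jar.items()):
        cookie_path = morsel["path"] or "/"
        if path.startswith(cookie_path):
            pairs.append("%s=%s" % (name, morsel.coded_value))
    return "; ".join(pairs)


print(cookie_header_for("/forum/start_page.php"))  # lang=zh; sid=abc123
print(cookie_header_for("/index.html"))            # sid=abc123
```

Even this toy version has to decide which cookies go to which URL on every request; a real client must also follow redirects and respect domains and expiry times.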

with urllib2

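Starting with Python 2.4, however, the standard library adds cookielib for client-side cookie support. Most of its code comes from ClientCookie; it was mainly reorganized and re-documented, and integrated with urllib2. Translating the script above gives the following: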

    #!/usr/bin/env python
    # -*- coding: GB2312 -*-
    # xyb at linuxforum.net

    import sys
    from urllib import urlencode
    import cookielib
    import urllib2

    # build an opener that keeps cookies in cj and install it globally,
    # so the urllib2.urlopen() calls below go through it as well
    cj = cookielib.LWPCookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    opener.addheaders = [
            ("User-agent", "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031107 Debian/1.5-3"),
            ("Accept", "text/html, image/jpeg, image/png, text/*, image/*, */*")]
    urllib2.install_opener(opener)

    # check args
    if len(sys.argv) <= 2:
        print "Usage: %s USERNAME PASSWORD" % (sys.argv[0])
        sys.exit(1)
    else:
        UserName = sys.argv[1]
        Password = sys.argv[2]

    # login; 'option' is the label of the forum's login button
    # ("Log in to forum") and is sent as GB2312 bytes
    data = {
            'Loginname': UserName,
            'Loginpass': Password,
            'firstlogin': 1,
            'option': '登入论坛'
            }
    urldata = urlencode(data)
    r = opener.open("http://www.linuxforum.net/forum/start_page.php", urldata)

    # save the page returned after logging in
    results = r.read()
    open('start.html', 'w').write(results)

    # then, user page
    r = urllib2.urlopen("http://www.linuxforum.net/forum/login.php?Cat=")
    results = r.read()
    open('user.html', 'w').write(results)

    # read messages
    r = urllib2.urlopen("http://www.linuxforum.net/forum/viewmessages.php?Cat=&box=received")
    results = r.read()
    open('inbox.html', 'w').write(results)

    # save cookies to file; ignore_discard=True also writes session
    # cookies, which save() skips by default
    cj.save("./cookies", ignore_discard=True)

Download the source code: lfcookie-2.py
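For Python 3, cookielib was renamed http.cookiejar and urllib2's opener machinery moved to urllib.request. The following is a minimal sketch of the same log-in-then-fetch flow with that API. Since linuxforum.net's forms cannot be assumed to still work, it runs against a hypothetical local test server: ForumHandler, the /login and /inbox paths and the sid cookie are all invented for the demo. The form is encoded with encoding="gb2312", mirroring how the Python 2 scripts above send '登入论坛' as GB2312 bytes.

```python
import threading
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer

SESSION = "sid=abc123"  # made-up session cookie for the demo


class ForumHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the forum, not linuxforum's real behaviour."""

    def _reply(self, body, cookie=None):
        self.send_response(200)
        if cookie:
            self.send_header("Set-Cookie", cookie + "; Path=/")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # POST /login: check the form and hand out a session cookie
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length).decode("ascii")  # percent-encoded body
        form = urllib.parse.parse_qs(raw, encoding="gb2312")
        ok = (form.get("Loginname") == ["xyb"]
              and form.get("Loginpass") == ["secret"]
              and form.get("option") == ["登入论坛"])
        self._reply(b"ok" if ok else b"denied", SESSION if ok else None)

    def do_GET(self):
        # any GET: answer "welcome" only when the session cookie comes back
        logged_in = SESSION in self.headers.get("Cookie", "")
        self._reply(b"welcome" if logged_in else b"please log in")

    def log_message(self, *args):
        pass  # silence per-request logging


def login_and_fetch(base, username, password):
    jar = CookieJar()
    # ProxyHandler({}) bypasses any proxy settings so we reach the local server
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),
        urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-agent", "Mozilla/5.0")]
    form = {"Loginname": username, "Loginpass": password,
            "firstlogin": 1, "option": "登入论坛"}
    data = urllib.parse.urlencode(form, encoding="gb2312").encode("ascii")
    with opener.open(base + "/login", data) as r:  # POST, since data is given
        r.read()
    with opener.open(base + "/inbox") as r:  # the cookie is sent automatically
        return r.read().decode("ascii"), jar


server = HTTPServer(("127.0.0.1", 0), ForumHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port
page, jar = login_and_fetch(base, "xyb", "secret")
server.shutdown()
server.server_close()
print(page)
```

As in the urllib2 version, the only cookie-specific step is passing HTTPCookieProcessor(jar) to build_opener; every request made through that opener then stores and returns cookies on its own.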

Follow-up

BSDDBCookieJar and similar options.
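Whatever storage is chosen, note how the file-based jars in http.cookiejar (Python 3's cookielib) treat session cookies: save() and load() take an ignore_discard argument that defaults to False, so cookies without an expiry time, which is often what a login sets, are silently left out. A minimal sketch with LWPCookieJar and MozillaCookieJar follows; make_cookie is a hypothetical helper and forum.example.com a made-up host.

```python
import os
import tempfile
import time
from http.cookiejar import Cookie, LWPCookieJar, MozillaCookieJar


def make_cookie(name, value, expires=None):
    """Hypothetical helper: a cookie for a made-up host, built by hand.

    A cookie without an expiry time is a session cookie (discard=True).
    """
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain="forum.example.com", domain_specified=False,
        domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=expires, discard=expires is None,
        comment=None, comment_url=None, rest={},
    )


def round_trip(jar_class, ignore_discard):
    """Save a session cookie and a persistent one, reload, list the names."""
    fd, filename = tempfile.mkstemp()
    os.close(fd)
    try:
        jar = jar_class(filename)
        jar.set_cookie(make_cookie("sid", "abc123"))  # session cookie
        jar.set_cookie(make_cookie("remember", "yes",
                                   expires=int(time.time()) + 3600))
        jar.save(ignore_discard=ignore_discard)
        loaded = jar_class()
        loaded.load(filename, ignore_discard=ignore_discard)
        return sorted(cookie.name for cookie in loaded)
    finally:
        os.remove(filename)


for cls in (LWPCookieJar, MozillaCookieJar):
    print(cls.__name__, round_trip(cls, False), round_trip(cls, True))
```

With the defaults only the "remember" cookie survives the round trip; passing ignore_discard=True to both save() and load() keeps "sid" as well.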

2006

PythonClientCookie (last edited 2009-12-25 07:09:38 by localhost)