Python Standard Library

翻译: Python 江湖群

2008-03-28 13:11:53

Contents

1. 网络协议

[index.html 返回首页]

1. 网络协议

"Increasingly, people seem to misinterpret complexity as sophistication, which is baffling - the incomprehensible should cause suspicion rather than admiration. Possibly this trend results from a mistaken belief that using a somewhat mysterious device confers an aura of power on the user." - Niklaus Wirth

1.1. 概览

本章描述了 Python 的 socket 协议支持以及其他建立在 socket 模块上的网络模块. 这些包含了对大多流行 Internet 协议客户端的支持, 以及一些可用来实现 Internet 服务器的框架.

对于那些本章中的底层的例子, 我将使用两个协议作为样例: Internet Time Protocol ( Internet 时间协议 ) 以及 Hypertext Transfer Protocol (超文本传输协议, HTTP 协议).

1.1.1. Internet 时间协议

Internet 时间协议 ( RFC 868, Postel 和 Harrenstien, 1983) 可以让一个网络客户端获得一个服务器的当前时间.

因为这个协议是轻量级的, 许多 Unix 系统(但不是所有)都提供了这个服务. 它可能是最简单的网络协议了. 服务器等待连接请求并在连接后返回当前时间 ( 4 字节整数, 自从 1900 年 1 月 1 日到当前的秒数).

协议很简单, 这里我们提供规格书给大家:

File: rfc868.txt

Network Working Group                                    J. Postel - ISI
Request for Comments: 868                           K. Harrenstien - SRI
                                                                May 1983

                             Time Protocol

This RFC specifies a standard for the ARPA Internet community.  Hosts on
the ARPA Internet that choose to implement a Time Protocol are expected
to adopt and implement this standard.

本 RFC 规范提供了一个 ARPA Internet community 上的标准. 
在 ARPA Internet 上的所有主机应当采用并实现这个标准.

This protocol provides a site-independent, machine readable date and
time.  The Time service sends back to the originating source the time in
seconds since midnight on January first 1900.

此协议提供了一个独立于站点的, 机器可读的日期和时间信息. 
时间服务返回的是从 1900 年 1 月 1 日午夜到现在的秒数.

One motivation arises from the fact that not all systems have a
date/time clock, and all are subject to occasional human or machine
error.  The use of time-servers makes it possible to quickly confirm or
correct a system's idea of the time, by making a brief poll of several
independent sites on the network.

设计这个协议的一个重要目的在于, 网络上的一些主机并没有时钟, 
这有可能导致人工或者机器错误. 我们可以依靠时间服务器快速确认或者修改
一个系统的时间.

This protocol may be used either above the Transmission Control Protocol
(TCP) or above the User Datagram Protocol (UDP).

该协议可以用在 TCP 协议或是 UDP 协议上.

When used via TCP the time service works as follows:

通过 TCP 访问时间服务器的步骤:

 *  S: Listen on port 37 (45 octal).
 *  U: Connect to port 37.
 *  S: Send the time as a 32 bit binary number.
 *  U: Receive the time.
 *  U: Close the connection.
 *  S: Close the connection.
 
 *  S: 监听 37 ( 45 的八进制) 端口.
 *  U: 连接 37 端口.
 *  S: 将时间作为 32 位二进制数字发送.
 *  U: 接收时间.
 *  U: 关闭连接.
 *  S: 关闭连接.  

   The server listens for a connection on port 37.  When the connection
   is established, the server returns a 32-bit time value and closes the
   connection.  If the server is unable to determine the time at its
   site, it should either refuse the connection or close it without
   sending anything.
   
   服务器在 37 端口监听. 当连接建立的时候, 服务器返回一个 32 位的数字值
   并关闭连接. 如果服务器自己无法决定当前时间, 那么它应该拒绝这个连接或者
   不发送任何数据立即关闭连接.

When used via UDP the time service works as follows:

通过 TCP 访问时间服务器的步骤:
   
   S: Listen on port 37 (45 octal).
   U: Send an empty datagram to port 37.
   S: Receive the empty datagram.
   S: Send a datagram containing the time as a 32 bit binary number.
   U: Receive the time datagram.

   S: 监听 37 ( 45 的八进制) 端口.
   U: 发送空数据报文到 37 端口.
   S: 接受空报文.
   S: 发送包含时间( 32 位二进制数字 )的报文.
   U: 接受时间报文.
   
   The server listens for a datagram on port 37.  When a datagram
   arrives, the server returns a datagram containing the 32-bit time
   value.  If the server is unable to determine the time at its site, it
   should discard the arriving datagram and make no reply.
   
   服务器在 37 端口监听报文. 当报文到达时, 服务器返回包含 32 位时间值
   的报文. 如果服务器无法决定当前时间, 那么它应该丢弃到达的报文, 
   不做任何回复.

The Time

时间

The time is the number of seconds since 00:00 (midnight) 1 January 1900
GMT, such that the time 1 is 12:00:01 am on 1 January 1900 GMT; this
base will serve until the year 2036.

时间是自 1900 年 1 月 1 日 0 时到当前的秒数, 
这个协议标准会一直服务到2036年. 到时候数字不够用再说.

For example:

   the time  2,208,988,800 corresponds to 00:00  1 Jan 1970 GMT,
             2,398,291,200 corresponds to 00:00  1 Jan 1976 GMT,
             2,524,521,600 corresponds to 00:00  1 Jan 1980 GMT,
             2,629,584,000 corresponds to 00:00  1 May 1983 GMT,
        and -1,297,728,000 corresponds to 00:00 17 Nov 1858 GMT.
                

例如:

     时间值  2,208,988,800 对应 to 00:00  1 Jan 1970 GMT,
             2,398,291,200 对应 to 00:00  1 Jan 1976 GMT,
             2,524,521,600 对应 to 00:00  1 Jan 1980 GMT,
             2,629,584,000 对应 to 00:00  1 May 1983 GMT,
       最后 -1,297,728,000 对应 to 00:00 17 Nov 1858 GMT.

           
RFC868.txt Translated By Andelf(gt: [email protected] )
非商业用途, 转载请保留作者信息. Thx.

1.1.2. HTTP 协议

超文本传输协议 ( HTTP, RFC 2616 ) 是另个完全不同的东西. 最近的格式说明书( Version 1.1 )超过了 100 页.

从它最简单的格式来看, 这个协议是很简单的. 客户端发送如下的请求到服务器, 请求一个文件:

GET /hello.txt HTTP/1.0
Host: hostname
User-Agent: name

[optional request body , 可选的请求正文]

服务器返回对应的响应:

HTTP/1.0 200 OK
Content-Type: text/plain
Content-Length: 7

Hello

请求和响应的 headers (报头)一般会包含更多的域, 但是请求 header 中的 Host 域/字段是必须提供的.

header 行使用 "\r\n" 分割, 而且 header 后必须有一个空行, 即使没有正文 (请求和响应都必须符合这条规则).

剩下的 HTTP 协议格式说明书细节, 例如内容协商, 缓存机制, 保持连接, 等等, 请参阅 Hypertext TransferProtocol - HTTP/1.1 ( http://www.w3.org/Protocols ).

1.2. socket 模块

socket 模块实现了到 socket 通讯层的接口. 你可以使用该模块创建客户端或是服务器的 socket .

我们首先以一个客户端为例, Example 7-1 中的客户端连接到一个时间协议服务器, 读取 4 字节的返回数据, 并把它转换为一个时间值.

1.2.0.1. Example 7-1. 使用 socket 模块实现一个时间客户端

File: socket-example-1.py

import socket
import struct, time

# server
HOST = "www.python.org"
PORT = 37

# reference time (in seconds since 1900-01-01 00:00:00)
TIME1970 = 2208988800L # 1970-01-01 00:00:00

# connect to server
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))

# read 4 bytes, and convert to time value
t = s.recv(4)
t = struct.unpack("!I", t)[0]
t = int(t - TIME1970)

s.close()

# print results
print "server time is", time.ctime(t)
print "local clock is", int(time.time()) - t, "seconds off"

*B*server time is Sat Oct 09 16:42:36 1999
local clock is 8 seconds off*b*

socket 工厂函数( factory function )根据给定类型(该例子中为 Internet stream socket , 即就是 TCP socket )创建一个新的 socket . connect 方法尝试将这个 socket 连接到指定服务器上. 成功后, 就可以使用 recv 方法读取数据.

创建一个服务器 socket 使用的是相同的方法, 不过这里不是连接到服务器, 而是将 socket bind (绑定)到本机的一个端口上, 告诉它去监听连接请求, 然后尽快处理每个到达的请求.

Example 7-2 创建了一个时间服务器, 绑定到本机的 8037 端口( 1024 前的所有端口是为系统服务保留的, Unix 系统下访问它们你必须要有 root 权限).

1.2.0.2. Example 7-2. 使用 socket 模块实现一个时间服务器

File: socket-example-2.py

import socket
import struct, time

# user-accessible port
PORT = 8037

# reference time
TIME1970 = 2208988800L

# establish server
service = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
service.bind(("", PORT))
service.listen(1)

print "listening on port", PORT

while 1:
    # serve forever
    channel, info = service.accept()
    print "connection from", info
    t = int(time.time()) + TIME1970
    t = struct.pack("!I", t)
    channel.send(t) # send timestamp
    channel.close() # disconnect

*B*listening on port 8037
connection from ('127.0.0.1', 1469)
connection from ('127.0.0.1', 1470)
...*b*

listen 函数的调用告诉 socket 我们期望接受连接. 参数代表连接的队列(用于在程序没有处理前保持连接)大小. 最后 accept 循环将当前时间返回给每个连接的客户端.

注意这里的 accept 函数返回一个新的 socket 对象, 这个对象是直接连接到客户端的. 而原 socket 只是用来保持连接; 所有后来的数据传输操作都使用新的 socket .

我们可以使用 Example 7-3 , ( Example 7-1 的通用化版本)来测试这个服务器, .

1.2.0.3. Example 7-3. 一个时间协议客户端

File: timeclient.py

import socket
import struct, sys, time

# default server
host = "localhost"
port = 8037

# reference time (in seconds since 1900-01-01 00:00:00)
TIME1970 = 2208988800L # 1970-01-01 00:00:00

def gettime(host, port):
    # fetch time buffer from stream server
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((host, port))
    t = s.recv(4)
    s.close()
    t = struct.unpack("!I", t)[0]
    return int(t - TIME1970)

if _ _name_ _ == "_ _main_ _":
    # command-line utility
    if sys.argv[1:]:
        host = sys.argv[1]
        if sys.argv[2:]:
            port = int(sys.argv[2])
        else:
            port = 37 # default for public servers

    t = gettime(host, port)
    print "server time is", time.ctime(t)
    print "local clock is", int(time.time()) - t, "seconds off"

*B*server time is Sat Oct 09 16:58:50 1999
local clock is 0 seconds off*b*

Example 7-3 所示的脚本也可以作为模块使用; 你只需要导入 timeclient 模块, 然后调用它的 gettime 函数.

目前为止, 我们已经使用了流( TCP ) socket . 时间协议还提到了

UDP sockets (报文). 流 socket 的工作模式和电话线类似; 你会知道在远端

是否有人拿起接听器, 在对方挂断的时候你也会注意到. 相比之下, 发送报文更像是在一间黑屋子里大声喊. 可能某人会在那里, 但你只有在他回复的时候才会知道.

如 Example 7-4 所示, 你不需要在通过报文 socket 发送数据时连接远程机器. 只需使用 sendto 方法, 它接受数据和接收者地址作为参数. 读取报文的时候使用 recvfrom 方法.

1.2.0.4. Example 7-4. 使用 socket 模块实现一个报文时间客户端

File: socket-example-4.py

import socket
import struct, time

# server
HOST = "localhost"
PORT = 8037

# reference time (in seconds since 1900-01-01 00:00:00)
TIME1970 = 2208988800L # 1970-01-01 00:00:00

# connect to server
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# send empty packet
s.sendto("", (HOST, PORT))

# read 4 bytes from server, and convert to time value
t, server = s.recvfrom(4)
t = struct.unpack("!I", t)[0]
t = int(t - TIME1970)

s.close()

print "server time is", time.ctime(t)
print "local clock is", int(time.time()) - t, "seconds off"

*B*server time is Sat Oct 09 16:42:36 1999
local clock is 8 seconds off*b*

这里的 recvfrom 返回两个值: 数据和发送者的地址. 后者用于发送回复数据.

Example 7-5 展示了对应的服务器代码.

Example 7-5. 使用 socket 模块实现一个报文时间服务器

File: socket-example-5.py

import socket
import struct, time

# user-accessible port
PORT = 8037

# reference time
TIME1970 = 2208988800L

# establish server
service = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
service.bind(("", PORT))

print "listening on port", PORT

while 1:
    # serve forever
    data, client = service.recvfrom(0)
    print "connection from", client
    t = int(time.time()) + TIME1970
    t = struct.pack("!I", t)
    service.sendto(t, client) # send timestamp

*B*listening on port 8037
connection from ('127.0.0.1', 1469)
connection from ('127.0.0.1', 1470)
...*b*

最主要的不同在于服务器使用 bind 来分配一个已知端口给 socket , 根据 recvfrom 函数返回的地址向客户端发送数据.

1.3. select 模块

select 模块允许你检查一个或多个 socket , 管道, 以及其他流兼容对象所接受的数据, 如 Example 7-6 所示.

你可以将一个或更多 socket 传递给 select 函数, 然后等待它们状态改变(可读, 可写, 或是发送错误信号):

如果某人在调用了 listen 函数后连接, 当远端数据到达时, socket 就成为可读的(这意味着 accept 不会阻塞). 或者是 socket 被关闭或重置时(在此情况下, recv 会返回一个空字符串).
当非阻塞调用 connect 方法后建立连接或是数据可以被写入到 socket 时, socket 就成为可写的.
当非阻塞调用 connect 方法后连接失败后, socket 会发出一个错误信号.

1.3.0.1. Example 7-6. 使用 select 模块等待经 socket 发送的数据

File: select-example-1.py

import select
import socket
import time

PORT = 8037

TIME1970 = 2208988800L

service = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
service.bind(("", PORT))
service.listen(1)

print "listening on port", PORT

while 1:
    is_readable = [service]
    is_writable = []
    is_error = []
    r, w, e = select.select(is_readable, is_writable, is_error, 1.0)
    if r:
        channel, info = service.accept()
        print "connection from", info
        t = int(time.time()) + TIME1970
        t = chr(t>>24&255) + chr(t>>16&255) + chr(t>>8&255) + chr(t&255)
        channel.send(t) # send timestamp
        channel.close() # disconnect
    else:
        print "still waiting"

*B*listening on port 8037
still waiting
still waiting
connection from ('127.0.0.1', 1469)
still waiting
connection from ('127.0.0.1', 1470)
...*b*

在 Example 7-6 中, 我们等待监听 socket 变成可读状态, 这代表有一个连接请求到达. 我们用和之前一样的方法处理 channel socket , 因为它不可能因为等待 4 字节而填充网络缓冲区. 如果你需要向客户端发送大量的数据, 那么你应该在循环的顶端把数据加入到 is_writable 列表中, 并且只在 select 允许的情况下写入.

如果你设置 socket 为非阻塞模式(通过调用 setblocking 方法), 那么你就可以使用

select 来等待 socket 连接. 不过 asyncore 模块(参见下一节)提供了一个强大的框架,

它自动为你处理好了这一切. 所以我不准备在这里多说什么, 看下一节吧.

1.4. asyncore 模块

asyncore 模块提供了一个 "反馈性的( reactive )" socket 实现. 该模块允许你定义特定过程完成后所执行的代码, 而不是创建 socket 对象, 调用它们的方法. 你只需要继承 dispatcher 类, 然后重载如下方法 (可以选择重载某一个或多个)就可以实现异步的 socket 处理器.

handle_connect : 一个连接成功建立后被调用.
handle_expt : 连接失败后被调用.
handle_accept : 连接请求建立到一个监听 socket 上时被调用. 回调时( callback )应该使用 accept 方法来获得客户端 socket .
handle_read : 有来自 socket 的数据等待读取时被调用. 回调时应该使用 recv 方法来获得数据.
handle_write : socket 可以写入数据的时候被调用. 使用 send 方法写入数据.
handle_close : 当 socket 被关闭或复位时被调用.
handle_error(type, value, traceback) 在任何一个回调函数发生 Python 错误时被调用. 默认的实现会打印跟踪返回消息到 sys.stdout .

Example 7-7 展示了一个时间客户端, 和 socket 模块中的那个类似.

1.4.0.1. Example 7-7. 使用 asyncore 模块从时间服务器获得时间

File: asyncore-example-1.py

import asyncore
import socket, time

# reference time (in seconds since 1900-01-01 00:00:00)
TIME1970 = 2208988800L # 1970-01-01 00:00:00

class TimeRequest(asyncore.dispatcher):
    # time requestor (as defined in RFC 868)

    def _ _init_ _(self, host, port=37):
        asyncore.dispatcher._ _init_ _(self)
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.connect((host, port))

    def writable(self):
        return 0 # don't have anything to write

    def handle_connect(self):
        pass # connection succeeded

    def handle_expt(self):
        self.close() # connection failed, shutdown

    def handle_read(self):
        # get local time
        here = int(time.time()) + TIME1970

        # get and unpack server time
        s = self.recv(4)
        there = ord(s[3]) + (ord(s[2])<<8) + (ord(s[1])<<16) + (ord(s[0])<<24L)

        self.adjust_time(int(here - there))

        self.handle_close() # we don't expect more data

    def handle_close(self):
        self.close()

    def adjust_time(self, delta):
        # override this method!
        print "time difference is", delta

#
# try it out

request = TimeRequest("www.python.org")

asyncore.loop()

*B*log: adding channel <TimeRequest  at 8cbe90>
time difference is 28
log: closing channel 192:<TimeRequest connected at 8cbe90>*b*

如果你不想记录任何信息, 那么你可以在你的 dispatcher 类里重载 log 方法.

Example 7-8 展示了对应的时间服务器. 注意这里它使用了两个 dispatcher 子类, 一个用于监听 socket , 另个用于与客户端通讯.

1.4.0.2. Example 7-8. 使用 asyncore 模块实现时间服务器

File: asyncore-example-2.py

import asyncore
import socket, time

# reference time
TIME1970 = 2208988800L

class TimeChannel(asyncore.dispatcher):

    def handle_write(self):
        t = int(time.time()) + TIME1970
        t = chr(t>>24&255) + chr(t>>16&255) + chr(t>>8&255) + chr(t&255)
        self.send(t)
        self.close()

class TimeServer(asyncore.dispatcher):

    def _ _init_ _(self, port=37):
        self.port = port
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.bind(("", port))
        self.listen(5)
        print "listening on port", self.port

    def handle_accept(self):
        channel, addr = self.accept()
        TimeChannel(channel)

server = TimeServer(8037)
asyncore.loop()

*B*log: adding channel <TimeServer  at 8cb940>
listening on port 8037
log: adding channel <TimeChannel  at 8b2fd0>
log: closing channel 52:<TimeChannel connected at 8b2fd0>*b*

除了 dispatcher 外, 这个模块还包含一个 dispatcher_with_send 类. 你可以使用这个类发送大量的数据而不会阻塞网络通讯缓冲区.

Example 7-9 中的模块通过继承 dispatcher_with_send 类定义了一个 AsyncHTTP 类. 当你创建一个它的实例后, 它会发出一个 HTTP GET 请求并把接受到的数据发送到一个 "consumer" 目标对象

1.4.0.3. Example 7-9. 使用 asyncore 模块发送 HTTP 请求

File: SimpleAsyncHTTP.py

import asyncore
import string, socket
import StringIO
import mimetools, urlparse

class AsyncHTTP(asyncore.dispatcher_with_send):
    # HTTP requester

    def _ _init_ _(self, uri, consumer):
        asyncore.dispatcher_with_send._ _init_ _(self)

        self.uri = uri
        self.consumer = consumer

        # turn the uri into a valid request
        scheme, host, path, params, query, fragment = urlparse.urlparse(uri)
        assert scheme == "http", "only supports HTTP requests"
        try:
            host, port = string.split(host, ":", 1)
            port = int(port)
        except (TypeError, ValueError):
            port = 80 # default port
        if not path:
            path = "/"
        if params:
            path = path + ";" + params
        if query:
            path = path + "?" + query

        self.request = "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)

        self.host = host
        self.port = port

        self.status = None
        self.header = None

        self.data = ""

        # get things going!
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.connect((host, port))

    def handle_connect(self):
        # connection succeeded
        self.send(self.request)

    def handle_expt(self):
        # connection failed; notify consumer (status is None)
        self.close()
        try:
            http_header = self.consumer.http_header
        except AttributeError:
            pass
        else:
            http_header(self)

    def handle_read(self):
        data = self.recv(2048)
        if not self.header:
            self.data = self.data + data
            try:
                i = string.index(self.data, "\r\n\r\n")
            except ValueError:
                return # continue
            else:
                # parse header
                fp = StringIO.StringIO(self.data[:i+4])
                # status line is "HTTP/version status message"
                status = fp.readline()
                self.status = string.split(status, " ", 2)
                # followed by a rfc822-style message header
                self.header = mimetools.Message(fp)
                # followed by a newline, and the payload (if any)
                data = self.data[i+4:]
                self.data = ""
                # notify consumer (status is non-zero)
                try:
                    http_header = self.consumer.http_header
                except AttributeError:
                    pass
                else:
                    http_header(self)
                if not self.connected:
                    return # channel was closed by consumer

        self.consumer.feed(data)

    def handle_close(self):
        self.consumer.close()
        self.close()

Example 7-10 中的小脚本展示了如何使用这个类.

1.4.0.4. Example 7-10. 使用 SimpleAsyncHTTP 类

File: asyncore-example-3.py

import SimpleAsyncHTTP
import asyncore

class DummyConsumer:
    size = 0

    def http_header(self, request):
        # handle header
        if request.status is None:
            print "connection failed"
        else:
            print "status", "=>", request.status
            for key, value in request.header.items():
                print key, "=", value

    def feed(self, data):
        # handle incoming data
        self.size = self.size + len(data)

    def close(self):
        # end of data
        print self.size, "bytes in body"

#
# try it out

consumer = DummyConsumer()

request = SimpleAsyncHTTP.AsyncHTTP(
    "http://www.pythonware.com",
    consumer
    )

asyncore.loop()

*B*log: adding channel <AsyncHTTP  at 8e2850>
status => ['HTTP/1.1', '200', 'OK\015\012']
server = Apache/Unix (Unix)
content-type = text/html
content-length = 3730
...
3730 bytes in body
log: closing channel 156:<AsyncHTTP connected at 8e2850>*b*

这里的 consumer 接口设计时是为了与 htmllib 和 xmllib 分析器兼容的, 这样你就可以直接方便地解析 HTML 或是 XML 数据. http_header 方法是可选的; 如果没有定义它, 那么它将被忽略.

Example 7-10 的一个问题是它不能很好地处理重定向资源. Example 7-11 加入了一个额外的 consumer 层, 它可以很好地处理重定向.

1.4.0.5. Example 7-11. 使用 SimpleAsyncHTTP 类处理重定向

File: asyncore-example-4.py

import SimpleAsyncHTTP
import asyncore

class DummyConsumer:
    size = 0

    def http_header(self, request):
        # handle header
        if request.status is None:
            print "connection failed"
        else:
            print "status", "=>", request.status
            for key, value in request.header.items():
                print key, "=", value

    def feed(self, data):
        # handle incoming data
        self.size = self.size + len(data)

    def close(self):
        # end of data
        print self.size, "bytes in body"

class RedirectingConsumer:

    def _ _init_ _(self, consumer):
        self.consumer = consumer

    def http_header(self, request):
        # handle header
        if request.status is None or\
           request.status[1] not in ("301", "302"):
            try:
                http_header = self.consumer.http_header
            except AttributeError:
                pass
            else:
                return http_header(request)
        else:
            # redirect!
            uri = request.header["location"]
            print "redirecting to", uri, "..."
            request.close()
            SimpleAsyncHTTP.AsyncHTTP(uri, self)

    def feed(self, data):
        self.consumer.feed(data)

    def close(self):
        self.consumer.close()

#
# try it out

consumer = RedirectingConsumer(DummyConsumer())

request = SimpleAsyncHTTP.AsyncHTTP(
    "http://www.pythonware.com/library",
    consumer
    )

asyncore.loop()

*B*log: adding channel <AsyncHTTP  at 8e64b0>
redirecting to http://www.pythonware.com/library/ ...
log: closing channel 48:<AsyncHTTP connected at 8e64b0>
log: adding channel <AsyncHTTP  at 8ea790>
status => ['HTTP/1.1', '200', 'OK\015\012']
server = Apache/Unix (Unix)
content-type = text/html
content-length = 387
...
387 bytes in body
log: closing channel 236:<AsyncHTTP connected at 8ea790>*b*

如果服务器返回状态 301 (永久重定向) 或者是 302 (临时重定向), 重定向的 consumer 会关闭当前请求并向新地址发出新请求. 所有对 consumer 的其他调用传递给原来的 consumer .

1.5. asynchat 模块

asynchat 模块是对 asyncore 的一个扩展. 它提供对面向行( line-oriented )的协议的额外支持. 它还提供了增强的缓冲区支持(通过 push 方法和 "producer" 机制.

Example 7-12 实现了一个很小的 HTTP 服务器. 它只是简单地返回包含 HTTP 请求信息的 HTML 文档(浏览器窗口出现的输出).

1.5.0.1. Example 7-12. 使用 asynchat 模块实现一个迷你 HTTP 服务器

File: asynchat-example-1.py

import asyncore, asynchat
import os, socket, string

PORT = 8000

class HTTPChannel(asynchat.async_chat):

    def _ _init_ _(self, server, sock, addr):
        asynchat.async_chat._ _init_ _(self, sock)
        self.set_terminator("\r\n")
        self.request = None
        self.data = ""
        self.shutdown = 0

    def collect_incoming_data(self, data):
        self.data = self.data + data

    def found_terminator(self):
        if not self.request:
            # got the request line
            self.request = string.split(self.data, None, 2)
            if len(self.request) != 3:
                self.shutdown = 1
            else:
                self.push("HTTP/1.0 200 OK\r\n")
                self.push("Content-type: text/html\r\n")
                self.push("\r\n")
            self.data = self.data + "\r\n"
            self.set_terminator("\r\n\r\n") # look for end of headers
        else:
            # return payload.
            self.push("<html><body><pre>\r\n")
            self.push(self.data)
            self.push("</pre></body></html>\r\n")
            self.close_when_done()

class HTTPServer(asyncore.dispatcher):

    def _ _init_ _(self, port):
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.bind(("", port))
        self.listen(5)

    def handle_accept(self):
        conn, addr = self.accept()
        HTTPChannel(self, conn, addr)

#
# try it out

s = HTTPServer(PORT)
print "serving at port", PORT, "..."
asyncore.loop()

*B*GET / HTTP/1.1
Accept: */*
Accept-Language: en, sv
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; Bruce/1.0)
Host: localhost:8000
Connection: Keep-Alive*b*

producer 接口允许你传入( "push" )太大以至于无法在内存中储存的对象. asyncore 在需要更多数据的时候自动调用 producer 的 more 方法. 另外, 它使用一个空字符串标记文件的末尾.

Example 7-13 实现了一个很简单的基于文件的 HTTP 服务器, 它使用了一个简单的 FileProducer 类来从文件中读取数据, 每次只读取几 kb .

1.5.0.2. Example 7-13. 使用 asynchat 模块实现一个简单的 HTTP 服务器

File: asynchat-example-2.py

import asyncore, asynchat
import os, socket, string, sys
import StringIO, mimetools

ROOT = "."

PORT = 8000

class HTTPChannel(asynchat.async_chat):

    def _ _init_ _(self, server, sock, addr):
        asynchat.async_chat._ _init_ _(self, sock)
        self.server = server
        self.set_terminator("\r\n\r\n")
        self.header = None
        self.data = ""
        self.shutdown = 0

    def collect_incoming_data(self, data):
        self.data = self.data + data
        if len(self.data) > 16384:
            # limit the header size to prevent attacks
            self.shutdown = 1

    def found_terminator(self):
        if not self.header:
            # parse http header
            fp = StringIO.StringIO(self.data)
            request = string.split(fp.readline(), None, 2)
            if len(request) != 3:
                # badly formed request; just shut down
                self.shutdown = 1
            else:
                # parse message header
                self.header = mimetools.Message(fp)
                self.set_terminator("\r\n")
                self.server.handle_request(
                    self, request[0], request[1], self.header
                    )
                self.close_when_done()
            self.data = ""
        else:
            pass # ignore body data, for now

    def pushstatus(self, status, explanation="OK"):
        self.push("HTTP/1.0 %d %s\r\n" % (status, explanation))


class FileProducer:
    # a producer that reads data from a file object

    def _ _init_ _(self, file):
        self.file = file

    def more(self):
        if self.file:
            data = self.file.read(2048)
            if data:
                return data
            self.file = None
        return ""


class HTTPServer(asyncore.dispatcher):

    def _ _init_ _(self, port=None, request=None):
        if not port:
            port = 80
        self.port = port
        if request:
            self.handle_request = request # external request handler
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.bind(("", port))
        self.listen(5)

    def handle_accept(self):
        conn, addr = self.accept()
        HTTPChannel(self, conn, addr)

    def handle_request(self, channel, method, path, header):
        try:
            # this is not safe!
            while path[:1] == "/":
                path = path[1:]
            filename = os.path.join(ROOT, path)
            print path, "=>", filename
            file = open(filename, "r")
        except IOError:
            channel.pushstatus(404, "Not found")
            channel.push("Content-type: text/html\r\n")
            channel.push("\r\n")
            channel.push("<html><body>File not found.</body></html>\r\n")
        else:
            channel.pushstatus(200, "OK")
            channel.push("Content-type: text/html\r\n")
            channel.push("\r\n")
            channel.push_with_producer(FileProducer(file))

#
# try it out

s = HTTPServer(PORT)
print "serving at port", PORT
asyncore.loop()

*B*serving at port 8000
log: adding channel <HTTPServer  at 8e54d0>
log: adding channel <HTTPChannel  at 8e64a0>
samples/sample.htm => .\samples/sample.htm
log: closing channel 96:<HTTPChannel connected at 8e64a0>*b*

1.6. urllib 模块

urlib 模块为 HTTP , FTP , 以及 gopher 提供了一个统一的客户端接口. 它会自动地根据 URL 选择合适的协议处理器.

从 URL 获取数据是非常简单的. 只需要调用 urlopen 方法, 然后从返回的流对象中读取数据即可, 如 Example 7-14 所示.

1.6.0.1. Example 7-14. 使用 urllib 模块获取远程资源

File: urllib-example-1.py

import urllib

fp = urllib.urlopen("http://www.python.org")

op = open("out.html", "wb")

n = 0

while 1:
    s = fp.read(8192)
    if not s:
        break
    op.write(s)
    n = n + len(s)

fp.close()
op.close()

for k, v in fp.headers.items():
    print k, "=", v

print "copied", n, "bytes from", fp.url

*B*server = Apache/1.3.6 (Unix)
content-type = text/html
accept-ranges = bytes
date = Mon, 11 Oct 1999 20:11:40 GMT
connection = close
etag = "741e9-7870-37f356bf"
content-length = 30832
last-modified = Thu, 30 Sep 1999 12:25:35 GMT
copied 30832 bytes from http://www.python.org*b*

这个流对象提供了一些非标准的属性. headers 是一个 Message 对象(在 mimetools 模块中定义), url 是实际的 URL . 后者会根据服务器的重定向而更新.

urlopen 函数实际上是一个辅助函数, 它会创建一个 FancyURLopener 类的实例并调用它的 open 方法. 你也可以继承这个类来完成特殊的行为. 例如 Example 7-15 中的类会自动地在必要时登陆服务器.

1.6.0.2. Example 7-15. 用 urllib 模块实现自动身份验证

File: urllib-example-3.py

import urllib

class myURLOpener(urllib.FancyURLopener):
    # read an URL, with automatic HTTP authentication

    def setpasswd(self, user, passwd):
        self._ _user = user
        self._ _passwd = passwd

    def prompt_user_passwd(self, host, realm):
        return self._ _user, self._ _passwd

urlopener = myURLOpener()
urlopener.setpasswd("mulder", "trustno1")

fp = urlopener.open("http://www.secretlabs.com")
print fp.read()

1.7. urlparse 模块

urlparse 模块包含用于处理 URL 的函数, 可以在 URL 和平台特定的文件名间相互转换. 如 Example 7-16 所示.

1.7.0.1. Example 7-16. 使用 urlparse 模块

File: urlparse-example-1.py

import urlparse

print urlparse.urlparse("http://host/path;params?query#fragment")

('http', 'host', '/path', 'params', 'query', 'fragment')

一个常见用途就是把 HTTP URL 分割为主机名和路径组件(一个 HTTP 请求会涉及到主机名以及请求路径), 如 Example 7-17 所示.

1.7.0.2. Example 7-17. 使用 urlparse 模块处理 HTTP 定位器( HTTP Locators )

File: urlparse-example-2.py

import urlparse

scheme, host, path, params, query, fragment =\
        urlparse.urlparse("http://host/path;params?query#fragment")

if scheme == "http":
    print "host", "=>", host
    if params:
        path = path + ";" + params
    if query:
        path = path + "?" + query
    print "path", "=>", path

*B*host => host
path => /path;params?query*b*

Example 7-18 展示了如何使用 urlunparse 函数将各组成部分合并回一个 URL .

1.7.0.3. Example 7-18. 使用 urlparse 模块处理 HTTP 定位器( HTTP Locators )

File: urlparse-example-3.py

import urlparse

scheme, host, path, params, query, fragment =\
        urlparse.urlparse("http://host/path;params?query#fragment")

if scheme == "http":
    print "host", "=>", host
    print "path", "=>", urlparse.urlunparse(
    (None, None, path, params, query, None)
    )

*B*host => host
path => /path;params?query*b*

Example 7-19 使用 urljoin 函数将绝对路径和相对路径组合起来.

1.7.0.4. Example 7-19. 使用 urlparse 模块组合相对定位器

File: urlparse-example-4.py

import urlparse

base = "http://spam.egg/my/little/pony"

for path in "/index", "goldfish", "../black/cat":
    print path, "=>", urlparse.urljoin(base, path)

*B*/index => http://spam.egg/index
goldfish => http://spam.egg/my/little/goldfish
../black/cat => http://spam.egg/my/black/cat*b*

1.8. cookie 模块

(2.0 中新增) 该模块为 HTTP 客户端和服务器提供了基本的 cookie 支持. Example 7-20 展示了它的使用.

1.8.0.1. Example 7-20. 使用 cookie 模块

File: cookie-example-1.py

import Cookie
import os, time

cookie = Cookie.SimpleCookie()
cookie["user"] = "Mimi"
cookie["timestamp"] = time.time()

print cookie

# simulate CGI roundtrip
os.environ["HTTP_COOKIE"] = str(cookie)

print

cookie = Cookie.SmartCookie()
cookie.load(os.environ["HTTP_COOKIE"])

for key, item in cookie.items():
    # dictionary items are "Morsel" instances
    # use value attribute to get actual value
    print key, repr(item.value)


*B*Set-Cookie: timestamp=736513200;
Set-Cookie: user=Mimi;

user 'Mimi'
timestamp '736513200'*b*

1.9. robotparser 模块

(2.0 中新增) robotparser 模块用来读取 robots.txt 文件, 该文件用于 Robot Exclusion Protocol (搜索机器人排除协议? http://info.webcrawler.com/mak/projects/robots/robots.html).

如果你实现的一个 HTTP 机器人会访问网路上的任意站点(并不只是你自己的站点), 那么最好还是用该模块检查下你所做的一切是不是受欢迎的. Example 7-21 展示了该模块的使用.

1.9.0.1. Example 7-21. 使用 robotparser 模块

File: robotparser-example-1.py

import robotparser

r = robotparser.RobotFileParser()
r.set_url("http://www.python.org/robots.txt")
r.read()

if r.can_fetch("*", "/index.html"):
    print "may fetch the home page"

if r.can_fetch("*", "/tim_one/index.html"):
    print "may fetch the tim peters archive"

*B*may fetch the home page*b*

1.10. ftplib 模块

ftplib 模块包含了一个 File Transfer Protocol (FTP , 文件传输协议)客户端的实现.

Example 7-22 展示了如何登陆并获得登陆目录的文件列表. 注意这里的文件列表 (列目录操作)格式与服务器有关(一般和主机平台的列目录工具输出格式相同, 例如 Unix 下的 ls 和 Windows/DOS 下的 dir ).

1.10.0.1. Example 7-22. 使用 ftplib 模块获得目录列表

File: ftplib-example-1.py

import ftplib

ftp = ftplib.FTP("www.python.org")
ftp.login("anonymous", "ftplib-example-1")

print ftp.dir()

ftp.quit()

*B*total 34
drwxrwxr-x  11 root     4127         512 Sep 14 14:18 .
drwxrwxr-x  11 root     4127         512 Sep 14 14:18 ..
drwxrwxr-x   2 root     4127         512 Sep 13 15:18 RCS
lrwxrwxrwx   1 root     bin           11 Jun 29 14:34 README -> welcome.msg
drwxr-xr-x   3 root     wheel        512 May 19  1998 bin
drwxr-sr-x   3 root     1400         512 Jun  9  1997 dev
drwxrwxr--   2 root     4127         512 Feb  8  1998 dup
drwxr-xr-x   3 root     wheel        512 May 19  1998 etc
...*b*

下载文件很简单; 使用合适的 retr 函数即可. 注意当你下载文本文件时, 你必须自己加上行结束符. Example 7-23 中使用了一个 lambda 表达式完成这项工作.

1.10.0.2. Example 7-23. 使用 ftplib 模块下载文件

File: ftplib-example-2.py

import ftplib
import sys

def gettext(ftp, filename, outfile=None):
    # fetch a text file
    if outfile is None:
        outfile = sys.stdout
    # use a lambda to add newlines to the lines read from the server
    ftp.retrlines("RETR " + filename, lambda s, w=outfile.write: w(s+"\n"))

def getbinary(ftp, filename, outfile=None):
    # fetch a binary file
    if outfile is None:
        outfile = sys.stdout
    ftp.retrbinary("RETR " + filename, outfile.write)

ftp = ftplib.FTP("www.python.org")
ftp.login("anonymous", "ftplib-example-2")

gettext(ftp, "README")
getbinary(ftp, "welcome.msg")

*B*WELCOME to python.org, the Python programming language home site.

You are number %N of %M allowed users.  Ni!

Python Web site: http://www.python.org/

CONFUSED FTP CLIENT?  Try begining your login password with '-' dash.
This turns off continuation messages that may be confusing your client.
...*b*

最后, Example 7-24 将文件复制到 FTP 服务器上. 这个脚本使用文件扩展名来判断文件是文本文件还是二进制文件.

1.10.0.3. Example 7-24. 使用 ftplib 模块上传文件

File: ftplib-example-3.py

import ftplib
import os

def upload(ftp, file):
    ext = os.path.splitext(file)[1]
    if ext in (".txt", ".htm", ".html"):
        ftp.storlines("STOR " + file, open(file))
    else:
        ftp.storbinary("STOR " + file, open(file, "rb"), 1024)

ftp = ftplib.FTP("ftp.fbi.gov")
ftp.login("mulder", "trustno1")

upload(ftp, "trixie.zip")
upload(ftp, "file.txt")
upload(ftp, "sightings.jpg")

1.11. gopherlib 模块

gopherlib 模块包含了一个 gopher 客户端实现, 如 Example 7-25 所示.

1.11.0.1. Example 7-25. 使用 gopherlib 模块

File: gopherlib-example-1.py

import gopherlib

host = "gopher.spam.egg"

f = gopherlib.send_selector("1/", host)
for item in gopherlib.get_directory(f):
    print item

*B*['0', "About Spam.Egg's Gopher Server", "0/About's Spam.Egg's
Gopher Server", 'gopher.spam.egg', '70', '+']
['1', 'About Spam.Egg', '1/Spam.Egg', 'gopher.spam.egg', '70', '+']
['1', 'Misc', '1/Misc', 'gopher.spam.egg', '70', '+']
...*b*

1.12. httplib 模块

httplib 模块提供了一个 HTTP 客户端接口, 如 Example 7-26 所示.

1.12.0.1. Example 7-26. 使用 httplib 模块

File: httplib-example-1.py

import httplib

USER_AGENT = "httplib-example-1.py"

class Error:
    # indicates an HTTP error
    def _ _init_ _(self, url, errcode, errmsg, headers):
        self.url = url
        self.errcode = errcode
        self.errmsg = errmsg
        self.headers = headers
    def _ _repr_ _(self):
        return (
            "<Error for %s: %s %s>" %
            (self.url, self.errcode, self.errmsg)
            )

class Server:

    def _ _init_ _(self, host):
        self.host = host

    def fetch(self, path):
        http = httplib.HTTP(self.host)

        # write header
        http.putrequest("GET", path)
        http.putheader("User-Agent", USER_AGENT)
        http.putheader("Host", self.host)
        http.putheader("Accept", "*/*")
        http.endheaders()

        # get response
        errcode, errmsg, headers = http.getreply()

        if errcode != 200:
            raise Error(errcode, errmsg, headers)

        file = http.getfile()
        return file.read()

if _ _name_ _ == "_ _main_ _":

    server = Server("www.pythonware.com")
    print server.fetch("/index.htm")

注意 httplib 提供的 HTTP 客户端在等待服务器回复的时候会阻塞程序. 异步的解决方法请参阅 asyncore 模块中的例子.

1.12.1. 将数据发送给服务器

httplib 可以用来发送其他 HTTP 命令, 例如 POST , 如 Example 7-27 所示.

1.12.1.1. Example 7-27. 使用 httplib 发送数据

File: httplib-example-2.py

import httplib

USER_AGENT = "httplib-example-2.py"

def post(host, path, data, type=None):

    http = httplib.HTTP(host)

    # write header
    http.putrequest("PUT", path)
    http.putheader("User-Agent", USER_AGENT)
    http.putheader("Host", host)
    if type:
        http.putheader("Content-Type", type)
    http.putheader("Content-Length", str(len(size)))
    http.endheaders()

    # write body
    http.send(data)

    # get response
    errcode, errmsg, headers = http.getreply()

    if errcode != 200:
        raise Error(errcode, errmsg, headers)

    file = http.getfile()
    return file.read()

if _ _name_ _ == "_ _main_ _":

    post("www.spam.egg", "/bacon.htm", "a piece of data", "text/plain")

1.13. poplib 模块

poplib 模块(如 Example 7-28 所示) 提供了一个 Post Office Protocol ( POP3 协议) 客户端实现. 这个协议用来从邮件服务器 "pop" (拷贝) 信息到你的个人电脑.

1.13.0.1. Example 7-28. 使用 poplib 模块

File: poplib-example-1.py

import poplib
import string, random
import StringIO, rfc822

SERVER = "pop.spam.egg"

USER  = "mulder"
PASSWORD = "trustno1"

# connect to server
server = poplib.POP3(SERVER)

# login
server.user(USER)
server.pass_(PASSWORD)

# list items on server
resp, items, octets = server.list()

# download a random message
id, size = string.split(random.choice(items))
resp, text, octets = server.retr(id)

text = string.join(text, "\n")
file = StringIO.StringIO(text)

message = rfc822.Message(file)

for k, v in message.items():
    print k, "=", v

print message.fp.read()

*B*subject = ANN: (the eff-bot guide to) The Standard Python Library
message-id = <[email protected]>
received = (from [email protected])
 by spam.egg (8.8.7/8.8.5) id KAA09206
 for mulder; Tue, 12 Oct 1999 10:08:47 +0200
from = Fredrik Lundh <[email protected]>
date = Tue, 12 Oct 1999 10:08:47 +0200
to = [email protected]

...*b*

1.14. imaplib 模块

imaplib 模块提供了一个 Internet Message Access Protocol ( IMAP, Internet 消息访问协议) 的客户端实现. 这个协议允许你访问邮件服务器的邮件目录, 就好像是在本机访问一样. 如 Example 7-29 所示.

1.14.0.1. Example 7-29. 使用 imaplib 模块

File: imaplib-example-1.py

import imaplib
import string, random
import StringIO, rfc822

SERVER = "imap.spam.egg"

USER  = "mulder"
PASSWORD = "trustno1"

# connect to server
server = imaplib.IMAP4(SERVER)

# login
server.login(USER, PASSWORD)
server.select()

# list items on server
resp, items = server.search(None, "ALL")
items = string.split(items[0])

# fetch a random item
id = random.choice(items)
resp, data = server.fetch(id, "(RFC822)")
text = data[0][1]

file = StringIO.StringIO(text)

message = rfc822.Message(file)

for k, v in message.items():
    print k, "=", v

print message.fp.read()

server.logout()

*B*subject = ANN: (the eff-bot guide to) The Standard Python Library
message-id = <[email protected]>
to = [email protected]
date = Tue, 12 Oct 1999 10:16:19 +0200 (MET DST)
from = <[email protected]>
received = ([email protected]) by imap.algonet.se (8.8.8+Sun/8.6.12)
id KAA12177 for [email protected]; Tue, 12 Oct 1999 10:16:19 +0200
(MET DST)

body text for test 5*b*

1.15. smtplib 模块

smtplib 模块提供了一个 Simple Mail Transfer Protocol ( SMTP , 简单邮件传输协议) 客户端实现. 该协议用于通过 Unix 邮件服务器发送邮件, 如 Example 7-30 所示.

读取邮件请使用 poplib 或 imaplib 模块.

1.15.0.1. Example 7-30. 使用 smtplib 模块

File: smtplib-example-1.py

import smtplib
import string, sys

HOST = "localhost"

FROM = "[email protected]"
TO = "[email protected]"

SUBJECT = "for your information!"

BODY = "next week: how to fling an otter"

body = string.join((
    "From: %s" % FROM,
    "To: %s" % TO,
    "Subject: %s" % SUBJECT,
    "",
    BODY), "\r\n")

print body

server = smtplib.SMTP(HOST)
server.sendmail(FROM, [TO], body)
server.quit()

*B*From: [email protected]
To: [email protected]
Subject: for your information!

next week: how to fling an otter*b*

1.16. telnetlib 模块

telnetlib 模块提供了一个 telnet 客户端实现.

Example 7-31 连接到一台 Unix 计算机, 登陆, 然后请求一个目录的列表.

1.16.0.1. Example 7-31. 使用 telnetlib 模块登陆到远程服务器

File: telnetlib-example-1.py

import telnetlib
import sys

HOST = "spam.egg"

USER = "mulder"
PASSWORD = "trustno1"

telnet = telnetlib.Telnet(HOST)

telnet.read_until("login: ")
telnet.write(USER + "\n")

telnet.read_until("Password: ")
telnet.write(PASSWORD + "\n")

telnet.write("ls librarybook\n")
telnet.write("exit\n")

print telnet.read_all()

*B*[spam.egg mulder]$ ls
README                                 os-path-isabs-example-1.py
SimpleAsyncHTTP.py                     os-path-isdir-example-1.py
aifc-example-1.py                      os-path-isfile-example-1.py
anydbm-example-1.py                    os-path-islink-example-1.py
array-example-1.py                     os-path-ismount-example-1.py
...*b*

1.17. nntplib 模块

nntplib 模块提供了一个网络新闻传输协议( Network News Transfer Protocol, NNTP )客户端的实现.

1.17.1. 列出消息

从新闻服务器上读取消息之前, 你必须连接这个服务器并选择一个新闻组. Example 7-32 中的脚本会从服务器下载一个完成的消息列表, 然后根据列表做简单的统计.

1.17.1.1. Example 7-32. 使用 nntplib 模块列出消息

File: nntplib-example-1.py

import nntplib
import string

SERVER = "news.spam.egg"
GROUP  = "comp.lang.python" 
AUTHOR = "[email protected]" # eff-bots human alias

# connect to server
server = nntplib.NNTP(SERVER)

# choose a newsgroup
resp, count, first, last, name = server.group(GROUP)
print "count", "=>", count
print "range", "=>", first, last

# list all items on the server
resp, items = server.xover(first, last)

# extract some statistics
authors = {}
subjects = {}
for id, subject, author, date, message_id, references, size, lines in items:
    authors[author] = None
    if subject[:4] == "Re: ":
        subject = subject[4:]
    subjects[subject] = None
    if string.find(author, AUTHOR) >= 0:
        print id, subject
    
print "authors", "=>", len(authors)
print "subjects", "=>", len(subjects)

*B*count => 607
range => 57179 57971
57474 Three decades of Python!
...
57477 More Python books coming...
authors => 257
subjects => 200*b*

1.17.2. 下载消息

下载消息是很简单的, 只需要调用 article方法, 如 Example 7-33 所示.

1.17.2.1. Example 7-33. 使用 nntplib 模块下载消息

File: nntplib-example-2.py

import nntplib
import string

SERVER = "news.spam.egg"
GROUP  = "comp.lang.python" 
KEYWORD = "tkinter"

# connect to server
server = nntplib.NNTP(SERVER)

resp, count, first, last, name = server.group(GROUP)
resp, items = server.xover(first, last)
for id, subject, author, date, message_id, references, size, lines in items:
    if string.find(string.lower(subject), KEYWORD) >= 0:
        resp, id, message_id, text = server.article(id)
        print author
        print subject
        print len(text), "lines in article"

*B*"Fredrik Lundh" <[email protected]>
Re: Programming Tkinter (In Python)
110 lines in article
...*b*

Example 7-34 展示了如何进一步处理这些消息, 你可以把它封装到一个 Message 对象中(使用 rfc822 模块).

1.17.2.2. Example 7-34. 使用 nntplib 和 rfc822 模块处理消息

File: nntplib-example-3.py

import nntplib
import string, random
import StringIO, rfc822

SERVER = "news.spam.egg"
GROUP  = "comp.lang.python"

# connect to server
server = nntplib.NNTP(SERVER)

resp, count, first, last, name = server.group(GROUP)
for i in range(10):
    try:
        id = random.randint(int(first), int(last))
        resp, id, message_id, text = server.article(str(id))
    except (nntplib.error_temp, nntplib.error_perm):
        pass # no such message (maybe it was deleted?)
    else:
        break # found a message!
else:
    raise SystemExit

text = string.join(text, "\n")
file = StringIO.StringIO(text)

message = rfc822.Message(file)

for k, v in message.items():
    print k, "=", v

print message.fp.read()

*B*mime-version = 1.0
content-type = text/plain; charset="iso-8859-1"
message-id = <[email protected]>
lines = 22
...
from = "Fredrik Lundh" <[email protected]>
nntp-posting-host = parrot.python.org
subject = ANN: (the eff-bot guide to) The Standard Python Library
...
</F>*b*

到这一步后, 你可以使用 htmllib , uu , 以及 base64 继续处理这些消息.

1.18. SocketServer 模块

SocketServer 为各种基于 socket 的服务器提供了一个框架. 该模块提供了大量的类, 你可以用它们来创建不同的服务器.

Example 7-35 使用该模块实现了一个 Internet 时间协议服务器. 你可以用前边的 timeclient 脚本连接它.

1.18.0.1. Example 7-35. 使用 SocketServer 模块

File: socketserver-example-1.py

import SocketServer
import time

# user-accessible port
PORT = 8037

# reference time
TIME1970 = 2208988800L

class TimeRequestHandler(SocketServer.StreamRequestHandler):
    def handle(self):
        print "connection from", self.client_address
        t = int(time.time()) + TIME1970
        b = chr(t>>24&255) + chr(t>>16&255) + chr(t>>8&255) + chr(t&255)
        self.wfile.write(b)

server = SocketServer.TCPServer(("", PORT), TimeRequestHandler)
print "listening on port", PORT
server.serve_forever()

*B*connection from ('127.0.0.1', 1488)
connection from ('127.0.0.1', 1489)
...*b*

1.19. BaseHTTPServer 模块

这是一个建立在 SocketServer 框架上的基本框架, 用于 HTTP 服务器.

Example 7-36 在每次重新载入页面时会生成一条随机信息. path 变量包含当前 URL , 你可以使用它为不同的 URL 生成不同的内容 (访问除根目录的其他任何 path 该脚本都会返回一个错误页面).

1.19.0.1. Example 7-36. 使用 BaseHTTPServer 模块

File: basehttpserver-example-1.py

import BaseHTTPServer
import cgi, random, sys

MESSAGES = [
    "That's as maybe, it's still a frog.",
    "Albatross! Albatross! Albatross!",
    "It's Wolfgang Amadeus Mozart.",
    "A pink form from Reading.",
    "Hello people, and welcome to 'It's a Tree.'"
    "I simply stare at the brick and it goes to sleep.",
]

class Handler(BaseHTTPServer.BaseHTTPRequestHandler):

    def do_GET(self):
        if self.path != "/":
            self.send_error(404, "File not found")
            return
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
        try:
            # redirect stdout to client
            stdout = sys.stdout
            sys.stdout = self.wfile
            self.makepage()
        finally:
            sys.stdout = stdout # restore
        
    def makepage(self):
        # generate a random message
        tagline = random.choice(MESSAGES)
        print "<html>"
        print "<body>"
        print "<p>Today's quote: "
        print "<i>%s</i>" % cgi.escape(tagline)
        print "</body>"
        print "</html>"

PORT = 8000

httpd = BaseHTTPServer.HTTPServer(("", PORT), Handler)
print "serving at port", PORT
httpd.serve_forever()

更有扩展性的 HTTP 框架请参阅 SimpleHTTPServer 和 CGIHTTPServer 模块.

1.20. SimpleHTTPServer 模块

SimpleHTTPServer 模块是一个简单的 HTTP 服务器, 它提供了标准的 GET 和 HEAD 请求处理器. 客户端请求的路径名称会被翻译为一个相对文件名 (相对于服务器启动时的当前路径). Example 7-37 展示了该模块的使用.

1.20.0.1. Example 7-37. 使用 SimpleHTTPServer 模块

File: simplehttpserver-example-1.py

import SimpleHTTPServer
import SocketServer

# minimal web server.  serves files relative to the
# current directory.

PORT = 8000

Handler = SimpleHTTPServer.SimpleHTTPRequestHandler

httpd = SocketServer.TCPServer(("", PORT), Handler)

print "serving at port", PORT
httpd.serve_forever()

*B*serving at port 8000
localhost - - [11/Oct/1999 15:07:44] code 403, message Directory listing not sup
ported
localhost - - [11/Oct/1999 15:07:44] "GET / HTTP/1.1" 403 -
localhost - - [11/Oct/1999 15:07:56] "GET /samples/sample.htm HTTP/1.1" 200 -
*b*

这个服务器会忽略驱动器符号和相对路径名(例如 ..). 但它并没有任何访问验证处理, 所以请小心使用.

Example 7-38 实现了个迷你的 web 代理. 发送给代理的 HTTP 请求必须包含目标服务器的完整 URI . 代理服务器使用 urllib 来获取目标服务器的数据.

1.20.0.2. Example 7-38. 使用 SimpleHTTPServer 模块实现代理

File: simplehttpserver-example-2.py

# a truly minimal HTTP proxy

import SocketServer
import SimpleHTTPServer
import urllib

PORT = 1234

class Proxy(SimpleHTTPServer.SimpleHTTPRequestHandler):
    def do_GET(self):
        self.copyfile(urllib.urlopen(self.path), self.wfile)

httpd = SocketServer.ForkingTCPServer(('', PORT), Proxy)
print "serving at port", PORT
httpd.serve_forever()

1.21. CGIHTTPServer 模块

CGIHTTPServer 模块是一个可以通过公共网关接口( common gateway interface

, CGI )调用外部脚本的 HTTP 服务器. 如 Example 7-39 所示.

1.21.0.1. Example 7-39. 使用 CGIHTTPServer 模块

File: cgihttpserver-example-1.py

import CGIHTTPServer
import BaseHTTPServer

class Handler(CGIHTTPServer.CGIHTTPRequestHandler):
    cgi_directories = ["/cgi"]

PORT = 8000

httpd = BaseHTTPServer.HTTPServer(("", PORT), Handler)
print "serving at port", PORT
httpd.serve_forever()

1.22. cgi 模块

cgi 模块为 CGI 脚本提供了函数和类支持. 它还可以处理 CGI 表单数据.

Example 7-40 展示了一个简单的 CGI 脚本, 它返回给定目录下的文件列表 (相对于脚本中指定的根目录)

1.22.0.1. Example 7-40. 使用 cgi 模块

File: cgi-example-1.py

import cgi
import os, urllib

ROOT = "samples"

# header
print "text/html"
print

query = os.environ.get("QUERY_STRING")
if not query:
    query = "."

script = os.environ.get("SCRIPT_NAME", "")
if not script:
    script = "cgi-example-1.py"

print "<html>"
print "<head>"
print "<title>file listing</title>"
print "</head>"
print "</html>"

print "<body>"

try:
    files = os.listdir(os.path.join(ROOT, query))
except os.error:
    files = []

for file in files:
    link = cgi.escape(file)
    if os.path.isdir(os.path.join(ROOT, query, file)):
        href = script + "?" + os.path.join(query, file)
        print "<p><a href= '%s'>%s</a>" % (href, cgi.escape(link))
    else:
        print "<p>%s" % link

print "</body>"
print "</html>"

*B*text/html

<html>
<head>
<title>file listing</title>
</head>
</html>
<body>
<p>sample.gif
<p>sample.gz
<p>sample.netrc
...
<p>sample.txt
<p>sample.xml
<p>sample~
<p><a href='cgi-example-1.py?web'>web</a>
</body>
</html>*b*

1.23. webbrowser 模块

(2.0 中新增) webbrowser 模块提供了一个到系统标准 web 浏览器的接口. 它提供了一个 open 函数, 接受文件名或 URL 作为参数, 然后在浏览器中打开它. 如果你又一次调用 open 函数, 那么它会尝试在相同的窗口打开新页面. 如 Example 7-41 所示.

1.23.0.1. Example 7-41. 使用 webbrowser 模块

File: webbrowser-example-1.py

import webbrowser
import time

webbrowser.open("http://www.pythonware.com")

# wait a while, and then go to another page
time.sleep(5)
webbrowser.open(
    "http://www.pythonware.com/people/fredrik/librarybook.htm"
    )

在 Unix 下, 该模块支持 lynx , Netscape , Mosaic , Konquerer , 和 Grail . 在 Windows 和 Macintosh 下, 它会调用标准浏览器 (在注册表或是 Internet 选项面板中定义).

PythonStandardLib/chpt7