Asynchronous Programming -- dreamingk [2004-08-09 01:50:15]

Introduction

There are many ways to write network programs. The main ones are:

  1. Handle each connection in a separate process.
  2. Handle each connection in a separate thread (Footnote 1).
  3. Use non-blocking system calls to handle all connections in one thread.

When many connections are handled in one thread, scheduling becomes the responsibility of the application rather than the operating system. It is usually implemented by calling a registered function whenever a connection is ready for reading or writing -- an approach commonly known as asynchronous, event-driven or callback-based programming.
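As an illustration of the third approach, here is a minimal sketch that uses only Python's standard selectors module rather than Twisted. The callback name on_readable and the use of socket.socketpair() as a stand-in for a real network connection are assumptions made for this example.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
received = []  # chunks collected by the callback


def on_readable(conn):
    """Registered function: the loop calls it when `conn` is ready for reading."""
    data = conn.recv(1024)
    if data:
        received.append(data)
    else:
        # An empty read means the peer closed the connection.
        sel.unregister(conn)
        conn.close()


# A connected socket pair stands in for a real network connection (assumption).
left, right = socket.socketpair()
right.setblocking(False)
sel.register(right, selectors.EVENT_READ, on_readable)

left.sendall(b"hello")
left.close()

# The application does the scheduling: wait for readiness, then dispatch.
while sel.get_map():
    for key, _mask in sel.select(timeout=1):
        callback = key.data
        callback(key.fileobj)

sel.close()
```

The loop never blocks on any single connection; it only asks which registered sockets are ready and then runs the function registered for each of them.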

Multi-threaded programming is tricky, even with high-level abstractions, and Python's Global Interpreter Lock limits the potential performance gain. Forking Python processes also has many disadvantages: Python's reference counting does not play well with copy-on-write, and shared state is hard to handle. Consequently, it was felt that the best option was an event-driven framework. A benefit of this approach is that, by letting other event-driven frameworks take over the main loop, server and client code are essentially the same -- making peer-to-peer a reality.

However, event-driven programming has tricky aspects of its own. Because each callback must finish as soon as possible, persistent state cannot be kept in function-local variables. In addition, some programming techniques, such as recursion, cannot be used -- for example, this rules out writing protocol handlers as recursive-descent parsers. Event-driven programming has a reputation for being hard to use because of the frequent need to write state machines. Twisted was built on the assumption that, with the right library, event-driven programming is easier than multi-threaded programming.
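The point about persistent state can be sketched with a small state machine that turns arbitrary chunks of bytes into complete lines. The class name LineSplitter and its methods are hypothetical names chosen for this example, not a Twisted API; the leftover bytes live on the instance because each callback has to return before the rest of the line arrives.

```python
class LineSplitter:
    """Hypothetical helper: turns arbitrary byte chunks into complete lines."""

    def __init__(self, on_line):
        self._buffer = b""        # state that must outlive each callback
        self._on_line = on_line   # called once per complete line

    def data_received(self, chunk):
        """Called by the event loop with whatever bytes happened to arrive."""
        self._buffer += chunk
        # Deliver every complete line; keep the unfinished tail for next time.
        while b"\r\n" in self._buffer:
            line, self._buffer = self._buffer.split(b"\r\n", 1)
            self._on_line(line)


lines = []
splitter = LineSplitter(lines.append)

# Chunk boundaries do not line up with line boundaries.
for chunk in (b"HELO exa", b"mple.org\r\nNOO", b"P\r\n"):
    splitter.data_received(chunk)
```

After the three calls, lines holds the two complete commands even though neither arrived in a single chunk.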

Note that Twisted still allows the use of threads if you really need them, usually to interface with synchronous legacy code. See Using Threads for details.

Async Design Issues

In Python, code is often divided into a generic class that calls overridable methods, which subclasses implement. In such cases it is important to think about likely implementations. If it is conceivable that an implementation might perform an action that takes a long time (because of either network or CPU load), then that method should be designed to be asynchronous. In general, this means making the method callback-based. In Twisted, it usually means returning a Deferred.
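A minimal sketch of this design rule, using plain callbacks instead of Twisted's Deferred so that it runs with the standard library alone. All class and method names here (Authenticator, check_password, backend_replied and so on) are hypothetical. The generic class never assumes the answer is available when the overridable method returns, so a fast and a slow implementation can both plug in.

```python
class Authenticator:
    """Generic class; subclasses override check_password."""

    def login(self, user, password, on_result):
        # The generic code hands over a callback instead of expecting
        # a return value, so implementations may answer later.
        self.check_password(user, password, on_result)

    def check_password(self, user, password, on_result):
        raise NotImplementedError


class InMemoryAuthenticator(Authenticator):
    """Fast implementation: the answer is known immediately."""

    def __init__(self, passwords):
        self._passwords = passwords

    def check_password(self, user, password, on_result):
        on_result(self._passwords.get(user) == password)


class SlowAuthenticator(Authenticator):
    """Simulates a remote backend: requests are queued and answered later."""

    def __init__(self, passwords):
        self._passwords = passwords
        self._pending = []

    def check_password(self, user, password, on_result):
        self._pending.append((user, password, on_result))  # return at once

    def backend_replied(self):
        """Would be called by the event loop when the backend answers."""
        pending, self._pending = self._pending, []
        for user, password, on_result in pending:
            on_result(self._passwords.get(user) == password)


results = []
InMemoryAuthenticator({"alice": "secret"}).login("alice", "secret", results.append)

slow = SlowAuthenticator({"alice": "secret"})
slow.login("alice", "wrong", results.append)
count_before_reply = len(results)   # the slow answer has not arrived yet
slow.backend_replied()
```

In Twisted the overridable method would typically return a Deferred and the caller would attach callbacks to it; the explicit on_result parameter plays that role in this sketch.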

Because each method must return quickly, non-volatile state cannot be kept in local variables; it is usually kept in instance variables instead. In cases where recursion would have been tempting, it is usually necessary to keep stacks manually, using Python's list and its .append and .pop methods. Because such state machines frequently become non-trivial, it is better to layer them so that each state machine does one thing -- converting events from one level of abstraction to the next higher level. This makes the code clearer as well as easier to debug.
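The manual-stack technique might look like the following sketch. It assumes a toy grammar invented for this example: "(" opens a list, ")" closes it, and every other non-space character is a one-character atom. The class name NestedListParser is hypothetical. Where a recursive-descent parser would use the call stack, this parser keeps the open lists in an instance-level Python list so that it can stop and resume at any chunk boundary.

```python
class NestedListParser:
    """Hypothetical incremental parser for nested lists such as "(a (b c) d)"."""

    def __init__(self, on_value):
        self._stack = []           # lists that are still open, innermost last
        self._on_value = on_value  # called with each complete top-level value

    def feed(self, text):
        """Process one chunk; may be called any number of times."""
        for ch in text:
            if ch == "(":
                self._stack.append([])          # descend one level
            elif ch == ")":
                # An unbalanced ")" raises IndexError in this sketch.
                finished = self._stack.pop()    # ascend one level
                self._emit(finished)
            elif not ch.isspace():
                self._emit(ch)

    def _emit(self, value):
        if self._stack:
            self._stack[-1].append(value)  # belongs to the enclosing list
        else:
            self._on_value(value)          # complete top-level value


values = []
parser = NestedListParser(values.append)
parser.feed("(a (b")     # input stops in the middle of the inner list
parser.feed(" c) d)")    # and resumes in a later callback
```

The parser is also one layer in the sense described above: it converts character events into complete-value events and leaves their interpretation to whatever on_value is.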

Using Reflection

One consequence of the callback style of programming is the need to name small chunks of code. While this may seem like a trivial issue, used correctly it can prove to be an advantage. If strictly consistent naming is used, much of the code that is common in parsers, in the form of if/else rules or long case statements, can be avoided. For example, the SMTP client code has an instance variable that signifies what it is trying to do. When it receives a response from the server, it simply calls the method "do_%s_%s" % (self.state, responseCode). This eliminates the need to register callbacks or to extend large if/else chains. In addition, subclasses can easily override or change the action taken for a given response, with no additional harness code. The SMTP client implementation can be found in twisted/protocols/smtp.py.
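A minimal sketch of this naming-based dispatch follows. It is not the code in twisted/protocols/smtp.py: the class SketchSMTPClient, its state names, its handlers and the do_unexpected fallback are all hypothetical and heavily simplified. Only the "do_%s_%s" lookup pattern is taken from the text.

```python
class SketchSMTPClient:
    """Hypothetical, simplified client; illustrates the dispatch pattern only."""

    def __init__(self):
        self.state = "greeting"   # what the client is currently waiting for
        self.log = []             # stands in for commands sent to the server

    def response_received(self, responseCode):
        # Build the handler name from the state and the response code,
        # then look it up on the instance instead of using if/else chains.
        name = "do_%s_%s" % (self.state, responseCode)
        handler = getattr(self, name, self.do_unexpected)
        handler(responseCode)

    def do_greeting_220(self, code):
        self.log.append("send HELO")
        self.state = "helo"

    def do_helo_250(self, code):
        self.log.append("send MAIL FROM")
        self.state = "mail"

    def do_unexpected(self, code):
        self.log.append("unexpected %s in state %s" % (code, self.state))


class QuittingClient(SketchSMTPClient):
    """A subclass changes one reaction by overriding one method."""

    def do_helo_250(self, code):
        self.log.append("send QUIT")
        self.state = "quit"


client = SketchSMTPClient()
for code in (220, 250, 554):
    client.response_received(code)

quitter = QuittingClient()
for code in (220, 250):
    quitter.response_received(code)
```

Adding support for a new state or response code means adding one method with the right name; the dispatch code itself does not change.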

Footnotes

  1. There are variations on this method, such as a limited-size pool of threads servicing all connections, which are essentially just optimizations of the same idea.

PyTwisted/LowLevelNetworkingEventLoop/LlNEL1 (last edited 2009-12-25 07:16:41 by localhost)