Mixing Web components with PyWSGI ::-- ZoomQuiet [2006-08-24 04:49:56]

1. Mix and match Web components with Python WSGI

Learn about the Python standard for building Web applications with maximum flexibility

Uche Ogbuji, Principal Consultant, Fourthought, Inc.

22 Aug 2006

The main reason for the success of the Web is its flexibility. You find almost as many ways to design, develop, and deploy Web sites and applications as there are developers. With a huge wealth of choices, a Web developer often chooses a unique combination of Web design tools, page style, content language, Web server, middleware, and DBMS technology, using different implementation languages and accessory toolkits. To make all of these elements work together to offer maximum flexibility, Web functionality should be provided through components as much as possible. These components should perform a limited number of focused tasks competently and work well with each other. This is easy to say, but in practice it's very difficult to achieve because of the many different approaches to Web technology.

The best hope to keep your sanity is the growth of standards for Web component interoperability. Some of these important standards are already developed, and the most successful Web development platforms have them as their backbone. Prominent examples include the Java servlet API and the Ruby on Rails framework. Some languages long popular for Web programming are only recently being given the same level of componentization and have learned from the experience of preceding Web framework component standards. One example is the Zend Framework for PHP (see Resources). Another is Web Server Gateway Interface (WSGI) for Python.

Many people have complained that the popular Python programming language has too many Web frameworks, from well-known entrants such as Zope to under-the-radar frameworks such as SkunkWeb. Some have argued that this diversity can be a good thing, as long as there is some underpinning standardization. Python and Web expert Phillip J. Eby took on the task of such standardization. He authored Python Enhancement Proposal (PEP) 333, which defines WSGI.

The goal of WSGI is to allow for greater interoperability between Python frameworks. WSGI's success brings about an ecosystem of plug-in components you can use with your favorite frameworks to gain maximum flexibility. In this article, I'll introduce WSGI, and focus on its use as a reusable Web component architecture. In all discussions and sample code, I'll assume that you're using Python 2.4 or a more recent version.

1.1. The basic architecture of WSGI

WSGI was developed under fairly strict constraints, the most important of which was the need for a reasonable amount of backward compatibility with the Web frameworks preceding it. This constraint means WSGI unfortunately isn't as neat and transparent as Python developers are used to. Usually the only developers who have to deal directly with WSGI are those who build frameworks and reusable components. Most regular Web developers will pick a framework for its ease of use and be insulated from WSGI details.

If you want to develop reusable Web components, you have to understand WSGI, and the first thing you need to understand about it is how Web applications are structured in the WSGI world view. Figure 1 illustrates this structure.

Figure 1. Illustration of how HTTP request-response passes through the WSGI stack

http://www-128.ibm.com/developerworks/library/wa-wsgi/figure1.gif

The WSGI stack

The Web server, also called the gateway, is very low-level code for basic communication with the request client (usually the user's browser). The application layer handles the higher-level details: it interprets requests from the user and prepares response content. The application's interface to WSGI is usually just the lowest layer of a higher-level application framework that provides friendly facilities for common Web patterns such as Ajax techniques or content template systems. Above the server or gateway layer, and below the application, lies WSGI middleware. This important layer comprises components that can be shared across server and application implementations. Common Web features such as user sessions, error handling, and authentication can be implemented as WSGI middleware.
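
The two outer layers can be sketched in a few lines. This is a minimal illustration, not part of the original listings: hello_app and call_like_a_server are hypothetical names, the server role is played by a plain function rather than a real gateway, and the code assumes a current Python 3 (where response bodies are byte strings) rather than the Python 2.4 the article targets.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """Application layer: interpret the request and prepare a response."""
    body = b"Hello from the application layer\n"
    headers = [("Content-Type", "text/plain; charset=UTF-8"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]  # any iterable of byte strings


def call_like_a_server(app):
    """Stand-in for the server/gateway layer: build environ, collect output."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required CGI/WSGI keys
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = response_headers
        return lambda data: None  # legacy write() callable, unused here

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_like_a_server(hello_app)
```

A real server does the same two things on every request: it builds environ from the incoming HTTP request and supplies a start_response function that the application calls before returning its body.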

1.2. Code in the middle

WSGI middleware is the most natural layer for reusable components. WSGI middleware looks like an application to the lower layers, and like a server to the higher layers. It watches the state of requests, responses, and the WSGI environment in order to add some particular features. Unfortunately, the WSGI specification offers a very poor middleware example, and many of the other examples you can find are too simplistic to give you a feel for how to quickly write your own middleware. I'll give you a feel for the process WSGI middleware undertakes with the following broad outline. It ignores matters that most WSGI middleware authors won't need to worry about. In Python, where I use the word function, I mean any callable object.
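
The general shape of such middleware can be sketched as follows. This is an illustrative skeleton under stated assumptions, not the outline from the original article: AddHeader, demo_app, and fake_start_response are hypothetical names, and the code targets a current Python 3.

```python
class AddHeader(object):
    """Hypothetical middleware: appends one header to every response."""

    def __init__(self, app, name, value):
        # Set-up phase: runs once, when the stack is assembled
        self.wrapped_app = app
        self.extra_header = (name, value)

    def __call__(self, environ, start_response):
        # Request phase: runs for every request routed through this layer
        def start_response_wrapper(status, response_headers, exc_info=None):
            # Runs when the wrapped application starts its response
            new_headers = list(response_headers) + [self.extra_header]
            return start_response(status, new_headers, exc_info)

        # To the application, this object plays the server's role
        return self.wrapped_app(environ, start_response_wrapper)


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


seen = {}


def fake_start_response(status, response_headers, exc_info=None):
    # Plays the server's part and records what reaches it
    seen["status"], seen["headers"] = status, response_headers


stack = AddHeader(demo_app, "X-Handled-By", "AddHeader")
body = b"".join(stack({"REQUEST_METHOD": "GET"}, fake_start_response))
```

The server sees stack as an ordinary application, while demo_app sees start_response_wrapper as an ordinary start_response. That double role is what lets middleware slot in without either side changing.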

1.3. A bold step toward XHTML

Many component technologies are rather complex, so the best examples for instruction are simple throwaway toys. This isn't the case with WSGI, and, in fact, I'll present a very practical example. Many developers prefer to serve XHTML Web pages because XML technologies are easier to manage than "tag soup" HTML, and emerging Web trends favor sites that are easier for automatons to read. The problem is that not all Web browsers support XHTML properly. Listing 1 (safexhtml.py) is a WSGI middleware module that checks incoming requests to see if the browser supports XHTML and, if not, translates any XHTML responses to plain HTML. You can use such a module so all of your main application code produces XHTML and the middleware takes care of any needed translation to HTML. Review Listing 1 carefully and try to combine it with the general outline of WSGI middleware execution from the previous section. I've provided enough comments so you can identify the different stages in the code.

Listing 1 (safexhtml.py). WSGI middleware to translate XHTML to HTML for browsers unable to handle it

import cStringIO
from xml import sax
from Ft.Xml import CreateInputSource
from Ft.Xml.Sax import SaxPrinter
from Ft.Xml.Lib.HtmlPrinter import HtmlPrinter

XHTML_IMT = "application/xhtml+xml"
HTML_CONTENT_TYPE = 'text/html; charset=UTF-8'

class safexhtml(object):
    """
    Middleware that checks for XHTML capability in the client and translates
    XHTML to HTML if the client can't handle it
    """
    def __init__(self, app):
        #Set-up phase
        self.wrapped_app = app
        return

    def __call__(self, environ, start_response):
        #Handling a client request phase.
        #Called for each client request routed through this middleware

        #Does the client specifically say it supports XHTML?
        #Note saying it accepts */* or application/* will not be enough
        xhtml_ok = XHTML_IMT in environ.get('HTTP_ACCEPT', '')

        #Specialized start_response function for this middleware
        def start_response_wrapper(status, response_headers, exc_info=None):
            #Assume response is not XHTML; do not activate transformation
            environ['safexhtml.active'] = False
            #Check for response content type to see whether it is XHTML
            #That needs to be transformed
            for name, value in response_headers:
                #content-type value is a media type, defined as
                #media-type = type "/" subtype *( ";" parameter )
                if ( name.lower() == 'content-type'
                     and value.split(';')[0] == XHTML_IMT ):
                    #Strip content-length if present (needs to be
                    #recalculated by server)
                    #Also strip content-type, which will be replaced below
                    response_headers = [ (name, value)
                        for name, value in response_headers
                            if ( name.lower()
                                 not in ['content-length', 'content-type'])
                    ]
                    #Put in the updated content type
                    response_headers.append(('content-type', HTML_CONTENT_TYPE))
                    #Response is XHTML, so activate transformation
                    environ['safexhtml.active'] = True
                    break

            #We ignore the return value from start_response
            start_response(status, response_headers, exc_info)
            #Replace any write() callable with a dummy that gives an error
            #The idea is to refuse support for apps that use write()
            def dummy_write(data):
                raise RuntimeError('safexhtml does not support the deprecated '
                                   'write() callable in WSGI clients')
            return dummy_write

        if xhtml_ok:
            #The client can handle XHTML, so nothing for this middleware to do
            #Notice that the original start_response function is passed
            #On, not this middleware's start_response_wrapper
            for data in self.wrapped_app(environ, start_response):
                yield data
        else:
            response_blocks = []  #Gather output strings for concatenation
            for data in self.wrapped_app(environ, start_response_wrapper):
                if environ['safexhtml.active']:
                    response_blocks.append(data)
                    yield '' #Obey buffering rules for WSGI
                else:
                    yield data

            if environ['safexhtml.active']:
                #Need to convert response from XHTML to HTML 
                xhtmlstr = ''.join(response_blocks) #First concatenate response

                #Now use 4Suite to transform XHTML to HTML
                htmlstr = cStringIO.StringIO()  #Will hold the HTML result
                parser = sax.make_parser(['Ft.Xml.Sax'])
                handler = SaxPrinter(HtmlPrinter(htmlstr, 'UTF-8'))
                parser.setContentHandler(handler)
                #Don't load the XHTML DTDs from the Internet
                parser.setFeature(sax.handler.feature_external_pes, False)
                parser.parse(CreateInputSource(xhtmlstr))
                yield htmlstr.getvalue()
                return

The class safexhtml is the full middleware implementation. Each instance is a callable object because the class defines the special __call__ method. You pass an instance of the class to the server, passing the application you are wrapping to the initializer, __init__. The wrapped application might also be another middleware instance if you are chaining safexhtml to other middleware. When the middleware is invoked as a result of a request to the server, it first checks the Accept header sent by the client to see whether it includes the official XHTML media type. If so (the xhtml_ok flag), it's safe to send XHTML and the middleware doesn't do anything meaningful for that request.
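
The Accept check is small enough to try in isolation. The sketch below repeats the substring test from Listing 1 inside a hypothetical helper, client_accepts_xhtml, with sample header values chosen for illustration only.

```python
XHTML_IMT = "application/xhtml+xml"


def client_accepts_xhtml(environ):
    """True only if the Accept header names the XHTML media type explicitly."""
    return XHTML_IMT in environ.get("HTTP_ACCEPT", "")


# A browser that lists the XHTML media type explicitly
explicit = client_accepts_xhtml(
    {"HTTP_ACCEPT": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"})
# A client that only sends a wildcard
wildcard = client_accepts_xhtml({"HTTP_ACCEPT": "*/*"})
# A client that sends no Accept header at all
missing = client_accepts_xhtml({})
```

Because this is a plain substring test, it ignores quality values: a header such as application/xhtml+xml;q=0 would still count as acceptance. Stricter handling would need to parse the header.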

When the client can't handle XHTML, the class defines the specialized nested function start_response_wrapper, whose job is to check the response headers from the application to see whether the response is XHTML. If so, the response needs to be translated to plain HTML, a fact flagged as safexhtml.active in the environment. One reason to use the environment for this flag is that it takes care of scoping issues in communicating the flag back to the rest of the middleware code. Remember that start_response_wrapper is called asynchronously at a time the application chooses, and it can be tricky to manage the needed state in the middleware.
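
The timing issue is easiest to see with an application written as a generator. In the sketch below, lazy_app and the demo.active key are hypothetical and the code assumes Python 3. It shows that the wrapper does not run until the middleware asks for the first block, which is why Listing 1 reads the flag inside its loop rather than before it.

```python
def lazy_app(environ, start_response):
    # A generator: nothing in this body runs until the first block is requested
    start_response("200 OK", [("Content-Type", "application/xhtml+xml")])
    yield b"<html/>"


environ = {}


def start_response_wrapper(status, response_headers, exc_info=None):
    # The environ dict is shared, so a flag stored here is visible to the caller
    environ["demo.active"] = True


app_iter = lazy_app(environ, start_response_wrapper)
flag_before = "demo.active" in environ    # start_response has not run yet
first_block = next(app_iter)              # now the application body executes
flag_after = environ.get("demo.active")
```

Storing the flag in a shared mutable object also sidesteps a Python 2 limitation: a nested function there cannot rebind a variable of the enclosing function, since nonlocal only arrived in Python 3.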

Another reason to use the environment is to communicate down the WSGI stack that the content has been modified. If the response body needs to be translated, start_response_wrapper not only sets the safexhtml.active flag but also changes the response media type to text/html and removes any Content-Length header. The translation will almost certainly change the length of the response body, so the length has to be recalculated downstream, probably by the server.
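
The header rewriting can be pulled out into a pure function, which makes it easy to test on its own. This is a sketch of the same logic as in Listing 1, assuming a hypothetical helper name, retarget_headers, and sample header values.

```python
def retarget_headers(response_headers, from_type, to_content_type):
    """Return (headers, changed).

    If the Content-Type media type equals from_type, drop Content-Type and
    Content-Length and append a Content-Type of to_content_type.
    """
    for name, value in response_headers:
        # media-type = type "/" subtype *( ";" parameter )
        if (name.lower() == "content-type"
                and value.split(";")[0].strip() == from_type):
            kept = [(n, v) for n, v in response_headers
                    if n.lower() not in ("content-length", "content-type")]
            kept.append(("Content-Type", to_content_type))
            return kept, True
    return list(response_headers), False


xhtml_headers = [("Content-Type", "application/xhtml+xml; charset=UTF-8"),
                 ("Content-Length", "512"),
                 ("Cache-Control", "no-cache")]
new_headers, changed = retarget_headers(
    xhtml_headers, "application/xhtml+xml", "text/html; charset=UTF-8")
```

Header names are compared case-insensitively, as HTTP requires, and headers unrelated to the body's type or length pass through untouched.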

Once the application starts sending the response body, the middleware, if translation is needed, gathers the data into the response_blocks list. The application might send the response in chunks, but, to keep the code simple, the middleware runs the translation mechanism only against a complete XHTML input. WSGI rules, however, stipulate that the middleware must pass something on to the server every time the application yields a block. It's okay to pass on an empty string, and that's what the middleware does. Once the application is finished, the middleware stitches together the response body, runs it through the translation code, and then yields the entire output in one last string.
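
The buffering pattern is independent of the XHTML translation itself. The sketch below assumes Python 3 byte strings and a hypothetical helper, buffer_and_transform, with bytes.upper standing in for the real translation step. It also calls close() on the application's iterable when one is provided, a WSGI requirement that Listing 1 leaves out for brevity.

```python
def buffer_and_transform(app_iter, transform):
    """Yield one empty block per application block, then the transformed whole."""
    blocks = []
    try:
        for data in app_iter:
            blocks.append(data)
            yield b""  # keep the server informed without sending content
        yield transform(b"".join(blocks))
    finally:
        # WSGI asks middleware to close the wrapped iterable if it can be closed
        close = getattr(app_iter, "close", None)
        if close is not None:
            close()


chunks = [b"<p>", b"hi", b"</p>"]
output = list(buffer_and_transform(chunks, bytes.upper))
```

The cost of this approach is memory: the whole response is held until the application finishes, which is acceptable for ordinary pages but not for very large or streaming responses.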

Listing 2 (wsgireftest.py) is server code to test the middleware. It uses wsgiref, which includes a very simple WSGI server. The module will be included in the Python 2.5 standard library.

Listing 2 (wsgireftest.py). Server code for testing Listing 1

import sys
from wsgiref.simple_server import make_server
from safexhtml import safexhtml

XHTML = open('test.xhtml').read()
XHTML_IMT = "application/xhtml+xml"
HTML_IMT = "text/html"
PORT = 8000


def app(environ, start_response):
    print "using IMT", app.current_imt
    start_response('200 OK', [('Content-Type', app.current_imt)])
    #Swap the IMT used for response (alternate between XHTML_IMT and HTML_IMT)
    app.current_imt, app.other_imt = app.other_imt, app.current_imt
    return [XHTML]

app.current_imt=XHTML_IMT
app.other_imt=HTML_IMT

httpd = make_server('', PORT, safexhtml(app))
print 'Starting up HTTP server on port %i...'%PORT

# Respond to requests until process is killed
httpd.serve_forever()

Listing 2 reads a simple XHTML file, given in Listing 3 (test.xhtml), and serves it up with alternating media types. It uses the standard XHTML media type for the first request, the HTML media type for the second, back to XHTML for the third, and so on. This exercises the middleware's capability to leave a response alone if it isn't flagged as XHTML.

Listing 3 (test.xhtml). Simple XHTML file used by the sample server in Listing 2

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" >
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
  </body>
</html>

You should be able to see the effect of this middleware if you run Listing 2 and view it in an XHTML-aware browser like Firefox and then an XHTML-challenged browser like Microsoft Internet Explorer. Make the request twice in a row for each browser to see the effect of the response media type on the operation of the middleware. Use View Source to see the resulting response body and the Page Info feature to see the reported response media type. You can also test the example using the command-line HTTP tool cURL: run curl -H 'Accept: application/xhtml+xml,text/html' http://localhost:8000/ to simulate an XHTML-savvy browser, and curl -H 'Accept: text/html' http://localhost:8000/ to simulate the opposite case. If you want to see the response headers, add the -D <filename> option and inspect the named file after each cURL invocation.

1.4. Wrap-up

You've now learned about Python's WSGI and how to use it to implement a middleware service that you can plug into any WSGI server and application chain. You could easily chain this article's example middleware with middleware for caching or debugging. These all become components that let you quickly add well-tested features into your project regardless of what WSGI implementations you choose.
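
Chaining is just nested wrapping. In the sketch below, tracing_middleware is a hypothetical factory whose two instances stand in for real caching and debugging middleware, and the demo.trace key is invented for the example. The code assumes Python 3.

```python
def tracing_middleware(label):
    """Hypothetical factory: returns middleware that records when it is entered."""
    def wrap(app):
        def middleware(environ, start_response):
            environ.setdefault("demo.trace", []).append(label)
            return app(environ, start_response)
        return middleware
    return wrap


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [", ".join(environ["demo.trace"]).encode("ascii")]


cache_layer = tracing_middleware("cache")  # stand-in for caching middleware
debug_layer = tracing_middleware("debug")  # stand-in for debugging middleware
stack = debug_layer(cache_layer(app))      # debug is outermost, so it runs first

body = b"".join(stack({}, lambda status, headers, exc_info=None: None))
```

The outermost wrapper sees the request first and the response last, so order matters when one layer depends on what another has added or changed.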

WSGI is a fairly young specification, but compatible servers, middleware, and utilities are emerging rapidly to completely revamp the Python Web frameworks landscape. The next time you have a major Web project to develop in Python, be sure to adopt WSGI by using existing WSGI components, and perhaps creating your own either for private use or for contribution back to your fellow Web developers.
