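This article is from the Python Cookbook.

The translation was made purely for personal study and has no connection to any commercial copyright dispute.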

-- dreamingk [2004-08-12 03:50:20]

12.1 Introduction

Credit: Paul Prescod, co-author of XML Handbook (Prentice Hall)

XML has become a central technology for all kinds of information exchange. Today, most new file formats that are invented are based on XML. Most new protocols are based upon XML. It simply isn't possible to work with the emerging Internet infrastructure without supporting XML. Luckily, Python has had XML support since Version 2.0.

Python and XML are perfect complements for each other. XML is an open-standards way of exchanging information. Python is an open source language that processes the information. Python is strong at text processing and at handling complicated data structures. XML is text-based and is a way of exchanging complicated data structures.

That said, working with XML is not so seamless that it takes no effort. There is always somewhat of a mismatch between the needs of a particular programming language and a language-independent information representation. So there is often a requirement to write code that reads (deserializes or parses) and writes (serializes) XML.

Parsing XML can be done with code written purely in Python or with a module that is a C/Python mix. Python comes with the fast Expat parser written in C. This is what most XML applications use. Recipe 12.7 shows how to use Expat directly with its native API.
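
As a minimal sketch of driving Expat through its native API, the following uses the standard library's `xml.parsers.expat` module and records the callbacks it receives. The helper name `collect_events` and the sample document are illustrative assumptions, not taken from Recipe 12.7.

```python
import xml.parsers.expat


def collect_events(xml_text):
    """Parse xml_text with Expat and return a list of (kind, value) events."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # Ask Expat to deliver contiguous character data in one callback.
    parser.buffer_text = True

    def on_start(name, attrs):
        events.append(("start", name))

    def on_end(name):
        events.append(("end", name))

    def on_text(data):
        # Skip whitespace-only runs between elements.
        if data.strip():
            events.append(("text", data.strip()))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    # The second argument marks this as the final chunk of input.
    parser.Parse(xml_text, True)
    return events


events = collect_events("<doc><title>Hello</title></doc>")
```

Expat reports the document as a stream of callbacks and leaves it to the application to build any data structure it needs.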

Although Expat is ubiquitous in the XML world, it is not the only parser available. There is an API called SAX that allows any XML parser to be plugged into a Python program, as anydbm allows any database to be plugged in. This API is demonstrated in recipes that check that an XML document is well-formed, extract text from a document, count the tags in a document, and do some minor tweaking of an XML document. These recipes should give you a good understanding of how SAX works.
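
Two of the SAX tasks mentioned above, checking well-formedness and counting tags, might look roughly like this with the standard library's `xml.sax` package. The names `TagCounter`, `count_tags`, and `is_well_formed` are hypothetical, and the sketch assumes the input is a string with no conflicting encoding declaration.

```python
import xml.sax
from collections import Counter


class TagCounter(xml.sax.ContentHandler):
    """SAX handler that counts how often each element name starts."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def startElement(self, name, attrs):
        self.counts[name] += 1


def count_tags(xml_text):
    """Return a dict mapping element names to occurrence counts."""
    handler = TagCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return dict(handler.counts)


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text without a fatal error."""
    try:
        # A do-nothing handler is enough: we only care whether parsing fails.
        xml.sax.parseString(xml_text.encode("utf-8"), xml.sax.ContentHandler())
    except xml.sax.SAXParseException:
        return False
    return True


tag_counts = count_tags("<a><b/><b/><c/></a>")
```

The handler class depends only on the SAX interface, so the same code works with whichever parser SAX selects.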

Recipe 12.13 shows the generation of XML from lists. Those of you new to XML (and some with more experience) will think that the technique used is a little primitive. It just builds up strings using standard Python mechanisms instead of using a special XML-generation API. This is nothing to be ashamed of, however. For the vast majority of XML applications, no more sophisticated technique is required. Reading XML is much harder than writing it. Therefore, it makes sense to use specialized software (such as the Expat parser) for reading XML, but nothing special for writing it.
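
The string-building approach can be sketched as follows. This is not the code from Recipe 12.13; the function name `rows_to_xml` and the element names are assumptions chosen for illustration. The one piece of XML-specific care is escaping `&`, `<`, and `>` in the data, done here with `xml.sax.saxutils`.

```python
from xml.sax.saxutils import escape, quoteattr


def rows_to_xml(rows, root="table", row_tag="row", cell_tag="cell"):
    """Build an XML string from a list of lists using plain string formatting."""
    lines = [f"<{root}>"]
    for index, row in enumerate(rows):
        # quoteattr adds the surrounding quotes and escapes the value.
        lines.append(f"  <{row_tag} index={quoteattr(str(index))}>")
        for value in row:
            # escape replaces &, < and > so the text cannot break the markup.
            lines.append(f"    <{cell_tag}>{escape(str(value))}</{cell_tag}>")
        lines.append(f"  </{row_tag}>")
    lines.append(f"</{root}>")
    return "\n".join(lines)


xml_out = rows_to_xml([["a & b", "<x>"], [1, 2]])
```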

XML-RPC is a protocol built on top of XML for sending data structures from one program to another, typically across the Internet. XML-RPC allows programmers to completely hide the implementation languages of the two communicating components. Two components running on different operating systems, written in different languages, can communicate easily. XML-RPC is built into Python 2.2. This chapter does not deal with XML-RPC because, together with its alternatives (which include SOAP, another distributed-processing protocol that also relies on XML), XML-RPC is covered in Chapter 13.
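
To give a rough idea of what XML-RPC puts on the wire, this sketch serializes a call and reads it back without any network traffic. It assumes a current Python, where the module is named `xmlrpc.client` (the Python 2.2 module was `xmlrpclib`); the method name `store.add` and the payload are made up.

```python
import xmlrpc.client

# Serialize a method call carrying one dict argument into an XML-RPC request body.
payload = xmlrpc.client.dumps(
    ({"name": "spam", "sizes": [1, 2, 3]},),
    methodname="store.add",  # hypothetical remote method
)

# Decode the XML again, as the receiving side would.
params, method = xmlrpc.client.loads(payload)
```

The receiver sees only XML, so nothing in the payload reveals which language produced it.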

The other recipes are a little bit more eclectic. For example, one shows how to extract information from an XML document in environments where performance is more important than correctness (e.g., an interactive editor). Another shows how to auto-detect the Unicode encoding that an XML document uses without parsing the document. Unicode is central to the definition of XML, so it helps to understand Python's Unicode objects if you will be doing sophisticated work with XML.
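
One way to guess a document's encoding without parsing it is to look at the first bytes: a byte order mark if present, otherwise the `encoding` pseudo-attribute of the XML declaration. The sketch below is a simplified assumption of that idea, not the recipe's code. It handles only BOM-marked input and ASCII-compatible declarations, and it falls back to UTF-8, the default the XML specification prescribes when neither is present.

```python
import codecs
import re

# UTF-32 must be tested before UTF-16: the UTF-32-LE BOM starts with the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches encoding="..." inside an XML declaration at the very start of the data.
_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def sniff_encoding(data):
    """Guess the encoding name of XML bytes from a BOM or the XML declaration."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"


guess = sniff_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>')
```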

The PyXML extension package has a variety of useful tools for working with XML in more advanced ways. It has a full implementation of the Document Object Model (DOM), as opposed to the subset bundled with Python itself, and a validating XML parser written entirely in Python. The DOM is an API that loads an entire XML document into memory. This can make XML processing easier for complicated structures in which there are many references from one part of the document to another, or when you need to correlate (e.g., compare) more than one XML document. There is only one really simple recipe that shows how to normalize an XML document with the DOM (Recipe 12.9), but you'll find many other examples in the PyXML package (http://pyxml.sourceforge.net).
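
As a small illustration of working with an in-memory DOM tree, the sketch below uses `xml.dom.minidom`, the DOM subset bundled with Python, rather than PyXML. It creates adjacent text nodes and then merges them with `normalize()`. The helper name `normalize_demo` and the sample text are assumptions, and this is not the code of Recipe 12.9.

```python
from xml.dom import minidom


def normalize_demo():
    """Show that normalize() merges adjacent text nodes under an element."""
    doc = minidom.parseString("<p>one</p>")
    para = doc.documentElement
    # Appending text nodes leaves three separate siblings under <p>.
    para.appendChild(doc.createTextNode(" two"))
    para.appendChild(doc.createTextNode(" three"))
    before = len(para.childNodes)
    para.normalize()
    after = len(para.childNodes)
    return before, after, para.firstChild.data


before, after, text = normalize_demo()
```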

There are also two recipes that focus on XSLT: Recipe 12.5 shows how to drive two different XSLT engines, and Recipe 12.10 shows how to control XSLT stylesheet loading when using the XSLT engine that comes with the Fourthought 4Suite package (http://www.4suite.org). This package provides a sophisticated set of open source XML tools above and beyond those provided in core Python or in the PyXML package. In particular, this package has implementations of a variety of standards, such as XPath, XSLT, XLink, XPointer, and RDF. This is an excellent resource for XML power users.

For more information on using Python and XML together, see Python and XML by Christopher A. Jones and Fred L. Drake, Jr. (O'Reilly).
