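This article is from the Python Cookbook.

The translation was made purely for personal study and has nothing to do with any commercial copyright dispute.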

-- 大熊 [2004-11-02 08:18:45]

1. Description

12.7 Parsing an XML File with xml.parsers.expat

Credit: Mark Nenadov

1.1. Problem

12.7.1 Problem

The expat parser is normally used through the SAX interface, but sometimes you may want to use expat directly to extract the best possible performance.
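For contrast, the usual SAX route can be sketched as follows. This is a minimal illustration, not part of the recipe: the `ElementNames` class and the `element_names` helper are hypothetical names, and the comment about which parser `xml.sax` selects describes the common CPython setup rather than a guarantee.

```python
import xml.sax


class ElementNames(xml.sax.ContentHandler):
    """Hypothetical handler that records element names in document order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        # SAX calls this once for every opening tag.
        self.names.append(name)


def element_names(xml_bytes):
    """Parse xml_bytes through the SAX layer and return the element names."""
    handler = ElementNames()
    # xml.sax chooses the parser itself; in CPython it is normally expat-based.
    xml.sax.parseString(xml_bytes, handler)
    return handler.names


if __name__ == "__main__":
    print(element_names(b"<a><b/><c/></a>"))
```

Here the application never touches the expat parser object; the recipe below removes that extra layer.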

1.2. Solution

12.7.2 Solution

Python is very explicit about the lower-level mechanisms that its higher-level modules and packages use. You're normally better off accessing the higher levels, but sometimes, in the last few stages of an optimization quest, or just to gain a better understanding of exactly what is going on, you may want to access the lower levels directly from your code. For example, here is how you can use Expat directly, rather than through SAX:

    import sys
    import xml.parsers.expat

    class MyXML:
        # Prepare for parsing
        def __init__(self, xml_filename):
            assert xml_filename != ""
            self.xml_filename = xml_filename
            self.Parser = xml.parsers.expat.ParserCreate()

            # Register the bound methods as the expat callbacks
            self.Parser.CharacterDataHandler = self.handleCharData
            self.Parser.StartElementHandler = self.handleStartElement
            self.Parser.EndElementHandler = self.handleEndElement

        # Parse the XML file
        def parse(self):
            try:
                # ParseFile needs read() to return bytes, so open in binary mode
                xml_file = open(self.xml_filename, "rb")
            except OSError:
                print("ERROR: Can't open XML file %s" % self.xml_filename,
                      file=sys.stderr)
                raise
            else:
                try:
                    self.Parser.ParseFile(xml_file)
                finally:
                    xml_file.close()

        # to be overridden by implementation-specific methods
        def handleCharData(self, data): pass
        def handleStartElement(self, name, attrs): pass
        def handleEndElement(self, name): pass

1.3. Discussion

12.7.3 Discussion

This recipe presents a reusable way to use xml.parsers.expat directly to parse an XML file. SAX is more standardized and richer in functionality, but expat is also usable, and sometimes it can be even lighter than the already lightweight SAX approach. To reuse the MyXML class, all you need to do is define a new class that inherits from MyXML. Inside your new class, override the inherited XML handler methods, and you're ready to go.
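The reuse pattern described above might look like the sketch below. To keep the block runnable on its own, it restates MyXML compactly in Python 3 syntax. The `TagCounter` subclass and the `count_tags` helper are hypothetical names introduced only for illustration.

```python
import os
import tempfile
import xml.parsers.expat


class MyXML:
    """Compact restatement of the recipe's base class (Python 3 syntax)."""

    def __init__(self, xml_filename):
        self.xml_filename = xml_filename
        self.Parser = xml.parsers.expat.ParserCreate()
        self.Parser.CharacterDataHandler = self.handleCharData
        self.Parser.StartElementHandler = self.handleStartElement
        self.Parser.EndElementHandler = self.handleEndElement

    def parse(self):
        # ParseFile expects a file object whose read() returns bytes.
        with open(self.xml_filename, "rb") as xml_file:
            self.Parser.ParseFile(xml_file)

    def handleCharData(self, data): pass
    def handleStartElement(self, name, attrs): pass
    def handleEndElement(self, name): pass


class TagCounter(MyXML):
    """Hypothetical subclass: counts how often each element name starts."""

    def __init__(self, xml_filename):
        self.counts = {}
        super().__init__(xml_filename)

    def handleStartElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def count_tags(xml_text):
    """Write xml_text to a temporary file, parse it, and return the counts."""
    with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False,
                                     encoding="utf-8") as tmp:
        tmp.write(xml_text)
        path = tmp.name
    try:
        counter = TagCounter(path)
        counter.parse()
        return counter.counts
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(count_tags("<doc><item/><item/><note>hi</note></doc>"))
```

Because the base class stores `self.handleStartElement` as a bound method, the parser calls the subclass's override without any further registration.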

Specifically, the MyXML class creates a parser object that does callbacks to the callables that are its attributes. The StartElementHandler callable is called at the start of each element, with the tag name and the attributes as arguments. EndElementHandler is called at the end of each element, with the tag name as the only argument. Finally, CharacterDataHandler is called for each text string the parser encounters, with the string as the only argument. The MyXML class uses the handleStartElement, handleEndElement, and handleCharData methods as such callbacks. Therefore, these are the methods you should override when you subclass MyXML to perform whatever application-specific processing you require.
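To make the callback sequence concrete, the following sketch attaches three small functions to a bare expat parser and records what it reports. The `trace_events` helper and the event tuples are assumptions made for illustration. Setting `buffer_text` asks expat to deliver contiguous text in one call where possible; without it, a single run of text may arrive in several pieces.

```python
import xml.parsers.expat


def trace_events(xml_text):
    """Return the sequence of callbacks expat makes for xml_text."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # Ask expat to buffer contiguous character data before calling back.
    parser.buffer_text = True

    def on_start(name, attrs):
        # attrs arrives as a dict mapping attribute names to values.
        events.append(("start", name, dict(attrs)))

    def on_end(name):
        events.append(("end", name))

    def on_chars(data):
        events.append(("chars", data))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_chars

    # Parse() takes the document text; True marks it as the final chunk.
    parser.Parse(xml_text, True)
    return events


if __name__ == "__main__":
    for event in trace_events('<greeting lang="en">hello</greeting>'):
        print(event)
```

For this input the recorded events are a start for `greeting` with its `lang` attribute, the text `hello`, and then the matching end.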

1.4. See Also

12.7.4 See Also

Recipe 12.2, Recipe 12.3, Recipe 12.4, and Recipe 12.6 for uses of the higher-level SAX API. Expat was the brainchild of James Clark; Expat 2.0 is a group project, with a home page at http://expat.sourceforge.net/