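This article comes from the Python Cookbook.

The translation was made solely for personal study and has nothing to do with any commercial copyright dispute.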

-- 0.706 [2004-08-18 17:18:27]

1. Filtering a String for a Set of Characters

"Credit: Jürgen Hermann, Nick Perkins"

1.1. Problem

Given a set of characters to keep, you need to build a filtering functor (a function-like, callable object). Applied to any string s, the functor returns a copy of s that contains only the characters in the set.

1.2. Solution

The string.maketrans function and the translate method of string objects are fast and handy for all tasks of this kind:

    # Python 2 code: string.maketrans, two-argument str.translate, print statement
    import string

    # Make a reusable string of all characters
    _allchars = string.maketrans('', '')

    def makefilter(keep):
        """ Return a functor that takes a string and returns a partial copy of that
            string consisting of only the characters in 'keep'.
        """
        # Make a string of all characters that are not in 'keep'
        delchars = _allchars.translate(_allchars, keep)

        # Return the functor, binding the two strings as default args
        return lambda s, a=_allchars, d=delchars: s.translate(a, d)

    def canonicform(keep):
        """ Given a string, considered as a set of characters, return the
            string's characters as a canonic-form string: alphabetized
            and without duplicates.
        """
        return makefilter(keep)(_allchars)

    if __name__ == '__main__':
        identifier = makefilter(string.letters + string.digits + '_')
        print identifier(_allchars)

1.3. Discussion

The key to understanding this recipe lies in the definitions of the translate and maketrans functions in the string module. translate takes a string and replaces each character in it with the corresponding character in the translation table passed in as the second argument, deleting the characters specified in the third argument. (Translator's note: when translate is called as a method of a string object, the translation table is the first argument and the characters to delete are given in the second.) maketrans is a utility routine that helps create the translation tables.
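How a translation table and a deletion string interact can be sketched as follows. This is an illustration that assumes Python 3, where bytes.maketrans and bytes.translate play the role the string module functions play in the recipe; the sample data is made up.

```python
# Identity table: byte i maps to byte i.
identity = bytes.maketrans(b'', b'')

# A table that maps a->1, b->2, c->3 and leaves every other byte alone.
table = bytes.maketrans(b'abc', b'123')

data = b'aabbcc xyz'

# Translation only: each byte is looked up in the table.
translated = data.translate(table)            # b'112233 xyz'

# Deletion only: the identity table changes nothing, the second
# argument lists the bytes to drop.
deleted = data.translate(identity, b'xyz ')   # b'aabbcc'

# Both at once: bytes are deleted first, the rest are translated.
both = data.translate(table, b'xyz ')         # b'112233'
```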

Efficiency is vastly improved by splitting the filtering task into preparation and execution phases. The string of all characters is clearly reusable, so we build it once and for all when this module is imported. That way, each filtering functor holds a reference to the same string of all characters and no memory is wasted. The string of characters to delete depends on the set of characters to keep, so we build it in the makefilter factory function. This is done quite rapidly by using the translate method to delete the characters to keep from the string of all characters. The translate method is very fast, as are the construction and execution of these useful little functors.

The solution also supplies an extremely simple function to put any set of characters, originally an arbitrary string, into canonic-string form (alphabetically sorted, without duplicates). The same trick that the canonicform function encapsulates is also used explicitly in the test code that is executed when this runs as a script.
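The two phases, and the canonic-form trick, can be illustrated with a short sketch. It assumes Python 3 and bytes objects rather than the Python 2 strings of the recipe; digits_only and vowels_only are hypothetical filters chosen for the example.

```python
# Preparation shared by every filter: built once, at import time.
_allbytes = bytes(range(256))

def makefilter(keep):
    # Preparation specific to one filter: the bytes to delete.
    delbytes = _allbytes.translate(None, keep)
    return lambda s, a=_allbytes, d=delbytes: s.translate(a, d)

digits_only = makefilter(b'0123456789')
vowels_only = makefilter(b'aeiou')

# Both functors refer to the very same table object, not to copies of it.
shared = digits_only.__defaults__[0] is vowels_only.__defaults__[0]   # True

# Execution phase: a single translate call per input.
phone = digits_only(b'tel: +1 (555) 010-9999')    # b'15550109999'

# Canonic form: filtering the table of all bytes yields the kept set,
# sorted by byte value and free of duplicates.
canonic = makefilter(b'mississippi')(_allbytes)   # b'imps'
```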

Of course, you don't have to use lambda (here or anywhere else). A named function local to the factory function will do just as well. In other words, this recipe works fine if you change makefilter's return statement into the following two statements:

        def filter(s, a=_allchars, d=delchars): return s.translate(a, d)
        return filter

Many Pythonistas would consider this clearer and more readable.

This isn't a big issue, but remember that lambda is never necessary. In any case in which you find yourself straining to fit code into a lambda's limitations (i.e., just an expression, with no statements allowed), you can and should always use a local named function instead, to avoid all the limitations and problems.
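As a hedged illustration of that point, a local named function can hold a docstring and statements that a lambda cannot. The sketch assumes Python 3 and bytes input; the type check is a made-up requirement added only to show a statement inside the functor.

```python
_allbytes = bytes(range(256))

def makefilter(keep):
    delbytes = _allbytes.translate(None, keep)

    def keep_only(s, a=_allbytes, d=delbytes):
        """Return the bytes of s that belong to the kept set."""
        # An if/raise pair is a statement, so it could not appear in a lambda.
        if not isinstance(s, (bytes, bytearray)):
            raise TypeError('expected a bytes-like object, got %s' % type(s).__name__)
        return s.translate(a, d)

    return keep_only

hex_digits = makefilter(b'0123456789abcdef')
cleaned = hex_digits(b'de:ad:be:ef')   # b'deadbeef'
```

The named function also shows up under its own name in tracebacks, whereas every lambda is reported as "<lambda>".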

With Python 2.2, or Python 2.1 and a from __future__ import nested_scopes, you get lexically nested scopes, so that if you want to, you can avoid binding _allchars and delchars as default values for arguments in the returned functor. However, it is (marginally) faster to use this binding anyway: local variables are the fastest kind to access, and arguments are nothing but prebound local variables. Globals and names from nested scopes require a little more effort from the interpreter (and sometimes, perhaps more significantly, from a human being who is reading the code). This is why we bind _allchars as argument a here despite the fact that, in any release of Python, we could have just accessed it as a global variable.

1.4. See Also

Documentation for the maketrans function in the string module in the Library Reference.

last edited 2004-08-20 23:35:13 by 0.706