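This article comes from the Python Cookbook.

The translation was made solely for personal study and has no connection with any commercial copyright dispute.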

-- 0.706 [2004-08-15 17:02:47]

1. Checking Whether a String Contains a Set of Characters

Credit: Jürgen Hermann, Horst Hansen

1.1. Problem

You need to check for the occurrence of any of a set of characters in a string.

1.2. Solution

The solution generalizes to any sequence (not just a string) and any set (any collection of items that can be looped over, not just one of characters); the only other requirement is that membership in the sequence can be tested with the in operator:

    def containsAny(str, set):
        """ Check whether sequence str contains ANY of the items in set. """
        return 1 in [c in str for c in set]

    def containsAll(str, set):
        """ Check whether sequence str contains ALL of the items in set. """
        return 0 not in [c in str for c in set]
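
On current Python 3 the same two checks are usually spelled with the built-in any and all functions. The sketch below is one possible modernization, not part of the original recipe; the names contains_any, contains_all, seq and items are chosen here for illustration and avoid shadowing the built-in str and set.

```python
def contains_any(seq, items):
    """Return True if seq contains at least one of the items."""
    # any() stops consuming the generator at the first true value.
    return any(c in seq for c in items)


def contains_all(seq, items):
    """Return True if seq contains every one of the items."""
    # all() stops consuming the generator at the first false value.
    return all(c in seq for c in items)
```

With an empty items argument these behave like the list-based versions above: contains_any returns False and contains_all returns True.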

1.3. Discussion

While the find and count string methods can check for substring occurrences, there is no ready-made function to check for the occurrence in a string of a set of characters.

While working on a condition to check whether a string contained the special characters used in the glob.glob standard library function, I came up with the above code (with help from the OpenProjects IRC channel #python). Written this way, it really is compatible with human thinking, even though you might not come up with such code intuitively. That is often the case with list comprehensions.

The following code creates a list of 1/0 values, one for each item in the set:

    [c in str for c in set]

Then this code checks whether there is at least one true value in that list:

    1 in [c in str for c in set]

Similarly, this checks that no false values are in the list:

    0 not in [c in str for c in set]
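
To make the intermediate list concrete, here is a small sketch for one input, assuming a current Python 3 interpreter, where in yields True/False rather than 1/0. Because True == 1 and False == 0, the original 1 in ... and 0 not in ... tests still work unchanged. The variable names text, chars and flags are illustrative only.

```python
text = '*.py'
chars = '*?[]'

# One boolean per character of chars: is that character present in text?
flags = [c in text for c in chars]
print(flags)             # [True, False, False, False]

print(1 in flags)        # True: at least one character was found
print(0 not in flags)    # False: some characters are missing
```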

Usage examples are best cast in the form of unit tests to be appended to the .py source file of this module, with the usual idiom to ensure that the tests execute if the module runs as a main script:

    if __name__ == "__main__":
        # unit tests, must print "OK!" when run
        assert containsAny('*.py', '*?[]')
        assert not containsAny('file.txt', '*?[]')
        assert containsAll('43221', '123')
        assert not containsAll('134', '123')
        print("OK!")

Of course, while the previous idioms are neat, there are alternatives (aren't there always?). Here are the most elementary (and thus, in a sense, the most Pythonic) alternatives:

    def containsAny(str, set):
        for c in set:
            if c in str: return 1
        return 0

    def containsAll(str, set):
        for c in set:
            if c not in str: return 0
        return 1
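
One practical difference between the explicit loop and the list-based version in the Solution is how many membership tests each performs. The following sketch, written for Python 3, counts them with a hypothetical CountingSeq wrapper (not part of the recipe) whose only job is to record each use of in.

```python
class CountingSeq:
    """Hypothetical wrapper that counts the membership tests it receives."""

    def __init__(self, data):
        self.data = data
        self.tests = 0

    def __contains__(self, item):
        self.tests += 1
        return item in self.data


def contains_any_loop(seq, items):
    for c in items:
        if c in seq:
            return True      # stop at the first hit
    return False


def contains_any_list(seq, items):
    # Builds the complete list of results before looking for a true value.
    return True in [c in seq for c in items]


loop_seq = CountingSeq('*.py')
contains_any_loop(loop_seq, '*?[]')

list_seq = CountingSeq('*.py')
contains_any_list(list_seq, '*?[]')

print(loop_seq.tests, list_seq.tests)   # 1 4
```

The loop returns as soon as '*' is found, while the list comprehension tests all four characters regardless.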

Here are some alternatives built on reduce and the functions of the operator module. These are the most concise and thus, in a sense, the most powerful. Note, however, that they do not return early: map and reduce process every item of set even after the answer is already determined.

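    from functools import reduce  # builtin on Python 2; this import is required on Python 3
    from operator import and_, or_, contains

    def containsAny(str, set):
        # contains(str, c) is the function form of "c in str"
        return reduce(or_, map(contains, len(set)*[str], set))

    def containsAll(str, set):
        return reduce(and_, map(contains, len(set)*[str], set))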

Here are some even slimmer variants of the latter that rely on a special method that string objects supply only in Python 2.2 and later:

    from functools import reduce  # builtin on Python 2; this import is required on Python 3
    from operator import and_, or_

    def containsAny(str, set):
        return reduce(or_, map(str.__contains__, set))

    def containsAll(str, set):
        return reduce(and_, map(str.__contains__, set))

And here is a tricky variant that relies on functionality also available in 2.0:

    def containsAll(str, set):
        # str.index raises ValueError for a missing item (needs Python 2's eager map)
        try: map(str.index, set)
        except ValueError: return 0
        else: return 1
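
One caveat when porting this trick: in Python 3, map returns a lazy iterator, so a bare map(str.index, set) never calls index and therefore never raises. A minimal Python 3 sketch, assuming the illustrative names contains_all_index, seq and items, forces the calls with list():

```python
def contains_all_index(seq, items):
    """Return True if seq.index succeeds for every item."""
    try:
        # list() consumes the lazy map object, so seq.index really runs
        # for each item and can raise ValueError for a missing one.
        list(map(seq.index, items))
    except ValueError:
        return False
    return True


# Creating the map object alone calls nothing, so no exception occurs here
# even though '2' is not in '134'.
lazy = map('134'.index, '123')
```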

Fortunately, this rather tricky approach lacks an immediately obvious variant applicable to implement containsAny. However, one last tricky scheme, based on string.translate's ability to delete all characters in a set, does apply to both functions:

    import string

    # As written this targets Python 2: string.maketrans and the
    # two-argument form of translate (table, characters to delete).
    notrans = string.maketrans('', '')  # identity "translation"

    def containsAny(str, set):
        return len(set) != len(set.translate(notrans, str))

    def containsAll(str, set):
        return 0 == len(set.translate(notrans, str))

This trick at least has some depth: it relies on set.translate(notrans, str) being the subsequence of set that is made of characters not in str. If that subsequence has the same length as set, no characters have been removed by set.translate, so no characters of set are in str. Conversely, if that subsequence has length 0, all characters have been removed, so all characters of set are in str. The translate method of string objects keeps coming up naturally when one wants to treat strings as sets of characters, partly because it's so speedy and partly because it's so handy and flexible. See Recipe 3.8 for another similar application.
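
The same idea can be sketched for Python 3, where the deletion is expressed through the third argument of str.maketrans instead of a second argument to translate. This is an adaptation under that assumption, not the recipe's own code, and the names contains_any_translate, contains_all_translate, text and chars are illustrative.

```python
def contains_any_translate(text, chars):
    """True if at least one character of chars occurs in text."""
    # The third argument of str.maketrans lists the characters to delete,
    # so leftover holds the characters of chars that are NOT in text.
    leftover = chars.translate(str.maketrans('', '', text))
    return len(leftover) != len(chars)


def contains_all_translate(text, chars):
    """True if every character of chars occurs in text."""
    leftover = chars.translate(str.maketrans('', '', text))
    return len(leftover) == 0
```

For example, with text '*.py' and chars '*?[]', the leftover is '?[]', which is shorter than chars, so at least one character matched.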

One last observation is that these different ways to approach the task have very different levels of generality. At one extreme, the earliest approaches, relying only on in (for looping on set and for membership in str) are the most general; they are not at all limited to string processing, and they make truly minimal demands on the representations of str and set. At the other extreme, the last approach, relying on the translate method, works only when both str and set are strings or closely mimic string objects' functionality.
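
As a brief illustration of that generality, the loop-based versions accept containers that are not strings at all. The sketch below assumes Python 3 and uses the illustrative names contains_any, contains_all, seq and items; the sample data is made up for the example.

```python
def contains_any(seq, items):
    for c in items:
        if c in seq:
            return True
    return False


def contains_all(seq, items):
    for c in items:
        if c not in seq:
            return False
    return True


# A list as the sequence and a tuple as the collection of wanted items.
print(contains_any([1, 2, 3], (3, 4)))    # True
print(contains_all([1, 2, 3], (3, 4)))    # False

# A dict supports "in" on its keys; a frozenset can be looped over.
print(contains_all({'a': 1, 'b': 2}, frozenset('ab')))    # True
```

A translate-based version could not handle these inputs, since lists, tuples and dicts have no translate method.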

1.4. See Also

Recipe 3.8; documentation for the translate and maketrans functions in the string module in the Library Reference.