1.9 Finding the Intersection of Two Dictionaries

Credit: Andy McKay, Chris Perkins, Sami Hangaslammi

1.9.1 Problem

Given two dictionaries, you need to find the set of keys that are in both dictionaries.

1.9.2 Solution

Dictionaries are a good concrete representation for sets in Python, so operations such as intersections are common. Say you have two dictionaries (but pretend that they each contain thousands of items):

    some_dict = { 'zope':'zzz', 'python':'rocks' }
    another_dict = { 'python':'rocks', 'perl':'$' }

Here's a bad way to find their intersection that is very slow:

    intersect = []
    for item in some_dict.keys():
        if item in another_dict.keys():
            intersect.append(item)
    print "Intersects:", intersect

And here's a good way that is simple and fast:

    intersect = []
    for item in some_dict.keys():
        if another_dict.has_key(item):
            intersect.append(item)
    print "Intersects:", intersect

In Python 2.2, the following is elegant and even faster:

    print "Intersects:", [k for k in some_dict if k in another_dict]

And here's an alternate approach that wins hands down in speed, for Python 1.5.2 and later:

    print "Intersects:", filter(another_dict.has_key, some_dict.keys())

1.9.3 Discussion

The keys method produces a list of all the keys of a dictionary. It can be pretty tempting to fall into the trap of just using in, with this list as the righthand side, to test for membership. However, in the first example, you're looping through all of some_dict, then for each of its keys looping through all of another_dict. If some_dict has N1 items, and another_dict has N2 items, your intersection operation will have a compute time proportional to the product N1 x N2. (O(N1 x N2) is the common computer-science notation to indicate this.)
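
To put rough numbers on that, here is a hypothetical back-of-the-envelope count (not part of the recipe itself), using the two 500-item dictionaries that the timing code later in this recipe builds:

    # Hypothetical illustration only: worst-case operation counts for
    # two 500-item dictionaries.
    N1 = N2 = 500
    print "bad way, key comparisons:", N1 * N2   # 250000, plus a fresh keys() list per pass
    print "has_key way, hash lookups:", N1       # 500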

By using the has_key method, you are not looping on another_dict any more, but rather checking the key in the dictionary's hash table. The processing time for has_key is basically independent of dictionary size, so the second approach is O(N1). The difference is quite substantial for large dictionaries! If the two dictionaries are very different in size, it becomes important to use the smaller one in the role of some_dict, while the larger one takes on the role of another_dict (i.e., loop on the keys of the smaller dictionary, thus picking the smaller N1).
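
As a minimal sketch of that size-ordering advice (the helper name intersect_keys is just an illustration, not something the recipe defines):

    # Illustrative helper, assuming Python 1.5.2-2.2 as in the recipe:
    # always loop over the smaller dictionary, so N1 stays as small as possible.
    def intersect_keys(d1, d2):
        if len(d1) > len(d2):
            d1, d2 = d2, d1                      # make d1 the smaller dictionary
        return filter(d2.has_key, d1.keys())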

Python 2.2 lets you iterate on a dictionary's keys directly, with the statement:

    for key in dict

You can test membership with the equally elegant:

    if key in dict

rather than the equivalent but syntactically less nice dict.has_key(key). Combining these two small but nice innovations of Python 2.2 with the list-comprehension notation introduced in Python 2.0, we end up with a very elegant approach, which is at the same time concise, clear, and quite speedy.

However, the fastest approach is the one that uses filter with the bound method another_dict.has_key on the list some_dict.keys(). A typical intersection of two 500-item dictionaries with 50% overlap, on a typical cheap machine of today (AMD Athlon 1.4GHz, DDR2100 RAM, Mandrake Linux 8.1), took 710 microseconds using has_key, 450 microseconds using the Python 2.2 technique, and 280 microseconds using the filter-based way. While these speed differences are almost substantial, they pale in comparison with the timing of the bad way, for which a typical intersection took 22,600 microseconds: about 30 times longer than the simple way and 80 times longer than the filter-based way! Here's the timing code, which shows a typical example of how one goes about measuring relative speeds of equivalent Python constructs:

    import time

    def timeo(fun, n=1000):
        # Time n calls of fun, subtracting the overhead of n empty calls.
        def void(): pass
        start = time.clock()
        for i in range(n): void()
        stend = time.clock()
        overhead = stend - start
        start = time.clock()
        for i in range(n): fun()
        stend = time.clock()
        thetime = stend - start
        return fun.__name__, thetime - overhead

    # Two 500-item dictionaries with 50% overlap: keys 0-499 versus the
    # even numbers 0-998.
    to500 = {}
    for i in range(500): to500[i] = 1
    evens = {}
    for i in range(0, 1000, 2): evens[i] = 1

    def simpleway():
        result = []
        for k in to500.keys():
            if evens.has_key(k):
                result.append(k)
        return result

    def pyth22way():
        return [k for k in to500 if k in evens]

    def filterway():
        return filter(evens.has_key, to500.keys())

    def badsloway():
        result = []
        for k in to500.keys():
            if k in evens.keys():
                result.append(k)
        return result

    for f in simpleway, pyth22way, filterway, badsloway:
        print "%s: %.2f" % timeo(f)

You can save this code into a .py file and run it (a few times, on an otherwise quiescent machine, of course) with python -O to check how the timings of the various constructs compare on any specific machine in which you're interested. (Note that this script requires Python 2.2 or later.) Timing different code snippets to find out how their relative speeds compare is an important Python technique, since intuition is a notoriously unreliable guide to such relative-speed comparisons. For detailed and general instruction on how to time things, see the introduction to Chapter 17.
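
As an aside (not part of the recipe, which targets Python 2.2), on Python 2.3 and later the standard timeit module automates this sort of measurement; a rough equivalent, appended to the same script, might look like this:

    # Rough sketch only, assuming Python 2.3+ and the functions defined above.
    import timeit
    setup = "from __main__ import simpleway, pyth22way, filterway, badsloway"
    for name in ("simpleway", "pyth22way", "filterway", "badsloway"):
        best = min(timeit.Timer("%s()" % name, setup).repeat(repeat=3, number=1000))
        print "%s: %.2f" % (name, best)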

When applicable without having to use a lambda form or a specially written function, filter, map, and reduce often offer the fastest solution to any given problem. Of course, a clever Pythonista cares about speed only for those very, very few operations where speed really matters more than clarity, simplicity, and elegance! But these built-ins are pretty elegant in their own way, too.

We don't have a separate recipe for the union of the keys of two dictionaries, but that's because the task is even easier, thanks to a dictionary's update method:

    def union_keys(some_dict, another_dict):
        temp_dict = some_dict.copy()
        temp_dict.update(another_dict)
        return temp_dict.keys()
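
For instance, applied to the two dictionaries from the Solution above, a quick check might look like this (key order is not guaranteed, since dictionary keys are unordered):

    # Using some_dict and another_dict as defined in the Solution.
    print "Union:", union_keys(some_dict, another_dict)
    # prints the three distinct keys 'zope', 'python', and 'perl', in some order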

1.9.4 See Also

The Library Reference section on mapping types.