文章来自《Python cookbook》.

翻译仅仅是为了个人学习,其它商业版权纠纷与此无关!

-- Zoom.Quiet [2004-08-11 16:54:01]

1. 在一个有序的序列里使用二分查找法来查找条目

2.6 Looking for Items in a Sorted Sequence Using Binary Search 在一个有序的序列里使用二分查找法来查找条目

Credit: Noah Spurrier, Alex Martelli

1.1. 问题 Problem

You need to look for a lot of items in a sequence.

你需要在一个序列里寻找许多条目。

1.2. 解决 Solution

Binary search is a basic algorithm provided by bisect in Python. How to perform a binary search to check if a value is present in a list can be summarized in three lines of code:

二分查找法是一个在python通过bisect来提供的基本算法。怎样去执行一个二分查找去检查是否一个值出现在一个list里可以总结成三行代码:

   1 thelist.sort(  )
   2 item_insert_point = bisect.bisect(thelist, theitem)
   3 is_present = thelist[item_insert_point-1:item_insert_point] == [theitem]

1.3. 讨论 Discussion

A friend of mine was searching a large file for missing lines. A linear search was taking forever, so a binary search was needed. The name of the bisect module misled him into believing that Python did not have a binary search algorithm. I created an example to show how bisect is used to perform a binary search.

我的一个朋友为了丢失的一些行正在查找一个大的文件。 一个线性查找永远是吸引人的。所以一个二分查找是需要的。Bisect模块的名字误导它相信python没有一个二分查找算法。我建立了一个例子去显示bisect怎么样被使用去执行一个二分查找。

The third line in the recipe may not be immediately obvious. If we know that thelist is not empty, this is much simpler and more obvious: 在这个配方里,第三行或许不是太明显。如果我们知道thelist不是空的,这样会更简单和更显著。

is_present = thelist[item_insert_point-1] == theitem 

However, this simpler approach raises an exception for an empty thelist, since indexing must check for a valid index (the only way to return an indicator of "no such index" is by raising IndexError). Slicing is more relaxed than indexing, since it results in an empty slice for invalid slice-boundary indexes. Therefore, in general:

然而,这个简单的方法为一个空的thelist引发一个异常, 因为必须检查索引的有效性(返回一个"no such index"的指示的唯一种方法是通过引发IndexError)。分片比索引更简单,因为它为无效的分片边界索引导致一个空的分片。所以,一般来说:

somelist[x:x+1] 

yields the same one-item list as:

象这样产生相同的一个条目的list

[somelist[x]] 

when x is a valid index in somelist. However, it results in an empty list ([]) when the indexing would raise an IndexError. The third line in the recipe uses this idea to avoid having to deal with exceptions, thereby handling empty and nonempty cases for thelist in a perfectly uniform way. Another approach is:

当x在somelist是一个有效的索引。然而,当索引的时候引发一个IndexError, 它将导致一个空list([])的产生。在配方里的第三行使用这个点子去避免处理异常,因而处理空的和非空的情况对于thelist来说,使用一个完美的统一的方法。另一种方法是:

is_present = thelist and thelist[item_insert_point-1] == theitem 

This exploits and's short-circuiting behavior to guard the indexing, instead of using slicing. Both approaches are worth keeping in mind, since their areas of applicability overlap but do not coincide.

它使用and的短路行为代替使用分片来保证索引。两种方法都值得记在脑子里,因为他们的使用范围重叠了,但并不一致。

In my friend's case, of course, an auxiliary dictionary was also a possibility, because strings are hashable. However, the approach based on a sorted list may be the only way out in a few other cases, with items that are not hashable (but only if they are comparable, of course, since otherwise the list cannot be sorted). Also, if the list is already sorted, and the number of items you need to look up in it is not extremely large, it may be faster to use bisect rather than to build an auxiliary dictionary, since the investment of time in the latter operation might not be fully amortized.

象我朋友的这种情况,当然,一个辅助的字段也是一种可能。因为string是hashable。然而,在一些其他情况里,没有hashable, 基于一个排序list的方法就是唯一种途径(但是如果他们是可比较的,当然,另外因为list不能被排序)。 同样,如果list已经被排序了。并且你需要查找的条目不是特别的大。或许使用bisect比构造一个辅助字典更快。因为后者的时间投资不可能完全被付清。

1.4. 参考 See Also

Documentation for the bisect module in the Library Reference.