-- Zoom.Quiet [2004-08-11 16:54:01]

在一个有序的序列里使用二分查找法来查找条目

2.6 Looking for Items in a Sorted Sequence Using Binary Search

Credit: Noah Spurrier, Alex Martelli

问题 Problem

You need to look for a lot of items in a sequence. 你需要在一个序列里寻找许多条目。

解决 Solution

Binary search is a basic algorithm provided by bisect in Python. How to perform a binary search to check if a value is present in a list can be summarized in three lines of code:

```   1 thelist.sort(  )
2 item_insert_point = bisect.bisect(thelist, theitem)
3 is_present = thelist[item_insert_point-1:item_insert_point] == [theitem]
```

讨论 Discussion

A friend of mine was searching a large file for missing lines. A linear search was taking forever, so a binary search was needed. The name of the bisect module misled him into believing that Python did not have a binary search algorithm. I created an example to show how bisect is used to perform a binary search.

The third line in the recipe may not be immediately obvious. If we know that thelist is not empty, this is much simpler and more obvious: 在这个配方里，第三行或许不是太明显。如果我们知道thelist不是空的，这样会更简单和更显著。

`is_present = thelist[item_insert_point-1] == theitem `

However, this simpler approach raises an exception for an empty thelist, since indexing must check for a valid index (the only way to return an indicator of "no such index" is by raising IndexError). Slicing is more relaxed than indexing, since it results in an empty slice for invalid slice-boundary indexes. Therefore, in general:

`somelist[x:x+1] `

yields the same one-item list as:

`[somelist[x]] `

when x is a valid index in somelist. However, it results in an empty list ([]) when the indexing would raise an IndexError. The third line in the recipe uses this idea to avoid having to deal with exceptions, thereby handling empty and nonempty cases for thelist in a perfectly uniform way. Another approach is:

`is_present = thelist and thelist[item_insert_point-1] == theitem `

This exploits and's short-circuiting behavior to guard the indexing, instead of using slicing. Both approaches are worth keeping in mind, since their areas of applicability overlap but do not coincide.

In my friend's case, of course, an auxiliary dictionary was also a possibility, because strings are hashable. However, the approach based on a sorted list may be the only way out in a few other cases, with items that are not hashable (but only if they are comparable, of course, since otherwise the list cannot be sorted). Also, if the list is already sorted, and the number of items you need to look up in it is not extremely large, it may be faster to use bisect rather than to build an auxiliary dictionary, since the investment of time in the latter operation might not be fully amortized.

Documentation for the bisect module in the Library Reference.

PyCkBk-2-6 (last edited 2009-12-25 07:14:02 by localhost)

• Page.execute = 1.797s
• getACL = 0.545s
• init = 0.029s