文章来自《Python cookbook》.


-- Zoom.Quiet [2004-08-11 16:47:25]


2.4 Sorting While Guaranteeing Sort Stability


Credit: Alex Martelli, David Goodger

问题 Problem

You need to sort a Python list in a guaranteed-stable way (i.e., with no alteration in the relative ordering of items that compare equal).

你需要一个保证稳定的方法来排序一个python list(相等的元素的相关顺序不用变动)

解决 Solution

In addition to speed, decorate-sort-undecorate (DSU) offers flexibility that sort with a comparison function argument just can't match. For example, you can ensure the sort's stability, as follows:

除了速度之外, 装饰-排序-去除装饰(DSU)提供了比仅仅使有一个比较函数参数的排序所没有的弹性。例如, 你能够确保排序的稳定, 象下面一样:

   1 def stable_sorted_copy(alist, _indices=xrange(sys.maxint)):
   2     # Decorate: prepare a suitable auxiliary list
   3     decorated = zip(alist, _indices)
   5     # Sort: do a plain built-in sort on the auxiliary list
   6     decorated.sort(  )
   8     # Undecorate: extract the list from the sorted auxiliary list
   9     return [ item for item, index in decorated ]
  11 def stable_sort_inplace(alist):
  12     # To sort in place: assign sorted result to all-list slice of original list
  13     alist[:] = stable_sorted_copy(alist)

讨论 Discussion

The notion of a stable sort is typically not relevant if the sequences you are sorting contain objects that are uniform in type and are simple, such as a list of integers. In such cases, objects that compare equal are equal in all measurable ways. However, in cases where equality for the sake of ordering doesn't correspond to "deep" equality梥uch as tuples containing floating-point and integer numbers, or, more commonly, rich objects with arbitrary internal structure梚t may matter that the elements that start off earlier in the list than their "equal" counterparts remain earlier after the sort.

如果你正在排序的序列包含类型统一,简单的对象, 诸如一个整数list,稳定排序的概念是典型的不相关的。在那样的情况下,在所有可测量的方法里, 比较相等的对象是同等的。然而, 在这里为了顺序的缘故,等同性不对应【深度】相等。诸如tuple包含了浮点数和整数,或者更普通的, 具有内在结构的丰富对象是在list里开始的早的元素比排序后他们相等的副本保持的更早的实质[有个乱码,这句话不太懂]

Python lists' sort method is not guaranteed to be stable: items that compare equal may or may not be in unchanged order (they often are, but you cannot be sure). Ensuring stability is easy, as one of the many applications of the common DSU idiom. For another specific example of DSU usage, see Recipe 2.7.

Python list的排序方法不能保证是稳定的:对比相等的条目可以是或者可以不是在未改变的顺序(他们经常是, 但你不能确定)。 确保稳定是容易的, 作为普通DSU惯例的许多应用之一。其他特别的DSU的例子使用,请看 PyCkBk-2-7---Recipe 2.7

First, you build an auxiliary list (the decorate step), where each item is a tuple made up of all sort keys, in descending order of significance, of the corresponding item of the input sequence. You must include in each tuple of the auxiliary list all of the information in the corresponding item, and/or an index to it, so that you can fetch the item back, or reconstruct it, in the third step.

首先, 你构造一个辅助list(装饰步骤), 这里,每一个条目是一个由所有按递减排序的,和输入序列相对应条目的的key组成的tuple。 在每一个辅助list的tuple中, 你必须包括对应条目的所有信息,并且(或者)对他们索引。以便在第三步你能取回他们或者重构他们。

In the second step, you sort the auxiliary list by its built-in sort method, without arguments (this is crucial for performance).

在第二步, 你通过辅助list的内建排序方法来排序它, 而不用参数(这对性能是至关重要的)

Finally, you reconstruct the desired sorted list by undecorating the now-sorted auxiliary list. Steps 1 and 3 can be performed in several ways: with map, zip, list comprehensions, or explicit loops. List comprehensions and zip are generally simpler, although they require Python 2.0 or later. If you need to be compatible with Python 1.5.2, you will have to use map, which unfortunately is often less clear and readable.

最后,你通过去除现在排序的辅助list的装饰来重构list。 第一和第三步能够用好几种方法来执行: map, zip, list内涵, 或者显式的循环。一般来说List内涵和zip是比较简单的。尽管他们要求python2.0以后。 如果你需要同python 1.5.2兼容,你不得不使用map, 不幸的是它缺乏清晰性和可读性。

This idiom is also known as the "Schwartzian transform," by analogy with a related Perl idiom. However, that term implies using Perl's map and grep functions and performing the whole idiom inside of a single statement.

这个惯用方法也是众所周知的Schwartzian变换, 类似于perl相关的惯例。 让而,那个术语暗示使用perl的map和grep函数,并且在单个语句里执行整个的惯用方法。

DSU inherently supplies a sorted copy, but if the input sequence is a list, we can just assign to its include-everything slice to get an in-place effect instead.

DSU天生的就提供一个排序的拷贝, 但是如果输入序列是一个list。我们能仅仅赋给它一个包括万物的片断,就得到一个有效的替换。

This recipe specifically demonstrates using DSU to achieve a stable sort (i.e., a sort where items that compare equal keep the same relative order in the result list as they had in the input sequence). For this specific task, passing an argument to the built-in sort method is of no use. More generally, the DSU idiom can sometimes be replaced by passing a comparison function argument to sort, but DSU tends to be much faster, and speed often matters when you are sorting sequences that aren't tiny.

这个配方特别地使用DSU去获得一个稳定的排序(比较相等的条目在结果list里保持相同的关联顺序就象他们在输入序列里一样的一个排序)。 对于这个特别的任务, 传递一个参数到内建的sort方法没有使用。更普遍的是, DSU惯用方法能够有时候通过传递一个比较函数的参数来代替去排序。但是DSU趋于更快。并且当你正在排序一个不是很小的序列的时候,速度经常是紧要的。

The speed comes from maximally accelerating (by not using a Python-coded function as an argument to sort) the O(N log N) part, which dominates sorting time for sequences of substantial length N. The decoration and undecoration steps are both O(N), and thus they contribute negligible time if N is large enough and reasonably little time even for many practical values of N.

速度来自于最大的加速(不使用python代码函数作为一个参数去排序)O(N log N)部分, 它对于固定长度N的序列在排序时间上占优势,因而, 即使对于N的许多实际值, 如果N是足够大的,他们仅仅占用可以忽略的一点时间和适当少的时间。

Note the named argument _indices. This ensures that a single copy of the necessary xrange object is generated at function-definition time. It can then be reused to decorate any argument sequence with indexes, by exploiting zip's truncation behavior on unequal-length arguments (see Recipe 1.14).

注意命名的参数_indices。 这个确保一个单独的必要的xrange对象的拷贝是在函数定义时期产生的。通过在使用不等长度参数上zip的截断行为,它能被重新使用去装饰任何带有索引的参数序列。

参考 See Also

PyCkBk-1-14--Recipe 1.14 and PyCkBk-2-7--Recipe 2.7.

PyCkBk-2-4 (last edited 2009-12-25 07:14:02 by localhost)