文章来自《Python cookbook》.


-- Zoom.Quiet [2004-08-11 01:12:01]


1.13 Flattening a Nested Sequence

Credit: Luther Blissett

问题 Problem

You have a sequence, such as a list, some of whose items may in turn be lists, and so on. You need to flatten it out into a sequence of its scalar items (the leaves, if you think of the nested sequence as a tree).

你有一个序列, 例如一个list, 或许它的元素中的一些依次是list等等.你需要把它的元素按照它的标量展开到一个序列中去

解决 Solution

Of course, we need to be able to tell which of the elements we're handling are to be deemed scalar. For generality, say we're passed as an argument a predicate that defines what is scalar梐 function that we can call on any element and that returns 1 if the element is scalar or 0 otherwise. Given this, one approach is:


   1 def flatten(sequence, scalarp, result=None):
   2     if result is None: result = []
   3     for item in sequence:
   4         if scalarp(item): result.append(item)
   5         else: flatten(item, scalarp, result)
   6     return result

In Python 2.2, a simple generator is an interesting alternative, and, if all the caller needs to do is loop over the flattened sequence, may save the memory needed for the result list:


   1 from _ _future_ _ import generators
   2 def flatten22(sequence, scalarp):
   3     for item in sequence:
   4         if scalarp(item):
   5             yield item
   6         else:
   7             for subitem in flatten22(item, scalarp):
   8                 yield subitem

讨论 Discussion

The only problem with this recipe is that determining what is a scalar is not as obvious as it might seem, which is why I delegated that decision to a callable predicate argument that the caller is supposed to pass to flatten. Of course, we must be able to loop over the items of any non-scalar with a for statement, or flatten will raise an exception (since it does, via a recursive call, attempt a for statement over any non-scalar item). In Python 2.2, that's easy to check:

这个配方的一个问题是, 【决定一个标量是什么】似乎不是象它一样明显。那就是为什么我决定委托一个可调用的谓词参数,它的调用者被假定平坦传递的。 当然,我们必须能够使用for语句在任何非标量的元素上循环,或者平坦化将引发一个异常(因为,它通过一个递归调用,企图让 for 语句作用在任何非标量条目上)

   1 def canLoopOver(maybeIterable):
   2     try: iter(maybeIterable)
   3     except: return 0
   4     else: return 1

The built-in function iter, new in Python 2.2, returns an iterator, if possible. for x in s implicitly calls the iter function, so the canLoopOver function can easily check if for is applicable by calling iter explicitly and seeing if that raises an exception.

内建的函数iter,在python2.2中是新的,如果可能,它返回一个iterator。'for x in s' 隐含在s上调用 iter 函数, canLoopOver 函数通过显示调用 iter 并看它是否抛出异常,能容易地检查出是否可以用 for 在其上循环。

   1 def canLoopOver(maybeIterable):
   2     try:
   3         for x in maybeIterable:
   4             return 1
   5         else:
   6             return 1
   7     except:
   8         return 0

Here we have to rely on the for statement itself raising an exception if maybeIterable is not iterable after all. Note that this approach is not fully suitable for Python 2.2: if maybeIterable is an iterator object, the for in this approach consumes its first item.

这儿,如果maybeIterable根本不是可迭代的,我们不得不依赖于for语句自己抛出一个异常。注意,这个方法不是完全适用于python2.2。 如果maybeIterable是一个迭代对象,for在这个方法里消灭掉第一个条目。

Neither of these implementations of canLoopOver is entirely satisfactory, by itself, as our scalar-testing predicate. The problem is with strings, Unicode strings, and other string-like objects. These objects are perfectly good sequences, and we could loop on them with a for statement, but we typically want to treat them as scalars. And even if we didn't, we would at least have to treat any string-like objects with a length of 1 as scalars. Otherwise, since such strings are iterable and yield themselves as their only items, our flatten function would not cease recursion until it exhausted the call stack and raised a RuntimeError due to "maximum recursion depth exceeded."

canLoopOver的这些实现方法作为我们的标量测试谓语,没有一个是完全令人满意的。问题是string,Unicode string 和其他类string的对象。这些对象是非常好的序列。并且我们能用一个for语句在他们上面循环。但是我们不想把他们作为一个标量。 甚至于如果我们不,我们至少不得不把他们作为一个长度为1的标量。否则,因为那样的string是可迭代的,并且产生自己作为他们唯一的条目。我们平坦化函数不能引起一个异常,知道它花费调用堆栈并且由于“递归层数太深”而引发 RuntimeError 异常。

Fortunately, we can easily distinguish string-like objects by attempting a typical string operation on them:

幸运的是,我们能很容易的通过 在他们上面尝试一个典型的string操作来分辨类string对象。

   1 def isStringLike(obj):
   2     try: obj+''
   3     except TypeError: return 0
   4     else: return 1

Now, we finally have a good implementation for the scalar-checking predicate:


   1 def isScalar(obj):
   2     return isStringLike(obj) or not canLoopOver(obj)

By simply placing this isScalar function and the appropriate implementation of canLoopOver in our module, before the recipe's functions, we can change the signatures of these functions to make them easier to call in most cases. For example:

通过在我们模块里,在配方的函数前面,简单的放置这个isScalar函数和合适的canLoopOver实现,在大多数情况下,我们能够改变这些函数的签名让他们更容易调用。 例如:

def flatten22(sequence, scalarp=isScalar): 

Now the caller needs to pass the scalarp argument only in those (hopefully rare) cases where our definition of what is scalar does not quite meet the caller's application-specific needs.



1.13.4 See Also The Library Reference section on sequence types.

PyCkBk-1-13 (last edited 2009-12-25 07:18:03 by localhost)