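This article is from the Python Cookbook.

The translation was made purely for personal study and has no connection with any commercial copyright dispute.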

-- 0.706 [2004-09-13 20:18:24]

1. Accessing Substrings

1.1. Problem

You want to access portions of a string. For example, you've read a fixed-width record and want to extract its fields.

1.2. Solution

Slicing works well, of course, but it extracts only one field at a time:

    afield = theline[3:8]
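
As a small illustration of slicing one field at a time, the sketch below pulls three fields out of a made-up fixed-width record. The record layout and the names `record`, `name`, `dept`, and `year` are assumptions for the example, not part of the recipe.

```python
# Hypothetical fixed-width record: 10-char name, 6-char department, 4-char year.
record = "Alice".ljust(10) + "Sales".ljust(6) + "2004"

name = record[0:10].rstrip()   # columns 0-9, padding removed
dept = record[10:16].rstrip()  # columns 10-15, padding removed
year = int(record[16:20])      # columns 16-19, converted to a number

print(name, dept, year)
```

Each field needs its own slice expression, which is the limitation the recipe points out.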

If you need to think in terms of field lengths, struct.unpack may be appropriate. Here's an example that gets a five-byte string, skips three bytes, gets two eight-byte strings, and then gets the rest:

    import struct
    # Get a 5-byte string, skip 3, get two 8-byte strings, then all the rest:
    baseformat = "5s 3x 8s 8s"
    numremain = len(theline) - struct.calcsize(baseformat)
    format = "%s %ds" % (baseformat, numremain)
    leading, s1, s2, trailing = struct.unpack(format, theline)
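
The recipe predates Python 3, where struct.unpack accepts only bytes-like objects rather than text strings. Below is a minimal sketch of the same technique under that assumption; the sample value of `theline` is invented for illustration, and the format variable is called `fmt` here only to avoid shadowing the built-in `format`.

```python
import struct

# Invented 30-byte record; in Python 3 it must be bytes, not str.
theline = b"AAAAA" + b"---" + b"B" * 8 + b"C" * 8 + b"tail!!"

baseformat = "5s 3x 8s 8s"
# Whatever is left after the fixed fields becomes one trailing string field.
numremain = len(theline) - struct.calcsize(baseformat)
fmt = "%s %ds" % (baseformat, numremain)

leading, s1, s2, trailing = struct.unpack(fmt, theline)
print(leading, s1, s2, trailing)
```

The three skipped bytes (`---`) do not appear in the result, so four values come back for five format items.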

If you need to split at five-byte boundaries, here's how you could do it:

    # Unpack as many 5-byte strings as fit, skipping any leftover bytes:
    numfives, therest = divmod(len(theline), 5)
    form5 = "%s %dx" % ("5s " * numfives, therest)
    fivers = struct.unpack(form5, theline)
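
To make the five-byte split concrete, here is a sketch that assumes Python 3 and an invented 23-byte sample. It also compares the struct result with plain slicing over a stepped range, which is offered only as a cross-check, not as the recipe's method.

```python
import struct

theline = b"abcdefghijklmnopqrstuvw"  # invented 23-byte sample

numfives, therest = divmod(len(theline), 5)
# Four "5s" fields followed by a 3-byte skip for the leftover bytes.
form5 = "%s %dx" % ("5s " * numfives, therest)
fivers = struct.unpack(form5, theline)

# Plain slicing over a stepped range yields the same full-width chunks.
sliced = tuple(theline[i:i + 5] for i in range(0, numfives * 5, 5))
print(fivers)
```

In both variants the trailing three bytes are dropped, because they do not fill a complete five-byte block.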

Chopping a string into individual characters is of course easier:

    chars = list(theline)

If you prefer to think of your data as being cut up at specific columns, slicing within list comprehensions may be handier:

    cuts = [8, 14, 20, 26, 30]
    # A final bound of None slices through to the end of the line:
    pieces = [theline[i:j] for i, j in zip([0] + cuts, cuts + [None])]
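
The column-based approach can be sketched on an invented 40-character line as follows. Using `None` as the last slice bound stands for "through the end of the string" and avoids depending on a platform-specific maximum integer; the sample data is an assumption for the example.

```python
theline = "0123456789" * 4          # invented 40-character sample
cuts = [8, 14, 20, 26, 30]

# Pair each start column with the next cut; None as the final stop
# means "to the end of the string".
pieces = [theline[i:j] for i, j in zip([0] + cuts, cuts + [None])]
print(pieces)
```

Five cut points produce six pieces, and joining the pieces gives back the original line.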

1.3. Discussion

This recipe was inspired by Recipe 1.1 in the Perl Cookbook. Python's slicing takes the place of Perl's substr. Perl's built-in unpack and Python's struct.unpack are similar. Perl's is slightly handier, as it accepts a field length of * for the last field to mean all the rest. In Python, we have to compute and insert the exact length for either extraction or skipping. This isn't a major issue, because such extraction tasks will usually be encapsulated into small, probably local functions. Memoizing, or automatic caching, may help with performance if the function is called repeatedly, since it lets you avoid rebuilding the format string for struct.unpack on every call. See also Recipe 17.8.

In a purely Python context, the point of this recipe is to remind you that struct.unpack is often viable, and sometimes preferable, as an alternative to string slicing (not quite as often as unpack versus substr in Perl, given the lack of a *-valued field length, but often enough to be worth keeping in mind).

Each of these snippets is, of course, best encapsulated in a function. Among other advantages, encapsulation ensures we don't have to work out the length of the last field on each and every use. This function is the equivalent of the first struct.unpack snippet in the solution:

    def fields(baseformat, theline, lastfield=None):
        # Keep the trailing bytes as a string if lastfield is true, else skip them.
        numremain = len(theline) - struct.calcsize(baseformat)
        format = "%s %d%s" % (baseformat, numremain, lastfield and "s" or "x")
        return struct.unpack(format, theline)

If this function is called in a loop, caching the computed format under a key of (baseformat, len(theline), lastfield) may offer an easy speed-up.

The function equivalent of the second snippet in the solution is:

    def split_by(theline, n, lastfield=None):
        numblocks, therest = divmod(len(theline), n)
        baseblock = "%ds" % n
        # Only the leftover bytes are kept ("s") or skipped ("x") per lastfield.
        format = "%s %d%s" % (baseblock * numblocks, therest, lastfield and "s" or "x")
        return struct.unpack(format, theline)

And for the third snippet:

    def split_at(theline, cuts, lastfield=None):
        pieces = [theline[i:j] for i, j in zip([0] + cuts, cuts)]
        if lastfield:
            # Keep everything after the last cut as one final piece.
            pieces.append(theline[cuts[-1]:])
        return pieces

In each of these functions, a decision worth noticing (and, perhaps, worth criticizing) is that of having a lastfield=None optional parameter. This reflects the observation that while we often want to skip the last, unknown-length subfield, sometimes we want to retain it instead. The use of lastfield in the expression lastfield and "s" or "x" (equivalent to C's lastfield?'s':'x') saves an if/else, but it's unclear whether the saving is worth it. "sx"[not lastfield] and other similar alternatives are roughly equivalent in this respect; see Recipe 17.6. When lastfield is false, applying struct.unpack to just a prefix of theline (specifically, theline[:struct.calcsize(format)]) is an alternative, but it's not easy to merge with the case of lastfield being true, when the format does need a supplementary field for len(theline)-struct.calcsize(format).
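
The selection idioms mentioned above can be compared side by side in a short sketch. The helper names are hypothetical, and the conditional expression shown last was added to Python in version 2.5, after this recipe was written.

```python
def pick_and_or(lastfield):
    # The and/or idiom; safe here only because "s" is itself a true value.
    return lastfield and "s" or "x"

def pick_index(lastfield):
    # not lastfield is False (0) or True (1), which indexes into "sx".
    return "sx"[not lastfield]

def pick_conditional(lastfield):
    # Conditional expression, available from Python 2.5 onward.
    return "s" if lastfield else "x"

# All three agree for a range of true and false values.
for flag in (None, 0, "", 1, True, "yes"):
    assert pick_and_or(flag) == pick_index(flag) == pick_conditional(flag)
```

The and/or form would give the wrong answer if its middle operand were a false value such as an empty string, which is one reason the conditional expression is usually easier to trust.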

1.4. See Also

Recipe 17.6 and Recipe 17.8; Perl Cookbook Recipe 1.1.