This article is from the Python Cookbook.

This translation was made purely for personal study and has nothing to do with any commercial copyright dispute.

-- 61.182.251.99 [2004-09-24 23:03:18]

1. Description

Updating a Random-Access File

Credit: Luther Blissett

1.1. Problem

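You want to read one particular binary record, at an arbitrary position, from a file made up of fixed-length records, change the record's values, and then write the updated record back to the file.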

1.2. Solution

Read the record (see 4.14), unpack it, perform whatever computations you need for the update, pack the fields back into the record, seek to the start of the record again, and write it back. Phew. Faster to code than to say:

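import struct

# Example values (assumptions for illustration, not part of the original recipe):
format_string = '8l'    # layout of one record
record_number = 0       # zero-based index of the record to update

thefile = open('somebinfile', 'r+b')                 # open for reading and writing, in binary mode
record_size = struct.calcsize(format_string)         # size in bytes of one record

thefile.seek(record_size * record_number)            # move to the start of the target record
buffer = thefile.read(record_size)                   # read the raw bytes of the record
fields = list(struct.unpack(format_string, buffer))  # unpack; turn the returned tuple into a list

# Perform computations, suitably modifying fields, then:

buffer = struct.pack(format_string, *fields)         # pack the fields back into a record
thefile.seek(record_size * record_number)            # return to the start of the record
thefile.write(buffer)                                # write the record back

thefile.close()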
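The recipe can be tried end to end on a throwaway file. The following is a minimal sketch, not the book's code: the names update_record, read_record and double_first_field, the '<8l' layout, and the temporary file are all assumptions made for illustration.

```python
import os
import struct
import tempfile

FORMAT = '<8l'                          # assumed layout: eight little-endian 4-byte signed ints
RECORD_SIZE = struct.calcsize(FORMAT)   # 32 bytes per record


def update_record(path, record_number, change):
    """Read one record, let change(fields) modify the list in place, write it back."""
    with open(path, 'r+b') as f:
        f.seek(RECORD_SIZE * record_number)
        fields = list(struct.unpack(FORMAT, f.read(RECORD_SIZE)))
        change(fields)
        f.seek(RECORD_SIZE * record_number)
        f.write(struct.pack(FORMAT, *fields))


def read_record(path, record_number):
    """Return the fields of one record as a tuple."""
    with open(path, 'rb') as f:
        f.seek(RECORD_SIZE * record_number)
        return struct.unpack(FORMAT, f.read(RECORD_SIZE))


def double_first_field(fields):
    fields[0] *= 2


# Build a temporary file of three records, then update only the middle one.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    for n in range(3):
        f.write(struct.pack(FORMAT, *range(n * 10, n * 10 + 8)))

update_record(demo_path, 1, double_first_field)
first = read_record(demo_path, 0)
middle = read_record(demo_path, 1)
os.remove(demo_path)
```

Only the 32 bytes of record 1 are rewritten; the neighbouring records are left as they were.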

1.3. Discussion

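This approach works only on files (generally binary ones) whose records all have the same fixed size; it does not work on ordinary text files. In addition, the size of each record must equal the size determined by the struct format string used in the code.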

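A typical format string might be "8l" (translator's note: that is the letter "l" as in "letter", not the digit "1" as in "123"), which specifies that each record is made up of eight four-byte integers, each interpreted as a signed value and unpacked into a Python int. The four-byte size holds where the platform's native C long is four bytes; with struct's standard sizes, as in "<8l", an "l" is always four bytes. In this case, the fields variable in the recipe would be bound to a list of eight ints.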
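How large a record is depends on the format string, so it is worth checking with struct.calcsize. The sketch below compares the native "8l" with the standard-size "<8l"; the variable names are illustrative only.

```python
import struct

# Native mode: size is 8 * sizeof(long) on the current platform (32 or 64 bytes on common systems).
native_size = struct.calcsize('8l')

# '<' selects little-endian standard sizes, where 'l' is always 4 bytes.
standard_size = struct.calcsize('<8l')

# Round trip: pack eight ints into one record, then unpack them into a list.
record = struct.pack('<8l', *range(8))
fields = list(struct.unpack('<8l', record))
```

If the file has to be read on more than one platform, a format with an explicit byte-order prefix is probably the safer choice, since its size does not vary.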

Note that struct.unpack returns a tuple. Because tuples are immutable, the computation would have to rebind the entire fields variable. A list is not immutable, so each field can be rebound as needed. Thus, for convenience, we explicitly ask for a list when we bind fields.

Make sure, however, not to alter the length of the list. In this case, it needs to remain composed of exactly eight integers, or the struct.pack call will raise an exception when we call it with a format_string that is still "8l". Also note that this recipe is not suitable for working with records that are not all of the same, unchanging length.
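The length constraint is easy to demonstrate. In this small sketch (the '<8l' format and the variable names are assumptions for illustration), rebinding an item is fine, while appending a ninth item makes struct.pack raise struct.error.

```python
import struct

FORMAT = '<8l'
fields = list(range(8))

fields[3] = -42                          # rebinding one item keeps the length at eight
packed = struct.pack(FORMAT, *fields)    # packs without complaint

fields.append(99)                        # nine items no longer match the format
try:
    struct.pack(FORMAT, *fields)
    failed = False
except struct.error:
    failed = True                        # struct.pack rejected the wrong item count
```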

To seek back to the start of the record, instead of using the record_size*record_number offset again, you may choose to do a relative seek:

thefile.seek(-record_size, 1)

The second argument to the seek method (1) tells the file object to seek relative to the current position (here, so many bytes back, because we used a negative number as the first argument).

seek's default is to seek to an absolute offset within the file (i.e., from the start of the file). You can also explicitly request this default behavior by calling seek with a second argument of 0.
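The two seek modes can be compared side by side. This sketch uses io.BytesIO as a stand-in for a file opened with 'r+b' (an assumption made so the example needs no file on disk), with a two-field record format chosen only for brevity.

```python
import io
import struct

FORMAT = '<2l'
SIZE = struct.calcsize(FORMAT)           # 8 bytes per record

# Four records: (0, 0), (1, 1), (2, 2), (3, 3).
f = io.BytesIO(b''.join(struct.pack(FORMAT, n, n) for n in range(4)))

f.seek(SIZE * 2)                         # absolute seek; the second argument defaults to 0
a, b = struct.unpack(FORMAT, f.read(SIZE))

f.seek(-SIZE, 1)                         # relative seek: back up over the record just read
position_after_relative_seek = f.tell()
f.write(struct.pack(FORMAT, a + 100, b))

f.seek(SIZE * 2, 0)                      # the same absolute seek, with the 0 spelled out
updated = struct.unpack(FORMAT, f.read(SIZE))
```

The relative form avoids recomputing the offset, but it is only correct when exactly one record has been read since the last seek.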

Of course, you don't need to open the file just before you do the first seek or close it right after the write. Once you have a file object that is correctly opened (i.e., for update, and as a binary rather than a text file), you can perform as many updates on the file as you want before closing the file again. These calls are shown here to emphasize the proper technique for opening a file for random-access updates and the importance of closing a file when you are done with it.
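As a sketch of several updates sharing one open file object, the following increments a counter in three records, visited in arbitrary order. The record layout (a key and a count), the temporary file, and the variable names are assumptions made for this example.

```python
import os
import struct
import tempfile

FORMAT = '<2l'                           # assumed layout: (key, count)
SIZE = struct.calcsize(FORMAT)

# Create five records with a count of zero.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    for n in range(5):
        f.write(struct.pack(FORMAT, n, 0))

# One open, several updates, one close.
with open(path, 'r+b') as f:
    for record_number in (4, 0, 2):
        f.seek(SIZE * record_number)
        key, count = struct.unpack(FORMAT, f.read(SIZE))
        f.seek(-SIZE, 1)
        f.write(struct.pack(FORMAT, key, count + 1))

with open(path, 'rb') as f:
    counts = [struct.unpack(FORMAT, f.read(SIZE))[1] for _ in range(5)]
os.remove(path)
```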

The file needs to be opened for updating (i.e., to allow both reading and writing). That's what the 'r+b' argument to open means: open for reading and writing, but do not implicitly perform any transformations on the file's contents, because the file is a binary one (the 'b' part is unnecessary but still recommended for clarity on Unix and Unix-like systems; however, it's absolutely crucial on other platforms, such as Macintosh and Windows). If you're creating the binary file from scratch but you still want to be able to reread and update some records without closing and reopening the file, you can use a second argument of 'w+b' instead. However, I have never witnessed this strange combination of requirements; binary files are normally first created (by opening them with 'wb', writing data, and closing the file) and later opened for update with 'r+b'.
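For completeness, here is a sketch of the less common 'w+b' case: the file is created, written, reread and updated through a single file object. The file name records.bin, the temporary directory, and the record layout are assumptions for illustration.

```python
import os
import struct
import tempfile

FORMAT = '<2l'
SIZE = struct.calcsize(FORMAT)

path = os.path.join(tempfile.mkdtemp(), 'records.bin')
with open(path, 'w+b') as f:             # create (or truncate) the file, allowing reads too
    for n in range(3):
        f.write(struct.pack(FORMAT, n, n * n))

    f.seek(SIZE * 1)                     # reread record 1 without reopening the file
    key, value = struct.unpack(FORMAT, f.read(SIZE))
    f.seek(SIZE * 1)
    f.write(struct.pack(FORMAT, key, value + 1000))

    f.seek(0)                            # read everything back to check the result
    records = [struct.unpack(FORMAT, f.read(SIZE)) for _ in range(3)]

os.remove(path)
os.rmdir(os.path.dirname(path))
```

Keep in mind that 'w+b' truncates an existing file, so it is not a substitute for 'r+b' when the data is already on disk.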

1.4. See Also

The sections on file objects and the struct module in the Python Library Reference;

Perl Cookbook recipe 8.13