This article is from the Python Cookbook.

The translation was made for personal study only and is not connected with any commercial copyright dispute.

-- 218.25.65.133 [2004-09-30 04:40:07]

1. Description

Computing Directory Sizes in a Cross-Platform Way

Credit: Frank Fejes

1.1. Problem

You need to compute the total size of a directory (or set of directories) in a way that works under both Windows and Unix-like platforms.

1.2. Solution

There are easier platform-dependent solutions, such as Unix's du, but Python also makes it quite feasible to have a cross-platform solution:

import os
from os.path import isdir, islink, join

class DirSizeError(Exception):
    pass

def dir_size(start, follow_links=0, start_depth=0, max_depth=0, skip_errs=0,
             units='b'):
    # Get a list of all names of files and subdirectories in directory start
    try:
        dir_list = os.listdir(start)
    except OSError:
        # If start is a directory, we probably have permission problems
        # (translator's note: no read permission)
        if isdir(start):
            raise DirSizeError('Cannot list directory %s' % start)
        else:  # otherwise, just re-raise the error so that it propagates
            raise

    total = 0
    for item in dir_list:
        # Get statistics on each item--file and subdirectory--of start
        path = join(start, item)
        try:
            stats = os.stat(path)
        except OSError:
            # (translator's note: no read permission)
            if not skip_errs:
                raise DirSizeError('Cannot stat %s' % path)
            continue  # skip_errs is set: ignore this item and move on
        # The size in bytes is the st_size field (index 6 of the stat result)
        total += stats.st_size
        # recursive descent if warranted
        # (translator's note: walk into subdirectories and add up their sizes)
        if isdir(path) and (follow_links or not islink(path)):
            size = dir_size(path, follow_links, start_depth + 1, max_depth,
                            skip_errs, units)
            total += size
            if max_depth and (start_depth < max_depth):
                print_path(path, size, units)
    return total

def print_path(path, size, units='b'):
    if units == 'k':
        print('%-8d%s' % (size // 1024, path))
    elif units == 'm':
        print('%-5d%s' % (size // 1024 // 1024, path))
    else:
        print('%-11d%s' % (size, path))

def usage(name):
    print("usage: %s [-bkLm] [-d depth] directory [directory...]" % name)
    print('\t-b\t\tDisplay in Bytes (default)')
    print('\t-k\t\tDisplay in Kilobytes')
    print('\t-m\t\tDisplay in Megabytes')
    print('\t-L\t\tFollow symbolic links (meaningful on Unix only)')
    print('\t-d, --depth\t# of directories down to print (default = 0)')

if __name__ == '__main__':
    # When used as a script:
    import sys, getopt

    units = 'b'
    follow_links = 0
    depth = 0

    try:
        # (translator's note: parse the command-line arguments)
        opts, args = getopt.getopt(sys.argv[1:], "bkLmd:", ["depth="])
    except getopt.GetoptError:
        usage(sys.argv[0])
        sys.exit(1)

    for o, a in opts:
        if o == '-b': units = 'b'
        elif o == '-k': units = 'k'
        elif o == '-L': follow_links = 1
        elif o == '-m': units = 'm'
        elif o in ('-d', '--depth'):
            try:
                depth = int(a)
            except ValueError:
                print("Not a valid integer: (%s)" % a)
                usage(sys.argv[0])
                sys.exit(1)

    if len(args) < 1:
        print("No directories specified")
        usage(sys.argv[0])
        sys.exit(1)

    for path in args:
        try:
            size = dir_size(path, follow_links, 0, depth, units=units)
        except DirSizeError as x:
            print("Error:", x)
        else:
            print_path(path, size, units)

1.3. Discussion

Unix-like platforms have the du command, but that doesn't help when you need information about disk-space usage in a cross-platform way. This recipe has been tested under both Windows and Unix, although it is most useful under Windows, where the normal way of getting this information requires using a GUI. In any case, the recipe's code can be used either as a module (in which case you'll normally call only the dir_size function) or as a command-line script. Typical use as a script is:

C:\> python dir_size.py "c:\Program Files"

This will give you some idea of where all your disk space has gone. To help you narrow the search, you can, for example, display each subdirectory:

C:\> python dir_size.py --depth=1 "c:\Program Files"
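To picture what a depth-limited report contains, here is a minimal sketch written for Python 3. It is not the recipe's code: sizes_by_depth and its nested helper descend are hypothetical names, it uses os.scandir rather than os.listdir, and it counts only regular files (the recipe also adds the size reported for each directory entry), so its numbers can differ slightly from those of dir_size.

```python
import os

def sizes_by_depth(start, max_depth=1):
    """Return (total, report) for start (hypothetical helper, not the recipe's).

    report holds (path, size) pairs for subdirectories that lie at most
    max_depth levels below start, much like the lines --depth prints.
    """
    report = []

    def descend(path, depth):
        total = 0
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    # Recurse first, then report the subdirectory if it is
                    # still within the requested depth.
                    size = descend(entry.path, depth + 1)
                    if depth < max_depth:
                        report.append((entry.path, size))
                    total += size
                elif entry.is_file(follow_symlinks=False):
                    total += entry.stat(follow_symlinks=False).st_size
        return total

    return descend(start, 0), report
```

Calling sizes_by_depth("c:\\Program Files", 1) would return the overall total together with one entry per immediate subdirectory; deeper directories still contribute to the totals but are not listed.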

The recipe's operation is based on recursive descent. os.listdir provides a list of names of all the files and subdirectories of a given directory. If dir_size finds a subdirectory, it calls itself recursively. An alternative architecture might be based on os.path.walk, which handles the recursion on our behalf and just does callbacks to a function we specify, for each subdirectory it visits (os.path.walk was removed in Python 3, where the generator-based os.walk fills a similar role). However, here we need to be able to control the depth of descent (e.g., to allow the useful --depth command-line option, which turns into the max_depth argument of the dir_size function). This control is easier to attain when we administer the recursion directly, rather than letting os.path.walk handle it on our behalf.
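For comparison, the walk-based architecture mentioned above can be sketched as follows. This assumes Python 3 and therefore uses os.walk instead of os.path.walk; walk_size is a hypothetical name, and the sketch sums regular files only and silently skips entries it cannot stat, which is a simplification rather than the recipe's behavior.

```python
import os

def walk_size(start, follow_links=False):
    """Total size in bytes of the files below start (hypothetical helper)."""
    total = 0
    # os.walk performs the recursion and yields one tuple per directory
    for dirpath, dirnames, filenames in os.walk(start, followlinks=follow_links):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.stat(path).st_size
            except OSError:
                # Unreadable or dangling entry: skipped in this sketch
                continue
    return total
```

The total is easy to obtain this way, but per-subdirectory totals and a depth limit need extra bookkeeping (for example, tracking each dirpath's depth relative to start), which is the trade-off the paragraph above describes.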

1.4. See Also

Documentation for the os.path and getopt modules in the Library Reference.