This article is from the Python Cookbook.

The translation is for personal study only and has nothing to do with any commercial copyright dispute.

-- 0.706 [2004-09-27 19:36:28]

1. Printing Unicode Characters to Standard Output

Credit: David Ascher

1.1. Problem

You want to print Unicode strings to standard output (e.g., for debugging), but they contain characters that the default encoding cannot represent.

1.2. Solution

Wrap the stdout stream with a converter, using the codecs module:

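    import codecs, sys
    # codecs.getwriter returns the codec's StreamWriter class (the same object
    # as codecs.lookup('iso8859-1')[-1]); instantiate it around stdout
    sys.stdout = codecs.getwriter('iso8859-1')(sys.stdout)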

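The snippet above is Python 2 code. In Python 3, sys.stdout is already a text stream that encodes for you, so a codecs writer would have to wrap the underlying byte stream, sys.stdout.buffer (on Python 3.7 and later, sys.stdout.reconfigure(encoding=...) is usually simpler). The following is a minimal Python 3 sketch of the same idea, not the book's code. wrap_with_encoder is a hypothetical helper name, and an io.BytesIO stands in for the terminal so the encoded bytes can be inspected:

```python
import codecs
import io


def wrap_with_encoder(byte_stream, encoding):
    """Return a codecs StreamWriter that encodes str to `encoding` on write.

    codecs.getwriter(encoding) looks up the codec's StreamWriter class;
    instantiating it around a binary stream yields an object whose write()
    accepts text and forwards the encoded bytes to that stream.
    """
    writer_class = codecs.getwriter(encoding)
    return writer_class(byte_stream)


# An in-memory byte stream stands in for sys.stdout.buffer.
raw = io.BytesIO()
out = wrap_with_encoder(raw, "iso8859-1")

# print() accepts any object with a write() method via file=.
print("caf\N{LATIN SMALL LETTER E WITH ACUTE}", file=out)

# Show the bytes that actually reached the underlying stream.
print(raw.getvalue())
```

The last line shows b'caf\xe9\n': the é was written as the single Latin-1 byte 0xE9. Against a real terminal you would assign wrap_with_encoder(sys.stdout.buffer, "iso8859-1") to sys.stdout, as the recipe does.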
1.3. Discussion

Unicode strings live in a large space, big enough for all of the characters in every language worldwide, but thankfully the internal representation of Unicode strings is irrelevant to users of Unicode. Alas, a file stream such as sys.stdout deals with bytes and has an encoding associated with it, so Unicode text must be encoded before it can be written. You can change Python's default encoding, which is used when Unicode strings are implicitly converted to bytes, by modifying the site module. That, however, changes your entire Python installation, which is likely to confuse other applications that expect the encoding Python was originally configured to use (typically ASCII).

This recipe instead rebinds sys.stdout to a stream that expects Unicode input and writes it out encoded as ISO8859-1 (also known as Latin-1). This does not change the encoding seen through any earlier references to sys.stdout, as illustrated here. First, we keep a reference to the original, ASCII-encoded stdout:

>>> old = sys.stdout

Then we create a Unicode string that the original ASCII-encoded stdout cannot print:

>>> char = u"\N{GREEK CAPITAL LETTER GAMMA}"  # a character that doesn't fit in ASCII
>>> print char
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)

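The print fails because the string has to be turned into bytes with the stream's encoding, and U+0393 has no ASCII code. A quick Python 3 check (our illustration, not part of the recipe) of which encodings can represent the character also shows that the ISO8859-1 writer from the Solution would not help with this particular character, because Latin-1 only covers U+0000 through U+00FF. That is why the next step switches to UTF-8:

```python
# Which byte encodings can represent GREEK CAPITAL LETTER GAMMA (U+0393)?
gamma = "\N{GREEK CAPITAL LETTER GAMMA}"

results = {}
for encoding in ("ascii", "iso8859-1", "utf-8"):
    try:
        results[encoding] = gamma.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.reason is the codec's short explanation of the failure
        results[encoding] = "error: " + exc.reason

for encoding, outcome in results.items():
    print(f"{encoding:10} {outcome!r}")
```

On CPython, both ascii and iso8859-1 report an "ordinal not in range" error, while utf-8 yields the two bytes b'\xce\x93'.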
Now we wrap stdout in the codecs stream writer for UTF-8, an encoding that can represent every Unicode character, rebind sys.stdout to it, and try again:

>>> sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
>>> print char

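The same experiment can be replayed in Python 3 without touching the real terminal. This is our own sketch, not the book's: an io.BytesIO stands in for the terminal's byte stream, an ASCII io.TextIOWrapper plays the role of the original stdout kept in old, and a UTF-8 codecs writer plays the rebound sys.stdout. It also demonstrates the point made in the Discussion: wrapping the stream does not change the encoding seen through the earlier reference.

```python
import codecs
import io

gamma = "\N{GREEK CAPITAL LETTER GAMMA}"

# A byte sink standing in for the terminal.
raw = io.BytesIO()

# `old` plays the original ASCII-encoded stdout: a text layer over the sink.
old = io.TextIOWrapper(raw, encoding="ascii", write_through=True)

# `new` wraps the same sink in a UTF-8 StreamWriter, as the recipe does.
new = codecs.getwriter("utf-8")(raw)

print(gamma, file=new)  # succeeds: U+0393 is written as b'\xce\x93'

old_error = None
try:
    print(gamma, file=old)  # the earlier reference is still ASCII-only
except UnicodeEncodeError as exc:
    old_error = exc.reason

print(raw.getvalue(), old_error)
```

Only the UTF-8 write reaches the byte sink; the ASCII layer rejects the character before writing anything.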

1.4. See Also

Documentation for the codecs and site modules, and for sys.setdefaultencoding, in the Library Reference.