PyTips

A collection of Python tips and code snippets of every kind! -- Zoom.Quiet [2004-08-09 23:28:59]

Contents

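Assorted practical code snippets
A reference roundup on multi-process handling in Python
Converting your Python script into a Windows exe program
Examples of using the WinAPI
Determining a function's caller from inside the function!
Python philosophy -- the power of introspection
A pitfall when embedding comments in regular expressions
A program in Python that converts numbers to Chinese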

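Assorted practical code snippets

Using regular expressions

Working with regular expressions, so I translated part of the Python documentation along the way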

::-- ZoomQuiet [2005-04-28 04:15:10]

Date: 2005-4-28, 11:08 AM
Subject: [python-chinese] Working with regular expressions, so I translated part of the Python documentation along the way
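Most of the rules are the same as in other languages, but some differ. I had a job at hand that needed regular expressions, so I translated Python's help documentation along the way. It is not very formally organized, but it should be understandable.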

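###########################################################
Special characters:
###########################################################
   "."      Matches any single character except "\n". To match any character including '\n', use the DOTALL flag (see below) or a pattern such as '[\s\S]'.
   "^"      Matches the start of the input string.
   "$"      Matches the end of the input string, or just before a newline at the end of the string.
   "*"      Matches the preceding subexpression zero or more times (greedy). For example, zo* matches "z" as well as "zoo". * is equivalent to {0,}.
   "+"      Matches the preceding subexpression one or more times (greedy). For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
   "?"      Matches the preceding subexpression zero or one time (greedy).
   *?,+?,?? Non-greedy versions of the previous three special characters.
   {m,n}    Matches at least m and at most n times (m and n are non-negative integers, with m <= n).
   {m,n}?   Non-greedy version of the above.
   "\\"     Either escapes special characters or signals a special sequence.
   []       Indicates a set of characters; matches any one of the characters it contains.
            A "^" as the first character indicates a complementing set.
   "|"      A|B, matches either A or B.
   (...)    Matches the RE inside the parentheses and captures the match.
            The contents can be retrieved or matched later in the string.
   (?iLmsux) Sets the I, L, M, S, U, or X flag (see below).
   (?:...)  Non-grouping version of regular parentheses.
   (?P<name>...) The substring matched by the group is accessible by name.
   (?P=name) Matches the text matched earlier by the group named name.
   (?#...)  A comment; ignored.
   (?=...)  Matches if ... matches next, but doesn't consume the string.
   (?!...)  Matches if ... doesn't match next.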
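
A few of the entries above are easier to see in action. The following is a minimal sketch, written for Python 3 (the rest of this page predates it); the sample strings and variable names are made up for illustration.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: ".*" grabs as much as possible, so the whole string is one match.
greedy = re.findall(r"<.*>", text)
# Non-greedy: ".*?" stops at the first ">" it can, giving one match per tag.
lazy = re.findall(r"<.*?>", text)

# Named groups: (?P<name>...) lets you read a group by name.
date = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2005-04-28")

# (?P=name) matches the same text the named group matched earlier,
# here the closing quote that corresponds to the opening one.
quoted = re.search(r"(?P<q>['\"]).*?(?P=q)", 'say "hi" now')

# Lookahead: (?=...) checks what follows without consuming it.
py_names = re.findall(r"\w+(?=\.py)", "a.py b.txt c.py")

print(greedy, lazy, date.group("year"), quoted.group(0), py_names)
```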

The special sequences consist of "\\" and a character from the list
below.  If the ordinary character is not on the list, then the
resulting RE will match the second character.
   \number  Matches the contents of the group of the same number.
   \A       Matches only at the start of the string.
   \Z       Matches only at the end of the string.
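   \b       Matches the empty string, but only at the start or end of a word
            (a word boundary).
   \B       Matches the empty string, but not at the start or end of a word (a non-boundary).
   \d       Matches any decimal digit; equivalent to [0-9].
   \D       Matches any non-digit character; equivalent to [^0-9].
   \s       Matches any whitespace character, including space, tab, form feed and so on; equivalent to [ \f\n\r\t\v].
   \S       Matches any non-whitespace character; equivalent to [^ \f\n\r\t\v].
   \w       Matches any word character including the underscore; equivalent to [A-Za-z0-9_].
            With LOCALE, it will match the set [0-9_] plus characters defined
            as letters for the current locale.
   \W       Matches the complement of \w (any non-word character); equivalent to [^A-Za-z0-9_].
   \\       Matches a literal backslash "\".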
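
As a quick illustration of the special sequences, here is a small Python 3 sketch; the input strings are invented for the example.

```python
import re

# \b only matches at word boundaries, so the "cat" inside "concat" is skipped.
words = re.findall(r"\bcat\b", "cat concat cat.")

# \d+ matches runs of digits.
numbers = re.findall(r"\d+", "a1b22")

# \1 refers back to whatever group 1 matched: collapse a doubled word.
deduped = re.sub(r"\b(\w+) \1\b", r"\1", "the the cat")

# \A and \Z anchor to the very start and end of the string.
whole = re.match(r"\A\w+\Z", "under_score1")

print(words, numbers, deduped, whole is not None)
```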

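##########################################################
The following functions are available:
##########################################################
   match    Match a regular expression pattern at the beginning of a string.
   search   Search a string for the presence of a pattern.
   sub      Substitute occurrences of a pattern found in a string.
   subn     Same as sub, but also return the number of substitutions made.
   split    Split a string by the occurrences of a pattern.
   findall  Find all occurrences of a pattern in a string.
   compile  Compile a pattern into a RegexObject.
   purge    Clear the regular expression cache.
   escape   Backslash all non-alphanumerics in a string.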

Some of the functions in this module take flags as optional parameters:
   I  IGNORECASE  Perform case-insensitive matching.
   L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
   M  MULTILINE   "^" matches the beginning of lines as well as the string.
                  "$" matches the end of lines as well as the string.
   S  DOTALL      "." matches any character at all, including the newline.
   X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
   U  UNICODE     Make \w, \W, \b, \B, dependent on the Unicode locale.
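
The effect of the most common flags can be checked with a short Python 3 sketch like the one below; the sample text is an assumption made for the example.

```python
import re

text = "one\nTwo\nthree"

# Without MULTILINE, "^" only matches at the start of the whole string.
first_only = re.findall(r"^\w+", text)
# With MULTILINE, "^" also matches right after each newline.
every_line = re.findall(r"^\w+", text, re.MULTILINE)

# IGNORECASE makes "t" match "T" as well.
case_blind = re.findall(r"t\w+", text, re.IGNORECASE)

# "." does not match a newline unless DOTALL is set.
dot_default = re.match(r"one.Two", text)
dot_all = re.match(r"one.Two", text, re.DOTALL)

# The same flags can be set inside the pattern with (?im).
inline = re.findall(r"(?im)^t\w+", text)

print(first_only, every_line, case_blind, dot_default, bool(dot_all), inline)
```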

This module also defines an exception 'error'.

compile(pattern, flags=0)
Compile a regular expression pattern, returning a pattern object.

escape(pattern)
Escape all non-alphanumeric characters in pattern.

findall(pattern, string)
Return a list of all non-overlapping matches in the string.
If one or more groups are present in the pattern, return a
list of groups; this will be a list of tuples if the pattern
has more than one group.
Empty matches are included in the result.

finditer(pattern, string)
Return an iterator over all non-overlapping matches in the
string.  For each match, the iterator returns a match object.
Empty matches are included in the result.

match(pattern, string, flags=0)
Try to apply the pattern at the start of the string, returning
a match object, or None if no match was found.

purge()
Clear the regular expression cache.

search(pattern, string, flags=0)
Scan through string looking for a match to the pattern, returning
a match object, or None if no match was found.

split(pattern, string, maxsplit=0)
Split the source string by the occurrences of the pattern,
returning a list containing the resulting substrings.

sub(pattern, repl, string, count=0)
Return the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in string by the
replacement repl.

subn(pattern, repl, string, count=0)
Return a 2-tuple containing (new_string, number).
new_string is the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in the source
string by the replacement repl.  number is the number of
substitutions that were made.

template(pattern, flags=0)
Compile a template pattern, returning a pattern object.
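
To tie the function list together, here is a short Python 3 sketch that exercises most of them on made-up input. It leaves out purge and template, which are rarely needed.

```python
import re

pattern = re.compile(r"\d+")

# match only tries at the start of the string; search scans forward.
at_start = pattern.match("a1b22")
anywhere = pattern.search("a1b22").group()

# findall returns the matched strings, finditer yields match objects.
all_numbers = pattern.findall("a1b22c333")
spans = [m.span() for m in pattern.finditer("a1b22")]

# sub replaces every match (or only the first `count`); subn also counts them.
replaced = re.sub(r"\d+", "#", "a1b22c333")
replaced_once = re.sub(r"\d+", "#", "a1b22c333", count=1)
with_count = re.subn(r"\d+", "#", "a1b22c333")

# split cuts the string wherever the pattern matches.
parts = re.split(r"[,;]\s*", "a, b;c")

# With more than one group, findall returns a list of tuples.
pairs = re.findall(r"(\w)=(\d)", "a=1 b=2")

# escape backslashes characters that would otherwise be special.
literal = re.escape("a.b")

print(at_start, anywhere, all_numbers, spans, replaced, with_count, parts, pairs, literal)
```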


Automatically checking md5sums

From: steve <[email protected]>

#! /usr/local/bin/python
# Python 2 script: compare the output of md5sum with an expected checksum.

import commands

filename = raw_input("Enter the filename: ")
expected = raw_input("Enter the md5sum: ")
cmd = "md5sum " + filename
print cmd
check = str(commands.getoutput(cmd))
# md5sum prints "<hash>  <filename>", with two spaces between the fields
checksum = expected + "  " + filename
#print checksum
print check
if check == checksum:
    print "Sums OK"
else:
    print "Sums are not the same!"
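
The script above depends on an external md5sum command and on the Python 2 commands module. As an alternative sketch (not steve's method), the digest can be computed in-process with hashlib. This is written for Python 3, and the function name md5_matches is hypothetical.

```python
import hashlib

def md5_matches(path, expected_hex, chunk_size=65536):
    """Return True if the MD5 digest of the file at `path` equals `expected_hex`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    # Tolerate surrounding whitespace and upper-case hex in the expected value.
    return digest.hexdigest() == expected_hex.strip().lower()
```

A call such as md5_matches("some.iso", "<hex digest>") then replaces the string comparison in the script.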

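Extracting hyperlinks from a web page

import re
# a holds the HTML text of the page
r = r'<a(?:(?:\s*.*?\s)|(?:\s+))href=(?P<url>\S*?)(?:(?:\s.*>)|(?:>)).*?</a>'
re.compile(r).findall(a)

This approach, worked out in a discussion between hoxide and 天成, extracts the hyperlinks from a web page.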
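
Regular expressions can be brittle on real-world HTML. As a different technique from the one hoxide and 天成 arrived at, the sketch below uses the standard library's html.parser (Python 3) to collect href values. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lower-cases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Unlike the regular expression, this returns the URL without its surrounding quotes.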

Logging in to a website from Python

I just read xyb's code and it gave me an idea.
I wrote a short piece to try it out, and now I can log in. Heh.
import httplib
import urllib

user = "?"  # placeholder: your login name
pwd = "?"   # placeholder: your password
params = urllib.urlencode({"Loginname": user, "Loginpass": pwd, "firstlogin": 1, "option": "登入论坛"})
headers = {"Accept": "text/html", "User-Agent": "IE", "Content-Type": "application/x-www-form-urlencoded"}
website = "www.linuxforum.net"
path = "/forum/start_page.php"
conn = httplib.HTTPConnection(website)
conn.request("POST", path, params, headers)
r = conn.getresponse()
print r.status, r.reason
data = r.read()
print data
conn.close()

I wonder what difference there is between submitting the data from a form and sending the request directly?

China Linux Forum
Summarized by xyb: PythonClientCookie
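
On the question above: a browser submitting an HTML form with method POST and the default encoding sends the same kind of urlencoded body, so the main differences are the extra headers and cookies the browser adds. The Python 3 sketch below only builds the body and headers, without contacting any server. The function name is hypothetical and the field names are copied from the snippet above.

```python
from urllib.parse import urlencode

def build_login_request(user, password):
    """Return the (body, headers) pair a form POST would carry."""
    fields = {"Loginname": user, "Loginpass": password, "firstlogin": 1}
    # urlencode percent-escapes special characters and joins pairs with "&".
    body = urlencode(fields)
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": str(len(body)),
    }
    return body, headers

body, headers = build_login_request("alice", "s&p")
print(body)
```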

Output formats for floating-point numbers

>>> a=6200-6199.997841
>>> a
0.0021589999996649567
>>> print "%f"%a
0.002159
>>> import fpformat
>>> fpformat.fix(a, 6)
'0.002159'
>>> print fpformat.fix(a, 6)
0.002159
>>> print "%.6f"%a
0.002159
>>> print "%.7f"%a
0.0021590
>>> print "%.10f"%a
0.0021590000
>>> print "%.5f"%a
0.00216
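
The fpformat module no longer exists in Python 3. A sketch of the same formatting with format(), str.format and f-strings, using the value from the session above:

```python
a = 6200 - 6199.997841

# The ".Nf" format spec rounds to N digits after the decimal point.
six = format(a, ".6f")
five = "{:.5f}".format(a)
ten = f"{a:.10f}"

print(six, five, ten)
```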

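How to download an image from the web to a local file

> I know the URL of an image,
> for example http://www.yahoo.com/images/logo.gif
> How can I download it and save it locally?

import urllib
urllib.urlretrieve(url, filename)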

---Limodou
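
In Python 3 the same function lives in urllib.request. A minimal sketch follows; the wrapper name download is hypothetical.

```python
import urllib.request

def download(url, filename):
    """Fetch `url`, save it as `filename`, and return the local path."""
    # urlretrieve returns a (local_path, headers) tuple.
    local_path, headers = urllib.request.urlretrieve(url, filename)
    return local_path
```

For the example in the question, download("http://www.yahoo.com/images/logo.gif", "logo.gif") would save the image in the current directory, assuming the URL is still reachable.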

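Using locale to determine the local language and encoding

from:: limodou's study notes

Software that supports Unicode often needs to convert between Unicode and other encodings.

To process a local file, you first read its content and decode it to Unicode, work on it inside the program, and then save it back in its original encoding.

If we do not know the exact encoding of the file, we can fall back on the default encoding, and the locale module tells us what that default is.

>>> import locale
>>> print locale.getdefaultlocale()
('zh_CN', 'cp936')

As you can see, the default language on my machine is Simplified Chinese and the encoding is GBK.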
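
The decode, process, re-encode cycle described above might look like the following Python 3 sketch. The function name process_bytes and its parameters are hypothetical, and it uses locale.getpreferredencoding() rather than getdefaultlocale() to pick the fallback.

```python
import locale

def process_bytes(raw, transform, encoding=None):
    """Decode `raw`, apply `transform` to the text, and encode it back."""
    if encoding is None:
        # Fall back on the locale's preferred encoding when none is given.
        encoding = locale.getpreferredencoding(False)
    text = raw.decode(encoding)
    return transform(text).encode(encoding)
```

For a GBK file, process_bytes(data, str.upper, "gbk") upper-cases the ASCII letters and leaves the Chinese text intact.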

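Using __new__

from: China Linux Forum -rings

__new__

__new__ is a method of object in Python. If you want to override __new__, you need to inherit from object. __new__ behaves like a class method: it receives the class, not self. __new__ and __init__ are different. __init__ takes self, so it is called after the object has already been constructed. If you want to do something while the object is being constructed, you need __new__. The return value of __new__ must be an object instance. __new__ is very useful in certain patterns. Let's look at an example: a Singleton from "Thinking in Python".

class OnlyOne(object):
    class __OnlyOne:
        def __init__(self):
            self.val = None
        def __str__(self):
            return repr(self) + self.val

    instance = None
    def __new__(cls): # __new__ always a classmethod
        if not OnlyOne.instance:
            OnlyOne.instance = OnlyOne.__OnlyOne()
        return OnlyOne.instance
    def __getattr__(self, name):
        return getattr(self.instance, name)
    def __setattr__(self, name, value):
        return setattr(self.instance, name, value)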

x = OnlyOne() 
x.val = 'sausage' 
print x 
y = OnlyOne() 
y.val = 'eggs' 
print y 
z = OnlyOne() 
z.val = 'spam' 
print z 
print x 
print y

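We can see that OnlyOne inherits from object.

If you do not inherit from object, your __new__ will not be called at construction time.

When x = OnlyOne() runs, it actually calls __new__(OnlyOne); this happens every time OnlyOne is instantiated.

That is because it is a class method.

So this code uses that behavior to implement a Singleton.

No matter how many objects are constructed, __new__ is always called.

So OnlyOne keeps a class attribute, instance.

It holds the instance of the nested __OnlyOne.

We therefore construct it only once.

Every later construction simply returns this same instance.

So here x, y and z are all the same instance.

This approach works on the same principle as the typical Singleton implementation in C++.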
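
For comparison, here is a simplified Python 3 sketch of the same idea. It is not the book's version: it drops the nested class and stores the single instance directly on the class. The attribute name _instance is an assumption made for the example.

```python
class OnlyOne:
    _instance = None  # the one shared instance, created on first use

    def __new__(cls):
        # __new__ runs on every OnlyOne() call; build the object only once.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.val = None
        return cls._instance

    def __str__(self):
        return "OnlyOne(val=%r)" % (self.val,)

x = OnlyOne()
x.val = 'sausage'
y = OnlyOne()
y.val = 'eggs'
print(x, y, x is y)
```

Because x and y are the same object, setting y.val also changes what x prints.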

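Handling tracebacks

from:: Limodou's study notes

Tracebacks are very useful in Python: they show the state of the execution stack when an exception occurs. But when we catch an exception, usually to do our own error handling, the stack information is no longer displayed. If we still want to show it, we need the traceback module.

This is only a brief introduction to the traceback module, not a complete description, and it covers only what I needed; for more detail, see the documentation.

Printing the full traceback

Let's first look at what a traceback looks like: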

>>> 1/0

Traceback (most recent call last):
  File "", line 1, in -toplevel-
    1/0
ZeroDivisionError: integer division or modulo by zero

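We can see that the traceback Python shows by default has a header (the first line), the detailed location of the error (the second and third lines), and the exception information (the fourth line). In other words it has three parts, and the traceback module can handle each of the three separately. I am more interested in the complete display, though.

The traceback module provides print_exc([limit[, file]]), which prints the same output as above. The limit parameter caps the number of stack entries shown, and the file parameter sends the traceback to a file object; by default it goes to standard error. For example:

>>> import traceback
>>> try:
    1/0
except:
    traceback.print_exc()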
 
Traceback (most recent call last):
  File "", line 2, in ?
ZeroDivisionError: integer division or modulo by zero

When an exception occurs, sys.exc_info() returns information about it. For example:

>>> import sys
>>> try:
    1/0
except:
    sys.exc_info()

(<class exceptions.ZeroDivisionError at 0x00BF4CC0>, 
<exceptions.ZeroDivisionError instance at 0x00E29DC8>, 
<traceback object at 0x00E29DF0>)

sys.exc_info() returns a tuple: the exception class, the exception instance, and the traceback.

print_exc() prints directly. What if we want the content itself? Use format_exception(type, value, tb[, limit]), where type, value and tb correspond to the three values from sys.exc_info(). For example:

>>> try:
    1/0
except:
    type, value, tb = sys.exc_info()
    print traceback.format_exception(type, value, tb)

['Traceback (most recent call last):\n', '  File "", line 2, in ?\n', 
'ZeroDivisionError: integer division or modulo by zero\n']

So format_exception returns a list of strings, which we can then use in our own programs.
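
One way to put that list of strings to use is to join it into a single report that can be logged or displayed. A minimal Python 3 sketch; the helper name describe_failure is hypothetical.

```python
import sys
import traceback

def describe_failure(func):
    """Call func(); return the formatted traceback as one string, or None."""
    try:
        func()
    except Exception:
        exc_type, exc_value, tb = sys.exc_info()
        # format_exception returns a list of lines, each ending in a newline.
        lines = traceback.format_exception(exc_type, exc_value, tb)
        return "".join(lines)
    return None

report = describe_failure(lambda: 1 / 0)
print(report)
```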

Using os.walk() to fix the CVS root

After reinstalling the system, the Windows drive letters got shuffled: what used to be 'e:\cvsroot' is now 'g:\cvsroot', and the many directories managed by CVS stopped working. Time to send in a Python script:

import os
import sys
from os.path import join

print sys.argv[1]
for root, dirs, files in os.walk(sys.argv[1]):
    if 'CVS' in dirs:
        fn = join(root, 'CVS', 'ROOT')
        print root+' :', fn
        #dirs.remove('CVS')  # don't visit CVS directories
        f = open(fn,'r')
        r = f.read()
        print r
        f.close()
        # raw strings keep the backslashes in the Windows paths literal
        if r.startswith(r'e:\cvsroot'):
            # overwrite with the new location, then print the file to confirm
            f = open(fn, 'w')
            f.write(r'g:\cvsroot')
            f.close()
            f = open(fn,'r')
            r = f.read()
            print r
            f.close()
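
The same walk can be wrapped in a reusable function. The Python 3 sketch below is an assumption-laden variant rather than the script above: the function name and parameters are hypothetical, it looks for the file under the name CVS/Root, and it keeps whatever follows the old prefix instead of overwriting the whole file.

```python
import os

def retarget_cvs_root(top, old_root, new_root):
    """Rewrite CVS/Root files under `top` that start with `old_root`.

    Returns the list of files that were changed.
    """
    changed = []
    for root, dirs, files in os.walk(top):
        if 'CVS' not in dirs:
            continue
        fn = os.path.join(root, 'CVS', 'Root')
        if not os.path.isfile(fn):
            continue
        with open(fn) as f:
            content = f.read()
        if content.startswith(old_root):
            # Swap only the prefix so any trailing text (newline) survives.
            with open(fn, 'w') as f:
                f.write(new_root + content[len(old_root):])
            changed.append(fn)
    return changed
```

It would be called as retarget_cvs_root(sys.argv[1], r'e:\cvsroot', r'g:\cvsroot').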

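A reference roundup on multi-process handling in Python

* PyCourse --from: http://blog.huangdong.com (HD's personal blog, soon to be history; a moment of silence, everyone)

Multi-process programming has driven many developers to use threads instead. In some special cases, however, we have to consider communication between the processes of two completely separate systems. There are many ways to do it; the simplest is IPC. I have been looking for resources on this for the past few days, and having found some, I am recording them here for the archive.

http://remoted.sourceforge.net/

RemoteD is an inter-process solution for Python that lets several of your processes interact by sharing a dictionary.

http://pyro.sourceforge.net

Pyro is an RMI-like solution for remote procedure calls across multiple servers, but it supports using IPC when the call is made on the local machine.

http://poshmodule.sourceforge.net/

The author of POSH is clearly a hater of threads. He contributed POSH to PyCon DC 2003; it lets Python processes share data with each other.

These are evidently two simple Python wrappers around IPC, so you may need to test them carefully yourself on different platforms. On the surface, both state that they support Linux.

http://www.onlamp.com/pub/a/php/2004/05/13/shared_memory.html

The great ONLAMP always has a pleasant surprise for me. This is a very good article on using shared memory from PHP, and it even explains the use of IPC under Unix very clearly.

http://gigue.peabody.jhu.edu/~mdboom/omi/source/shm_source/shm.html

This is a Python implementation of shared-memory access that I first saw two years ago. If you have to start from scratch, I believe this original code makes a good reference.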
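
None of the packages above is needed for the simplest case. As a rough sketch of the idea of two processes exchanging a dictionary, the Python 3 code below uses only the standard library: it sends JSON to a child process over a pipe and reads the reply. It is not how RemoteD, Pyro or POSH work, and the names CHILD_CODE and exchange_with_child are hypothetical.

```python
import json
import subprocess
import sys

# The child reads a JSON dictionary from stdin, marks it, and writes it back.
CHILD_CODE = (
    "import json, sys; "
    "d = json.load(sys.stdin); "
    "d['seen_by_child'] = True; "
    "json.dump(d, sys.stdout)"
)

def exchange_with_child(data):
    """Send `data` to a child Python process over a pipe and return its reply."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=json.dumps(data),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)
```

Pipes copy the data between the processes; the shared-memory approaches linked above avoid that copy at the cost of more platform-specific code.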

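Converting your Python script into a Windows exe program

from:: http://blog.huangdong.com (HD's personal blog, soon to be history; a moment of silence, everyone)

There are many possible benefits to turning a Python script into an executable Windows exe program; my favorite is that it makes the program you wrote feel more like a real "program". But everything has its downside, and this inevitably gives up some of Python's advantages.

You can find information about py2exe here, and download the py2exe-0.4.2.win32-py2.3.exe installer here. It is still somewhat cumbersome to use, though: you have to write a small script by hand, like this:

# setup.py
from distutils.core import setup
import py2exe

setup(name="myscript",
      scripts=["myscript.py"],
)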

Then run it with python:

python setup.py py2exe

See its website for more information.

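Examples of using the WinAPI

/PyWinApi -- simple examples

Determining a function's caller from inside the function!

Python philosophy -- the power of introspection

AlbertLee
Prompted by Xie Yanbo
Remember, Python comes with batteries included!
PyBatteriesIncluded -- use introspection to obtain rich information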
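
The linked page is not reproduced here, so the following is only a sketch of one common way to find the caller with the standard inspect module (Python 3, CPython). The function names are hypothetical.

```python
import inspect

def who_called_me():
    """Return the name of the function that called this one."""
    frame = inspect.currentframe()
    try:
        # f_back is the frame of the caller; co_name is its function name.
        return frame.f_back.f_code.co_name
    finally:
        # Drop the frame reference promptly to avoid reference cycles.
        del frame

def outer():
    return who_called_me()

print(outer())
```

inspect.currentframe() relies on interpreter stack-frame support and can return None on implementations that lack it.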

A pitfall when embedding comments in regular expressions

As the following code shows:

>>> import re
>>> s = 'create table testtable'
>>> p =  r"""
^create\ table   # create table
\s*                 # whitespace
([a-zA-Z]*)      # table name
$                   # end
"""
>>> re.compile(p, re.VERBOSE).match(s).groups()
('testtable',)
>>>

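If the escaped space (that is, "\ ") between create and table were missing, re.VERBOSE would ignore the plain space, so the pattern would try to match createtable and would therefore fail to match.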
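
Both cases can be compared side by side. A minimal Python 3 sketch, which also shows a character class [ ] as another way to keep the space:

```python
import re

s = 'create table testtable'

escaped = r"""
^create\ table   # the escaped space survives VERBOSE mode
\s*              # whitespace
([a-zA-Z]*)      # table name
$                # end
"""

unescaped = r"""
^create table    # this plain space is dropped by VERBOSE mode
\s*              # whitespace
([a-zA-Z]*)      # table name
$                # end
"""

ok = re.compile(escaped, re.VERBOSE).match(s)
bad = re.compile(unescaped, re.VERBOSE).match(s)
# Whitespace inside a character class is also kept in VERBOSE mode.
bracket = re.compile(r"^create[ ]table \s* ([a-zA-Z]*) $", re.VERBOSE).match(s)

print(ok.groups(), bad, bracket.groups())
```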

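A program in Python that converts numbers to Chinese

Prompted by a question from Jaina (16009966) on QQ. I spent an evening implementing it. The basic idea is to treat every 4 digits as one segment, convert each segment with conv4, and then combine them with conv. The program was debugged under Windows 2003 with Python 2.4. Watch out for encoding issues.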

# coding:utf-8

# units inside a 4-digit segment, units between segments, and the digits
UUNIT = [u'', u'十', u'百', u'千']
BUINT = [u'', u'万', u'亿', u'万亿', u'亿亿']
NUM = [u'零', u'一', u'二', u'三', u'四', u'五', u'六', u'七', u'八', u'九']

def conv4(num, flag=False):
   # convert a number of at most 4 digits; flag asks for a leading zero
   # when the segment is shorter than 4 digits
   ret = u''
   s = str(num)
   l = len(s)
   assert(len(s) <= 4)
   if flag and len(s)<4:
      ret = ret + NUM[0]
   for i in xrange(l):
      if s[i] != '0':
         ret = ret + NUM[int(s[i])]+UUNIT[l-i-1]
      elif s[i-1] != '0':
         ret = ret + NUM[0]
   return ret

def conv(num):
   # split the digits into 4-digit segments, starting from the right
   ss = str(num)
   l = len(ss)
   j = l // 4
   jj = l % 4
   lss = [ss[0:jj] for i in [1] if ss[0:jj]] \
       + [ss[i*4+jj:(i+1)*4+jj] for i in xrange(j) if ss[i*4+jj:(i+1)*4+jj] ]
   print lss
   ul = len(lss)
   ret = u''
   zflag = False
   for i in xrange(ul):
      bu = BUINT[ul-i-1]
      tret = conv4(int(lss[i]), flag = i)
      # drop a trailing zero left over from the segment
      if tret[-1:] == NUM[0]:
         tret = tret[:-1]
      if tret:
         print zflag , (tret+bu).encode('mbcs')
         if zflag and tret[0] != NUM[0] :
            ret = ret + NUM[0] +tret+bu
         else:
            ret = ret + tret+bu
         zflag = False
      else:
         # an all-zero segment: remember it so a zero can be inserted later
         zflag = True
   return ret

if __name__ == '__main__':
   #print conv(11111)
   print conv(103056).encode('mbcs')
   print conv(101000).encode('mbcs')
   print conv(1200999100000000010).encode('mbcs')
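
The 4-digit segmentation that conv performs before converting can be isolated as its own step. A Python 3 sketch of just that step; the function name split_groups is hypothetical.

```python
def split_groups(num):
    """Split the decimal digits of `num` into groups of 4, counted from the right."""
    s = str(num)
    head = len(s) % 4  # length of the short leading group, if any
    groups = [s[:head]] if head else []
    groups += [s[i:i + 4] for i in range(head, len(s), 4)]
    return groups

print(split_groups(103056))
print(split_groups(1200999100000000010))
```

Each group is then converted on its own and followed by its unit (万, 亿, and so on), which is what conv4 and BUINT do in the program above.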