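In-Depth Text Processing in Python

Author of the Charming Python series of articles:
David Mertz is a very well-known Python expert!
He is the sort of Foucauldian Berkeley who believes that esse est denunte. David can be reached at [email protected];
his personal Web page describes his life.
He wrote and published: Text Processing in Python (original text)

Can't wait for a Chinese edition, so let's do it ourselves! -- Zoom.Quiet [2004-08-09 23:30:02]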

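1. Translation workflow

We borrow the workflow used for CorePy2.

  1. Translate, keeping the formatting identical to the original.
  2. Translator self-review
  3. Second review
  4. Third review (if needed, and if enough people are available)
  5. Remove the English
    • Extract a Chinese-only version, saved with the suffix _zh_SmartAscii.txt

  6. Finishing work: use scripts to output the drafts as MoinMoin, reST, LaTeX, and so on.

    • The original text calls its own format "Smart ASCII".
    • After reading this book, writing such a script should be easy.
    • The book itself presents several parsing approaches.
    • Convert to MoinMoin format, saved with the suffix _zh_MoinMoin.txt

    • Convert to RST format, saved with the suffix _zh_RST.txt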
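
The naming rule in steps 5 and 6 could be sketched in Python roughly as follows. This is only an illustration: the three suffixes come from the list above, while the format keys ("smartascii", "moinmoin", "rst"), the function name output_name, and the source file name chap4.txt are assumptions made for the example, not names used by the project.

```python
from pathlib import Path

# Suffixes as listed in the workflow; the dictionary keys are hypothetical labels.
SUFFIXES = {
    "smartascii": "_zh_SmartAscii.txt",
    "moinmoin": "_zh_MoinMoin.txt",
    "rst": "_zh_RST.txt",
}


def output_name(source, fmt):
    """Return the output file name for one format of a translated chapter."""
    # Drop any directory part and the extension, then append the suffix.
    stem = Path(source).stem
    return stem + SUFFIXES[fmt]


# "chap4.txt" is an assumed source file name, used only for illustration.
for fmt in SUFFIXES:
    print(output_name("chap4.txt", fmt))
```

An unknown format key raises KeyError, which seems preferable to silently writing a file under the wrong name.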

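2. TPiP translation progress

|| Name || Title || Lines || Translation || Progress || Translator self-review progress || Second review || Progress || Third review || Progress || Notes ||
|| Intro || Introduction || 485 || || || || || || || || ||
|| Chap1 || Python Basics || 5722 || DuYanJun || || || || || || || ||
|| Chap2 || Basic String Operations || 4827 || digitalwit || || || || || || || ||
|| Chap3 || Regular Expressions || 2835 || wenliang.lu || (./) || (./) || || || || || ||
|| Chap4 || Parsers and State Machines || 4607 || wenliang.lu || (./) || 0% || || || || || ||
|| Chap5 || Internet Tools and Techniques || 3923 || wenliang.lu || (./) || || || || || || ||
|| AppendixA || Python Essentials || 1959 || HuangYi || (./) || || || || || || ||
|| AppendixB || A Data Compression Primer || 681 || wenliang.lu || (./) || || || || || || ||
|| AppendixC || Understanding Unicode || 327 || wenliang.lu || (./) || || || || || || ||
|| AppendixD || A State Machine for Adding Markup to Text || 373 || wenliang.lu || (./) || || || || || || ||
|| glossary || Glossary || 197 || || || || || || || || ||
|| AcKnowledgments || Acknowledgments || 149 || || || || || || || || ||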

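3. Resources / information

  • This page is the main page of the TPiP Chinese translation.
  • Joining the translation and getting access
    • Register at woodpecker, where people are assigned and tasks are claimed.

    • Join the Google Code project, which hosts the source repository.

    • Join the Google Group, where discussion takes place. It has to be joined separately from Google Code.

    • Email the project lead, ZoomQuiet, with your woodpecker ID and your Google ID so that everything can be approved in one step.

    • For other questions, contact lwl.

  • All discussions, questions, and code check-ins should use the [TPiP] prefix in the title, for example "[TPiP] [chap4] 50% done".

  • Hold discussions in the OBP group, the discussion group of openbookproject (OBP).

  • Claim translation and review tasks on the revision page. A woodpecker ID is required.

  • For translation, simply check out and check in with svn.

  • For review, submit an issue here.

    • Remember to use the [TPiP] tag.

    • Preferably CC the translator and the other reviewers; after discussion, the translator makes the update, which avoids conflicts.
  • For the English-Chinese terminology table, see the technical translation dictionary.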
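
The [TPiP] title convention above could be built and checked with a small helper along these lines. This is a hedged sketch, not a project tool: the function names make_subject and parse_subject are invented for the example, and treating the second bracketed tag (such as [chap4]) as optional is an assumption based on the single example title.

```python
import re

# Matches titles such as "[TPiP] [chap4] 50% done".
# The bracketed part tag after [TPiP] is treated as optional (an assumption).
SUBJECT_RE = re.compile(
    r"^\[TPiP\]\s*(?:\[(?P<part>[^\]]+)\]\s*)?(?P<text>.*)$"
)


def make_subject(text, part=None):
    """Build a title with the [TPiP] prefix and an optional part tag."""
    prefix = "[TPiP]"
    if part:
        prefix += f" [{part}]"
    return f"{prefix} {text}"


def parse_subject(subject):
    """Return (part, text) for a [TPiP] title, or None if the prefix is missing."""
    match = SUBJECT_RE.match(subject)
    if match is None:
        return None
    return match.group("part"), match.group("text")


print(make_subject("50% done", part="chap4"))
print(parse_subject("[TPiP] [chap4] 50% done"))
```

A check like parse_subject could be run before sending a message or committing, so that titles without the prefix are caught early.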

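3.0.1. Authorization from Mertz

Author's authorization: David Mertz has replied, agreeing to let OpenBookProject translate and publish a Chinese edition of this book. The original message is reproduced below with permission: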

2008/11/29 Wenliang Lu <[email protected]>:
> ---------- Forwarded message ----------
> From: David Mertz <[email protected]>
> Date: Mon, Dec 10, 2007 at 5:14 PM
> Subject: Re: TPiP chinese translation
> To: Wenliang Lu <[email protected]>
> Cc: "Dr. David Mertz" <[email protected]>
>
>
>> Hi David,
>> We have read your book of TPiP, which is an amazing book!
>> Just wondering can you grand us the permission to translate TPiP into
>> Chinese.
>> We will distribute at http://code.google.com/p/openbookproject/ for
>> NON-commercial purpose,
>> as this project aims to translate good python books into Chinese to
>> promote Python in China.
>
> I would be very happy to have this translation made and published at the
> Open Book Project.  I am not of any mind to impose any republication
> permissions on anything I write (my articles are released to the public
> domain explicitly, for example).
>
> Per copyright law, it's not entirely simple to authorize, since the
> publisher maintains rights on the book. However, I am sufficiently
> comfortable that the agreement I have with the publisher to allow gratis
> online publication via my own site can reasonably be taken to cover
> publication of a translation as well.
>
> Moreover, at a practical level, while the book has sold moderately well, it
> is hardly the sort of large seller that the publisher will worry about
> translating.  So there should not be any concrete issue with the translation
> and publication you suggest.  Please proceed... let me know the progress as
> things advance, just for my own gratification.
>
> Yours, David...
>
> -----------------------------------------------------------------------
> mertz@ | The specter of free information is haunting the `Net!  All the
> gnosis | powers of IP- and crypto-tyranny have entered into an unholy
> .cx    | alliance...ideas have nothing to lose but their chains.  Unite
>       | against "intellectual property" and anti-privacy regimes!

4. TPiP

4.1. acknowledgments

/AcKnowledgments FOLKS WHO HAVE MADE THIS BOOK BETTER

4.2. intro

/Intro INTRODUCTION

4.3. chap1

/Chap1 PYTHON BASICS

4.4. chap2

/Chap2 BASIC STRING OPERATIONS

4.5. chap3

/Chap3 REGULAR EXPRESSIONS

4.6. chap4

/Chap4 PARSERS AND STATE-MACHINES

4.7. chap5

/Chap5 INTERNET TOOLS AND TECHNIQUES

4.8. appendix_a

/AppendixA A SELECTIVE AND IMPRESSIONISTIC SHORT REVIEW OF PYTHON

4.9. appendix_b

/AppendixB A DATA COMPRESSION PRIMER

4.10. appendix_c

/AppendixC UNDERSTANDING UNICODE

4.11. appendix_d

/AppendixD A STATE-MACHINE FOR ADDING MARKUP TO TEXT

4.12. glossary

/glossary GLOSSARY TERMS

4.13. Feedback

  • How can I get the original text of TPiP?
    • Oops, sorry about that; just get it straight from David Mertz's home page. -- Zoom.Quiet
  • David Mertz's statement (http://gnosis.cx/TPiP/):

   This stuff is copyrighted by AW (except the code samples which are released to the public domain). Feel free to use this material personally; but no permission is given for further distribution beyond your personal use.
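  • Does that mean we cannot translate and publish it?
  • This is for our own use; we are not publishing or distributing it. We keep his copyright notice and simply get on with our own work. -- ZoomQuiet

  • Well, I downloaded it. Chapter 4 is really interesting (state machines!), and Appendix D is quite good too. I plan to read it. --- hoxide
  • Since the book is out in print, I'm still a bit worried about the copyright issue. -- andelf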