1. Converting Between Different Naming Conventions

Credit: Sami Hangaslammi

1.1. Problem

You have a body of code whose identifiers use one of the common naming conventions for representing multiple words in a single identifier (CapitalizedWords, mixedCase, or under_scores), and you need to convert the code to another naming convention so that it merges smoothly with other code.

1.2. Solution

re.sub covers the two hard cases, converting underscore notation to and from the others:

    import re

    def cw2us(x):  # capwords to underscore notation
        return re.sub(r'(?<=[a-z])[A-Z]|(?<!^)[A-Z](?=[a-z])',
                      r"_\g<0>", x).lower()

    def us2mc(x):  # underscore to mixed-case notation
        return re.sub(r'_([a-z])', lambda m: m.group(1).upper(), x)
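To see what the two alternatives in the cw2us pattern do, the same expression can be spread out with re.VERBOSE and commented. This is only an annotated sketch: the names _WORD_START and cw2us_verbose are made up for illustration and are not part of the recipe.

```python
import re

# Hypothetical annotated variant of cw2us. The pattern is the recipe's own,
# only laid out with re.VERBOSE so that each alternative can carry a comment.
_WORD_START = re.compile(r"""
    (?<=[a-z])[A-Z]        # a capital right after a lowercase letter (the H in PrintHTML)
  | (?<!^)[A-Z](?=[a-z])   # a capital, not at the start, followed by lowercase (the E in IOError)
""", re.VERBOSE)

def cw2us_verbose(x):
    # Put an underscore before each detected word start, then lowercase everything.
    return _WORD_START.sub(r"_\g<0>", x).lower()

print(cw2us_verbose("PrintHTML"))      # print_html
print(cw2us_verbose("IOError"))        # io_error
print(cw2us_verbose("SetXYPosition"))  # set_xy_position
```

The second alternative is what keeps a run of capitals such as HTML or IO together: only the last capital of the run, the one that begins a normal word, gets an underscore in front of it.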

Mixed-case to underscore is just like capwords to underscore (lowercasing the first character becomes redundant, but it does no harm):

    def mc2us(x):  # mixed-case to underscore notation
        return cw2us(x)

Underscore to capwords can similarly exploit the underscore to mixed-case conversion, but it needs an extra twist to uppercase the first character:

    def us2cw(x):  # underscore to capwords notation
        s = us2mc(x)
        return s[0].upper() + s[1:]

Conversion between mixed-case and capwords is, of course, just a matter of uppercasing or lowercasing the first character, as appropriate:

    def mc2cw(x):  # mixed-case to capwords: uppercase the first character
        return x[0].upper() + x[1:]

    def cw2mc(x):  # capwords to mixed-case: lowercase the first character
        return x[0].lower() + x[1:]
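A few sample calls for the conversions that do not go to underscore notation may help. The definitions are repeated so that the snippet runs on its own, with mixed-case to capwords uppercasing the first character and capwords to mixed-case lowercasing it; the sample identifiers are made up.

```python
import re

def us2mc(x):  # underscore to mixed-case
    return re.sub(r'_([a-z])', lambda m: m.group(1).upper(), x)

def us2cw(x):  # underscore to capwords
    s = us2mc(x)
    return s[0].upper() + s[1:]

def mc2cw(x):  # mixed-case to capwords: uppercase the first character
    return x[0].upper() + x[1:]

def cw2mc(x):  # capwords to mixed-case: lowercase the first character
    return x[0].lower() + x[1:]

print(us2mc("set_xy_position"))  # setXyPosition
print(us2cw("set_xy_position"))  # SetXyPosition
print(mc2cw("printHtml"))        # PrintHtml
print(cw2mc("PrintHtml"))        # printHtml
```

Note that going through underscore notation is lossy for acronyms: "set_xy_position" comes back as "SetXyPosition", not "SetXYPosition", because the lowercase form no longer records which words were all capitals.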

1.3. Discussion

Here are some usage examples:

>>> cw2us("PrintHTML")
'print_html'
>>> cw2us("IOError")
'io_error'
>>> cw2us("SetXYPosition")
'set_xy_position'
>>> cw2us("GetX")
'get_x'

The set of functions in this recipe is useful, and very practical, if you need to homogenize naming styles in a bunch of code, but the approach may be a bit obscure. In the interest of clarity, you might want to adopt a conceptual stance that is general and fruitful. In other words, to convert a bunch of formats into each other, find a neutral format and write conversions from each of the N formats into the neutral one and back again. This means having 2N conversion functions rather than N x (N-1), a big win for large N, but the point here (in which N is only three) is really one of clarity.
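The counting argument is easy to check with a couple of lines. The helper names below are invented for this illustration only.

```python
def direct_conversions(n):
    # One function for every ordered pair of distinct formats.
    return n * (n - 1)

def via_neutral(n):
    # One function into the neutral format and one out of it, per format.
    return 2 * n

for n in (3, 5, 10):
    print(n, direct_conversions(n), via_neutral(n))
# 3 6 6
# 5 20 10
# 10 90 20
```

With N = 3 both approaches need six functions, which is why the benefit in this recipe is clarity rather than a smaller function count.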

Clearly, the underlying neutral format that each identifier style is encoding is a list of words. Let's say, for definiteness and without loss of generality, that they are lowercase words:

    import re

    def anytolw(x):  # any format of identifier to list of lowercased words

        # First, see if there are underscores:
        lw = x.split('_')
        if len(lw) > 1:
            return [w.lower() for w in lw]

        # No. Then uppercase letters are the splitters:
        pieces = re.split('([A-Z])', x)

        # Ensure first word follows the same rules as the others:
        if pieces[0]:
            pieces = [''] + pieces
        else:
            pieces = pieces[1:]

        # Join two by two, lowercasing the splitters as you go
        return [pieces[i].lower() + pieces[i+1] for i in range(0, len(pieces), 2)]
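One caveat worth knowing: because anytolw splits at every capital letter, an identifier such as "PrintHTML" comes out as ['print', 'h', 't', 'm', 'l'], whereas cw2us keeps the acronym together. If that matters, a different splitter can be swapped in. The following is a sketch of one possible alternative, not the recipe's method; the function name and the pattern are assumptions made for illustration.

```python
import re

# Hypothetical alternative to anytolw that keeps runs of capitals together.
# First alternative: a run of capitals not followed by a lowercase letter (HTML, IO, XY).
# Second alternative: an optional capital followed by lowercase letters or digits (Print, set).
_WORD = re.compile(r'[A-Z]+(?![a-z])|[A-Z]?[a-z0-9]+')

def words_keeping_acronyms(x):
    # Underscores match neither alternative, so they act as separators for free.
    return [w.lower() for w in _WORD.findall(x)]

print(words_keeping_acronyms("PrintHTML"))        # ['print', 'html']
print(words_keeping_acronyms("IOError"))          # ['io', 'error']
print(words_keeping_acronyms("setXYPosition"))    # ['set', 'xy', 'position']
print(words_keeping_acronyms("set_xy_position"))  # ['set', 'xy', 'position']
```

The trade-off is a denser regular expression in exchange for word lists that agree with what cw2us produces on acronyms.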

There's no need to specify the input format, since it's self-describing. Conversely, when translating from our internal form to an output format, we do need to specify the format we want, but on the other hand, the functions are very simple:

    def lwtous(x): return '_'.join(x)
    def lwtocw(x): return ''.join(w.capitalize() for w in x)
    def lwtomc(x): return x[0] + ''.join(w.capitalize() for w in x[1:])

Any other combination is a simple matter of functional composition:

    def anytous(x): return lwtous(anytolw(x))
    cwtous = mctous = anytous
    def anytocw(x): return lwtocw(anytolw(x))
    ustocw = mctocw = anytocw
    def anytomc(x): return lwtomc(anytolw(x))
    cwtomc = ustomc = anytomc
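The same composition can also be driven by a small lookup table, which makes the "neutral format in the middle" idea explicit. This is a sketch under assumptions: the FORMATTERS table, the convert function, and the style keys 'us', 'cw', and 'mc' are invented here, and anytolw is restated in Python 3 form so the snippet is self-contained.

```python
import re

def anytolw(x):
    # Same splitting rules as the recipe: underscores first, else capitals.
    lw = x.split('_')
    if len(lw) > 1:
        return [w.lower() for w in lw]
    pieces = re.split('([A-Z])', x)
    if pieces[0]:
        pieces = [''] + pieces
    else:
        pieces = pieces[1:]
    return [pieces[i].lower() + pieces[i + 1] for i in range(0, len(pieces), 2)]

# Hypothetical table of output formatters, keyed by a made-up style name.
FORMATTERS = {
    'us': lambda lw: '_'.join(lw),
    'cw': lambda lw: ''.join(w.capitalize() for w in lw),
    'mc': lambda lw: lw[0] + ''.join(w.capitalize() for w in lw[1:]),
}

def convert(x, style):
    # Any input style -> list of lowercase words -> requested output style.
    return FORMATTERS[style](anytolw(x))

print(convert("PrintHtml", 'us'))   # print_html
print(convert("print_html", 'cw'))  # PrintHtml
print(convert("print_html", 'mc'))  # printHtml
print(convert("printHtml", 'cw'))   # PrintHtml
```

Adding a fourth naming style would then mean teaching anytolw to recognize it and adding one entry to the table, rather than writing a function for each pair of styles.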

The specialized approach is slimmer and faster, but this generalized stance may ease understanding as well as offer wider application.

1.4. See Also

The Library Reference sections on the re and string modules.