2.3. Step 1: Representing Records
If we're going to store records in a database, the first step is probably deciding what those records will look like. There are a variety of ways to represent information about people in the Python language. Built-in object types such as lists and dictionaries are often sufficient, especially if we don't care about processing the data we store.

2.3.1. Using Lists
Lists, for example, can collect attributes about people in a positionally ordered way. Start up your Python interactive interpreter and type the following two statements (this works in the IDLE GUI, after typing python at a shell prompt, and so on, and the >>> characters are Python's prompt; if you've never run Python code this way before, see an introductory resource such as O'Reilly's Learning Python for help with getting started):

>>> bob = ['Bob Smith', 42, 30000, 'software']
>>> sue = ['Sue Jones', 45, 40000, 'music']

We've just made two records, albeit simple ones, to represent two people, Bob and Sue (my apologies if you really are Bob or Sue, generically or otherwise). Each record is a list of four properties: name, age, pay, and job field. To access these fields, we simply index by position (the result is in parentheses here because it is a tuple of two results):

  • No, I'm serious. For an example I present in Python classes I teach, I had for many years regularly used the name "Bob Smith," age 40.5, and jobs "developer" and "manager" as a supposedly fictitious database record, until a recent class in Chicago, where I met a student named Bob Smith who was 40.5 and was a developer and manager. The world is stranger than it seems.

>>> bob[0], sue[2]               # fetch name, pay
('Bob Smith', 40000)

Processing records is easy with this representation; we just use list operations. For example, we can extract a last name by splitting the name field on blanks and grabbing the last part, and we may give someone a raise by changing their list in-place:

>>> bob[0].split( )[-1]         # what's bob's last name?
'Smith'
>>> sue[2] *= 1.25             # give sue a 25% raise
>>> sue
['Sue Jones', 45, 50000.0, 'music']

The last-name expression here proceeds from left to right: we fetch Bob's name, split it into a list of substrings around spaces, and index his last name (run it one step at a time to see how).
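
To see the left-to-right evaluation, the expression can be split into its intermediate steps. The following is a minimal sketch; the names full_name, parts, and last_name are ours, introduced only for illustration, and print is called as a function so that the code should also run under Python 3.X:

```python
bob = ['Bob Smith', 42, 30000, 'software']

full_name = bob[0]            # step 1: index the record to fetch the name field
parts = full_name.split()     # step 2: split on whitespace into a list of substrings
last_name = parts[-1]         # step 3: index from the end to grab the last part

print(full_name)              # Bob Smith
print(parts)                  # ['Bob', 'Smith']
print(last_name)              # Smith
```

Each step feeds its result to the next, which is exactly what the one-line form does without the temporary names.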

2.3.1.1. A database list
Of course, what we really have at this point is just two variables, not a database; to collect Bob and Sue into a unit, we might simply stuff them into another list:

>>> people = [bob, sue]
>>> for person in people:
        print person
['Bob Smith', 42, 30000, 'software']
['Sue Jones', 45, 50000.0, 'music']

Now, the people list represents our database. We can fetch specific records by their relative positions and process them one at a time, in loops:

>>> people[1][0]
'Sue Jones'
>>> for person in people:
        print person[0].split( )[-1]     # print last names
        person[2] *= 1.20               # give each a 20% raise                    
Smith
Jones
>>> for person in people: print person[2]       # check new pay
36000.0
60000.0

Now that we have a list, we can also collect values from records using some of Python's more powerful iteration tools, such as list comprehensions, maps, and generator expressions:

>>> pays = [person[2] for person in people]     # collect all pay
>>> pays
[36000.0, 60000.0]
>>> pays = map((lambda x: x[2]), people)       # ditto
>>> pays
[36000.0, 60000.0]
>>> sum(person[2] for person in people)       # generator expression sum (2.4)
96000.0
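
The session above reflects the Python 2.X releases this text was written for, where map returns a list. As a hedged aside, the sketch below shows the same three collection tools in a form that should also run under Python 3.X, where map returns an iterable that must be wrapped in list to display its results. The pay values are the ones left by the earlier 20% raise, and the variable names are ours:

```python
bob = ['Bob Smith', 42, 36000.0, 'software']
sue = ['Sue Jones', 45, 60000.0, 'music']
people = [bob, sue]

pays_comp = [person[2] for person in people]          # list comprehension
pays_map = list(map(lambda rec: rec[2], people))      # map; list() forces the results
total = sum(person[2] for person in people)           # generator expression

print(pays_comp)      # [36000.0, 60000.0]
print(pays_map)       # [36000.0, 60000.0]
print(total)          # 96000.0
```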

To add a record to the database, the usual list operations, such as append and extend, will suffice:

>>> people.append(['Tom', 50, 0, None])
>>> len(people)
3
>>> people[-1][0]
'Tom'

Lists work for our people database, and they might be sufficient for some programs, but they suffer from a few major flaws. For one thing, Bob and Sue, at this point, are just fleeting objects in memory that will disappear once we exit Python. For another, every time we want to extract a last name or give a raise, we'll have to repeat the kinds of code we just typed; that could become a problem if we ever change the way those operations work, since we may have to update many places in our code. We'll address these issues in a few moments.

2.3.1.2. Field labels
Perhaps more fundamentally, accessing fields by position in a list requires us to memorize what each position means: if you see a bit of code indexing a record on magic position 2, how can you tell it is extracting a pay? In terms of understanding the code, it might be better to associate a field name with a field value.

We might try to associate names with relative positions by using the Python range built-in function, which builds a list of successive integers:

>>> NAME, AGE, PAY = range(3)         # [0, 1, 2]     
>>> bob = ['Bob Smith', 42, 10000]
>>> bob[NAME]
'Bob Smith'
>>> PAY, bob[PAY]
(2, 10000)

This addresses readability: the three variables essentially become field names. This makes our code dependent on the field position assignments, though: we have to remember to update the range assignments whenever we change record structure. Because they are not directly associated, the names and records may become out of sync over time and require a maintenance step.

Moreover, because the field names are independent variables, there is no direct mapping from a record list back to its field's names. A raw record, for instance, provides no way to label its values with field names in a formatted display. In the preceding record, without additional code, there is no path from value 42 to label AGE.
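
One form the "additional code" might take is a separate list of label strings kept in the same order as the record's fields. This is only a sketch: the FIELDS list is our own hypothetical addition, and like the range assignments, it must be updated by hand whenever the record layout changes:

```python
FIELDS = ['name', 'age', 'pay']          # hypothetical label list, same order as the record
bob = ['Bob Smith', 42, 10000]

# pair each label with the value at the same position
for label, value in zip(FIELDS, bob):
    print('%s => %s' % (label, value))

# map a value back to its label through its position
label_of_42 = FIELDS[bob.index(42)]
print(label_of_42)                       # age
```

The record itself still knows nothing about its labels; the association lives entirely in the extra list.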

We might also try this by using lists of tuples, where the tuples record both a field name and a value; better yet, a list of lists would allow for updates (tuples are immutable). Here's what that idea translates to, with slightly simpler records:

>>> bob = [['name', 'Bob Smith'], ['age', 42], ['pay', 10000]]
>>> sue = [['name', 'Sue Jones'], ['age', 45], ['pay', 20000]]
>>> people = [bob, sue]  

This really doesn't fix the problem, though, because we still have to index by position in order to fetch fields:

>>> for person in people:
        print person[0][1], person[2][1]      # name, pay
Bob Smith 10000
Sue Jones 20000
>>> [person[0][1] for person in people]        # collect names
['Bob Smith', 'Sue Jones']
>>> for person in people:
        print person[0][1].split( )[-1]        # get last names
        person[2][1] *= 1.10                  # give a 10% raise
Smith
Jones
>>> for person in people: print person[2]
['pay', 11000.0]
['pay', 22000.0]

All we've really done here is add an extra level of positional indexing. To do better, we might inspect field names in loops to find the one we want (the loop uses tuple assignment here to unpack the name/value pairs):

>>> for person in people:
       for (name, value) in person:
           if name == 'name': print value         # find a specific field
Bob Smith
Sue Jones

Better yet, we can code a fetcher function to do the job for us:

>>> def field(record, label):
         for (fname, fvalue) in record:  # find any field by name
             if fname == label:
                 return fvalue
                            
>>> field(bob, 'name')
'Bob Smith'
>>> field(sue, 'pay')
22000.0
>>> for rec in people:
        print field(rec, 'age')                       # print all ages
42
45

If we proceed down this path, we'll eventually wind up with a set of record interface functions that generically map field names to field data. If you've done any Python coding in the past, you probably already know that there is an easier way to code this sort of association, and you can probably guess where we're headed in the next section.
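
For illustration, such a set of interface functions might look like the following sketch. The setfield function is a hypothetical companion to field, not code from this section, and both functions still pay for a linear search through the record:

```python
def field(record, label):
    """Fetch the value stored under label, or None if it is absent."""
    for (fname, fvalue) in record:
        if fname == label:
            return fvalue
    return None

def setfield(record, label, value):
    """Change the value stored under label, or add a new pair if absent."""
    for pair in record:
        if pair[0] == label:
            pair[1] = value          # pairs are lists, so they can be changed in place
            return
    record.append([label, value])

bob = [['name', 'Bob Smith'], ['age', 42], ['pay', 10000]]
setfield(bob, 'pay', field(bob, 'pay') * 2)     # update an existing field
setfield(bob, 'job', 'dev')                     # add a new field
print(field(bob, 'pay'))     # 20000
print(field(bob, 'job'))     # dev
print(field(bob, 'email'))   # None
```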

2.3.2. Using Dictionaries
The list-based record representations in the prior section work, though not without some cost in terms of performance required to search for field names (assuming you need to care about milliseconds and such). But if you already know some Python, you also know that there are more convenient ways to associate property names and values. The built-in dictionary object is a natural:

>>> bob = {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}
>>> sue = {'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'mus'}

Now, Bob and Sue are objects that map field names to values automatically, and they make our code more understandable and meaningful. We don't have to remember what a numeric offset means, and we let Python search for the value associated with a field's name with its efficient dictionary indexing:

>>> bob['name'], sue['pay']             # not bob[0], sue[2]
('Bob Smith', 40000)
>>> bob['name'].split( )[-1]
'Smith'
>>> sue['pay'] *= 1.10
>>> sue['pay']
44000.0

Because fields are accessed mnemonically now, they are more meaningful to those who read your code (including you).

2.3.2.1. Other ways to make dictionaries
Dictionaries turn out to be so useful in Python programming that there are even more convenient ways to code them than the traditional literal syntax shown earlier, e.g., with keyword arguments and the type constructor:

>>> bob = dict(name='Bob Smith', age=42, pay=30000, job='dev')
>>> bob
{'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}

Lists are convenient any time we need an ordered container of other objects that may need to change over time. A simple way to represent matrixes in Python, for instance, is as a list of nested lists: the top list is the matrix, and the nested lists are the rows:

>>> M = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

>>> N = [[2, 2, 2],
         [3, 3, 3],
         [4, 4, 4]]

Now, to combine one matrix's components with another's, step over their indexes with nested loops; here's a simple pairwise multiplication:

>>> for i in range(3):
        for j in range(3):
            print M[i][j] * N[i][j],
        print
2 4 6
12 15 18
28 32 36


To build up a new matrix with the results, we just need to create the nested list structure along the way:

>>> tbl = []
>>> for i in range(3):
        row = []
        for j in range(3):
            row.append(M[i][j] * N[i][j])
        tbl.append(row)

>>> tbl
[[2, 4, 6], [12, 15, 18], [28, 32, 36]]


Nested list comprehensions such as either of the following will do the same job, albeit at some cost in complexity (if you have to think hard about expressions like these, so will the next person who has to read your code!):

[[M[i][j] * N[i][j] for j in range(3)] for i in range(3)]

[[x * y for x, y in zip(row1, row2)]
            for row1, row2 in zip(M, N)]
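
As a quick check, the sketch below runs both comprehensions and confirms that they build the same table as the nested loops. It uses the M and N matrixes from above; the names by_index and by_zip are ours:

```python
M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
N = [[2, 2, 2],
     [3, 3, 3],
     [4, 4, 4]]

# index-based form: visit every (i, j) position
by_index = [[M[i][j] * N[i][j] for j in range(3)] for i in range(3)]

# zip-based form: pair up rows, then pair up the items within each row
by_zip = [[x * y for x, y in zip(row1, row2)]
          for row1, row2 in zip(M, N)]

print(by_index)             # [[2, 4, 6], [12, 15, 18], [28, 32, 36]]
print(by_index == by_zip)   # True
```

The zip form avoids hardcoding the matrix size, at the price of one more level of unpacking for the reader to follow.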


List comprehensions are powerful tools, provided you restrict them to simple tasks, for example, listing selected module functions or stripping end-of-lines:

>>> import sys
>>> [x for x in dir(sys) if x.startswith('getr')]
['getrecursionlimit', 'getrefcount']

>>> lines = [line.rstrip( ) for line in open('README.txt')]
>>> lines[0]
'This is Python version 2.4 alpha 3'


If you are interested in matrix processing, also see the mathematical and scientific extensions available for Python in the public domain, such as those available through NumPy and SciPy. The code here works, but extensions provide optimized tools. NumPy, for instance, is seen by some as an open source Matlab equivalent.

We can also make a dictionary by filling it out one field at a time:

>>> sue = {}
>>> sue['name'] = 'Sue Jones'
>>> sue['age']  = 45
>>> sue['pay']  = 40000
>>> sue['job']  = 'mus'
>>> sue
{'job': 'mus', 'pay': 40000, 'age': 45, 'name': 'Sue Jones'}

and by zipping together name/value lists:

>>> names  = ['name', 'age', 'pay', 'job']
>>> values = ['Sue Jones', 45, 40000, 'mus']
>>> zip(names, values)
[('name', 'Sue Jones'), ('age', 45), ('pay', 40000), ('job', 'mus')]
>>> sue = dict(zip(names, values))
>>> sue
{'job': 'mus', 'pay': 40000, 'age': 45, 'name': 'Sue Jones'}

We can even make dictionaries today from a sequence of key values and an optional starting value for all the keys (handy to initialize an empty dictionary):

>>> fields = ('name', 'age', 'job', 'pay')
>>> record = dict.fromkeys(fields, '?')
>>> record
{'job': '?', 'pay': '?', 'age': '?', 'name': '?'}
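
Whichever coding style we choose, the resulting dictionaries are equivalent. The sketch below builds the same record four ways and compares the results with ==; the variable names are ours. It also notes one caveat of dict.fromkeys that is easy to miss: every key shares the same starting value object, which matters if that value is mutable:

```python
literal = {'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'mus'}
keywords = dict(name='Sue Jones', age=45, pay=40000, job='mus')

by_field = {}
by_field['name'] = 'Sue Jones'
by_field['age'] = 45
by_field['pay'] = 40000
by_field['job'] = 'mus'

names = ['name', 'age', 'pay', 'job']
values = ['Sue Jones', 45, 40000, 'mus']
zipped = dict(zip(names, values))

# dictionaries compare equal when they hold the same key/value pairs
print(literal == keywords == by_field == zipped)     # True

# fromkeys caveat: a mutable starting value is shared by every key
shared = dict.fromkeys(names, [])
shared['name'].append('oops')
print(shared['age'])                                  # ['oops']
```

With an immutable starting value such as '?' or None, the sharing is harmless, because assigning to a key replaces the value rather than changing it in place.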



2.3.2.2. Lists of dictionaries
Regardless of how we code them, we still need to collect our records into a database; a list does the trick again, as long as we don't require access by key:

>>> people = [bob, sue]
>>> for person in people:
        print person['name'], person['pay']          # all name, pay

Bob Smith 30000
Sue Jones 44000.0
>>> for person in people:
        if person['name'] == 'Sue Jones':            # fetch sue's pay
            print person['pay']

44000.0

Iteration tools work just as well here, but we use keys rather than obscure positions (in database terms, the list comprehension and map in the following code project the database on the "name" field column):

>>> names = [person['name'] for person in people]    # collect names
>>> names
['Bob Smith', 'Sue Jones']
>>> map((lambda x: x['name']), people)                # ditto
['Bob Smith', 'Sue Jones']
>>> sum(person['pay'] for person in people)           # sum all pay
74000.0
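
In the same database terms, adding an if clause to a comprehension selects rows rather than projecting a column. The following is a small sketch under assumed values: the records repeat the pay figures shown above, and the 35000 threshold is arbitrary:

```python
bob = {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}
sue = {'name': 'Sue Jones', 'age': 45, 'pay': 44000.0, 'job': 'mus'}
people = [bob, sue]

# selection: keep only the records that pass the test
well_paid = [rec for rec in people if rec['pay'] > 35000]

# selection plus projection: names of the selected records
well_paid_names = [rec['name'] for rec in people if rec['pay'] > 35000]

print(well_paid_names)                       # ['Sue Jones']
print(sum(rec['pay'] for rec in well_paid))  # 44000.0
```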

And because dictionaries are normal Python objects, these records can also be accessed and updated with normal Python syntax:

>>> for person in people:
        print person['name'].split( )[-1]             # last name
        person['pay'] *= 1.10                        # a 10% raise

Smith
Jones
>>> for person in people: print person['pay']
33000.0
48400.0



2.3.2.3. Nested structures
Incidentally, we could avoid the last-name extraction code in the prior examples by further structuring our records. Because all of Python's compound datatypes can be nested inside each other and as deeply as we like, we can build up fairly complex information structures easily: simply type the object's syntax, and Python does all the work of building the components, linking memory structures, and later reclaiming their space. This is one of the great advantages of a scripting language such as Python.

The following, for instance, represents a more structured record by nesting a dictionary, list, and tuple inside another dictionary:

>>> bob2 = {'name': {'first': 'Bob', 'last': 'Smith'},
            'age':  42,
            'job':  ['software', 'writing'],
            'pay':  (40000, 50000)}
             

Because this record contains nested structures, we simply index twice to go two levels deep:

>>> bob2['name']                            # bob's full name
{'last': 'Smith', 'first': 'Bob'}
>>> bob2['name']['last']                     # bob's last name
'Smith'
>>> bob2['pay'][1]                         # bob's upper pay
50000

The name field is another dictionary here, so instead of splitting up a string, we simply index to fetch the last name. Moreover, people can have many jobs, as well as minimum and maximum pay limits. In fact, Python becomes a sort of query language in such cases: we can fetch or change nested data with the usual object operations:

>>> for job in bob2['job']: print job      # all of bob's jobs
software
writing
>>> bob2['job'][-1]                         # bob's last job
'writing'
>>> bob2['job'].append('janitor')           # bob gets a new job
>>> bob2
{'job': ['software', 'writing', 'janitor'], 'pay': (40000, 50000), 'age': 42, 'name':
{'last': 'Smith', 'first': 'Bob'}}

It's OK to grow the nested list with append, because it is really an independent object. Such nesting can come in handy for more sophisticated applications; to keep ours simple, we'll stick to the original flat record structure.
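
To illustrate the point about nested objects, the sketch below grows the nested job list in place and then shows that the pay tuple, being immutable, rejects the same kind of change; assigning a whole new tuple to the key is the workaround. The new pay figure is invented for the example:

```python
bob2 = {'name': {'first': 'Bob', 'last': 'Smith'},
        'age':  42,
        'job':  ['software', 'writing'],
        'pay':  (40000, 50000)}

bob2['job'].append('janitor')            # lists can be changed in place
print(bob2['job'])                       # ['software', 'writing', 'janitor']

try:
    bob2['pay'][1] = 55000               # tuples cannot
except TypeError as exc:
    print('cannot change tuple: %s' % exc)

low, high = bob2['pay']
bob2['pay'] = (low, 55000)               # assign a new tuple to the key instead
print(bob2['pay'])                       # (40000, 55000)
```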

2.3.2.4. Dictionaries of dictionaries
One last twist on our people database: we can get a little more mileage out of dictionaries here by using one to represent the database itself. That is, we can use a dictionary of dictionaries: the outer dictionary is the database, and the nested dictionaries are the records within it. Rather than a simple list of records, a dictionary-based database allows us to store and retrieve records by symbolic key:

>>> db = {}
>>> db['bob'] = bob
>>> db['sue'] = sue
>>>
>>> db['bob']['name']                    # fetch bob's name
'Bob Smith'
>>> db['sue']['pay'] = 50000              # change sue's pay
>>> db['sue']['pay']                     # fetch sue's pay
50000

Notice how this structure allows us to access a record directly instead of searching for it in a loop (we get to Bob's name immediately by indexing on key bob). This really is a dictionary of dictionaries, though you won't see all the gory details unless you display the database all at once:

>>> db
{'bob': {'pay': 33000.0, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'},
 'sue': {'job': 'mus', 'pay': 50000, 'age': 45, 'name': 'Sue Jones'}}
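
The difference between the two database shapes can be sketched side by side. In the list version we must scan for a matching name; in the dictionary version the key takes us straight to the record. The find_by_name helper is hypothetical, written only for this comparison:

```python
bob = {'name': 'Bob Smith', 'age': 42, 'pay': 33000.0, 'job': 'dev'}
sue = {'name': 'Sue Jones', 'age': 45, 'pay': 50000, 'job': 'mus'}

people = [bob, sue]                      # list-based database
db = {'bob': bob, 'sue': sue}            # dictionary-based database

def find_by_name(records, name):
    """Linear search: compare each record's name field in turn."""
    for rec in records:
        if rec['name'] == name:
            return rec
    return None

print(find_by_name(people, 'Sue Jones')['pay'])          # 50000
print(db['sue']['pay'])                                  # 50000
print(find_by_name(people, 'Sue Jones') is db['sue'])    # True: same record object
```

Both containers hold references to the same record objects, so a change made through one is visible through the other.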

If we still need to step through the database one record at a time, we can now rely on dictionary iterators. In recent Python releases, a dictionary iterator produces one key in a for loop each time through (in earlier releases, call the keys method explicitly in the for loop: say db.keys( ) rather than just db):

>>> for key in db:
        print key, '=>', db[key]['name']
bob => Bob Smith
sue => Sue Jones
>>> for key in db:
        print key, '=>', db[key]['pay']
bob => 33000.0
sue => 50000
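
The Python releases this section targets make no promise about the order in which a dictionary yields its keys, so if a listing must come out in a predictable sequence we can sort the keys first. The sketch below also steps through key/record pairs with items; it assumes trimmed-down versions of the same two records:

```python
db = {'sue': {'name': 'Sue Jones', 'pay': 50000},
      'bob': {'name': 'Bob Smith', 'pay': 33000.0}}

# sorted returns the keys as a new ordered list
for key in sorted(db):
    print('%s => %s' % (key, db[key]['name']))

# items yields (key, record) pairs, saving the extra index step
total = 0
for key, record in db.items():
    total += record['pay']
print(total)        # 83000.0
```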

To visit all records, either index by key as you go:

>>> for key in db:
        print db[key]['name'].split( )[-1]
        db[key]['pay'] *= 1.10
Smith
Jones

or step through the dictionary's values to access records directly:

>>> for record in db.values( ): print record['pay']
36300.0
55000.0
>>> x = [db[key]['name'] for key in db]
>>> x
['Bob Smith', 'Sue Jones']
>>> x = [rec['name'] for rec in db.values( )]
>>> x
['Bob Smith', 'Sue Jones']

And to add a new record, simply assign it to a new key; this is just a dictionary, after all:

>>> db['tom'] = dict(name='Tom', age=50, job=None, pay=0)
>>>
>>> db['tom']
{'pay': 0, 'job': None, 'age': 50, 'name': 'Tom'}
>>> db['tom']['name']
'Tom'
>>> db.keys( )
['bob', 'sue', 'tom']
>>> len(db)
3
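
Because the database is a plain dictionary, the usual membership and deletion operations apply as well. A brief sketch, with an invented key 'ann' standing in for a record that does not exist:

```python
db = {'bob': dict(name='Bob Smith', age=42, job='dev', pay=33000.0),
      'sue': dict(name='Sue Jones', age=45, job='mus', pay=50000)}

db['tom'] = dict(name='Tom', age=50, job=None, pay=0)   # add by assigning a new key

print('tom' in db)            # True
print('ann' in db)            # False
print(db.get('ann'))          # None: get avoids a KeyError for missing keys

del db['tom']                 # remove a record by key
print(sorted(db))             # ['bob', 'sue']
```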

Although our database is still a transient object in memory, it turns out that this dictionary-of-dictionaries format corresponds exactly to a system that saves objects permanently: the shelve (yes, this should be shelf grammatically speaking, but the Python module name and term is shelve). To learn how, let's move on to the next section.