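1. Introduction

  • This handcrafted package contains Python wrappers for Berkeley DB, the open source embedded database system. Berkeley DB is a programmatic toolkit that provides high-performance built-in database support for desktop and server applications. The Berkeley DB access methods include B+tree, Extended Linear Hashing, fixed and variable-length records, and queues. Berkeley DB provides full transactional support, database recovery, online backups, multi-threaded and multi-process access, and more. The Python wrappers allow you to store Python string objects of any length, keyed by either strings or integers depending on the database access method. Combined with other modules in the package, standard shelve-like functionality lets you store any picklable Python object.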

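2. Berkeley DB 4.x Python Extension Package

2.1. Introduction

  • This is a brief piece of documentation for the bsddb3.db Python extension module, which wraps the Berkeley DB 4.x C library. The extension module is located in a Python package along with a few pure Python modules. It is expected that this module will be used in the following general ways by different programmers in different situations. The goal of the module is to allow all of these uses without making things too complex for the simple cases, and without leaving out functionality needed by the complex cases.
  • Backwards compatibility: This package is intended to be a near drop-in replacement for the bsddb module shipped with Python, which is designed to wrap either DB 1.85 or the 1.85 compatibility interface. This means that equivalent object creation functions must be available (btopen(), hashopen() and rnopen()), and the objects returned must have the same or at least similar methods (specifically, first(), last(), next() and prev() must be available without the user needing to use a cursor explicitly). All of these are implemented in the bsddb3.__init__.py module.

  • Simple persistent dictionary: One small step beyond the above. The programmer may be aware of and use the new DB object type directly, but only needs it from a single process and thread. The programmer should not have to be bothered with using a DBEnv, and the DB object should behave as much like a dictionary as possible.

  • Concurrent access dictionaries: This refers to the ability to have one writer and multiple readers of a DB at the same time (in multiple threads or multiple processes). It is implemented simply by creating a DBEnv with certain flags. No extra work is required to allow this access mode in bsddb3.

  • Advanced transactional data store: This mode of use is where the full capabilities of the Berkeley DB library are called into action. The programmer will probably use the regular methods of the DB object more than the dictionary access methods, so that transaction objects can be passed to the methods. Again, most of this advanced functionality is activated simply by opening a DBEnv with the proper flags, and also by using transactions and being aware of and reacting to deadlock exceptions, etc.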

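2.2. Provided Types

  • The bsddb3.db extension module provides the following object types:
  • DB: The basic database object, capable of the Hash, BTree, Recno and Queue access methods.
  • DBEnv: Provides a database environment for more advanced use. Applications using transactions, logging, concurrent access, etc. will need an environment object.
  • DBCursor: A pointer-like object used to traverse a database.
  • DBTxn: A database transaction. Allows for multi-file commit, abort and checkpoint of database modifications.
  • DBLock: An opaque handle for a lock. See DBEnv.lock_get() and DBEnv.lock_put(). Locks are not necessarily associated with anything in the database, but can be used for any synchronization task across all threads and processes that have the DBEnv open.
  • DBSequence: Sequences provide an arbitrary number of persistent objects that return an increasing or decreasing sequence of integers. Opening a sequence handle associates it with a record in a database.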

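2.3. Provided Exceptions

  • The Berkeley DB C API uses function return codes to signal various errors. The bsddb3.db module checks for these error codes and turns them into Python exceptions, allowing you to use the familiar try:... except:... constructs without having to check the return value of every method. Each error code is turned into an exception specific to that code, as outlined in the table below. If you are using the C API documentation, it is easy to map an error return code given there to the name of the Python exception that will be raised. Simply refer to the table below.

    All exceptions derive from the DBError exception class, so if you just want to catch generic errors you can use DBError to do it. Since DBNotFoundError is raised when a given key is not found in the database, DBNotFoundError also derives from the standard KeyError exception to help make a DB look and act like a dictionary. When any of these exceptions is raised, the associated value is a tuple containing an integer representing the error code and a string with the text of the error message.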

    Exception                C API error code
    DBError                  (base class; all other exceptions derive from it)
    DBIncompleteError        DB_INCOMPLETE
    DBKeyEmptyError          DB_KEYEMPTY
    DBKeyExistError          DB_KEYEXIST
    DBLockDeadlockError      DB_LOCK_DEADLOCK
    DBLockNotGrantedError    DB_LOCK_NOTGRANTED
    DBNotFoundError          DB_NOTFOUND (also derives from KeyError)
    DBOldVersionError        DB_OLD_VERSION
    DBRunRecoveryError       DB_RUNRECOVERY
    DBVerifyBadError         DB_VERIFY_BAD
    DBNoServerError          DB_NOSERVER
    DBNoServerHomeError      DB_NOSERVER_HOME
    DBNoServerIDError        DB_NOSERVER_ID
    DBInvalidArgError        EINVAL
    DBAccessError            EACCES
    DBNoSpaceError           ENOSPC
    DBNoMemoryError          ENOMEM
    DBAgainError             EAGAIN
    DBBusyError              EBUSY
    DBFileExistsError        EEXIST
    DBNoSuchFileError        ENOENT
    DBPermissionsError       EPERM
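
The inheritance rule described in this section can be sketched without the package installed. The classes below are stand-ins written for illustration only, not the real bsddb3.db classes, and EXAMPLE_CODE, lookup() and describe() are hypothetical names. The sketch shows why a handler written for KeyError also catches a not-found error, and how the (error code, message) tuple can be unpacked.

```python
# Stand-in classes mirroring the hierarchy described in this section.
# They are NOT the real bsddb3.db classes; with the package installed you
# would import DBError and DBNotFoundError from bsddb3.db instead.
EXAMPLE_CODE = -1  # illustrative value only, not a real Berkeley DB code


class DBError(Exception):
    """Stand-in for the package's base exception class."""


class DBNotFoundError(DBError, KeyError):
    """Stand-in that derives from both DBError and KeyError."""


def lookup(records, key):
    """Mimic a database lookup that raises when the key is missing."""
    if key not in records:
        # The associated value is a tuple: (integer error code, message).
        raise DBNotFoundError(EXAMPLE_CODE, "key not found")
    return records[key]


def describe(records, key):
    """Return the stored value, or a string describing the failure."""
    try:
        return lookup(records, key)
    except KeyError as exc:  # dictionary-style handling still works
        code, message = exc.args
        return "error %d: %s" % (code, message)
```

Catching DBError instead of KeyError in describe() would work equally well here, since the stand-in derives from both.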

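2.4. Other Package Modules

  • dbshelve.py: An implementation of the standard Python shelve concept for storing objects that uses bsddb3, and that also exposes some of the more advanced methods and capabilities of the underlying DB.
  • dbtables.py: A module by Gregory Smith that implements a simplistic table structure on top of a DB.
  • dbutils.py: A catch-all for Python code that is generally useful when working with DBs.
  • dbobj.py: Contains subclassable versions of DB and DBEnv.
  • dbrecio.py: Contains the DBRecIO class, which can be used to do partial reads and writes of a DB record through a file-like interface. Contributed by Itamar Shtull-Trauring.

2.5. Testing

  • A full unit test suite has been developed to exercise the various object types, their methods and the usage modes described in the introduction. PyUnit is used, and the tests are structured so that they can be run unattended and automated. There are currently about 300 test cases (March 2008).

2.6. Reference

  • See the C language API online documentation on the Oracle website for more details of the functionality of each of these methods. The names of the Python methods should be the same as or similar to the names in the C API. NOTE: All the methods listed below that have more than one keyword argument are implemented using keyword argument parsing, so you can use keywords to provide optional parameters as desired. Those that have only a single optional argument are implemented without keyword parsing to help keep the implementation simple. If this is too confusing, let me know and I will think about using keywords for everything.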

3. DBEnv

3.1. DBEnv Attributes

db_home
Database home directory (read-only)

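3.2. DBEnv Methods

DBEnv(flags=0)

Constructor. More info...

set_rpc_server(host, cl_timeout=0, sv_timeout=0)

Establishes a connection from this dbenv to an RPC server. More info...

close(flags=0)

Closes the database environment, freeing resources. More info...

open(homedir, flags=0, mode=0660)

Prepares the database environment for use. More info...

remove(homedir, flags=0)

Removes a database environment. More info...

dbremove(file, database=None, txn=None, flags=0)

Removes the database specified by the file and database parameters. If no database is specified, the underlying file represented by file is removed, which also removes all of the databases it contained. More info...

dbrename(file, database=None, newname, txn=None, flags=0)

Renames the database specified by the file and database parameters to newname. If no database is specified, the underlying file represented by file is renamed, which also renames all of the databases it contained. More info...

set_encrypt(passwd, flags=0)

Sets the password used by the Berkeley DB library to perform encryption and decryption. More info...

set_timeout(timeout, flags)

Sets timeout values for locks or transactions in the database environment. More info...

set_shm_key(key)

Specifies a base segment ID for Berkeley DB environment shared memory regions created in system memory on VxWorks or on systems supporting X/Open-style shared memory interfaces, for example UNIX systems supporting shmget(2) and the related System V IPC interfaces. More info...

set_cachesize(gbytes, bytes, ncache=0)

Sets the size of the shared memory buffer pool. More info...

set_data_dir(dir)

Sets the environment data directory. More info...

set_flags(flags, onoff)

Sets additional flags for the DBEnv. The onoff parameter specifies whether the flag is set or cleared. More info...

set_tmp_dir(dir)

Sets the directory to be used for temporary files. More info...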

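set_get_returns_none(flag)

By default, when DB.get or DBCursor.get, get_both, first, last, next or prev encounter a DB_NOTFOUND error, they return None instead of raising DBNotFoundError. This behaviour emulates Python dictionaries and is convenient for looping. You can use this method to toggle that behaviour for all of the aforementioned methods, or to extend it to the DBCursor.set, set_both, set_range and set_recno methods. Supported values of flag:

  • 0: All DB and DBCursor get and set methods raise DBNotFoundError rather than returning None.
  • 1: Default in module versions earlier than 4.2.4. The DB.get and DBCursor.get, get_both, first, last, next and prev methods return None.
  • 2: Default in module versions 4.2.4 and later. Extends the behaviour of 1 to the DBCursor set, set_both, set_range and set_recno methods.

Returning None by default makes it easy to do things like the following without having to catch DBNotFoundError (KeyError):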

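    #!python
    # get() returns None for a missing key, so no try/except is needed
    data = mydb.get(key)
    if data:
        doSomething(data)

    • Or this:

    #!python
    # first() and next() return None once there are no more records
    rec = cursor.first()
    while rec:
        print(rec)
        rec = cursor.next()

    • Having the cursor set methods return None is useful for loops like this:

    #!python
    # position the cursor at key, then walk forward until the end
    rec = cursor.set(key)
    while rec:
        key, val = rec
        doSomething(key, val)
        rec = cursor.next()
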
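    The downside to this is that it is inconsistent with the rest of the package and noticeably diverges from the Oracle Berkeley DB API. If you prefer to have the get and set methods raise an exception when a key is not found, use this method to tell them to do so. Calling this method on a DBEnv object sets the default for all DBs later created within that environment. Calling it on a DB object sets the behaviour for that DB only. The previous setting is returned.

    set_private(object)

    Links an arbitrary object to the DBEnv.

    get_private()

    Returns the object linked to the DBEnv.

    set_lg_bsize(size)

    Sets the size of the in-memory log buffer, in bytes. More info...

    set_lg_dir(dir)

    Sets the path of the directory to be used as the location of log files. Log files created by the log manager subsystem will be created in this directory. More info...

    set_lg_max(size)

    Sets the maximum size of a single file in the log, in bytes. More info...

    get_lg_max()

    Returns the maximum log file size. More info...

    set_lg_regionmax(size)

    Sets the maximum size of a single region in the log, in bytes. More info...

    set_lk_detect(mode)

    Sets the automatic deadlock detection mode. More info...

    set_lk_max(max)

    Sets the maximum number of locks. (This method is deprecated.) More info...

    set_lk_max_locks(max)

    Sets the maximum number of locks supported by the Berkeley DB lock subsystem. More info...

    set_lk_max_lockers(max)

    Sets the maximum number of simultaneous locking entities supported by the Berkeley DB lock subsystem. More info...

    set_lk_max_objects(max)

    Sets the maximum number of simultaneously locked objects supported by the Berkeley DB lock subsystem. More info...

    set_mp_mmapsize(size)

    Files that are opened read-only in the memory pool (and that satisfy a few other criteria) are, by default, mapped into the process address space instead of being copied into the local cache. This can result in better-than-usual performance, because available virtual memory is normally much larger than the local cache, and page faults are faster than page copying on many systems. However, when virtual memory is limited this can cause resource starvation, and with large databases it can result in immense process sizes. This method sets the maximum size, in bytes, of a file that will be mapped into the process address space. If no value is specified, it defaults to 10MB. More info...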

    log_archive(flags=0)

    Returns a list of log or database file names. By default, log_archive returns the names of all of the log files that are no longer in use (e.g., no longer involved in active transactions), and that may safely be archived for catastrophic recovery and then removed from the system. More info...

    log_flush()

    Force log records to disk. Useful if the environment, database or transactions are used as ACI, instead of ACID. For example, if the environment is opened as DB_TXN_NOSYNC. More info...

    log_set_config(flags, onoff)

    Configures the Berkeley DB logging subsystem. More info...

    lock_detect(atype, flags=0)

    Run one iteration of the deadlock detector, returns the number of transactions aborted. More info...

    lock_get(locker, obj, lock_mode, flags=0)

    Acquires a lock and returns a handle to it as a DBLock object. The locker parameter is an integer representing the entity doing the locking, and obj is an object representing the item to be locked. More info...

    lock_id()

    Acquires a locker id, guaranteed to be unique across all threads and processes that have the DBEnv open. More info...

    lock_id_free(id)

    Frees a locker ID allocated by the “dbenv.lock_id()” method. More info...

    lock_put(lock)

    Release the lock. More info...

    lock_stat(flags=0)
    Returns a dictionary of locking subsystem statistics with the following keys:

    id

    Last allocated lock ID.

    cur_maxid

    The current maximum unused locker ID.

    nmodes

    Number of lock modes.

    maxlocks

    Maximum number of locks possible.

    maxlockers

    Maximum number of lockers possible.

    maxobjects

    Maximum number of objects possible.

    nlocks

    Number of current locks.

    maxnlocks

    Maximum number of locks at once.

    nlockers

    Number of current lockers.

    nobjects

    Number of current lock objects.

    maxnobjects

    Maximum number of lock objects at once.

    maxnlockers

    Maximum number of lockers at once.

    nrequests

    Total number of locks requested.

    nreleases

    Total number of locks released.

    nupgrade

    Total number of locks upgraded.

    ndowngrade

    Total number of locks downgraded.

    lock_wait

    The number of lock requests not immediately available due to conflicts, for which the thread of control waited.

    lock_nowait

    The number of lock requests not immediately available due to conflicts, for which the thread of control did not wait.

    ndeadlocks

    Number of deadlocks.

    locktimeout

    Lock timeout value.

    nlocktimeouts

    The number of lock requests that have timed out.

    txntimeout

    Transaction timeout value.

    ntxntimeouts

    The number of transactions that have timed out. This value is also a component of ndeadlocks, the total number of deadlocks detected.

    objs_wait

    The number of requests to allocate or deallocate an object for which the thread of control waited.

    objs_nowait

    The number of requests to allocate or deallocate an object for which the thread of control did not wait.

    lockers_wait

    The number of requests to allocate or deallocate a locker for which the thread of control waited.

    lockers_nowait

    The number of requests to allocate or deallocate a locker for which the thread of control did not wait.

    locks_wait

    The number of requests to allocate or deallocate a lock structure for which the thread of control waited.

    locks_nowait

    The number of requests to allocate or deallocate a lock structure for which the thread of control did not wait.

    hash_len

    Maximum length of a lock hash bucket.

    regsize

    Size of the region.

    region_wait

    Number of times a thread of control was forced to wait before obtaining the region lock.

    region_nowait

    Number of times a thread of control was able to obtain the region lock without waiting.

    • More info...

      set_tx_max(max)

      Set the maximum number of active transactions. More info...

      set_tx_timestamp(timestamp)

      Recover to the time specified by timestamp rather than to the most current possible date. More info...

      txn_begin(parent=None, flags=0)

      Creates and begins a new transaction. A DBTxn object is returned. More info...

      txn_checkpoint(kbyte=0, min=0, flag=0)

      Flushes the underlying memory pool, writes a checkpoint record to the log and then flushes the log. More info...

      txn_stat()
      Return a dictionary of transaction statistics with the following keys:

      More info...

      lsn_reset(file=None, flags=0)

      This method allows database files to be moved from one transactional database environment to another. More info...

      log_stat(flags=0)
      Returns a dictionary of logging subsystem statistics with the following keys:

    magic

    The magic number that identifies a file as a log file.

    version

    The version of the log file type.

    mode

    The mode of any created log files.

    lg_bsize

    The in-memory log record cache size.

    lg_size

    The log file size.

    record

    The number of records written to this log.

    w_mbytes

    The number of megabytes written to this log.

    w_bytes

    The number of bytes over and above w_mbytes written to this log.

    wc_mbytes

    The number of megabytes written to this log since the last checkpoint.

    wc_bytes

    The number of bytes over and above wc_mbytes written to this log since the last checkpoint.

    wcount

    The number of times the log has been written to disk.

    wcount_fill

    The number of times the log has been written to disk because the in-memory log record cache filled up.

    rcount

    The number of times the log has been read from disk.

    scount

    The number of times the log has been flushed to disk.

    cur_file

    The current log file number.

    cur_offset

    The byte offset in the current log file.

    disk_file

    The log file number of the last record known to be on disk.

    disk_offset

    The byte offset of the last record known to be on disk.

    maxcommitperflush

    The maximum number of commits contained in a single log flush.

    mincommitperflush

    The minimum number of commits contained in a single log flush that contained a commit.

    regsize

    The size of the log region, in bytes.

    region_wait

    The number of times that a thread of control was forced to wait before obtaining the log region mutex.

    region_nowait

    The number of times that a thread of control was able to obtain the log region mutex without waiting.

    • More info...
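
The w_mbytes/w_bytes and wc_mbytes/wc_bytes pairs each describe one quantity split into a megabyte count plus a remainder. A minimal sketch of recombining them follows. It assumes a megabyte here means 2**20 bytes, and the sample dictionary holds hypothetical values rather than real log_stat() output; total_bytes() is a helper name introduced for illustration.

```python
MEGABYTE = 1024 * 1024  # assumption: the *_mbytes keys count units of 2**20 bytes


def total_bytes(stats, prefix="w"):
    """Combine the <prefix>_mbytes and <prefix>_bytes keys into one byte count."""
    return stats[prefix + "_mbytes"] * MEGABYTE + stats[prefix + "_bytes"]


# Hypothetical values shaped like the dictionary described above.
sample = {"w_mbytes": 3, "w_bytes": 512, "wc_mbytes": 0, "wc_bytes": 4096}

written_total = total_bytes(sample)             # bytes written to this log
written_since_ckpt = total_bytes(sample, "wc")  # bytes since the last checkpoint
```

With a real environment, the dictionary returned by log_stat() would take the place of sample.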

      txn_recover()
      Returns a list of tuples (GID, TXN) of transactions prepared but still unresolved. This is used while doing environment recovery in an application using distributed transactions.

      This method must be called only from a single thread at a time. It should be called after DBEnv recovery. More info...

      set_verbose(which, onoff)

      Turns specific additional informational and debugging messages in the Berkeley DB message output on and off. To see the additional messages, verbose messages must also be configured for the application. More info...

      get_verbose(which)

      Returns whether the specified which parameter is currently set or not. More info...

      set_event_notify(eventFunc)

      Configures a callback function which is called to notify the process of specific Berkeley DB events. More info...

3.3. DBEnv Replication Manager Methods

    • This module automates many of the tasks needed to provide replication abilities in a Berkeley DB system. The module is fairly limited, but sufficient in many cases. More demanding users must use the full Base Replication API. This module requires POSIX support, so you must compile Berkeley DB with it if you want to be able to use the Replication Manager.

      repmgr_start(nthreads, flags)

      Starts the replication manager. More info...

      repmgr_set_local_site(host, port, flags=0)

      Specifies the host identification string and port number for the local system. More info...

      repmgr_add_remote_site(host, port, flags=0)
      Adds a new replication site to the replication manager’s list of known sites. It is not necessary for all sites in a replication group to know about all other sites in the group.

      Method returns the environment ID assigned to the remote site. More info...

      repmgr_set_ack_policy(ack_policy)

      Specifies how master and client sites will handle acknowledgment of replication messages which are necessary for “permanent” records. More info...

      repmgr_get_ack_policy()

      Returns the replication manager’s client acknowledgment policy. More info...

      repmgr_site_list()
      Returns a dictionary with the status of the sites currently known by the replication manager.
      The keys are the Environment ID assigned by the replication manager. This is the same value that is passed to the application’s event notification function for the DB_EVENT_REP_NEWMASTER event. The values are tuples containing the hostname, the TCP/IP port number and the link status.

      More info...

      repmgr_stat(flags=0)
      Returns a dictionary with the replication manager statistics. Keys are:

    perm_failed

    The number of times a message critical for maintaining database integrity (for example, a transaction commit), originating at this site, did not receive sufficient acknowledgement from clients, according to the configured acknowledgement policy and acknowledgement timeout.

    msgs_queued

    The number of outgoing messages which could not be transmitted immediately, due to a full network buffer, and had to be queued for later delivery.

    msgs_dropped

    The number of outgoing messages that were completely dropped, because the outgoing message queue was full. (Berkeley DB replication is tolerant of dropped messages, and will automatically request retransmission of any missing messages as needed.)

    connection_drop

    The number of times an existing TCP/IP connection failed.

    connect_fail

    The number of times an attempt to open a new TCP/IP connection failed.

3.4. DBEnv Replication Methods

    rep_elect(nsites, nvotes)

    Holds an election for the master of a replication group. More info...

    rep_set_transport(envid, transportFunc)

    Initializes the communication infrastructure for a database environment participating in a replicated application. More info...

    rep_process_message(control, rec, envid)

    Processes an incoming replication message sent by a member of the replication group to the local database environment. Returns a two element tuple. More info...

    rep_start(flags, cdata=None)
    Configures the database environment as a client or master in a group of replicated database environments.

    The DB_ENV->rep_start method is not called by most replication applications. It should only be called by applications implementing their own network transport layer, explicitly holding replication group elections and handling replication messages outside of the replication manager framework.

    More info...

    rep_sync()

    Forces master synchronization to begin for this client. This method is the other half of setting the DB_REP_CONF_DELAYCLIENT flag via the DB_ENV->rep_set_config method. More info...

    rep_set_config(which, onoff)

    Configures the Berkeley DB replication subsystem. More info...

    rep_get_config(which)

    Returns whether the specified which parameter is currently set or not. More info...

    rep_set_limit(bytes)

    Sets a byte-count limit on the amount of data that will be transmitted from a site in response to a single message processed by the DB_ENV->rep_process_message method. The limit is not a hard limit, and the record that exceeds the limit is the last record to be sent. More info...

    rep_get_limit()

    Gets a byte-count limit on the amount of data that will be transmitted from a site in response to a single message processed by the DB_ENV->rep_process_message method. The limit is not a hard limit, and the record that exceeds the limit is the last record to be sent. More info...

    rep_set_request(minimum, maximum)

    Sets a threshold for the minimum and maximum time that a client waits before requesting retransmission of a missing message. Specifically, if the client detects a gap in the sequence of incoming log records or database pages, Berkeley DB will wait for at least minimum microseconds before requesting retransmission of the missing record. Berkeley DB will double that amount before requesting the same missing record again, and so on, up to a threshold of maximum microseconds. More info...
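
The doubling rule described for rep_set_request can be illustrated with plain arithmetic. The sketch below does not call Berkeley DB. It only models the sequence of waits that the description implies, and the function name and sample values are hypothetical.

```python
def retransmit_waits(minimum, maximum, attempts):
    """Model the waits (in microseconds) before successive retransmission requests.

    The wait starts at `minimum` and doubles after each request until it
    reaches `maximum`, where it stays.
    """
    waits = []
    wait = minimum
    for _ in range(attempts):
        waits.append(wait)
        wait = min(wait * 2, maximum)
    return waits


# Hypothetical thresholds: 40 ms minimum, 1.28 s maximum.
schedule = retransmit_waits(40000, 1280000, 7)
```

With these sample thresholds the wait reaches the maximum on the sixth request and stays there.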

    rep_get_request()

    Returns a tuple with the minimum and maximum number of microseconds a client waits before requesting retransmission. More info...

    rep_set_nsites(nsites)

    Specifies the total number of sites in a replication group. More info...

    rep_get_nsites()

    Returns the total number of sites in the replication group. More info...

    rep_set_priority(priority)

    Specifies the database environment’s priority in replication group elections. The priority must be a positive integer, or 0 if this environment cannot be a replication group master. More info...

    rep_get_priority()

    Returns the database environment priority. More info...

    rep_set_timeout(which, timeout)

    Specifies a variety of replication timeout values. More info...

    rep_get_timeout(which)

    Returns the timeout value for the specified which parameter. More info...

4. DB

4.1. DB Methods

    DB(dbEnv=None, flags=0)

    Constructor. More info...

    append(data, txn=None)

    A convenient version of put() that can be used for Recno or Queue databases. The DB_APPEND flag is automatically used, and the record number is returned. More info...

    associate(secondaryDB, callback, txn=None, flags=0)

    Used to associate secondaryDB to act as a secondary index for this (primary) database. The callback parameter should be a reference to a Python callable object that will construct and return the secondary key or DB_DONOTINDEX if the item should not be indexed. The parameters the callback will receive are the primaryKey and primaryData values. More info...
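
A callback of the kind associate() expects might look like the sketch below. The record layout (a bytes value of the form first,last) and the function name are assumptions made for illustration. DB_DONOTINDEX is defined here as a local stand-in so the sketch runs on its own; real code would use the constant from bsddb3.db.

```python
# Local stand-in so the sketch runs without the package installed;
# real code would use bsddb3.db.DB_DONOTINDEX instead.
DB_DONOTINDEX = object()


def index_by_last_name(primary_key, primary_data):
    """Build the secondary key from a record stored as b"first,last".

    Returns DB_DONOTINDEX for records that do not fit the assumed layout,
    so that they are left out of the secondary index.
    """
    parts = primary_data.split(b",")
    if len(parts) != 2 or not parts[1]:
        return DB_DONOTINDEX
    return parts[1]
```

Such a function would be passed as the callback argument, along the lines of primary.associate(secondary, index_by_last_name), where primary and secondary are DB objects opened elsewhere.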

    close(flags=0)

    Flushes cached data and closes the database. More info...

    consume(txn=None, flags=0)

    For a database with the Queue access method, returns the record number and data from the first available record and deletes it from the queue. More info...

    consume_wait(txn=None, flags=0)

    For a database with the Queue access method, returns the record number and data from the first available record and deletes it from the queue. If the Queue database is empty, the thread of control will wait until there is data in the queue before returning. More info...

    cursor(txn=None, flags=0)

    Creates a cursor on the DB and returns a DBCursor object. If a transaction is passed then the cursor can only be used within that transaction and you must be sure to close the cursor before committing the transaction. More info...

    delete(key, txn=None, flags=0)

    Removes a key/data pair from the database. More info...

    fd()

    Returns a file descriptor for the database. More info...

    get(key, default=None, txn=None, flags=0, dlen=-1, doff=-1)

    Returns the data object associated with key. If key is an integer then the DB_SET_RECNO flag is automatically set for BTree databases and the actual key and the data value are returned as a tuple. If default is given then it is returned if the key is not found in the database. Partial records can be read using dlen and doff, however be sure to not read beyond the end of the actual data or you may get garbage. More info...
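
What dlen and doff select can be pictured with ordinary byte slicing. The helper below is a pure-Python stand-in, not a bsddb3 call: it assumes a partial read returns dlen bytes starting at offset doff, and it rejects ranges that run past the end of the record, which is the situation the warning above is about.

```python
def partial_read(record, dlen, doff):
    """Return dlen bytes of record starting at byte offset doff.

    Raises ValueError instead of reading past the end of the record.
    """
    if doff < 0 or dlen < 0 or doff + dlen > len(record):
        raise ValueError("requested range lies outside the stored record")
    return record[doff:doff + dlen]


tail = partial_read(b"hello world", 5, 6)  # five bytes starting at offset 6
```

A similar bounds check could be made before calling get() with dlen and doff, for example by comparing against the value returned by get_size(key).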

    pget(key, default=None, txn=None, flags=0, dlen=-1, doff=-1)

    This method is available only on secondary databases. It will return the primary key, given the secondary one, and associated data. More info...

    set_private(object)

    Links an arbitrary object to the DB.

    get_private()

    Returns the object linked to the DB.

    get_both(key, data, txn=None, flags=0)

    A convenient version of get() that automatically sets the DB_GET_BOTH flag, and which will be successful only if both the key and data value are found in the database. (Can be used to verify the presence of a record in the database when duplicate keys are allowed.) More info...

    get_byteswapped()

    May be used to determine if the database was created on a machine with the same endianness as the current machine. More info...

    get_size(key, txn=None)

    Returns the size of the data object associated with key.

    get_type()

    Return the database’s access method type. More info...

    join(cursorList, flags=0)

    Create and return a specialized cursor for use in performing joins on secondary indices. More info...

    key_range(key, txn=None, flags=0)

    Returns an estimate of the proportion of keys that are less than, equal to and greater than the specified key. More info...

    open(filename, dbname=None, dbtype=DB_UNKNOWN, flags=0, mode=0660, txn=None)

    Opens the database named dbname in the file named filename. The dbname argument is optional and allows applications to have multiple logical databases in a single physical file. It is an error to attempt to open a second database in a file that was not initially created using a database name. In-memory databases never intended to be shared or preserved on disk may be created by setting both the filename and dbname arguments to None. More info...

    put(key, data, txn=None, flags=0, dlen=-1, doff=-1)

    Stores the key/data pair in the database. If the DB_APPEND flag is used and the database is using the Recno or Queue access method then the record number allocated to the data is returned. Partial data objects can be written using dlen and doff. More info...

    remove(filename, dbname=None, flags=0)

    Remove a database. More info...

    rename(filename, dbname, newname, flags=0)

    Rename a database. More info...

    set_encrypt(passwd, flags=0)

    Sets the password used by the Berkeley DB library to perform encryption and decryption. Because databases opened within Berkeley DB environments use the password specified to the environment, it is an error to attempt to set a password in a database created within an environment. More info...

    set_bt_compare(compareFunc)

    Set the B-Tree database comparison function. This can only be called once, before the database has been opened. compareFunc takes two arguments (left key string, right key string) and must return -1, 0 or 1, similar to cmp. You can shoot your database in the foot, beware! Read the Berkeley DB docs for the full details of how the comparison function MUST behave. More info...
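
A comparison function with the shape described for set_bt_compare could look like the sketch below. The case-insensitive ordering and the function name are choices made for illustration, not something the package prescribes. The sorted() call only shows that the function imposes a usable ordering.

```python
from functools import cmp_to_key


def compare_ignoring_case(left, right):
    """Return -1, 0 or 1 after comparing two keys case-insensitively."""
    a, b = left.lower(), right.lower()
    # (a > b) - (a < b) yields 1, 0 or -1, like the old cmp() builtin.
    return (a > b) - (a < b)


# Check the ordering on some sample keys, outside any database.
ordered = sorted([b"pear", b"Apple", b"banana"],
                 key=cmp_to_key(compare_ignoring_case))
```

Under this function b"KEY" and b"key" compare as equal, so a database using it would presumably treat them as the same key. That is the sort of consequence the warning above refers to.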

    set_bt_minkey(minKeys)

    Set the minimum number of keys that will be stored on any single BTree page. More info...

    set_cachesize(gbytes, bytes, ncache=0)

    Set the size of the database’s shared memory buffer pool. More info...

    set_get_returns_none(flag)

    Controls what get and related methods do when a key is not found. See the DBEnv set_get_returns_none documentation. The previous setting is returned.

      set_flags(flags)

      Set additional flags on the database before opening. More info...

      set_h_ffactor(ffactor)

      Set the desired density within the hash table. More info...

      set_h_nelem(nelem)

      Set an estimate of the final size of the hash table. More info...

      set_lorder(lorder)

      Set the byte order for integers in the stored database metadata. More info...

      set_pagesize(pagesize)

      Set the size of the pages used to hold items in the database, in bytes. More info...

      set_re_delim(delim)

      Set the delimiting byte used to mark the end of a record in the backing source file for the Recno access method. More info...

      set_re_len(length)

      For the Queue access method, specify that the records are of length length. For the Recno access method, specify that the records are fixed-length, not byte delimited, and are of length length. More info...

      set_re_pad(pad)

      Set the padding character for short, fixed-length records for the Queue and Recno access methods. More info...

      set_re_source(source)

      Set the underlying source file for the Recno access method. More info...

      set_q_extentsize(extentsize)

      Set the size of the extents used to hold pages in a Queue database, specified as a number of pages. Each extent is created as a separate physical file. If no extent size is set, the default behavior is to create only a single underlying database file. More info...

      stat(flags=0, txn=None)
      Return a dictionary containing database statistics with the following keys.
      For Hash databases:

    magic

    Magic number that identifies the file as a Hash database.

    version

    Version of the Hash database.

    nkeys

    Number of unique keys in the database.

    ndata

    Number of key/data pairs in the database.

    pagecnt

    The number of pages in the database.

    pagesize

    Underlying Hash database page (& bucket) size.

    nelem

    Estimated size of the hash table specified at database creation time.

    ffactor

    Desired fill factor (number of items per bucket) specified at database creation time.

    buckets

    Number of hash buckets.

    free

    Number of pages on the free list.

    bfree

    Number of bytes free on bucket pages.

    bigpages

    Number of big key/data pages.

    big_bfree

    Number of bytes free on big item pages.

    overflows

    Number of overflow pages (overflow pages are pages that contain items that did not fit in the main bucket page).

    ovfl_free

    Number of bytes free on overflow pages.

    dup

    Number of duplicate pages.

    dup_free

    Number of bytes free on duplicate pages.

    • For BTree and Recno databases:

    magic

    Magic number that identifies the file as a Btree database.

    version

    Version of the Btree database.

    nkeys

    For the Btree Access Method, the number of unique keys in the database. For the Recno Access Method, the number of records in the database. If the database has been configured to not re-number records during deletion, the number of records may include records that have been deleted.

    ndata

    For the Btree Access Method, the number of key/data pairs in the database. For the Recno Access Method, the number of records in the database. If the database has been configured to not re-number records during deletion, the number of records may include records that have been deleted.

    pagecnt

    The number of pages in the database.

    pagesize

    Underlying database page size.

    minkey

    Minimum keys per page.

    re_len

    Length of fixed-length records.

    re_pad

    Padding byte value for fixed-length records.

    levels

    Number of levels in the database.

    int_pg

    Number of database internal pages.

    leaf_pg

    Number of database leaf pages.

    dup_pg

    Number of database duplicate pages.

    over_pg

    Number of database overflow pages.

    empty_pg

    Number of empty database pages.

    free

    Number of pages on the free list.

    int_pgfree

    Number of bytes free in database internal pages.

    leaf_pgfree

    Number of bytes free in database leaf pages.

    dup_pgfree

    Number of bytes free in database duplicate pages.

    over_pgfree

    Number of bytes free in database overflow pages.

    • For Queue databases:

    magic

    Magic number that identifies the file as a Queue database.

    version

    Version of the Queue file type.

    nkeys

    Number of records in the database.

    ndata

    Number of records in the database.

    pagesize

    Underlying database page size.

    extentsize

    Underlying database extent size, in pages.

    pages

    Number of pages in the database.

    re_len

    Length of the records.

    re_pad

    Padding byte value for the records.

    pgfree

    Number of bytes free in database pages.

    first_recno

    First undeleted record in the database.

    cur_recno

    Last allocated record number in the database.

    • More info...

      sync(flags=0)

      Flushes any cached information to disk. More info...

      truncate(txn=None, flags=0)

      Empties the database, discarding all records it contains. The number of records discarded from the database is returned. More info...

      upgrade(filename, flags=0)

      Upgrades all of the databases included in the file filename, if necessary. More info...

      verify(filename, dbname=None, outfile=None, flags=0)

      Verifies the integrity of all databases in the file specified by the filename argument, and optionally outputs the databases’ key/data pairs to a file. More info...

    4.2. DB Mapping and Compatibility Methods

    • These methods of the DB type implement the Mapping Interface, along with a few others that make a DB behave as much like a dictionary as possible. The main downside of using a DB as a dictionary is that you cannot specify a transaction object.
      DB_length() [ usage: len(db) ]
      Return the number of key/data pairs in the database.
      DB_subscript(key) [ usage: db[key] ]
      Return the data associated with key.
      DB_ass_sub(key, data) [ usage: db[key] = data ]
      Assign or update a key/data pair. At the C level the same slot deletes the pair when data is NULL, which is what del db[key] triggers.
      keys(txn=None)
      Return a list of all keys in the database. Warning: this method traverses the entire database so it can possibly take a long time to complete.
      items(txn=None)
      Return a list of tuples of all key/data pairs in the database. Warning: this method traverses the entire database so it can possibly take a long time to complete.
      values(txn=None)
      Return a list of all data values in the database. Warning: this method traverses the entire database so it can possibly take a long time to complete.
      has_key(key, txn=None)
      Returns true if key is present in the database.
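
The dictionary-style access described above can be sketched as follows. Both helper names (copy_to_dict and set_default) are hypothetical, and db is assumed to be an open DB handle. Under Python 3 the keys and data would normally be bytes objects.

```python
def copy_to_dict(db):
    """Copy every key/data pair of a DB-like object into a plain dict."""
    snapshot = {}
    for key in db.keys():        # traverses the entire database
        snapshot[key] = db[key]  # db[key] maps to DB_subscript
    return snapshot


def set_default(db, key, data):
    """Store data under key only if key is absent, then return the stored value."""
    if not db.has_key(key):
        db[key] = data           # db[key] = data maps to DB_ass_sub
    return db[key]
```

Because the subscript syntax cannot carry a transaction object, helpers like these are only suitable for non-transactional use.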

    5. DBCursor

    5.1. DBCursor Methods

    close()

    Discards the cursor. If the cursor was created within a transaction, you must close the cursor before committing the transaction. More info...

    count(flags=0)

    Returns a count of the number of duplicate data items for the key referenced by the cursor. More info...

    delete(flags=0)

    Deletes the key/data pair currently referenced by the cursor. More info...

    dup(flags=0)

    Create a new cursor. More info...

    put(key, data, flags=0, dlen=-1, doff=-1)

    Stores the key/data pair into the database. Partial data records can be written using dlen and doff. More info...

    get(flags, dlen=-1, doff=-1)
    See get(key, data, flags, dlen=-1, doff=-1) below.
    get(key, flags, dlen=-1, doff=-1)
    See get(key, data, flags, dlen=-1, doff=-1) below.
    get(key, data, flags, dlen=-1, doff=-1)

    Retrieves key/data pairs from the database using the cursor. All the specific functionalities of the get method are actually provided by the various methods below, which are the preferred way to fetch data using the cursor. These generic interfaces are only provided as a convenience. Partial data records are returned if dlen and doff are used in this method and in many of the specific methods below. More info...

    pget(flags, dlen=-1, doff=-1)
    See pget(key, data, flags, dlen=-1, doff=-1) below.
    pget(key, flags, dlen=-1, doff=-1)
    See pget(key, data, flags, dlen=-1, doff=-1) below.
    pget(key, data, flags, dlen=-1, doff=-1)

    Similar to the already described get(). This method is available only on secondary databases. Given the secondary key, it returns the primary key along with the associated data. More info...
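
As one possible use of count(), the sketch below positions a cursor on a key with set() (described in the next section) and then asks how many duplicate data items that key has. The helper name count_duplicates is hypothetical. A missing key is handled both ways: a None return value, or a DBNotFoundError, which derives from KeyError.

```python
def count_duplicates(cursor, key):
    """Return how many data items are stored under key, or 0 if key is absent."""
    try:
        if cursor.set(key) is None:
            return 0
    except KeyError:
        # DBNotFoundError inherits from KeyError, so this covers handles
        # configured to raise instead of returning None.
        return 0
    return cursor.count()
```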

    5.2. DBCursor Get Methods

    • These DBCursor methods are all wrappers around the get() function in the C API.
      current(flags=0, dlen=-1, doff=-1)

      Returns the key/data pair currently referenced by the cursor. More info...

      get_current_size()
      Returns length of the data for the current entry referenced by the cursor.
      first(flags=0, dlen=-1, doff=-1)

      Position the cursor to the first key/data pair and return it. More info...

      last(flags=0, dlen=-1, doff=-1)

      Position the cursor to the last key/data pair and return it. More info...

      next(flags=0, dlen=-1, doff=-1)

      Position the cursor to the next key/data pair and return it. More info...

      prev(flags=0, dlen=-1, doff=-1)

      Position the cursor to the previous key/data pair and return it. More info...

      consume(flags=0, dlen=-1, doff=-1)
      For a database with the Queue access method, returns the record number and data from the first available record and deletes it from the queue.
      • NOTE: This method is deprecated in Berkeley DB version 3.2 in favor of the new consume method in the DB class.

        get_both(key, data, flags=0)

        Like set() but positions the cursor to the record matching both key and data. (An alias for this is set_both, which makes more sense to me...) More info...

        get_recno()

        Return the record number associated with the cursor. The database must use the BTree access method and have been created with the DB_RECNUM flag. More info...

        join_item(flags=0)

        For cursors returned from the DB.join method, returns the combined key value from the joined cursors. More info...

        next_dup(flags=0, dlen=-1, doff=-1)

        If the next key/data pair of the database is a duplicate record for the current key/data pair, the cursor is moved to the next key/data pair of the database, and that pair is returned. More info...

        next_nodup(flags=0, dlen=-1, doff=-1)

        The cursor is moved to the next non-duplicate key/data pair of the database, and that pair is returned. More info...

        prev_nodup(flags=0, dlen=-1, doff=-1)

        The cursor is moved to the previous non-duplicate key/data pair of the database, and that pair is returned. More info...

        set(key, flags=0, dlen=-1, doff=-1)

        Move the cursor to the specified key in the database and return the key/data pair found there. More info...

        set_range(key, flags=0, dlen=-1, doff=-1)

        Identical to set() except that in the case of the BTree access method, the returned key/data pair is the smallest key greater than or equal to the specified key (as determined by the comparison function), permitting partial key matches and range searches. More info...

        set_recno(recno, flags=0, dlen=-1, doff=-1)

        Move the cursor to the specific numbered record of the database, and return the associated key/data pair. The underlying database must be of type Btree and it must have been created with the DB_RECNUM flag. More info...

        set_both(key, data, flags=0)

        See get_both(). The only difference in behaviour can be disabled using set_get_returns_none(2). More info...
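
The positioning methods above lend themselves to simple iteration helpers. The following is a sketch, not part of the module: iter_records and iter_prefix are hypothetical names, and cursor is assumed to come from an open DB. iter_prefix relies on the set_range() behaviour described for the BTree access method. Running off the end of the database is handled whether the cursor returns None or raises DBNotFoundError (a KeyError subclass).

```python
def iter_records(cursor):
    """Yield every (key, data) pair, from the first record to the last."""
    try:
        record = cursor.first()
        while record is not None:
            yield record
            record = cursor.next()
    except KeyError:
        # Raised instead of returning None when the handle is configured that way.
        return


def iter_prefix(cursor, prefix):
    """Yield (key, data) pairs whose key starts with prefix (BTree databases)."""
    try:
        record = cursor.set_range(prefix)   # smallest key >= prefix
        while record is not None and record[0].startswith(prefix):
            yield record
            record = cursor.next()
    except KeyError:
        return
```

Both helpers are generators, so the cursor should stay open until iteration has finished.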

    6. DBTxn

    6.1. DBTxn Methods

    abort()

    Aborts the transaction. More info...

    commit(flags=0)

    Ends the transaction, committing any changes to the databases. More info...

    id()

    Returns the unique transaction id associated with this transaction. More info...

    prepare(gid)

    Initiates the beginning of a two-phase commit. Beginning with Berkeley DB 3.3, a global identifier parameter is required, which is a value unique across all processes involved in the commit. It must be a string of DB_XIDDATASIZE bytes. More info...

    discard()
    This method frees up all the per-process resources associated with the specified transaction, neither committing nor aborting the transaction. The transaction will be kept in an “unresolved” state. This call may be used only after calls to “dbenv.txn_recover()”. An “unresolved” transaction will be returned again through new calls to “dbenv.txn_recover()”.
    • For example, when there are multiple global transaction managers recovering transactions in a single Berkeley DB environment, any transactions returned by “dbenv.txn_recover()” that are not handled by the current global transaction manager should be discarded using “txn.discard()”.
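
A common way to use abort() and commit() together is to wrap a unit of work so that it is committed on success and aborted on any error. The sketch below assumes env is a DBEnv opened with transaction support and uses its txn_begin() method. The helper name run_in_txn and the work callback are hypothetical.

```python
def run_in_txn(env, work):
    """Call work(txn) inside a new transaction; commit on success, abort on error."""
    txn = env.txn_begin()
    try:
        result = work(txn)
    except Exception:
        txn.abort()     # roll back every change made under this transaction
        raise
    txn.commit()        # make the changes permanent
    return result
```

A caller might then write something like run_in_txn(env, lambda txn: database.put(b"key", b"value", txn=txn)), where database is a DB opened within env. Any cursors created inside work should be closed before it returns.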


    7. DBLock

    • The DBLock objects have no methods or attributes. They are just opaque handles to the lock in question. They are managed via DBEnv methods.

    8. DBSequence

    • Sequences provide an arbitrary number of persistent objects that return an increasing or decreasing sequence of integers. Opening a sequence handle associates it with a record in a database. The handle can maintain a cache of values from the database so that a database update is not needed each time the application allocates a value.


    8.1. DBSequence Methods

    DBSequence(db, flags=0)

    Constructor. More info...

    open(key, txn=None, flags=0)

    Opens the sequence represented by the key. More info...

    close(flags=0)

    Close a DBSequence handle. More info...

    initial_value(value)

    Set the initial value for a sequence. This call is only effective when the sequence is being created. More info...

    get(delta=1, txn=None, flags=0)

    Returns the next available element in the sequence and changes the sequence value by delta. More info...

    get_dbp()

    Returns the DB object associated with the DBSequence. More info...

    get_key()

    Returns the key for the sequence. More info...

    remove(txn=None, flags=0)

    Removes the sequence from the database. This method should not be called if there are other open handles on this sequence. More info...

    get_cachesize()

    Returns the current cache size. More info...

    set_cachesize(size)

    Configure the number of elements cached by a sequence handle. More info...

    get_flags()

    Returns the current flags. More info...

    set_flags(flags)

    Configure a sequence. More info...

    stat(flags=0)
    Returns a dictionary of sequence statistics with the following keys:

    wait

    The number of times a thread of control was forced to wait on the handle mutex.

    nowait

    The number of times that a thread of control was able to obtain the handle mutex without waiting.

    current

    The current value of the sequence in the database.

    value

    The current cached value of the sequence.

    last_value

    The last cached value of the sequence.

    min

    The minimum permitted value of the sequence.

    max

    The maximum permitted value of the sequence.

    cache_size

    The number of values that will be cached in this handle.

    flags

    The flags value for the sequence.
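
Because get() advances the sequence by delta in a single call, it can be used to reserve a block of consecutive values at once. The following is a sketch under the assumption that seq is an open, increasing DBSequence. The helper name allocate_block is hypothetical.

```python
def allocate_block(seq, count):
    """Reserve count consecutive values from an increasing sequence and return them."""
    if count < 1:
        raise ValueError("count must be at least 1")
    first = seq.get(count)   # returns the next value, then advances the sequence by count
    return list(range(first, first + count))
```

A handle would typically be prepared beforehand with the methods listed above: construct it with DBSequence(db), optionally call initial_value(), then call open() with the key.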

    9. History

    • This module was started by Andrew Kuchling (amk) to remove the dependency on SWIG in a package by Gregory P. Smith who based his work on a similar package by Robin Dunn which wrapped Berkeley DB 2.7.x.

      Development then returned full circle back to Robin Dunn, working on behalf of Digital Creations, to complete the SWIG-less wrapping of the DB 3.x API and to build a solid unit test suite. Having completed that, Robin is now busy with another project (wxPython) and Greg has returned as maintainer. Jesus Cea Avion has been the maintainer of this code since February 2008.

      This module is included in the standard Python >= 2.3 distribution as the bsddb module. The only reason you should look here is for documentation or to get a more up-to-date version. The bsddb.db module aims to mirror much of the Berkeley DB C/C++ API.