16.3. Programming with the RPM Database

Compared to the RPM C API, discussed in Chapter 16, the Python API is much simpler and requires far fewer programming statements to get the job done.

Just about every Python RPM script needs a transaction set. Create a transaction set with rpm.TransactionSet:

import rpm

ts = rpm.TransactionSet()

The transaction set will automatically open the RPM database if needed.

Note

The code examples in this chapter follow the Red Hat conventions for naming variables, such as ts for a transaction set. This is to make it easier to read the Python examples in the RPM sources, along with Red Hat installer programs written in Python.

You will need a transaction set in just about every Python script that accesses RPM functionality.

16.3.1. Accessing the RPM database

Transaction sets provide a number of methods for working with the RPM database at the database level. Use these methods if you need to interact with the database as a whole, as opposed to accessing individual packages in the database. For example, you can initialize or rebuild the RPM database with these methods. You can also use a handy trick for accessing another RPM database instead of the default system database.

16.3.1.1. Setting the Database Location

A transaction set opens the RPM database at the default location. To specify a different RPM database location, call addMacro, as follows:

rpm.addMacro("_dbpath", path_to_rpm_database)

You can work with more than one RPM database by setting the _dbpath macro, creating a transaction set, and then removing the macro. After doing this, you can create another transaction set for the default RPM database, allowing your script to work with more than one database. For example:

# Open the rpmdb-redhat database

rpm.addMacro("_dbpath", "/usr/lib/rpmdb/i386-redhat-linux/redhat")

solvets = rpm.TransactionSet()

solvets.openDB()

rpm.delMacro("_dbpath")

# Open default database

ts = rpm.TransactionSet()

This example uses the rpmdb-redhat package, which holds a database of all Red Hat Linux packages. The explicit call to openDB opens the RPM database. In most Python scripts, though, you do not want to call openDB. Instead, a transaction set will open the database as needed.

The call to delMacro removes the _dbpath macro, allowing the next call to TransactionSet to use the default RPM database.
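The sequence of setting the macro, opening the database, and removing the macro can be wrapped in a helper so that the macro is always removed, even if opening the database fails. The following is a minimal sketch, not part of the RPM API: the function name open_alternate_db is hypothetical, and the rpm module is passed in as a parameter (rpm_module) so that the helper relies only on the addMacro, delMacro, TransactionSet, and openDB calls shown above.

```python
def open_alternate_db(rpm_module, dbpath):
    """Return a transaction set opened on the RPM database at dbpath.

    rpm_module is the imported rpm module. The _dbpath macro is removed
    again before returning, so later transaction sets use the default
    database.
    """
    rpm_module.addMacro("_dbpath", dbpath)
    try:
        ts = rpm_module.TransactionSet()
        # Open explicitly while the macro still points at dbpath.
        ts.openDB()
    finally:
        # Always restore the default lookup, even if openDB raised.
        rpm_module.delMacro("_dbpath")
    return ts

# Typical use (requires the rpm module):
#   import rpm
#   solvets = open_alternate_db(rpm, "/usr/lib/rpmdb/i386-redhat-linux/redhat")
#   ts = rpm.TransactionSet()   # default database
```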

Note

Do not call closeDB on a transaction set. This method does indeed close the RPM database, but it also disables the ability to automatically open the RPM database as needed.

16.3.1.2. Initializing, Rebuilding, and Verifying the Database

The transaction set provides an initDB method to initialize a new RPM database. This acts like the rpm --initdb command.

ts.initDB()

The rebuildDB method regenerates the RPM database indices, like the rpm --rebuilddb command:

ts.rebuildDB()

The verifyDB method checks that the RPM database and indices are readable by the Berkeley DB library:

ts.verifyDB()

Calling this method is the same as running the db_verify command on each of the database files in /var/lib/rpm.

Cross Reference

See Chapter 5 for more on initializing, rebuilding, and verifying RPM databases.
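As a hedged illustration of how these methods might be combined, the sketch below verifies a database and rebuilds it only when verification reports a problem. It assumes that verifyDB and rebuildDB return 0 on success, following the C API convention. Both that return-value convention and the function name check_and_repair are assumptions, so confirm them against your version of the rpm module. Rebuilding the system database normally requires root privileges.

```python
def check_and_repair(ts):
    """Verify the RPM database behind ts, rebuilding it if needed.

    Returns "ok", "rebuilt", or "failed".
    Assumption: verifyDB() and rebuildDB() return 0 on success.
    """
    if ts.verifyDB() == 0:
        return "ok"
    # Verification reported a problem, so try regenerating the indices.
    if ts.rebuildDB() == 0:
        return "rebuilt"
    return "failed"

# Typical use (requires the rpm module):
#   import rpm
#   print(check_and_repair(rpm.TransactionSet()))
```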

Once you have a transaction set, you can start querying the RPM database.

16.3.2. Querying the RPM database

Call dbMatch on a transaction set to create a match iterator. As with the C API, a match iterator allows your code to iterate over the packages that match given criteria.

A call to dbMatch with no parameters sets up a match iterator over the entire set of installed packages. The basic format follows:

import rpm

ts = rpm.TransactionSet()
mi = ts.dbMatch()
for h in mi:
    # Do something with the header object...
    pass

In this example, the call to dbMatch returns a match iterator. The for loop iterates over the match iterator, returning one header each time.

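In addition to this syntax, you can call next on the match iterator to get the next entry, a header object that represents one package. The example below uses Python's built-in next function with a default of None, so the loop ends cleanly when no packages remain:

import rpm

ts = rpm.TransactionSet()
mi = ts.dbMatch()
while True:
    # next() returns None once the iterator is exhausted
    h = next(mi, None)
    if h is None:
        break
    # Do something with the header object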

The explicit next method on the match iterator, called as mi.next(), will likely no longer be supported in a future version of the RPM Python API, since PEP 234 (the Python Enhancement Proposal that introduced iterators) calls for one means or the other for iterating, but not both. For new code, prefer the for loop.

For example, Listing 17-1 shows a Python script to print out the name, version, and release information for all installed packages.

Listing 17-1: rpmqa.py

#!/usr/bin/python
# Acts like rpm -qa and lists the names of all the installed packages.
# Usage:
#   python rpmqa.py

import rpm

ts = rpm.TransactionSet()
mi = ts.dbMatch()
for h in mi:
    print("%s-%s-%s" % (h['name'], h['version'], h['release']))

When you call this script, you should see output like the following, truncated for space:

$ python rpmqa.py

libbonoboui-2.0.1-2

attr-2.0.8-3

dhclient-3.0pl1-9

file-3.37-8

hdparm-5.2-1

ksymoops-2.4.5-1

imlib-1.9.13-9

logwatch-2.6-8

mtr-0.49-7

openssh-clients-3.4p1-2

pax-3.0-4

python-optik-1.3-2

dump-0.4b28-4

sendmail-8.12.5-7

sudo-1.6.6-1

mkbootdisk-1.4.8-1

telnet-0.17-23

usbutils-0.9-7

wvdial-1.53-7

docbook-dtds-1.0-14

urw-fonts-2.0-26

db4-utils-4.0.14-14

libogg-devel-1.0-1

Note

If you set the execute permission on this script, you can skip the explicit call to the python command. For example:

$ ./rpmqa.py
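The loop in rpmqa.py can also be packaged as a function that returns the name-version-release strings instead of printing them, which makes the result easy to sort or reuse. This is a sketch only: installed_nvrs and _text are hypothetical helper names, and the bytes handling is an assumption that covers Python 3 bindings which return byte strings for header tags.

```python
def _text(value):
    """Return value as str, decoding byte strings if necessary."""
    if isinstance(value, bytes):
        return value.decode("utf-8", "replace")
    return str(value)


def installed_nvrs(headers):
    """Return sorted name-version-release strings.

    headers is any iterable of header-like objects that support
    h['name'], h['version'], and h['release'], such as the match
    iterator returned by ts.dbMatch().
    """
    return sorted(
        "%s-%s-%s" % (_text(h['name']), _text(h['version']), _text(h['release']))
        for h in headers
    )

# Typical use (requires the rpm module):
#   import rpm
#   for nvr in installed_nvrs(rpm.TransactionSet().dbMatch()):
#       print(nvr)
```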

16.3.3. Examining the package header

The code in Listing 17-1 introduces the package header object, an object of the hdr class. This represents a package header, and contains entries such as the name, version, pre- and post-installation scripts, and triggers.

16.3.3.1. The hdr Class

You can access each entry in the header using Python's dictionary syntax. This is much more convenient than calling headerGetEntry in C programs. The basic syntax to access header entries follows:

value = h['tag_name']

For example, to get the package name, use the following code:

name = h['name']

You can also use a set of predefined RPMTAG_ constants that match the C API. These constants are defined in the rpm module. For example:

name = h[rpm.RPMTAG_NAME]

Note

Using the rpm constants such as rpm.RPMTAG_NAME is faster than using the strings such as 'name'.

For header entries that hold an array of strings, such as the list of files in the package, the data returned is a Python list. For example:

print("Files:")
files = h['FILENAMES']
for name in files:
    print(name)

You can use file info sets to achieve more compact code. For example:

print("Files:")
fi = h.fiFromHeader()
print(fi)
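Because h['FILENAMES'] is an ordinary Python list, the usual list and string tools apply to it. As a small sketch (the function name files_by_directory is hypothetical, and the code assumes the entries are text strings), the following groups a package's files by directory:

```python
import posixpath
from collections import defaultdict


def files_by_directory(filenames):
    """Group full file paths by their parent directory.

    filenames is a list of paths, such as the value of h['FILENAMES'].
    Returns a dict mapping each directory to a sorted list of base names.
    """
    grouped = defaultdict(list)
    for path in filenames:
        grouped[posixpath.dirname(path)].append(posixpath.basename(path))
    return dict((d, sorted(names)) for d, names in grouped.items())

# Typical use, given a header h:
#   for directory, names in sorted(files_by_directory(h['FILENAMES']).items()):
#       print("%s: %d file(s)" % (directory, len(names)))
```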

The requires, provides, obsoletes, and conflicts information is each stored in the header as three separate but related lists: three lists for the requires information, three for the provides information, and so on. Rather than handling the raw lists yourself, you can extract this information as Python dependency sets with the following code:

print(h.dsFromHeader('providename'))
print(h.dsFromHeader('requirename'))
print(h.dsFromHeader('obsoletename'))
print(h.dsFromHeader('conflictname'))

Cross Reference

The rpminfo.py script in Listing 17-3 shows how to print out this information.

16.3.3.2. Printing Header Information with sprintf

In addition to using the Python dictionary syntax, you can use the sprintf method on a header to format data using a syntax exactly the same as the query format tags supported by the rpm command.

Cross Reference

Chapter 5 covers query formats.

The basic syntax is as follows:

h.sprintf("%{tag_name}")

You can also use special formatting additions to the tag name. For example:

print("Header signature: %s" % h.sprintf("%{DSAHEADER:pgpsig}"))
print("%-20s: %s" % ('Installed on', h.sprintf("%{INSTALLTID:date}")))

You can combine this information into functions that print out header entries with specific formatting. For example:

def nvr(h):
    return h.sprintf("%{NAME}-%{VERSION}-%{RELEASE}")

Note that you only really need to use sprintf when you need the format modifiers, such as date on %{INSTALLTID:date}. In most other cases, Python’s string-handling functions will work better.
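For instance, an installation time can be formatted with Python's time module rather than with the :date modifier. The sketch below assumes the header exposes the INSTALLTIME tag under the key 'installtime' as a number of seconds since the epoch. The function name install_date is hypothetical, and UTC is used so that the result does not depend on the local time zone.

```python
import time


def install_date(h, fmt="%Y-%m-%d %H:%M:%S"):
    """Format a header's installation time in UTC.

    Assumption: h['installtime'] holds seconds since the epoch, either
    as an integer or as a one-element list.
    """
    seconds = h['installtime']
    if isinstance(seconds, (list, tuple)):
        seconds = seconds[0]
    return time.strftime(fmt, time.gmtime(int(seconds)))

# Typical use, given a header h:
#   print("%-20s: %s" % ('Installed on', install_date(h)))
```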

16.3.4. Querying for specific packages

When you call dbMatch on a transaction set object, passing no parameters means to iterate over the entire set of installed packages in the RPM database. You can also query for specific packages using dbMatch. To do so, you need to pass the name of a tag in the header, as well as the value for that tag that you are looking for. The basic syntax follows:

mi = ts.dbMatch(tag_name, value)

For example, to query for all packages named sendmail, use code like the following:

mi = ts.dbMatch('name', 'sendmail')

The call to dbMatch returns an rpmdbMatchIterator. You can query on any of the tags in the header, but by far the most common query is by name.

Note

Some matches are fast and some are much slower. Matching on a tag that is indexed in the RPM database performs much faster than matching on a tag that is not indexed. To determine which tags are indexed, look at the files in /var/lib/rpm. For example, Name and Requirename are files in /var/lib/rpm, so these tags are indexed and will match quickly.
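On systems that store the database as Berkeley DB files, the index names can be listed with a few lines of Python. This sketch is a convenience built on assumptions, not an RPM API: the function name indexed_tags is hypothetical, and it simply treats every regular file in the directory other than Packages, hidden files, and the __db.* environment files as an index. RPM releases that use a different database backend may not keep one file per index, so the result can be empty or unhelpful there.

```python
import os


def indexed_tags(dbpath="/var/lib/rpm"):
    """Return the sorted names of index files found in dbpath.

    Packages holds the headers themselves and __db.* files belong to
    the Berkeley DB environment, so both are skipped, along with
    hidden files such as lock files.
    """
    names = []
    for entry in os.listdir(dbpath):
        full = os.path.join(dbpath, entry)
        if not os.path.isfile(full):
            continue
        if entry == "Packages" or entry.startswith("__db.") or entry.startswith("."):
            continue
        names.append(entry)
    return sorted(names)

# Typical use:
#   print(indexed_tags())
```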

Listing 17-2 shows an example Python script which queries for a particular package name and then prints out the name, version, and release for all matching packages.

Listing 17-2: rpmq.py

#!/usr/bin/python
# Acts like rpm -q and lists the N-V-R for installed
# packages that match a given name.
# Usage:
#   python rpmq.py package_name

import rpm, sys

ts = rpm.TransactionSet()
mi = ts.dbMatch('name', sys.argv[1])
for h in mi:
    print("%s-%s-%s" % (h['name'], h['version'], h['release']))

When you call this script, you need to pass the name of a package to query. The Python interpreter stores this name in sys.argv[1], which the script passes to dbMatch. For example:

$ python rpmq.py sendmail

sendmail-8.12.5-7
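As written, rpmq.py raises an IndexError when no package name is given and prints nothing when no package matches. One possible way to handle both cases is sketched below. The function name query, the messages, and the exit codes are hypothetical choices, and the transaction set is passed in as a parameter so that the logic does not depend on how it was created.

```python
import sys


def query(ts, argv, out=sys.stdout):
    """Print N-V-R for packages named argv[1] and return an exit status."""
    if len(argv) < 2:
        out.write("Usage: python rpmq.py package_name\n")
        return 2
    found = False
    for h in ts.dbMatch('name', argv[1]):
        out.write("%s-%s-%s\n" % (h['name'], h['version'], h['release']))
        found = True
    if not found:
        out.write("package %s is not installed\n" % argv[1])
        return 1
    return 0

# Typical use (requires the rpm module):
#   import rpm
#   sys.exit(query(rpm.TransactionSet(), sys.argv))
```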

16.3.5. Printing information on packages

You can create the equivalent of the rpm -qi command with a small number of Python commands. Listing 17-3 shows an example. This script queries for a particular package name, as shown previously in Listing 17-2. Once a package is found, though, rpminfo.py prints out a lot more information, similar to the output from the rpm -qi command.

Listing 17-3: rpminfo.py

#!/usr/bin/python
# Lists information on installed package listed on command line.
# Usage:
#   python rpminfo.py package_name

import rpm, sys

def printEntry(header, label, format, extra):
    # Format one header entry with sprintf and print it with its label
    value = header.sprintf(format).strip()
    print("%-20s: %s %s" % (label, value, extra))

def printHeader(h):
    if h[rpm.RPMTAG_SOURCEPACKAGE]:
        extra = " source package"
    else:
        extra = " binary package"

    printEntry(h, 'Package', "%{NAME}-%{VERSION}-%{RELEASE}", extra)
    printEntry(h, 'Group', "%{GROUP}", '')
    printEntry(h, 'Summary', "%{Summary}", '')
    printEntry(h, 'Arch-OS-Platform', "%{ARCH}-%{OS}-%{PLATFORM}", '')
    printEntry(h, 'Vendor', "%{Vendor}", '')
    printEntry(h, 'URL', "%{URL}", '')
    printEntry(h, 'Size', "%{Size}", '')
    printEntry(h, 'Installed on', "%{INSTALLTID:date}", '')

    print(h['description'])

    print("Files:")
    fi = h.fiFromHeader()
    print(fi)

    # Dependencies
    print("Provides:")
    print(h.dsFromHeader('providename'))

    print("Requires:")
    print(h.dsFromHeader('requirename'))

    if h.dsFromHeader('obsoletename'):
        print("Obsoletes:")
        print(h.dsFromHeader('obsoletename'))

    if h.dsFromHeader('conflictname'):
        print("Conflicts:")
        print(h.dsFromHeader('conflictname'))

ts = rpm.TransactionSet()
mi = ts.dbMatch('name', sys.argv[1])
for h in mi:
    printHeader(h)

Note

You should be able to simplify this script. The extensive use of the sprintf method is for illustration more than efficiency. You generally only need to call sprintf when you need a format modifier for a tag. In the rpminfo.py script, sprintf was also used to ensure that all entries are text, which allows for calling strip.

The printEntry function takes in a header sprintf tag value in the format of "%{NAME}". You can also pass in more complex values with multiple header entries, such as "%{NAME}-%{VERSION}".

When you run this script, you need to pass the name of a package. You'll see output like the following:

$ python rpminfo.py jikes

Package : jikes-1.18-1 binary package

Group : Development/Languages

Summary : java source to bytecode compiler

Arch-OS-Platform : i386-Linux-(none)

Vendor : (none)

URL : http://ibm.com/developerworks/opensource/jikes

Size : 2853672

Installed on : Mon Dec 2 20:10:13 2002

The IBM Jikes compiler translates Java source files to bytecode. It

also supports incremental compilation and automatic makefile

generation,and is maintained by the Jikes Project:

http://ibm.com/developerworks/opensource/jikes/

Files:

/usr/bin/jikes

/usr/doc/jikes-1.18/license.htm

/usr/man/man1/jikes.1.gz

Provides:

P jikes

P jikes = 1.18-1

Requires:

R ld-linux.so.2

R libc.so.6

R libc.so.6(GLIBC_2.0)

R libc.so.6(GLIBC_2.1)

R libc.so.6(GLIBC_2.1.3)

R libm.so.6

R libstdc++-libc6.2-2.so.3

16.3.6. Refining queries

The pattern method on a match iterator allows you to refine a query. This narrows an existing iterator to only show the packages you desire. The basic syntax follows:

mi.pattern(tag_name, mode, pattern)

The two main uses of the pattern method are to query on more than one tag, such as the version and name, or to narrow the results of a query, using the rich set of pattern modes. The mode parameter names the type of pattern used, which can be one of those listed in Table 17-2.

Table 17-2 Pattern modes for the pattern method

Type	Meaning
rpm.RPMMIRE_DEFAULT	Same as regular expressions, but with \., .*, and ^..$ added
rpm.RPMMIRE_GLOB	Glob-style patterns using fnmatch
rpm.RPMMIRE_REGEX	Regular expressions using regcomp
rpm.RPMMIRE_STRCMP	String comparisons using strcmp

Cross Reference

For more on these patterns, see the online manual pages for fnmatch(3), glob(7), regcomp(3), regex(7), and strcmp(3). The pattern method calls rpmdbSetIteratorRE from the C API, covered in the “Database Iterators” section in Chapter 16.
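To get a feel for how the glob, regular-expression, and string-comparison modes differ, the same filter can be tried in plain Python. The standard-library fnmatch and re modules used here are only analogues of the C functions fnmatch and regcomp that RPM calls, so treat this as an illustration of the pattern styles rather than of RPM's exact matching rules. The sample package names are taken from the listings in this section.

```python
import fnmatch
import re

names = ["python", "pygtk2", "pax", "openssh-clients"]

# Glob style: * matches any run of characters.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "py*")]

# Regular-expression style: the equivalent anchored pattern.
regex = re.compile(r"^py.*$")
regex_hits = [n for n in names if regex.search(n)]

# Plain string comparison matches only an identical name.
strcmp_hits = [n for n in names if n == "pax"]

print(glob_hits)
print(regex_hits)
print(strcmp_hits)
```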

To query for all packages starting with py, for example, you can use code like the following:

import rpm

ts = rpm.TransactionSet()
mi = ts.dbMatch()
mi.pattern('name', rpm.RPMMIRE_GLOB, 'py*')
for h in mi:
    # Do something with the header...
    pass

Listing 17-4 shows an example for glob-based querying.

Listing 17-4: rpmglob.py

#!/usr/bin/python
# Acts like rpm -q and lists the N-V-R for installed packages
# that match a given name using a glob-like syntax
# Usage:
#   python rpmglob.py "package_fragment*"

import rpm, sys

ts = rpm.TransactionSet()
mi = ts.dbMatch()
if not mi:
    print("No packages found.")
else:
    mi.pattern('name', rpm.RPMMIRE_GLOB, sys.argv[1])
    for h in mi:
        print("%s-%s-%s" % (h['name'], h['version'], h['release']))

When you run this script, you’ll see output like the following:

$ python rpmglob.py "py*"

pyxf86config-0.3.1-2

python-devel-2.2.1-17

pygtk2-devel-1.99.12-7

pygtk2-libglade-1.99.12-7

pygtk2-1.99.12-7

pyOpenSSL-0.5.0.91-1

python-optik-1.3-2

python-docs-2.2.1-17

python-2.2.1-17

python-tools-2.2.1-17
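The pattern method can also be called more than once to filter on several tags, which is the first of the two uses described at the start of this section. The sketch below wraps that idea in a function. The name find_packages is hypothetical, the glob mode constant is passed in (normally rpm.RPMMIRE_GLOB) so that the function does not import rpm itself, and the code assumes that patterns on different tags must all match, so check the behavior on your RPM version.

```python
def find_packages(ts, glob_mode, name_glob, version_glob=None):
    """Return N-V-R strings for packages matching the given globs.

    ts is a transaction set and glob_mode is normally rpm.RPMMIRE_GLOB.
    """
    mi = ts.dbMatch()
    mi.pattern('name', glob_mode, name_glob)
    if version_glob is not None:
        # A second call narrows the same iterator on another tag.
        mi.pattern('version', glob_mode, version_glob)
    return ["%s-%s-%s" % (h['name'], h['version'], h['release']) for h in mi]

# Typical use (requires the rpm module):
#   import rpm
#   ts = rpm.TransactionSet()
#   for nvr in find_packages(ts, rpm.RPMMIRE_GLOB, "py*", "2.2.*"):
#       print(nvr)
```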

In addition to working with the RPM database, the Python API also provides access to RPM files.
