
2. Planning and Setup

2.1. The Distribution Structure

The Fedora distribution, which is the collection of all Fedora-related files, uses the directory tree in Example 1, “Fedora directory tree”. It may include multiple versions of Fedora Core. The tree design makes it easier to "trim" unnecessary or undesired files. When you set up a mirror, duplicate this tree exactly, or as closely as possible. If you duplicate the tree, it will be easier to automate nightly updates.

fedora
+-- linux
    +-- core
        |-- 1 
        |   ... 
        +-- 5 
        |   +-- SRPMS 
        |   +-- i386 
        |   |   +-- debug 
        |   |   +-- iso 
        |   |   +-- os 
        |   |       +-- Fedora 
        |   |       +-- SRPMS 
        |   |       +-- images 
        |   |       +-- isolinux 
        |   +-- x86_64 
        +-- development 
        |      ...
        +-- test 
        |      ...
        +-- updates 
            +-- 1 
            |   ... 
            +-- 5 
            |   +-- SRPMS 
            |   +-- i386 
            |   +-- x86_64 
            +-- testing 
                +-- 1 
                |   ... 
                +-- 5 
                    +-- SRPMS 
                    +-- i386 
                    +-- x86_64

Example 1. Fedora directory tree

[Note]Naming conventions

Throughout the rest of the document, /var/www/mirror represents the folder where all your mirrored files are stored. You may substitute a different location. This location simplifies sharing your mirror, due to the shipping configuration of Fedora Core. See Section 3, “Server Configuration” for more information. The site name mirror.example.com represents the upstream mirror.

The fedora/linux/core/5/arch/os directory contains a copy of all the original distribution files for Fedora Core 5. They are the same files found on the DVD and CD-ROM version of the distribution. The Fedora subfolder contains all the files that are necessary for installation, including the entire collection of Fedora Core RPM packages. The images folder contains copies of any floppy diskette or CD-ROM images that boot a system into installation or rescue modes. The fedora/linux/core/5/arch/iso folder contains images of the CD-ROM version of the distribution.

[Note]RPM packages

RPM, originally the Red Hat Package Manager and now the RPM Package Manager, is not just a file format. RPM is also a system that tracks and interconnects software and version information. The RPM system is quite popular, and many other Linux distributions use RPM as well. Read more information on RPM at http://www.rpm.org/.

The SRPMS folders under architecture-specific branches are links that point to the main SRPMS folder for that distribution. For example, fedora/linux/core/2/i386/os/SRPMS is a link that points to fedora/linux/core/2/SRPMS.

A Fedora mirror consists of at least the original ISO images or the distribution files. If possible, include both, provided you have sufficient disk space and/or bandwidth.

2.2. Copying the Original Distribution

If you already have reliable CD-ROM installation discs of a distribution, reduce your initial bandwidth and time spent mirroring by copying the files from the discs to your server. Copy all files from Installation Disc 1 into the fedora/linux/core/5/arch/os folder. Then copy all files from the Fedora folder of each of the remaining Installation discs into the fedora/linux/core/5/arch/os/Fedora folder on the server.

Copy all the files from the SRPMS folder on each of the "Sources" discs to the fedora/linux/core/5/SRPMS folder on the server. Then make a symbolic link in the os folder under each architecture. Follow this example:

cd /var/www/mirror/fedora/linux/core/5/i386/os
ln -s ../../SRPMS SRPMS
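The same link is needed under every architecture you mirror. As a minimal sketch, the function below repeats the step for a list of architectures. The name link_srpms is a hypothetical helper, not part of any Fedora tool, and the sketch assumes that your tree follows Example 1 and that your server permits symbolic links.

```shell
# link_srpms MIRROR_ROOT VERSION ARCH...
# Hypothetical helper: creates os/SRPMS -> ../../SRPMS for each architecture.
link_srpms() {
    root=$1
    version=$2
    shift 2
    for arch in "$@"; do
        os_dir="$root/fedora/linux/core/$version/$arch/os"
        mkdir -p "$os_dir" || return 1
        # -s makes a symbolic link, -f replaces an existing link, and -n
        # keeps ln from following an existing link into its target folder.
        ln -sfn ../../SRPMS "$os_dir/SRPMS" || return 1
    done
}
```

For example, link_srpms /var/www/mirror 5 i386 x86_64 creates the link for both architectures of Fedora Core 5.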

The documentation for anaconda, the Fedora Core installation program, calls this directory structure an exploded tree. This is because the package data on each CD is extracted, or exploded, to a large directory tree with a predetermined structure. The anaconda installer expects this structure to some extent.

If you only include CD images, create a mirror suitable for installation services by mounting each CD image under the arch/os/ directory. Make a directory for each disc, naming them disc1, disc2, and so on. Mount each disc on the appropriate folder, and add entries to /etc/fstab to perform this mount automatically in case of a reboot. Each entry looks like this:

/path/i386/iso/FC5-i386-disc1.iso  /path/i386/os/disc1  iso9660  ro,loop  0 0

The anaconda installer application automatically detects these folders and uses them properly. In addition, system configuration tools such as system-config-packages also continue to work properly when pointed at the parent of the ISO image mount points.
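Writing one entry per disc by hand is error-prone, so the entries can be generated. The following is only a sketch: fstab_iso_lines is a hypothetical helper name, the FC5 file naming is taken from the entry shown above, and the ro,loop mount options are an assumption (loop asks mount to attach the image file through a loop device, and ro mounts it read-only). Review the output before appending it to /etc/fstab.

```shell
# fstab_iso_lines BASE ARCH DISC_COUNT
# Prints one /etc/fstab entry per CD image, numbered disc1 .. discN.
fstab_iso_lines() {
    base=$1
    arch=$2
    count=$3
    i=1
    while [ "$i" -le "$count" ]; do
        printf '%s/%s/iso/FC5-%s-disc%d.iso  %s/%s/os/disc%d  iso9660  ro,loop  0 0\n' \
            "$base" "$arch" "$arch" "$i" "$base" "$arch" "$i"
        i=$((i + 1))
    done
}
```

For instance, fstab_iso_lines /path i386 2 prints entries for disc1 and disc2 under /path/i386/os.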

There are drawbacks to using CD ISO images in this fashion. For instance, no one directory contains the entire distribution of RPM packages. Soft links circumvent this problem, but your server security policies may not permit them. Fedora Core also comes as an ISO-format DVD image, which alleviates this problem. Users who do not have DVD burning hardware, however, cannot use this image to make discs for their own use.

You only need a single line in /etc/fstab for mounting the Fedora Core DVD ISO image. The entry looks like this:

/path/i386/iso/FC5-i386-DVD.iso  /path/i386/os  iso9660  ro,loop  0 0

2.3. Trimming Branches

You may omit almost any branch of the tree that you do not plan to use. Consider carefully the impact of excluding that folder. Branches you might trim from your mirror include:

Older versions of Fedora Core (any numbered directory).

Before you exclude an old version, ensure this does not adversely affect any of your users. These adverse effects can come in many forms. For example, the level of support for certain hardware sometimes changes between releases of Fedora Core. Users who cannot install a previous version may not be able to use Fedora Core. Your users might need to perform software-related tasks such as building packages for different Fedora Core releases. Always remain aware of the needs of your users during the planning stage.

Folders for architectures your site does not support.

If you do not have any x86-64 hosts to support, trimming these folders eliminates several gigabytes of extra files. If you support x86-64 hosts later, though, you must restore mirroring of these branches.

The development folder (formerly "Rawhide").

This folder contains all the latest "bleeding-edge" packages from the Fedora Project. If you participate in active Fedora development, you should not trim this branch. Fedora development moves at a rapid pace and requires frequent updates to the latest development package versions. However, the frequent updates cause your mirror to download significant amounts of material during the regular update cycle.

The test and testing folders.

These branches contain updates that are being subjected to quality assurance through public testing, as well as the test or "pre-release" versions of the Fedora Core distribution. The test folder under the main core tree is where test versions of the distribution, such as Fedora Core 6 test2, are kept. (Users of Fedora Core test distributions are often directed to use the development branch to update packages.) The testing folder, under updates, contains package updates that have not yet passed the public testing phase.

The debug folders.

These folders contain packages that enable developers and skilled users to interpret data created when a program crashes or encounters a bug. If you participate actively in Fedora development, you should not trim these folders. If you trim this branch, you may still download individual packages as needed from a nearby public mirror site.

The SRPMS folders (and links thereto).

These folders contain the original source for all the binary RPM packages in the distribution. You may download these packages individually as needed to save space on your local mirror.

Unless your site closely manages workstation configuration, you should probably not trim any of the updates branches for the distributions you support. These locations contain packages with bug fixes, security patches, and errata updates that your users probably want.

2.4. Downloading the Files

Locate a public mirror site for Fedora Core by referring to the main project site's mirror page, http://fedora.redhat.com/Download/mirrors.html. Once you have selected a nearby mirror site, note what services it offers (FTP, HTTP, and/or rsync). A mirror is usually servicing a large number of users. Choose off-peak hours, when possible, to download a large set of files. Be aware of any timezone differences when estimating off-peak hours.

2.4.1. Download Using HTTP or FTP

To download via HTTP or FTP, use either the wget or lftp command. The wget command recurses subdirectories automatically and pulls down entire trees of data with a single command. If you are not careful, however, it is possible to pull down much more data than you intended. The following commands mirror the entire current Fedora Core distribution:

cd /var/www/mirror 
wget --mirror -np -nH --cut-dirs=2 http://mirror.example.com/pub/mirror/fedora/linux/core/5/

Note the options used above:

  • --mirror turns on recursion (descends into all subdirectories), and duplicates file timestamps;

  • -np prevents wget from ascending into the parent directory;

  • -nH prevents wget from writing a directory named after the host (in this case, mirror.example.com);

  • --cut-dirs=n truncates the first n directories in the path. In the example above, --cut-dirs=2 prevents wget from writing the /pub/mirror portion of the path into your mirror.

The same syntax works for both HTTP and FTP upstream mirrors. It is possible that you may download some extraneous files if the HTTP site formats its pages for browser viewing. These files can be safely deleted, but return each time the mirror updates unless you exclude them using special options. See the wget man pages for more information.
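To check a --cut-dirs value before starting a large download, you can work out the resulting local path by hand. The sketch below mimics only the path trimming described above for -nH combined with --cut-dirs=n. The function name local_path is hypothetical, and the code does not call wget or reproduce any of its other behavior.

```shell
# local_path URL_PATH CUT_DIRS
# Prints the relative path left after removing the first CUT_DIRS directory
# components from URL_PATH.
local_path() {
    path=${1#/}                       # drop the leading slash
    n=$2
    while [ "$n" -gt 0 ]; do
        case $path in
            */*) path=${path#*/} ;;   # remove one leading component
            *)   break ;;             # nothing left to remove
        esac
        n=$((n - 1))
    done
    printf '%s\n' "$path"
}
```

With the example URL, local_path /pub/mirror/fedora/linux/core/5/ 2 prints fedora/linux/core/5/, which is the path created below /var/www/mirror.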

The lftp command works like the wget command, and mirrors the content of an HTTP or FTP server. Unlike lftp, however, the wget command does not delete old files locally. Deleting stale files is important for keeping update repository mirrors synchronized with upstream mirrors, because new files are created and old files are automatically removed from the upstream mirrors on a frequent basis.

The lftp command synchronizes files and directories from a remote host like rsync, but uses HTTP or FTP protocols. Use the following command to mirror the i386 branch of the Fedora Core 5 distribution with lftp:

cd /var/www/mirror/fedora/linux/core/5/i386 && \
lftp -c "open http://mirror.example.com/pub/mirror/fedora/linux/core/5/i386/ && \
mirror --delete --verbose"

The -c parameter executes a set of commands in an lftp process. The cd and lftp commands are separated with && so that lftp does not run if the cd command fails. The commands inside the lftp command set are chained the same way. The syntax A && B is shorthand for "if A returns success, run B." An explanation of the lftp commands follows:

  • open connects to the site and changes directory automatically.

  • mirror fetches all files and directories recursively in the current directory. The --delete option deletes all local files that are not in the remote directory. The --verbose option prints some information on the screen and is optional.

The lftp command above maintains an exact copy of the directory for you. It downloads only new or changed files, and deletes only those that no longer exist on the upstream mirror.
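Because mirror --delete removes local files, it matters that the command never runs in the wrong directory. One way to package the "cd, and only then run" pattern is a small wrapper such as the sketch below. The name in_dir is a hypothetical helper and is not part of lftp.

```shell
# in_dir DIR COMMAND [ARGS...]
# Runs COMMAND inside DIR, but only if the directory change succeeds.
in_dir() {
    dir=$1
    shift
    # The parentheses start a subshell, so the caller's working directory
    # stays unchanged. && skips the command when cd fails.
    ( cd "$dir" 2>/dev/null && "$@" )
}
```

For example, in_dir /var/www/mirror lftp -c "open URL && mirror --delete --verbose" runs lftp only if /var/www/mirror exists, where URL stands for the address of your upstream mirror.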

As with wget, it is possible you may download some unwanted files. The lftp command supports regular expressions for excluding files within a mirror command. The command below shows how to mirror a current Fedora Core updates repository, excluding debug and repodata directories:

cd /var/www/mirror/fedora/linux/core/updates/5/i386 && \
lftp -c "set mirror:exclude-regex 'debug\/|repodata\/' && \
open http://mirror.example.com/pub/mirror/fedora/linux/core/updates/5/i386/ && \
mirror --delete --verbose"

Consult the lftp man pages for more details and usage options.

[Tip]Using Proxy for HTTP or FTP retrieval

If you are behind a proxy or firewall, you may need to use an HTTP proxy to mirror files. To do this, export the environment variables http_proxy and ftp_proxy before you run the wget or lftp commands:

export http_proxy=http://username:password@host:port
export ftp_proxy=http://username:password@host:port

2.4.2. The rsync Command

Use the rsync command to synchronize a set of files and/or directories with a remote host. It operates in much the same way as rcp, but it is usually faster. One reason for the speed is that rsync has a special protocol that evaluates and skips files (or portions of files) that are already downloaded.

Begin by identifying the modules available on the upstream mirror site you have chosen. Note that the double colon "::" is always used after the host name to separate it from the rest of the rsync path. The following command generates a list of "modules" on the upstream mirror.

rsync mirror.example.com::

These modules are roughly equivalent to top-level directories, and they follow the same rules. To list any subdirectory of the upstream mirror, add the directory path to the command above. For example, on many mirrors, the fedora-linux-core module is equivalent to the fedora/linux/core path found at the Fedora Project main download server. To list the contents of the Fedora Core 5 distribution folder on the upstream server, issue the following command. Do not forget the trailing slash "/". Without it, you only receive a listing of a folder name that matches the last component of the remote path.

rsync mirror.example.com::fedora-linux-core/5/

2.4.3. Downloading Using rsync

To download via rsync, add a destination path on your system to the end of the command line. The tree of files shown by the listing is downloaded to the local path you specify. Remember, if you leave off the trailing slash on the remote path, then the last component of that path is created as a folder inside the destination, and its contents are copied into that folder.

rsync filehouse.example.org::files/misc/ /var/www/misc/
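The trailing slash rule is easy to get wrong, so it can help to predict where files will land before you run the command. The sketch below encodes only the rule described above. The function name rsync_landing_dir is hypothetical, the code does not run rsync, and it assumes the source contains at least one directory component after the module name.

```shell
# rsync_landing_dir SOURCE DEST
# Prints the local directory that receives the contents of SOURCE.
rsync_landing_dir() {
    src=$1
    dest=${2%/}                                      # drop one trailing slash
    case $src in
        */) printf '%s/\n' "$dest" ;;                # contents go into DEST
        *)  printf '%s/%s/\n' "$dest" "${src##*/}" ;;  # DEST/<last component>/
    esac
}
```

For the command above, rsync_landing_dir filehouse.example.org::files/misc/ /var/www/misc/ prints /var/www/misc/. Without the trailing slash on the source, it prints /var/www/misc/misc/.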

When downloading using rsync for mirror purposes, use some of the command line switches to improve performance and feedback. The switches -PHav enable the following rsync features:

-P

recover partially-downloaded files, and show a progress meter

-H

preserve hard links

-a

recurse all directories, and preserve as much file information as possible, including timestamps, ownership, permissions, device files (if you are running as root), and soft links

-v

give verbose feedback to the screen

Remove the -v switch if you run this mirroring process as part of a script, or have no need to monitor progress. The following example mirrors the Fedora Core 5 distribution from an upstream site.

[Caution]Example command downloads many gigabytes of files

This command downloads many gigabytes of files, and is intended for use as an example only. Do not run this command if you do not understand the consequences.

rsync -PHav mirror.example.com::fedora-linux-core/5/ /var/www/mirror/fedora/linux/core/5

The -n switch performs a "dry run" using the other given parameters. Use this switch to test any rsync command if you are unsure what files you will receive. See also Possible data loss.

The -z switch enables compression during the rsync process. The server compresses data before transmission, and the client decompresses the data before writing it to disk.

[Tip]Compression using rsync

The vast majority of the Fedora Core distribution consists of RPM files, which are already compressed data. Therefore, additional compression does not save time, and instead induces an unnecessary load on the upstream mirror CPU. As a courtesy, do not use the -z switch for this purpose.

The next section features some additional switches that can be used to automatically trim branches from the tree of downloaded folders. With proper usage, they result in a mirror that is exactly as organized and full-featured as any high-volume public upstream site.

[Warning]Possible data loss

If you are not exceedingly careful in using these switches, it is possible to delete large portions of your mirrored data. Fixing this problem might require performing the copying steps outlined in Section 2.2, “Copying the Original Distribution” above. Worse, if you are also careless about your destination path, and you are running as root, you could put your entire system at risk. Know your environment before using these switches:

  • What is your current working directory? Use pwd to find out, if you are unsure.

  • Are you logged in as root? If you are using SELinux extensions, what is your current security context?

  • Have you tested this command using the -n switch (see Section 2.4.3, “Downloading Using rsync”)?

Use the --exclude switch, along with a simple pattern, to disallow download of certain files and/or folders. For instance, --exclude "*.iso" excludes the download of any file whose name ends with the string ".iso".

Use the --delete switch to remove any file from the local system which does not have a match on the upstream mirror. This switch takes no pattern. It prevents unwanted file debris from cropping up in your mirror. Files that match an --exclude pattern are normally protected from deletion, so add the --delete-excluded switch to retroactively trim branches of the tree which you no longer wish to maintain or download.

Wildcards are permitted with rsync commands, including the asterisk *, question mark ?, and brackets [ ]. The question mark and brackets work as in the shell; the former matches any single character, while the brackets define a set of characters to be matched. Asterisks are especially powerful when combined with a portion of a file name. The double asterisk ** pattern matches any character, including slashes; a single asterisk * matches any character, but stops at a slash. Therefore, be judicious about using either. The double asterisk is very useful for mirroring a tree that includes multiple instances of directories and files that contain a pattern. A good example is mirroring several versions of Fedora Core, where certain folder names appear in every version.

[Tip]Pattern matching wildcards

Use double asterisks to trim out directories that repeat throughout a mirrored tree. For example, when mirroring for a site that only uses i386 architecture machines, you may trim all files and folders marked for x86_64 architecture, using the switch --exclude "**x86_64**". This matches not only folders marked x86_64, but also files such as ISO images for x86_64, which are indicated by file names such as FC5-x86_64-disc1.iso.

Process a long list of exclusions with the --exclude-from option. Follow the option with the name of a file that contains a list of patterns, one per line.
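As an illustration of a pattern file, the sketch below writes three of the patterns used as examples in this section to a file, one per line. The function name write_exclude_list and the file location are hypothetical, so adjust both to suit your site.

```shell
# write_exclude_list FILE
# Writes one rsync exclude pattern per line to FILE.
write_exclude_list() {
    cat > "$1" <<'EOF'
**x86_64**
**debug**
*.iso
EOF
}

# Possible usage (the path is an assumption, and -n keeps this a dry run):
#   write_exclude_list /etc/mirror-excludes.txt
#   rsync -Pan --exclude-from=/etc/mirror-excludes.txt \
#     mirror.example.com::fedora-linux-core/5/ /var/www/mirror/fedora/linux/core/5
```

Keeping the patterns in a file makes the rsync command line shorter and lets you review changes to the list separately from the script that uses it.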

These syntax hints only scratch the surface of rsync, but suffice to make your first mirror. Once you have selected your site and formulated your excludes and deletes, run your rsync command with the -n option. Redirect output to a file so you can examine the resulting list of files in the editor or pager of your choice.

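The following example mirrors the entire Fedora Core 5 distribution, with --exclude options that avoid downloading:

  • files and folders for the x86_64 architecture;

  • headers folders;

  • debug folders;

  • ISO images (any path that contains iso).

The -n switch is included for testing purposes. Backslashes at the ends of lines indicate this example is a single command line.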

rsync -Pan --delete --exclude "**x86_64**" --exclude "**headers**" \
  --exclude "**debug**" --exclude "**iso**" \
  mirror.example.com::fedora-linux-core/5/ \
  /var/www/mirror/fedora/linux/core/5

2.5. Maintaining Your Mirror

Fedora mirrors are even more useful when they are more than just a snapshot of the distribution at release time. Most mirror administrators also choose to carry updates and errata packages. Repositories of updates or development trees change daily, and your mirror should reflect these changes.

[Important]rsync etiquette

If you plan to do regular updates of your mirror that include large amounts of data, you should ask permission from the administrator of the upstream mirror. Downloading nightly package updates for the official releases of Fedora Core 5 should not require notification, as they are rarely more than a few megabytes. However, the development tree routinely turns over several hundred megabytes nightly. Take these factors into consideration before putting any maintenance scripts into effect.

Once your rsync command is working as desired, you may want to place it in a nightly cron script. The cron system allows you to schedule regularly occurring jobs on your system. The intervals are highly configurable, but a nightly run keeps your mirror synchronized with updates and errata. Make sure your nightly cron job follows some simple guidelines:

  • If your upstream mirror only synchronizes once or twice daily, run your job after the upstream mirror completes its update. This ensures your mirror not only gets the freshest material, but also does not interfere with the upstream server's bandwidth while it runs its job. If you do not know this time, it is usually safe to plan your downloads for pre-dawn hours.

  • Be sure you have sufficient disk space for additional packages. The updates tree in particular grows over time as more errata packages are released.

  • Always test your script thoroughly before allowing it to run automatically. Use a -n or -v switch in the rsync command line for testing, and then remove it once you have completed testing. Remember that the results are e-mailed to your account on your system unless you specify differently. Read the crontab(5) man pages for additional information, with the command man 5 crontab.
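The disk space guideline can be enforced inside the nightly script itself. The sketch below shows one possible check. The function name has_free_space and the threshold you pass to it are assumptions, and the code relies on the POSIX output format of df -Pk, where the fourth column of the second line is the available space in kilobytes.

```shell
# has_free_space DIR MIN_KB
# Succeeds when the file system holding DIR has at least MIN_KB kilobytes free.
has_free_space() {
    avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}
```

A nightly script might run has_free_space /var/www/mirror 5000000 and skip the rsync command when the check fails. The value 5000000 (roughly 5 GB) is an arbitrary example, so choose a threshold that fits the branches you mirror.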

Last modified: 2006/08/15 03:19:37