apt-mirror-updater: Automated Debian/Ubuntu mirror selection


The apt-mirror-updater package automates robust apt-get mirror selection for Debian and Ubuntu by enabling discovery of available mirrors, ranking of available mirrors, automatic switching between mirrors and robust package list updating (see features). It’s currently tested on Python 2.7, 3.5+ and PyPy (although test coverage is still rather low, see status).

Features

Discovery of available mirrors
Debian and Ubuntu mirrors are discovered automatically by querying the Debian mirror list or the Ubuntu mirror list (the applicable mirror list is automatically selected based on the current platform).
Ranking of available mirrors
Discovered mirrors are ranked by bandwidth (to pick the fastest mirror) and excluded if they’re being updated (see issues with mirror updates).
Automatic switching between mirrors
The main mirror configured in /etc/apt/sources.list can be changed with a single command. The new (to be configured) mirror can be selected automatically or configured explicitly by the user.
Robust package list updating
Several apt-get subcommands can fail if the current mirror is being updated (see issues with mirror updates). apt-mirror-updater tries to work around this by wrapping apt-get update so that it retries on failure and automatically switches to a different mirror when it looks like the current mirror is being updated. I’ve seen such updates take more than 15 minutes and it’s not always acceptable to wait that long, especially in automated solutions.
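
To make the ranking step a bit more concrete, here is a minimal sketch of the idea described above: mirrors that can't be reached or that appear to be mid-update are dropped, and the rest are sorted from fastest to slowest. The Candidate class, its field names and the rank_mirrors() function are hypothetical names made up for this illustration; they are not taken from the package's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical record for one discovered mirror (illustration only)."""

    url: str
    bandwidth: Optional[float]  # measured bytes per second, None if unreachable
    is_updating: bool  # True when the mirror appears to be mid-update


def rank_mirrors(candidates: List[Candidate]) -> List[Candidate]:
    """Drop unusable mirrors and sort the remaining ones, fastest first."""
    usable = [
        c for c in candidates
        if c.bandwidth is not None and not c.is_updating
    ]
    return sorted(usable, key=lambda c: c.bandwidth, reverse=True)


if __name__ == "__main__":
    discovered = [
        Candidate("http://slow.example.com/ubuntu", 1000.0, False),
        Candidate("http://fast.example.com/ubuntu", 9000.0, False),
        Candidate("http://busy.example.com/ubuntu", 99999.0, True),
    ]
    for candidate in rank_mirrors(discovered):
        print(candidate.url)
```

Note that in this sketch a mirror that is being updated is excluded outright rather than merely ranked lower, which matches the description of the ranking feature above.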

Status

On the one hand the apt-mirror-updater package was developed based on quite a few years of experience in using apt-get on Debian and Ubuntu systems and large scale automation of apt-get (working on 150+ remote systems). On the other hand the Python package itself is relatively new: it was developed and published in March 2016. As such:

Warning

Until apt-mirror-updater has been rigorously tested I consider it a proof of concept (beta software) so if it corrupts your system you can’t complain that you weren’t warned! I’ve already tested it on a variety of Ubuntu systems but haven’t found the time to set up a Debian virtual machine for testing. Most of the logic is exactly the same though. The worst that can happen (assuming you trust my judgement ;-) is that /etc/apt/sources.list is corrupted. However, a backup copy is made before any changes are applied, so I don’t see how this can result in irreversible corruption.
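
The "backup first, then change" approach mentioned above could look roughly like the sketch below. This is an illustration under assumptions, not the package's implementation: the replace_mirror() function and the ".backup" file name suffix are made up for this example, and the demo deliberately works on a temporary file instead of the real /etc/apt/sources.list.

```python
import shutil
import tempfile
from pathlib import Path


def replace_mirror(sources_list: Path, old_url: str, new_url: str) -> Path:
    """Replace old_url with new_url in sources_list, keeping a backup copy.

    Returns the path of the backup so the change can be reverted by hand.
    """
    text = sources_list.read_text()
    if old_url not in text:
        raise ValueError("mirror %r not found in %s" % (old_url, sources_list))
    # Assumed naming scheme for the backup file (hypothetical).
    backup = sources_list.with_name(sources_list.name + ".backup")
    shutil.copy2(sources_list, backup)  # the backup is made before any change
    sources_list.write_text(text.replace(old_url, new_url))
    return backup


if __name__ == "__main__":
    # Demo on a scratch file so that the real system is never touched.
    scratch = Path(tempfile.mkdtemp()) / "sources.list"
    scratch.write_text("deb http://old.example.com/ubuntu focal main\n")
    saved = replace_mirror(
        scratch,
        "http://old.example.com/ubuntu",
        "http://new.example.com/ubuntu",
    )
    print(scratch.read_text(), end="")
    print("backup kept at", saved)
```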

I’m working on an automated test suite but at the moment I’m still a bit fuzzy on how to create representative tests for the error handling code paths (also, writing a decent test suite requires a significant chunk of time :-).

Installation

The apt-mirror-updater package is available on PyPI which means installation should be as simple as:

$ pip install apt-mirror-updater

There’s actually a multitude of ways to install Python packages (e.g. the per user site-packages directory, virtual environments or just installing system wide) and I have no intention of getting into that discussion here, so if this intimidates you then read up on your options before returning to these instructions ;-).

Usage

There are two ways to use the apt-mirror-updater package: as the command line program apt-mirror-updater and as a Python API. For details about the Python API please refer to the API documentation available on Read the Docs. The command line interface is described below.

Usage: apt-mirror-updater [OPTIONS]

The apt-mirror-updater program automates robust apt-get mirror selection for Debian and Ubuntu by enabling discovery of available mirrors, ranking of available mirrors, automatic switching between mirrors and robust package list updating.

Supported options:

Option Description
-r, --remote-host=SSH_ALIAS Operate on a remote system instead of the local system. The SSH_ALIAS argument gives the SSH alias of the remote host. It is assumed that the remote account has root privileges or password-less sudo access.
-f, --find-current-mirror Determine the main mirror that is currently configured in /etc/apt/sources.list and report its URL on standard output.
-b, --find-best-mirror Discover available mirrors, rank them, select the best one and report its URL on standard output.
-l, --list-mirrors List available (ranked) mirrors on the terminal in a human readable format.
-c, --change-mirror=MIRROR_URL Update /etc/apt/sources.list to use the given MIRROR_URL.
-a, --auto-change-mirror Discover available mirrors, rank them by connection speed and update status, and update /etc/apt/sources.list to use the best available mirror.
-u, --update, --update-package-lists Update the package lists using “apt-get update”, retrying on failure and automatically switching to a different mirror when it looks like the current mirror is being updated.
-x, --exclude=PATTERN Add a pattern to the mirror selection blacklist. PATTERN is expected to be a shell pattern (containing wild cards like “?” and “*”) that is matched against the full URL of each mirror.
-m, --max=COUNT Don’t query more than COUNT mirrors for their connection status (defaults to 50). If you give the number 0 no limit will be applied. Because Ubuntu mirror discovery can report more than 300 mirrors it’s useful to limit the number of mirrors that are queried, otherwise the ranking of mirrors will take a long time (because over 300 connections need to be established).

-v, --verbose Increase logging verbosity (can be repeated).
-q, --quiet Decrease logging verbosity (can be repeated).
-h, --help Show this message and exit.
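
To illustrate how the --exclude and --max options interact, the sketch below filters a list of mirror URLs with shell patterns and then caps the number of candidates. It assumes shell pattern semantics as provided by Python's fnmatch module. The select_candidates() function is a hypothetical helper written for this example, and the package's own matching may differ in detail.

```python
import fnmatch
from typing import Iterable, List


def select_candidates(
    urls: Iterable[str],
    exclude: Iterable[str] = (),
    max_count: int = 50,
) -> List[str]:
    """Drop URLs that match an exclude pattern, then apply the limit.

    Each pattern is matched against the full mirror URL. A max_count of
    zero means that no limit is applied.
    """
    patterns = list(exclude)
    kept = [
        url for url in urls
        if not any(fnmatch.fnmatchcase(url, pattern) for pattern in patterns)
    ]
    return kept if max_count == 0 else kept[:max_count]


if __name__ == "__main__":
    mirrors = [
        "http://a.example.com/ubuntu",
        "http://b.example.org/ubuntu",
        "http://c.example.com/ubuntu",
    ]
    print(select_candidates(mirrors, exclude=["*.example.org*"], max_count=0))
```

Because the pattern is matched against the full URL, a pattern usually needs a leading and trailing “*” to match a host name somewhere in the middle of the URL.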

Issues with mirror updates

Over the past five years my team (at work) and I have been managing a cluster of 150+ Ubuntu servers, initially using manual system administration but over time automating apt-get for a variety of use cases (provisioning, security updates, deployments, etc.). As we increased our automation we started running into various transient failure modes of apt-get, primarily with apt-get update but incidentally also with other subcommands.

The most frequent failure that we run into is apt-get update crapping out with ‘hash sum mismatch’ errors (see also Debian bug #624122). When this happens a file called Archive-Update-in-Progress-* can sometimes be found on the index page of the mirror that is being used (see also Debian bug #110837). I’ve seen these situations last for more than 15 minutes.
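
As a rough sketch of how the marker file mentioned above could be detected, the function below searches the text of a mirror's index page for a name starting with Archive-Update-in-Progress-. How the index page is fetched is left out on purpose, and find_update_marker() is a hypothetical name used only for this illustration.

```python
import re
from typing import Optional

# Matches the marker name up to the next quote, angle bracket or whitespace.
MARKER = re.compile(r"Archive-Update-in-Progress-[^\s\"'<>]*")


def find_update_marker(index_page: str) -> Optional[str]:
    """Return the marker file name found in index_page, or None."""
    match = MARKER.search(index_page)
    return match.group(0) if match else None


if __name__ == "__main__":
    page = '<a href="Archive-Update-in-Progress-mirror.example.com">marker</a>'
    print(find_update_marker(page))
```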

My working theory about these ‘hash sum mismatch’ errors is that they are caused by the fact that mirror updates aren’t atomic, apparently causing apt-get update to download a package list whose data files aren’t consistent with each other. If this assumption proves to be correct (and also assuming that different mirrors are updated at different times :-) then the command apt-mirror-updater --update-package-lists should work around this annoying failure mode (by automatically switching to a different mirror when ‘hash sum mismatch’ errors are encountered).
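
The workaround described above boils down to a retry loop that gives up on a mirror as soon as a ‘hash sum mismatch’ shows up. The sketch below shows that control flow only. It is not the package's actual code: update_with_fallback() is a hypothetical name, and run_update stands in for whatever runs “apt-get update” against a given mirror and reports success plus the captured output.

```python
from typing import Callable, Iterable, Tuple


def update_with_fallback(
    run_update: Callable[[str], Tuple[bool, str]],
    mirrors: Iterable[str],
    max_retries: int = 3,
) -> str:
    """Return the first mirror on which the update succeeds.

    Other failures are retried on the same mirror, while a 'hash sum
    mismatch' is treated as a sign that the mirror is being updated, so
    the next mirror is tried right away.
    """
    for mirror in mirrors:
        for _ in range(max_retries):
            succeeded, output = run_update(mirror)
            if succeeded:
                return mirror
            if "hash sum mismatch" in output.lower():
                break  # looks like a mirror update: switch mirrors
    raise RuntimeError("package list update failed on every mirror")


if __name__ == "__main__":
    def fake_update(mirror: str) -> Tuple[bool, str]:
        """Stand-in for a real apt-get update run (demo only)."""
        if "busy" in mirror:
            return False, "E: Failed to fetch Packages  Hash Sum mismatch"
        return True, ""

    print(update_with_fallback(
        fake_update,
        ["http://busy.example.com/ubuntu", "http://ok.example.com/ubuntu"],
    ))
```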

Publishing apt-mirror-updater to the world is my attempt to help improve this situation instead of complaining in bug trackers (see above) where no robust and automated solution is emerging (at the time of writing). Who knows, maybe some day these issues will be resolved by moving logic similar to what I’ve implemented here into apt-get itself. Of course it would also help if mirror updates were atomic…

Contact

The latest version of apt-mirror-updater is available on PyPI and GitHub. The documentation is hosted on Read the Docs and includes a changelog. For bug reports please create an issue on GitHub. If you have questions, suggestions, etc. feel free to send me an e-mail at peter@peterodding.com.

License

This software is licensed under the MIT license.

© 2020 Peter Odding.