Backing up in Linux

It is often said that if you don't back up, you can never move forward. But what method is best? Should you unmount your backup device after use? How can you identify your USB drive? I will answer these questions in this tutorial on backing up from the Linux command line.

Comparison of backup media

The choice of backup media depends on the cost of the media vs the cost of replacing the files. The best choice is different for everybody.

Medium	Lifespan (years)	Cost/TB	Notes
Spinning magnetic disk	<5–10	$19	Crashes with little warning, losing all information. Sensitive to movement and bumping
SSD drive	<1–2	$50	Can last for years, but reported to lose information within a year unless continuously powered up
Tape	<5–10	$2.20	Complicated, needs changing, becoming obsolete
Cloud	<0–20	$$$	Secondary backup only. Never rely on the cloud being available when needed. If you forget to pay your bill, or if you say something on the Internet they don't like, or if the company goes under, it will disappear without warning
r/w DVD	<10–20	$0.34	Lifespan depends on humidity and exposure to light. Backing up a big PC takes many many DVDs
M-Disk	>1000	$115	Supposed to last 1000 years. Disks are expensive and you may need a new DVD writer to use them. Backing up a whole PC is impractical; useful mainly for the most valuable data. An M-disk reader will certainly not last 1000 years
Hard copy	<10–100	$0.02	Bulky. Restoring from backup is extremely slow (requires re-typing). Pictures printed on an inkjet printer are instantly ruined by a drop of water.

That's not a typo. A blank M-disk currently costs 338 times more than a blank DVD, but it also lasts about 300 times longer, so it saves you from re-copying it many times over the course of a millennium. M-disks will undoubtedly get cheaper in the future.

Then there's physical security. There are special external drives sealed in an aluminum box that are claimed to be resistant to fire and flood.

Why you should keep your backup device unmounted

Being mounted all the time is as tiring for a computer as it would be for a person.

Malware and ransomware can't get to it when it's offline.
If your server crashes, it will try to file-check anything that's attached when it comes up, which can take hours for a big USB drive. Many people yank the USB cable out of any external drives when booting up to prevent this.
A network interface with a fixed IP address never changes. Distance is not a problem: you could back up onto a network device from thousands of miles away. A local one can be mounted over NFS. However, NFS causes problems if a device goes down while it's still mounted—another reason to unmount after use.
Electronic stuff is more resistant to spikes and experiences less wear and tear when it's powered down. However, repeated startups and shutdowns are rough on spinning magnetic hard drives.

In earlier times, people backed up on tape. Tape has a lot of disadvantages. You need to use tar to keep the tape running, but file size limits force you to granularize your backups. That means in most cases you use a special program to handle the details, and you have to hope that the program still works when you upgrade your system.

Many people now just copy file-by-file to a bulk device, which makes it a lot easier if a user wants to retrieve a specific file but forgot the filename and doesn't remember what's in it, how big it is, or when they created it. This happens more than you might think.

Writing a good backup script

While there are programs that can do backups for you, a common way to do backups is to write a command-line script, make it executable, and run it weekly from crontab. The advantage is never having to worry about some software disappearing, changing its parameters, or failing to compile in the future. Here is a sample backup script in Linux.

	Command (bash)	Command (tcsh, if different)
1	`#!/bin/bash`	`#!/bin/tcsh`
2	`lsblk | grep sdb2 > /dev/null`
3	`if [ $? -gt 0 ]; then`	`if ($status != 0) then`
4	`logger Cannot find sdb2, backup aborted`
5	`exit 1`
6	`fi`	`endif`
7	`mount /dev/sdb2 /mnt`
8	`a=/mnt/oxygen/`	`set a = "/mnt/oxygen"`
9	`sync`
10	`logger Backing up /home`
11	`cp -pRuv /home $a`
12	`umount /dev/sdb2`
13	`logger Finished backing up`

Note that both tcsh and bash are very particular about spaces.

In line 3, the script checks the exit status of the pipeline in line 2 (in effect, whether grep found sdb2 in the lsblk output) to make sure the partition exists before trying to mount it. If it doesn't, it uses logger to notify you. This isn't a great solution these days because many programs write piles of useless junk to the system logs, making the error message easy to miss. Something like xmessage An error occurred &, or (if you're the only user) wall -g groupname, which sends a text string to every terminal run by a member of the specified group, or even sending an email to yourself might be better, though these aren't perfect either.
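As a rough sketch of that idea, the helper below reports a problem in two places at once. The function name notify and the tag backup are my own inventions, not part of the sample script. logger's -t (tag) and -p (priority) options make the entry easier to pick out of a noisy log, and cron normally mails anything a job writes to stderr to the job's owner, assuming mail is set up on the machine.

```shell
# Hypothetical helper: report a backup problem in more than one place.
notify() {
    local msg="$1"
    # -t adds a tag and -p sets the priority, so the entry can be found
    # later with something like: grep 'backup' /var/log/messages
    if command -v logger > /dev/null 2>&1; then
        logger -t backup -p user.err "$msg" 2> /dev/null || true
    fi
    # Also write to stderr, which cron usually mails to the job's owner.
    echo "backup: $msg" >&2
}
```

In the sample script, line 4 would then become something like notify "Cannot find sdb2, backup aborted".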

In bash, set -e (or, in tcsh, the -e flag on the first line, as in #!/bin/tcsh -e) could be used to force the script to stop if a command hits an error, but it's better to check all the commands explicitly the same way we did in line 3 and exit with a specific error message if something goes wrong. A complete script will also verify that copying actually occurred. That's not shown here.
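For what it's worth, here is one way the verification step might look. This is only a sketch: verify_copy is a name I made up, and it uses diff -rq, which reads both trees in full and so roughly doubles the disk activity. It also treats any file that exists only in the backup (for example, one you have since deleted from /home) as a mismatch, which may or may not be what you want.

```shell
# Hypothetical helper: succeed only if the trees $1 and $2 are identical.
# diff -r recurses into subdirectories; -q reports only whether files differ.
verify_copy() {
    local src="$1" dst="$2"
    if diff -rq "$src" "$dst" > /dev/null 2>&1; then
        return 0
    fi
    echo "backup: $dst does not match $src" >&2
    return 1
}
```

Because cp -R /home $a creates a home directory under $a, the call in the sample script would be something like verify_copy /home "$a/home" || exit 1, placed between lines 11 and 12.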

But there's a bigger problem. Unlike Windows, which uses fixed letters like ‘Z:’, device names change unpredictably in Linux. If you use a script to mount the drive or to write to it, your backup will go to the wrong place (or maybe nowhere) when the device name changes.

If you add or remove a drive, or if your external drive gets plugged into a different port, your USB drive could suddenly become /dev/sdc2 instead of /dev/sdb2. The best case scenario is that the mount command fails. If your script tests whether the mount command works, no problem: it exits and the backup fails. If it doesn't test (as in the example), all the files are written to your mount point and fill up the partition, and you run out of disk space. In a worst case scenario, your script could even start writing to the same partition you're trying to back up.
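A minimal sketch of the "test whether the mount command works" case might look like this. The function name mount_backup is hypothetical. It only catches the best case scenario described above, where the device is missing or the mount fails; it cannot tell you that /dev/sdb2 now points at a different drive.

```shell
# Hypothetical wrapper around line 7 of the sample script: refuse to
# continue unless the device exists and the mount succeeds.
mount_backup() {
    local dev="$1" mnt="$2"
    # -b is true only for a block device (symlinks are followed)
    if [ ! -b "$dev" ]; then
        echo "backup: $dev is not a block device" >&2
        return 1
    fi
    if ! mount "$dev" "$mnt"; then
        echo "backup: could not mount $dev on $mnt" >&2
        return 1
    fi
}
```

Line 7 would then read mount_backup /dev/sdb2 /mnt || exit 1, so nothing is copied to the bare mount point when the drive isn't there.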

Another gotcha is that mount doesn't set a non-zero exit status if you specify a mount point that's already being used. So it will happily mount on top of an existing directory, hiding whatever was in it, and dump your backups there. Unfortunately, commands like mountpoint and findmnt don't do what we want. The only reliable test I know of is ls /mnt | wc -l, where /mnt is where you want your backups to go. If it prints a 0, you know there's nothing there already. This isn't a problem unless you mount two backup devices at the same point. So the solution is: don't do that.
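The ls test can be wrapped in a small function so the script can act on it. This is a sketch with a made-up name, mount_point_is_empty, and one small variation of my own: it uses ls -A so that hidden files are counted too.

```shell
# Hypothetical helper: succeed only if directory $1 exists and is empty.
mount_point_is_empty() {
    local dir="$1" count
    [ -d "$dir" ] || return 1
    # ls -A lists hidden entries as well, but leaves out . and ..
    count=$(ls -A "$dir" 2> /dev/null | wc -l)
    [ "$count" -eq 0 ]
}
```

A script could call mount_point_is_empty /mnt || exit 1 just before the mount command.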

One last gotcha is to make sure you don't try to back up /dev or especially /proc, where there are dragons. This means you should avoid backing up the root directory ( / ). Of course, this is true for any backup software, not just a homebrew script.
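If the list of directories to back up ever comes from a variable or a config file, a guard like the following sketch can catch the mistake before cp starts. The function name is_safe_source is hypothetical, and /sys is my own addition to the list, since it is another kernel-provided virtual filesystem like /proc.

```shell
# Hypothetical guard: fail for sources that should never be backed up.
is_safe_source() {
    case "$1" in
        # The root directory, and the virtual filesystems beneath it
        /|/dev|/dev/*|/proc|/proc/*|/sys|/sys/*) return 1 ;;
        *) return 0 ;;
    esac
}
```

For example, is_safe_source /home succeeds, while is_safe_source / and is_safe_source /proc both fail.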

How to not do that

All these problems are prevented if we change line 7 in the example script so that it mounts the backup device by its ID or UUID instead of its name. If you know the device name, there are many ways to find the ID. Most will work if the device is attached and powered up, even if it's not mounted. The easiest way is to use lsblk -f:

# lsblk -f
sdb
|_sdb1
|_sdb2 ext4 1.0 b8676072-2da8-4ee5-a67f-9fbec7965670

The UUID is the long string at the end of the sdb2 entry.

Another way is to use blkid:

# /sbin/blkid | grep sdb2
/dev/sdb2: UUID="b8676072-2da8-4ee5-a67f-9fbec7965670" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="Basic data partition" PARTUUID="693bcdb5-475e-4942-8ea4-699fb82c036e"
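If you'd rather not copy the string by hand, the UUID can be cut out of a blkid line with sed. The sketch below assumes the line has the same layout as the sample above, and the function name uuid_from_blkid_line is made up for illustration.

```shell
# Hypothetical helper: print the UUID field from one line of blkid output.
uuid_from_blkid_line() {
    # The space before UUID= keeps the PARTUUID= field from matching.
    printf '%s\n' "$1" | sed -n 's/.* UUID="\([^"]*\)".*/\1/p'
}
```

Something like uuid_from_blkid_line "$(/sbin/blkid | grep sdb2)" would then print just the UUID. blkid can also print a single value by itself with blkid -s UUID -o value /dev/sdb2.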

Now you have the UUID, so the mount command is:

mount /dev/disk/by-uuid/b8676072-2da8-4ee5-a67f-9fbec7965670 /mnt
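In a script, it may be worth checking that the by-uuid link actually exists before trying to mount it, since the link is only present while the drive is attached. Here is a sketch; the function device_for_uuid and the variable BY_UUID_DIR are hypothetical names, and the variable exists only so the function can be tried out against a scratch directory.

```shell
# Directory holding the UUID symlinks (overridable for experiments).
BY_UUID_DIR="${BY_UUID_DIR:-/dev/disk/by-uuid}"

# Hypothetical helper: print the device a UUID currently points to,
# or fail if no such drive is attached.
device_for_uuid() {
    local link="$BY_UUID_DIR/$1"
    if [ ! -e "$link" ]; then
        echo "backup: no device with UUID $1 is attached" >&2
        return 1
    fi
    # readlink -f follows the symlink, e.g. to /dev/sdb2
    readlink -f "$link"
}
```

A script could then do dev=$(device_for_uuid b8676072-2da8-4ee5-a67f-9fbec7965670) || exit 1, followed by mount "$dev" /mnt, and log the device name it found.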

To get even more information, type # lsblk -O -P | grep sdb2 (that's a capital letter O, not a zero)

NAME="sdb2" PATH="/dev/sdb2" UUID="b8676072-2da8-4ee5-a67f-9fbec7965670" PTUUID="6db9ad18-a15c-4ce5-93d0-4c0fff06257f" PARTTYPE="ebd0a0a2-b9e5-4433-87c0-68b6b72699c7" PARTUUID="693bcdb5-475e-4942-8ea4-699fb82c036e" WWN="0x5000c500c94575b7"

I've edited out a lot of additional stuff, leaving only the ID numbers. This command tells you the UUID, the PTUUID, the PARTUUID, and the WWN number, so you can also find the matching link under /dev/disk and mount it with either of these commands:

ls -l /dev/disk/by-id/* | grep sdb2
/dev/disk/by-id/wwn-0x5000c500c94575b7-part2 -> ../../sdb2
mount /dev/disk/by-id/wwn-0x5000c500c94575b7-part2 /mnt

ls -l /dev/disk/by-uuid/* | grep sdb2
/dev/disk/by-uuid/b8676072-2da8-4ee5-a67f-9fbec7965670 -> ../../sdb2
mount /dev/disk/by-uuid/b8676072-2da8-4ee5-a67f-9fbec7965670 /mnt

How to get the drive name

By now, you're probably thinking: Well, that's really interesting, but what if I don't know the drive name?

For that, there are two ways. One way is to match the PTUUID (listed in the lsblk -O -P output above) with whatever fdisk -l (lowercase L for 'list') tells you. Fdisk prints lots of information, including the manufacturer, to help you distinguish them. The other way is to use lsblk. Both can find the drive name from the PTUUID. Here's how to do it with lsblk:

# lsblk -O -P | grep 6db9ad | grep NAME
NAME="sdb2" ...

I used a couple of greps to narrow down the mountain of data. It doesn't matter because you only need to do this once. Now that you know the drive name, you can get the ID or UUID you need to construct a mount command, as we did above.
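The same lookup can be scripted. The sketch below reads lsblk -P style lines on standard input and prints the NAME of every line whose PTUUID begins with the given prefix. The function name names_for_ptuuid is my own. Because the PTUUID belongs to the partition table, expect the whole disk and each of its partitions to match.

```shell
# Hypothetical helper: read KEY="value" lines (as printed by lsblk -P)
# on stdin and print NAME for each line whose PTUUID starts with $1.
names_for_ptuuid() {
    # Prepend a space so that NAME= and PTUUID= can always be matched
    # with a leading space, which excludes KNAME= and similar fields.
    sed 's/^/ /' |
        grep " PTUUID=\"$1" |
        sed -n 's/.* NAME="\([^"]*\)".*/\1/p'
}
```

It could be fed with lsblk -O -P | names_for_ptuuid 6db9ad, or with the much shorter output of lsblk -P -o NAME,PTUUID.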

mount /dev/disk/by-uuid/b8676072-2da8-4ee5-a67f-9fbec7965670 /mnt
mount /dev/disk/by-id/wwn-0x5000c500c94575b7-part2 /mnt

Don't forget to document in your script what these IDs are for. But really, nobody gets the drive name this way. Most people just power-cycle the USB drive or plug it in and use df to see what's different (assuming the drive automounts), or else watch the logs:

tail -f /var/log/messages
. . .
oxygen kernel: EXT4-fs (sdb2): mounted filesystem with ordered data mode

That is a lot easier, provided you're next to the computer. If you're two hundred miles away from it, you have to do it the hard way. And as we all know, doing things the hard way is what Linux is all about.
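The df comparison can be done by machine instead of by eye. In this sketch (both function names are hypothetical), df_snapshot keeps only the device and mount point columns so that changing usage figures don't show up as differences, and new_mounts prints the lines that appear only in the second snapshot. It assumes mount points without spaces in their names.

```shell
# Hypothetical helper: print "device mountpoint" for each mounted filesystem.
df_snapshot() {
    # df -P prints one line per filesystem; skip the header line
    df -P | awk 'NR > 1 {print $1, $6}'
}

# Hypothetical helper: print lines found in file $2 but not in file $1.
new_mounts() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -Fxv -f "$1" "$2"
}
```

You would run df_snapshot > /tmp/before, plug in the drive, run df_snapshot > /tmp/after, and then new_mounts /tmp/before /tmp/after.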

dec 09 2023, 3:24 am. updated dec 12 2023