randombio.com | Saturday, December 09, 2023 | computer tutorial
Backing up in Linux
You can always do backups manually, but it's better to have them run automatically. There are certain things you need to know first.
It is often said that if you don't back up, you can never move forward. But what method is best? Should you unmount your backup device after use? How can you identify your USB drive? I will answer these questions in this tutorial on backing up from the Linux command line.
The choice of backup media depends on the cost of the media vs the cost of replacing the files. The best choice is different for everybody.
Medium | Lifespan (years) | Cost/TB | Notes |
---|---|---|---|
Spinning magnetic disk | <5–10 | $19 | Crashes with little warning, losing all information. Sensitive to movement and bumping |
SSD drive | <1–2 | $50 | Can last for years, but reported to lose information within a year unless continuously powered up |
Tape | <5–10 | $2.20 | Complicated, needs changing, becoming obsolete |
Cloud | <0–20 | $$$ | Secondary backup only. Never rely on the cloud being available when needed. If you forget to pay your bill, or if you say something on the Internet they don't like, or if the company goes under, it will disappear without warning |
r/w DVD | <10–20 | $0.34 | Lifespan depends on humidity and exposure to light. Backing up a big PC takes many many DVDs |
M-Disk | >1000 | $115 | Supposed to last 1000 years. Disks are expensive and you may need a new DVD writer to use them. Backing up a whole PC is impractical; useful mainly for the most valuable data. An M-disk reader will certainly not last 1000 years |
Hard copy | <10–100 | $0.02 | Bulky. Restoring from backup is extremely slow (requires re-typing). Pictures printed on an inkjet printer are instantly ruined by a drop of water |
That's not a typo. A blank M-disk currently costs 338 times more than a blank DVD, but it also lasts about 300 times longer, so it saves you from re-copying it many times over the course of a millennium. M-disks will undoubtedly get cheaper in the future.
Then there's physical security. There are special external drives sealed in an aluminum box that are claimed to be resistant to fire and flood.
Should you unmount the backup device after use? There are several reasons to do so.
Being mounted all the time is as tiring for a computer as it would be for a person.
Malware and ransomware can't get to it when it's offline.
If your server crashes, it will try to file-check anything that's attached when it comes up, which can take hours for a big USB drive. Many people yank the USB cable out of any external drives when booting up to prevent this.
A network interface with a fixed IP address never changes. Distance is not a problem: you could back up onto a network device from thousands of miles away. A local one can be mounted over NFS. However, NFS causes problems if a device goes down while it's still mounted—another reason to unmount after use.
Electronic stuff is more resistant to spikes and experiences less wear and tear when it's powered down. However, repeated startups and shutdowns are rough on spinning magnetic hard drives.
In earlier times, people backed up on tape. Tape has a lot of disadvantages. You need to use `tar` to keep the tape running, but file size limits force you to granularize your backups. That means in most cases you use a special program to handle the details, and you have to hope that the program still works when you upgrade your system.
Many people now just copy file-by-file to a bulk device, which makes it a lot easier if a user wants to retrieve a specific file but forgot the filename and doesn't remember what's in it, how big it is, or when they created it. This happens more than you might think.
While there are programs that can do backups for you, a common way to do backups is to write a command-line script, make it executable, and run it weekly from crontab. The advantage is never having to worry about some software disappearing, changing its parameters, or failing to compile in the future. Here is a sample backup script in Linux.
Line | Command (bash) | Command (tcsh, if different) |
---|---|---|
1 | #!/bin/bash | #!/bin/tcsh |
2 | lsblk \| grep sdb2 > /dev/null | |
3 | if [ $? -gt 0 ]; then | if ($status != 0) then |
4 | logger Cannot find sdb2, backup aborted | |
5 | exit 1 | |
6 | fi | endif |
7 | mount /dev/sdb2 /mnt | |
8 | a=/mnt/oxygen/ | set a = "/mnt/oxygen" |
9 | sync | |
10 | logger Backing up /home | |
11 | cp -pRuv /home $a | |
12 | umount /dev/sdb2 | |
13 | logger Finished backing up | |
Note that both tcsh and bash are very particular about spaces.
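Running the script weekly from crontab could look something like the sketch below. The script path `/usr/local/sbin/backup.sh` and the Sunday 2:30 am schedule are assumptions chosen for illustration; any path and time will do.

```shell
# Hypothetical location of the backup script shown above.
script=/usr/local/sbin/backup.sh

# crontab fields: minute hour day-of-month month day-of-week command
# "30 2 * * 0" means 2:30 am every Sunday.
cron_entry="30 2 * * 0 $script"
echo "$cron_entry"

# To install it (as root, since the script calls mount), one option is:
#   chmod +x "$script"
#   ( crontab -l 2>/dev/null; echo "$cron_entry" ) | crontab -
```

The install lines are left as comments so that pasting the snippet does not change your crontab by accident.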
In line 3, the script checks the exit status of the `lsblk | grep` pipeline from line 2 to make sure the partition exists before trying to mount it. If it doesn't, it uses `logger` to notify you. This isn't a great solution these days because many programs write piles of useless junk to the system logs, making the error message easy to miss. Something like `xmessage An error occurred &`, or (if you're the only user) `wall -g groupname`, which sends a text string to every terminal belonging to a member of the specified group, or even sending an email to yourself might be better, though these aren't perfect either.
In bash, `set -e` (in tcsh, the `-e` flag on the `#!` line) could be used to force the script to stop if a command hits an error, but it's better to check all the commands explicitly the same way we did in line 3 and exit with a specific error message if something goes wrong. A complete script will also verify that copying actually occurred. That's not shown here.
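One way to check every command explicitly is to route each step through a small wrapper. This is only a sketch: `die` and `run_step` are hypothetical helper names, not part of the sample script, and writing to stderr is an assumption that suits cron, which typically mails a job's output to its owner.

```shell
# die (hypothetical helper): report the problem, then stop the script.
die() {
    echo "backup: $*" >&2                      # seen on the terminal or in cron's mail
    logger "backup: $*" 2>/dev/null || true    # also note it in the system log
    exit 1
}

# run_step (hypothetical helper): run one command and stop if it fails.
run_step() {
    "$@" || die "step failed: $*"
}

# Usage inside the backup script (commented out so nothing is mounted here):
#   run_step mount /dev/sdb2 /mnt
#   run_step cp -pRu /home /mnt/oxygen/
#   run_step umount /dev/sdb2
```

Each failure then produces a message naming the exact command that went wrong, instead of the script silently carrying on.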
But there's a bigger problem. Unlike Windows, which uses fixed letters like ‘Z:’, device names change unpredictably in Linux. If you use a script to mount the drive or to write to it, your backup will go to the wrong place (or maybe nowhere) when the device name changes.
If you add or remove a drive, or if your external drive gets plugged into a different port, your USB drive could suddenly become `/dev/sdc2` instead of `/dev/sdb2`. The best case scenario is that the mount command fails. If your script tests whether the mount command works, no problem: it exits and the backup fails. If it doesn't test (as in the example), all the files are written to the unmounted mount point directory on your system disk, where they fill up the partition until you run out of disk space. In a worst case scenario, your script could even start writing to the same partition you're trying to back up.
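A rough guard against both outcomes is to confirm, after mounting, that the backup directory really sits on a different filesystem from the system disk and from the data being copied. The sketch below assumes GNU `stat`; `on_other_filesystem` is a hypothetical helper name, and comparing device numbers is only an approximation (bind mounts, for instance, can fool it).

```shell
# on_other_filesystem (hypothetical helper): succeeds only when both paths
# exist and sit on different filesystems (different device numbers).
on_other_filesystem() {
    dev_a=$(stat -c %d "$1" 2>/dev/null) || return 1
    dev_b=$(stat -c %d "$2" 2>/dev/null) || return 1
    [ "$dev_a" != "$dev_b" ]
}

# Usage after the mount command (commented out; paths are from the sample script):
#   on_other_filesystem /mnt /     || { logger "Nothing mounted on /mnt, backup aborted"; exit 1; }
#   on_other_filesystem /mnt /home || { logger "/mnt is the /home partition, backup aborted"; exit 1; }
```

The helper fails when either path is missing, so a typo in the path aborts the backup rather than letting it proceed.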
Another gotcha is that `mount` doesn't set a non-zero exit status if you specify a mount point that's already being used. So it will happily mount on top of an existing directory, hiding whatever was there, and dump your backups there. Unfortunately, commands like `mountpoint` and `findmnt` don't do what we want. The only reliable test I know of is `ls /mnt | wc -l`, where /mnt is where you want your backups to go. If it prints a 0, you know there's nothing there already. This isn't a problem unless you mount two backup devices at the same point. So the solution is: don't do that.
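The `ls | wc -l` test can be wrapped in a function so the script can refuse to mount onto a directory that already has something in it. This is a sketch: `mount_point_is_empty` is a hypothetical name, and using `ls -A` (which also counts hidden files) is a small assumption beyond the plain `ls` described above.

```shell
# mount_point_is_empty (hypothetical helper): succeeds when the directory
# exists and contains no entries at all, hidden files included.
mount_point_is_empty() {
    [ -d "$1" ] && [ "$(ls -A "$1" 2>/dev/null | wc -l)" -eq 0 ]
}

# Usage before the mount command (commented out; /mnt is from the sample script):
#   mount_point_is_empty /mnt || { logger "/mnt is already in use, backup aborted"; exit 1; }
```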
One last gotcha is to make sure you don't try to back up `/dev` or especially `/proc`, where there are dragons. This means you should avoid backing up the root directory ( / ). Of course, this is true for any backup software, not just a homebrew script.
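A script can enforce this by backing up an explicit list of directories and rejecting the dangerous ones. In this sketch, `is_safe_source` is a hypothetical helper, the directory list is only an example, and `/sys` is added to the forbidden list as an assumption (it is another kernel pseudo-filesystem, though not one named above).

```shell
# is_safe_source (hypothetical helper): refuse the root directory and the
# kernel's pseudo-filesystems, accept everything else.
is_safe_source() {
    case "$1" in
        /|/dev|/dev/*|/proc|/proc/*|/sys|/sys/*) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage in the backup script (commented out; the list is an example):
#   for dir in /home /etc /var/www; do
#       is_safe_source "$dir" && cp -pRu "$dir" /mnt/oxygen/
#   done
```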
The device-name problems are prevented if we change line 7 in the example script so that it mounts the backup device by its ID or UUID instead of its name. If you know the device name, there are many ways to find the ID. Most will work if the device is attached and powered up, even if it's not mounted. The easiest way is to use `lsblk -f`:
# lsblk -f
sdb
|_sdb1
|_sdb2 ext4 1.0 b8676072-2da8-4ee5-a67f-9fbec7965670
The UUID is the long string at the end of the sdb2 line.
Another way is to use `blkid`:
# /sbin/blkid | grep sdb2
/dev/sdb2: UUID="b8676072-2da8-4ee5-a67f-9fbec7965670"
BLOCK_SIZE="4096"
TYPE="ext4"
PARTLABEL="Basic data partition"
PARTUUID="693bcdb5-475e-4942-8ea4-699fb82c036e"
Now you have the UUID, so the mount command is:
mount /dev/disk/by-uuid/b8676072-2da8-4ee5-a67f-9fbec7965670 /mnt
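In a script, it may help to confirm that the UUID link exists before mounting, since the link only appears while the drive is attached. The following is a sketch under that assumption; `device_for_uuid` is a hypothetical helper, and its optional second argument exists only so the lookup directory can be overridden.

```shell
# UUID of the backup partition, as reported by lsblk -f above.
backup_uuid=b8676072-2da8-4ee5-a67f-9fbec7965670

# device_for_uuid (hypothetical helper): print the real device node behind
# a UUID link, or fail if the drive is not attached.
device_for_uuid() {
    link="${2:-/dev/disk/by-uuid}/$1"
    [ -e "$link" ] || return 1
    readlink -f "$link"
}

# Usage in the backup script (commented out so nothing is mounted here):
#   dev=$(device_for_uuid "$backup_uuid") || { logger "Backup drive not attached"; exit 1; }
#   logger "Backup drive is currently $dev"
#   mount "$dev" /mnt
```

Logging the resolved device name is optional, but it leaves a record of which name the drive had on each run.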
To get even more information, type `lsblk -O -P | grep sdb2` (that's the letter O, not a zero):
NAME="sdb2"
PATH="/dev/sdb2"
UUID="b8676072-2da8-4ee5-a67f-9fbec7965670"
PTUUID="6db9ad18-a15c-4ce5-93d0-4c0fff06257f"
PARTTYPE="ebd0a0a2-b9e5-4433-87c0-68b6b72699c7"
PARTUUID="693bcdb5-475e-4942-8ea4-699fb82c036e"
WWN="0x5000c500c94575b7"
I've edited out a lot of additional stuff, leaving only the ID numbers. This command tells you the UUID, the PARTUUID, the WWN, and the PTUUID, so you can also mount it with these commands:
ls -l /dev/disk/by-id/* | grep sdb2
/dev/disk/by-id/wwn-0x5000c500c94575b7-part2 -> ../../sdb2
mount /dev/disk/by-id/wwn-0x5000c500c94575b7-part2 /mnt
or
ls -l /dev/disk/by-uuid/* | grep sdb2
/dev/disk/by-uuid/b8676072-2da8-4ee5-a67f-9fbec7965670 -> ../../sdb2
mount /dev/disk/by-uuid/b8676072-2da8-4ee5-a67f-9fbec7965670 /mnt
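The `KEY="value"` pairs that `lsblk -P` prints are easy to pick apart in a script. The sketch below uses a hypothetical helper, `get_field`, that pulls one value out of such a line; the leading-space match is there so that asking for UUID does not accidentally return the PTUUID or PARTUUID.

```shell
# get_field (hypothetical helper): read lsblk -P style lines on stdin and
# print the value of one key. The key must start the line or follow a space.
get_field() {
    key=$1
    sed -n "s/^\(.* \)\{0,1\}${key}=\"\([^\"]*\)\".*/\2/p"
}

# Usage on a live system:
#   lsblk -O -P | grep 'NAME="sdb2"' | get_field UUID
```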
By now, you're probably thinking: Well, that's really interesting, but what if I don't know the drive name?
For that, there are two ways. One way is to match the PTUUID (the `PTUUID=` value in the `lsblk -O -P` output above) with whatever `fdisk -l` (lowercase L for 'list') tells you. Fdisk prints lots of information, including the manufacturer, to help you distinguish the drives. The other way is to use `lsblk`. Both can find the drive name from the PTUUID. Here's how to do it with lsblk:
# lsblk -O -P | grep 6db9ad | grep NAME
NAME="sdb2" ...
I used a couple of `grep`s to narrow down the mountain of data. It doesn't matter because you only need to do this once. Now that you know the drive name, you can get the ID or UUID you need to construct a mount command, as we did above.
mount /dev/disk/by-uuid/b8676072-2da8-4ee5-a67f-9fbec7965670 /mnt
mount /dev/disk/by-id/wwn-0x5000c500c94575b7-part2 /mnt
Don't forget to document in your script what these IDs are for.
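If the grep pipeline for finding the drive name feels too noisy, `lsblk` can be asked for just the two columns of interest. This is a sketch: `names_for_ptuuid` is a hypothetical helper that filters `NAME PTUUID` pairs by a PTUUID prefix. On a live system it will usually print the whole disk as well as its partitions, since they share one partition table.

```shell
# names_for_ptuuid (hypothetical helper): read "NAME PTUUID" pairs on stdin
# and print the names whose PTUUID starts with the given prefix.
names_for_ptuuid() {
    awk -v prefix="$1" 'index($2, prefix) == 1 { print $1 }'
}

# Usage on a live system (-r gives raw columns, -n drops the header line):
#   lsblk -rn -o NAME,PTUUID | names_for_ptuuid 6db9ad
```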
But really, nobody gets the drive name this way. Most people just power-cycle the USB drive or plug it in and use `df` to see what's different (assuming the drive automounts), or else watch the logs:
tail -f /var/log/messages
. . .
oxygen kernel: EXT4-fs (sdb2): mounted filesystem with ordered data mode
That is a lot easier, provided you're next to the computer. If you're two hundred miles away from it, you have to do it the hard way. And as we all know, doing things the hard way is what Linux is all about.
dec 09 2023, 3:24 am. updated dec 12 2023