This past weekend I had an experience that I'd like to share with you. It's a lesson in how our assumptions can lead to unintended consequences, and how to undo them when they occur.
I had a command running that would check a directory to see if there were any old files in it. Rather than delete those old files I instead opted to use gvfs-trash to move them to the trash on my Linux system (there are other options like gio and trash-cli that will do the same).
The script was set up like this:
#!/bin/bash
# Need trailing slash for links
TARGET=/path/to/directory/
DAYS=120
DELETE_DAYS=180
cd $TARGET
echo "Checking for files older than $DAYS days..."
find . -type f -ctime +$DAYS -exec stat -c "%n %y" {} \; | sort
echo "Deleting files older than $DELETE_DAYS days..."
find . -type f -ctime +$DELETE_DAYS -print -exec gvfs-trash {} \; | sort
Some of you might already be groaning at seeing code like this, and can guess what happened. Now that I've seen how this code can fail, I see it too. But for those who haven't experienced this particular failure mode, I'll elaborate on how this script failed me, and describe the disastrous consequences that followed.
This script ran perfectly fine for many months. Each time it would cd to the target directory, run the find command, and then let me know what it was about to delete. Sometimes I would go into that directory and touch the files to keep it from moving those files to the trash. Life was good.
The path that this script looks at is located on an NFS mount: a Western Digital MyBook connected via USB to a Synology NAS. The share is mounted under /mnt/usbshare, and TARGET points at a symlink in my home directory that leads to a directory on that mount.
The details aren't quite as important in this scenario. What is important is the question "what happens when /mnt/usbshare isn't available?"
That's where things get interesting. Since /mnt/usbshare wasn't mounted, the cd command failed. So the current working directory (cwd) was not set to the location I was expecting; instead it remained whatever it was when the script started. In this case that was my home directory.
In some languages a failing command will cause the whole script to abort; an unhandled exception in Python, for example, stops the program. In a bash script a command like cd failing just returns a non-zero exit status. If you're not checking for that status then it just gets discarded, and the script continues on unabated.
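In hindsight the fix is a one-liner. A sketch of the guard this script should have had (with the same placeholder path as above):

#!/bin/bash
# Bail out if the cd fails, instead of silently carrying on in whatever
# directory the script happened to start in.
TARGET=/path/to/directory/
cd "$TARGET" || { echo "Could not cd to $TARGET; aborting." >&2; exit 1; }

Alternatively, putting set -e near the top of the script tells bash to exit whenever any command returns a non-zero status (with some well-documented caveats around pipelines and conditionals).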
I hope that sound you're making is the sound of enlightenment, because I got a rather large dose of it this past Saturday.
The rest of the script not only checked all of the files that were older than $DAYS, it also began running gvfs-trash on any file that hadn't been touched in over 180 days.
The system it's been running on has files that are most assuredly more than 180 days old. Files like my ssh keys, my gpg keys, git repos that haven't been touched in a while, pictures, GNOME configuration files, etc. etc.
How I noticed what was happening was because I couldn't ssh into the system. My authorized_keys file was now in .local/share/Trash.
How I knew things were going into the trash was because I ran ps and noticed that gvfs-trash was still running.
And how I freaked out when I realized that all of my ssh keys (save for three) were now in the trash.
The find . command had dutifully run in the current working directory, which was my home directory, and had proceeded to trash almost one million files (336GB of files altogether).
Luckily GNOME doesn't automatically start purging files when the trash reaches a certain size, or I would have been restoring from backup. GNOME also has a relatively easy way to figure out where a trashed file came from. My only problems were ones of scale, plus getting anything GNOME-related working at all, because the configuration files for GNOME were now in the trash too.
There's a command-line program called trash-cli that is a godsend. It's a command-line tool for working with the GNOME trash can: it'll move files there, empty the trash, and restore files. Unfortunately the restore process assumes you'll only ever be restoring from a trash can holding a few hundred files at most. With nearly a million entries it never actually got to the point where I could restore anything using the stock system.
Fortunately it was written in Python so I was able to adapt it to allow me to short-circuit the user interface and just restore the files as it encountered them.
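The mechanics aren't magic, for what it's worth: per the freedesktop.org Trash specification, each trashed file lives under ~/.local/share/Trash/files/ with a matching .trashinfo file in ~/.local/share/Trash/info/ recording its original, URL-encoded path. A rough sketch of a bulk restore built on that layout (an illustration of the idea, not the actual trash-cli modification):

#!/bin/bash
# Walk the .trashinfo metadata and move every file back where it came from.
TRASH="$HOME/.local/share/Trash"
for info in "$TRASH"/info/*.trashinfo; do
    name=$(basename "$info" .trashinfo)
    # Path= holds the original location, URL-encoded; for the home trash
    # it is an absolute path.
    orig=$(sed -n 's/^Path=//p' "$info" |
        python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.stdin.read().strip()))')
    mkdir -p "$(dirname "$orig")"
    # -n: never clobber a file that already exists at the destination.
    mv -n "$TRASH/files/$name" "$orig" && rm -f "$info"
done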
My Saturday was spent babysitting this process, ensuring that it worked, and checking the status of the backup restore that I was doing with duplicity at the same time.
On Sunday morning I checked the status. It had finished, and after bringing up Nautilus I was able to mop up the remaining files that hadn't been automatically restored.
So far things appear to be working.
The moral of the story is don't use relative paths (., .., etc.) for commands like find, especially if they're doing actions like deleting or moving files. Chances are they'll work most of the time, but when they don't work is when you'll be in big trouble.
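Putting that moral into practice, a safer version of the original script might look something like this, with absolute paths for find and a hard stop if the target isn't there (the paths are still placeholders):

#!/bin/bash
set -euo pipefail    # abort on any error, unset variable, or failed pipe stage
TARGET=/path/to/directory/
DAYS=120
DELETE_DAYS=180
# Refuse to run at all if the mount (or the symlink to it) is missing.
if [ ! -d "$TARGET" ]; then
    echo "$TARGET is not available; is the share mounted?" >&2
    exit 1
fi
echo "Checking for files older than $DAYS days..."
find "$TARGET" -type f -ctime +"$DAYS" -exec stat -c "%n %y" {} \; | sort
echo "Trashing files older than $DELETE_DAYS days..."
find "$TARGET" -type f -ctime +"$DELETE_DAYS" -print -exec gvfs-trash {} \;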
To quote Jerry Pournelle of Chaos Manor: "I do these silly things so you don't have to".