This past weekend I had an experience that I'd like to share with you. It's a lesson in how our assumptions can lead to unintended consequences, and how to undo them when they occur.
I had a command running that would check a directory to see if there were any old files in it. Rather than delete those old files outright, I opted to use
gvfs-trash to move them to the trash on my Linux system (there are other options, like
trash-cli, that will do the same).
This script was set up as such:
Some of you might already be groaning at code like this, and can guess what happened. Now that I've seen what can fail in this code I see it too, but for those who haven't experienced how it can fail, let me elaborate on how this particular script failed me, and the disastrous consequences that followed.
This script ran perfectly fine for many months. Each time it would
cd to the target directory, run the find command, and then let me know what it was about to delete. Sometimes I would go into that directory and touch the files to keep it from moving those files to the trash. Life was good.
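That reprieve works because find's -mtime +N test looks only at a file's modification time, and touch resets that to "now". A quick demonstration of the mechanism (assuming the script selected files with -mtime, which matches its behavior):

```shell
#!/bin/bash
# touch refreshes a file's mtime, pulling it out of find's -mtime +N net.
tmp=$(mktemp -d)
touch -t 200001010000 "$tmp/keepme"    # back-date the file to 2000
find "$tmp" -type f -mtime +180        # matches: the file looks stale
touch "$tmp/keepme"                    # refresh the timestamp
find "$tmp" -type f -mtime +180        # no output: the file is spared
rm -rf "$tmp"
```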
The path that this script looks at is located on an NFS mount. It's actually a Western Digital MyBook connected via USB to a Synology NAS. The location is under
/mnt/usbshare, and the directory under TARGET is a link to that particular directory (in my home directory).
The details aren't quite as important in this scenario. What is important is the question: what happens when
/mnt/usbshare isn't available?
That's where things get interesting. Since the mount for
/mnt/usbshare wasn't connected, the
cd command failed. So the current working directory (cwd) was not the location I was expecting it to be; it stayed wherever the script had started.
In this case that was my home directory.
In some languages a failing command will cause the whole program to abort. An unhandled exception in Python will abort the script. In a
bash script, a command like
cd failing just returns a non-zero exit status. If you're not checking for that status, it's silently discarded. The script continues on unabated.
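In bash that check has to be explicit. A minimal sketch (the path is made up to simulate the missing mount) of the guard that would have saved me:

```shell
#!/bin/bash
# Guarding the cd: "|| ..." checks the exit status that bash would
# otherwise discard. The path is made up to simulate a missing mount.
TARGET="/no/such/mount/point"

# The risky part runs in a subshell so this demo itself exits cleanly.
(
    cd "$TARGET" 2>/dev/null || { echo "cd $TARGET failed; aborting" >&2; exit 1; }
    echo "only now is it safe to run find"
) || echo "script aborted before any files were touched"
```

Alternatively, putting set -e at the top of the script makes any unchecked non-zero exit status fatal, much like Python's unhandled exceptions.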
I hope that sound you're making is the sound of enlightenment, because I got a rather large dose of it this past Saturday.
The rest of the script not only checked all of the files that were older than
$DAYS, it also began running
gvfs-trash on any file that hadn't been touched in over 180 days.
The system it's been running on has files that are most assuredly more than 180 days old. Files like my ssh keys, my gpg keys, git repos that haven't been touched in a while, pictures, gnome configuration files, etc. etc.
How I noticed what was happening was because I couldn't ssh into the system. My
authorized_keys file was now in the trash.
How I knew things were going into the trash was because I ran
ps and noticed that
gvfs-trash was still running.
How I freaked out was when I realized that all of my ssh keys (save for three) were now in the trash. The
find . command had dutifully run in the current working directory, which was my home directory, and had proceeded to trash almost one million files (336GB of files altogether).
Luckily GNOME doesn't automatically start purging files when the trash gets to a certain size, or I would have been restoring from backup. Also GNOME has a relatively easy way to figure out where a file came from. The only problem I had was one of scale, and an issue getting anything GNOME-related working because the configuration files for GNOME were now all in the trash.
There's a command-line program called trash-cli that is a godsend. It's a command-line tool for working with the GNOME trash can: it'll move files there, empty the trash, and restore files. Unfortunately the restore process assumes you'll only ever be restoring from a trash directory holding a few hundred files at most. It never actually got to the point where I could restore anything using the stock tool.
Fortunately it was written in Python so I was able to adapt it to allow me to short-circuit the user interface and just restore the files as it encountered them.
My Saturday was spent babysitting this process, ensuring that it worked, and checking the status of the backup restore that I was doing with duplicity at the same time.
On Sunday morning I checked the status. It had finished, and after bringing up Nautilus I was able to mop up the remaining files that hadn't been automatically restored.
So far things appear to be working.
The moral of the story is: don't use relative paths (
., .., etc.) for commands like find, especially if they're performing actions like deleting or moving files. Chances are they'll work most of the time, but when they don't work is when you'll be in big trouble.
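Put differently, the safer shape of the script skips cd entirely and hands find an absolute path, after checking that the directory actually exists. A sketch (TARGET here defaults to a throwaway temp directory so it's safe to run, and echo again stands in for gvfs-trash):

```shell
#!/bin/bash
# Safer pattern: verify the directory, then give find an absolute path.
# TARGET defaults to a throwaway temp dir; echo stands in for gvfs-trash.
DAYS=180
TARGET="${TARGET:-$(mktemp -d)}"

if [ ! -d "$TARGET" ]; then
    echo "$TARGET is not available; aborting" >&2
    exit 1
fi

# With an absolute starting point, find can never wander into $HOME,
# and every path it reports is unambiguous.
find "$TARGET" -type f -mtime +"$DAYS" -exec echo gvfs-trash {} \;
```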
To quote Jerry Pournelle of Chaos Manor: "I do these silly things so you don't have to".