This post is based on one originally published on a different site in November 2018.
You back up your photos, right? They are your precious memories and you wouldn’t want to lose them, right? What if I told you that backups are not enough to protect your photos — do I have your attention?
There are many ways to “back up” your photos. You may use Apple’s Time Machine, SuperDuper! or Carbon Copy Cloner, or a cloud backup service like Backblaze. Podcaster Casey Liss mentioned on Accidental Tech Podcast that he makes an “exact duplicate” of all his photos to a hard drive once a month, which he then takes to his parents’ house. That statement was the catalyst for the initial version of this post.
This all sounds like a good idea, but none of that stopped me losing nearly 700 photos. I did manage to recover them, by sheer luck, and in the process I learned the important difference between backups and archives.
One day, I was spending some time cleaning up my keyword hierarchy in Adobe Lightroom and decided I had it in good enough shape that it was finally time to take the plunge: go back and retrospectively add keywords to ALL of my aviation photos, using my new system. Every aircraft that is identifiable has its registration or serial recorded, and all types, operators and locations are recorded (where possible), too.
I decided to start at the beginning, 2004, when I had bought my first digital camera. There weren’t many photos that year so I felt I was making good progress, even though I was staring at the relatively huge numbers in later years. I plodded on, bit by bit, ticking off each year… 2004, 2005, 2006, 2007, 2008, 2009.
When I got to 2010, something wasn’t right. I had folders for the first nine months of the year, but nothing for October, November, or December. I knew this couldn’t be right. I looked through my various backups, local and cloud, only to discover that this anomaly was perfectly replicated in every backup, including all historical versions. I was crestfallen! I had lost three months’ photos.
It occurred to me that I had migrated from Lightroom to Aperture in early 2011, so that was probably when it happened. Or perhaps it was when I later migrated back from Aperture to Lightroom, when Apple killed off Aperture. In any case, they were GONE.
It was only a couple of days later that I got the idea to check every hard drive I had lying around in my study. I have a motley collection of bare drives that I use from time to time to move stuff around, or just to get data off my main drives. I was delighted to find that I did indeed have a full copy of my photos as at January 2011, including the missing three months. I can only surmise I took a “just in case” copy prior to my migration to Aperture. Just as well I had, and just as well I hadn’t cleaned house since!
I got the missing photos back into their correct location in my main photos drive and made sure my backups picked them up. Then I determined to do something very specific to protect against this kind of problem in the future. I began down the path of creating an archive of my photos.
The words “backup” and “archive” may seem similar, and I am sure some will argue with my definitions, but they serve to differentiate two distinct processes.
A “backup” is Casey’s “exact duplicate”. If you run Apple’s Time Machine, it faithfully backs up what is currently stored on your computer. If you change a file, that changed version is backed up, replacing the prior version. If you delete a file, it no longer appears in the backup.
Now… Time Machine, Backblaze, and many others, do offer versioning. This is where old versions of changed or deleted files are kept. However, this is almost always time limited. Backblaze’s default is 3 months, or you can pay extra for a year. Time Machine keeps as much as it has space for, but you can’t control exactly which copies it keeps. In my case, the photos had been missing for nearly 8 years.
This is a key point. The versioning provided by backup services is only of value if you realise you need it soon after the “loss”. With something as precious as your photos, you’d need to be vigilant!
An “archive” is something different. Nothing ever gets deleted from an archive. Depending on how strict you are, nothing ever gets changed, either. There is some wiggle room in my definition, as you will see below, but the key thing is nothing is ever removed by default. If you want to remove something from the archive, you must do it deliberately.
While Backblaze’s “Computer Backup” service won’t fit the bill, their cloud storage service B2 will. You could also use any other cloud storage provider. What follows is how I use B2, which could easily translate to Amazon AWS and other similar services. I chose B2 because of my existing relationship with Backblaze, and their pricing. Yes — offsite archiving will cost you.
At this point, I feel I should describe my local setup. I have a single folder called Master Photos which currently lives on a fast external SSD. Within that folder are folders for each year and within those one for each month. The month folders hold my original photos, plus some additional files. The originals are mostly RAW files, though in early years they were JPEGs.
The additional files are mostly “sidecar” files that are used by applications like Adobe Lightroom and DxO PhotoLab to hold extra data about the photos, including edits that have been performed. Also, some photo editing tasks require that I generate intermediate TIFF files and I store those here as well.
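As a rough sketch, the structure described above could be recreated like this in a shell (all the file names here are invented for illustration):

```shell
# Year folders contain month folders, which hold the original DNGs plus
# sidecar and intermediate TIFF files. Names below are made up.
mkdir -p "Master Photos/2024/01" "Master Photos/2024/02"
touch "Master Photos/2024/01/IMG_0001.dng" \
      "Master Photos/2024/01/IMG_0001.xmp" \
      "Master Photos/2024/01/IMG_0002.tif"
ls "Master Photos/2024/01"
```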
When I have completed edits, I export the photos as downsized JPEGs, into a temporary folder. From there I add them to Apple Photos and upload them to Flickr. The JPEGs are then deleted, as I can always regenerate them.
In general, the RAW (and early JPEG original) files are never changed, though for me there is an exception to this — keywords. There are people who believe you should never alter your original files. In the case of proprietary RAW files, that’s usually difficult, anyway. My RAW files are camera-native DNG files which most software will happily write metadata into. My view is I’d rather the extensive keywords I add were in the original file than stored in sidecars, as there is the possibility for sidecars to become divorced from their RAW files.
With B2 added to my Backblaze account, I set up a “bucket” to hold my photos. I originally used the application Transmit, from Panic, to upload all of my photos. If you take a strict no-changes approach, this is perfectly adequate, as all you are ever doing is adding new files. In my case, I occasionally change my keywords, so I want to overwrite the files in B2. I had trouble with Transmit for this task, so I switched to the rclone utility, as described below.
rclone is a command line utility, so you don’t get an easy drag-and-drop interface, but once I had worked out the setup and experimented a bit with the main copy command to tune performance, it proved easy to use on an ongoing basis.
For setup, I’m not going to rehash what is already well covered on the rclone site. The setup process creates a “service” definition called backblaze, which can then be referenced by numerous rclone subcommands.
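For reference, the resulting remote definition in rclone’s configuration file looks something like this for a B2 bucket; the account and key values are placeholders for your own application key ID and application key:

```ini
[backblaze]
type = b2
account = YOUR_APPLICATION_KEY_ID
key = YOUR_APPLICATION_KEY
```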
The command I have settled on is as follows. It should be entered all on one line but for clarity I have split it here.
rclone copy
-P
--stats-one-line
--stats-unit bits
--transfers 64
--checksum
--ignore-case
--include "*.{TIF,TIFF,DNG,JPG,JPEG,DOP,XMP}"
/Volumes/Vogon/Master\ Photos/2024
backblaze:zkarj-photos/ARJ/2024
I am using the copy subcommand, which has the important attribute of never deleting anything. This alone fulfils the basic requirement of the archive process. It will allow overwriting, which is what I want, but if you add the additional option --immutable, it will fail if local files have been modified.
The --stats-one-line and --stats-unit bits options are a personal choice. Having the stats on one line means you get a concise summary of progress that looks like this.
2.599 GiB / 2.599 GiB, 100%, 215.663 Mibit/s, ETA 0s
However, try leaving that off to see everything that is going on. You might need a tall Terminal window if you use the number I have in the next option!
Using megabits per second as the unit puts the speed in the same terms your internet connection is most likely described in.
The --transfers 64 option tells rclone to perform up to 64 simultaneous file transfers. You will probably want to tune this number to your situation. I have a fibre connection to the internet which can often achieve 300 Mb/s, and I found that with low numbers like 8 or 16 my connection was not being used to its full capacity. In general, lots of small files are less efficient to transfer than a few large files of the same total size, and allowing rclone a high number of simultaneous transfers can combat that slowness. If you have a slow internet connection, try starting with 8 and increasing from there; if you have a fast one, try 32 and up.
The --checksum option is an important one. When I used Transmit to update files, I could tell it to compare either the creation or the modification time of each file, but some mismatch between local and remote times caused files to be transferred when they shouldn’t have been, and skipped when they should have been. It was all rather confusing. The --checksum option slows things down a little, because rclone generates a checksum based on the contents of each file, both remote and local, but if the checksums match, the file is considered up to date and not sent.
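To make the content-based comparison concrete: B2 stores a SHA-1 checksum for every uploaded file, and identical content always produces an identical checksum, no matter what the timestamps say. A minimal illustration (the file name is invented):

```shell
# Compute a SHA-1 locally, as rclone does before comparing it with the
# checksum B2 already holds for the remote copy of the file.
printf 'hello' > sample.dng
sha1sum sample.dng   # on macOS, use `shasum sample.dng` instead
```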
The --ignore-case option may not be needed. At some point, some of my files were modified in a way that changed the case of their extension, for example from dng to DNG. This caused extra files to be uploaded that were in fact duplicates. Ignoring case solves this.
The --include option tells rclone which files on the local disk to consider. You can see my list is fairly exhaustive. Originally, I did not include the sidecar files (XMP, DOP) but later decided these were worth archiving too. The sidecar files are more likely to change than the actual image files, so you might want to run two separate commands: one for sidecars that allows changes and one for RAW files that does not. Note, also, the different versions of the extensions TIF and TIFF, and JPG and JPEG. You just never know which one software will use!
The final two lines define the local and remote folders to process. Be very careful when specifying these. If I accidentally put “2023” in the local name, then all of my 2023 photos would be re-uploaded, but to the 2024 folder on B2. You can get around this by always specifying the top level folder, but as I have well over 40,000 photos archived, that is a slooow process for me.
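One way to guard against that mistake is a small wrapper that takes the year once and builds both paths from it, so they can never disagree. This is a sketch, not part of my actual workflow; the echo lets you inspect the resulting command before running it for real:

```shell
# Hypothetical wrapper: supply the year once so the local and remote paths
# always refer to the same year. Defaults to 2024 if no argument is given.
year="${1:-2024}"
src="/Volumes/Vogon/Master Photos/$year"
dst="backblaze:zkarj-photos/ARJ/$year"
echo rclone copy -P --checksum --ignore-case \
     --include '*.{TIF,TIFF,DNG,JPG,JPEG,DOP,XMP}' \
     "$src" "$dst"
```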
The first parameter just uses a standard shell reference to the local path. You will either need to escape any spaces as in the example, or “quote” the value.
The second parameter specifies the service backblaze and the bucket zkarj-photos, followed by a subpath. I have ARJ as the first part of my subpath, as I also back up scans of my father’s photos.
When you run the command, rclone will check through all of the matching files to work out what it needs to send, then send only those files. I was worried that requesting a checksum of the remote files would mean downloading them all, but many services, including B2, have the ability to generate the checksum on the server and just return the value.
I mentioned above that my command will overwrite files that have changed. B2 also has versioning. You can set the versioning policy in the B2 web interface under Lifecycle Settings where you can choose anything from no versioning to the default of keeping all versions forever.
If you have some level of versioning enabled but you wish to manually clean up old versions, then this command will do the job.
rclone cleanup backblaze:zkarj-photos/ARJ/2024
Finally, when you’re first getting things going with rclone, it can be incredibly useful (and stress-reducing) to add the --dry-run option, which will do all of the checks and tell you what would happen, but without actually touching any files.