Backing Up, Part 3
(Marc continues his series on backup strategies.)
Historical Archiving
So far we've covered two types of backup: full/automatic and offsite. But there is one more which can get extremely complicated: archiving.
Archiving is not meant as a backup for an emergency situation. The other solutions we discussed are better suited to such incidents. Archiving is about long-term storage.
There are three key aspects to think about when considering long-term storage. One, the durability of the media itself. How long will it last before it fades or crumbles?
Two, the durability of the file formats of the data. Will future software programs be able to open those files?
Three, will the structure of the data allow for easy comprehension and retrieval in the distant future by people other than yourself?
Let's look at these more closely.
For media, the best choice for durability seems to be DVD/CD. Unlike magnetic media (floppy disks, tapes, or hard drives), DVDs/CDs can't be easily erased and the data doesn't degrade with time. However, the discs themselves are easily cracked and prone to scratching. Even moderate heat can warp the discs beyond readability. There are questions as to how long the discs will actually last -- the testing is at best a guess since the technology hasn't been around for hundreds of years for us to test it empirically.
One idea is to make multiple copies of each archive disc; if one copy gets scratched or destroyed, no big deal. A solution for longevity is to recopy the discs to fresh media every decade or so. That approach is not a bad idea since, for all we know, today's DVDs/CDs will be as archaic twenty years from now as floppies are today. By burning new discs every decade, you could move the files to a new medium as needed.
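If you do make multiple copies or recopy to fresh media, it's worth confirming that the copy actually matches the original. Here is a minimal sketch in Python, assuming both the original and the copy are mounted as ordinary folders; the function name verify_copy is my own invention for illustration, not part of any standard tool.

```python
import filecmp
from pathlib import Path


def verify_copy(original_root, copy_root):
    """Return a list of (relative_path, problem) for files that are
    missing from the copy or whose contents differ from the original."""
    original_root, copy_root = Path(original_root), Path(copy_root)
    problems = []
    for src in sorted(original_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original_root)
        dst = copy_root / rel
        if not dst.is_file():
            problems.append((rel.as_posix(), "missing"))
        # shallow=False forces a byte-by-byte comparison instead of
        # trusting file size and modification time alone.
        elif not filecmp.cmp(src, dst, shallow=False):
            problems.append((rel.as_posix(), "differs"))
    return problems
```

An empty list means every file in the original was found in the copy with identical contents. It does not report extra files that exist only in the copy.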
An intriguing idea for long-term backups I haven't seen mentioned anywhere is flash memory. A few years ago that would have been absurd, but today one-gigabyte flash drives are cheap and multi-gigabyte drives are common. An 8GB drive holds more data than a DVD, and while it costs more than a blank disc, it doesn't take up much space and is presumably more durable. (I've seen many stories about digital cameras that were munched in accidents, drowned, or burned, yet the photos on the memory card were still retrievable.) It may not be a good idea, but it might be worth considering.
A much bigger question than the media is the file formats. As we all know, software programs come and go, and what is wildly popular and a "standard" one decade is forgotten by the next. Do you have any Lotus 1-2-3 files? dBase? AppleWorks? I know some people still hanging on to Apple II documents! In my garage I've got an old NeXTstation. It's right next to my Mac II and my old Sanyo "semi-compatible" DOS PC. I've got boxes of 5.25" DOS floppies and 3.5" Mac floppies, and I'm sure some of them hold valuable writing I'd like to keep. That's not smart archiving: I don't have a way to read the media and even if I did, who knows if I could find any program to read the actual files?
A year ago I got smart and converted a lot of old WriteNow files to text. WriteNow was a terrific little Mac word processor that was written in 68K assembly and was wicked fast (it originated on the NeXT, actually). But because WriteNow was written in assembly, it never even made the transition to PowerPC, and though it amazingly would run under Classic in Mac OS X, Classic's death in the move to Intel meant no more WriteNow. Realizing that, I converted a batch of old WriteNow documents I had because who knows: in a few years I might not even have a computer capable of running Classic! In fact, when I did that I only had one Intel Mac and five PPC Macs. Just a year later I have three Intels and four PPC. (Don't get too excited: most are ancient G3 machines and I bought most of the Intels used.)
The question of file formats becomes complicated. How do you know what formats will be readable in the future? Just because something is popular and ubiquitous today does not mean it will stay that way or even be remembered twenty years down the road. The answer is that you do not know what the future will be like. Everything could be very different.
My usual approach to this problem is to ask: what do you want to do with the data in the future? Do you just want to view it? Or will you want to edit it? Those are two different tasks. Editing documents requires the underlying file format to be deeply understood: for instance, a drawing could be made up of hundreds of individual objects, or a page layout could have complex style sheets that define the appearance of the pages.
If you just need to view the documents, that's much easier: something like a snapshot of a document can easily be saved into a standard format that would be viewable in the future. PDF is an excellent format for preserving the state of old files: it can hold multiple pages, include resolution-independent graphics and photographs, text with full styling and typefaces, and other information, and it's cross-platform.
But PDF is horrible for editing: photos may not be full resolution and may have additional (destructive) compression applied, fonts may be subsets, and text is no longer in complete paragraphs and sentences but chopped up into chunks of characters making extensive editing impossible as the text won't reflow.
My recommendation is to archive files in multiple formats: save each one as a PDF, perhaps even as multiple PDFs (one compressed and portable, the other high resolution and not-so-portable), save it in the original native application's format, and export it to two or three other formats as well. You just never know what will be the most readable in the future. Besides, every conversion misses a few things, and you don't know what aspect of the data will be important to you in the future.
If possible, save things in the simplest formats you can: export word processing documents as plain text, databases as tab-delimited, etc. You may lose some formatting or valuable metadata, but some readable data is better than no data at all.
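As an illustration of the tab-delimited idea, here is a minimal sketch in Python. It assumes you have already pulled your records out of the database as a list of dictionaries; the function name export_tab_delimited and the sample field names are made up for the example.

```python
import csv
from pathlib import Path


def export_tab_delimited(rows, fieldnames, dest):
    """Write a list of dicts to a plain UTF-8, tab-delimited text file
    with a header row, and return the path that was written."""
    dest = Path(dest)
    # newline="" lets the csv module control line endings itself.
    with dest.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
    return dest
```

For example, calling export_tab_delimited([{"name": "Mary Brown", "year": "2008"}], ["name", "year"], "clients.txt") produces a two-line file that any text editor or spreadsheet should be able to open. Anything the database stored beyond these plain values (field types, relationships, layouts) is not preserved.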
But beyond the mere issue of file formats, there's a whole planning aspect to long-term storage that most people ignore. It's very difficult to put yourself in the shoes of someone fifty years in the future, but it must be done. You don't know that person: that person may not be you and may have a very different life and outlook than you. You must anticipate that and create an archive plan that anyone can understand.
Have you ever tried to use someone else's filing system? Many businesses, for example, struggle when a long-time secretary retires or passes away -- without a clear standard written down it's impossible to know that client Mary Brown is still stored under Mary White because that was her maiden name and she was a client for years and no one wanted to move all the records.
When you are archiving, you are essentially creating a long-term filing system. While digital files can be easily searched with software, that doesn't mean you don't need to organize the materials. It's important to organize it for several reasons. For one, it may not be you going through the files decades later. For another, what seems logical to you today may not years down the road as you've gained experience and become a different person. And even digital searching won't help if you're inconsistent, calling the client "mbrown" in some places and "Mary B." in others.
Most significantly, pertinent project details you remember today you won't know even a few years in the future. For example, I often create temporary files while I'm working on a project: alternate versions, temporary backups, test files, etc. While the project is active, those are important, but they need to be thrown away or at least carefully labeled before archiving. Otherwise a few years later I might not remember why there are two nearly identical versions of a client's logo and I'll have no way of knowing which is the one I'm supposed to use.
File names are one area that helps with organizing. When a project is active, I might refer to it with a shorthand name (e.g. "Bob's brochure") which is fine (I'm unlikely to be working on multiple brochures at once for Bob). But once the project is archived, that perspective changes. Several years later when I'm looking for a specific brochure I did for Bob I don't want to find ten files called "Bob's Brochure.id" and not know which is which. So before archiving I might rename the file "Bob's 2008 Memorial Day Sale Brochure.id" which is much clearer.
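If you have many files to rename, a small script can make the job safer. The following is a minimal sketch in Python; the function name rename_for_archive is hypothetical, and you still have to supply the descriptive name yourself. It keeps the original extension and refuses to overwrite a file that already has the new name.

```python
from pathlib import Path


def rename_for_archive(path, descriptive_stem):
    """Rename a file to a more descriptive name, keeping its extension.
    Raises FileExistsError rather than overwriting an existing file."""
    path = Path(path)
    target = path.with_name(descriptive_stem + path.suffix)
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    path.rename(target)
    return target
```

Calling rename_for_archive("Bob's Brochure.id", "Bob's 2008 Memorial Day Sale Brochure") would rename the file to "Bob's 2008 Memorial Day Sale Brochure.id" in the same folder.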
The specifics of how you organize are beyond the scope of this article as it is a huge topic (there are entire books on file systems) and the techniques vary according to the kind of files you are archiving. For instance, I'm a graphic designer so I often archive design projects which can consist of hundreds of related files: a page layout file, imported graphics in various formats, and so on. If you're a home user and just need to archive some family photos or your writing journal, your needs may be much simpler.
You should also keep in mind the issue of metadata (metadata is simply data about data). A perfect example is adding comments or keywords to photos in iPhoto. That's adding metadata. That's extremely helpful and highly recommended as it will make things much easier in the future. For instance, writing down the names of relatives in family photos is crucial -- in the future you may not be able to identify people (especially if you have older photos and the elderly relatives who could identify people have passed away).
But what happens if iPhoto metadata can't be read by future software? Probably the picture formats are fine, but you might lose your iPhoto data (comments, keywords, albums, sortings, etc.). Keep that in mind. One solution is to organize the photos into folders in your archive instead of relying exclusively on iPhoto's structure (or some other software). You might save comments and photo descriptions as simple plain text files stored along with the photo files.
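One way to store descriptions alongside the photos is a small "sidecar" text file per picture. Here is a minimal sketch in Python. It does not read anything from iPhoto; it assumes you type or paste in the description and names yourself, and the function name write_sidecar and the layout of the text file are my own choices for illustration.

```python
from pathlib import Path


def write_sidecar(photo_path, description, people=()):
    """Write a plain-text note next to a photo, named after the photo
    itself (IMG_001.jpg gets IMG_001.jpg.txt), and return its path."""
    photo_path = Path(photo_path)
    # Appending ".txt" to the full name avoids clashes between
    # IMG_001.jpg and IMG_001.png in the same folder.
    sidecar = photo_path.with_name(photo_path.name + ".txt")
    lines = [f"File: {photo_path.name}", f"Description: {description}"]
    if people:
        lines.append("People: " + ", ".join(people))
    sidecar.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return sidecar
```

Because the note is plain text and carries the photo's file name inside it, it should stay readable and matchable even if the two files are later separated.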
Duplicate files are another problem with archiving. If you're paranoid like me, it's too easy to end up with multiple backups and that can make the archives confusing. Which is the most recent? Is a logo stored with one client's project the same as another in a different project? If possible, keep duplicates to a minimum. They waste space and add confusion.
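To answer the "are these two files really the same?" question, you can compare file contents rather than names or dates. The sketch below, in Python, groups files in a folder tree by a SHA-256 hash of their contents. The function names file_digest and find_duplicates are hypothetical, and this only detects byte-for-byte identical files: a logo re-saved in another format will not be flagged.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks so
    large files do not have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return a list of groups (lists of paths) whose files have
    identical contents somewhere under root."""
    by_digest = defaultdict(list)
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            by_digest[file_digest(p)].append(p)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

The script only reports the groups; deciding which copy to keep (and whether a duplicate is intentional) is still up to you.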
Don't forget fonts: it's not a bad idea to store the fonts used in a project with the archive of the project. You might think that since these are standard fonts, the same fonts you always use, there's no reason to archive them, but in the future it might not even be you needing to work with the file -- there's no guarantee that the future system has those required fonts. You might have even upgraded your fonts but an older project expects an older version of the fonts. Now there's also no guarantee that those fonts will even work on the new system, but it can't hurt to try.
Another very good idea is to simply save some notes with the project. Describe the project in detail, and describe the organization system used. You'd be surprised how different something looks even just a few years later. For myself, my workflows often change. For example, it used to be page layout programs didn't support the importing of Photoshop (PSD) files directly, so I always had to preserve my original PSD file plus an exported version in a different format (like TIFF or EPS). Nowadays InDesign imports PSD files and can preserve the original's layer options and transparency, so I only need one file. Other times I might have two versions of a project, one for print with grayscale photos and another for a digital edition that has full-color photos. With either of these situations it's important to document the procedure so the future you will know what's going on.
The Bottom Line: Plan your archive strategy carefully with copious notes. Rename files, remove superfluous ones, and save in several different formats if you can. Be clear in your descriptions, assuming someone other than yourself will be the one trying to make sense of your files 50 years in the future.