The Digital Dark Ages

I’ve been paying my mortgage for about three years now. Unless I change something, I’m going to keep paying on it for another 27 years. I try not to think about the fact that although I have an actual physical copy of the mortgage agreement, with real pen-and-ink signatures, I don’t have any proof that I’ve ever made a payment.

At the risk of sounding like a Luddite, it bothers me that I have to trust the bank’s computer system to keep track of all 360 payments I’ll have made by the time it’s over. I’m not just being paranoid. I had an issue where a bank said my wife still owed money on a loan we had paid off three years earlier. We didn’t have anything in writing for each payment. The bank couldn’t even tell us the history of the loan; just that the computer showed we still owed money. And if a bank says you owe money, unless your lawyers are bigger than their lawyers, then you owe them money.

If you go to museums, you’ll see ledgers from banks in the 1800s and earlier. Over two hundred years later and we still know who paid their bills and when. But a payment from five years ago … as far as the records go, it doesn’t exist.

This could change with new regulations and retention requirements. But the big difference is between what happens by default and what you have to work at. A hundred years ago everything was written down. If you wanted to get rid of records you had to make an effort to identify what you wanted to delete, somehow separate it from the rest, and physically destroy it. Today, we only keep data as long as we have to. We only bother with long-term storage when the law or financial necessity makes us.

Let’s assume we have some data that we really want to keep “forever”. What is that going to take?

First, you’ll want to store it on something that doesn’t degrade quickly. Burning it to a CD or DVD seems to offer better longevity than VHS. Well, maybe. Second, you want to store it in a format that you’ll be able to read when you want to. This might be a harder problem than the physical longevity, when you start to consider how much data goes into a modern file format.

Look at the problem from the user’s perspective: a document format (the same applies to music and video) is just a way of saving a document so that it can be opened later and look the same, whether or not it’s on the same computer. When Word 97 handles table formatting and text reflow around images a certain way, for instance, the document format has a way of capturing the choices the user made.

If I open that Word 97 document in Word 2003, either the tables, text and images look the same or they don’t. If they look the same, it’s because there’s an import filter that understands what the old format means, and Word 2003 has a way of representing the same layout. If I then save as Word 2003, the specific way the layout is represented has changed, but the user doesn’t see the difference or care.

If, on the other hand, that Word 97 document doesn’t look the same in Word 2003, it really doesn’t matter to the user whether the problem is a bad import filter or that Word 2003 doesn’t support the same features as Word 97. (Maybe they used flame text.) So a format that technically captures all the information needed to exactly recreate a document is utterly useless without something that can render it the same way.

Okay, so we need long-term media, and we need to choose a format that is popular enough that there will still be import filters for it in the foreseeable future. Eventually we’ll still reach the end of those paths. Either the disks will degrade, or the file format will be so out of date that no one makes import filters any more. When that happens, the only way to keep our data will be to copy it to new media, and potentially in a new format.
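
For the copy-to-new-media step, here is a minimal sketch of what a careful copy might look like, assuming a checksum comparison is good enough to tell us nothing got mangled along the way. The function names `sha256_of` and `migrate_file` are made up for this example, and it only covers the media half of the problem; converting to a new format is a separate job.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_file(source: Path, destination: Path) -> str:
    """Copy source to destination and confirm the copy is bit-identical.

    Hypothetical helper for illustration: returns the digest of the copy,
    or raises OSError if the copy does not match the original.
    """
    before = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)  # copies contents and timestamps
    after = sha256_of(destination)
    if before != after:
        raise OSError(f"copy of {source} does not match the original")
    return after
```

Writing the returned digest down next to the file gives the next migration, years from now, something to check against.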

What should that format look like? We’ve already got PDF, which is based on how something looks in print. We’ve got various audio and video formats, which deal with playing an uninterrupted stream. But what about interactive/animated documents designed for online viewing?

Believe it or not, I’m going to suggest a Microsoft solution, though it’s one they haven’t thought to apply this way: PowerPoint. Today nearly everyone has a viewer, but not so long ago most of the slideshows I got were executables. If you had PowerPoint installed you could open the executable and edit the slideshow the same way you can edit a PDF if you have Acrobat.

As much as people complain about the bloat that Word adds to simple files, I think the future of file distribution will be to package the viewer along with the file. At some point storage becomes cheaper than the hassle of constantly updating all those obsolete file formats. The only question is how low a level the viewers will be written to: OS family, processor architecture, anything that runs C, etc.
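
To make the idea of packaging the viewer with the file a little more concrete, here is a rough sketch, assuming a plain zip archive is an acceptable container. The bundle layout, the `manifest.json` name and its fields, and the functions `build_bundle` and `read_manifest` are all invented for this example; the `viewer_target` field is just a free-text note about what the viewer was written to run on.

```python
import json
import zipfile
from pathlib import Path


def build_bundle(document: Path, viewer: Path, bundle: Path, target: str) -> None:
    """Pack a document and its viewer into one archive with a manifest.

    Hypothetical layout: both files sit at the top level of the archive,
    so the document and viewer must have different file names.
    """
    manifest = {
        "document": document.name,
        "viewer": viewer.name,
        "viewer_target": target,  # e.g. OS family or processor architecture
    }
    with zipfile.ZipFile(bundle, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(document, arcname=document.name)
        archive.write(viewer, arcname=viewer.name)
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))


def read_manifest(bundle: Path) -> dict:
    """Return the manifest describing what the bundle contains."""
    with zipfile.ZipFile(bundle) as archive:
        return json.loads(archive.read("manifest.json"))
```

This doesn’t answer the question of how low a level the viewer should be written to. It just records the answer alongside the file, so whoever opens the bundle later knows what they need in order to run it.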