Letters from the idiot: 4 Ds of Information loss

I have the nasty feeling that I have more questions than answers, but here goes...

The old days

Before technology, life in the office was simple. You had documents, and you filed them away. They were big, bulky and paper-based (though stone, vellum and papyrus had their days). Sometimes documents got lost (down the back of the filing cabinet), and sometimes they were destroyed (blessed be the shredder, despite projects that use software to restore shredded documents). Rarely did physical documents end up in the hands of the wrong person (but it happened). Then came easy duplication. And then came electronic records.

Electronic records, or data to give them an even more generic name, are everywhere. Data can be automatically collected and stored. When I first raised “data loss”, I assumed I would stay on simple technical ground, such as a “hard disk crash”, or indeed losing the financial data of 25 million people in the post. Some of the issues are technical, some are legal, but all are social.

Never enough

Disk drives get larger to cope with the torrent of data. Much as “you can never be too rich”, it's true that “you can never have too much disk space”. However, as data volume grows, our ability to separate the wheat from the chaff declines. It's easy to say 'never throw out anything, in case it's needed'. That also lets you avoid the boring (and possibly compromising) task of deleting data you don't need. But then your operational budget bloats: it costs as much to look after useless data as valuable data. If this goes on long enough, you can't do anything about it; you may no longer remember what most of it is.

This is where one part of the legal framework comes in. If you are, say, automatically collecting all the web sites that a certain IP address connects to, how long should you hang on to that record? How long is it legally useful for? And how long is it worth keeping? (Digital Rights Ireland have a few things to say on this.) There is also a technical problem. If an Internet access node is unsecured, is the owner of the node liable for something posted using it? At the moment, yes, but only because it hasn't been tested in an Irish court.
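Whatever period the law eventually settles on, enforcing it is mechanically simple: anything older than the retention window gets deleted. The sketch below assumes a hypothetical `AccessRecord` log entry and an arbitrary 180-day window; neither comes from any law or real system, and choosing the actual period is exactly the open question above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical record of one logged connection; field names are illustrative only.
@dataclass
class AccessRecord:
    ip_address: str
    site: str
    timestamp: datetime


# Assumed retention period: an arbitrary illustrative value, not a legal requirement.
RETENTION = timedelta(days=180)


def purge_expired(records, now, retention=RETENTION):
    """Keep only records inside the retention window; return (kept, number_purged)."""
    cutoff = now - retention
    kept = [r for r in records if r.timestamp >= cutoff]
    return kept, len(records) - len(kept)


if __name__ == "__main__":
    now = datetime(2008, 1, 24)
    log = [
        AccessRecord("192.0.2.10", "example.com", datetime(2007, 6, 1)),
        AccessRecord("192.0.2.10", "example.org", datetime(2008, 1, 20)),
    ]
    kept, purged = purge_expired(log, now)
    print(len(kept), purged)  # 1 1
```

The hard part is not the code but deciding the value of `RETENTION`, and making sure every copy of the log (including backups) is purged too.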

Sealed with a click

Another part of this is content. Google has an archive of Usenet, a discussion system that predates the web. This is data. Public data? Well, everything posted there was considered public at the time, so this archive is publicly available.

But what about your diary? Not your blog, but your diary. Currently you have automatic copyright protection on everything you write. The contents of your diary become public domain 75 years after your death. Does the same apply to your e-mail? Private musings are supposed to become public domain after a time. If you turn out to be famous (at the time of your death), someone will hang on to every scrap of paper in the hope that it will be worth something.
However, every e-mail you write is technically protected by copyright, and replying to, or worse, forwarding an e-mail technically breaches a dozen copyright laws. When should your e-mail become public domain? If the data is only on your hard drive, there is some hope that it will be forgotten about, but as the Microsoft anti-trust case showed, e-mail has a habit of copying itself to places other than your drive. After all, there are the recipients, and all the servers in between (and a few that shouldn't have received it in the first place).
When should this mail become public domain? 75 years after your death and that of every contributor? Something like that is impractical. 100 years after the message is sent? 50 years? And what if the message contains information that is still confidential (like the secret recipe for Snickerdoodles & Chocodoodles)?

Silly idea? Old medical records do go “public”, but they are usually stored in archives of interest to few people (usually medical students and researchers, who would be qualified to access the information in the first place).
“Would it be morally right to give public access to email & messaging accounts 100 years after they were last accessed? How interested would the historians of the future be in a copy of bebo.com from 2005? Or in the contents of the mailbox of a famous serial killer 50 years after they died? I don't think we have the option of letting that sort of data lapse. It will be the clearest echo of society's global digital consciousness.”

This is the first time that the general public have had their personal messages (not just information) stored. Should they be retained for your grandchildren (but hidden from your prospective employer)? When should an e-mail be considered an orphaned work?

Backing away

Along with the question of how long data should be retained, let's look at the actual retention problem. If you 'never throw out anything, in case it's needed', your storage problem grows. Do I hear the call of “backups”?

“As data volumes grow, you either have to put all your eggs in one basket, or have multiple baskets. From experience, it's so tempting to try to consolidate your data in one place, to reduce admin overhead. Hopefully that one system won't have a buggy motherboard that's silently corrupting everything it writes. And it's really painful if someone accidentally deletes a few petabytes of data: copying from backups takes ages, for a start.”
Or: “bugs in archival software ('Yup, that's archived. Oh, wait. No... it isn't. The machine had a bad disk, the software crashed, and reported everything OK when it restarted...') and freaky network instability (guys doing rewiring, restarting cluster routers and maybe some dodgy cables) resulting in more than one machine reporting as being the 'one true repository' for a certain type of data.”

So the backups might be a problem...
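One common safeguard against the “reported OK, but it wasn't” failure is not to trust the archiver's own status, but to read the copy back and compare a cryptographic checksum against the original. The sketch below is illustrative only: the file names are hypothetical, and a checksum detects silent corruption but cannot repair it.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(original, restored):
    """True only if the restored file is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(restored)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical file names, used only for the demonstration.
        src = os.path.join(d, "ledger.dat")
        dst = os.path.join(d, "ledger.restored")
        with open(src, "wb") as f:
            f.write(b"quarterly figures")
        with open(dst, "wb") as f:
            f.write(b"quarterly figurez")  # simulates one silently corrupted byte
        print(verify_copy(src, dst))  # False
```

In practice the original's checksum would be recorded when the backup is made, so that the copy can still be checked after the original is gone.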

But let's assume that the backups are valid. Then you have two format problems.
The first: we don't have the hardware that can read the tapes any more.
This actually happened to me professionally. I remembered when the archives were made, and indeed the data was found: it was documented in place A, where the off-site storage facility had the backups. However, the tape drives had been scrapped years before.
And those of you who remember the Domesday Project know that the BBC fell into a similar problem.

The second: let's assume that the archaic backup tape could have its contents loaded onto a system you can use... can you read the data format?

Earlier this year, Microsoft released a service pack that deliberately disabled older file formats. So your carefully restored data might be unreadable to the world and, worse, to yourself. In a business case, the original specifications (or recipe) might be needed. Or your great-grandfather's proposal, on an online forum, to the woman you've come to know as your great-grand-aunt.
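Before you can even ask whether some software still opens a restored file, you need to know what format it is in, and file names and extensions are often lost along the way. Many formats begin with fixed “magic” bytes. The sketch below checks a deliberately tiny, illustrative table of well-known signatures; identifying a format is not the same as being able to read it, and a ZIP signature alone cannot tell a .docx from an .odt or any other ZIP file.

```python
# Leading "magic" bytes for a few well-known container formats (illustrative subset).
SIGNATURES = [
    (bytes.fromhex("D0CF11E0A1B11AE1"), "OLE2 compound file (e.g. legacy .doc/.xls)"),
    (b"PK\x03\x04", "ZIP container (e.g. .docx, .odt)"),
    (b"%PDF", "PDF document"),
    (b"{\\rtf", "Rich Text Format"),
]


def identify(header: bytes) -> str:
    """Return a format name for the first bytes of a file, or 'unknown format'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown format"


def identify_file(path) -> str:
    """Read just enough bytes to cover the longest signature above (8 bytes)."""
    with open(path, "rb") as f:
        return identify(f.read(8))


if __name__ == "__main__":
    print(identify(b"%PDF-1.4"))  # PDF document
    print(identify(b"hello"))     # unknown format
```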

Is there a “fix” for this? Well, making the older formats fall into the public domain would help. After all, if you're not using them...

So who deserves the credit, and who deserves the blame?

So the disk has crashed: who do you sue? It should be simple, but it ain't. Much as a delayed or cancelled flight doesn't entitle you to a refund if the cause of the problem is beyond the airline's control, there are ways a disk can fail that leave nobody legally at fault.

Usually a hard disk will crash in infancy (within a day or two of starting life), meaning little if anything has been lost and it's still under the manufacturer's warranty. Or the disk will die as it approaches the end of its predicted life (well after the warranty has expired). The fact that the computer is usually obsolete long before you take it out of the box isn't something to be considered.

And while I'm sure that backup software and hardware come with warranties, the legal click-through probably covers lost data in some way. But since the cost of a new hard disk is usually less than the cost of backup measures... backing up at home is rare.

In a corporate setting, the party that loses the data should be held liable, but I don't know of any cases in Irish law on data crashes. Data going missing, however...

It's a steal, it's a loss

Credit card data gets stolen. It's an identifiable crime. Who (other than the criminals) is liable?
Well, was a reasonable attempt made to protect the data? If so, was it reasonable enough? Can you sue for loss of data? (And given the ability to reconstruct shredded credit card bills, mentioned at the start, are you the cause of the data breach?)

Apparently not. If data is lost (in the post) or stolen, there is no case until the data is used and a victim can show damages (or lost money) resulting from the act. If personal data goes missing, is there a lawsuit? Libel or slander is not applicable, since the data suggests, if not proves, that the information about the victim is true. There are privacy claims, but currently there is no privacy law in Ireland. Direct financial damages are possible, but the cost of the case is usually more than the loss. And there is the time it takes...

In the case of the recent UK financial data loss, much of the data is personal data pertaining to minors. In fact, it contains everything needed to steal a minor's identity once they become an adult. So someone sitting on the data could wait 10 to 18 years to strike. Is there a statute of limitations (or similar) for data theft? Or, in this case, for an identity stolen almost a generation ago?

Well, I have asked more questions than I've answered...

Anyone able to answer some of these too?

Take care,
William Knott

With kind thanks to John Looney of Google (for the tech and social angles) and Simon McGarr of Tuppenceworth.ie (for the legal questions and answers)

tags : 4 Ds of Information loss, data loss, technology, hard disk crash, legal, social, data retention, Digital Right Ireland, copyright, copyright protection, public domain, orphaned work, backup, format, identity theft



Thursday, January 24, 2008

4 Ds of Information loss – Data loss