Thursday, January 24, 2008

4 Ds of Information loss – Data loss

I have the nasty feeling that I have more questions than answers but here goes...

The old days

Before technology, life in the office was simple. You had documents, and you filed them away. They were big, bulky and paper-based (once stone, vellum and papyrus had had their day). Sometimes documents got lost (down the back of the filing cabinet), sometimes documents were destroyed (blessed be the shredder, despite projects to restore shredded documents using software). Rarely did physical documents end up in the hands of the wrong person (but it happened). Then came easy duplication. And then came electronic records.

Electronic records, or data to give it an even more generic name, are everywhere. Data can be automatically collected and stored. When I first raised “data loss” I simply assumed I would stay on simple technical grounds such as a “hard disk crash” or indeed losing the financial data of 25 million people in the post. Some of the issues are technical, some are legal, but all are social.

Never enough

Disk drives get larger to cope with the torrent of data. Much in the same way that “you can never be too rich”, it's true that “you can never have too much disk space”. However... As data volume grows, our ability to weed out the wheat from the chaff declines. It's easy to say 'never throw out anything, in case it's needed'. It also lets you avoid the boring (and possibly compromising) task of deleting data you don't need. However, then your operational budget bloats: it costs as much to look after useless data as expensive data. If it goes on long enough, you can't do anything about it; it's possible you won't ever remember what most of it is.

This is where one part of the legal framework stands. If you are, say, automatically collecting all the web sites that a certain IP address connects to, how long should you hang on to that data? How long is it legally useful for? And worth keeping for? (Digital Rights Ireland have a few things to say on this.) There is also a technical problem... If an Internet access node is unsecured, is the owner of the node liable for something posted using it? At the moment, yes, but that is because it hasn't been tested in an Irish court.

Sealed with a click

Another part of this is content. Google have an archive of Usenet, a precursor to the web. This is data. Public data? Well, everything was considered public at the time. So this archive is publicly available.

But what about your diary? Not your blog, but your diary. Currently you have automatic copyright protection on everything you write. The contents of your diary become public domain 75 years after your death. Does the same apply to your e-mail? Private musings are supposed to become public domain after a time. If you turn out to be a famous person (at the time of your death), someone will hang on to every scrap of paper in the hope that it will be worth something.
However, every e-mail you write is technically protected under copyright, and replying to or, worse, forwarding an e-mail is technically in breach of a dozen copyright laws. When should your e-mail become public domain? If that data is on your hard drive, there is some hope that it will be forgotten about, but as the Microsoft anti-trust cases showed, e-mail has a habit of copying itself to places other than your drive. After all, there are the recipients, and all the servers in between (and a few that shouldn't have gotten it in the first place).
When should this mail become public domain? 75 years after your and every contributor's death? Something like that is impractical. 100 years after the message is sent? 50 years? And what if the message contains still-confidential information (like the secret recipe for Snickerdoodles & Chocodoodles)?

Silly idea? Old medical records do go “public”, but these are usually stored in archives of interest to few (usually medical students and researchers who would be qualified to have access to the information in the first place).
“Would it be morally right to give public access to email & messaging accounts 100 years after they were last accessed? How interested would the historians of the future be in a copy of bebo.com from 2005? Or the contents of the mailbox of a famous serial killer 50 years after they died? I don't think we have the option of letting that sort of data lapse. It will be the clearest echo of society's global digital consciousness.”

This is the first time that the general public have had their personal messages (not just information) stored. Should it be retained for your grandchildren (but hidden from your prospective employer)? When should an e-mail be considered an orphaned work?

Backing away

Along with the problem of how long data should be retained, let's look at the actual retention problem. If you 'never throw out anything, in case it's needed', you have an increased storage problem. I hear the call of “backups”?

“As data volumes grow, you either have to put all your eggs in one basket, or have multiple baskets. From experience, it's so tempting to try to consolidate your data in one place, to reduce admin overhead. Hopefully that one system won't have a buggy motherboard that's silently corrupting everything it writes. And it's really painful if someone accidentally deletes a few petabytes of data - copying from backups takes ages, for a start.”
Or “bugs in archival software ('Yup, that's archived. Oh, wait. No... it isn't. The machine had a bad disk, software crashed, and reported "everything OK" when it restarted...') and freaky network instability (guys doing rewiring, restarting cluster routers and maybe some dodgy cables) resulting in more than one machine reporting as being the 'one true repository' for a certain type of data.”

So the backups might be a problem....
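One partial defence against the "reported 'everything OK'" failure is not to take the archival software's word for it, but to compare checksums of the originals against the copies. What follows is only a minimal sketch in Python, assuming both the source and the backup are ordinary directories you can read; the function names (sha256_of, build_manifest, verify_backup) are made up for illustration and don't come from any particular backup product.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_backup(source: Path, backup: Path) -> list[str]:
    """List files that are missing from, or differ in, the backup."""
    expected = build_manifest(source)
    actual = build_manifest(backup)
    problems = []
    for name, checksum in expected.items():
        if name not in actual:
            problems.append(f"missing: {name}")
        elif actual[name] != checksum:
            problems.append(f"corrupt: {name}")
    return problems
```

An empty list from verify_backup(Path("data"), Path("backup")) only means the two trees matched at the moment you checked. It won't save you if the source was already corrupted before the backup was taken, which is the buggy motherboard case above.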

But let's assume that the backups are valid. Then you have 2 format problems.
We don't have the hardware which can read the tapes anymore.
This actually happened to me professionally. I remembered when the archives were made, and indeed the data was found: documented, and in place where the off-site storage utility had the backups. However, the tape drives had been scrapped years before.
And those of you who remember the Domesday Project know that the BBC fell into a similar problem.

But let's assume that the archaic backup tape could get its contents loaded onto a system you can use... can you read the data format?

Earlier this year, Microsoft released a service pack which purposefully disabled older file formats. So your carefully restored data might be unreadable to the world, and worse, to yourself. In a business case, the original specifications (or recipe) might be needed. Or your great-grandfather's proposal on an on-line forum to the woman you've come to know as your great-grand-aunt.

Is there a “fix” for this? Well, making the older formats fall into the public domain would help. After all, if you're not using them...

So who deserves the credit, and who deserves the blame

So the disk has crashed; who do you sue? It should be simple, but it ain't. Much as a delayed or cancelled flight is not grounds for a refund if the cause of the problem is beyond the control of the airline, there are ways a disk can go, legally, that leave you no one to sue.

Usually a hard disk will crash in infancy (within a day or two of starting life), meaning little if anything has been lost and it's under the manufacturer's warranty. Or the disk will die as it approaches the end of its predicted life (well after warranty). The fact that the computer is usually obsolete long before you take it out of the box isn't something to be considered.

And while I'm sure that backup software and hardware have warranties, the legal click-through probably covers some lost data. But since the cost of a new hard disk is usually less than the cost of the backup measures... home backing up is rare.

In a corporate setting, the party that loses the data should be held liable, but I don't know of any cases in Irish law on data crashes. Data going missing however...

It's a steal, it's a loss

Credit card data gets stolen. It's an identifiable crime. Who (other than the criminals) is liable?
Well, was a reasonable attempt made to protect the data? If so, was it reasonable enough? Can you sue for loss of data? (And given the ability to reconstruct shredded credit card bills, mentioned at the start, are you the cause of the data breach?)

Apparently no. If data is lost (in the post) or stolen, there is no case until the data is used and a victim can be shown to have suffered damages (or have lost money) from the act. If personal data goes missing, is there a lawsuit? Libel or slander is not applicable, since the data suggests, if not proves, that the information about the victim is true. There are privacy charges, but currently there is no privacy law in Ireland. Direct financial damages are possible, but the cost of the case is usually more than the loss. And there is the time it takes...

In the case of the recent UK financial data loss, a lot of the data is personal data pertaining to minors. In fact, it is everything needed for identity theft when the minor becomes an adult. So someone sitting on the data would wait 10 to 18 years to strike. Is there a statute of limitations (or similar) for data theft? Or in this case, for an identity stolen almost a generation ago?

Well, I have asked more questions than I've answered...

Anyone able to answer some of these too?

Take care,
William Knott

With kind thanks to John Looney of Google (for the tech and social angles) and Simon McGarr of Tuppenceworth.ie (for the legal questions and answers)


Tuesday, November 06, 2007

In the spirit of Nicolai Ivanovich Lobachevsky

"I am never forget the day I first meet the great Lobachevsky.
In one word he told me secret of success in mathematics: Plagiarize!

Plagiarize,
Let no one else's work evade your eyes,
Remember why the good Lord made your eyes,
So don't shade your eyes,
But plagiarize, plagiarize, plagiarize...
Only be sure always to call it please, 'research'". -- Tom Lehrer (MP3)

Now then, I have no doubt that most of you think I'm talking about a little kerfuffle (called copyright theft) between Damien Mulley and Ace Internet Marketing, but not quite.

First off, Nicolai Ivanovich Lobachevsky really did exist, but there is no evidence that he plagiarized. The entire Tom Lehrer song, adapted from a Danny Kaye routine, was a joke, but the above attitude exists... because it's legal!

Lobachevsky and Lehrer both worked in mathematics... a hard science. And simply put, you can't copyright a fact.

Think about it. What colour is the sky on your planet? "The sky is blue" is a fact and not copyrightable. The line "well, I'm from Ireland so it's mostly overcast grey" is opinion, and so is copyrightable.

If you publish a non-fiction book, then you are stating that everything in it is fact. If someone then used the facts in your book as a basis for fiction, you shouldn't be able to sue. If it's a history (or even a pseudo-history) book, then your "plot premise" is probably reproducible without you getting a penny.
However, if large chunks of your text have been copied (and not text attributed from another source) then you have a stronger case.

So, if you are going to steal off someone, it is better to take the facts and then reproduce them in your own words.

What about fair use... well, let me put it this way: if you take content from another site, make the entire chunk a link. If your post, article, whatever is almost entirely a link... you have breached fair use.

Unless of course you do something to turn it (or them) into a completely different work. Think Andy Warhol and his soup cans. Think any of those music mashups you can think of. Usually it forces you to listen to the sources with fresh ears.

Or think about it creatively. Take other people's footage from across the world and edit it into a cheap music video to make a point, like Sarah McLachlan does in "World on Fire".



Or mix together 32 songs and 28 different dance routines to make something humorous (laugh-out-loud in places) like Judson Laipply does with "The Evolution Of Dance".



take care,
Will


The Spirit of John Philip Sousa

When the precursor to the phonograph arrived, John Philip Sousa was not a happy musician. He fretted that people would stop spending time singing the old songs and the songs of the day.

I don't know if he meant mixing them together, but he was right.
Sort of.

The law changed in such a way as to make these mixes illegal. But mashups still live...

Yup, I'm going to Mashup University (and possibly Mashup Camp too). Sousa (and perhaps Larry Lessig) would be proud.

To see what I'm talking about, here is Larry Lessig's talk on creativity, recorded at TED in March of this year. And yes, I so want to go to one of these.

Embedded video from T.E.D. Talks series


