I had really good success when I put out the call for the Archive Team, so let’s try that again, with an entirely new idea.
I would like to declare November 2012 the very first Let’s Just Solve the Problem Month.

Here’s how it works, and what problem I want to solve.
As that sexy pontificator Clay Shirky has said on several occasions, instead of getting hung up on whether Wikipedia is great or not great, realize that Wikipedia represents a massive expenditure of energy recovered from not watching television. Not only that, but Wikipedia is one of what could be many different things happening that benefit the world. All you need is a dash of organization, a clear set of principles, and off you go.
I buy into this.
I also buy into the idea behind National Novel Writing Month, which has at its core that everyone has at least one (incredibly shitty, possibly unreadable, Vogon-level-quality) novel inside them, and by setting aside one month of being encouraged, forced, guilted and tortured, you will blow out one 50,000-word novel in that time. What happens next is up to you – burn it and move on, take it aside and polish it until you’re the next J.K. Rowling (or Hunter S. Thompson), or whatever tickles your fancy. But at the end, YOU WROTE A NOVEL BEFORE YOU DIED. Not bad.
What I know to be true is that there are a number of “problems” out there that need to be solved, that need one single thing to push them from “impossible” to “solved”, or, at least, “1.0”. And that thing that it needs is a lot of human thinking. Often rote, often boring, but necessary, to slam that thing out.
So since I got to come up with this idea, let me declare the first month, November 2012, to be SOLVE THE FILE FORMAT PROBLEM MONTH.
Here’s the problem, in more detail:
In the last couple centuries, we’ve created a number of self-encapsulated data sets, or “files”. Be they letters, programs, tapes, stamped foil, piano rolls, you name it. And while many of those data sets are self-evident, a fuck-ton are not. They’re obscure. They’re weird. And worst of all, many of them are the vital link to scores of historical information.
Everyone knows this problem. It’s why old novelists cry they can’t pull their first novel out of WordPerfect. It’s why someone who used U-matic tapes to record the first meetings of a famous protest group goes “oh well”. It’s why, in all things, someone looks at anything older than five years, and goes “bye”, figuring there’s nothing they can do.
And I’ve had to listen to the mewings about this problem for at least 20 years now, in various forms. A lot. And then the person lights up about maybe solving this problem, and then dims and says “well, we can’t really solve the problem”. Because they know – it’d take an army of people to do it.
Let’s make that goddamned army.
And before I give you a battle plan, let me say: This will solve a major issue. This will give thousands, later millions, access to a whole range of materials now shut off from each other. Stuff being made after 2012 will be scrutinized to see whether it makes clear how to access it. Stuff made before? We’ll have docs, or a thread, or even a few first steps towards understanding what it was. People writing modern software will be able to make filters or plugins that use these standards – it’ll drop from being a needless rathole to being a simple matter of writing a Perl library or a JavaScript routine to pull the data in and make it work with the new thing. That will be very helpful indeed.
Battle Plan:
In October, I’ll be making noise about this happening. We’ll have a logo, and we’ll have some preliminary work done.
It’ll be a big wiki, with people taking various roles of the exciting and boring parts, working on a structure, yanking in what we need.
We’ll scour the internet, and the online and offline worlds, to pull in every potential format ever. If it sounds like a hierarchy issue, yes indeed it is… but classification’s bugbear is a distant second to acquiring the wealth of formats now extant.
We’ll acquire examples of the formats, links to programs that deal with the formats, known variations or problems with the format, and so on.
We’ll keep doing this from the low-hanging GIF and JPEG and PNG documentation, to the aforementioned piano rolls, microfiche, obscure barcode formats and disk layouts of Cray platters. We’ll just keep doing it.
At the end of the month, having had our knees on the chest of this problem for 30 days, we’ll be dragged off the problem, kicking and screaming and still punching, and see where we are.
The resulting work will be open-licensed and available to anyone.
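To make the idea of an entry a little more concrete, here is a minimal sketch in Python of what one record might hold and how a routine could use it to recognize a file. Everything about the shape of it is an assumption for illustration: the `FormatEntry` fields, the `REGISTRY` list and the `identify` function are hypothetical names, not a decided structure for the wiki. Only the leading-byte signatures for GIF, JPEG and PNG are the well-known real ones.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FormatEntry:
    """One hypothetical record for a file format (illustrative fields only)."""
    name: str
    magic: bytes  # leading bytes that identify the format
    programs: List[str] = field(default_factory=list)    # links to tools, left empty here
    variations: List[str] = field(default_factory=list)  # known variations or problems


# The low-hanging formats, keyed by their well-known signatures.
REGISTRY = [
    FormatEntry("GIF", b"GIF8", variations=["GIF87a", "GIF89a"]),
    FormatEntry("JPEG", b"\xff\xd8\xff"),
    FormatEntry("PNG", b"\x89PNG\r\n\x1a\n"),
]


def identify(data: bytes) -> Optional[str]:
    """Return the name of the first entry whose signature starts the data, else None."""
    for entry in REGISTRY:
        if data.startswith(entry.magic):
            return entry.name
    return None
```

Calling `identify` on the first few bytes of a file returns a format name or `None`. Formats without a tidy signature (piano rolls, tapes, microfiche) obviously need a different kind of record, which is part of what the month is for.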
Now, if you just read all this, let out a big “pffffff” and feel your fingers twitching with the urge to write about how this is all impossible, just get the fuck out now. The project doesn’t need you, now or ever. Just enjoy the summer, grasshopper, and come knocking on the ant’s door in December when we’re at 1.0.
But if you read this and said “Well, I could take a shot at it, might be worth a few hours”, then you’re EXACTLY what is needed.
Think what giving a month every year will do for a problem like this. There are plenty of others – but this is one that has vital meaning to the work I’ve done with Archive Team and to the hundreds of archivists and historians I’ve met over the past few years. If this problem is in some way handled, if an OED of formats is blown out, lives will change – projects thought undoable will be doable, and the flood of old information saved will be incalculable.
So who’s with me? SEE YOU IN NOVEMBER.
For More Information
