randombio.com | computer commentary
Sunday, March 13, 2016

Email as a cloud storage mechanism

People are using their mail server as a form of online storage.


Today's blog post will be shorter than usual because it's the start of Daylight Saving Time. It takes me 23 minutes to set all my clocks forward and 23 minutes to set them back. So I gain only 37 minutes in the fall and lose 83 in the spring. The rest is wasted by the government's time tax.

They should put both time zone changes on the same day—preferably April 15. Make that one awful day an hour shorter and leave the rest alone.


Last week at work, after creating a PowerPoint to replace the one the boss lost for the umpteenth time, I amused myself by checking on the server to see which one of us has the biggest inbox.[1] As I learned watching the presidential debates, everything that has a size is a proxy for the size of the one, most important, guy thing.

Surprisingly, the top dog was only No. 2. The biggest belonged to the new guy, who has 2,407,618,646 bytes of stuff he hasn't read. That comes out to 421 million words, which the Internet informs me is enough to keep the average reader busy for 3.564 years.
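The figures hang together. 2,407,618,646 bytes spread over 421 million words works out to about 5.7 bytes per word, which is plausible for English text plus spaces. The reading time depends entirely on the assumed reading speed. In the quick check below, 225 words per minute, read nonstop around the clock, is an assumption and not a number from the post. It lands close to the 3.564 years quoted.

```python
# Back-of-the-envelope check of the inbox numbers.
# Assumption (not from the post): 225 words per minute, read nonstop, 24/7.
INBOX_BYTES = 2_407_618_646
WORDS = 421_000_000
WORDS_PER_MINUTE = 225

bytes_per_word = INBOX_BYTES / WORDS            # about 5.7 bytes per word
minutes = WORDS / WORDS_PER_MINUTE
years = minutes / (60 * 24 * 365.25)            # minutes in a 365.25-day year

print(f"{bytes_per_word:.2f} bytes per word")   # 5.72 bytes per word
print(f"{years:.2f} years of nonstop reading")  # 3.56 years of nonstop reading
```

A slightly slower assumed reader gives exactly 3.564 years. The point survives any reasonable choice.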

It's not his fault. Most of it is probably not even readable content. Our users are using their inbox as indexed storage space. It's 2016 and nobody has ever successfully introduced a filesystem with searchable metadata, so our users innovate by storing their files on the mail server.

One user's home directory only has 5 files, most of which seem to have gotten there by mistake (names like the proverbial exit exit stop Ctrl-C ZZ quit how do i get out of this stupid editor and of course the infamous C:\nppdf32Log\debuglog.txt). I have 18 of these myself, but this person has nearly 100 gigabytes of stored mail[2]. For all intents and purposes, mail is his storage medium. And he's not alone.

A paper by Alexander Ames et al. (Proc 22nd IEEE MSST 2005) put it this way: ‘It's often easier to find a document on the web amongst billions of documents than on a local file system.’ They invented a new filesystem called LiFS (Linking File System) which extends filesystem metadata to include user-specified key-value pairs and links to represent relationships among files. Everything is stored in hash tables.
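To make the idea concrete, here is a toy, in-memory model of attribute-and-link metadata. Each file gets a hash table (a Python dict) of user-specified key-value pairs, and typed links record relationships between files. The class and method names are hypothetical. This illustrates the concept only; it is not LiFS's actual interface, which the paper implements at the filesystem level.

```python
from collections import defaultdict


class MetadataStore:
    """Toy model of key-value attributes plus typed links between files.

    Hypothetical names; an illustration of the idea, not LiFS's API.
    """

    def __init__(self):
        # path -> {key: value}: user-specified attributes in a hash table
        self.attrs = defaultdict(dict)
        # path -> {(link_type, target_path)}: relationships between files
        self.links = defaultdict(set)

    def set_attr(self, path, key, value):
        self.attrs[path][key] = value

    def link(self, source, link_type, target):
        self.links[source].add((link_type, target))

    def find(self, **criteria):
        """Return paths whose attributes match every key=value given."""
        return sorted(
            path for path, kv in self.attrs.items()
            if all(kv.get(k) == v for k, v in criteria.items())
        )

    def linked(self, source, link_type):
        """Return targets reachable from source by links of one type."""
        return sorted(t for lt, t in self.links[source] if lt == link_type)


# Usage: short filenames, with the descriptive detail moved into metadata.
meta = MetadataStore()
meta.set_attr("report.pdf", "type", "endotoxin")
meta.set_attr("report.pdf", "lot", "01-208")
meta.set_attr("questionnaire.doc", "vendor", "patheon")
meta.link("report.pdf", "related", "questionnaire.doc")

print(meta.find(type="endotoxin"))           # ['report.pdf']
print(meta.linked("report.pdf", "related"))  # ['questionnaire.doc']
```

With attributes and links stored this way, a query like "all endotoxin reports" no longer depends on cramming every detail into the filename.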

Many computer labs have worked on projects like this. Microsoft's WinFS (Windows Future Storage) was supposed to allow SQL-type queries, but it was scrubbed in favor of the cloud, and so far very few apps or utilities are metadata-friendly.

Someday we'll have true content-addressable storage in our computers and metadata won't be needed. Content-addressable storage, where everything is meta to everything else, is a prerequisite for any artificial intelligence, and artificial intelligence will soon be a prerequisite for getting any work done. We'll do away with files altogether and just store the knowledge on our computers.
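In its simplest, narrow sense, content-addressable storage retrieves data by a key computed from the data itself, usually a cryptographic hash, instead of by a name someone has to remember. Below is a minimal sketch that assumes SHA-256 as the hash and uses an in-memory dict in place of a disk. The associative, "everything is meta to everything else" kind of storage imagined above is a much harder problem than this.

```python
import hashlib


class ContentStore:
    """Minimal content-addressable store: the key is the SHA-256 of the bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical content always yields the same key, so it is stored once.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self):
        return len(self._blobs)


cas = ContentStore()
k1 = cas.put(b"powerpoint for thursday's big important meeting")
k2 = cas.put(b"powerpoint for thursday's big important meeting")  # same bytes again
print(k1 == k2, len(cas))  # True 1
```

A useful side effect is deduplication: mailing the same PowerPoint to yourself twice would cost nothing extra in a store like this.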

But for now people make do by giving their files ridiculously long filenames. When a file gets lost, we Linuxers can search file contents with grep and pdfgrep, but it's tough for Windowsers. I still end up with filenames like these:

aai-endotoxin-report-AB3856-bryostatin-for-injection-i100-ug-vial-Lot-01-208-30Apr2015BRPI-signed.pdf

patheon-form-TAQ2013-development-and-clinical-supplies-non-sterile-technical-assessment-questionnaire_0708.doc

involvement-of-cell-surface-glycosyl-phosphatidylinositol-linked-aspartyl-proteases-in-alpha-secretase-type-cleavage-and-ectodomain-solubilization-of-human-alzheimer-beta-amyloid-precursor-protein-in-yeast-komanojbs1998.pdf
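Windowsers without grep can get a rough equivalent of a recursive, case-insensitive `grep -ril` in a few lines of Python. This is a minimal sketch. It decodes every file as UTF-8 and silently drops undecodable bytes, so it will not see text inside compressed formats such as most PDFs, which is what pdfgrep exists for. The function name is hypothetical.

```python
import os
import tempfile


def find_files_containing(root, needle):
    """Yield paths under root whose text contains needle, ignoring case.

    Rough stand-in for `grep -ril needle root`; plain-text files only.
    """
    needle = needle.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    if needle in f.read().lower():
                        yield path
            except OSError:
                continue  # unreadable file: skip it quietly


# Demo on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "notes.txt"), "w", encoding="utf-8") as f:
        f.write("Bryostatin lot 01-208 endotoxin results")
    with open(os.path.join(tmp, "other.txt"), "w", encoding="utf-8") as f:
        f.write("nothing to see here")
    hits = [os.path.basename(p) for p in find_files_containing(tmp, "ENDOTOXIN")]

print(hits)  # ['notes.txt']
```

Reading each file whole keeps the sketch short. For huge files, a line-by-line scan would use less memory.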

If it weren't such a pain to replace spaces with dashes I'd probably put the whole abstract in the filename. We really, really need metadata but those crumb-bums in Redmond won't let us have it.
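The dash-substitution part, at least, is easy to automate. The sketch below turns a free-form title into a filename in the style shown above. The function name `slugify` and its rules are assumptions chosen to match those examples: lowercase, ASCII letters and digits kept, and every other run of characters collapsed to a single dash.

```python
import re


def slugify(title: str, ext: str = "") -> str:
    """Turn a free-form title into a lowercase, dash-separated filename.

    Assumed rules: ASCII letters and digits survive; any other run of
    characters (spaces, punctuation, accented letters) becomes one dash.
    """
    s = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return s.strip("-") + ext


print(slugify("Endotoxin Report: Lot 01-208", ".pdf"))
# endotoxin-report-lot-01-208.pdf
```

Pasting a paper title into a helper like this produces names much like the third example above, abstract not included.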

It's a corollary to Moore's law: every ten years, the number of files on our computers whose purpose nobody knows doubles. Without metadata their names are meaningless, and everyone's afraid to delete them lest they turn out to be something important.

But our users, like the guy with 2GB in his inbox, have found a solution. They mail their PowerPoints to themselves with subject lines like powerpoint for thursday's big important meeting at xyz company. To make sure it doesn't get lost, they cc it to themselves. I'm so proud of their ingenuity I don't have the heart to tell them how fragile it is.[3]

Anyway, they need it because their backup device stopped working about a week after they cancelled their IT contract to save money. This suggests another corollary: whenever you lay somebody off, your need for that person increases in proportion to how much you thought you didn't need them.

It used to be said that every application expands until it can do mail. Now email and browsing are all people know, so those two applications expand until they can do backups, remote access, and computations. But it's not just due to lack of knowledge. The typical firewall blocks everything else because IT's idea of security is to keep turning stuff off until the boss starts to complain. All the boss knows is browsing and mail, so everything else gets blocked and all our traffic gets funneled through ports 80, 143, and 993.

Last week I clicked on the NLM website to check for new articles on the disease we're supposed to be curing. Almost immediately IDS alarms started going off left and right on my computer as IT hit it with another Nessus scan looking for services I forgot to block. If they didn't already know all there is to know about computers, I'd suggest to them that maybe it's not such a hot idea to put all 32,000 users at our site on the same 10.x.x.x network with randomly assigned addresses, thereby making it impossible for anybody to restrict access to anybody. But then, the goal of security is not security. The goal is to make computers less useful so IT can step in and fill the newly created need, making themselves indispensable in the process.

The TSA uses the same philosophy. It's just a matter of time before IT starts groping us in the crotch like TSA when we come in to work. Not for security, but to make our lives a little more oppressed. Asserting animal-like dominance over us is how those in authority keep us in line.

Luckily the Nessus server's IP is static and it shows up in my logs. That one gets blocked. We are secure, you crumb-bums!

In the old days they used delay line memory: a piece of nickel wire would get twisted using an electromagnet. That created a mechanical torsion wave that traveled down the wire. With a long enough wire you could store almost a kilobyte by copying your data to the wire every 500 microseconds.
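Delay-line memory is strictly serial. The bits are physically in flight along the wire. The bit arriving at the far end is read, amplified, and fed back in at the near end, so any given bit can be reached only once per trip. The number of bits in flight is the wire's delay multiplied by the bit rate, which is why a longer wire holds more. Below is a toy simulation that uses a deque as the wire, one slot per bit in flight. The class, its methods, and the 8-bit size are hypothetical illustrations, not a model of any particular machine.

```python
from collections import deque


class DelayLine:
    """Toy recirculating delay line holding a fixed number of bits in flight."""

    def __init__(self, n_bits):
        self.line = deque([0] * n_bits)  # line[0] is the bit now at the output
        self.position = 0                 # logical address of the bit at the output

    def tick(self, write=None):
        """One bit time: take the bit at the output and re-inject it at the
        input, or inject a new value instead if writing."""
        bit = self.line.popleft()
        if write is not None:
            bit = write
        self.line.append(bit)
        addr = self.position
        self.position = (self.position + 1) % len(self.line)
        return addr, bit

    def read(self, address):
        """Wait, tick by tick, until the requested bit comes around."""
        while True:
            addr, bit = self.tick()
            if addr == address:
                return bit

    def write(self, address, value):
        while self.position != address:
            self.tick()
        self.tick(write=value)


mem = DelayLine(8)
mem.write(5, 1)
mem.write(2, 1)
bits = [mem.read(a) for a in range(8)]
print(bits)  # [0, 0, 1, 0, 0, 1, 0, 0]
```

The waiting loop in `read` is the real cost of this design. Access time depends on where the bit happens to be in its trip, up to one full recirculation period.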

With the cloud that's coming back: we'll copy our files to somewhere in the cloud and then retrieve them every 500 microseconds or so. Our files will never actually be stored anywhere. They'll spend all their time nowhere in particular, moving back and forth.

Just as in Keynesianism, where money is magically created from nothing by moving it from person to person faster and faster, we'll increase our file capacity exponentially by increasing the velocity of files swirling around in the cloud. What could go wrong?

Another idea that IT got from the government. Cross-fertilization is a wonderful thing.


1. Which is perfectly legitimate, since everyone with an account has access to the list. In my case it's part of my job anyway.

2. I only know this because I am sometimes asked to fix it when something goes wrong.

3. For example, some email clients periodically ask if you want to save space by deleting all your old saved messages. Others do it automatically.

On the Internet, no one can tell whether you're a dolphin or a porpoise