computer commentary
Yesterday I rolled out a Linux production server for a small company. The total time I spent, from downloading the OS until the server went live, was about 48 hours. Even though I was using a fresh, plain vanilla OS, every single one of the programs I installed either refused to compile or crashed with fatal errors. You can see some of the problems I had here, here, here, and here. Getting them to run was a hair-pulling exercise that took an entire 12-hour day. This is too long.
First some background. This company's network was a disaster. Their management had decided to save money by refusing to maintain their infrastructure. They had no system administrator, and they were using cheap external USB drives as the only backup of their irreplaceable proprietary files. I found a large pile of dead USB drives, including an expensive one with a hot-swap hard drive, in their wire closet. Every single one of them had failed, so they just kept buying new ones. When the last one crashed, they finally gave up and stopped doing backups altogether. No one had done any security scans, maintenance, or even looked at the logs in years, while their hardware continued to deteriorate.
Management did nothing, despite repeated warnings from the technically literate members of the staff, until the main hard drive on the server crashed, causing the entire house of cards to collapse. Suddenly the boss could not read his email or surf the web. It thus became an emergency, an order for new yucko-brand servers was rushed through, and I was brought in to get them working.
For those who've never done it, setting up a real server is quite different from setting up a Linux desktop, which can be done in a couple hours. In a company, you have multiple email clients using Macs, T-bird, Webmail, and various flavors of Outlook dating back to the sixteenth century. They use different protocols and sometimes even different ports, and getting them all to talk to your server at the same time is not trivial. On the server you also have Apache and PHP scripts, antiquated custom RS-232 programs, databases, and backups that have to run exactly as before.
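To make the "different protocols and different ports" point concrete, here is a minimal sketch of the kind of check I mean: try a TCP connection to each of the usual mail ports and see which ones answer. The port numbers are the standard registered ones, but which of them a given site actually needs is an assumption you would adjust, and the function names are made up for this illustration.

```python
import socket

# Standard registered mail ports. Which ones your clients really use
# is site-specific; treat this table as an assumption to edit.
MAIL_PORTS = {
    25: "SMTP",
    465: "SMTP submission (implicit TLS)",
    587: "SMTP submission",
    110: "POP3",
    995: "POP3 (implicit TLS)",
    143: "IMAP",
    993: "IMAP (implicit TLS)",
}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and failed name lookups.
        return False


def check_mail_ports(host, ports=MAIL_PORTS, timeout=2.0):
    """Map each port number to True (reachable) or False (not reachable)."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

Calling check_mail_ports with your mail server's hostname gives a dictionary of port numbers and True/False. An open port only tells you something is listening, not that Outlook from the sixteenth century can actually log in, but it narrows things down before the next manager walks in.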
In this environment, the stock services included in a distro don't cut it. You have to compile almost everything, including the mail and Web servers, and deal with dependencies and libraries that have been changed, seemingly at random, which causes the new stuff to crash or fail to compile. All the while users and managers come in every five minutes, disrupting your concentration to tell you that it's not working yet. I also had to deal with the boss coming in and saying things like, “If you can't get this working, we'll lose millions of dollars, our investors will pull out, we'll lose the building, and I'll have to fire everybody.” But, umm, no pressure.
The original version of the OS wouldn't install on the new servers, so we had to switch to a different version with lots of shiny new bugs.
Oh, and did I mention they wanted to switch ISPs and reconfigure their primary DNS server at the same time? Yes, their copper lifeline to the outside had deteriorated. Whether it was the phone company or those darned chipmunks again, I don't know. But their T1 was going down practically every day for hours at a time. So they also needed me to reconfigure their router, and I drained my cell phone battery talking to tech support at the ISP. (Did I mention they saved even more money by disconnecting most of the phones?)
We always hear the stories about the poor tech support guys and the dumb questions that users ask. But boy, I could tell you some stories about the nutty things those guys at tech support at this ISP told us to do. But that's a Web page for another day.
I'm a research guy. Fixing other people's software isn't my job, and even if it were, these days, even if the changes were documented, I wouldn't have time to read the documentation. And even if I did, fixing that software myself would be pointless, because if our company ever starts being profitable they're going to hire some specialist company to handle their IT. That company will not know how to do any of this, and they certainly won't have time to read my documentation. We kinda need this stuff to work, or at least compile, out of the box, on every system. A formal way for programmers to test it (maybe a standard test suite or a standard system) would help. So would more standardization in the OS. If we don't get it, watch out for an automated “solution” to come along someday and put us all out of work.
If I have to spend time guessing which version of PHP I need to install to get webmail working again, because somebody made a change to the language, or if I have to work until 8:00 at night trying to get sendmail to compile, which it suddenly stopped doing because of some uncoordinated change in some library somewhere, it makes Linux look bad to my boss. And it makes me look bad for recommending it. That's a mistake I won't make twice.
Sure, when you finish fixing everything the end users are all very grateful, and you get the same feeling that construction workers must get when they finish a skyscraper and it doesn't collapse into a pile of rubble as soon as they walk off the construction site. But here's the problem: halfway through the install, the company's boss started thinking out loud about how nice it would be to switch over to a Windows server, where you just point and click and, he thinks, everything just works out of the box.
Now, I get an outbreak of Tourette's every time I think about Windows, but managers like this one are the biggest danger that Linux faces today. Even though server sales are overwhelmingly Windows, Linux is reportedly running on between 60% and 80% of the world's Internet servers. What accounts for the discrepancy is that most servers come with some version of Windows pre-installed, and the Windows is unceremoniously wiped and replaced with Linux. But unless Linux gets a lot easier to set up on a server, it will become harder and harder to justify doing that. Managers are willing to throw $10,000 at a visible problem, but they see personnel costs and behind-the-scenes management as an ongoing expense. There are no rewards in industry for having a disaster not happen, but as long as they can find someone else to blame when it does happen, they have lots of incentives for not doing maintenance. Point-and-click has a strong appeal to them, because most of them already know how to do that.
Forget about trying to displace Windows from the desktop. Forget about making major changes to the OS whose main benefit is cutting boot-up time from six minutes to five minutes and 45 seconds. Admin time is still measured in weeks, and that, not Microsoft, is what will kill us. Time is money. Linux has gotten too complacent about dominating the server market. If it loses the server market, it will also disappear on the desktop, because only hobbyists and experimenters will be using it. And it must advance or the managers and bean-counters, who control where the money goes, will eventually get their way, and Linux will lose the server market.
So here are some ideas.
And one more thing: we need a portable file format for LaTeX, so users can send each other LaTeX files instead of having to strip out the formatting and convert them to MS Word. Most publishers don't accept LaTeX files, because they are not portable. Every font or package in LaTeX should either be standard on all systems, or included in the document (or, at most, in one file that accompanies the document) so we can guarantee it will work and render properly on every computer. If not, LaTeX will disappear, and we will all be stuck with MS Word forever.