computer commentary

What does Linux need?


Yesterday I rolled out a Linux production server for a small company. The total time I spent, from downloading the OS until the server went live, was about 48 hours. Even though I started from a fresh, plain-vanilla OS, every single program I installed either refused to compile or crashed with fatal errors. Getting them to run was a hair-pulling exercise that took an entire 12-hour day. This is too long.

First, some background. This company's network was a disaster. Management had decided to save money by refusing to maintain the infrastructure. There was no system administrator, and cheap external USB drives served as the only backup of the company's irreplaceable proprietary files. I found a large pile of dead USB drives, including an expensive one with a hot-swap hard drive, in the wiring closet. Every single one had failed, so they just kept buying new ones. When the last one crashed, they finally gave up and stopped doing backups altogether. No one had run a security scan, done any maintenance, or even looked at the logs in years, while the hardware continued to deteriorate.

Management did nothing, despite repeated warnings from the technically literate members of the staff, until the main hard drive on the server crashed, causing the entire house of cards to collapse. Suddenly the boss could not read his email or surf the web. It thus became an emergency, an order for new yucko-brand servers was rushed through, and I was brought in to get them working.

For those who've never done it, setting up a real server is quite different from setting up a Linux desktop, which can be done in a couple of hours. In a company you have email users on Macs, T-bird, webmail, and various flavors of Outlook dating back to the sixteenth century. They use different protocols and sometimes even different ports, and getting them all to talk to your server at the same time is not trivial. On the server you also have Apache and PHP scripts, antiquated custom RS-232 programs, databases, and backups that all have to run exactly as before.
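To give a flavor of the port zoo: a mail server that has to satisfy all those clients at once typically ends up listening on most of the standard SMTP, POP3, and IMAP ports. Here is a quick way to check which ones are actually live; the port list is the usual set, but yours may differ:

    # Show which of the common mail ports the server is listening on.
    # 25/465/587 = SMTP variants, 110/995 = POP3, 143/993 = IMAP.
    netstat -tln | egrep ':(25|110|143|465|587|993|995) '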

In this environment, the stock services included in a distro don't cut it. You have to compile almost everything, including the mail and Web servers, and deal with dependencies and libraries that have been changed, seemingly at random, which causes the new stuff to crash or fail to compile. All the while users and managers come in every five minutes, disrupting your concentration to tell you that it's not working yet. I also had to deal with the boss coming in and saying things like, “If you can't get this working, we'll lose millions of dollars, our investors will pull out, we'll lose the building, and I'll have to fire everybody.” But, umm, no pressure.
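For anyone who hasn't lived it, "compile almost everything" means repeating the classic dance below for every package. The package name, version, and options here are only an illustration, not the exact ones I used:

    # The classic from-source build cycle:
    tar xzf httpd-2.4.4.tar.gz
    cd httpd-2.4.4
    ./configure --prefix=/usr/local/apache2
    make
    make install    # as root
    # ...and when configure stops with something like "APR not found",
    # you get to do the same dance for the missing library first.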

The original version of the OS wouldn't install on the new servers, so we had to switch to a different version with lots of shiny new bugs.

Oh, and did I mention they wanted to switch ISPs and reconfigure their primary DNS server at the same time? Yes, their copper lifeline to the outside had deteriorated. Whether it was the phone company or those darned chipmunks again, I don't know. But their T1 was going down practically every day for hours at a time. So they also needed me to reconfigure their router, and I drained my cell phone battery talking to tech support at the ISP. (Did I mention they saved even more money by disconnecting most of the phones?)

We always hear stories about the poor tech support guys and the dumb questions users ask. But boy, I could tell you some stories about the nutty things the tech support guys at this ISP told us to do. That's a Web page for another day.

If I have to spend time guessing which version of PHP I need to install to get webmail working again, because somebody changed the language, or if I have to work until 8:00 at night trying to get sendmail to compile, which it suddenly stopped doing because of some uncoordinated change in a library somewhere, it makes Linux look bad to my boss. And it makes me look bad for recommending it. That's a mistake I won't make twice.

I'm a research guy. Fixing other people's software isn't my job, and even if it were, I wouldn't have time these days to read the documentation for every change, even when the changes are documented. And even if I did, I still couldn't fix the software, because if this company ever becomes profitable, it will hire some specialist firm to handle its IT. That firm won't know how to do any of this, and it certainly won't have time to read my notes. We need this stuff to work, or at least compile, out of the box, on every system. A formal way for programmers to test that (a standard test suite, say, or a standard reference system) would help. So would more standardization in the OS. If we don't get it, watch out for an automated “solution” to come along someday and put us all out of work.

Sure, when you finish fixing everything the end users are all very grateful, and you get the same feeling that construction workers must get when they finish a skyscraper and it doesn't collapse into a pile of rubble as soon as they walk off the construction site. But here's the problem: halfway through the install, the company's boss started thinking out loud about how nice it would be to switch over to a Windows server, where you just point and click and, he thinks, everything just works out of the box.

Now, I get an outbreak of Tourette's every time I think about Windows, but managers like this one are the biggest danger that Linux faces today. Even though server sales are overwhelmingly Windows, Linux is reportedly running on between 60% and 80% of the world's Internet servers. What accounts for the discrepancy is that most servers come with some version of Windows pre-installed, and Windows is unceremoniously wiped and replaced with Linux. But unless Linux gets a lot easier to set up on a server, it will become harder and harder to justify doing that. Managers are willing to throw $10,000 at a visible problem, but they see personnel costs and behind-the-scenes management as an ongoing expense. There are no rewards in industry for having a disaster not happen, but as long as managers can find someone else to blame when one does happen, they have lots of incentives for not doing maintenance. Point-and-click has a strong appeal to them, because most of them already know how to do that.

Forget about trying to displace Windows from the desktop. Forget about major changes to the OS whose main benefit is cutting boot-up time from six minutes to five minutes and 45 seconds. Admin time is still measured in weeks, and that, not Microsoft, is what will kill us. Time is money. Linux has gotten complacent about dominating the server market, and it must advance, or the managers and bean-counters, who control where the money goes, will eventually get their way and Linux will lose that market. If it loses the server market, it will disappear on the desktop too, because only hobbyists and experimenters will be using it.

Ideas

So here are some ideas.

  1. Currently, each distro has a different, custom GUI for configuring the system, so many new admins never take the time to learn the command line. I have even seen people use a GUI just to change an IP address. This makes Linux like Windows, where most people need to be at the console or, at best, on a remote desktop to make changes. That won't work when the boss calls you up while you're in your prison cell, or wherever, and you have to configure a server using only your cell phone. Give up the command line and you surrender a vital advantage that Linux has over Windows. If a distro provides a GUI, it should parallel the command line and display every actual command it runs, so new users learn them automatically (see the first example after this list).
  2. The main reason for the success of Perl is CPAN, which automatically tells you what files you need, downloads them into the right place, and tells you exactly what to do when there's a problem. RPMs are stone-age tools by comparison. Often, compiling from source is actually easier than using an RPM: an RPM that fails usually says, at best, "it didn't work," while a configure script at least tells you which libraries you need. Sometimes. We need something better (second example below).
  3. We need strict rules about where various types of files go. The existing recommendations (the Filesystem Hierarchy Standard) aren't being enforced. One thing that makes Windows so bad is that libraries and binaries are scattered randomly, even mixed in with user files, so when the OS craps out you have to reinstall most programs from the CDs they came on. Linux is starting to get the same kind of rot. To give one example, I have seen config files in /etc, /usr/etc, /usr/local/etc, /srv/etc, /usr/srv/etc, /lib, and half a dozen other places. Even on my desktop I have five different httpd.conf files in different locations. The only sure way to find the real one is to run strings against the binary (third example below).
  4. Bring back static binaries. At present it's a royal pain to create them for any sizable program, at least with gcc. Without them, end users have to compile their own software or use whatever their distribution includes. Windows has no problem creating static binaries. Distros should ship static versions of every library on the system (fourth example below).
  5. We need a general, standardized, command-line configuration script system for the OS and for software packages. It should let you decide what features you need, check them for consistency, then go out on the Internet, download whatever it needs, install it, and configure it. If a problem occurs, it should tell you what went wrong and what you can do about it, and fix it where it can.
  6. Stop changing the languages! The C and C++ compilers (gcc and g++) have finally started to settle down. But if a new feature can't be added without breaking something old, that's a design flaw, and it should mean the new feature doesn't go in, no matter how cool it is. Don't break the old stuff people might be using, or they won't trust the new stuff. Just try getting a ten-year-old Python program to run on a new computer; it's almost impossible (last example below).
  7. Be conservative with system changes. On one of my Linux boxes, the kernel page-faults when I run a certain script, because of the new systemd stuff. On another, X11 hangs whenever my browser hits a page with Flash. Sure, I can disable Flash and not run the script, but these things happen because of Windows envy. Make the system too much like Windows and there will be no reason not to use Windows.
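To make idea 1 concrete, here is roughly what a network-settings GUI hides. The interface name and addresses are illustrative, not from any real machine:

    # Assign an address and a default route from the command line
    # (interface name and addresses are examples only):
    ip addr add 192.168.1.50/24 dev eth0
    ip route add default via 192.168.1.1
    # The older equivalent many of us learned first:
    ifconfig eth0 192.168.1.50 netmask 255.255.255.0

A GUI that echoed those lines as it ran them would teach new admins the command line for free.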
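For idea 2, the difference in practice looks something like this; the RPM package name is made up for illustration, while Mail::IMAPClient is a real CPAN module:

    # CPAN resolves, fetches, and builds dependencies by itself:
    perl -MCPAN -e 'install Mail::IMAPClient'
    # An RPM of the same vintage typically stops at "Failed dependencies"
    # and leaves the hunting to you (package name is hypothetical):
    rpm -ivh imap-server-1.0-1.i386.rpm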
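The strings trick from idea 3, for anyone who hasn't needed it yet. The binary's path varies by distro:

    # Ask the binary itself which config file it was compiled to read:
    strings /usr/sbin/httpd | grep httpd.conf
    # Apache will also tell you directly:
    /usr/sbin/httpd -V | grep SERVER_CONFIG_FILE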
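Idea 4 in practice: static linking is a single gcc flag, but it only works if the static archives (libc.a and friends) are installed, which is exactly what distros have stopped shipping:

    # Build and verify a statically linked binary (myprog.c is any
    # C source file you have lying around):
    gcc -static -o myprog myprog.c
    file myprog     # should report "statically linked"
    ldd myprog      # should report "not a dynamic executable"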
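And the Python complaint in idea 6 is easy to reproduce. A one-liner like this ran for a decade under Python 2 and dies immediately under Python 3:

    # The Python 2 print statement, rejected outright by python3:
    echo 'print "hello, world"' > old.py
    python3 old.py    # fails with a SyntaxError; python2 prints the greeting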

And one more thing: we need a portable file format for LaTeX, so users can send each other LaTeX files instead of stripping out the formatting and converting them to MS Word. Most publishers don't accept LaTeX files because they are not portable. Every font or package a LaTeX document uses should either be standard on all systems or included in the document (or, at most, in one file that accompanies it), so we can guarantee it will render properly on every computer. If not, LaTeX will disappear, and we will all be stuck with MS Word forever.
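A partial workaround already exists for packages, if not for fonts: the snapshot package records every file a LaTeX run reads, and bundledoc packs them into one archive you can send along. Both are on CTAN; this sketch assumes they are installed:

    # In the document, before \documentclass:   \RequirePackage{snapshot}
    pdflatex paper.tex     # the snapshot package writes paper.dep,
                           # a list of every class/package/file used
    bundledoc paper.dep    # bundles paper.tex plus all its dependencies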



May 25, 2013
