ACM Queue recently posted an article (“What can software vendors do to make the lives of sysadmins a little easier?”, Dec. 22, 2010) that lists 10 dos and don’ts for making a system administrator’s job easier. The article has some interesting points, but I’m going to give my own perspective on each of them.
DO have a “silent install” option
– I agree with this 110%. Having to go to one computer to click a button or choose some options is fine; having to go to multiple computers to do the same task is annoying (as well as counterproductive). I’m sure this is at least a big reason why most (if not all) package managers, like yum, apt and pacman, have an auto-yes/no-prompt option. The only downside I can foresee is needing to customize the install for a subset of computers. But even then, it should be rather easy to put those computers in a different queue.
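To make the auto-yes idea concrete, here is a minimal sketch (not anything from the article) that builds prompt-free install commands for the three package managers mentioned above. The flags are the usual non-interactive ones (`-y` for apt-get and yum, `--noconfirm` for pacman); the function names and the host name in the usage line are made up for illustration, and the code only builds command lines rather than running them.

```python
import shlex

# Non-interactive install invocations for the package managers named above.
NONINTERACTIVE = {
    "apt": ["apt-get", "install", "-y"],
    "yum": ["yum", "install", "-y"],
    "pacman": ["pacman", "-S", "--noconfirm"],
}


def silent_install_command(manager, packages):
    """Return the argv list for a prompt-free install of the given packages."""
    if manager not in NONINTERACTIVE:
        raise ValueError(f"unknown package manager: {manager}")
    return NONINTERACTIVE[manager] + list(packages)


def remote_commands(hosts, manager, packages):
    """Pair each host with the ssh command line that would run the install there.

    The host names are hypothetical; nothing is executed here.
    """
    remote = shlex.join(silent_install_command(manager, packages))
    return [["ssh", host, remote] for host in hosts]
```

Something like `remote_commands(["web1", "web2"], "apt", ["nginx"])` gives one command per machine, which could then be handed to `subprocess.run` or whatever queueing you already use.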
DON’T make the administrative interface a GUI
– Personally, I see this as a common-sense one. While it’s great for desktop users to have some sort of GUI for programs, it seems rather illogical to run any sort of window manager on a server you’ll very rarely connect to over VNC. Most system admins seem to just use PuTTY or another SSH client.
Another angle on this, though: what about web interfaces? Granted, a web interface isn’t truly a GUI setup, but would it be acceptable? My own taste says yes (a good example is SafeSquid’s administration…it’s all web-based). So basically, ditch the GTK+ designing and work on a web interface instead; your system resources will thank you.
DO create an API so that the system can be remotely administered
– This is kind of hit or miss for me. While I agree that being able to remotely administer a program is a grand idea, I don’t see how it’s any different from opening an SSH session and viewing logs. This is a short comment, but building an API just for this seems like overkill to me. APIs are meant to expand the functionality of a program (e.g., plugins). While the article makes a good point that an API helps extend a program’s functionality, I feel the author didn’t truly address why remote administration needs one.
DO have a configuration file that is an ASCII file, not a binary blob
– Another one I believe in 110%. The main reason, as the article states, is the ability to use diff to see what changes were made.
For an example of this, let’s compare the two most popular browsers that aren’t tied to an operating system, Mozilla Firefox and Google Chrome, and two of the ways they store data. Firefox keeps its preferences in a plain-text file (prefs.js) that you can open and diff. Chrome, on the other hand, stores data such as your URL history in a database (SQLite, to be exact); trying to open the database in a text editor gives you a lot of unreadable text. In fairness, Firefox has also kept its history in SQLite since version 3, so the contrast is really between the two storage styles rather than between the two browsers. Either way, the plain-text file is the one you can actually read and compare.
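Here’s a quick sketch of why the text format matters, using Python’s standard `difflib` to produce the same kind of output `diff -u` would. The file name and the two settings are invented for the example; the point is only that two versions of a text config can be compared line by line, which a binary blob doesn’t allow.

```python
import difflib


def config_diff(old_text, new_text, name="app.conf"):
    """Return unified-diff lines between two versions of a text config file."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=name + ".orig", tofile=name
        )
    )


# Hypothetical before/after contents of a config file.
old = "port = 8080\nlog_level = info\n"
new = "port = 8080\nlog_level = debug\n"

changes = config_diff(old, new)
print("".join(changes), end="")
```

The output shows the `log_level` line removed with a `-` and added back with a `+`, so you can tell at a glance what someone changed.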
DO include a clearly defined method to restore all user data, a single user’s data, and individual items
– This was a big problem where I used to work. While we had a very well-developed backup system in place, it wasn’t very well managed (we once lost a good 2-3 months’ worth of backups and no one caught it for a month). The backup system did cover these points, but you also have to set up some procedures for monitoring your systems, especially your backup systems. Unless there’s a reason not to, I fail to see why a simple cron job that runs a backup script and sends the result to a backup server wouldn’t do the task just fine. I guess I kind of veered off topic on this, but it’s still something to note here.
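As a rough sketch of the cron-plus-script idea (my own illustration, not the system from my old job), the script below archives a directory into a timestamped tarball and includes a small check for how old the newest backup is, which is exactly the thing nobody was watching. The paths, the function names and the crontab line in the comment are all assumptions, and copying the archive to a backup server is left out.

```python
import tarfile
import time
from pathlib import Path

# Hypothetical crontab entry to run this nightly at 02:00:
#   0 2 * * * /usr/bin/python3 /opt/scripts/backup.py


def make_backup(source_dir, backup_dir):
    """Archive source_dir into a timestamped .tar.gz inside backup_dir."""
    source = Path(source_dir)
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=source.name)
    return archive


def newest_backup_age_hours(backup_dir):
    """Return the age in hours of the newest archive, or None if there is none."""
    directory = Path(backup_dir)
    if not directory.is_dir():
        return None
    archives = list(directory.glob("*.tar.gz"))
    if not archives:
        return None
    newest = max(a.stat().st_mtime for a in archives)
    return (time.time() - newest) / 3600
```

If `newest_backup_age_hours` returns `None` or something much larger than your backup interval, that’s the cue to send an alert instead of finding out a month later.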
DO instrument the system so that we can monitor more than just, “Is it up or down?”
– Another situation I’ve experienced. Having a system in place to actively monitor services is very important and well worth the investment. My personal preference is Nagios/OpsView (OpsView seems to be just a forked version of Nagios). It monitors all the system resources you ask it to, and works for more than just *nix-based systems. If you have more than one computer and/or server, this is definitely something to set up, and you’ll be thankful for it when a server shuts down without you knowing.
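To show what “more than up or down” can look like, here’s a minimal sketch of a disk-usage check written in the style of a Nagios plugin, where the exit code carries the state (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN). The thresholds and function names are my own choices for the example, not anything Nagios or OpsView ships.

```python
import shutil

# Nagios-style plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(percent_used, warn=80.0, crit=90.0):
    """Map a disk-usage percentage to a (status code, message) pair."""
    if percent_used >= crit:
        return CRITICAL, f"DISK CRITICAL - {percent_used:.1f}% used"
    if percent_used >= warn:
        return WARNING, f"DISK WARNING - {percent_used:.1f}% used"
    return OK, f"DISK OK - {percent_used:.1f}% used"


def check_disk(path="/"):
    """Measure usage of the filesystem holding path and classify it."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - filesystem reports zero size"
    return classify(100.0 * usage.used / usage.total)


def main():
    """Print the one-line status and return the exit code for the monitor."""
    code, message = check_disk()
    print(message)
    return code
```

Run as a plugin, the script would end with `sys.exit(main())` so the monitoring system can read the state from the exit code and show the printed line as the detail.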
DO tell us about security issues
– This one is a double-edged sword. Yes, you help out the community and the people using your services, but you also let the enemy know about a weakness. While I agree with the article about letting people know publicly, it’s always a tough call for a vendor to decide how much is too much. However, it’s never a good idea to wait for a fix before announcing a vulnerability, because by then the enemy has almost always already won.
DO use the built-in system logging mechanism
– Yes, using syslog or Event Viewer (depending on the server’s OS) is great…even better, it makes everything run a lot smoother. There’s really no excuse not to use a system’s already-available functionality for this, especially since the built-in mechanism is usually better integrated with the system at hand.
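For what using the built-in mechanism looks like from the application side, here’s a small sketch with Python’s standard `logging.handlers.SysLogHandler`. The logger name and message format are made up; the default address below is the standard UDP syslog port, and on many Linux systems you’d pass `"/dev/log"` instead to reach the local daemon.

```python
import logging
import logging.handlers


def build_syslog_logger(name, address=("localhost", 514)):
    """Return a logger that forwards records to a syslog daemon.

    address may be a (host, port) tuple for UDP or a Unix socket
    path such as "/dev/log".
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=address)
    # Prefix each message with the program name so it is easy to grep for.
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

After that, a call like `build_syslog_logger("myapp").info("backup finished")` ends up wherever the admin has already pointed syslog, rotation and forwarding included, instead of in yet another custom log file.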
DON’T scribble all over the disk
– This seems to be more of an issue with Windows servers than with any *nix one, but it’s still a good point. *nix servers have a pre-defined directory structure; Windows seems to neglect that in order to be more user friendly, which also makes it harder to track down hidden settings when something goes wrong.
DO publish documentation electronically on your Web site
– While this is wholeheartedly true for a Windows machine, it shouldn’t be necessary for any system that uses manpages. The only advantage over manpages is when the server is in an unbootable state of some sort. But at that point, I’m sure it’s still going to be of no use unless you happen to have a LiveCD around somewhere.
All in all, the article was a good read, but in my opinion it made some mistakes in its points. Here’s a question for the readers of this blog, though…what do you feel is a do and/or don’t for making a system administrator’s job easier?