Modern Network Architecture

Jan 16 2013   8:45PM GMT

Technical failures: Are they really technical failures?

James Murray

Walking into a new client site as a Seattle IT consultant, I would find it easy to blame the last guy for all the mistakes.  I have a 6-month window to repair the failures.  I also know that I can sell replacement equipment that will last at least another 18 months before those systems fail.  So it’s easy to take potshots for a year or so, rather than identify the real problem.  I know I’ve said this before, but I believe all technical failures are actually business failures in disguise.  The reason we as IT experts don’t realize this is that we are IT experts, not business experts.

In my blog article “The biggest root cause for IT failure…” I gave an example of how putting a server on a 5-year depreciation cycle could have saved a business from a catastrophic server failure.  It would have avoided not just the failure, but also the lost productivity while the workers were fixing the problem.  We, as technicians, are all consciously aware that downtime means lost worker productivity.  The business owner realizes that lost productivity means lost profitability.  Since we don’t think in terms of profits, we often don’t think about the consequences of our behavior on the bottom line.  Since technology seems like magic to many of our managers (and sometimes our technical peers), perhaps we should understand the consequences a little better.
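To make the depreciation idea concrete, here is a minimal sketch of a straight-line schedule that flags a server once it passes its cycle.  The `Server` class, the function names, and the use of straight-line depreciation are assumptions made for illustration; only the 5-year cycle comes from the example above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Server:
    """Hypothetical asset record; field names are illustrative."""
    name: str
    in_service: date      # date the server went into production
    cost: float           # purchase cost
    cycle_years: int = 5  # depreciation cycle used in the article's example


def age_in_years(server: Server, today: date) -> float:
    """Approximate age in years, using an average year of 365.25 days."""
    return (today - server.in_service).days / 365.25


def book_value(server: Server, today: date) -> float:
    """Straight-line book value, never below zero."""
    remaining = max(0.0, 1.0 - age_in_years(server, today) / server.cycle_years)
    return round(server.cost * remaining, 2)


def past_cycle(server: Server, today: date) -> bool:
    """True once the server has reached the end of its depreciation cycle."""
    return age_in_years(server, today) >= server.cycle_years
```

A report built on `past_cycle` gives the business a replacement date it can budget for, instead of waiting for the hardware to pick the date itself.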

Statistically, 70% of all IT failures are due to human error.  The other 30% are due to hardware failure.  Usually we see the hardware failures before we put the systems into production.  Later on we see hardware failures again after the server has passed its design life.  If this is true, then for the first 5 years in production, the majority of failures are caused by human error.  Should we be condemning the IT department for these human failures?

If, as I say, all technical failures are actually business failures in disguise, then I would have to say no.  Human error is part of the risk that managers are trained to account for in all business systems.  Ultimately an IT system is actually a business system supporting a business process.  If we look at other departments, we see that there are systems for reducing or eliminating the effect of human error.

For example: In accounting, the CPA for the firm is held accountable by the controller for the organization.  Management is held accountable by the board of directors and the stockholders.  Each business system includes a check and balance to keep everyone honest.  In most business departments, if there is a failure, it’s usually because a business check was not being maintained.  In the IT department, though, these types of business checks are not always in place.

As an example: When I started, RAID arrays were monitored by the technician assigned to the server.  One of the technician’s jobs was to check the green lights on the servers.  If there was a red light, the drive had failed and needed to be replaced.  Yet after year upon year of seeing green lights, a technician might miss a red light.  Not just miss the light once, but miss it month after month, until finally the second drive in the array also failed.  Then the technician was often fired.  While this was a big mistake, was it an appropriate business process?

On a ship during a war, there was always more than one sailor searching the horizon for submarines.  Why?  Because one tired sailor could make a mistake.  We also can’t depend on radar or sonar to always be right.  Sometimes bleeding-edge technology can outfox cutting-edge technology, leaving the ship open to surprise attacks.  Our fired technician may have functioned just fine up until that moment.  I know I’ve missed a red light periodically.  Thank goodness I had more than one set of eyes looking with me.  As a result we always caught the failed drive before it became a problem.  Perhaps the real problem wasn’t that the technician missed the red light, but that there weren’t enough eyes reviewing the lights.
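The arithmetic behind "more than one set of eyes" can be sketched as follows.  This assumes every reviewer misses a red light with the same probability and that their misses are independent of one another; both are simplifying assumptions, and the function name is hypothetical.

```python
def chance_all_miss(miss_rate: float, reviewers: int) -> float:
    """Probability that every reviewer misses the same red light.

    Assumes each reviewer misses independently with probability
    `miss_rate` (an illustrative input, not a measured figure).
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    if reviewers < 1:
        raise ValueError("at least one reviewer is required")
    return miss_rate ** reviewers
```

Under those assumptions, each added reviewer multiplies the chance of an undetected failure by the miss rate, which is why a second or third check costs little and removes most of the risk carried by a single tired technician.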

Today we have automated systems doing much of the review, but is it wise to depend exclusively on those systems?  We know that systems report only what they are programmed to report.  So as part of our business process, perhaps an additional audit of the systems might identify additional risk.  When we assume that the weakest link will never fail, eventually something will fail.  Whether that system is a technical or a manual process, there will be failures.  Ultimately it’s not the technical system’s fault any more than it is the technician’s fault.  There is a business process that should catch the failure before the system actually falls over.  This is why I say… all technical failures are actually business failures in disguise.
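One form such an audit could take is a cross-check of the monitoring system against an independently kept inventory, so that a drive the monitor was never told about shows up as a finding instead of as silence.  This is only a sketch: the function name, the drive identifiers, and the 24-hour staleness threshold are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def audit_monitor_coverage(
    inventory: set[str],
    reports: dict[str, datetime],
    now: datetime,
    max_age: timedelta = timedelta(hours=24),  # assumed threshold
) -> dict[str, str]:
    """Return {drive_id: reason} for drives the monitor is not covering.

    `inventory` is the list of drives the business believes it owns;
    `reports` maps each drive to the time the monitor last reported on it.
    """
    findings: dict[str, str] = {}
    for drive in sorted(inventory):
        last_seen = reports.get(drive)
        if last_seen is None:
            findings[drive] = "never reported"
        elif now - last_seen > max_age:
            findings[drive] = "report is stale"
    return findings
```

The audit does not replace the monitor.  It checks the one thing the monitor cannot report on, which is its own blind spots.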

4  Comments on this Post

  • Maddog555
    Sorry, I don't agree. You need to know that the business owners rely on you to define, design, acquire, monitor, assess, plan and schedule the replacement of their IT systems and assets. Technology is put up to businesses like it is magic, just as you say. You are the magician - keep the magic happening, and guide the business, don't blame them. Blame poor definition, poor design, poor acquisition etc etc, not the poor dumb schmucks you just hoisted your magic onto expecting they will suddenly become both enchanted and enlightened by your brilliance.
  • TomLiotta
    "You need to know that the business owners rely on you to..."
    That is known. The point of the article is that it needs to be modified to follow sounder business practices.
    There is no reason to separate IT into a category of its own. The need for precision is no greater for an IT technician than for a CPA working on the business' federal taxes. There is no reason to hold IT technicians to a higher standard of perfection. A tax accountant can be expected to miss one, two or more possible deductions or taxes. Tax code is complex and constantly changing.
    There are business models that actually give IT control over planning and acquisition of IT-related equipment such as desktop systems for all users. But actual control over the budgeting and sufficient human resources is often held outside of IT.
    If that's the model, then accountability ought to stay directly linked with those authorities.
    However, the underlying point isn't that there are zero technical problems. The point is simply that many problems that are labelled as "technical" should be reexamined. For those that can in fact be traced to flaws in business processes, the resolution should be a business procedural change.
    A business-critical server that has long passed its expected life needs to be retired as a capital asset. If approval of costs involved in such a project is not forthcoming, the failure of the server shouldn't be viewed as a "technical" problem because it isn't. Asset management is where the business problem is.
    Tom
  • James Murray
    I appreciate your comment.  It's hard to know what you are disagreeing with, but it sounds like you are saying that the business owner should depend on you (or me) as the architect to keep the technology magical.
    Technology is like the rudder of a ship.  Management's job is to guide the ship.  The technology team's (internal or external) job is to maintain the integrity of the rudder, not "guide" the ship.  The C-level executives need to be able to define the business functions of the rudder.  The work of the rudder should be measurable, so that the IT team can be held accountable for the technology.
    In this way, the business management team can be held accountable for the failure when the ship ends up in the wrong place.  If it turns out the IT department has been "guiding" the ship in the name of maintaining "the magic," too often the ship ends up stuck on a reef.  The reef of course is catastrophic system failures, business loss or even bankruptcy.  These are failures that could have been predicted and accounted for if the management teams didn't see technology as magic.  I will have to say though, you are not alone in your feelings.  My position is a controversial one.  I am a huge proponent of ITIL and other types of open business processes in IT teams.  I appreciate your comments and your point of view.
  • James Murray
    Tom, thanks for your comment.  I think we are in agreement.  As you said, too often we take on too much ownership of technical failures, when basic business best practices could have solved the problem.  This sets an unrealistic expectation for the management team.  Now a failed server is treated as "Luck" or "Magic" that nobody can do anything about, when the business could instead set up a process that supports the planned obsolescence of the system.