Data center incident or disaster?

25 pts.
Tags:
DataCenter
Disaster Recovery
Incident response
I am trying to develop the process which my company will follow in the event of a data center disaster.

Part of that process is to question whether a major incident has the characteristics of a true disaster and whether a disaster should be declared (which would kick off our recovery processes).

Can anyone advise on questions or things to consider when evaluating the impact, risk, and exposure of a major data center incident?

Thanks for any help!

Answer Wiki


Problem management, which is closely related to incident management, analyzes the causes of incidents and identifies trends so that solutions to reduce the volume of incidents in the future may be developed.

How much time does data center staff spend on incident and problem management? Our partner, Metrics Based Assessments, recently did an analysis, based on data collected from hundreds of data center benchmark studies, to specifically identify the percentage of data center staff time spent on incident and problem management. The study analyzed these activities in the following categories:

Incident Management Level 1: first level resolution, such as calls to the help desk.

Incident Management Level 2-3: actual resolution of production problems.

Problem Management: identifying underlying problems behind incidents, including root cause and trend analysis.

This is definitely what you'll need to start developing the processes you need. These are the pillars your processes will be based on.

Discuss This Question: 4  Replies

 
  • Pammyp33
    Thanks Bitraptor. If the Incident/Problem management teams determine that production will be affected for greater than 4 hours, we must start our disaster recovery processes as we will take a good 24 hours to recover to our hotsite and begin to take on production traffic again. So what I'm really after is some kind of criteria that an incident has reached, or is presumed to be on target to reach, which will trigger a disaster declaration.
  • Meandyou
    My place uses an estimated 24 hours of outage as the determination of a disaster. We do not have a "hot site" for fail over; we have to go to a cold site and reconstruct our systems. During D/R testing it takes 24 hours to build our "tier one" (critical) systems. While I understand trying to develop good procedures and plans, I personally do not get too bogged down in the semantics of incident vs disaster. It isn't the "word", it is the "outcome". After all, geologists refer to a volcanic eruption as an "event."
  • Sunsetrider
    We use a multi-pronged approach to defining a disaster. It involves an estimate of downtime, application priority, impact size, hydro power and hardware availability. If our online access is down for more than 30 minutes, or if a high priority application (inventory management) is experiencing problems, or our email system crashes (affects hundreds of users), or we lose electrical power for more than 15 minutes, or if we don't have a duplicate replacement piece of equipment that can be installed within 4 hours, we quickly communicate with our disaster co-ordination team, discuss options and implement the appropriate recovery plan. This communication process usually involves about 3 - 5 people and is finished within 10 - 20 minutes. We have pre-defined issues that will cause us pain, along with solutions. As we encounter issues not covered by our solutions, we update our processes accordingly. As you can see, this is a never-ending process. We also contact our senior and departmental management to inform them of our decisions/progress.
  • Pammyp33
    This is a really good discussion, everyone! I really appreciate the responses thus far...
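
For anyone who wants to turn the criteria from this thread into something checkable, here is a minimal sketch in Python. It is only an illustration: the class, field, and function names are all made up for this example, and the thresholds are simply the ones mentioned above (Sunsetrider's 30-minute, 15-minute and 4-hour triggers, and Pammyp33's 4-hour production impact). Your own numbers would come from your recovery time objectives.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IncidentStatus:
    """Snapshot of a major incident. All field names are hypothetical."""
    online_access_down_min: float = 0.0
    high_priority_app_impaired: bool = False
    email_down: bool = False
    power_lost_min: float = 0.0
    # Estimated hours to install a replacement; None means no hardware failed.
    hardware_replacement_hours: Optional[float] = None
    # Current best estimate of how long production will be affected.
    estimated_outage_hours: float = 0.0


def escalation_triggers(s: IncidentStatus) -> List[str]:
    """List the reasons to convene the disaster co-ordination team."""
    reasons = []
    if s.online_access_down_min > 30:
        reasons.append("online access down more than 30 minutes")
    if s.high_priority_app_impaired:
        reasons.append("high priority application impaired")
    if s.email_down:
        reasons.append("email system down")
    if s.power_lost_min > 15:
        reasons.append("power lost for more than 15 minutes")
    if s.hardware_replacement_hours is not None and s.hardware_replacement_hours > 4:
        reasons.append("no replacement hardware installable within 4 hours")
    return reasons


def should_declare_disaster(s: IncidentStatus, declare_after_hours: float = 4.0) -> bool:
    """Declare once the estimated production impact exceeds the threshold."""
    return s.estimated_outage_hours > declare_after_hours


if __name__ == "__main__":
    status = IncidentStatus(online_access_down_min=45, estimated_outage_hours=6)
    print(escalation_triggers(status))
    print(should_declare_disaster(status))                          # 4-hour threshold
    print(should_declare_disaster(status, declare_after_hours=24))  # cold-site style threshold
```

Keeping the "convene the team" check separate from the "declare" check mirrors the discussion: several small triggers start the conversation, while the declaration itself hinges on the estimated outage compared with how long your recovery takes.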
