I am researching the following for an all-Windows server environment: I need a toolset that will let me quickly (within a few minutes) determine the root cause of an outage AND proactively monitor the performance of all applications so we can respond to potential service interruptions.

Some pain points are:

Event log correlation: What product can help me collect and correlate the event logs from each of the 1,200 servers in my data center in a central management database, and alert me and the staff 24/7 that something is wrong?

Configuration management: Overcoming the issue where the customer changes something on the server, breaking it, and then states they did nothing to cause the outage. Since I don’t have the luxury of a fully integrated AD domain, the solution would have to work even in a workgroup environment.

Thank you,
John Ingram
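One low-tech way to approach the configuration-management pain point without AD is to fingerprint the configuration files on each server and compare them against a baseline taken while the server was known to be good. Below is a minimal Python sketch, assuming the settings you care about are available as files in one directory (for example application config files, or registry keys exported to files with reg export); the function names, directory layout, and file names are hypothetical. It only shows that something changed between two snapshots, not who changed it.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root):
    """Map each file under root (relative POSIX path) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff(baseline, current):
    """Compare two snapshots and return (added, removed, changed) file lists."""
    added = sorted(current.keys() - baseline.keys())
    removed = sorted(baseline.keys() - current.keys())
    changed = sorted(k for k in baseline.keys() & current.keys()
                     if baseline[k] != current[k])
    return added, removed, changed


def demo():
    """Simulate an undocumented change on a hypothetical config directory."""
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp)
        (cfg / "app.config").write_text("timeout=30\n")
        (cfg / "hosts").write_text("127.0.0.1 localhost\n")
        baseline = snapshot(cfg)  # taken while the server was known-good

        # The change the customer says they never made:
        (cfg / "app.config").write_text("timeout=5\n")
        (cfg / "extra.ini").write_text("debug=1\n")
        return diff(baseline, snapshot(cfg))


if __name__ == "__main__":
    print(demo())  # (['extra.ini'], [], ['app.config'])
```

In practice you would save the baseline (for example with json.dump) somewhere the customer cannot edit and run the comparison on a schedule, so the timestamp of the first mismatch tells you roughly when the change happened.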
Software/Hardware used:
HP DL servers running Windows Server 2003 and 2008
ASKED:
February 22, 2011 11:18 PM
UPDATED:
February 23, 2011 1:59 PM
Kind of a hard order to fill. If a unit is offline (disconnected from the network), chances are that no application can probe it to tell you why. Unfortunately, there are only so many preventive and reactive processes available to lessen the “human error” factor.
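Even when an offline server can’t tell you why it is down, the central side can at least notice that it has gone quiet, which pins down when the outage started. A minimal Python sketch of that check follows, assuming each server (or an agent on it) reports in periodically and you keep the last check-in time per server; the server names and the 5-minute threshold are made up for illustration.

```python
from datetime import datetime, timedelta


def silent_servers(last_seen, now, max_silence=timedelta(minutes=5)):
    """Return servers whose last check-in is older than max_silence, longest-silent first."""
    stale = [(ts, name) for name, ts in last_seen.items() if now - ts > max_silence]
    return [name for _, name in sorted(stale)]


if __name__ == "__main__":
    now = datetime(2011, 2, 23, 14, 0)
    last_seen = {  # hypothetical check-in times
        "web01": now - timedelta(minutes=1),
        "sql02": now - timedelta(minutes=12),
        "app07": now - timedelta(minutes=30),
    }
    print(silent_servers(last_seen, now))  # ['app07', 'sql02']
```

If a large group of servers goes silent at about the same moment, a shared dependency such as a switch or network link is a better first suspect than the servers themselves.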
As Saturno stated, Splunk is a great program, and most “monitoring” applications give you information “after the fact”…
I have used WhatsUp Gold when I worked at an ISP. It is good software for monitoring an entire network.
You can also use Microsoft System Center Operations Manager (SCOM) for all your Microsoft operating systems, but, as Saturno said, I haven’t yet seen anything that can find the root cause on its own, such as a network that was down.
Regards
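On the root-cause point: no tool will name the cause for you, but a central event store can at least show when many servers logged errors within the same few minutes, which points at a shared dependency such as the network rather than at each server individually. Below is a minimal Python sketch of that idea using SQLite as a stand-in for the central management database from the question. It assumes the events have already been forwarded into the table (getting them off 1,200 workgroup servers is the hard part, and is what products like Splunk or SCOM handle); the schema, the 5-minute window, the 3-server threshold, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical schema for forwarded Windows event log entries.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    server   TEXT    NOT NULL,
    ts       INTEGER NOT NULL,  -- Unix time in seconds
    level    TEXT    NOT NULL,  -- e.g. 'Error', 'Warning'
    source   TEXT    NOT NULL,
    event_id INTEGER NOT NULL
)
"""


def open_store(path=":memory:"):
    """Open (or create) the central event store."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record(conn, rows):
    """Insert (server, ts, level, source, event_id) tuples."""
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def correlated_windows(conn, window_s=300, min_servers=3):
    """Find time windows in which at least min_servers distinct servers logged errors.

    Returns (window_start, server_count, sorted_sources) tuples, oldest first.
    Assumes event source names contain no commas.
    """
    rows = conn.execute(
        """
        SELECT (ts / ?) * ? AS window_start,  -- integer division buckets ts
               COUNT(DISTINCT server),
               GROUP_CONCAT(DISTINCT source)
        FROM events
        WHERE level = 'Error'
        GROUP BY window_start
        HAVING COUNT(DISTINCT server) >= ?
        ORDER BY window_start
        """,
        (window_s, window_s, min_servers),
    ).fetchall()
    return [(start, count, sorted(sources.split(",")))
            for start, count, sources in rows]


if __name__ == "__main__":
    conn = open_store()
    record(conn, [  # made-up sample events; event IDs are placeholders
        ("web01", 1000, "Error", "Tcpip", 1),
        ("web02", 1010, "Error", "Tcpip", 1),
        ("sql01", 1100, "Error", "Service Control Manager", 2),
        ("web01", 1005, "Warning", "Tcpip", 3),
        ("app03", 2000, "Error", "Tcpip", 1),
    ])
    print(correlated_windows(conn))
    # [(900, 3, ['Service Control Manager', 'Tcpip'])]
```

Fixed windows are the simplest choice; a burst that straddles a window boundary will be split in two, so a real deployment might use sliding windows or a lower threshold.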