Posted by: ITKE
Andrew Kutz, disaster recovery, Linux Done Right, UNIX
Log files may be the most important piece of forensic information we have when determining why a server or application crashes. However, warnings of such a distaster are available to IT administrators. They just have to know where to look (hint: what do you think log files are for?)
Looking for a repeating pattern in a list one thousand items long might seem daunting, but luckily there is help. There’s no need to fear, Splunk is here.
Splunk is an amazing little web application (currently at version 3.1.3) that indexes just about any type of log file you can think of. Not only does Splunk index the information, but it presents it as a beautiful, easy-to-use, web application (purists need not worry, you can access the information from a terminal as well.) So you say, what is the big deal about searching log files? You say that you can do that with grep. That is true, but Splunk is hundreds of times more powerful and excels in four areas:
Splunk can index logs from a number of sources:
- Files and directories
- FIFO queues (pipes)
- Network ports (syslogging directly to Splunk)
Splunk enables you to tail log files, the contents of entire directories, pipes, and even open ports for applications to send their logs directly to Splunk itself (although I recommend using a separate syslog server in order to maintain a file-based log rotation history.)
Also, Splunk is more physically appealing than grep (no offense, grep). To give you an idea of what data looks like in Splunk take a gander at this screenshot:
Looking at log files has never been so much fun!
This is where Splunk really outshines its command line competition. Imagine you wanted to comb your log files to figure out which VM has had the most number of VMotion events in your VMware Infrastructure? With Splunk that is as easy as pie — a pie chart, that is:
Splunk allows you to easily query the data using SQL in order to build complex analysis reports. And if that was not enough…
Splunk not only allows administrators to easily determine the goings-on of their servers through log file analysis, Splunk also allows administrators to share their logs with the rest of the Splunk community. Imagine this scenario: a major website’s web servers are crashing and the website’s administrators cannot figure out why. As an interner business, their primary point-of-sale is the web; so if their web servers go offline that is very bad. The administrators are pulling out their hair trying to figure out the problem when one of them realizes they haven’t checked Splunk. Because the administrators at Amazon are participating in SplunkBase they can analyze not only their log files but also the logs of anyone else who uploads logs to Splunk’s community. Bingo! They discover that the problem was a lock that was not getting destroyed.
By themselves, the administrators did not have a large enough data set to determine the problem, but because others had generated similar logs and figured out the problem already, the website admins were able to quickly resolve the issue.
I’ll say it again, Splunk is great. Apart from VMware Server, Splunk may be my favorite server application to come along in the past few years. I cannot imagine running an enterprise data center without Splunk. See you on SplunkBase!