April 9, 2009 3:16 AM
Posted by: Derek Kuhr
Over the past few months, I have had several monitoring tool discussions with various clients. I am going to pull together some of my thoughts, experiences, and discoveries to walk through a framework for identifying monitoring needs and objectives. Hopefully this can help you think a bit out of the box and focus on the customer needs rather than a set of feature/function requirements.
Step 1 – Define Business Objectives
Before diving into tools and features, it is important to build a prioritized list of business pains or objectives that will guide the monitoring requirements. In many cases, these objectives share common elements or are even the same from business to business. Yet every business has some unique aspects that need to be addressed. Often, this means stepping back from “monitoring” and looking at what concerns or pains started the whole discussion. Examples of common business requirements may be:
- Contracts require timely responses to customer requests
- Sales are lost if systems are not available
- Decrease the amount of time users spend waiting for systems to respond
- Prevent unplanned system outages
- Enforce acceptable use policies
- Provide legal discovery capabilities
- Provide supporting information for employee termination
Step 2 – Convert Business Objectives to Technical Requirements
These concerns or requirements can then be translated into a combination of technical requirements. You need to look into the “who, what, when, where, why, and how” of each concern to really understand it and all the components that are involved. For example, let us look at the first objective from above. You can use this list of questions or areas of discussion to go deep and get past the geek speak.
Who
1. Who accesses the application/system?
a. Internal users
b. External users
c. Clients
d. Business partners
e. Support staff
2. Who is the customer?
a. Industry
b. Location
i. Geographic
ii. Time zone
c. Financial impact on business
i. % of gross sales
ii. $ Gross sales
iii. Contribution towards net profit
iv. Overall impact of loss of customer (out of business, profit loss, etc)
d. Technical capabilities
e. Special requirements/needs
f. Overall number of users
g. Average concurrent users
h. Proximity to business
i. Relationship status (key, new, old/longstanding, family, parent/child corporation, etc)
3. Who is responsible for the various systems involved/impacted?
4. Who is responsible for maintaining the various systems involved/impacted?
5. Who is responsible for handling issues with the systems?
6. Who is responsible for handling issues with the customer?
7. Who at the customer reports issues?
What
1. What is the purpose of this system(s)/application(s)?
2. What defines a customer request?
a. Process of request
b. Inputs provided
c. Outputs expected
d. People involved
e. Primary data sources
f. Supporting data sources
3. What defines timely?
a. Time to response available threshold(s)
b. Time to response generated threshold(s)
c. Time to response transmitted to customer threshold(s)
d. Time to log in threshold(s)
e. Time to wait on hold (if phone involved) threshold(s)
f. Response times between system components
g. Other timing components as can be determined by looking at the process.
4. What are the various systems involved/impacted?
a. Switches, Routers, Wireless, WAN, Internet, or other network components
b. Application Services, Databases, DNS, or other services
c. Power, cooling, and other environmental controls
d. Firewalls, IPS, Web Filters, Proxies, or other security services/equipment
5. What is the impact of the outage?
a. Financial
i. Fines
ii. Fees
iii. Contract lost
iv. Refund
v. Other
b. Regulatory
i. Legal action
ii. Civil action
iii. Compliance
iv. Certifications/Authorizations/Registrations
c. Customer loses business
d. Customer retention
e. Bad PR
f. Public Outrage
g. Direct staff terminations (disciplinary action)
h. Layoffs
i. Other
When
1. When does the customer use the system?
a. 24×7?
b. Peak times
c. Off times (holidays)
d. Do they have their own customer waiting on them?
2. When can you perform maintenance?
a. Available 24×7, except during defined maintenance windows
b. When are regular non-outage maintenance windows
c. When are regular short outage maintenance windows
d. Is there a regular longer outage maintenance window
e. Is there a time of year when things are “slow” or less busy
3. When does the contract go into effect, or when did it go into effect?
4. When does the contract get renewed/re-evaluated/renegotiated?
Where
1. Where are systems located?
a. Centralized/Decentralized
b. Data Center/Business Office
c. Basement/Utility Closet/Server Room/Hardened Server Room
2. Where are internal users located?
a. Domestic
b. International
c. Business Office
d. Branch Offices
e. Data Center
f. Network Operations Center
g. Home
h. Mobile
i. PDAs
ii. Laptops
iii. Kiosks
iv. Hotels
v. Airports
vi. Coffee shops
vii. Customer sites
viii. Others
i. Other
3. Where are the customers located?
a. Domestic
b. International
c. Single Site
d. Multiple Sites
e. Other
4. Where are customer users located?
a. Domestic
b. International
c. Business Office
d. Branch Offices
e. Data Center
f. Network Operations Center
g. Home
h. Mobile
i. PDAs
ii. Laptops
iii. Kiosks
iv. Hotels
v. Airports
vi. Coffee shops
vii. Customer sites
viii. Others
Why
1. Why are systems located where they are?
a. Always been there
b. Disaster Recovery / Business Continuity
c. Budget/Expenses
d. Compliance
e. Other
2. Why are responsibilities assigned to individuals/groups?
a. Historical (individual always been responsible)
b. Subject matter expert
c. Job Descriptions
d. Special skills
e. Business unit/organizational hierarchy
3. Why are systems configured as they are?
a. Always been this way
b. Best practice recommendations
c. Consultant recommendation
d. Application vendor requirements
e. Required to be supported by vendor
f. Disaster Recovery / Business Continuity
g. Budget/Expenses
h. Compliance
i. Other
How
1. How do customers get to the system/application?
a. Dedicated circuits/connectivity
b. Onsite with system/application
c. Internet based
2. How is the system/application secured?
a. Username/Password
b. Token
c. One-time-password
d. Smartcard
e. SSL
f. IPSec
g. Firewall
h. IPS
i. IDS
j. Firewall ports opened
k. NAT
l. Proxy
m. Other
3. How is the application accessed?
a. Web app (HTTP/HTTPS)
b. Fat app (Installed on customer system)
c. Thin client (RDP/Citrix/etc)
d. Other
Now that you have gathered all this wonderful data, you should have a better understanding of the business, the application, and the customer as related to this particular application or system. In reality, I would probably ask even more questions based on the responses, and I would try to keep it conversational rather than run it as a survey, but I can only cram so much into one post. It may have been a painful and time-consuming process depending on the scenario, but the customer will likely appreciate that you have taken the time for a thorough evaluation and will be looking forward to your recommendations. Also, you can reuse the majority of these questions on the other business goals as well. Yes, it’s rinse and repeat time. This was just the first business objective.
From the responses, you should be able to get a better idea of what is truly important versus what is merely assumed to be important or is important only from an individual perspective. This allows you to then start identifying the key SLAs and target monitoring solutions to those SLAs.
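As a rough illustration of turning the “What defines timely?” answers into something a monitoring tool can check, here is a minimal Python sketch. The threshold names, the limits, and the sample measurements are all hypothetical; the real values would come from the customer discussion, and a real monitoring product would have its own way of expressing them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaTarget:
    """One timing threshold taken from the 'What defines timely?' answers."""
    name: str
    max_seconds: float


def find_breaches(targets, measurements):
    """Return (name, measured, limit) for every target that was exceeded.

    Targets with no measurement are skipped in this sketch; a real tool
    would probably flag them as unmonitored instead.
    """
    breaches = []
    for target in targets:
        measured = measurements.get(target.name)
        if measured is not None and measured > target.max_seconds:
            breaches.append((target.name, measured, target.max_seconds))
    return breaches


# Hypothetical thresholds and sample measurements, for illustration only.
targets = [
    SlaTarget("time_to_log_in", 5.0),
    SlaTarget("time_to_response_generated", 2.0),
    SlaTarget("time_on_hold", 120.0),
]
measurements = {
    "time_to_log_in": 3.2,
    "time_to_response_generated": 4.5,
}

for name, measured, limit in find_breaches(targets, measurements):
    print(f"{name}: measured {measured}s, limit {limit}s")
```

Keeping the thresholds as plain data makes it easier to review them with the customer and to repeat the exercise for the next business objective.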
Lessons learned
· Always start with business objectives, and then break them down to discover all the components behind each objective that need to be fulfilled.
· Ask questions rather than jump to a solution, or you risk overlooking important details.
· A good rule of thumb is to try asking three questions for every question a customer/client asks you.
Until next time, keep on learning and asking questions!
April 3, 2009 4:24 AM
Posted by: Derek Kuhr
Every journey begins with a first step. With this post, I begin a new journey of self-reflection and discovery. Going forward, I intend to regularly share experiences from working with clients and other IT pros. I’ll include a healthy amount of technical information, along with the design decisions driving the architecture and overall solution discussed. I also hope to step beyond the geek speak at times to help bridge the gap between technical and business goals/objectives, as they don’t always line up directly.
My hope is that what I share can help others if they encounter similar scenarios or issues. I will never claim to be the expert at everything, and I will call out others who have helped out along the way. It’s important to realize that behind every successful individual, there is usually a team that has played a part in the success, whether directly or indirectly. In many cases, individual success reflects the success of the supporting team as well.
For an initial thought to consider, how important is security-in-depth within a network architecture?
Today I presented an executive overview of an edge network restructuring project to a customer to finally wrap up the project. At the end of the presentation, the CIO openly stated that he did not care for the security services between network zones/segments. In this case, it is a higher education customer, with student, internal, and other zones. We had created zones on the firewall (SonicWall NSA e7500) to segment the traffic and provide UTM to prevent students from hacking the internal network. SonicWall Unified Threat Management includes firewall-integrated IPS, AV, anti-spyware, content filtering, and traffic policies; Wikipedia has a more general description of UTM. This was all described in the original project scope as approved 6 months ago, and the firewall has been in place for over 3 months.
Some more discussion uncovered that they were concerned about bottlenecks that a 3rd party network analysis had reported, in particular regarding the new firewall. The architecture the 3rd party recommended was to have a core switch do the routing between segments instead of the firewall. In a traditional corporate network, I would likely agree. In this case though, there are real security concerns about using basic ACLs on the core switch to control the traffic flow. With UTM monitoring the traffic allowed between zones, application access can be granted, yet traffic is first scanned to verify it is not malicious.
Additionally, this firewall is sized to handle the production traffic, which basic network link reporting backed up. In reality, the existing Cisco switches in the edge network had only 100 Mbps interfaces, while the firewall is running with 1 Gbps interfaces. There were times when monitoring showed the switch interfaces to the firewall maxing out, so the switches were the bottleneck, not the firewall.
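To make the bottleneck argument concrete, here is a small Python sketch of the kind of link utilization arithmetic that basic interface reporting is built on. It assumes the slower switch ports run at 100 Mbps and the firewall interfaces at 1 Gbps, and the traffic volume and polling interval are made-up numbers for illustration, not data from this customer.

```python
def utilization_percent(bytes_start, bytes_end, interval_seconds, link_bps):
    """Average utilization of a link over one polling interval.

    bytes_start and bytes_end are two readings of an interface byte
    counter (counter wrap is not handled in this sketch).
    """
    if interval_seconds <= 0 or link_bps <= 0:
        raise ValueError("interval_seconds and link_bps must be positive")
    bits_transferred = (bytes_end - bytes_start) * 8
    return 100.0 * bits_transferred / (interval_seconds * link_bps)


# Hypothetical sample: 3.6 GB moved during a 5-minute polling interval.
bytes_moved = 3_600_000_000
interval = 300

# The same traffic measured against each assumed link speed.
switch_port = utilization_percent(0, bytes_moved, interval, 100_000_000)
firewall_port = utilization_percent(0, bytes_moved, interval, 1_000_000_000)

print(f"100 Mbps switch port: {switch_port:.1f}% utilized")
print(f"1 Gbps firewall port: {firewall_port:.1f}% utilized")
```

The same volume of traffic that nearly fills the slower port leaves the faster one mostly idle, which is why the link speed on both sides of a suspected bottleneck needs to be checked before blaming either device.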
Finally, some of the 3rd party testing results were inconsistent (I question their test point placement), and some of the recommendations were made based on assumptions regarding firewall performance. From the recommendations, I suspect the 3rd party is not familiar with the actual equipment installed, since they were recommending a competitive platform.
Another point of customer concern, and one I shared, was a traffic shaper that had been put in but was not performing properly and was causing network interruptions. This unit had been down several times over the past few months; it happened to go down again yesterday and was powered off, since it was still causing slowness when in bypass mode. Basically, the customer said that they cannot handle more troubleshooting in the production network, and either it had to go or we did. Also, they expressed concern about their primary LOB vendor not being involved in the shaper deployment, as the vendor had said it could not provide support for the application if there was a shaper on the network. This should have been addressed during the pre-sales cycle, and personally, I think that if this was a concern, they should have brought it up months ago before signing onto the project. We said we would have a hard discussion with the manufacturer in the morning, and moved on…
At this point, the discussion shifted to the customer needing a proposal for an internal network architecture review and planning session by 5pm tomorrow (less than 24 hours)… This was something I’d already discussed with various members of the technical team, and apparently staff had been having separate discussions at a management level. Interestingly, the two paths crossed, and within a 15-minute span we shifted from defending the architecture to an opportunity to provide architecture recommendations for the rest of the campus network. Amazing how an open discussion and willingness to listen to the customer can change the tone.
Lessons learned
- Make certain that proposals are approved beyond middle-management and meet C-level expectations and business requirements. Technical features are not always valued by management unless they directly solve a business issue. For every technical requirement in a proposal, there needs to be a corresponding business requirement to clearly link the value proposition.
- Be open-minded and willing to listen to additional concepts or architectural options. UTM is a relatively new concept, and it has not received the blessing of all vendors (primarily due to performance impact if not backed by solid hardware). This has led to customer confusion and a secondary post-install sales cycle to convince the customer that the features are beneficial…
- Wrap everything with monitoring so the customer can tell what is going on and be able to respond appropriately. In this case, the customer hadn’t fully discovered all the monitoring capabilities, thus was not able to validate/invalidate the 3rd party reports of firewall overloading. The monitoring also provides the benefit of being able to validate your own design/architecture.
- Learn about competitive products and alternative architectures. There are times you may need to work with them, and there are other times you will need to be able to clearly distinguish between the architectures in a competitive situation. Just saying your way is better doesn’t cut it; you need to be able to build a point-by-point case. If you can’t, then maybe you aren’t offering the best solution… The concept of walking away from an opportunity because the competition has a better solution will be an interesting discussion for another day…
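The first lesson above, that every technical requirement in a proposal needs a corresponding business requirement, can be checked mechanically before a proposal goes out. The Python sketch below is one simple way to do that; the requirement names and their mappings are hypothetical examples, not the contents of the actual proposal.

```python
def unlinked_requirements(technical_requirements):
    """Return names of technical requirements with no business objective."""
    return [
        name
        for name, objectives in technical_requirements.items()
        if not objectives
    ]


# Hypothetical proposal contents, for illustration only.
proposal = {
    "UTM inspection between zones": ["Prevent unplanned system outages"],
    "Traffic shaping at the edge": [],
    "Interface utilization monitoring": [
        "Decrease the amount of time users spend waiting for systems to respond",
    ],
}

for name in unlinked_requirements(proposal):
    print(f"No business objective linked to: {name}")
```

Anything this turns up is a feature that management may not value, and a good candidate for a conversation before the project is signed rather than after the install.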
Another day gone
I hope this has been a useful post. Look for more soon. Until then, keep on learning!