Last week, Eric Bush, Data Center Operations Supervisor at hosting giant The Planet, posted a checklist for data center perimeter patrols.
The perimeter patrol is an integral part of our data center operations designed so our staff can constantly monitor and control the data center’s status and its operational readiness. Each patrol takes around an hour to perform and is a top-to-bottom inspection of our facility and server environment.
The list provides a great guideline for ensuring your data center’s health.
Link VIA Rich Miller’s Twitter feed.
The Green Grid has another tool to help companies measure data center efficiency, one the group says could be a valuable way to communicate the state of the data center to upper management.
The IT efficiency-focused group has published a new paper on the “productivity indicator.” Christian Belady, the principal power and cooling architect at Microsoft who was the driving force behind the PUE/DCIE metric, edited the paper and said it should be used as “a communication tool” between various members of a company – IT workers, data center facility folks and company executives.
“What this does is give you a quick visual of how you’re doing, especially if you’re communicating up to executives,” he said.
The paper suggests building a radial graph with five “spines,” with each spine representing a metric:
- Server utilization: The activity of the server processors relative to their maximum capability in the highest frequency state.
- Data center utilization: The amount of power drawn by the IT equipment relative to the actual capacity of the data center.
- Network utilization: Percentage of bandwidth used compared to bandwidth capacity.
- Storage utilization: The percentage of storage used compared to the overall storage capacity.
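Each spine boils down to a simple used-over-capacity ratio. Here is a minimal sketch of computing the spine values; the readings below are invented placeholders, not numbers from the paper:

```python
# Hypothetical raw readings for each spine (units noted per metric).
raw = {
    "server":      {"used": 310.0, "capacity": 1000.0},  # CPU work vs. max at highest frequency
    "data_center": {"used": 680.0, "capacity": 1000.0},  # IT power draw (kW) vs. actual capacity (kW)
    "network":     {"used": 2.1,   "capacity": 10.0},    # bandwidth used vs. provisioned (Gbps)
    "storage":     {"used": 45.0,  "capacity": 60.0},    # storage used vs. total (TB)
}

def utilization(used, capacity):
    """Each spine is used / capacity, expressed as a percentage."""
    return 100.0 * used / capacity

spines = {name: utilization(v["used"], v["capacity"]) for name, v in raw.items()}
for name, pct in spines.items():
    print(f"{name:12s} {pct:5.1f}%")
```

Plot those percentages, one per radial spine, and you have the indicator; fewer metrics simply mean fewer spines.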
The paper doesn’t say how to come up with each of these numbers, but tools and software exist to gather the data points for each of them (see the definition of storage utilization, for example). And if you are still trying to figure out how to measure one of them, such as network utilization, you can still plot a productivity indicator graph, just with fewer spines. Here is a sample picture of a productivity indicator radial graph:
Belady and John Tuccillo, a Green Grid member from APC, said businesses can also add target lines if they want, such as targets for six months, a year and 18 months out. Companies can use it for whatever data center metrics they’re actually using, so if it’s not a pentagon, it might be a square or a triangle with four or three data points, respectively. In which case it might look like this:
Companies can also break down one of those categories, such as data center utilization, into a more detailed radial graph all its own, such as this:
They emphasized that this isn’t something that companies should use to compare to other companies. Instead, it’s a way for businesses to realize their existing energy situations and set target goals for themselves down the road.
“Different companies have different risk thresholds. A business may say, ‘You know what? My storage utilization, because of my business plan, should only be at 50%,'” Tuccillo said. “One of the strengths of this tool is that it allows for the end user to weigh the spines to what their business practice is.”
Jeremy Porter, Senior Internet Data Center Architect at data center and managed services provider Core NAP, has developed a very low-cost thermal monitoring system for the company’s newest data centers. Porter has bussed a system of low-voltage thermal sensors together over Cat5 cable. The monitors report back to a database that can map data center temperatures in real time. Porter plans to be able to put multiple monitors in cabinets, under floors and in the cable runs above the racks. The sensors from Maxim IC report to USB readers plugged into Linux hosts. The hosts log data to a local Web server, and Core NAP plans to combine that info with Visio maps of the data center.
“We bid out the price to buy some of these thermal mapping products,” Porter said. “The systems start around $1,000 and cost around $100 per sensor. We’re able to deploy our system for well under $25 per sensor, including bus and reader. The software is fully supported in the Linux kernel so we don’t have to write any drivers. When I told management how much it would cost it didn’t take me long to get them to fund the project.”
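Core NAP hasn’t published its code, but the Linux kernel’s w1_therm driver exposes Maxim/Dallas 1-Wire temperature sensors as plain sysfs files, which is why no custom drivers were needed. A minimal polling sketch (the paths assume DS18B20-family sensors, family code 28, on the standard w1 bus):

```python
import glob

# Each DS18B20-family sensor appears as a directory like 28-0000051f4e7a.
W1_GLOB = "/sys/bus/w1/devices/28-*/w1_slave"

def read_sensor(path):
    """Parse one w1_slave file: line 1 ends in YES/NO (the kernel's CRC check),
    line 2 ends in t=<millidegrees Celsius>."""
    with open(path) as f:
        crc_line, data_line = f.read().splitlines()
    if not crc_line.endswith("YES"):  # bad CRC, discard the reading
        return None
    return int(data_line.split("t=")[1]) / 1000.0

def poll_all():
    """Map sensor ID -> temperature in Celsius for every attached sensor."""
    return {p.split("/")[-2]: read_sensor(p) for p in glob.glob(W1_GLOB)}
```

A loop or cron job could log `poll_all()` output to the database that feeds the real-time temperature map.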
Porter says Core NAP customers are interested in high-density server configurations, and modern blade servers can throw off hot-cold aisle set ups, so thermal mapping is critical to staying on top of customers’ density demands.
Yesterday I blogged about the Uptime Institute’s criticisms of PUE, specifically that the Green Grid metrics whitepaper does not explicitly discuss the need to gather PUE data over time. Microsoft’s Christian Belady (one of the key developers of the PUE metric) responded with the following:
“Ken [Brill of The Uptime Institute] has some valid points; clearly there needs to be more clarity and refinement in the definition to make it a rock-solid benchmark. PUE is a ‘living metric’ that the industry, and in particular the Green Grid, is working on. But with all of the issues that are in the process of being resolved, here are three basic facts:
1) All metrics can and will be gamed regardless of the crispness of their definition. Show me a metric and I can come up with a way to game it.
2) Companies that are measuring PUE are improving their PUE over time. So they are improving their efficiency.
3) Companies that are not measuring are likely not improving. So these companies will be at a competitive disadvantage.
Microsoft is a company that is measuring (since 2004) and improving our PUE benchmarking against ourselves. There is no reason any other company cannot do this. Comparison with other companies is useful but less important to us as long as we demonstrate continuous improvement in our PUE. We hope that the issues people have with PUE for external benchmarking will be cleaned up in time but we do not plan on waiting until then for continuous improvement in our own operations.”
Here are a few key points to consider in the ongoing evolution of PUE:
Gaming PUE is going to happen
A lot of data center providers have included PUE ratios in press releases lately, many of them incredibly low. Rich Miller at Data Center Knowledge says he’s seen it before. “That’s pretty much what happened with the Uptime Tier System, which set forth a four-tier rating system for data center reliability. Data centers began describing themselves as equivalent to ‘tier three-plus’ or even ‘tier five.'”
PUE will need to evolve into a dynamic quality control metric
Dave Ohara at GreenM3 has a great explanation of how data center pros should use PUE in a dynamic way. “What helped me to think of PUE as a dynamic number is to think of it as a quality control metric. The quality of the electrical and mechanical systems and their operations over time are inputs into PUE. As load changes and servers are turned off, the variability of the power and cooling systems influences your PUE. So, PUE can now have a statistical range of operation given the conditions. This sounds familiar. It’s statistical process control.”
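Ohara’s statistical process control framing translates directly into control-chart arithmetic. A rough sketch, with invented hourly readings standing in for real metered data:

```python
from statistics import mean, stdev

def control_limits(readings, sigmas=3):
    """Classic X-bar limits: mean +/- sigmas * sample standard deviation."""
    mu, sd = mean(readings), stdev(readings)
    return mu - sigmas * sd, mu + sigmas * sd

def out_of_control(readings, limits):
    """Return (index, value) for every reading outside the limits."""
    lo, hi = limits
    return [(i, r) for i, r in enumerate(readings) if not lo <= r <= hi]

# Hypothetical hourly PUE samples; the 1.90 spike might be a failed economizer.
hourly_pue = [1.52, 1.50, 1.55, 1.49, 1.53, 1.51, 1.90, 1.52, 1.50]
baseline = hourly_pue[:6]  # establish limits from a known-good period
flags = out_of_control(hourly_pue, control_limits(baseline))
```

For these numbers only the 1.90 spike is flagged; normal load-driven wiggle stays inside the limits, which is exactly the “statistical range of operation” Ohara describes.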
Standards and training needed on how and when to measure PUE
Data center managers getting started with a PUE measurement program need some guidance — where, when and how do you take the most meaningful measurements? Microsoft’s Mike Manos and Belady have put together an excellent PUE Strategy post on their blog, The Power of Software. This checklist takes PUE newbies from measuring by walking around with a clipboard to data center chargeback. The Uptime Institute’s Pitt Turner has a great webcast on how to measure PUE on UPS and PDU equipment. The next step will be to get everybody doing this the same way — which is where ASHRAE TC 9.9 comes in. The organization supports PUE and in November 2007 announced plans to develop a publication that would standardize PUE measurement methodology, but there has been no word so far on the progress of that project.
Uptime Institute executive director Ken Brill warned panelists at an online seminar today to be wary of very low Power Usage Effectiveness (PUE) ratios touted by some data center operators. “If your management begins to benchmark you against someone else’s data center PUE, you need to be sure what you’re benchmarking against,” Brill said.
Brill said he’s seen companies talking about a PUE of 0.8 — which is physically impossible. “There is a lot of competitive manipulation and gaming going on,” Brill said. “Our network members are tired of being called in by management to explain why someone has a better PUE than they do.”
If you’re going to compare your PUE against another company, you need to know what the measurement means. “You need to know what they’re saying and what they’re not saying,” Brill said. “Are you going to include the lights and humidification system? If you’re using free cooling six months of the year, do you report your best PUE?”
Brill conceded that The Green Grid’s PUE whitepaper has gained traction in the industry, spurring more action and debate than any other efficiency effort so far. But Brill takes issue with the measurement’s use of the term “power”. According to Brill, the fundamental problem with PUE is that it’s a snapshot in time. Power, by definition, is a spot measurement, Brill said, while power over time is energy. So power is measured in kilowatts; energy is measured in kilowatt-hours.
Proponents of PUE like Microsoft’s Christian Belady have advocated measuring PUE over time, but Brill said that is not expressed explicitly in the standard.
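The gap between the two views is easy to see in a toy calculation. The samples below are invented, but they show how a single flattering spot reading diverges from an energy-weighted figure:

```python
# Hypothetical (facility_kW, IT_kW) samples taken at equal intervals over a day.
samples = [(900, 500), (1100, 520), (1250, 540), (950, 510)]

# Spot PUE treats each sample alone; reporting only the best one flatters the site.
spot_pues = [fac / it for fac, it in samples]
best_spot = min(spot_pues)  # likely a cool night with free cooling

# Energy-based PUE: total facility kWh over total IT kWh.
# With equal intervals the interval length cancels, so plain sums suffice.
energy_pue = sum(fac for fac, _ in samples) / sum(it for _, it in samples)

print(f"best spot PUE: {best_spot:.2f}, energy PUE: {energy_pue:.2f}")
```

For these numbers the best spot reading is 1.80 while the energy-weighted PUE comes out near 2.03 — exactly the kind of gap Brill is warning about when operators report only their best snapshot.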
I think it’s a bit of a stretch to assume C-Level execs are even aware of PUE (let alone calling data center staff out on the carpet about it).
I recently wrote an article about a data center manager who made huge efficiency improvements at a massive facility, saving hundreds of thousands of dollars through engineering projects. I asked him what his CIO thought about the data center efficiency he was achieving, and he told me the CIO had no idea. He’d never actually met the CIO…
Nonetheless, Brill makes a very important point. The first goal of PUE is to establish a ratio to improve on internally. But the larger goal is to use the metric to compare data centers — as a benchmark against competitors, or as a way to compare various configurations, geographic locations and technologies. Without standardization, comparative measurements will be meaningless.
Are your executives measuring you against competitors’ PUE? We’d like to hear from you.
There has been a significant slump in the IT job market recently. InfoWorld (via Slashdot) is reporting nearly 50,000 IT jobs lost in the last year. The Wall Street Journal, Gartner and others are all reporting that tech spending is slowing, and a slumping economy means less hiring. Systems and network admins, helpdesk employees and other IT workers are reporting hard times in the U.S. job market.
“We have not seen extreme measures being taken by IT organizations, such as hiring freezes, but we do expect to see enterprises take a more conservative and ‘wait-and-see’ approach to staffing for the rest of 2008,” said Gartner research vice president Lily Mok in a recent report.
Nonetheless, data center facility manager jobs are still in high demand. The New York Times reported on it recently, and I’m still getting emails from Google’s recruiters asking if I know anybody looking for a job with the Google facility engineering team.
Equinix, the large data center hosting company, announced today that LinkedIn, an online social network for professionals, is expanding its presence with Equinix by renting space in one of the company’s Chicago data centers.
LinkedIn already rents space in two of Equinix’s Silicon Valley area locations. The release didn’t say how much space LinkedIn would be renting in Chicago, or which Chicago data center it would be in. Equinix now has three data centers totaling about 500,000 square feet of space in the Chicago area.
Equinix’s newest location in the Chicago area is located in a northwestern suburb called Elk Grove Village. We went on a video tour of that Equinix facility earlier this year.
Internet search company Yahoo partnered with wireless data center monitoring startup SynapSense to tune its hot-aisle/cold-aisle configuration, implementing cold-aisle containment strategies and raising the inlet air temperatures on servers.
The Web giant reduced data center cooling energy use 21% and lowered its PUE (Power Usage Effectiveness) ratio from 1.52 to 1.44. According to a presentation given by Christina Page of Yahoo and Troy Mitchell of SynapSense at the recent Silicon Valley Leadership Group Data Center Energy Summit in San Francisco, the project will save Yahoo $563,000 annually on its data center energy bill.
The study took place in an 8,000-square-foot room within Yahoo’s 40,000-square-foot data center in Santa Clara, Calif. The room featured a hot-aisle/cold-aisle configuration, a 3-foot raised-floor plenum, 12-foot ceilings, seven computer room air handlers, and four PDUs.
Folsom, Calif.-based SynapSense conducted the optimization of Yahoo’s cooling systems as a high-profile proof-of-concept project.
SynapSense builds battery-powered monitors to track data center environmental conditions. The monitors use low-power 2.4 GHz wireless to communicate that data to a server. SynapSense’s software synthesizes the information and displays it as a live image, which allows data center managers to look at real-time maps of their data center and view air pressure distribution, humidity and temperature.
The wireless monitors measure the temperature at the inlet and discharge of the racks, the temperature at the inlet and discharge of the Computer Room Air Conditioner (CRAC) units, humidity at the CRAC units, and sub-floor air pressure.
Experts say data centers often supply up to three times the cooling that the servers actually need, with the majority of that air wasted. To make the cooling at Yahoo more efficient, SynapSense isolated the cold aisles, slowed fan speeds, and raised the inlet air supply temperature to the servers to 72 degrees.
The slides below are from Yahoo and SynapSense’s presentation at the SVLG event. Slide one shows Yahoo’s cold-aisle containment system. Slide two shows SynapSense’s LiveImaging feature. Slide three shows where SynapSense places the sensors in the data center. Slide four shows the layout of Yahoo’s server room.
Yahoo had been supplying air to servers at 51 degrees in the cold aisles. But even with the hot-aisle/cold-aisle configuration, there was a lot of air mixing in the facility, and air in the cold aisle warmed toward the top of the racks, arriving nine degrees warmer. By containing the cold aisles, the variance from the bottom to the top of the rack was reduced from nine degrees to two degrees.
SynapSense installed variable-speed fan drives in all of the racks in this proof-of-concept area, and fans were reduced to 80% of their full speed. According to SynapSense’s Mitchell, a 20% fan speed reduction results in a 50% reduction in energy use.
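That claim lines up with the fan affinity laws, under which fan power scales roughly with the cube of shaft speed. A quick check:

```python
def fan_power_ratio(speed_ratio):
    """Fan affinity law: power scales with the cube of speed (ideal case)."""
    return speed_ratio ** 3

savings = 1 - fan_power_ratio(0.80)  # run fans at 80% of full speed
print(f"{savings:.0%}")  # prints 49%, close to Mitchell's quoted 50%
```

Real drives won’t hit the ideal cube exactly, but the math shows why modest speed reductions yield outsized energy savings.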
Managing Yahoo’s carbon footprint
Page, formerly of the Rocky Mountain Institute (an environmental engineering consulting firm), is responsible for managing Yahoo’s carbon footprint. She says Yahoo has tried various cooling strategies in other locations, including cold aisle containment and economizers.
Page works with Yahoo’s facility managers, finding ways to mitigate the company’s carbon emissions across office buildings and data centers. Page says the data centers account for well over half of Yahoo’s carbon footprint.
Yahoo committed to carbon neutrality in 2007, making its operations more efficient and offsetting the remainder on the carbon market. According to Yahoo, the annual CO2 abatement for this project was 1,670 metric tons.
When asked if Yahoo intended to apply Synapsense sensors and cold aisle containment across the remaining 32,000 square feet of raised floor, Page said Yahoo is waiting for final results of the proof of concept to make a decision. “As the temperatures settle in the facility, we expect to have new results in the middle of August,” Page said. “It’s currently under review whether to expand the rest of the data center. We’re excited to see final results.”
By way of Data Center Knowledge is this video on IBM’s green expansion of its data center facility in Boulder, Colo. The video is of IBM’s grand opening of the expansion last month. A lot of the event, as expected, was self-congratulatory back-patting (it even includes a ribbon-cutting ceremony!) but it’s a good clip nonetheless:
[kml_flashembed movie="http://www.youtube.com/v/9TZIPIgwbuo" width="425" height="350" wmode="transparent" /]
- Though IBM didn’t say exactly what its power usage effectiveness (PUE) would be, it said the facility would be twice as efficient as the industry average, which is 2.5 according to the Uptime Institute. If that Boulder facility has a PUE of 1.25, that’s pretty darn good.
- Free cooling for 75% of the year using a water-side economizer
- Will use about 1 million kilowatt hours per year of wind power purchased from Xcel Energy
- Variable-speed pumps and motors in the cooling systems
- Low-sulfur diesel in the backup generators to help reduce emissions
The HP Performance Optimized Data Center (POD) is essentially a me-too product, following on the heels of Sun’s Project Black Box and Rackable’s ICE cube. But a recent post from Gordon Haff at Illuminata about the HP POD containerized data center makes a good case for HP’s offering. Haff brought up a great point I hadn’t seen in other coverage of the new Hewlett-Packard data center trailer thus far.
Haff says HP’s strengths are in volume server design and supply chain, “And that’s the reason HP is likely to be as successful with this type of product as anyone—if not more than most… It’s the IT gear within the container, how it’s delivered, how it’s serviced, and how it’s upgraded that matter most to potential customers.”
In recent articles, HP execs have positioned the POD as a way for companies running out of data center space to add capacity quickly, reducing the time it takes to build out brick-and-mortar space.
If data center managers are looking for on-demand capacity, then a company like Sun might not be the best option. While the Sun Modular Data Center has all the engineering bells and whistles of the HP POD, can you afford to rely on a company that’s had serious hardware supply chain issues when you’re in a capacity crunch?