As my father said: From some, you learn what to do; from others, what not to do.
Uh-oh: it seems someone is having trouble managing a project, and a big one.
The Social Security Administration (SSA) is presently getting by in a 30-year-old, outdated facility known as the National Computer Center (NCC) in Woodlawn, MD. Some of its support infrastructure, such as the uninterruptible power supply (UPS), is so old that replacement parts are no longer available for maintenance.
Nearly $500 million in stimulus funding has been dedicated to building a new data center. As often happens, the project is a year behind schedule, and the lag appears likely to grow. Meanwhile, the old facility is plagued with problems. This is no mundane "data center": it is a facility that delivers annual payments of $700 billion to more than 56 million Americans.
Fortunately, the General Services Administration (GSA) has found a site for the new state-of-the-art data center. Interestingly, a significant part of the delay in preparing this site stems from concern over the cost of electric power: government auditors "expressed concern" that not enough consideration was given to this cost.
I'm a little confused. Power is power (a kilowatt-hour is a measurable quantity, no?), a data center is a data center (a proper project knows its size, scope, and power demands… no? Um, well, I guess "no"), and the project is supposed to be managed according to schedule and reality, right? That's what a project does; that's its whole purpose. Otherwise we wouldn't waste our time shuffling all these schedules, resources, and people.
I'm kinda guessing that the new site might be a bit removed from ready access to efficient, affordable power. Maybe they need relay stations, or boosters, or who knows what. Either way, this would seem to be a failure to properly survey where they are, what they really need, and where they're going, with the resulting trouble in the middle: getting there.
Kelly Croft, Deputy Commissioner for Systems at the SSA, provided some telling Congressional testimony this past February 11th. She cited the “dire need” for the new data center: “Without a long-term replacement, the NCC will deteriorate to the point that a major failure to the building systems could jeopardize our ability to handle our increasing workloads without interruption.” Further: “Despite all of our best efforts to preserve the NCC for as long as necessary, there is always the potential that a critical facility infrastructure system could suddenly fail.”
Risks and incidents are further illuminated by Croft’s recent testimony:
– There is No True Dedicated Power: “Employee office spaces in other areas of the building share the same power lines and HVAC system as the data center. This design problem means that a potentially isolated issue in an area outside the data center, such as a minor receptacle overload at someone’s workstation, could temporarily shut down some power to the data center and HVAC system.”
– There is an Aging Custom UPS System: "The UPS is not an off-the-shelf product; it was designed specifically for the building. While we have extended our service contract with the UPS maintenance vendor over the years, the vendor recently advised us that it could not guarantee repairs in the near future. The necessary parts are simply no longer available. If the UPS failed, we would have to bypass the system and deliver unconditioned power to the data center equipment, which could quite potentially damage the equipment. Replacing the UPS would require significant downtime at the NCC."
– Critical Cabling Problems: “Tangled cables can block the under-floor airflow that cools our servers, and we cannot work on the cables safely without shutting down the affected systems. Similarly, troubleshooting problems is difficult when we cannot isolate cable pairs easily to determine whether problems exist in the cables or in the IT equipment. There is also an elevated risk of data corruption, because electro-magnetic interference from the electrical wires that are located too close to the telecommunication wires can distort data transmission.”
– Leaking Water in the Data Center: “Last year, our facilities staff noticed water on the floor of one of the large battery rooms in the NCC. They quickly traced the source to a leaking water pipe in the room. Any water in close proximity to high-voltage batteries presents a serious hazard to the building and its personnel. In order to fix the leak, plumbers needed to expose the pipe and cut off the water supply. Unfortunately, without redundant systems, cutting off the water supply to the pipe also required cutting off the water supply to the large air handling equipment that is responsible for cooling our computing space. Since the air handling equipment had to be turned off, we had to actually shut down a portion of our national computing operations while making the repairs.”
Here in the Weave, I hope it's obvious that the SSA's ongoing survey of Where We Are (where they were) must have failed somewhere along the way. Always understand where you are; that tells you where you need to go, and from there, how to get there: sanctioned, known projects with assigned budgets, resources, responsibilities, and sized expectations, all done on time and in time.
Knowing where you are (the status of your systems, their longevity, their safety and security, their update status, their replacement schedule) is a critical factor in any organization's surety. You must lead change, not mount it in a burst once critical infrastructure is already failing: finding that water is not only near critical power sources but leaking to boot, or discovering that cables are tangled and unlabeled (what happened to "wire management" here?), and so on…
On this day: On February 27th, 1967, Pink Floyd released their first single, "Arnold Layne".