From the Editors of CIOSC
Whether planned or unplanned, downtime poses a constant, costly challenge to IT managers. Research firm IDC estimates that downtime costs organizations billions of dollars each year worldwide. It's not surprising, then, that many IT managers are in the vanguard of the effort to eliminate, or at least lessen, the impact of downtime.
This article looks at how today's IT organizations -- which continue to be asked to do more with less -- can take a new, more effective approach to reducing both planned and unplanned downtime.
The costs of complexity
Few would disagree that managing the flow of information across an organization has become increasingly complex in recent years. Many IT departments are running Windows on some systems, Solaris on others, and perhaps Linux as well. There are legacy applications running alongside new software. There are different versions of software. And then there are the seemingly endless combinations of hardware and software that need to be managed, including desktops and laptops, PDAs, and wired and wireless networks.
Supporting this complex environment can be costly. In the area of storage and server management alone, managing the complexity created by heterogeneous server and storage platforms can require as many as 50 tools in a large data center.
By its very nature, this complex environment demands more planned downtime. The challenge for IT departments, of course, is to schedule time to reconfigure systems, perform upgrades, and apply patches without hurting productivity.
IT's marching orders
At the same time, organizations are increasingly seeking ways to manage the impact of downtime as part of a push to reduce costs and to provide reliable, more responsive IT infrastructures. For IT departments in particular, the marching orders are clear: they must support the business goals of the enterprise by ensuring the safety and accessibility of its information assets. Anything that disrupts this safety and accessibility creates downtime, and downtime costs the company money. When disruptions do occur, IT must get the enterprise restarted, and restored to the "moment before" state, as rapidly as possible.
That's why many organizations have begun implementing clustering, volume management, and storage replication technologies as a primary line of defense against unplanned downtime caused by server failures, site outages, cyber attacks, and other events that threaten customer service levels. This "virtual environment software" can help IT departments manage disruptions more effectively.
But as IDC has observed, "these technologies can also be leveraged to reduce the costs and outage windows of planned downtime events -- a significant ROI bonus."
IDC goes on to say that organizations that institute processes and procedures allowing critical functions to continue during downtime are not likely to lose customers or the productivity of their staffs due to system outages.
"The newest generation of virtual environment software allows organizations to increasingly see their systems as a pool of shared resources that appears to be both self-healing and self-managing. In the end, this allows organizations not only to protect their investments in hardware and software but also to optimize those investments. A completely virtual environment allows established applications or functions to access features of newer systems and to be more reliable, more powerful, more scalable, or enhanced in some other way."
A more holistic view
Key to this new approach to downtime is the need to tightly bind security with storage and systems management. This more holistic view of information management means that enterprises are better prepared to prevent an attack, quickly recover in the event of a disruption, and make day-to-day operations run more smoothly. It means creating a resilient infrastructure that is flexible enough to respond to an ever-changing IT environment, but rigid enough to withstand an attack or disruption. Ultimately, it means that enterprises can truly understand their environment, act to protect it, and control it on an ongoing basis. Specifically:
- Understand: Above all, enterprises need to understand the state of their information environment. That means assessing risk against the latest vulnerabilities, exposures, and threats. Early warning systems provide critical information about the external threat environment. Understanding also means knowing which systems are authorized and connected to the network, which applications are deployed, and which personnel are logged on. In addition, enterprises should know whether patches are up to date and whether system and data backup procedures are being performed regularly.
- Act: Once enterprises understand what's happening in their environments, they must protect their information assets while minimizing the risk of disruption. Acting to protect assets involves shielding information from attack, mitigating threats, fixing errors, and recovering from incidents when they happen. Protection technologies such as antivirus, antispam, and intrusion prevention should block threats automatically and be able to receive updates in real time. Patch management systems are also key, so that organizations can rapidly update software when a new vulnerability is discovered. And it's important to trigger frequent backups when a threat is on the horizon to ensure that systems can be brought back online quickly, minimizing downtime and loss of information.
- Control: Enterprises must also be able to control their environment. They need to maintain and monitor their infrastructure on an ongoing basis, ensuring that they understand the external threat environment and their internal security posture. In addition, they should have remediation capabilities that automatically distribute software and content updates and patches in response to a threat or vulnerability. Control also means having asset management capabilities that help prioritize remediation based on the most critical assets, and selective restore capabilities that allow for timely recovery of critical assets.
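As a rough illustration of how the three steps fit together, the sketch below checks an inventory for stale patches and overdue backups and then orders the flagged systems by criticality. It is a minimal example under stated assumptions, not a description of any particular product: the Asset record, its field names, the 1-to-5 criticality scale, and the 24-hour backup threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Asset:
    """Hypothetical inventory record; field names are illustrative only."""
    name: str
    criticality: int        # assumed scale: 1 = low, 5 = business-critical
    patches_current: bool   # Understand: are patches up to date?
    last_backup: datetime   # Understand: when did the last backup run?


def remediation_queue(assets, now, max_backup_age=timedelta(hours=24)):
    """Return the assets that need attention, most critical first.

    An asset is flagged if its patches are out of date or its last
    backup is older than max_backup_age (an assumed policy value).
    """
    flagged = [
        a for a in assets
        if not a.patches_current or (now - a.last_backup) > max_backup_age
    ]
    # Control: prioritize remediation by asset criticality.
    return sorted(flagged, key=lambda a: a.criticality, reverse=True)


# Made-up inventory used only to exercise the function.
now = datetime(2024, 1, 10, 12, 0)
inventory = [
    Asset("file-server", 3, True, now - timedelta(hours=30)),
    Asset("order-db", 5, False, now - timedelta(hours=2)),
    Asset("test-box", 1, True, now - timedelta(hours=1)),
]
queue_names = [a.name for a in remediation_queue(inventory, now)]
print(queue_names)  # ['order-db', 'file-server']
```

In this example the order database is handled first because it is both unpatched and the most critical system, while the file server follows because its backup is overdue. A real environment would draw these facts from its own asset management and backup tools rather than a hand-built list.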
Conclusion
No matter how good an enterprise's network is, data loss and system crashes are inevitable. And no matter what the cause, when business information isn't available, every minute down costs money. That's why today's enterprises need to develop and implement a specific plan that gets the network back up and running within minutes, not hours or days.
How important is it to ensure the security and availability of your information assets? According to a recent report by the U.K.'s Department of Trade and Industry, 70% of companies go out of business after a major data loss. That's why it is no exaggeration to say that, in many ways, your information is your business.
Lost productivity due to planned or unplanned downtime is becoming a greater drain on organizations and is therefore less acceptable. Organizations that have heterogeneous IT environments would do well to explore a new approach to minimizing downtime.
Tom Schmidt writes frequently about information security topics. He has more than 15 years' experience as a writer and editor in high-tech publishing.