Perpetual Motion Enterprise

Rock-solid storage infrastructure is the key to recovering business-critical information after a disaster
By Arun Taneja

It is a well-documented fact that most businesses cease to exist within three years of a catastrophic loss of data. But because the thought of a disaster is repulsive to most IT and business executives, we tend to avoid serious preparations for one.
Many businesses fail to insulate their strategic business applications against the possibility of disaster, but as the Sept. 11th terrorist attacks have reminded us, disasters do and will happen. Most disasters are not human-made, as the ones in New York City and Washington, D.C., were, but they are disasters nonetheless. Floods, earthquakes, power failures, and fires are all sufficiently common that we need to protect our businesses from them. In fact, the biggest risk to data is simple human error or malicious human intent. If you have not insulated your stored enterprise information to address these possibilities, you are jeopardizing not only your career, but your company's competitiveness and perhaps its very existence.

The cost of data unavailability, let alone data loss, is very high. Depending on your business, estimates range from $10,000 to $5 million lost for each hour your systems are down. Whatever the specifics for your business, the loss is often more than just the dollar amount; it also includes loss of goodwill and brand image.

A well-designed storage infrastructure that protects data integrity and availability need not be expensive, especially with adequate planning. Given this goal, the best way to build a bulletproof storage infrastructure for enterprise information is to take a building block approach.

THE BUILDING BLOCK APPROACH

First, define the business goals for your company. Define, with agreement from the corporate management team, the importance of data integrity and availability. For example, how much downtime can you withstand per year? (Make no distinction between planned and unplanned downtime; the distinction is meaningless from a user's perspective.) Do not assume that all data is equally important. Rather, classify which applications are more important to have running under all conditions (such as data warehouses and transaction systems). Furthermore, assign a time value to data.
Some data is important for a day or two, then loses value rapidly; other data has value for a longer period of time. Some data is required by law to be kept available online for a certain period of time, and other data may be required by law to be available for a certain period of time but not necessarily online. This information will help you build a storage infrastructure that recognizes the various levels of value and correlates them to the type of systems the data will be stored on.

This approach is premised on the existence of a bulletproof storage environment within a data center. When you have developed that capability, then consider replicating information that is critical in case the primary data center is unavailable. Whether the replication is across the street to a different building or across the continent depends on your business and its geographical territory.

Regardless, remember that data that is always available but lacks integrity is of little value; indeed, it is potentially dangerous to your business. The storage infrastructure needs to be designed, therefore, with both availability and integrity in mind. (Security is another aspect but is not a subject that I will address here.)

In essence, there are five aspects to a bulletproof storage environment: redundant array of independent disks (RAID) storage, redundant pipes, protected servers and applications, backup and recovery, and disaster recovery.

RAID STORAGE

RAID-protected storage systems are the basis of your "hardened" environment. Today, most RAID arrays meet the basic criteria for data integrity and protection. As a minimum, the system should support RAID levels 0, 0+1, and 5. (Depending on the importance of your application, you may need RAID 3 as well.) Currently, only Network Appliance Inc. offers RAID 4-based products, and it has done an excellent job in terms of data protection and integrity.
But just about any vendor, including EMC Corp., Compaq Computer Corp., Hitachi Data Systems Inc., Sun Microsystems, LSI Logic, IBM, and MTI, can deliver solid protection as well. Your selection should be influenced by performance, scalability, manageability, and price, in addition to how well the vendor meets other requirements (more on those later).

With the advent of network-based virtualization appliances from companies such as DataCore Software Corp., FalconStor Software, Veritas Software, StoreAge Networking Technologies Ltd., and StorageApps (now a part of Hewlett-Packard), you could even build RAID protection simply from banks of disk systems. In any case, the RAID system must be fully redundant, with hot-swappable power supplies, fans, drives, and controllers. Dual AC inputs and mirrored cache are a must.

REDUNDANT PIPES

Redundant pipes to the server are another important element, whether you have a direct-attached storage (DAS) or storage area network (SAN) environment. In the former, each server should have dual host bus adapters (HBAs), each connected to a controller on the RAID array; in the latter, each server should have dual HBAs, with each HBA connected to a different Fibre Channel switch. More complex meshing may be necessary depending on the size of the SAN, but the principle remains the same: dual connections among servers, switches, and storage systems.

To complete this picture, you will need multipathing software in the server to redirect traffic among HBAs in case of failure. Multipathing support is built into some OSs (such as Solaris); otherwise, you'll need to purchase it separately. Some products, such as EMC's PowerPath, add load balancing to the mix.

PROTECTING SERVERS AND APPLICATIONS

The next step is to protect your servers and applications, in most cases via clustering. Whether you're implementing a DAS or a SAN environment, the overall objective is the same: to fail over an application from a failed server to a functioning one transparently.
In a simple two-server cluster, in the "failed" mode the surviving server typically performs the job of two servers, and application performance, but not availability, is affected. However, clusters of 16 or 32 or more servers are now possible (using software from Legato Systems Inc., Veritas, and several other vendors), whereby you can fail over different applications to several functioning servers, thereby minimizing the performance impact.

The important thing to realize is that the same application is not necessarily running on each server. For instance, server A could be a file server, and server B a database server. In a failure mode, the surviving server runs both applications. However, all servers could also be running the same application, as in a Web server farm.

The key to all clustering software is its ability to "see" all the storage shared by the servers and keep it totally secure and demarcated under all conditions. Fortunately, clustering software has now matured to a point where serious production-level environments are feasible. Clustering software is sometimes used in conjunction with clustered file systems, whereby all storage is visible to all servers. Note, however, that in all cases the ability to create clusters of disparate OSs is still an unrealized dream.

Make sure the servers themselves are protected with redundant heartbeats. For specific applications, you will need to purchase agents that give you granular control over application behavior upon restart. You can also develop custom agents for homegrown or other less popular applications; programming is generally not complicated or time consuming.

BACKUP AND RECOVERY

You now have a rock-solid infrastructure from a storage, server, and applications perspective. I assume you have built in appropriate redundancies and protection at the network level as well. Now what? Are you ready to implement a disaster recovery solution? Not quite.
First, you need a backup-and-recovery strategy. (If you're wondering why, consider these scenarios: How often have you inadvertently deleted a file and wished you could get last week's version back? How often does a systems or storage administrator delete or change something and then need to get back to yesterday's image? What if a disgruntled employee deletes an entire CRM database before saying goodbye?)

It is crucial to understand that backup is a means to an end. Recovery is the ultimate goal: Many enterprises back up regularly, but find out too late that the data is unrecoverable. Thus, press your vendor to explain how to ensure that backed-up data is recoverable. Find out how easy or difficult it is to recover and at what speed. Don't forget that volume-level recovery means there is no filesystem awareness and that the full volume has to be associated with the application. Depending on the size of the volume, recovery could take several hours.

There are several approaches here; your choice depends on the nature of your business and the importance of "up-to-the-minute" data. It will also depend on the acceptable time-to-recovery. Take the banking and brokerage business, for example. Here, you'd need to protect data up to the last transaction that was completed by the system. You'd also need to recover in the shortest amount of time possible, given that the downtime cost in this sector is greater than $5 million per hour. But for smaller, less time-sensitive businesses, the loss of a day's worth of transactions may not be catastrophic, and a recovery time of a day or two may be acceptable. Here are just a few possibilities:

Backup to and recovery from tape. Most enterprises perform full backups weekly and incremental backups daily and store them on tape. A full backup generally requires downtime unless snapshot technologies are used to take a point-in-time image of the filesystem or the volume and then release the filesystem or volume back to the application.
Any changes thereafter are recorded separately. The static snapshot can be backed up without stopping the application; no downtime is incurred except for the time it takes to take the snapshot. Application performance can be affected, however.

In most traditional backup methodologies, the backup tapes are sent to a remote site to protect against disasters at the data center. Companies such as Iron Mountain have made a business of storing tapes in secure sites for many large enterprises. There are two considerations here, however: Time-to-recovery is long, and the data is good only as of the time of backup. Financial institutions and others whose transaction frequency and size are large need other methods of recovering "up-to-the-minute" information.
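The weekly-full, daily-incremental scheme and its "good only as of the time of backup" limitation can be made concrete with a small sketch. It assumes that each incremental backup holds the changes made since the previous backup of either kind, so a recovery needs the most recent full backup plus every incremental taken after it. The names Backup, restore_chain, and data_loss_window are hypothetical and are not drawn from any backup product; the dates are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Backup:
    taken_at: datetime
    kind: str  # "full" or "incremental" (illustrative labels)


def restore_chain(backups, failure_time):
    """Return, in restore order, the backups needed to recover after a failure.

    Assumption: an incremental contains changes since the previous backup,
    so the chain is the latest full backup plus all later incrementals.
    """
    usable = sorted(
        (b for b in backups if b.taken_at <= failure_time),
        key=lambda b: b.taken_at,
    )
    full_positions = [i for i, b in enumerate(usable) if b.kind == "full"]
    if not full_positions:
        raise ValueError("no full backup precedes the failure")
    return usable[full_positions[-1]:]


def data_loss_window(backups, failure_time):
    """Time between the last usable backup and the failure: work that is lost."""
    chain = restore_chain(backups, failure_time)
    return failure_time - chain[-1].taken_at


if __name__ == "__main__":
    # A full backup early Sunday, then incrementals early Monday to Wednesday.
    schedule = [Backup(datetime(2002, 1, 6, 1), "full")] + [
        Backup(datetime(2002, 1, day, 1), "incremental") for day in (7, 8, 9)
    ]
    failure = datetime(2002, 1, 9, 15)  # Wednesday afternoon
    print("tapes to restore:", len(restore_chain(schedule, failure)))
    print("transactions lost since:", data_loss_window(schedule, failure))
```

The longer the chain, the more tapes must be retrieved and read in order, which is one reason time-to-recovery from tape is long; the loss window shows why a business that cannot afford to lose hours of transactions needs something beyond nightly backups.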