Three Key Requirements of a Sound Disaster Recovery Strategy - March 2011

Page created by Stephen Mitchell
 
Disaster Recovery: Best Practices

Contents
Abstract
Three DR Requirements
Mitigating Risk
   Mitigating Risk: Disk for Backup, Tape for DR
   Disk for Backup, Tape for DR (and maybe more)
Ensuring Long-term Affordability
   TCO and Tape
Testing, Testing, Testing!
Conclusion

BlueScale, TranScale, Spectra, and the Spectra Logic are registered trademarks of Spectra Logic Corporation. All rights reserved worldwide. All
other trademarks and registered trademarks are the property of their respective owners. All library features and specifications listed in this
white paper are subject to change at any time without notice. Copyright © 2011 by Spectra Logic Corporation. All rights reserved.


Abstract
For short- and long-term survival, organizations must protect their data against the full range of
disasters, large and small, that can disrupt daily operations. Disasters range from power outages,
employee theft, and viruses and malware to a site shutdown caused by a natural disaster. Whatever
its magnitude, any of these disrupts an organization’s normal flow of activity,[1] so recovery from a
disaster should be a top priority.

A strong disaster recovery (DR) plan must address risk mitigation and affordability, and must include a
test/drill component. The plan must also include tape in the mix along with disk, because of tape’s
unmatched advantages: portability (tape is easy to move away from the primary site to ensure
availability), invulnerability to malware and viruses (all data backed up before the malicious code
arrived remains free of corruption), and, of course, affordability. Tape is the ultimate insurance.

Disaster Recovery Strategy
1) Risk mitigation, so data can survive any disaster
2) Affordability over the long term
3) Testing, Testing, Testing!

Three DR Requirements
A strong DR plan is a necessity: study after study confirms a horrifyingly high mortality rate for
organizations caught without a DR plan when a disaster strikes. One example: 50% of organizations
without a data protection strategy[2] never re-opened after a tornado struck in the Midwest. Another
example: a shocking 43% of businesses never re-opened following a significant data loss due to
disaster. Of that figure, 80% failed in a year and 93% within five years.[3]

This underscores the common-sense notion that a good disaster recovery plan is the best insurance a
company can take out against major and minor threats to data.

Every company has its own disaster recovery requirements according to staff size, IT budget, existing
backup/DR equipment, and data priorities. Some commonalities, however, are found across all plans:
the need to protect data from the effects of myriad disasters, the element of affordability, and the need
to test and improve DR strategies regularly.

[1] “Server Virtualization Part 5: Disaster Recovery.”
http://www.bitpipe.com/detail/RES/1259882237_596.html?tbaction=play&titleId=51904036001
[2] Maltby, Emily. “Readying for the Worst.” The Wall Street Journal. 9 September 2009.
[3] “Importance of Succession Planning: Continuity Disaster Recovery.” Phoenix Blogs. May 2007.
http://continuitydisasterrecovery.phoenix-blogs.com/importance-of-succession-planning/
http://www.bizjournals.com/cincinnati/stories/2004/08/09/focus5.html


Mitigating Risk
The first task in DR planning is to ensure that data is protected in the event of significant disruption;
this is essentially an insurance policy against catastrophe. The best disaster recovery strategies
uniformly include storing data off-site, with at least one copy on tape.

The most commonly used data storage options are tape and disk, typically combined in a data
protection architecture. More recently, cloud storage has begun to gain a foothold in the market.
The cloud is widely considered not quite ready for prime time because of prohibitive expense,[4]
security concerns, and an organization’s lack of control over its own data. For example, trusting data
to the cloud means that an organization believes the cloud provider has a competent disaster recovery
plan, will not abruptly go out of business, and will not gate access to data if a payment issue arises.
Such issues may well arise if disaster strikes either the cloud company or the organization whose data
the cloud company stores.
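
As a rough illustration of the expense argument, the Python sketch below reproduces the arithmetic given in the Amazon S3 footnote. It assumes a flat rate of $0.055 per GB per month and decimal units (1 TB = 1,000 GB); actual cloud pricing is tiered and changes over time, and the function name is hypothetical.

```python
from decimal import Decimal

# Flat monthly rate quoted in the footnote (February 2011).
# Assumption: no tiering or volume discounts are applied.
RATE_PER_GB_MONTH = Decimal("0.055")
GB_PER_TB = 1000  # assumption: decimal units


def monthly_cloud_cost(terabytes: int) -> Decimal:
    """Monthly storage charge in dollars for the given capacity at the flat rate."""
    return RATE_PER_GB_MONTH * terabytes * GB_PER_TB


if __name__ == "__main__":
    # 5,000 TB (5 PB) is the capacity used in the footnote.
    print(f"${monthly_cloud_cost(5000):,.2f} per month")
```

At 5,000 TB this gives the $275,000 per month quoted in the footnote.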

Mitigating Risk: Disk for Backup, Tape for DR
Disk is a very effective tool for backup, especially when coupled with methods that create additional
levels of data protection, such as RAID. Organizations are increasingly establishing strategies that
include off-site disk storage through remote replication and other technologies. Disk combined with
tape serves as an effective backup and archiving architecture, with tape serving as a key element in
archiving and disaster recovery.

In terms of the types of disaster that may occur, only 3% of all disasters[5] are significant natural
disasters. Although these may get the most press, they are not the most pressing threats to data. The
most common threat to data is hardware malfunction, followed by human error, software corruption,
and computer viruses. Disk is vulnerable to some degree,[6] as shown in long-term studies[7] of disk
drive failures.

[4] Amazon S3 Data Pricing. http://aws.amazon.com/s3/#pricing. Accessed February 2011. For the least
expensive and most reliable Amazon S3 storage, organizations pay $0.055 per GB per month. For firms
that require over 5,000 TB of storage (5 PB), the total expense comes to $275,000 per month. While most
organizations with this much data probably won’t use the cloud alone, this figure at least illustrates the
considerable expense of cloud storage.
[5] Gallant Data Recovery Services. “Statistics About Leading Causes of Data Loss.” Displayed April 2011.
http://www.gallantent.com/solutions.htm
[6] Schroeder, Bianca and Garth Gibson. “Disk failures in the real world: What does an MTTF of 1,000,000
hours mean to you?” USENIX Conference on File and Storage Technologies, 2007.
Pinheiro, Eduardo, Wolf-Dietrich Weber and Luiz André Barroso. “Failure Trends in a Large Disk Drive
Population.” Google.


Protecting data from logical corruption is one of the primary uses of tape in disaster recovery, and
something that disk, despite its many strengths as a data backup medium, has never really solved.[8] If
malicious software hits a disk system, it can spread from the primary disk to the backup disk and to the
RAID-protected disk that serves as the backup for the backup. A disk-only backup strategy can
therefore worsen the very disaster it is supposed to protect against.

The combination of disk and tape for disaster recovery is the strongest defense against risk. Online
providers clearly trust tape, as Google’s recent Gmail issue illustrates. On February 28, 2011, Google
posted on its Gmail platform an apology to the 0.02% of users (an estimated 150,000) who could not
access their email.[9] The culprit? A software bug that affected email in the disk arrays and disk
backups across data centers. All copies of the email were unavailable, save those written to tape. The
announcement states, “To protect your information from these unusual bugs, we also back it up to tape.
Since the tapes are offline, they’re protected from such software bugs.”[10] This succinctly illustrates
the importance of using tape for DR.

For hardware malfunction, despite many claims by disk vendors, modern tape libraries built on widely
available technology such as LTO are far more reliable than enterprise disk: disk’s error rate is about
41,000 times that of tape, making tape the most reliable storage medium for data protection
needs.[11] Data written incorrectly is worthless.
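
The figure of roughly 41,000 can be reproduced from the error rates given in the footnote to this claim: 1 bit in every 10^17 bits for LTO tape and 512 bytes in every 10^16 bits for disk. The Python sketch below assumes the two quoted rates are directly comparable; the function name is illustrative only.

```python
from fractions import Fraction

# Rates as quoted in the cited article; comparing them directly is an assumption.
TAPE_ERROR_RATE = Fraction(1, 10**17)        # 1 bad bit per 10^17 bits (LTO)
DISK_ERROR_RATE = Fraction(512 * 8, 10**16)  # 512 bytes = 4,096 bad bits per 10^16 bits


def disk_to_tape_error_ratio() -> Fraction:
    """How many times higher the quoted disk error rate is than the quoted tape rate."""
    return DISK_ERROR_RATE / TAPE_ERROR_RATE


if __name__ == "__main__":
    print(int(disk_to_tape_error_ratio()))  # 40960, i.e. roughly 41,000
```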

Disk for Backup, Tape for DR (and maybe more)
Firms increasingly use tape to create full copies of data as disaster recovery insurance and for active
archiving, and use disk for incremental backups. Along with backing up daily data changes, disk can also
be set up to use any of the disk-based snapshot or mirroring technologies that support more granular
backups, for example every hour. Incremental and point-in-time backups during the day minimize the
potential data loss from a disaster that occurs between backup windows.
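
To make the effect of backup frequency concrete, the Python sketch below computes how much recent work is unprotected when a failure strikes between backups. The schedules, timestamps, and function names are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def backup_schedule(first: datetime, interval: timedelta, count: int) -> list[datetime]:
    """Evenly spaced backup completion times (a hypothetical schedule)."""
    return [first + i * interval for i in range(count)]


def unprotected_window(backups: list[datetime], failure: datetime) -> timedelta:
    """Time between the last backup completed before the failure and the failure itself."""
    earlier = [t for t in backups if t <= failure]
    if not earlier:
        raise ValueError("no backup completed before the failure")
    return failure - max(earlier)


if __name__ == "__main__":
    midnight = datetime(2011, 3, 1, 0, 0)
    failure = datetime(2011, 3, 1, 15, 30)  # assumed failure time

    nightly = backup_schedule(midnight, timedelta(days=1), 2)
    hourly = backup_schedule(midnight, timedelta(hours=1), 48)

    print(unprotected_window(nightly, failure))  # 15:30:00
    print(unprotected_window(hourly, failure))   # 0:30:00
```

With a nightly backup, a mid-afternoon failure puts more than fifteen hours of changes at risk; with hourly disk snapshots the exposure drops to at most one hour.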

[6, continued] Henson, Valerie. “Opinion: Real-world Disk Failure Rates Offer Surprises.” Computerworld, 2007.
http://www.computerworld.com/s/article/9025380/Opinion_Real_world_disk_failure_rates_offer_surprises
[7] Schroeder, Bianca and Garth Gibson. “Disk failures in the real world: What does an MTTF of 1,000,000
hours mean to you?” USENIX Conference on File and Storage Technologies, 2007.
Pinheiro, Eduardo, Wolf-Dietrich Weber and Luiz André Barroso. “Failure Trends in a Large Disk Drive
Population.” Google.
Henson, Valerie. “Opinion: Real-world Disk Failure Rates Offer Surprises.” Computerworld, 2007.
http://www.computerworld.com/s/article/9025380/Opinion_Real_world_disk_failure_rates_offer_surprises
[8] Hill, David G. Data Protection: Governance, Risk Management, and Compliance. Boca Raton: CRC Press.
2009. Page 53.
[9] Gustin, Sam. “GFail: Google ‘Very Sorry’ After the Cloud Eats Thousands of Gmail Accounts.” Wired
Epicenter. 2/28/2011. http://www.wired.com/epicenter/2011/02/gmail-fail/
[10] Treynor, Ben. “Gmail back soon for everyone.” Post on www.gmail.com. 28 February 2011.
http://gmailblog.blogspot.com/2011/02/gmail-back-soon-for-everyone.html
[11] Newman, Henry. “Why Enterprise Tape Can’t Get No Respect.” Enterprise Storage Forum. June 17, 2010.
http://www.enterprisestorageforum.com/continuity/features/article.php/3888366/Why-Enterprise-Tape-Cant-Get-No-Respect.htm
The actual bit error rate for LTO tape is 1 bit in every 10^17 bits, while disk errs by 512 bytes in
every 10^16 bits of data transferred.


Some data will most likely be lost in the event of a disaster, assuming the failure occurs somewhere
between backup windows. Disk backups can easily be automated to run more frequently. Data written
since the last backup remains at risk, depending on the timing and nature of the data loss, but more
frequent backups may be more than worth the investment, depending on the nature of the data that is
being protected.

The importance of using tape (along with disk) for DR is emphasized in the Google example—the tape
was unaffected by the bug that led to the email deletion.

Ensuring Long‐term Affordability
Disaster recovery preparation and plan implementation must not break the bank, since financial ruin is
precisely what they are meant to prevent. Rather, a strong disaster recovery plan needs to be feasible
for an organization over the long term. Laws such as Sarbanes-Oxley make it necessary to keep data for
five to seven years or longer, and that data is subject to audit at any time; a strong DR plan should
account for this as one of the unforeseen potential costs of a disaster. For example, Lagasse, Inc., a
wholesale distribution company headquartered in New Orleans with over 1,000 associates nationwide,
relied on a well-tested DR plan that proved its value following Hurricane Katrina. Lagasse had no
downtime during the hurricane, and then spent two months after its initial disaster recovery confirming
compliance with Sarbanes-Oxley.[12] Tape served Lagasse well in the compliance tasks that followed the
disaster.

TCO and Tape
For affordable, long-term storage, tape has long been the industry standard. Recent analyses by the
Clipper Group[13] show that disk storage costs anywhere from 5 to 290 times as much per TB as tape
storage, for both initial and long-term costs. Because disk is online, it consumes power 24 hours a day,
7 days a week. Organizations that want to move data off-site using disk incur other, less obvious costs,
such as the purchase of WAN bandwidth for longer-distance data moves. These networks can cost
anywhere from $100 to $1,500 per MB/s of capacity;

[12] “Katrina Recovery.” Lagasse, Inc. PowerPoint presentation. Available on the internet at:
http://www.slideshare.net/mlancas/katrina-recovery-final
[13] Jelitto, Jens, Mark Lantz et al. “Magnetic tape storage advances and the growth of archival data.”
Proceedings of the First International Workshop on Standards and Technologies in Multimedia Archives
and Records (STAR), Lausanne, 2010.
http://mmspl.epfl.ch/webdav/site/mmspl/shared/star2010/ppt/star2010_jelitto.pdf


one analyst notes that organizations can spend more than the rest of their DR budget combined simply
on network maintenance.[14]

Cloud storage is claimed to be highly cost-effective, but this is not yet proven. “Cloud services are
charged on a per MB, GB or TB usage basis, which can make predictable budgeting a challenge. One blue
chip company that recently considered moving to cloud for data replication estimated that it would cost
them, over a period of three years, $55,000 more [emphasis added] when compared with running a
comparable in house system.”[15] Other questions remain about the use of cloud services for disaster
recovery, not least how to determine the resiliency of the cloud service’s own DR plan.

Testing, Testing, Testing!
In the event of a disaster, the unavailable data must be restored within a reasonable time frame if the
disaster recovery effort is to be successful.[16]

Restoring the data that core business functions depend on requires a tested DR plan. A good restore
plan takes into account the nature of the data to be restored: mission-critical data should be restored
as quickly as possible. Establish time frames that reflect data importance: restore mission-critical data
within a few hours and less important data as a secondary priority (for example, within a 24- or 36-hour
time frame). Restore the least-used data whenever it is needed, or at least only after the most
important data is restored and the organization is up and running.
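
The tiered restore order described above can be sketched as a simple sort. In this Python sketch the tier names, the dataset names, and the 4-hour target standing in for “a few hours” are assumptions; only the 36-hour outer limit for secondary data comes from the text.

```python
from dataclasses import dataclass

# Restore tiers in priority order, with target windows in hours.
# 4 is an assumed stand-in for "a few hours"; None means "restore on demand".
TIERS = {"mission_critical": 4, "secondary": 36, "least_used": None}


@dataclass
class Dataset:
    name: str  # hypothetical dataset name
    tier: str  # one of the keys in TIERS


def restore_order(datasets: list[Dataset]) -> list[Dataset]:
    """Sort datasets so higher-priority tiers are restored first (stable within a tier)."""
    priority = {tier: rank for rank, tier in enumerate(TIERS)}
    return sorted(datasets, key=lambda d: priority[d.tier])


if __name__ == "__main__":
    inventory = [
        Dataset("old project files", "least_used"),
        Dataset("order database", "mission_critical"),
        Dataset("shared documents", "secondary"),
    ]
    for d in restore_order(inventory):
        target = TIERS[d.tier]
        window = f"within {target} hours" if target is not None else "on demand"
        print(f"{d.name}: {window}")
```

Writing the tiers down in this form also gives a DR drill something measurable to check: whether each dataset actually came back inside its target window.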

The numbers suggest that a significant proportion of organizations, at least half, have either untested
DR plans or no DR plans at all: “of the 50% or so of companies that build continuity plans, fewer than
50% actually test their plans, which is like having no plan at all.”[17] This point of view is

[14] Aaron, Jeff. “WAN Requirements for Successful Disaster Recovery.” Continuity Central. October 2006.
http://www.continuitycentral.com/feature0403.htm
[15] Worms, Phil. “Why Cloud Computing Must Be Included in Disaster Recovery Planning.” Cloud Computing,
January 26, 2011. http://cloudcomputing.sys-con.com/node/1689928
[16] “IT Disaster Recovery Plan: How Important Is it For You?” IT Outsourcing Adviser.
http://www.it-outsourcing-adviser.com/it-disaster-recovery-plan.html
[17] Toigo, Jon. Disaster Planning Organization Main Page. Accessed April 2011.
http://www.drplanning.org/portal/#


confirmed by a Symantec survey that shows small to medium-sized organizations at the greatest risk of
poor DR planning. Only half even have plans, and of those “only 28 percent have actually tested their
recovery plans, which is a critical component of actually being prepared for a potential disaster.”[18]

Experts agree that testing your disaster recovery plan is key to a successful disaster response. Philip Jan
Rothstein, FBCI, president of the management consulting firm Rothstein Associates, believes that “An
unexercised contingency plan is often worse than no plan at all.”[19]

Conclusion
For businesses, the keys to surviving a disaster are:
   • a strong storage architecture that uses disk and tape appropriately, with tape for disaster
     recovery
   • a custom disaster recovery plan
   • regular tests of the DR plan

Make sure you can restore your data before you are under tremendous pressure to do so following a
disaster. At the point that recovery is required, the plan must be manageable, or it will overwhelm
the organization and be more likely to fail. The DR plan is a lifeline for an organization, given the
frightening percentage of businesses without plans that fail after a disaster of almost any significance.
Tape’s role in DR is central, because tape serves as an affordable catastrophic insurance policy for all
data, especially data stored on disk.

[18] Sachoff, Mike. “Half Of Small Businesses Lack A Disaster Recovery Plan.” Small Business Newz, Jan. 12, 2011.
http://www.smallbusinessnewz.com/topnews/2011/01/12/half-of-small-businesses-lack-a-disaster-recovery-plan
[19] Harvey, Cynthia. “Identifying Weak Points In A Disaster Recovery Plan.” Processor, Vol. 33 Issue 3,
February 11, 2011. http://www.processor.com/editorial/
