Oh no, something went boom. What's our disaster recovery plan?


Who said we didn't need one in the budget meeting... Ah, yes

Small truck skids towards side of road as wheel comes loose. Photo by shutterstock
Sponsored: Disaster recovery systems mostly suck. There is a seemingly unlimited number of them, but ones that are neither utterly maddening nor ruinously expensive are rare. Fortunately, ease of use is starting to be a product dimension that the storage industry is competing on again, so options are emerging.
Regardless of the size of your organization, backups are absolutely essential. Though it's been said before, it needs to be said again: if your data doesn't exist in at least two places, then it does not exist.
In a perfect world, organizations of all sizes would also hold disaster recovery dear, and every one of them would do both backups and disaster recovery. Unfortunately, many organizations feel that disaster recovery is either too expensive, too time-consuming or both. It's neglected and even forgotten.
Right up until something goes boom. Then everyone starts saying disaster recovery really loudly.
For organizations doing basic backups, such as backing up data at the VM level and using software to back VMs up to the cloud, there are many solutions to choose from, and many are actually easy to use. The instant you want to back up anything more than just VMs, or start orchestrating disaster recovery, the discussion gets complicated.

Array simplicity

There are still a lot of organizations that prefer to handle their backups and DR at the array level. This could be because they're using those arrays to provide storage to more than just VMs, or it could be because they have a lot of IT automation built around arrays. So just where are array vendors here?
In my opinion, the best way to take the pulse of a market is to look at the middle-ranked vendors. Those selling the most units don't necessarily evolve fastest, nor deliver the best TCO. From 2010 to 2015, for example, HDS was a great company to watch. Sales teams from Dell, HP, EMC and NetApp all learned to curse them.
Now Fujitsu has updated its Eternus range. If you are unaware, Eternus comprises all-flash and hybrid arrays, tape libraries, some data protection appliances and a hyperscale-targeted SKU.
Under "integrated systems", Fujitsu offer VSAN ready nodes, various Windows storage-based solutions, pre-canned OpenStack and Hadoop, and so forth. If you were designing a minimum set of storage systems to meet the needs of the Fortune 2000 while paying little attention to anyone else, this is exactly what would result.
At first glance, the Fujitsu arrays aren't bad. Features include replication and data integrity in a variety of flavours, encryption (with key lifecycle management), and Disaster Recovery.
Fujitsu don't seem to be redefining the industry. They look like a vendor where someone very diligently went around and ticked all the important boxes required to please enterprise customers.
Okay, so we've established that Fujitsu can serve as a reasonable proxy for the state of the storage industry as a whole, so how does this relate to disaster recovery?

Odd restrictions

Traditionally, arrays haven't been very cooperative with the idea of economic efficiency when it comes to disaster recovery. As a general rule, if you have two identical storage arrays and the latency between them is less than 10ms, then you can set up replication across sites easily.
Try to set up replication between dissimilar arrays, try to set up a many-to-one replication scenario, or try to do any of this over high latency or mediocre-throughput network connections, and arrays just refuse to play along. This is one of the biggest reasons why "proper" backups and disaster recovery have long been considered "out of reach" for midsized organizations.
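To make those restrictions concrete, here is a minimal Python sketch of the rule of thumb described above. It is an illustration only, not any vendor's actual logic: the `ReplicationPlan` type, the `traditional_objections` function and the field names are all hypothetical, and the 10ms threshold is simply the figure quoted in the text. Throughput is left out because no number is given for it.

```python
from dataclasses import dataclass
from typing import List

# Rule-of-thumb latency ceiling quoted in the article (an assumption, not a spec).
MAX_LATENCY_MS = 10.0


@dataclass(frozen=True)
class ReplicationPlan:
    """Hypothetical description of a proposed replication setup."""
    source_model: str
    target_model: str
    sources_per_target: int  # 1 = one-to-one, more than 1 = many-to-one
    latency_ms: float


def traditional_objections(plan: ReplicationPlan) -> List[str]:
    """List the reasons a traditional array would balk at this plan."""
    objections = []
    if plan.source_model != plan.target_model:
        objections.append("dissimilar arrays")
    if plan.sources_per_target > 1:
        objections.append("many-to-one replication")
    if plan.latency_ms >= MAX_LATENCY_MS:
        objections.append("latency too high")
    return objections
```

An empty list means the plan fits the narrow case that has traditionally been easy. Anything else is the territory midsized organizations have usually been priced out of.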
The truth of the matter is that most organizations can't afford today – and likely never will be able to afford – a like-for-like duplicate of their production environment for the purposes of disaster recovery. This isn't to say that a like-for-like duplicate isn't the ideal solution. It also isn't to say that edge cases don't exist where the odd organization will actually build such a thing.
It's that "overwhelming majority of organizations" category that I'm talking about.

Ease of use

In perusing Fujitsu's storage documentation I have found repeated mentions of "simple and transparent failover" and a number of examples of scenarios where Fujitsu has different sized all-flash and hybrid arrays capable of talking to each other. In fact, Fujitsu is advising using its hybrid Eternus DX S4 as a failover target for its all-flash Eternus AF S2.
Oh, and it claims to have solved the "replicating over two cans and a piece of wet string" problem. Apparently, the arrays will not lose their minds if the WAN link between the sites is a little sketchy, just so long as it can actually handle the load.
Basically, Fujitsu claims to have a push-button simple disaster recovery concept baked in to its storage arrays that doesn't need to be treated like a princess. There's no reason not to believe Fujitsu on this. Search engines aren't exactly bristling with refutation of these claims, nor does Fujitsu have a reputation for regularly making such claims without the ability to back them up.
For me, this then serves as a signal that ease of use is finally important for disaster-recovery systems. Even small organizations that use arrays can get in on the game, hopefully increasing the number of organizations that have not only backups, but actual orchestrated, simple failover.

Necessity

For those who already have backups and disaster recovery sorted, the above probably seems obvious, boring and stupid. Technology progresses. Why do I suddenly think this is really a big deal?
The answer can be found somewhere in the response I've observed to the Meltdown bug. A truly astonishing number of administrators have been complaining on social media about how inconvenient it is for them to have to "shut down their entire data center" to patch everything.
These administrators are complaining because they need to apply patches to operating systems, hypervisors, physical servers, network switches and yes, even storage arrays. Lacking a proper – and tested – disaster-recovery setup, these administrators can't simply fail over to another site, patch the primary site and fail back.
In some cases it's worse: administrators hypothetically can perform failovers like that, but they won't, because they aren't sure it will actually work. Instead, they'll take multiple weekends out in a row to shut things down and patch disruptively. This is dumb and we collectively need to stop doing it.
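The difference between the two approaches can be sketched in a few lines of Python. This is a hedged illustration of the decision described above, not a real orchestration tool: the `patch_plan` function and its step names are hypothetical, and a real runbook would have far more steps.

```python
from typing import List


def patch_plan(has_dr_site: bool, failover_tested: bool) -> List[str]:
    """Return the ordered steps for patching the primary site.

    Failover is only used when a second site exists AND failover has been
    tested. An untested failover is treated the same as having none.
    """
    if has_dr_site and failover_tested:
        return [
            "fail over to secondary site",
            "patch primary site",
            "fail back to primary site",
        ]
    # No trustworthy failover: the disruptive weekend-outage route.
    return [
        "shut down primary site",
        "patch primary site",
        "restart primary site",
    ]
```

Note that `patch_plan(True, False)` yields the same disruptive plan as having no second site at all, which is the "worse" case: the capability exists on paper but nobody trusts it enough to use it.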
We are finally starting to see the capabilities that whack these last concerns about disaster recovery become standard functionality, and that is a good thing. Now we just have to hope that, as these capabilities become mundane, we remember to actually set them up and use them.
