Part 6: ASR Invoking a Failover to Azure

Previous Post in Series:  Part 5: ASR Invoking a DR-Drill

Overview

Welcome back folks.  In the last part of the guide we ran through a test failover for one of our protected VMware VMs; in this section, we’re going to run through it for real.  To make the guide a little more straightforward, we’ll make a few assumptions:

  • Our failed-over VMs will not retain the same IP addresses that they have on-premises.
    • If our VMs needed to talk to each other, this would likely mean an update to DNS etc.
  • We’ll be connecting to our failed-over VMs over RDP via a locked-down Public IP address.
    • Later on, we’ll also set up a site-to-site connection using the VPN Gateway service to support reverse replication.

The main tasks we’ll be running through are creating a Recovery Plan and initiating a failover.  With that out of the way, let’s crack on.

Create a Recovery Plan

As we have two replicated VMs, we’re going to fail them both over together.  To do that, we’ll need to create a Recovery Plan:

  • Select “Site Recovery” under “Getting Started”
  • Select “Step 2: Manage Recovery Plans”
  • Now select “+ Recovery plan”
  • Enter a “Name” for your Recovery plan
  • Select “Microsoft Azure” as the “Target”
  • Select “Resource Manager” as the “…deployment model”
  • Click “Select items”
  • Tick all VMs you want to failover as part of this Recovery plan
  • Click “OK” twice

The Recovery plan should take no more than 60 seconds to deploy.

Just so we can step through the process, we’re going to make a minor change to our Recovery plan.  You might do this for any number of reasons, for example:

  • Add additional protected VMs into the plan
  • Group protected VMs together for failover
  • Run a script once a VM/group of VMs has come up
  • Add a manual task before/after a stage

In this guide, we’re going to add a manual task advising the admin to deploy/configure a Public IP address for each of the failed-over VMs.  We’ll add this right at the end as it doesn’t actually affect anything other than our ability to RDP to the guest.

  • Select your freshly deployed Recovery plan
  • Click “Customize”
  • Click “…” to launch the Context menu
  • Select “Add post action”
  • For “Insert”, select “Manual action”
  • Give your action a “Name”
  • Provide “Action instructions”.  These should be clear enough that the admin knows what to attend to without issue
  • Click “OK”
  • Click “Save”

Invoke a DR Failover to Azure

Armed with our Recovery plan, we should be in a good place to action our failover:

  • Select the Recovery plan you deployed above
  • Click “…More”
  • Select “Failover”

NOTE:  I’ve only run through a test failover on one of my VMs, but hey, let’s throw caution to the wind and go for it anyway.  The downside is you’ll have to accept the popup warning to continue.

The default values should be ideal for what we’re looking to achieve, so with that in mind:

  • Click “OK”

Looking back at our vCenter server, we can see that there is an “Initiate guest OS shutdown” task, so things look to be progressing as expected.

As ever, you can view the progress of the job in the “Site Recovery Jobs” blade.

So everything looks to have completed as it should have, and as expected, the job has paused at the “Manual action” stage.

Here we’ll have to intervene for the job to continue:

  • Within the “Failover” Site Recovery Job, expand the tree that contains the manual step you added.
  • Click “…” to open the context menu
  • Select “Details”

NOTE:  The first time you do this, you’ll see “Details”; any time after that, you’ll see “Complete manual action”.  I missed the screenshot for the details piece 🙁

  • Attend to your “Manual action”, in my case add a Public IP to each VM
  • Navigate back to the “Failover” Site Recovery Job
  • Expand the tree that contains your manual action
  • Click “…” to open the context menu
  • Select “Complete manual action”
  • Again, type something deep and insightful into “Notes”
  • Place a tick in “Manual actions complete”
  • Click “OK”

Depending on where you placed your manual step, the “Failover” job should complete successfully.

NOTE:  Both of my VMs ended up being deployed into different Resource Groups as I forgot to go back into the “Compute and Network” replication settings and revert the changes I made for the test failover…be better than me 🙂

With the Public IPs added, I was able to connect to both my failed over VMs without issue.
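
If you’d rather script that manual step than click through the portal, something like the following Azure CLI sketch would do the trick.  The resource group, NIC, NSG and source IP below are placeholders from my lab (the NIC name will be whatever ASR generated during the failover, and the rule assumes an NSG is already associated with the NIC or subnet), so swap in your own values:

    # Create a Standard, static Public IP for the failed-over VM
    az network public-ip create \
      --resource-group "rg-asr-failover" \
      --name "vm01-pip" \
      --sku Standard \
      --allocation-method Static

    # Attach it to the VM's NIC (NIC/ipconfig names are whatever ASR created)
    az network nic ip-config update \
      --resource-group "rg-asr-failover" \
      --nic-name "vm01-nic" \
      --name "ipconfig1" \
      --public-ip-address "vm01-pip"

    # Lock RDP down to a single trusted source address on the NSG
    az network nsg rule create \
      --resource-group "rg-asr-failover" \
      --nsg-name "vm01-nsg" \
      --name "Allow-RDP-Inbound" \
      --priority 100 \
      --direction Inbound \
      --access Allow \
      --protocol Tcp \
      --source-address-prefixes "203.0.113.50" \
      --destination-port-ranges 3389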

Enable Reverse Replication

So our VMs are now running from Azure, which is fantastic.  As an added bonus, whatever terrible event led us to invoke DR to Azure has now been resolved, but we’re not quite ready to pull the trigger on a failback yet.  What we should do though is enable reverse replication to make sure our VMs are back in a protected state while they’re running from Azure.

Before I start running through this process, I’ve created a text file on the desktop of each VM called “REPLICATE_ME.txt”.  This will be a nice easy way for me to see that the reverse replication has worked when we finally get around to doing a failback to on-premises.

For reverse replication to take place, we’ll first need to set up connectivity between our Azure Virtual Network and our on-premises network.  For that, we’ll need to deploy a VPN gateway into our Virtual Network (there’s also a CLI sketch after the portal steps if you’d rather script it):

NOTE:  You also have the option of ExpressRoute here, but that’s out of scope for this guide.

Deploy a Virtual Network Gateway

  • From the Azure dashboard, select “+ Create a resource”
  • Search for “Virtual Network Gateway” and select it
  • Click “Create”
  • Enter a “Name” for the gateway
  • Select the same “Region” your VNET and Vault are deployed into
  • Select “VPN” for “Gateway type”
  • Select “Route-based” for “VPN type”
  • Select “VpnGw1” or above for the “SKU”
  • Select the “Virtual Network” your VMs were failed over to
  • Either accept the default for “Gateway subnet address range” or enter a suitable range within your VNET address space
  • Select “Create new” for “Public IP address”
  • Enter a “Public IP address name”
  • Leave “Enable active-active mode” as “Disabled”
  • Leave “Configure BGP ASN” as “Disabled”…unless you need/want it
  • Click “Review + create”

NOTE:  It may take up to 45 minutes to deploy your VPN gateway so go grab a tasty beverage 🙂
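
If you’d prefer to script the gateway deployment (as mentioned above), here’s a rough Azure CLI equivalent.  The resource group, VNET name and address range are placeholders from my lab, so adjust to suit:

    # The gateway needs a subnet named exactly "GatewaySubnet" in the target VNET
    az network vnet subnet create \
      --resource-group "rg-asr-failover" \
      --vnet-name "vnet-asr-failover" \
      --name "GatewaySubnet" \
      --address-prefixes "10.1.255.0/27"

    # Public IP for the gateway
    az network public-ip create \
      --resource-group "rg-asr-failover" \
      --name "vpngw-pip" \
      --sku Standard \
      --allocation-method Static

    # Route-based VPN gateway on the VpnGw1 SKU (this is the long-running part)
    az network vnet-gateway create \
      --resource-group "rg-asr-failover" \
      --name "vpngw" \
      --vnet "vnet-asr-failover" \
      --public-ip-address "vpngw-pip" \
      --gateway-type Vpn \
      --vpn-type RouteBased \
      --sku VpnGw1 \
      --no-wait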

With our VPN gateway deployed, we’ll want to set up a Site-to-site connection (again, portal steps first, with a CLI sketch after them).

  • Navigate to your freshly deployed VPN gateway
  • Navigate to the “Connections” blade and click “+ Add”
  • Give your connection a “Name” that makes sense e.g. “Source-Destination-S2S”
  • Select “Site-to-site (IPsec)” for “Connection type”
  • Click the “Local network gateway” blade and click “+ Create new”

NOTE:  The Local gateway is where we specify the remote connection details i.e. the on-premises firewall IP and the internal network we want to route to.

  • Give your Local gateway a “Name” that makes sense e.g. “Destination-EndpointDeviceType”
  • For “IP address” enter the public IP address of the on-premises firewall you’re terminating the connection on
  • For “Address space” enter the local IP range(s) you want to route to from within your Azure VNET

NOTE:  This will be the internal network that the ASR configuration server you deployed earlier sits on.

  • Click “OK”
  • Enter a “Pre-shared key”.  This will be used at both ends of the connection to establish encryption; I’d suggest a string of at least 20 random characters.
  • Click “OK”
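
As with the gateway, here’s a hedged CLI sketch for this part, covering the Local network gateway and the Site-to-site connection itself.  The firewall public IP, on-premises address range and pre-shared key are obviously placeholders:

    # Local network gateway = the on-premises end (firewall public IP + internal range to route to)
    az network local-gateway create \
      --resource-group "rg-asr-failover" \
      --name "OnPrem-Firewall" \
      --gateway-ip-address "198.51.100.10" \
      --local-address-prefixes "192.168.10.0/24"

    # Site-to-site IPsec connection between the VPN gateway and the local gateway
    az network vpn-connection create \
      --resource-group "rg-asr-failover" \
      --name "Azure-OnPrem-S2S" \
      --vnet-gateway1 "vpngw" \
      --local-gateway2 "OnPrem-Firewall" \
      --shared-key "ReplaceWithALongRandomString"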

The Azure end should now be good to go; all that’s left is to pass along the relevant config to whoever will be configuring the connection on the on-premises firewall.  Luckily, Microsoft have provided the ability to download a sample config that’s prepopulated (for the most part) based on what you’ve already configured at the Azure end.  Samples are currently available for a good range of device vendors and families.

So let’s go ahead and grab a config:

  • Click the S2S connection you just deployed
  • Click “Download Configuration” on the “Overview” blade
  • Select the “Device vendor” of your on-premises device
  • Select the “Device family” of your on-premises device
  • Select the “Firmware version” of your on-premises device – have a look at the specifics for this last option, they matter.
  • Click “Download configuration”

Now pass that configuration file along to whoever is tasked with configuring the on-premises device.

Once the on-premises end has been configured correctly, your connection status should show as “Connected” and you should be good to crack on with the next step.
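
If you’d rather check this from the CLI than keep refreshing the portal, a quick status query looks like this (resource names are my placeholders again):

    # Poll the S2S connection status (expect "Connected" once the on-premises end is up)
    az network vpn-connection show \
      --resource-group "rg-asr-failover" \
      --name "Azure-OnPrem-S2S" \
      --query connectionStatus \
      --output tsv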

NOTE:  If you want to test connectivity over the VPN, RDP onto one of the VMs you failed over into Azure and try pinging the ASR configuration server (assuming you’ve enabled ping on that server already).

Enable Reverse Replication (Continued)

Within your Recovery Service Vault:

  • Navigate to the “Recovery Plans (Site Recovery)” blade
  • Select the Recovery Plan we created earlier
  • Click “…” to open the context menu
  • Select “Re-protect”
  • From the drop-down, select “Azure to on-premises”

NOTE:  Ignore the VM names below, my lab pretty much blew up halfway through this (hardware failure), so I had to start again…woot!

For the purposes of this guide we’ll be using the same “Process server” and “Master target” server that sit on-premises.  In production though, it might make sense to deploy a new “Process server” into Azure.  There are two ways to do this: deploy it from the Azure Marketplace, or click the banner in the screenshot below.

If you go with the latter option, you may run into the same issue I did.  With that in mind, I’ve included a section at the end of this guide detailing how to work around it.

For us though, let’s move on:

  • Confirm your “Process Server” and “Master Target” server are both set to the on-premises ASR Configuration server you deployed earlier
  • Select the VMware “Datastore(s)” that hold your on-premises VMs
  • Click “OK”

As ever, you can monitor things from the “Site Recovery Jobs” blade.

Once the job has completed successfully the “Type” should show a status of “Protected” again.

Keep in mind though that before taking any further ASR actions against these VMs, you should allow the synchronisation tasks to complete.

Once complete, the “Replicated Items” status will change to “Protected”

And that’s it for this guide.  In the next section we’ll invoke a failback to our VMware environment, thereby completing the loop 🙂

As promised, I’ve also included a section below on how to deploy a new Process Server in Azure in case you find yourself needing it.

EXTRA:
Deploy an Additional Process Server in Azure

You’re possibly looking at this section because you went to deploy an additional process server and it failed.  If that’s the case, you may have run into the issue below; here are a few steps on what caused it and how to resolve it.

  • Click the banner to deploy a new Process Server
  • Choose “Deploy a failback process server in Azure”
  • Confirm you have the correct “Subscription”, “Resource Group” and “Region” selected

NOTE:  Ideally, these should be set to match your Recovery Services Vault

  • Select the “Azure Network” that hosts your Azure VMs.

NOTE:  If you have VMs spanning multiple VNETs, you’ll need to deploy a Process Server into each one.

  • Select an appropriate “Subnet”, for our example I’m just placing it in the same subnet as the VMs
  • Give it a “Server name”
  • Specify a “Username” and “Password”
  • Specify a “Storage Account” to hold the VM disks
  • Pick an available “IP Address” from your chosen subnet
  • Click “OK”

…and here’s where I ran into a problem.  The error spat out by the system is actually really quite helpful and points us in the right direction to resolve it.

Basically it doesn’t like one of the parameters that it’s been passed, specifically one to do with the OS Image that’s been specified.

Luckily, we can have a look at the ARM template to try and get a handle on what went wrong and hopefully fix it.

  • Go back to your “Resource Group”
  • Select the “Deployments” blade
  • Here you should see the “Failed” deployment for your Process Server, click it
  • Now select the “Template” blade

Here I started scrolling down through the variables being used within the template to see if I could spot anything a little…funky.

…and sure enough, that looks like a trailing space at the end of the “OsVersion” variable.

Now that we know what “might” be going on, let’s go and kick off a redeployment.

  • Select “Deploy” from the top of the frame

This will open the “Custom deployment” workflow.

  • Click “Edit template”
  • Navigate to the error we found above and remove the trailing space
  • Click “Save” in the bottom corner
  • Select the correct “Subscription” and “Resource Group”
  • Place a tick in the T&Cs box
  • Click “Purchase”

If all goes well you should have resolved your issue and the deployment of your Process Server should continue through to completion.
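
If you’d rather do the fix-and-redeploy from the CLI instead of the portal, one approach (again, a sketch with placeholder names) is to export the failed deployment’s template, strip the trailing space locally and redeploy it:

    # Export the template used by the failed deployment so it can be edited locally
    az deployment group export \
      --resource-group "rg-asr-failover" \
      --name "FailedProcessServerDeployment" > ps-template.json

    # ...edit ps-template.json and remove the trailing space from the "OsVersion" variable...

    # Redeploy the corrected template into the same resource group
    # (you may also need to re-supply the original parameters with --parameters)
    az deployment group create \
      --resource-group "rg-asr-failover" \
      --template-file ps-template.json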

Here’s a link to the official documentation if you need the rest of the configuration steps for Process Server deployment.
