
August 31, 2011

VMworld: SRM 5.0 & vSphere Replication (BCO1562)

Speakers: Lee Dilworth, Clive Wenman (VMware)

Understanding the Use Cases and Implementation Options

Prior to SRM 5, SRM relied on array-based replication
– requires the same versions of vCenter and SRM, but ESX versions can vary
SRM 5 now supports vSphere Replication (in addition to array-based)
– vSphere Replication requires all vSphere components to be at the same version

SRM: Site Recovery Manager
SRA: Storage Replication Adapter

SRM 5 UI allows seeing both sites from one interface

vSphere Replication offers a cost-effective alternative to array-based replication
– does not replace array-based replication for the foreseeable future

vSphere Replication
– adds native replication to SRM
– – VMs can be replicated regardless of the underlying storage
– – enables heterogeneous datastores
– – replication is managed as a property of a VM
– – efficient replication minimizes impact on VM workloads
– supports Microsoft VSS for guest OS quiescing
– RPO can vary between 15 minutes and 24 hours (bandwidth & resource tradeoff)

VRA: vSphere Replication Agent (runs on ESXi)

vSphere Replication Details
– replication granularity per VM
– – some/all of the VM’s disks
– – initial copy of VM can be seeded in any fashion (online/offline/etc)
– – option to place the disks anywhere at the recovery site
– simplified replication management
– – user selects destination location of disks
– – user selects RPO
– – user can supply initial copy to save bandwidth
– replication specifics
– – changes on the source disk are tracked by ESXi
– – deltas are sent to the remote site
– – does not use VMware snapshots (very nice!)

vSphere Replication Limitations
– focus on virtual disks of powered-on VMs
– VR works at the virtual device layer
– FT, linked clone, templates not supported
– automated failback of VMs not supported in the GA release
– virtual hardware 7 or later required in VM

VR Network (reachability check sketched below)
– TCP 44046 for ongoing transfers
– TCP 31031 for initial traffic / full sync
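
A quick sanity check that these ports are reachable between sites is a plain TCP probe, sketched below in Python. The VR server hostname is a hypothetical placeholder.

import socket

# Hypothetical recovery-site VR server appliance address
VR_SERVER = "vr-server.recovery.example.com"

# Ports from the notes above: 31031 = initial full sync, 44046 = ongoing transfers
for port in (31031, 44046):
    try:
        with socket.create_connection((VR_SERVER, port), timeout=5):
            print("TCP %d reachable" % port)
    except OSError as exc:
        print("TCP %d blocked or unreachable: %s" % (port, exc))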

VR Components: VRMS
– VRMS: vSphere Replication Management Server
– equivalent of replication management stack
– one VRMS per vCenter
– replication management layer
– – maps datastores
– – coordinates between primary and replica sites
– – coordinates SRM test bubbles
*** Tip: re-enable your Getting Started tabs (very helpful info on VR, etc)

VR Filter
– runs in ESXi kernel to be in-line for all VM I/O
– attached to the virtual device, intercepts all I/O to the disk
– keeps replication-specific state for individual disks (toy tracker sketched below)
– – tracks regions of the virtual disks modified by the guest
– – each filter instance has a persistent state file to store replication state
– – in-memory state is flushed when the virtual device is destroyed
– transfers data to the VR server via TCP, using a vmknic
– implements logic necessary to guarantee consistency
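
To make the filter's bookkeeping concrete, here is a toy dirty-block tracker in Python. This is not VMware's implementation, only an illustration of the idea: intercept each write, record which blocks it touches, and persist that map so replication state survives device teardown. The 4 KB granularity, the class, and the state-file format are all assumptions.

import json

BLOCK_SIZE = 4096  # assumed tracking granularity, not VMware's actual value

class DirtyTracker:
    def __init__(self, state_file):
        self.state_file = state_file  # hypothetical persistent state file
        self.dirty = set()            # in-memory set of modified block numbers

    def on_write(self, offset, length):
        # Called in-line for every guest write to the virtual disk
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        self.dirty.update(range(first, last + 1))

    def flush(self):
        # Persist state, e.g. when the virtual device is destroyed
        with open(self.state_file, "w") as f:
            json.dump(sorted(self.dirty), f)

tracker = DirtyTracker("disk0.state")
tracker.on_write(offset=10_000, length=8_192)  # write spanning blocks 2-4
tracker.flush()
print(sorted(tracker.dirty))                   # [2, 3, 4]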

VR Service
– runs in host agent
– implements replication configuration at the primary site
– manages replication process for replicated VMs

VR Server
– hides remote datacenter details
– – internally maintains host-datastore connectivity map
– – creates and manages replication instances
– – writes data to ESX hosts via NFC
– – manipulates virtual disks via VirtualDiskManager API
– deployed, configured, and managed by VRMS
– – multiple VR server appliances can be instantiated for both availability and scale reasons
– – max of 500 VR protected VMs
– availability
– – must allow replication even if VC, SRM, or VRMS is down
– – virtual appliance can leverage existing VMware solutions (HA, FT)

VR RPO Scheduling
– configuration of replication includes desired “RPO”
– – how stale can the VM’s data get
– – VR Agent in hostd picks a good time to replicate
– 15mins to 24hrs (async)
– uses past behavior to determine future behavior
– – can replicate VM ahead of schedule to efficiently use bandwidth
– each host runs algorithm to find good schedule
– – meet RPO of each VM
– – efficient use of bandwidth
– not a fixed schedule
– transfers begin as data approaches staleness (scheduling sketch below)
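
The real scheduler is internal to ESXi, but its core arithmetic is easy to sketch: a transfer must start early enough that, given the expected transfer time, the replica never falls further behind than the RPO allows. The function and the 3-minute transfer estimate below are illustrative assumptions.

import time

def next_transfer_deadline(last_consistent_point, rpo_seconds, est_transfer_seconds):
    # Latest start time that still meets the RPO: the replica goes stale at
    # last_consistent_point + rpo, and the transfer itself takes time to finish
    return last_consistent_point + rpo_seconds - est_transfer_seconds

# Hypothetical VM: 15-minute RPO, past transfers averaged ~3 minutes
last_sync = time.time() - 10 * 60  # last consistent replica is 10 minutes old
deadline = next_transfer_deadline(last_sync, rpo_seconds=15 * 60,
                                  est_transfer_seconds=3 * 60)
print("start transfer within %.0f seconds to stay inside the RPO"
      % (deadline - time.time()))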

VR Disk Transfer Protocol
– start with a full sync
– – only when we first enable replication
– – read entire disks at both the primary and secondary sites
– – compare block digests, build a map of differences (digest-comparison sketch after this list)
– – transfer blocks that are different
– can now transfer deltas
– light-weight deltas (LWDs) allow cross-disk consistency
– – create deltas across multiple disks in a two-phase commit protocol
– – ongoing I/O not penalized with replication active
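
A minimal sketch of the full-sync phase in Python: checksum both copies block by block and transfer only the blocks whose digests differ. The block size, hash choice, and file names are assumptions; the actual wire protocol is not public.

import hashlib

BLOCK = 64 * 1024  # assumed comparison granularity

def block_digests(path):
    # Per-block checksums of a disk image
    digests = []
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK):
            digests.append(hashlib.sha256(chunk).digest())
    return digests

def full_sync_plan(primary_path, replica_path):
    # Return the block numbers that must be transferred to the replica
    src = block_digests(primary_path)
    dst = block_digests(replica_path)
    dst += [None] * (len(src) - len(dst))  # missing replica blocks count as different
    return [i for i, (a, b) in enumerate(zip(src, dst)) if a != b]

print(full_sync_plan("primary.vmdk", "replica.vmdk"))  # hypothetical file names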

VR Sizing and Overhead
– per protected VM
– – small spike during initial sync
– per host running protected VMs
– – small spike during initial sync
– – steady state overhead is small CPU increase
– – network overhead during RPO transfer windows
– WAN
– – replication traffic
– VR Server Appliance
– – 500 protected VMs
– – during initial sync, VR Server-to-ESX NFC traffic needs to be taken into account
– Implementation & sizing paper in draft — check vmware.com

Use Cases:
a) site-to-site (async)
– – requires full vSphere 5 and SRM 5 components
– – primarily for SMB customers, heterogeneous SANs, no SAN replication available
b) alongside array replication
– – VR: async
– – array: sync / async
– – cost effective to allow less important VMs to use VR
– – can save expensive sync links for array and tier 1 VMs
– – requirements: same as VR-only except for array requirements
– – used for tiered DR offering, SAN migration, datacenter collapsing
c) remote offices to existing paired SRM datacenters
– – somewhat like a hub & spoke model
– – sharing relationships between remotes and primary sites
– – used for remote offices, branch office DR
d) DR as a Service (DRaaS) provider model
– – same as prior model but to a provider
– – no need for dedicated customer-owned secondary site
– – 10:1 inbound limit; more remotes require additional vCenter, SRM instances, etc.
