SANs: EMC VMAXe and HP 3PAR V400


If you’re in the market for a new enterprise-class storage array, both EMC and HP/3PAR have good options for you. Toward the end of 2011, we began evaluating solutions from these two vendors, with whom we have history and solid relationships. On the EMC side, we’ve grown up through a CX300 in 2006 and into two CX3-40s in 2008. At the end of 2008, we deployed a 3PAR T400 at our production site and brought back the CX3-40 it displaced to consolidate with the one at our HQ. Three years have passed since then, and our needs call for new technology.

As is the nature of technology, storage has made leaps and bounds since 2008. What once was unique to 3PAR and set it apart (wide striping and simplified provisioning from one big pool of disks) has become commonplace in arrays of all classes. We used to liken it to replacing the carpet in a furnished room: it’s a real chore when you have to painstakingly push all the chairs and tables into a corner (or out of the room altogether!) before you can improve or replace the carpet. With disk abstraction and data-shifting features, though, changes and optimizations can be made without the headaches.

SAN Winner: HP 3PAR V400


At the end of the day, it wasn’t the minor technological differences that made the decision for us. Sure, we believed that EMC’s VMAXe was the truly enterprise-class array. The ace, though, was product positioning.

We have two SANs. We have a CLARiiON CX3-40 from EMC, which is legacy and, as the market sometimes calls it, monolithic. It needs to go. We also have a 3PAR T400, which is as flexible as the day we bought it and has plenty of life left in it thanks to its architecture (even though we acquired it in 2008). Thus, when the cards were on the table, only HP had the ability to offer a “free” upgrade to our T400 as well as the new V400.

The upgrade turns our T-series into a multi-tier array with SSD, FC, and NL, and the V-series replaces our aging CLARiiON. EMC tried to compete, but all they could offer was a “deal” less appealing than the original single-array proposition.

Honestly, I felt bad for them, because there was nothing they could do unless they literally took a deep loss (no funny money about it). HP’s solution was the equivalent of two new, good, flexible, low-maintenance SANs. EMC has only recently learned how to be flexible and match 3PAR, so their older arrays (one of which was part of their counter-offer) simply didn’t measure up.

It’s going to be another hard sell in 2-3 years when we open the next RFP, because HP/3PAR will now have a monopoly on the floor. Who knows, though? Maybe HP will stumble with their new golden egg, or maybe EMC will figure out how to undercut HP with price while not sacrificing features. For now, the trophy goes to HP. Congrats.

HP 3PAR: AO Update…Sorta


I wish there were an awesome update that I’ve just been too preoccupied to post, but it’s more of a “well…”. After talking with HP/3PAR folks a couple of months back and re-architecting things again, our setup is running pretty well in a tiered config, but the caveats in the prior post remain. Furthermore, there are a few caveats that I think HP/3PAR should spell out for customers, or that customers should consider themselves, before buying into the tiered concept.

Critical mass of each media type: Think of it like failover capacity (in my case, vSphere clusters). If I have only three hosts in my cluster, I have to leave at least 33% capacity free on each to handle the loss of one host. But if I have five hosts, or even ten, I only have to leave 20% (or for ten hosts, 10%) free to account for a host loss. Tiered media works the same way, though it can feel extremely wasteful unless you have a ton of stale/archive data. Our config included only 24 near-line SATA disks (and the tiered upgrade to our existing array had only 16). While that adds 45TB+ of capacity, realistically those disks can only handle between 1,000 and 2,000 IOPS. Tiering (AO) considers these things, but seems a little underqualified when it comes to virtual environments. Random seeks are the enemy of SATA, and when AO throws tiny chunks of hundreds of VMs onto only two dozen SATA disks (then subtract RAID/parity), it can get bad fast. I’ve found this to be especially the case with OS files. Windows leaves quite a few of them alone after boot…so AO moves them down. Now run some maintenance and reboot those boxes: ouch!
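
A rough back-of-the-envelope version of that math (the ~75 random IOPS per 7.2K near-line drive is a generic planning figure I’m assuming, not a 3PAR specification):

Failover reserve per host ≈ 1/N: 3 hosts = 33%, 5 hosts = 20%, 10 hosts = 10%
24 NL drives x ~75 IOPS ≈ 1,800 random IOPS, before RAID/parity write penalties
16 NL drives x ~75 IOPS ≈ 1,200 random IOPS

Spread small regions of hundreds of VMs across that, and a wave of maintenance reboots can saturate the NL tier in a hurry.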

Installing ESXi 4.1 with Boot from SAN


We’ve been running ESX since the days of v2.5, but with the news that v4.1 will be the last “fat” version with a Red Hat-based service console, we decided it was time to transition to ESXi. The 30+ step guide below describes our process using an EMC CLARiiON CX3 SAN and Dell hosts with redundant QLogic HBAs (a Fibre Channel environment). A few CLI equivalents we find handy follow the list.

1. Document network/port mappings in the vSphere Client on the existing ESX server
2. Put the host into maintenance mode
3. Shut down the host
4. Remove the host from its Storage Group in EMC Navisphere
5. Create a dedicated Storage Group per host for the boot LUN in Navisphere
6. Create the 5GB boot LUN for the host
7. Add the boot LUN to the host’s Storage Group
8. Connect to the host console via the Dell Remote Access Card (DRAC)
9. Attach the ESXi media via DRAC virtual media
10. Power on the host (physically or via the DRAC)
11. Press CTRL+Q to enter the QLogic Fast!UTIL
12. Select Configuration Settings
13. Select Adapter Settings
14. Change “HBA BIOS” to Enabled, then press ESC
15. Select “Selectable Boot”
16. Change to “Enabled”
17. Change the primary entry from zeroes to the disk with the address of the owning SP
18. Compare the address with the EMC Storage Processor (SP) front-end port addresses in Navisphere
19. If the disk shows as “LUNZ”, do not enable Selectable Boot or configure a disk (skip to step 21)
20. Escape and save settings, but don’t leave the utility
21. At the main menu, select “Select Host Adapter”
22. Change to the next adapter
23. Repeat steps 12 through 20
24. Exit the utility, reboot, and press F2 to enter Setup
25. Change the Boot Hard Disk Sequence to put the SAN disk first
26. Exit the BIOS and reboot
27. Press F11 for the boot menu
28. Select the Virtual CD
29. In Setup, on the “Select a Disk” screen, select the remote 5GB LUN
30. Press F11 to begin the install
31. Press Enter to reboot the host
32. After setup completes, configure the password, time, and network
33. Add the host to vCenter and configure networking (per step 1)
34. Add LUNs to the host’s Storage Group in Navisphere
35. Rescan for storage on the host in vCenter
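
For reference, a few CLI equivalents we find handy during this process (assuming Tech Support Mode or vCLI access on ESX/ESXi 4.x; the vmhba number below is just an example):

esxcfg-vswitch -l     (lists vSwitches and port groups; useful when documenting networking in step 1)
esxcfg-vmknic -l      (lists VMkernel interfaces and their addresses)
esxcfg-mpath -l       (lists storage paths; confirms both HBAs can see the boot LUN)
esxcfg-rescan vmhba1  (rescans an HBA after LUNs are added to the Storage Group in step 34)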

HP 3PAR: The AO Caveat

Earlier this year, we posted about a new SAN bidding process and the eventual winner, the HP 3PAR V400. Now that we’ve been live on it for about six weeks, it’s time for a small update on a particular feature that might weigh in on your own decision, if you’re in the market.

Our new V400 was our first foray into the tiered storage market, and we liked what we heard about gaining the speed of SSD on hot blocks without paying SSD prices for average data. EMC claimed advanced metrics, granular policies, and the ability to optimize as frequently as every 10 minutes. This sounded REALLY good. 3PAR also cited some of those things, sans the frequency, and we assumed they were about even, granting that results might be slightly delayed on the V400 (vs. the VMAXe). What we’ve discovered isn’t so symmetric.

HP 3PAR leverages a feature it calls “Adaptive Optimization” (AO), which moves 128MB regions of data between storage tiers (0: SSD, 1: FC, 2: NL). Management of this feature was/is incorporated into the 3PAR System Reporter product, which accumulates array performance data on an ongoing basis. While this repository of information is definitely the right foundation to build AO upon, the implementation itself is very elementary.

AO configuration is based on policies which apply to Common Provisioning Groups (CPGs), which are the containers/metadata holders of Virtual Volumes (VVs), otherwise known as LUNs in competitor storage products.

To briefly explain this single-step configuration of an AO policy: the tiers are CPGs (a CPG is a single type and RAID config of storage, e.g. SSD RAID 5), and the tier sizes are the maximum allowable space that the policy can use in a given CPG. For scheduling, the date/weekday/hour determine when the optimization runs, and any movements are based on the amount of data (in hours) specified in Measurement Hours, ranging from 3 to 48 hours (e.g. run at 1700 based on the past 9 hours of data). Mode determines how aggressively regions are moved up or down (Performance, Balanced, or Cost), and the last setting is whether the policy is enabled.
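
As a purely hypothetical illustration of those fields (the CPG names and sizes below are placeholders, not our actual configuration):

Tier 0 CPG: SSD_r5, tier size 500 GiB
Tier 1 CPG: FC_r5, tier size 8000 GiB
Tier 2 CPG: NL_r6, tier size 4000 GiB
Schedule: weekdays at 1700
Measurement Hours: 9 (regions ranked on the previous 9 hours of activity)
Mode: Balanced
Enabled: yes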

What we’ve found is that these options fall short of our tiering hopes and tend to de-optimize our storage such that things run on the slower side because AO has decided it should move regions down to NL (it seems very biased toward NL, even in a “Performance” mode configuration).

Before I go further, I should say that we have no hands-on experience with EMC storage to prove that such limitations don’t exist there, but my understanding from our technical review was that they had more intelligence built into the VMAXe, etc.

Our main complaints are in the reactive nature of AO. In our environment, our cycles of data activity are based more on day of the week than a specific hour of the day. In other words, Mondays look like Mondays, Tuesdays like Tuesdays, etc. With AO, we can only base the “optimization” on up to 48 hours of immediately past data such that even if we focus on weekday business hours, the nightly movements will prepare Tuesday for Monday’s behavior, and so on.

From what EMC said, their tiering software lets you decide what percentage of each type of storage is used by a given policy. So you might have a policy that uses 20% SSD, 70% FC, and 10% NL, and it will move hot/warm/cold data around accordingly. In 3PAR AO, those tier size settings are just “allowable” space; there’s no way to encourage AO to use the SSD, for example. It may simply decide the data is cold and move it down to NL, or to wherever the coldest remaining allowance is.

3PAR’s answer is to shrink that size setting so the policy can’t use more than ### GiB, but this becomes tedious, depending on how many VVs you have in each CPG. We went with a three-policy configuration of “Gold”, “Silver”, and “Bronze” that have greater or lesser amounts of SSD, FC, and NL as you cross the spectrum (e.g. Gold has 1200 GB of SSD, 10000 GB of FC, and 500 GB of NL; Bronze has no SSD, 10000 GB of FC, and 10000 GB of NL; and Silver is a balance of the two). We find that although we’d wish for Gold to be aggressive and use all of its SSD, it often leaves hundreds of GB unused.

All that said, we are meeting with HP 3PAR folks tomorrow to see about tweaking the policies (and creating new ones, probably) to improve the behavior, but some of these things will remain unsolved (i.e. scheduling and the reactive nature of it).

For all this negativity, 3PAR shines with large pools of homogeneous storage (e.g. hundreds of FC disks), and as it stands, I’m not sure we didn’t make a mistake in insisting upon a tiered solution rather than a single 300 x 400GB FC drive configuration. I believed in the power of SSD, which I’m not yet seeing in 3PAR’s setup, and I’m not sure 3PAR knows how to use SSDs properly. So…consider that when shopping. They really do make a good argument for good ol’ reliable FC disks in large quantities.

EMC XtremIO Gen2 and VMware


Synopsis: My organization recently received and deployed one X-Brick of EMC’s initial GA release of the XtremIO Gen2 storage array with 400GB SSDs (10TB raw flash; 7.47TB usable physical). Since this is a virgin product, virtually no community support or feedback exists, so this is a shout-out for other organizations’ and users’ experience in the field.

Breakdown: We are a fully virtualized environment running VMware ESXi 5.5 on modern Dell hardware and Cisco Nexus switches (converged to the host; Fibre Channel to the storage), with storage originally on 3PAR. After the initial deployment of the XtremIO array in early December 2013, we began our migration of VMs, starting with lower-priority, yet still production, guests.

Within 24 hours, we encountered our first issues when one VM (a Windows 2012 guest) became unresponsive and, upon a soft reboot, failed to boot; it hung at the Windows logo. Without going into too much detail, we hit an All Paths Down situation to the XtremIO array, and even after rebooting the host, we still could not boot that initial guest. Only when we migrated it (Storage vMotion) back to our 3PAR array could we successfully boot the VM.

Over the past two weeks, we’ve seen disappointingly low levels of deduplication (1.4-1.6 to 1), which we had expected to be on par with Pure Storage’s offering. Apparently, Pure’s roughly 4.0-5.0 to 1 reduction ratio is due, at least in part, to their use of compression on their arrays. We were unaware of this feature difference until mid-implementation (when the ratio disparity became clear).
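
Some quick math on what that ratio means for effective capacity (ignoring metadata and other overheads, and treating the ratios as overall data reduction):

7.47 TB usable physical x 1.5:1 reduction ≈ 11.2 TB of logical data
7.47 TB usable physical x 4.5:1 reduction ≈ 33.6 TB of logical data

That difference is roughly the gap that ultimately kept us from fitting our whole environment on the brick.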

Additionally, the case of VMs going partially dark and then wholly dark after a reboot (which is normally the solution to most Windows problems 🙂) has repeated on three VMs to date (one mere minutes ago), resolved only by migrating back to 3PAR, even if only temporarily. The guests in question have all been running Windows Server 2012 R2, using virtual hardware versions 9 and 10, atop ESXi 5.5.0 build 1331820. The HBAs are QLogic 8262 CNAs running 5.5-compatible firmware/drivers.

Specific to the XtremIO array, we’ve also experienced an unusual number of “small block I/O” alerts, even though we have isolated vSphere HA heartbeats to two volumes/datastores. Furthermore, we see routine albeit momentary latency spikes into the tens of milliseconds, which are as yet unexplained but could plausibly correlate with the small block I/O, which XtremIO forthrightly does not handle well due to its 4KB fixed-block deduplication. In contrast, Pure Storage doesn’t handle large block I/O as well due to its variable-block deduplication process.
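
For anyone chasing similar spikes, our usual first stop is esxtop on the host (generic vSphere troubleshooting, not XtremIO-specific):

esxtop              (run from an SSH session on the host)
press u             (switch to the disk device view)
watch DAVG/cmd      (device/array latency) versus KAVG/cmd (kernel/queuing latency)

If DAVG/cmd is where the spikes show up, the array (or fabric) is the place to keep digging.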

Aside from these issues (which are significant), the XtremIO array has generally performed well during normal operations with a full load (we migrated as much of our production environment as it would hold; we ran shy of a full migration due to the insufficient data reduction ratio).

Without further ado, if you have personal field experience with an XtremIO array in a virtual environment (preferably server virtualization, but VDI is fine, too), please post your stories here. Perhaps a cause for these woes will arise from the collective exchange. Thank you.

Update (one hour later, 12/26/13): we seem to have determined that XtremIO has an issue with Windows Server 2012 R2 guests and/or EFI firmware boot. Our environment has both 2012 and 2012 R2 guests, but our 2012 guests exclusively use BIOS firmware and our 2012 R2 guests exclusively use EFI firmware. We have not yet been able to test EFI on 2012 (non-R2) or BIOS on 2012 R2. All guests (2008 R2 and later) use LSI Logic SAS controllers.

EMC XtremIO and VMware EFI

After a couple weeks of troubleshooting by EMC/XtremIO and VMware engineers, the problem was determined to be EFI boot handing off a 7MB block to the XtremIO array, which filled the queue and never cleared, as the array waited for more data to complete the exchange (i.e. a deadlock). This seems to happen only with EFI firmware VMs (tested with Windows 2012 and Windows 2012 R2), and the issue is on the XtremIO end.

The good news is that the problem can be mitigated by adjusting the Disk.DiskMaxIOSize setting on each ESXi host from the default 32MB (32768) to 4MB (4096). You can find this in vCenter > Host > Configuration > Advanced Settings (the bottom item) > Disk > Disk.DiskMaxIOSize. In the meantime, the XtremIO team is working on a permanent fix, and the workaround can be implemented hot with no impact to active operations (aside from a potentially minor increase in host CPU load as ESXi breaks >4MB I/Os into 4MB chunks).
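
The same change can also be scripted per host with esxcli, which should be equivalent to the vSphere Client path above (the value is in KB):

esxcli system settings advanced set -o /Disk/DiskMaxIOSize -i 4096
esxcli system settings advanced list -o /Disk/DiskMaxIOSize   (verify the new value took effect)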

Coming Attractions: Service Manager & IPv6


On this fine evening, we wanted to share with you a little preview of coming attractions, which will hopefully appear in future posts. Two of our projects revolve around Microsoft System Center Service Manager and IPv6 (separate endeavors). Both of these hold good promise for our organization and where we go with each may help you as well.

Through the years, we’ve used a couple of different help desk and change management tools (Track-It! and Alloy Navigator), and in each, we’ve run into issues and shortcomings. Track-It! was fine as a ticketing system, but provided very little correlation (if any), no audit trail, and sparse asset management. Alloy is a step in the right direction with a fairly comprehensive set of features, ranging from purchase orders to incident and change management to asset tracking, but the application and the system itself are fraught with bugs, counter-intuitive processes, etc. In other words, lots of ongoing work, worthy of many tickets itself.

So we’re venturing into Microsoft’s Service Manager territory and are very interested in the integration with the rest of the System Center suite (Configuration Manager and Operations Manager), as well as Active Directory. We’re also checking out Provance IT Asset Management, a management pack for SM, which enhances the product and provides an otherwise absent financial piece. Looking good so far!

On the networking side, we’ve been in the R&D phase with IPv6 (Internet Protocol version 6) for a few months now since receiving our own /48 block of addresses from ARIN. The documentation online is a bit sparse and mostly targeted to either consumers (Teredo) or ISPs, but we’re finding some nuggets in the digging. Some good resources thus far are:

IPv6: Cisco IOS


Addressing. Routing. DHCP. EIGRP. HSRP. Mobility. After consuming Cisco’s 706-page IOS IPv6 Configuration Guide, these are just a few of the areas we’re processing as the deployment plan starts coming together. If you’re running something other than Cisco, some of the commands below, and of course EIGRP, may not directly apply, but perhaps you can abstract the concepts and use them in your own network.

Here’s a rundown of the IOS commands we’ll be utilizing as we begin to implement (a sample configuration sketch follows the list):

ipv6 address: (Interface) Apply to VLAN interfaces, routing interfaces, etc. (e.g. vlan20, g1/10, g2/0/23)
ipv6 general-prefix: (Global) Specifies the prefix of your IPv6 address space (e.g. 2001:d8:91B5::/48)
ipv6 unicast-routing: (Global) Enables IPv6 routing on the switch/router
ip name-server: (Global) Not specific to IPv4 or v6, but necessary to add IPv6 name server addresses
ipv6 dhcp relay destination: (Interface) Configure on all interfaces that need DHCP relaying
ipv6 eigrp: (Interface) Unlike IPv4, EIGRP for IPv6 is enabled per interface (no “network” statements); apply to routing interfaces
ipv6 router eigrp: (Global) Creates the EIGRP router process on the switch
ipv6 hello-interval eigrp: (Interface) Configured on EIGRP interfaces to set the frequency of hello packets to adjacent routers
ipv6 hold-time eigrp: (Interface) Configured on EIGRP interfaces to tell neighbors how long to consider the sender valid
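
And here is a rough sketch of how those commands might fit together on a Layer 3 switch. The interface, the EIGRP AS number (10), and the addresses (drawn from the 2001:DB8::/32 documentation space) are placeholders, not our production values:

! enable IPv6 routing and register the /48 as a named general prefix
ipv6 unicast-routing
ipv6 general-prefix OUR-SPACE 2001:DB8:91B5::/48
ip name-server 2001:DB8:91B5:10::53
!
! classic IPv6 EIGRP starts shut down; give it a router ID and bring it up
ipv6 router eigrp 10
 eigrp router-id 10.0.0.1
 no shutdown
!
! per-interface: address, EIGRP, timers, and DHCPv6 relay
interface Vlan20
 ipv6 address 2001:DB8:91B5:20::1/64
 ipv6 eigrp 10
 ipv6 hello-interval eigrp 10 5
 ipv6 hold-time eigrp 10 15
 ipv6 dhcp relay destination 2001:DB8:91B5:10::25
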
Coming next: a consolidated IPv6 deployment plan, derived from NIST Guidelines for the Secure Deployment of IPv6…