Typically, when dealing with storage performance problems, the first questions asked are: what type of disks? What speed? What protocol? However, your problem might be in the first port of call of your storage array: the storage controller!
When reviewing the storage controller configurations of the most popular storage arrays, one thing stood out to me: the absence of CPU specs. The storage controllers of a storage array are just plain, simple servers, equipped with a bunch of I/O ports that establish communication with the back-end disks and provide a front-end interface to communicate with the attached hosts. The storage controllers run proprietary software providing data services and specific storage features, and providing data services and running that software requires CPU power! After digging some more, I discovered that most storage controllers are equipped with two CPUs, ranging from quad core to eight core. Sure, there are some exceptions, but let's stick to the most common configurations. This means that the typical enterprise storage array is equipped with 16 to 32 cores in total, as it comes with two storage controllers. 16 to 32 cores, that's it! What are these storage controller CPUs used for? Today's storage controller activities and responsibilities:
- Setting up and maintaining data paths.
- Mirror writes for write-back cache between the storage controllers for redundancy and data availability.
- Data movement and data integrity.
- Maintaining RAID levels and calculating & writing parity data (see the sketch after this list).
- Data services such as snapshots and replication.
- Internal data saving services such as deduplication and compression.
- Executing multi-tiering algorithms, promoting and demoting data to the appropriate tier.
- Running integrated management software providing management and monitoring functionality of the array.
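To make the CPU cost of just one of these tasks concrete, here is a minimal sketch of RAID-5 style parity calculation. The stripe layout, chunk sizes and function name are my own illustration, not any vendor's actual implementation.

```python
# Minimal sketch of RAID-5 style parity calculation (illustrative only,
# not any vendor's actual implementation). Every full-stripe write costs
# the controller CPU one XOR pass over all data chunks in the stripe.

def calculate_parity(data_chunks: list[bytes]) -> bytes:
    """XOR all data chunks of a stripe into a single parity chunk."""
    parity = bytearray(len(data_chunks[0]))
    for chunk in data_chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return bytes(parity)

# A 4+1 stripe with 64 KB chunks: one full-stripe write of 256 KB of data
# means XOR-ing 256 KB on the controller CPU before anything hits disk.
stripe = [bytes(64 * 1024) for _ in range(4)]
parity_chunk = calculate_parity(stripe)
```

Multiply that by every write the array receives and it becomes clear why parity alone is a real consumer of controller cycles, before any of the other items on the list get a turn.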
Historically, arrays were designed to provide centralised data storage to a handful of servers. I/O performance was not a pain point, as most arrays easily delivered the requests each individual server could make. Then virtualisation hit the storage array. Many average I/O-consuming workloads were grouped together on a single server, turning that server into what George Crump would call a fire-breathing I/O demon. Mobility of virtual machines required increased connectivity, to the point where it was virtually impossible (no pun intended) to manually balance I/O load across the available storage controller I/O ports. The need for performance increased, resulting in larger numbers of disks managed by the storage controller, different types of disks and different speeds.
Virtualisation-first policies pushed all types of servers and their I/O patterns onto the storage array, introducing the need for new methods of software-defined economics (did someone coin that term?). It became obvious that not every virtual machine requires the fastest resource 24/7, sparking interest in multi-tiered solutions. Multi-tiering requires smart algorithms that promote and demote data when it makes sense, providing the best performance to the workload when required while offering the best level of economics to the organisation. Snapshotting, dedupe and other internal data saving services raised the need for CPU cycles even more. With the increase in I/O demand and the introduction of new data services, it's not uncommon for a virtualised datacenter to have over-utilised storage controllers.
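As a rough illustration of what such a tiering algorithm involves, the sketch below promotes or demotes extents purely on access frequency. The thresholds, tier names and data layout are assumptions for illustration; real arrays use far more elaborate heuristics, but every pass still means scanning statistics and queueing data movement on the controller CPUs.

```python
# Rough sketch of a frequency-based tiering pass (assumed thresholds and
# tier names, not how any specific array decides). Every pass burns
# controller CPU cycles on scanning statistics and scheduling moves.

PROMOTE_THRESHOLD = 100   # accesses per interval before moving up a tier
DEMOTE_THRESHOLD = 5      # accesses per interval before moving down a tier
TIERS = ["flash", "sas", "nl-sas"]  # fastest to slowest

def retier(extents: dict[str, dict]) -> list[tuple[str, str, str]]:
    """Return a list of (extent_id, from_tier, to_tier) moves."""
    moves = []
    for extent_id, stats in extents.items():
        tier_idx = TIERS.index(stats["tier"])
        if stats["accesses"] >= PROMOTE_THRESHOLD and tier_idx > 0:
            moves.append((extent_id, stats["tier"], TIERS[tier_idx - 1]))
        elif stats["accesses"] <= DEMOTE_THRESHOLD and tier_idx < len(TIERS) - 1:
            moves.append((extent_id, stats["tier"], TIERS[tier_idx + 1]))
    return moves

moves = retier({
    "extent-001": {"tier": "sas", "accesses": 250},   # hot: promote to flash
    "extent-002": {"tier": "flash", "accesses": 2},   # cold: demote to sas
})
```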
Rethinking the current performance architecture
Server-side acceleration platforms increase the performance of virtual machines by leveraging faster resources (flash and memory) that are in closer proximity to the application than the storage array datastore. By keeping the data in the server layer, storage acceleration platforms, such as PernixData FVP, provide additional benefits to the storage area network and the storage array.
Impact of read acceleration on data movement
Hypervisors are seen as I/O blenders, sending the streams of random I/O of many virtual machines to the I/O ports of the storage controllers. These reads and writes must be processed: writes are committed to disk, and data is retrieved from disk to satisfy the read requests. All these operations consume CPU cycles. When accelerating writes, subsequent reads of that data are serviced from the flash device closest to the application. Typically data is read multiple times, decreasing latency for the application, but also relieving the architecture from servicing that load. FVP provides metrics that show how many I/Os are saved from the datastore by servicing the data from flash. The screenshot below was taken after 6 weeks of accelerating database workloads. More info about this architecture here.
The storage array does not have to service those 8 billion I/Os, but not only that: 520.63 TB did not traverse the storage area network, occupying the I/O ports of the storage controllers. That means other workloads, whether virtualised workloads that haven't been accelerated yet or external workloads using the same array, are no longer affected by that I/O. Less I/O hits the inbound I/O queues on the storage controllers, allowing other I/O to flow more freely into the storage controller; less data has to be retrieved from disk and less I/O travels upstream from disk to I/O ports to begin its journey back from the storage controller all the way up to the application. This saves copious amounts of CPU cycles, allowing data services and other internal processes to take advantage of the available cycles and increasing the responsiveness of the array.
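To make the offload effect concrete, here is a minimal sketch of the host-side read path under stated assumptions: a local flash cache is checked first, and only misses travel across the SAN to the array. The class, counters and helper function are mine, not FVP's internals.

```python
# Minimal sketch of a host-side read cache (names and structure are my
# assumptions, not FVP internals). Reads served from local flash never
# reach the SAN or the storage controller; only misses do.

class HostReadCache:
    def __init__(self):
        self.flash = {}                 # block address -> data on local flash
        self.served_from_flash = 0      # "saved from datastore" counter
        self.sent_to_array = 0

    def read(self, block: int) -> bytes:
        if block in self.flash:
            self.served_from_flash += 1     # the array never sees this I/O
            return self.flash[block]
        self.sent_to_array += 1             # miss: traverse the SAN to the array
        data = read_from_array(block)       # hypothetical trip to the datastore
        self.flash[block] = data            # keep it local for next time
        return data

def read_from_array(block: int) -> bytes:
    """Placeholder for the real read across the SAN to the datastore."""
    return bytes(4096)

cache = HostReadCache()
cache.read(42)   # miss: goes to the array
cache.read(42)   # hit: served from local flash, saved from the datastore
```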
The screenshot was provided by one of my favourite customers, but we are running a cool contest to see which applications other customers have accelerated and how many IOPS they have saved.
Impact of write acceleration on storage controller write cache (mirroring)
Almost all current storage arrays contain cache structures to speed up both reads and writes. Speeding up writes provides benefits to both the application and the array itself. Writing to NVRAM, where the write cache typically resides, is much faster than writing to (RAID-configured) disk structures, allowing for faster write acknowledgements. Once the acknowledgement is provided to the application, the array can "leisurely" structure the writes in the most optimal way to commit the data to the backend disks.
To avoid the storage controller becoming a single point of failure, redundancy is necessary to prevent data loss. Some vendors use journaling and consistency points for redundancy purposes; most vendors mirror writes between the cache areas of both controllers. Mirrored write cache requires coordination between the controllers to ensure data coherency. Typically, messaging over the backplane between the controllers is used to ensure correctness. Mirroring data and messaging requires CPU cycles from both controllers.
Unfortunately, even with these NVRAM structures, write problems persist to this day. No matter the size or speed of the NVRAM, it's the back-end disks' ability to process writes that gets overwhelmed. Increasing cache sizes at the controller layer just delays the point at which write performance problems begin. Typically this occurs when there is a spike of write I/O. Remember, most ESX environments generate a constant flow of I/O; adding a spike of I/O on top is usually adding insult to injury for the already strained storage controller. Some controllers reserve a static portion of cache for mirrored writes, forcing the controller to flush the data to disk when that portion begins to fill up. As the I/O keeps pouring in, incoming writes have to wait until the current write data is committed to disk, resulting in high latency for the application. Storage controller CPUs can be overwhelmed, as the incoming I/O has to be mirrored between the cache structures and coherency has to be guaranteed, wasting precious CPU cycles on (a lot of) messaging between controllers instead of using them for other data services and features.
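The sketch below shows why mirrored write cache costs controller CPU and why a filling cache turns into application latency. The cache size, class names and forced-flush rule are assumptions for illustration, not a description of any particular array.

```python
# Sketch of a mirrored write-back cache (sizes, names and the forced-flush
# rule are assumptions). Every incoming write costs CPU twice: once in the
# local cache and once as a mirror message to the peer controller. Once the
# reserved cache fills, new writes wait on slow disk destaging.

CACHE_LIMIT = 1024          # cache slots reserved for mirrored writes

class Controller:
    def __init__(self, name: str):
        self.name = name
        self.write_cache = []            # pending (block, data) entries

    def mirror(self, block: int, data: bytes) -> None:
        """Peer receives a copy over the backplane: more CPU and messaging."""
        self.write_cache.append((block, data))

class Array:
    def __init__(self):
        self.a = Controller("A")
        self.b = Controller("B")

    def write(self, block: int, data: bytes) -> str:
        if len(self.a.write_cache) >= CACHE_LIMIT:
            self.flush()                            # forced flush: I/O now waits on disk
        self.a.write_cache.append((block, data))    # local cache copy
        self.b.mirror(block, data)                  # mirrored copy on the peer
        return "ack"                                # only now is the write acknowledged

    def flush(self) -> None:
        # Destage to the backend disks; the slow part that backs everything up.
        self.a.write_cache.clear()
        self.b.write_cache.clear()
```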
FVP write back acknowledges the I/O once the data is written to the flash resources in the FVP cluster. FVP does not replace the datastore, therefore writes still have to be written to the storage array. The process of writing data to the array becomes transparent, as the application has already received the acknowledgement from FVP. This allows FVP to shape write patterns in a way that is more suitable for the array to process. Typically FVP writes the data as fast as possible, but when the array is heavily utilised FVP time-releases the I/Os. This results in a more uniform I/O pattern (datastore writes in the performance graph above). By flattening the spike, i.e. writing the same amount of I/O over a longer period of time, the storage controller can handle the incoming stream much better, avoiding forced cache flushes and the resulting CPU bottlenecks.
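As a sketch of the time-release idea, and very much my simplification rather than FVP's actual implementation: acknowledge writes from local flash immediately, queue them, and destage to the array at a capped rate so a burst reaches the storage controller as a flat, steady stream.

```python
# Sketch of spike flattening (my simplification, not FVP's implementation):
# writes are acknowledged from local flash right away and queued, then a
# background destager sends them to the array at a capped rate.

import collections

class WriteBackDestager:
    def __init__(self, max_writes_per_sec: int):
        self.max_writes_per_sec = max_writes_per_sec
        self.queue = collections.deque()

    def write(self, block: int, data: bytes) -> str:
        self.queue.append((block, data))   # persisted on local flash
        return "ack"                        # application sees low latency now

    def destage_once(self) -> None:
        """Run once per second: push at most max_writes_per_sec to the array."""
        for _ in range(min(self.max_writes_per_sec, len(self.queue))):
            block, data = self.queue.popleft()
            write_to_array(block, data)     # hypothetical write to the datastore

def write_to_array(block: int, data: bytes) -> None:
    """Placeholder for the real write across the SAN to the array."""
    pass
```

The design choice is simple: the application's latency is decoupled from the array's destage rate, so a spike only lengthens the local queue instead of overwhelming the controller's write cache.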
FVP allows you to accelerate your workload. The acceleration of reads and writes reduces the amount of I/O hitting the array and smooths the workload pattern. Customers who implemented FVP to accelerate their workloads experience a significant reduction in storage controller utilisation, benefitting the external and non-accelerated workloads in the mix.