This blog is the first in a two-part series that examines Fibre Channel over Ethernet (FCoE) implementations with VMware vSphere 5.1 using VMware’s software FCoE and a hardware FCoE adapter. These blogs are intended to share our findings regarding the relative performance of software and hardware FCoE adapters when working with large-block, sequential I/O – in particular, the impact of the Disk.DiskMaxIOSize setting on storage performance.
In recent lab tests with software FCoE and a few virtual machines (VMs), we encountered an unexpected drop in throughput (MB/s) with large block I/O. We were using sequential I/O through a single physical 10Gb Ethernet (10GbE) port. The VMs were running Microsoft Windows 2008 R2; each was configured with four virtual CPUs (vCPUs) and 8GB of memory. Two raw device mapping (RDM) disks were mapped to each host. We enabled the software FCoE driver that comes with the hypervisor and made appropriate LUN mappings.
The IOmeter software tool was used to test a range of block sizes (512B – 1MB) across all RDM drives, with two workers per VM – one set to test 50% reads and the other to test 50% writes for full duplex mode. The targets used in this case were four Linux-based storage memory emulators with four targets each, for a total of 16 targets.
Figure 1 shows the results for these sequential I/O tests when we used the default setting for Disk.DiskMaxIOSize. This figure represents the baseline performance for software FCoE.
Figure 1. I/Os with default Disk.DiskMaxIOSize setting using software FCoE
With larger block sizes, the array was unable to perform any I/Os.
Figure 2 shows throughput during the same test of software FCoE and, in particular, the drop-off that occurred with larger block sizes. At this point, we theorized that the array became stressed with blocks that were 64KB or larger.
Figure 2. Throughput with default Disk.DiskMaxIOSize setting running software FCoE.
We also observed latency times using esxtop on the host to see if they might be a concern. Results are shown in Table 1, which provides average rather than median values.
Table 1. Average latency values with default setting
| Block size | DAVG read | DAVG write |
|------------|-----------|------------|
| 256K       | 16 ms     | 16 ms      |
| 512K       | 17 ms     | 43 ms      |
| 1M         | 19 ms     | 50 ms      |
Note that, with the default Disk.DiskMaxIOSize setting, no I/Os were taking place at the larger block sizes, as Figures 1 and 2 demonstrate. DAVG represents the latency between the adapter and the target device. According to VMware, DAVG values of 20 ms or more indicate a major storage performance concern.
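To make the 20 ms rule of thumb concrete, here is a small helper (purely our own illustration, not a VMware tool) that flags which of the Table 1 averages cross that threshold:

```python
DAVG_THRESHOLD_MS = 20  # VMware's rule of thumb for a serious latency concern

# Average DAVG values from Table 1 (default Disk.DiskMaxIOSize setting)
table1 = {
    "256K": {"read": 16, "write": 16},
    "512K": {"read": 17, "write": 43},
    "1M":   {"read": 19, "write": 50},
}

def over_threshold(latencies, threshold=DAVG_THRESHOLD_MS):
    """Return (block size, direction) pairs whose DAVG meets or exceeds the threshold."""
    return [(size, direction)
            for size, davg in latencies.items()
            for direction, ms in davg.items()
            if ms >= threshold]

print(over_threshold(table1))  # [('512K', 'write'), ('1M', 'write')]
```

With the default setting, only the large-block writes that actually completed crossed the 20 ms line; the reads stayed just under it.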
To address this storage performance issue with large block sizes, we turned to VMware KB article (kb:1003469), which suggests reducing the size of I/O requests passed to the storage device in order to enhance storage performance. You can achieve this reduction by tuning the global parameter Disk.DiskMaxIOSize, found on the host under Configuration→Software→Advanced Settings→Disk. As shown in Figure 3, this parameter is defined as the maximum disk READ/WRITE I/O size before splitting (in KB); larger requests are split into multiples of the Disk.DiskMaxIOSize setting.
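The splitting behavior can be sketched in a few lines (our own illustration, not VMware code): any guest I/O larger than Disk.DiskMaxIOSize is broken into chunks of at most that size before being issued to the device.

```python
def split_io(request_kb, disk_max_io_size_kb):
    """Split a guest I/O request (in KB) into chunks no larger than
    Disk.DiskMaxIOSize, mimicking how ESXi splits oversized requests."""
    chunks = []
    remaining = request_kb
    while remaining > 0:
        chunk = min(remaining, disk_max_io_size_kb)
        chunks.append(chunk)
        remaining -= chunk
    return chunks

# At the default setting (32767 KB, ~32 MB), a 1 MB request is not split:
print(split_io(1024, 32767))  # [1024]

# With Disk.DiskMaxIOSize reduced to 64 KB, it becomes sixteen 64 KB I/Os:
print(len(split_io(1024, 64)))  # 16
```

This is why the tuning trades fewer large transfers for more, smaller ones: the total bytes moved are unchanged, but each individual request stays within a size the array handles well.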
Kudos to Erik Zandboer, VMware expert and VMdamentals blogger, for bringing this article to our attention!
Figure 3. Displaying the default Disk.DiskMaxIOSize setting, which is 32MB
After reading this KB article, we decided to vary the setting of Disk.DiskMaxIOSize to determine if this would, indeed, enhance storage performance. Since we had noticed that performance was beginning to deteriorate with 64KB blocks, we restricted the maximum block size to 64KB, as shown in Figure 4.
Figure 4. Changing the Disk.DiskMaxIOSize setting
Next, we re-ran the test to see if there was any impact on IOPS, throughput and latency.
Note that we did not monitor CPU utilization, which should not be overlooked if you plan to tune Disk.DiskMaxIOSize.
Figure 5 shows that reducing the Disk.DiskMaxIOSize setting had little impact on read/write I/Os.
Figure 5. I/O performance with the new Disk.DiskMaxIOSize setting when running software FCoE
Figure 6 shows the throughput achieved with the new Disk.DiskMaxIOSize setting. Throughput now approached line rate (~2300 MB/s) and, rather than collapsing as before, dropped only slightly with large-block I/Os (512KB and 1MB).
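As a sanity check on that figure, a quick back-of-the-envelope calculation (our own arithmetic, not a vendor specification) shows how close ~2300 MB/s of combined read/write traffic comes to the theoretical ceiling of a single 10GbE port in full duplex:

```python
# One 10GbE port carries 10 Gb/s in each direction.
line_rate_gbps = 10
bytes_per_gb = 1000**3 / 8  # Ethernet rates are specified in decimal gigabits

one_way_mb_s = line_rate_gbps * bytes_per_gb / 1000**2  # per-direction ceiling
full_duplex_mb_s = 2 * one_way_mb_s                     # reads + writes combined

print(one_way_mb_s, full_duplex_mb_s)  # 1250.0 2500.0
```

Ethernet and FCoE framing overhead consume part of the raw 2500 MB/s, so a sustained ~2300 MB/s of full-duplex traffic is effectively line rate.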
Figure 6. Throughput with the new Disk.DiskMaxIOSize setting
Table 2 shows that, with the new Disk.DiskMaxIOSize setting, latency evened out between read and write I/Os, at 33 ms for 512KB blocks and 68–72 ms for 1MB blocks. However, these latencies are still in the range of severe storage performance conditions.
Table 2. Average latency values with 64KB blocks
| Block size | DAVG read | DAVG write |
|------------|-----------|------------|
| 256K       | 13 ms     | 13 ms      |
| 512K       | 33 ms     | 33 ms      |
| 1M         | 68 ms     | 72 ms      |
Please note, these results are specific to our lab environment. You should perform your own tests to determine if changing the default Disk.DiskMaxIOSize setting would be beneficial in your particular environment. In addition, there may be trade-offs elsewhere in the storage stack that we are still investigating; we’ll also be comparing these software FCoE results with a hardware FCoE implementation.
So, do you really need to change the Disk.DiskMaxIOSize setting? We agree with Erik that you first need to determine the block size your VMs are executing and, if you are getting poor storage performance with large blocks, then tuning Disk.DiskMaxIOSize might be a consideration. Note that we performed these tests in order to validate that tuning Disk.DiskMaxIOSize would enhance storage performance in a lab environment with sequential reads and writes. However, in many real-world cases, traffic between ESX/ESXi hosts and the array tends to be more random.
Here are the key takeaways:
- Out of the box, software FCoE does not handle large-block I/O requests well, resulting in lower throughput and latencies outside the range recommended by VMware. Poor large-block performance can negatively impact applications such as backup, streaming media, and other large-block workloads.
- Using VMware ESXi’s Disk.DiskMaxIOSize, we could change the performance dynamics. However, latency still measured outside the acceptable range.
In part two of this blog, we will repeat this testing to evaluate the impact of Disk.DiskMaxIOSize on storage performance with a hardware FCoE implementation. As we will show, hardware FCoE has several advantages, including better CPU efficiency. Stay tuned…