Best Practices for NBD Transport
NBD (network block device) is the most universal of VDDK transport
modes. It does not require dedicated backup proxy VMs as does HotAdd, and works on all
datastore types, not just SAN. Sections below give tips for improving NFC (network file
copy) performance for NBD backups.
Parallel jobs on NFC servers: On ESXi hosts, hostd contains an NFC server. The hostd NFC memory limit is 96MB. VMware recommends backing up 50 or fewer disks in parallel through each ESXi host. NFC cannot handle too many requests at the same time; it queues requests until previous jobs complete.

Dedicated backup network: As of vSphere 7.0, ESXi
hosts support a dedicated network for NBD transport. When the tag vSphereBackupNFC is applied to a VMkernel adapter's NIC type, NBD backup traffic goes through the chosen virtual NIC. Programmers can apply the tag by making the following vSphere API call:

HostVirtualNicManager->SelectVnicForNicType(nicType, device);

Customers can use an ESXi command like this, which designates interface vmk2 for NBD backup:

esxcli network ip interface tag add -t vSphereBackupNFC -i vmk2
Network I/O Control (NIOC) for NFC backup: When NIOC is enabled in the virtual distributed switch (VDS or DVS), switch traffic is divided into various predefined network resource pools, now including one dedicated to vSphere Backup NFC. The API enumeration for this network resource pool is VADP_NIOConBackupNfc. System administrators can set this up in the vSphere Client with System Traffic > Configure > Edit, then optionally change resource settings. Thereafter any VADP NBD traffic is shaped by these VDS settings. NIOC may be used together with the dedicated backup network feature above, but this is not a requirement.

VDDK 7.0.1 introduced two new error codes,
VIX_E_HOST_SERVER_SHUTDOWN and VIX_E_HOST_SERVER_NOT_AVAILABLE, to indicate a host entering maintenance mode (EMM) and a host already in maintenance mode. After VixDiskLib_ConnectEx to vCenter Server, if the backup application calls VixDiskLib_Open for a virtual disk on an EMM host, vCenter switches to a different host if possible. Host switch is non-disruptive and backup continues. If it is too late for host switch, vCenter returns the SHUTDOWN code, indicating that the backup application should retry after a short delay, hoping for host switch. If no other hosts are available and the original host is in maintenance mode, vCenter returns NOT_AVAILABLE. The backup application may choose to wait, or fail the backup. A retry sketch appears after the table and the note below it.

Error Code | Retry | Comment |
---|---|---|
VIX_E_HOST_NETWORK_CONN_REFUSED | Frequently | Usually caused by network error. |
VIX_E_HOST_SERVER_SHUTDOWN | Soon, 3 times | Host will enter maintenance mode (EMM). |
VIX_E_HOST_SERVER_NOT_AVAILABLE | After waiting? | Host is in maintenance mode (post EMM). |
Host switch to avoid EMM could fail if encryption
keys are not shared among hosts.
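Based on the table above, a backup application might wrap VixDiskLib_Open() in retry logic like the following sketch. The retry count and delay values are illustrative assumptions, not values mandated by VDDK, and the sketch assumes an established VixDiskLibConnection to vCenter.

// Minimal sketch (not from the VDDK samples): retrying VixDiskLib_Open() around
// host maintenance mode, per the table above. Retry counts and delays are
// illustrative assumptions only.
#include <chrono>
#include <thread>
#include "vixDiskLib.h"

static VixError OpenWithEmmRetry(VixDiskLibConnection conn,
                                 const char *diskPath,
                                 uint32 openFlags,
                                 VixDiskLibHandle *handle)
{
    const int kShutdownRetries = 3;                  // "Soon, 3 times"
    for (int attempt = 0; ; ++attempt) {
        VixError err = VixDiskLib_Open(conn, diskPath, openFlags, handle);
        if (err == VIX_OK) {
            return VIX_OK;                           // opened, possibly on a switched host
        }
        switch (VIX_ERROR_CODE(err)) {
        case VIX_E_HOST_SERVER_SHUTDOWN:             // host is entering maintenance mode
            if (attempt < kShutdownRetries) {
                std::this_thread::sleep_for(std::chrono::seconds(30));
                continue;                            // retry soon, hoping for host switch
            }
            return err;
        case VIX_E_HOST_SERVER_NOT_AVAILABLE:        // host already in maintenance mode
            return err;                              // policy decision: wait much longer, or fail
        default:
            return err;                              // other errors handled elsewhere
        }
    }
}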
NFC compress flags: In vSphere 6.5 and later, NBD performance can be significantly improved using data compression. Three types are available (zlib, fastlz, and skipz), specified as flags when opening virtual disks with the VixDiskLib_Open() call. Data layout may impact the performance of these different algorithms; see the sketch after this list.
- VIXDISKLIB_FLAG_OPEN_COMPRESSION_ZLIB – zlib compression
- VIXDISKLIB_FLAG_OPEN_COMPRESSION_FASTLZ – fastlz compression
- VIXDISKLIB_FLAG_OPEN_COMPRESSION_SKIPZ – skipz compression
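A minimal sketch of opening a disk read-only with one of these flags follows; it assumes an established NBD/NBDSSL connection and a valid disk path, and the choice of fastlz is arbitrary.

// Minimal sketch: open a virtual disk read-only for NBD with fastlz compression.
// Assumes conn is an established VixDiskLibConnection and diskPath is valid;
// _FASTLZ here could equally be _ZLIB or _SKIPZ.
#include "vixDiskLib.h"

static VixDiskLibHandle OpenCompressed(VixDiskLibConnection conn, const char *diskPath)
{
    VixDiskLibHandle handle = NULL;
    uint32 flags = VIXDISKLIB_FLAG_OPEN_READ_ONLY |
                   VIXDISKLIB_FLAG_OPEN_COMPRESSION_FASTLZ;
    VixError err = VixDiskLib_Open(conn, diskPath, flags, &handle);
    if (err != VIX_OK) {
        char *msg = VixDiskLib_GetErrorText(err, NULL);   // describe the failure
        // ... log msg ...
        VixDiskLib_FreeErrorText(msg);
        return NULL;
    }
    return handle;
}

Because data layout affects each algorithm differently, measuring throughput with each flag against representative disks is the practical way to choose one.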
Asynchronous I/O: In vSphere 6.7 and later, asynchronous I/O for NBD transport mode is available. It can greatly improve NBD data transfer speed. To implement asynchronous I/O for NBD, use the new functions VixDiskLib_ReadAsync() and VixDiskLib_WriteAsync(), each taking a completion callback, then call VixDiskLib_Wait() to wait for all outstanding asynchronous operations to complete. In the development kit, see vixDiskLibSample.cpp for code examples, following the logic for the -readasyncbench and -writeasyncbench options; a minimal sketch also appears after the test results below.

Many factors impact write performance, and network latency is not necessarily a significant factor. Here are test results showing improvements with VDDK 6.7:
- stream read over 10 Gbps network with asynchronous I/O, speed of NBD is ~210 MBps
- stream read over 10 Gbps network with block I/O, speed of NBD is ~160 MBps
- stream write over 10 Gbps network with asynchronous I/O, speed of NBD is ~70 MBps
- stream write over 10 Gbps network with block I/O, speed of NBD is ~60 MBps
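The sketch below illustrates the asynchronous read pattern under a few assumptions: that the completion callback takes (void *cbData, VixError result), that the async calls report VIX_ASYNC when a request is queued successfully, and that a queue depth of 4 with 1MB requests is reasonable. Consult vixDiskLibSample.cpp for the authoritative -readasyncbench logic.

// Minimal sketch of the asynchronous NBD read pattern (not the sample's exact logic).
// Assumes handle was opened over an NBD/NBDSSL connection; queue depth and
// request size are illustrative only.
#include <algorithm>
#include <vector>
#include "vixDiskLib.h"

static void ReadDone(void *cbData, VixError result)       // completion callback
{
    if (result != VIX_OK) {
        *static_cast<VixError *>(cbData) = result;         // record the first failure
    }
    // A real backup application would hand the completed buffer to its data mover here.
}

static VixError ReadDiskAsync(VixDiskLibHandle handle, VixDiskLibSectorType totalSectors)
{
    const VixDiskLibSectorType chunkSectors = 2048;        // 1MB requests (512-byte sectors)
    const int queueDepth = 4;                              // outstanding requests per batch
    std::vector<std::vector<uint8> > buffers(queueDepth,
        std::vector<uint8>(chunkSectors * VIXDISKLIB_SECTOR_SIZE));
    VixError cbError = VIX_OK;

    for (VixDiskLibSectorType sector = 0; sector < totalSectors; sector += chunkSectors) {
        int slot = static_cast<int>((sector / chunkSectors) % queueDepth);
        VixDiskLibSectorType n = std::min(chunkSectors, totalSectors - sector);
        VixError err = VixDiskLib_ReadAsync(handle, sector, n,
                                            &buffers[slot][0], ReadDone, &cbError);
        if (err != VIX_OK && err != VIX_ASYNC) {           // assumption: VIX_ASYNC means queued
            return err;
        }
        if (slot == queueDepth - 1 || sector + n >= totalSectors) {
            VixDiskLib_Wait(handle);                       // drain outstanding async requests
        }
    }
    return cbError;
}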
I/O buffer improvements: As of vSphere 7.0, changed
block tracking (CBT) has adaptable block size and configurable VMkernel memory limits
for higher performance. This feature requires no developer intervention and is
transparent to users. It is applied automatically by vSphere when a VM is created or
upgraded to hardware version 17, after CBT is set or reset. Adaptable block size is up to
four times more space efficient.
As of VDDK 7.0.3, users can configure asynchronous
I/O buffers for NBD or NBDSSL transport. With high latency storage, backup and restore
performance may improve after increasing NFC AIO buffer size. If servers are capable of
high concurrency, backup and restore throughput may improve with more NFC AIO buffers.
The defaults are buffer size 64K (64KB * 1) and buffer count 4. The maximum buffer size
is 2MB (64KB * 32) and the maximum buffer count is 16, as below.
vixDiskLib.nfcAio.Session.BufSizeIn64KB=32
vixDiskLib.nfcAio.Session.BufCount=16
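These settings go in the configuration file passed to VixDiskLib_InitEx(). A minimal sketch follows; the file path, library directory, and tuning values are hypothetical placeholders, not recommendations.

// Minimal sketch: supplying NFC AIO tuning through the VixDiskLib_InitEx() config file.
// Paths and values below are hypothetical placeholders. Example file contents:
//   vixDiskLib.nfcAio.Session.BufSizeIn64KB=16
//   vixDiskLib.nfcAio.Session.BufCount=4
#include "vixDiskLib.h"

static bool InitVddkWithAioTuning(void)
{
    VixError err = VixDiskLib_InitEx(VIXDISKLIB_VERSION_MAJOR,
                                     VIXDISKLIB_VERSION_MINOR,
                                     NULL, NULL, NULL,                // log, warn, panic callbacks
                                     "/usr/lib/vmware-vix-disklib",   // libDir (assumed install path)
                                     "/etc/backupapp/vddk.conf");     // config file with the settings above
    return err == VIX_OK;
}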
Memory consumption in the NFC server increases with larger NFC AIO buffer sizes and buffer counts. The memory for each session can be calculated as buffer size times buffer count, that is BufSizeIn64KB * 64KB * BufCount. If this value is over 16MB, the disk open operation will fail. In testing, InitEx configuration file settings of buffer size 1MB (64KB * 16) and buffer count 4, or 4MB per session, performed best, but results depend on hardware setup.

In vSphere 6.7 and later, VDDK splits read and write buffers into 64KB chunks, so changing the buffer size on the VDDK side does not lead to different memory consumption on the NFC server side.
In vSphere 6.5 and earlier, the larger the
buffer size on the VDDK side, the more memory was consumed on the NFC server side. With
buffer size set to 1MB, VMware recommended backing up no more than 20 disks in parallel
on an ESXi host. For a 2MB I/O buffer, no more than 10 disks, and so on.
Session limits and vCenter session reuse: In
vSphere 6.5 and later, programs can reuse a vCenter Server session to avoid session
overflow. For details see "Reuse a vCenter Server Session" in chapter 4.
Network bandwidth considerations: VMware
suggests that NBD backups should be done on a network with bandwidth of 10 Gbps or
higher. Operations such as VM cloning or offline migration will also consume memory in
the NFC server. Users should arrange their backup windows to avoid conflict with such operations.
Log analysis for performance issues: The VDDK
sample code can be run to assist with I/O performance analysis. In the configuration
file, set the NFC log level to its highest value, vixDiskLib.nfc.LogLevel=4. There is no need to set the log level in the server for NFC asynchronous I/O. Then run the sample code and investigate vddk.log and the vpxa log to assess performance.