Switching Between NSX-OVS and Upstream OVS Kernel Modules
Since NSX-OVS is not supported in the latest kernel version, you can switch from the
NSX-OVS kernel module to the upstream OVS kernel module before upgrading the kernel to the
latest version. If NCP does not work with the latest kernel after a kernel upgrade, you can
roll back (switch back to NSX-OVS and downgrade the kernel).
The first procedure below describes how to switch the NSX-OVS kernel module to the
upstream OVS kernel module when you upgrade the kernel. The second procedure describes
how to switch back to the NSX-OVS kernel module when you downgrade the kernel.
Both procedures involve the Kubernetes concepts of taints and tolerations. For more
information about these concepts, see https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration.
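In short, tainting a node with the NoExecute effect evicts every pod that does not tolerate the taint, and appending a trailing hyphen to the same command removes the taint again. A DaemonSet whose pod template carries a matching toleration (for example, key evict-user-pods with effect NoExecute) keeps its pods on the node while the taint is in place. A minimal illustration, using the node name worker-node1 and the taint keys from the procedures below:

    # Evict pods that do not tolerate the taint
    kubectl taint nodes worker-node1 evict-user-pods:NoExecute

    # Remove the taint again
    kubectl taint nodes worker-node1 evict-user-pods:NoExecute-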
Switch to the upstream OVS kernel module
1. Modify the tolerations of both daemonset.apps/nsx-ncp-bootstrap and daemonset.apps/nsx-node-agent. Change the following:

       - effect: NoExecute
         operator: Exists

   to:

       - effect: NoExecute
         key: evict-user-pods

   Example kubectl commands for steps 1 and 2 appear after this procedure.
2. Modify the nsx-node-agent configmap. Change use_nsx_ovs_kernel_module to False.
- Taintworker-node1 "evict-user-pods:NoExecute" to evict all user pods in this node to other nodes:kubectl taint nodes worker-node1 evict-user-pods:NoExecute
- Taintworker-node1 "evict-ncp-pods:NoExecute" to evict nsx-node-agent and nsx-ncp-bootstrap pods in this node to other nodes:kubectl taint nodes worker-node1 evict-ncp-pods:NoExecute
5. Uninstall the NSX-OVS kernel module and restore the upstream OVS kernel module on worker-node1:
   a. Delete the kmod files vport-geneve.ko, vport-gre.ko, vport-lisp.ko, vport-stt.ko, vport-vxlan.ko, and openvswitch.ko in the directory /lib/modules/$(uname -r)/weak-updates/openvswitch.
   b. If the files vport-geneve.ko, vport-gre.ko, vport-lisp.ko, vport-stt.ko, vport-vxlan.ko, and openvswitch.ko exist in the directory /lib/modules/$(uname -r)/nsx/usr-ovs-kmod-backup, move them to the directory /lib/modules/$(uname -r)/weak-updates/openvswitch.
   c. Delete the directory /lib/modules/$(uname -r)/nsx.
6. Upgrade the kernel of worker-node1 to the latest version and reboot the node.
   Note: Set SELinux to Permissive mode on worker-node1 if containerd and kubelet cannot run.
7. Restart kubelet.
- Removetaint"evict-ncp-pods:NoExecute" from worker-node1. Verify that bootstrap and node-agent can start.
- Removetaint"evict-user-pods:NoExecute" from worker-node1. Verify that all pods in this node are running.
10. Repeat steps 3-9 for the other nodes.
11. Restore the tolerations of both the nsx-ncp-bootstrap and nsx-node-agent DaemonSets that you modified in step 1.
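The edits in steps 1 and 2 of either procedure are ordinary Kubernetes object edits. A minimal sketch, assuming the NCP components run in the nsx-system namespace and the ConfigMap is named nsx-node-agent-config (both names are assumptions; check the names used in your deployment):

    # Adjust the tolerations in both DaemonSet pod templates (step 1)
    kubectl -n nsx-system edit daemonset.apps/nsx-ncp-bootstrap
    kubectl -n nsx-system edit daemonset.apps/nsx-node-agent

    # Set use_nsx_ovs_kernel_module to False (or True for the rollback procedure)
    # in the nsx-node-agent configmap (step 2); the ConfigMap name is an assumption
    kubectl -n nsx-system get configmaps
    kubectl -n nsx-system edit configmap nsx-node-agent-config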
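The per-node work in steps 3-9 can be condensed into the following sketch, assuming worker-node1, a root shell on that node, and a yum-based distribution for the kernel upgrade; adapt the upgrade command to your distribution:

    # On a machine with kubectl access: evict user pods, then the NCP pods
    kubectl taint nodes worker-node1 evict-user-pods:NoExecute
    kubectl taint nodes worker-node1 evict-ncp-pods:NoExecute

    # On worker-node1: remove the NSX-OVS kmods and restore the upstream backups
    cd /lib/modules/$(uname -r)
    rm -f weak-updates/openvswitch/{vport-geneve,vport-gre,vport-lisp,vport-stt,vport-vxlan,openvswitch}.ko
    mv nsx/usr-ovs-kmod-backup/*.ko weak-updates/openvswitch/   # only if the backup files exist
    rm -rf nsx

    # Upgrade the kernel and reboot (distribution-specific)
    yum update kernel && reboot

    # After the reboot, on worker-node1
    systemctl restart kubelet

    # Back on the kubectl machine: remove the taints and verify the pods
    kubectl taint nodes worker-node1 evict-ncp-pods:NoExecute-
    kubectl taint nodes worker-node1 evict-user-pods:NoExecute-
    kubectl get pods --all-namespaces -o wide | grep worker-node1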
Switch back to the NSX-OVS kernel module
1. Modify the tolerations of both daemonset.apps/nsx-ncp-bootstrap and daemonset.apps/nsx-node-agent. Change the following:

       - effect: NoExecute
         operator: Exists

   to:

       - effect: NoExecute
         key: evict-user-pods
2. Modify the nsx-node-agent configmap. Change use_nsx_ovs_kernel_module to True.
- Taintworker-node1 "evict-user-pods:NoExecute" to evict all user pods in this node to other nodes:kubectl taint nodes worker-node1 evict-user-pods:NoExecute
- Taintworker-node1 "evict-ncp-pods:NoExecute" to evict nsx-node-agent and nsx-ncp-bootstrap pods in this node to other nodes:kubectl taint nodes worker-node1 evict-ncp-pods:NoExecute
5. Downgrade the kernel of worker-node1 to a supported version and reboot the node.
   Note: Set SELinux to Permissive mode on worker-node1 if containerd and kubelet cannot run.
6. Restart kubelet.
- Removetaint"evict-ncp-pods:NoExecute" from worker-node1. Verify that bootstrap and node-agent can start.
- Removetaint"evict-user-pods:NoExecute" from worker-node1. Verify that all pods in this node are running.
9. Repeat steps 3-8 for the other nodes.
10. Restore the tolerations of both the nsx-ncp-bootstrap and nsx-node-agent DaemonSets that you modified in step 1.
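The per-node rollback in steps 3-8 follows the same pattern; a minimal sketch, assuming worker-node1 and that a supported older kernel is still installed on the node (the command for selecting the boot kernel, shown here with grubby, is distribution-specific):

    # On a machine with kubectl access
    kubectl taint nodes worker-node1 evict-user-pods:NoExecute
    kubectl taint nodes worker-node1 evict-ncp-pods:NoExecute

    # On worker-node1: boot a supported kernel, e.g. on a RHEL-family system
    grubby --set-default /boot/vmlinuz-<supported-version>
    reboot

    # After the reboot, on worker-node1
    systemctl restart kubelet

    # Back on the kubectl machine: remove the taints and verify the pods
    kubectl taint nodes worker-node1 evict-ncp-pods:NoExecute-
    kubectl taint nodes worker-node1 evict-user-pods:NoExecute-
    kubectl get pods --all-namespaces -o wide | grep worker-node1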