Debugging NCP at Runtime
Last Updated January 21, 2025
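The Online Diagnostic System (ODS) feature can automatically debug NCP at runtime. ODS is implemented through runbooks that are built into NCP.
A runbook contains debugging procedures and generates a debugging report. Note that you cannot modify the predefined runbooks.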
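You can run CLI commands to perform the following runbook operations:
- Invoke a runbook to start runtime debugging
- Check the runbook status
- Get the debugging report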
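Starting with NCP 4.1.2, the runbook NCPPendingPod is available. You can use this runbook to debug pods that are stuck in the Pending state. It checks nsx-node-agent/ncp/hyperbus to help find the root cause of such issues.
Runtime Debugging Steps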
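Step 1: Start the sha-agent container in the NCP pod
By default, the NCP pod contains only the NCP container. Before you can perform runtime debugging, you must start the sha-agent container in the NCP pod. The yaml file below contains the secret and ConfigMap that the sha-agent container needs to start, and it adds the sha-agent container to the NCP pod. After you apply the yaml, make sure that both containers (NCP and sha-agent) are running.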
ncp-ods.yaml:
# Yaml template for NCP Deployment
# Proper kubernetes API and NSX API parameters, and NCP Docker image
# must be specified.
# This yaml file is part of NCP 4.1.1.0 release.
# This is a FAKE configmap, which only ensures that sha-agent could start up.
apiVersion: v1
data:
  deployment_info.yml: "outer_collector:\n tsdb:\n # Access metrics instance via
    servicename.namespace\n endpoint: metrics-manager.nsx-system.svc.cluster.local:5129\nmanager_namespace:
    nsx-system\nnorthstar_service_type: METRICS\norg_id: \ninstance_id:\nmanager_fqdn:
    metrics-manager.nsx-system.svc.cluster.local:5129\nmanager_type: NTM\n"
kind: ConfigMap
metadata:
  name: sha-agent-config
  namespace: nsx-system
---
# This is a FAKE secret, which only ensures that sha-agent could start up.
apiVersion: v1
data:
ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURlekNDQW1NQ0ZCQlFROC83alhzcUhLMENISnlRTmoya2tDS3pNQTBHQ1NxR1NJYjNEUUVCQ3dVQU1Ib3gKQ3pBSkJnTlZCQVlUQWxWVE1Rc3dDUVlEVlFRSURBSkRRVEVTTUJBR0ExVUVCd3dKVUdGc2J5QkJiSFJ2TVE4dwpEUVlEVlFRS0RBWldUWGRoY21VeERqQU1CZ05WQkFzTUJWWkVUbVYwTVNrd0p3WURWUVFERENCRFRpMXpZekl0Ck1UQXRNVGcxTFRFd01TMHhNVEF0TVRZNE16VXhPREUwTnpBZUZ3MHlNekExTURnd016VTFORGhhRncwek16QTEKTURVd016VTFORGhhTUhveEN6QUpCZ05WQkFZVEFsVlRNUXN3Q1FZRFZRUUlEQUpEUVRFU01CQUdBMVVFQnd3SgpVR0ZzYnlCQmJIUnZNUTh3RFFZRFZRUUtEQVpXVFhkaGNtVXhEakFNQmdOVkJBc01CVlpFVG1WME1Ta3dKd1lEClZRUUREQ0JEVGkxell6SXRNVEF0TVRnMUxURXdNUzB4TVRBdE1UWTRNelV4T0RFME56Q0NBU0l3RFFZSktvWkkKaHZjTkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFPMzVSTE1hTUpWcnBGREk3WGtIN3QybUN2VzA2SWJWZjF4RQozL2ozTU1TMHZYeURrdFI4Ui9qOTBhVThxUlhmVnNTcnlPcmk4dm5ZT0ZFdVpEbE5Bd2x1VnpQR0RQcFdlbk1CCm80VlFaNjh6eWVpQXY1UEMzc2xhWkUyVm9qaFlyeHUzd3JiWDVKdEpkaWpDbnNIbkYzRDN3VnJFbTVaMUJVSHoKMzZ3bitHWVM1Vzh1WlZzQy9IK2gvSnYrYXY5SVFWdzZHVGYvSjRaK2xTaUJRSU1rL0s1WUdZc25PUFBaYUVNcgpWVkk5RUg0TFpJOFRuWjVnT0owUVdBdThMSGVpd0c4QklnRWtyd0V0NXBhYnN0VjNkTWdRc2p6bjdUZm9aR2h4CmVic05SQXF1bmc4VmpHQzZHeGpsTzdsbXhQdDJmK0ZPMEE4anF5c2xMbDRENWtDY29CVUNBd0VBQVRBTkJna3EKaGtpRzl3MEJBUXNGQUFPQ0FRRUFWOTZBVUtYVGFNK0hrQmVPU3Nmc3htUHZDcG1ac0U5elYxUVlMemhmRW9NcgpDVkY0KzRPc3hsQ2VxZDdIR0pXdHYvdnRXMHIxZk9NNHhpNitHS09yQU4zRjVteUVLS3Rhd01jaStyS2EzbitHCmJpK0Rrdmk2YUNLVG0zaUFoTlEzNkdCdzJiSkxZU1Z4L0FRTDltQ2p4ZURwckV4WlBET1AyblVOTkJjVW02WVEKdXphUGNud1VGTzVXNlpRQ3hFQVdkeU1FbXZYK1pWcHNLTk42MXlhWnBvZHdVRlExMGFjR0QvN3lrc3Y4WTQ0YwpsQmlKSWdLVmdySDlYUUhIQVIxUzZPVGNnYzgyU25RS1dSUCtpTCtCQjQ4eWRLUkhEeDFSdzcvcTU5VXFSa0ltCmVlejdEVHhwaFBoN1NyWk5ZTVZlVTJxaFFRMWpETnVhd2FnUXJNa2lTQT09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURlekNDQW1NQ0ZCQlFROC83alhzcUhLMENISnlRTmoya2tDS3pNQTBHQ1NxR1NJYjNEUUVCQ3dVQU1Ib3gKQ3pBSkJnTlZCQVlUQWxWVE1Rc3dDUVlEVlFRSURBSkRRVEVTTUJBR0ExVUVCd3dKVUdGc2J5QkJiSFJ2TVE4dwpEUVlEVlFRS0RBWldUWGRoY21VeERqQU1CZ05WQkFzTUJWWkVUbVYwTVNrd0p3WURWUVFERENCRFRpMXpZekl0Ck1UQXRNVGcxTFRFd01TMHhNVEF0TVRZNE16VXhPREUwTnpBZUZ3MHlNekExTURnd016VTFORGhhRncwek16QTEKTURVd016VTFORGhhTUhveEN6QUpCZ05WQkFZVEFsVlRNUXN3Q1FZRFZRUUlEQUpEUVRFU01CQUdBMVVFQnd3SgpVR0ZzYnlCQmJIUnZNUTh3RFFZRFZRUUtEQVpXVFhkaGNtVXhEakFNQmdOVkJBc01CVlpFVG1WME1Ta3dKd1lEClZRUUREQ0JEVGkxell6SXRNVEF0TVRnMUxURXdNUzB4TVRBdE1UWTRNelV4T0RFME56Q0NBU0l3RFFZSktvWkkKaHZjTkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFPMzVSTE1hTUpWcnBGREk3WGtIN3QybUN2VzA2SWJWZjF4RQozL2ozTU1TMHZYeURrdFI4Ui9qOTBhVThxUlhmVnNTcnlPcmk4dm5ZT0ZFdVpEbE5Bd2x1VnpQR0RQcFdlbk1CCm80VlFaNjh6eWVpQXY1UEMzc2xhWkUyVm9qaFlyeHUzd3JiWDVKdEpkaWpDbnNIbkYzRDN3VnJFbTVaMUJVSHoKMzZ3bitHWVM1Vzh1WlZzQy9IK2gvSnYrYXY5SVFWdzZHVGYvSjRaK2xTaUJRSU1rL0s1WUdZc25PUFBaYUVNcgpWVkk5RUg0TFpJOFRuWjVnT0owUVdBdThMSGVpd0c4QklnRWtyd0V0NXBhYnN0VjNkTWdRc2p6bjdUZm9aR2h4CmVic05SQXF1bmc4VmpHQzZHeGpsTzdsbXhQdDJmK0ZPMEE4anF5c2xMbDRENWtDY29CVUNBd0VBQVRBTkJna3EKaGtpRzl3MEJBUXNGQUFPQ0FRRUFWOTZBVUtYVGFNK0hrQmVPU3Nmc3htUHZDcG1ac0U5elYxUVlMemhmRW9NcgpDVkY0KzRPc3hsQ2VxZDdIR0pXdHYvdnRXMHIxZk9NNHhpNitHS09yQU4zRjVteUVLS3Rhd01jaStyS2EzbitHCmJpK0Rrdmk2YUNLVG0zaUFoTlEzNkdCdzJiSkxZU1Z4L0FRTDltQ2p4ZURwckV4WlBET1AyblVOTkJjVW02WVEKdXphUGNud1VGTzVXNlpRQ3hFQVdkeU1FbXZYK1pWcHNLTk42MXlhWnBvZHdVRlExMGFjR0QvN3lrc3Y4WTQ0YwpsQmlKSWdLVmdySDlYUUhIQVIxUzZPVGNnYzgyU25RS1dSUCtpTCtCQjQ4eWRLUkhEeDFSdzcvcTU5VXFSa0ltCmVlejdEVHhwaFBoN1NyWk5ZTVZlVTJxaFFRMWpETnVhd2FnUXJNa2lTQT09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
tls.key: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcGdJQkFBS0NBUUVBN2ZsRXN4b3dsV3VrVU1qdGVRZnUzYVlLOWJUb2h0Vi9YRVRmK1Bjd3hMUzlmSU9TCjFIeEgrUDNScFR5cEZkOVd4S3ZJNnVMeStkZzRVUzVrT1UwRENXNVhNOFlNK2xaNmN3R2poVkJucnpQSjZJQy8KazhMZXlWcGtUWldpT0Zpdkc3ZkN0dGZrbTBsMktNS2V3ZWNYY1BmQldzU2JsblVGUWZQZnJDZjRaaExsYnk1bApXd0w4ZjZIOG0vNXEvMGhCWERvWk4vOG5objZWS0lGQWd5VDhybGdaaXljNDg5bG9ReXRWVWowUWZndGtqeE9kCm5tQTRuUkJZQzd3c2Q2TEFid0VpQVNTdkFTM21scHV5MVhkMHlCQ3lQT2Z0Titoa2FIRjV1dzFFQ3E2ZUR4V00KWUxvYkdPVTd1V2JFKzNaLzRVN1FEeU9yS3lVdVhnUG1RSnlnRlFJREFRQUJBb0lCQVFDWXRSZHZzd093THJYdgpuVEErTldnRDFkUThuYzJGRUtXOHlQbk1vcHNwN3kyVkpEMXBteUw0VmJCZFQxTFZsVTd4djZhYmkrME5oTUdHCjNyVXp6QWFCMjh1YmpxQ3ZXQ1VWZmR5MzVNUFVPdkI3QVh0dVQyTjFaRXJ2T25FeHBUOGhFMGVnMjJONGZxaVQKT1doMDExMUVnY2dTL2cwMWZIeFdPUyswSXFZVW9SalJIcVFCZ0dkR3NqYmJNTmF1Skl2bitQb1BKM1ZjeVJ4cgorWHhPOHlidHlzUlV3dkxYb2dpTmc5UTRsUENEZmtBdkFWNHdqRGFWeFBHUWs3YmFOOVJzeEVtVnZpZFU4aFpmCnFMbGlXUUwrdFdvWk5zcTM3VlhBRk51R203c2dISFptb3dUcEw5b1BjZU9rNkhsa2g0OEVCQWJmZm9zd0l0c0sKMUM1WWFtakJBb0dCQVBibGl5QzZuc3lMQ01EcUR1OEdYTHo2WER5UktHdnhlbmZjRjEzUzZpR2QxenkwUTVRawprdmdoMDlpVktVM21GQ1BLZElRMENDUWIza0lVU2hxL3c1MGw3KzBtcm5MUDI5ZjBOWHRFdDhNYTdCM1pPbmpBCnlzZnNCOU9KUkkzeDFVV0hwUXhnV3N2Q1RES0sxN2xhTnRYbDNiUGtNdGpRUWNwM21MYlJSeUpSQW9HQkFQYS8KZ0U3aCtSdUp4U2ZSVXNNbEVEOVV6ZGliNVphdmlSQ3Zaak56bkMrT1duYWpGMHBpeUZWRGlCYndVQTlkRkR5SApuaHlwRWlZVnRhWGFybm9FbEVkZmlYMGI2ZXJHWFYySWNVMDhWQU9wVGkveUZYQWprdDI5SytKVDVzN3lQVytGCklzcnZPRWJnQ3VFZmIvR1p4NmdETSsxMmlzSW9wTWdYb2dTeFRBeUZBb0dCQU5YUzJubFA1bk9TL2RQRllZV1UKNXdBcmUzSmc3TGIvZldjTXo1Zk1NRVZJNDcySkNQWGw3dnJDb1N2emtzQUtRT3IyVFk2cFdWdWNYeEt2YTdaYQoyZGpob0Rhc3gyeGJwRFFWSmJSS1FUUFJ2eWZpbUFjNFFPWi8vZzh2MUpWeUdaaUw3MThXbTh2WHpCSUJ1TzZuCnVOSHFyK1U1L3VkVEJZZUpxRks4VUhUaEFvR0JBTURpdkl0dGpJMGhhcFNReG5DMEpYcE1jY20xUElsSjJRekoKQUV5U1FITFFoaGtkcnRSQVdqaUUzUHFKaXh3bmQrMUZXcTB1NFhnU0duaDNkVkwvQjJhdjRVdUNxWjRVeU9HWQpDbklGQ2V2K3lwY2lWKzNjY1MrVGRKMnRWczFKZ2dzT2VUOUlONmIzOXFrN0tRZ2xYWFVTWStKcWUxZ0I2NlpiCkN4VTkvNlA5QW9HQkFNUDI0NWRKQ2dsc0dwQ3pjZVR
3bFd3MUdmTUdhN0V3QWd4ZXNFQVd3NmE3d2lENVhDcDMKVHo3eE9YdzM4TyswQUNucEZTMXBpNTM4eWZCKzF5aFhtZU8wOUhTdUpsOEVYVlpRR0VNWUR2Q0M5OHVGb0xzWgprcHEwallqQlpzV0JScE9XaXVPdkQrU3pQa0xtNU10cytUemdQWURqWjVEamw2SExHQ3dDNjFKSQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
kind: Secret
metadata:
  name: sha-agent-tls-cert
  namespace: nsx-system
type: kubernetes.io/tls
---
apiVersion: apps/v1
kind: Deployment
metadata:
  # VMware NSX Container Plugin
  name: nsx-ncp
  namespace: nsx-system
  labels:
    tier: nsx-networking
    component: nsx-ncp
    version: v1
spec:
  # Active-Standby is supported from NCP 2.4.0 release,
  # so replica can be more than 1 if NCP HA is activated.
  # replica *must be* 1 if NCP HA is deactivated.
  selector:
    matchLabels:
      tier: nsx-networking
      component: nsx-ncp
      version: v1
  replicas: 1
  template:
    metadata:
      labels:
        tier: nsx-networking
        component: nsx-ncp
        version: v1
      # annotations:
      #   prometheus.io/scrape: "true"
      #   prometheus.io/port: "8001"
    spec:
      # NCP shares the host management network.
      hostNetwork: true
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule
      # If configured with ServiceAccount, update the ServiceAccount
      # name below.
      serviceAccountName: ncp-svc-account
      # podAntiAffinity could ensure that NCP replicas are not be co-located
      # on a single node
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: component
                      operator: In
                      values:
                        - nsx-ncp
                    - key: tier
                      operator: In
                      values:
                        - nsx-networking
                topologyKey: "kubernetes.io/hostname"
      containers:
        - name: nsx-ncp
          # Docker image for NCP
          image: nsx-ncp
          imagePullPolicy: IfNotPresent
          securityContext:
            capabilities:
              add:
                - AUDIT_WRITE
          env:
            - name: NCP_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: NCP_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - check_pod_liveness nsx-ncp 30
            initialDelaySeconds: 5
            timeoutSeconds: 30
            periodSeconds: 10
            failureThreshold: 5
          volumeMounts:
            - name: projected-volume
              mountPath: /etc/nsx-ujo
              readOnly: true
        - name: sha-agent
          image: nsx-ncp
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - mountPath: /etc/nsx-ujo
              name: projected-volume
              readOnly: true
            - mountPath: /var/log/nsx-ujo
              name: host-var-log-ujo
            - mountPath: /cert/
              name: sha-agent-cert
            - mountPath: /etc/sha
              name: config-volume
          command: ["start_sha"]
      volumes:
        - name: host-var-log-ujo
          hostPath:
            path: /var/log/nsx-ujo
            type: DirectoryOrCreate
        - name: sha-agent-cert
          projected:
            defaultMode: 420
            sources:
              - secret:
                  items:
                    - key: tls.key
                      path: tls.key
                    - key: tls.crt
                      path: tls.crt
                    - key: ca.crt
                      path: ca.crt
                  name: sha-agent-tls-cert
              - secret:
                  items:
                    - key: tls.crt
                      path: nsx-cert/tls.crt
                    - key: tls.key
                      path: nsx-cert/tls.key
                  name: nsx-secret
        - configMap:
            defaultMode: 420
            name: sha-agent-config
          name: config-volume
        - name: projected-volume
          projected:
            sources:
              # ConfigMap nsx-ncp-config is expected to supply ncp.ini
              - configMap:
                  name: nsx-ncp-config
                  items:
                    - key: ncp.ini
                      path: ncp.ini
              # To use cert based auth, uncomment and update the secretName,
              # then update ncp.ini with the mounted cert and key file paths
              #- secret:
              #    name: nsx-secret
              #    items:
              #      - key: tls.crt
              #        path: nsx-cert/tls.crt
              #      - key: tls.key
              #        path: nsx-cert/tls.key
              #- secret:
              #    name: lb-secret
              #    items:
              #      - key: tls.crt
              #        path: lb-cert/tls.crt
              #      - key: tls.key
              #        path: lb-cert/tls.key
              # To use JWT based auth, uncomment and update the secretName.
              #- secret:
              #    name: wcp-cluster-credentials
              #    items:
              #      - key: username
              #        path: vc/username
              #      - key: password
              #        path: vc/password
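The apply-and-verify part of Step 1 could be scripted roughly as follows. This is only a sketch: it assumes the manifest is saved as ncp-ods.yaml, that kubectl is already configured for the cluster, and that the NCP pod carries the component=nsx-ncp label in the nsx-system namespace (both values taken from the yaml). The function names are hypothetical and not part of NCP. The check uses the container "ready" flag, which, with no readiness probe configured for sha-agent, corresponds to the container running.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of NCP.

# Print one "container=ready" line for each container of the NCP pod(s).
ncp_container_status() {
    kubectl get pods -n nsx-system -l component=nsx-ncp \
        -o jsonpath='{range .items[*].status.containerStatuses[*]}{.name}={.ready}{"\n"}{end}'
}

# Read "container=ready" lines on stdin and succeed only if both
# nsx-ncp and sha-agent report ready=true.
both_containers_ready() {
    lines=$(cat)
    printf '%s\n' "$lines" | grep -qx 'nsx-ncp=true' &&
        printf '%s\n' "$lines" | grep -qx 'sha-agent=true'
}

# Typical use:
#   kubectl apply -f ncp-ods.yaml
#   ncp_container_status | both_containers_ready && echo "NCP and sha-agent are ready"
```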
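Step 2: Start a runbook invocation and get the result
In NCP, you perform runbook operations by running sha-agent CLI commands inside the sha-agent container. In the sha-agent container, sha-appctl is responsible for executing runbook commands. When a pod is stuck in the Pending state, first get the name and namespace of the pod. Then start a runbook invocation with the following command:
kubectl exec NCP_POD -c sha-agent -- /opt/vmware/sha/bin/sha-appctl -c start_invocation --runbook NCPPendingPod --pod_ns POD_NS --pod_name POD_NAME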
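The command outputs an invocation ID that you can use to track the invocation status. Use this ID to get the invocation result with the following command: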
kubectl exec NCP_POD -c sha-agent -- /opt/vmware/sha/bin/sha-appctl -c get_invocation_result --invocation INVOCATION_ID
The command output contains the runbook result in JSON format, which shows the debugging result.
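The two commands in Step 2 could be wrapped in small shell functions, as sketched below. The function names and the NCP_NS variable are hypothetical. The -n flag is an addition that assumes the NCP pod runs in the nsx-system namespace, as in ncp-ods.yaml; the sha-appctl arguments are copied unchanged from the commands in this step. The first helper relies on kubectl's standard status.phase field selector to list candidate pods.

```shell
#!/bin/sh
# Hypothetical wrappers around the Step 2 commands.
# Assumption: the NCP pod lives in nsx-system unless NCP_NS is set.
NCP_NS="${NCP_NS:-nsx-system}"
SHA_APPCTL=/opt/vmware/sha/bin/sha-appctl

# List pods that are in the Pending phase, across all namespaces.
list_pending_pods() {
    kubectl get pods --all-namespaces --field-selector=status.phase=Pending
}

# start_invocation NCP_POD POD_NS POD_NAME
# Starts the NCPPendingPod runbook; the output contains the invocation ID.
start_invocation() {
    kubectl exec -n "$NCP_NS" "$1" -c sha-agent -- \
        "$SHA_APPCTL" -c start_invocation --runbook NCPPendingPod \
        --pod_ns "$2" --pod_name "$3"
}

# get_invocation_result NCP_POD INVOCATION_ID
# Fetches the runbook result for an earlier invocation.
get_invocation_result() {
    kubectl exec -n "$NCP_NS" "$1" -c sha-agent -- \
        "$SHA_APPCTL" -c get_invocation_result --invocation "$2"
}
```

For example, run list_pending_pods to pick a pod, call start_invocation with the NCP pod name and the pending pod's namespace and name, then pass the returned ID to get_invocation_result. If the result is printed as plain JSON, it can be piped through a formatter such as python3 -m json.tool for easier reading.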