Debugging NCP at Runtime
The ODS (Online Diagnostic System) feature automates the debugging of NCP at runtime. ODS is implemented through runbooks that are built into NCP.
A runbook contains a debugging procedure and generates a debugging report. The predefined runbooks cannot be modified.
You can perform the following runbook operations by running CLI commands:
- Start runbook debugging by invoking a runbook
- Check the runbook status
- Get the debugging report
Starting with NCP 4.1.2, the NCPPendingPod runbook is available. This runbook can be used to debug pods that are stuck in the Pending state. It checks nsx-node-agent/ncp/hyperbus to determine the root cause of this type of problem.
Runtime Debugging Steps
Step 1: Start the sha-agent container in the NCP pod
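By default, the NCP pod has only one container, the NCP container. To perform runtime debugging, you must first start the sha-agent container in the NCP pod. The ncp-ods.yaml file contains the secret and config-map required to start the sha-agent container, and it adds the sha-agent container to the NCP pod. After applying the yaml, verify that both containers (NCP and sha-agent) are running.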
ncp-ods.yaml:
```yaml
# Yaml template for NCP Deployment
# Proper kubernetes API and NSX API parameters, and NCP Docker image
# must be specified.
# This yaml file is part of NCP 4.1.1.0 release.
# This is a FAKE configmap, which only ensures that sha-agent could start up.
apiVersion: v1
data:
  deployment_info.yml: "outer_collector:\n tsdb:\n # Access metrics instance via servicename.namespace\n endpoint: metrics-manager.nsx-system.svc.cluster.local:5129\nmanager_namespace: nsx-system\nnorthstar_service_type: METRICS\norg_id: \ninstance_id:\nmanager_fqdn: metrics-manager.nsx-system.svc.cluster.local:5129\nmanager_type: NTM\n"
kind: ConfigMap
metadata:
  name: sha-agent-config
  namespace: nsx-system
---
# This is a FAKE secret, which only ensures that sha-agent could start up.
apiVersion: v1
data:
  ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURlekNDQW1NQ0ZCQlFROC83alhzcUhLMENISnlRTmoya2tDS3pNQTBHQ1NxR1NJYjNEUUVCQ3dVQU1Ib3gKQ3pBSkJnTlZCQVlUQWxWVE1Rc3dDUVlEVlFRSURBSkRRVEVTTUJBR0ExVUVCd3dKVUdGc2J5QkJiSFJ2TVE4dwpEUVlEVlFRS0RBWldUWGRoY21VeERqQU1CZ05WQkFzTUJWWkVUbVYwTVNrd0p3WURWUVFERENCRFRpMXpZekl0Ck1UQXRNVGcxTFRFd01TMHhNVEF0TVRZNE16VXhPREUwTnpBZUZ3MHlNekExTURnd016VTFORGhhRncwek16QTEKTURVd016VTFORGhhTUhveEN6QUpCZ05WQkFZVEFsVlRNUXN3Q1FZRFZRUUlEQUpEUVRFU01CQUdBMVVFQnd3SgpVR0ZzYnlCQmJIUnZNUTh3RFFZRFZRUUtEQVpXVFhkaGNtVXhEakFNQmdOVkJBc01CVlpFVG1WME1Ta3dKd1lEClZRUUREQ0JEVGkxell6SXRNVEF0TVRnMUxURXdNUzB4TVRBdE1UWTRNelV4T0RFME56Q0NBU0l3RFFZSktvWkkKaHZjTkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFPMzVSTE1hTUpWcnBGREk3WGtIN3QybUN2VzA2SWJWZjF4RQozL2ozTU1TMHZYeURrdFI4Ui9qOTBhVThxUlhmVnNTcnlPcmk4dm5ZT0ZFdVpEbE5Bd2x1VnpQR0RQcFdlbk1CCm80VlFaNjh6eWVpQXY1UEMzc2xhWkUyVm9qaFlyeHUzd3JiWDVKdEpkaWpDbnNIbkYzRDN3VnJFbTVaMUJVSHoKMzZ3bitHWVM1Vzh1WlZzQy9IK2gvSnYrYXY5SVFWdzZHVGYvSjRaK2xTaUJRSU1rL0s1WUdZc25PUFBaYUVNcgpWVkk5RUg0TFpJOFRuWjVnT0owUVdBdThMSGVpd0c4QklnRWtyd0V0NXBhYnN0VjNkTWdRc2p6bjdUZm9aR2h4CmVic05SQXF1bmc4VmpHQzZHeGpsTzdsbXhQdDJmK0ZPMEE4anF5c2xMbDRENWtDY29CVUNBd0VBQVRBTkJna3EKaGtpRzl3MEJBUXNGQUFPQ0FRRUFWOTZBVUtYVGFNK0hrQmVPU3Nmc3htUHZDcG1ac0U5elYxUVlMemhmRW9NcgpDVkY0KzRPc3hsQ2VxZDdIR0pXdHYvdnRXMHIxZk9NNHhpNitHS09yQU4zRjVteUVLS3Rhd01jaStyS2EzbitHCmJpK0Rrdmk2YUNLVG0zaUFoTlEzNkdCdzJiSkxZU1Z4L0FRTDltQ2p4ZURwckV4WlBET1AyblVOTkJjVW02WVEKdXphUGNud1VGTzVXNlpRQ3hFQVdkeU1FbXZYK1pWcHNLTk42MXlhWnBvZHdVRlExMGFjR0QvN3lrc3Y4WTQ0YwpsQmlKSWdLVmdySDlYUUhIQVIxUzZPVGNnYzgyU25RS1dSUCtpTCtCQjQ4eWRLUkhEeDFSdzcvcTU5VXFSa0ltCmVlejdEVHhwaFBoN1NyWk5ZTVZlVTJxaFFRMWpETnVhd2FnUXJNa2lTQT09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
  tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURlekNDQW1NQ0ZCQlFROC83alhzcUhLMENISnlRTmoya2tDS3pNQTBHQ1NxR1NJYjNEUUVCQ3dVQU1Ib3gKQ3pBSkJnTlZCQVlUQWxWVE1Rc3dDUVlEVlFRSURBSkRRVEVTTUJBR0ExVUVCd3dKVUdGc2J5QkJiSFJ2TVE4dwpEUVlEVlFRS0RBWldUWGRoY21VeERqQU1CZ05WQkFzTUJWWkVUbVYwTVNrd0p3WURWUVFERENCRFRpMXpZekl0Ck1UQXRNVGcxTFRFd01TMHhNVEF0TVRZNE16VXhPREUwTnpBZUZ3MHlNekExTURnd016VTFORGhhRncwek16QTEKTURVd016VTFORGhhTUhveEN6QUpCZ05WQkFZVEFsVlRNUXN3Q1FZRFZRUUlEQUpEUVRFU01CQUdBMVVFQnd3SgpVR0ZzYnlCQmJIUnZNUTh3RFFZRFZRUUtEQVpXVFhkaGNtVXhEakFNQmdOVkJBc01CVlpFVG1WME1Ta3dKd1lEClZRUUREQ0JEVGkxell6SXRNVEF0TVRnMUxURXdNUzB4TVRBdE1UWTRNelV4T0RFME56Q0NBU0l3RFFZSktvWkkKaHZjTkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFPMzVSTE1hTUpWcnBGREk3WGtIN3QybUN2VzA2SWJWZjF4RQozL2ozTU1TMHZYeURrdFI4Ui9qOTBhVThxUlhmVnNTcnlPcmk4dm5ZT0ZFdVpEbE5Bd2x1VnpQR0RQcFdlbk1CCm80VlFaNjh6eWVpQXY1UEMzc2xhWkUyVm9qaFlyeHUzd3JiWDVKdEpkaWpDbnNIbkYzRDN3VnJFbTVaMUJVSHoKMzZ3bitHWVM1Vzh1WlZzQy9IK2gvSnYrYXY5SVFWdzZHVGYvSjRaK2xTaUJRSU1rL0s1WUdZc25PUFBaYUVNcgpWVkk5RUg0TFpJOFRuWjVnT0owUVdBdThMSGVpd0c4QklnRWtyd0V0NXBhYnN0VjNkTWdRc2p6bjdUZm9aR2h4CmVic05SQXF1bmc4VmpHQzZHeGpsTzdsbXhQdDJmK0ZPMEE4anF5c2xMbDRENWtDY29CVUNBd0VBQVRBTkJna3EKaGtpRzl3MEJBUXNGQUFPQ0FRRUFWOTZBVUtYVGFNK0hrQmVPU3Nmc3htUHZDcG1ac0U5elYxUVlMemhmRW9NcgpDVkY0KzRPc3hsQ2VxZDdIR0pXdHYvdnRXMHIxZk9NNHhpNitHS09yQU4zRjVteUVLS3Rhd01jaStyS2EzbitHCmJpK0Rrdmk2YUNLVG0zaUFoTlEzNkdCdzJiSkxZU1Z4L0FRTDltQ2p4ZURwckV4WlBET1AyblVOTkJjVW02WVEKdXphUGNud1VGTzVXNlpRQ3hFQVdkeU1FbXZYK1pWcHNLTk42MXlhWnBvZHdVRlExMGFjR0QvN3lrc3Y4WTQ0YwpsQmlKSWdLVmdySDlYUUhIQVIxUzZPVGNnYzgyU25RS1dSUCtpTCtCQjQ4eWRLUkhEeDFSdzcvcTU5VXFSa0ltCmVlejdEVHhwaFBoN1NyWk5ZTVZlVTJxaFFRMWpETnVhd2FnUXJNa2lTQT09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
  tls.key: 
LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcGdJQkFBS0NBUUVBN2ZsRXN4b3dsV3VrVU1qdGVRZnUzYVlLOWJUb2h0Vi9YRVRmK1Bjd3hMUzlmSU9TCjFIeEgrUDNScFR5cEZkOVd4S3ZJNnVMeStkZzRVUzVrT1UwRENXNVhNOFlNK2xaNmN3R2poVkJucnpQSjZJQy8KazhMZXlWcGtUWldpT0Zpdkc3ZkN0dGZrbTBsMktNS2V3ZWNYY1BmQldzU2JsblVGUWZQZnJDZjRaaExsYnk1bApXd0w4ZjZIOG0vNXEvMGhCWERvWk4vOG5objZWS0lGQWd5VDhybGdaaXljNDg5bG9ReXRWVWowUWZndGtqeE9kCm5tQTRuUkJZQzd3c2Q2TEFid0VpQVNTdkFTM21scHV5MVhkMHlCQ3lQT2Z0Titoa2FIRjV1dzFFQ3E2ZUR4V00KWUxvYkdPVTd1V2JFKzNaLzRVN1FEeU9yS3lVdVhnUG1RSnlnRlFJREFRQUJBb0lCQVFDWXRSZHZzd093THJYdgpuVEErTldnRDFkUThuYzJGRUtXOHlQbk1vcHNwN3kyVkpEMXBteUw0VmJCZFQxTFZsVTd4djZhYmkrME5oTUdHCjNyVXp6QWFCMjh1YmpxQ3ZXQ1VWZmR5MzVNUFVPdkI3QVh0dVQyTjFaRXJ2T25FeHBUOGhFMGVnMjJONGZxaVQKT1doMDExMUVnY2dTL2cwMWZIeFdPUyswSXFZVW9SalJIcVFCZ0dkR3NqYmJNTmF1Skl2bitQb1BKM1ZjeVJ4cgorWHhPOHlidHlzUlV3dkxYb2dpTmc5UTRsUENEZmtBdkFWNHdqRGFWeFBHUWs3YmFOOVJzeEVtVnZpZFU4aFpmCnFMbGlXUUwrdFdvWk5zcTM3VlhBRk51R203c2dISFptb3dUcEw5b1BjZU9rNkhsa2g0OEVCQWJmZm9zd0l0c0sKMUM1WWFtakJBb0dCQVBibGl5QzZuc3lMQ01EcUR1OEdYTHo2WER5UktHdnhlbmZjRjEzUzZpR2QxenkwUTVRawprdmdoMDlpVktVM21GQ1BLZElRMENDUWIza0lVU2hxL3c1MGw3KzBtcm5MUDI5ZjBOWHRFdDhNYTdCM1pPbmpBCnlzZnNCOU9KUkkzeDFVV0hwUXhnV3N2Q1RES0sxN2xhTnRYbDNiUGtNdGpRUWNwM21MYlJSeUpSQW9HQkFQYS8KZ0U3aCtSdUp4U2ZSVXNNbEVEOVV6ZGliNVphdmlSQ3Zaak56bkMrT1duYWpGMHBpeUZWRGlCYndVQTlkRkR5SApuaHlwRWlZVnRhWGFybm9FbEVkZmlYMGI2ZXJHWFYySWNVMDhWQU9wVGkveUZYQWprdDI5SytKVDVzN3lQVytGCklzcnZPRWJnQ3VFZmIvR1p4NmdETSsxMmlzSW9wTWdYb2dTeFRBeUZBb0dCQU5YUzJubFA1bk9TL2RQRllZV1UKNXdBcmUzSmc3TGIvZldjTXo1Zk1NRVZJNDcySkNQWGw3dnJDb1N2emtzQUtRT3IyVFk2cFdWdWNYeEt2YTdaYQoyZGpob0Rhc3gyeGJwRFFWSmJSS1FUUFJ2eWZpbUFjNFFPWi8vZzh2MUpWeUdaaUw3MThXbTh2WHpCSUJ1TzZuCnVOSHFyK1U1L3VkVEJZZUpxRks4VUhUaEFvR0JBTURpdkl0dGpJMGhhcFNReG5DMEpYcE1jY20xUElsSjJRekoKQUV5U1FITFFoaGtkcnRSQVdqaUUzUHFKaXh3bmQrMUZXcTB1NFhnU0duaDNkVkwvQjJhdjRVdUNxWjRVeU9HWQpDbklGQ2V2K3lwY2lWKzNjY1MrVGRKMnRWczFKZ2dzT2VUOUlONmIzOXFrN0tRZ2xYWFVTWStKcWUxZ0I2NlpiCkN4VTkvNlA5QW9HQkFNUDI0NWRKQ2dsc0dwQ3pjZVR3bFd3MUdmTUdhN0V3QWd4ZXNFQVd3NmE3d2lENVhDcDMKVHo3eE9YdzM4TyswQUNucEZTMXBpNTM4eWZCKzF5aFhtZU8wOUhTdUpsOEVYVlpRR0VNWUR2Q0M5OHVGb0xzWgprcHEwallqQlpzV0JScE9XaXVPdkQrU3pQa0xtNU10cytUemdQWURqWjVEamw2SExHQ3dDNjFKSQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
kind: Secret
metadata:
  name: sha-agent-tls-cert
  namespace: nsx-system
type: kubernetes.io/tls
---
apiVersion: apps/v1
kind: Deployment
metadata:
  # VMware NSX Container Plugin
  name: nsx-ncp
  namespace: nsx-system
  labels:
    tier: nsx-networking
    component: nsx-ncp
    version: v1
spec:
  # Active-Standby is supported from NCP 2.4.0 release,
  # so replica can be more than 1 if NCP HA is activated.
  # replica *must be* 1 if NCP HA is deactivated.
  selector:
    matchLabels:
      tier: nsx-networking
      component: nsx-ncp
      version: v1
  replicas: 1
  template:
    metadata:
      labels:
        tier: nsx-networking
        component: nsx-ncp
        version: v1
      # annotations:
      #   prometheus.io/scrape: "true"
      #   prometheus.io/port: "8001"
    spec:
      # NCP shares the host management network.
      hostNetwork: true
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule
      # If configured with ServiceAccount, update the ServiceAccount
      # name below.
      serviceAccountName: ncp-svc-account
      # podAntiAffinity could ensure that NCP replicas are not co-located
      # on a single node
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: component
                      operator: In
                      values:
                        - nsx-ncp
                    - key: tier
                      operator: In
                      values:
                        - nsx-networking
                topologyKey: "kubernetes.io/hostname"
      containers:
        - name: nsx-ncp
          # Docker image for NCP
          image: nsx-ncp
          imagePullPolicy: IfNotPresent
          securityContext:
            capabilities:
              add:
                - AUDIT_WRITE
          env:
            - name: NCP_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: NCP_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - check_pod_liveness nsx-ncp 30
            initialDelaySeconds: 5
            timeoutSeconds: 30
            periodSeconds: 10
            failureThreshold: 5
          volumeMounts:
            - name: projected-volume
              mountPath: /etc/nsx-ujo
              readOnly: true
        - name: sha-agent
          image: nsx-ncp
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - mountPath: /etc/nsx-ujo
              name: projected-volume
              readOnly: true
            - mountPath: /var/log/nsx-ujo
              name: host-var-log-ujo
            - mountPath: /cert/
              name: sha-agent-cert
            - mountPath: /etc/sha
              name: config-volume
          command: ["start_sha"]
      volumes:
        - name: host-var-log-ujo
          hostPath:
            path: /var/log/nsx-ujo
            type: DirectoryOrCreate
        - name: sha-agent-cert
          projected:
            defaultMode: 420
            sources:
              - secret:
                  items:
                    - key: tls.key
                      path: tls.key
                    - key: tls.crt
                      path: tls.crt
                    - key: ca.crt
                      path: ca.crt
                  name: sha-agent-tls-cert
              - secret:
                  items:
                    - key: tls.crt
                      path: nsx-cert/tls.crt
                    - key: tls.key
                      path: nsx-cert/tls.key
                  name: nsx-secret
        - configMap:
            defaultMode: 420
            name: sha-agent-config
          name: config-volume
        - name: projected-volume
          projected:
            sources:
              # ConfigMap nsx-ncp-config is expected to supply ncp.ini
              - configMap:
                  name: nsx-ncp-config
                  items:
                    - key: ncp.ini
                      path: ncp.ini
              # To use cert based auth, uncomment and update the secretName,
              # then update ncp.ini with the mounted cert and key file paths
              #- secret:
              #    name: nsx-secret
              #    items:
              #      - key: tls.crt
              #        path: nsx-cert/tls.crt
              #      - key: tls.key
              #        path: nsx-cert/tls.key
              #- secret:
              #    name: lb-secret
              #    items:
              #      - key: tls.crt
              #        path: lb-cert/tls.crt
              #      - key: tls.key
              #        path: lb-cert/tls.key
              # To use JWT based auth, uncomment and update the secretName.
              #- secret:
              #    name: wcp-cluster-credentials
              #    items:
              #      - key: username
              #        path: vc/username
              #      - key: password
              #        path: vc/password
```
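After the yaml has been applied (for example with `kubectl apply -f ncp-ods.yaml`), `kubectl get pods -n nsx-system` should typically show the NCP pod with both containers ready (2/2). The following minimal Python sketch performs the same check programmatically. It assumes that `kubectl` is on the PATH and configured for the cluster, and it selects the NCP pod using the `component=nsx-ncp` and `tier=nsx-networking` labels from the Deployment. The function names are hypothetical helpers and are not part of NCP.

```python
import json
import subprocess

# Container names as defined in the nsx-ncp Deployment of ncp-ods.yaml.
EXPECTED_CONTAINERS = {"nsx-ncp", "sha-agent"}


def containers_running(pod: dict) -> bool:
    """Return True if every expected container in the pod is in the running state.

    `pod` is a single Pod object, e.g. one entry of `items` in
    `kubectl get pods -o json` output.
    """
    statuses = pod.get("status", {}).get("containerStatuses", [])
    # A container's `state` has exactly one of: running, waiting, terminated.
    running = {s["name"] for s in statuses if "running" in s.get("state", {})}
    return EXPECTED_CONTAINERS <= running


def check_ncp_pods(namespace: str = "nsx-system") -> bool:
    # Select the NCP pods by the labels used in the Deployment template.
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace,
         "-l", "component=nsx-ncp,tier=nsx-networking", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    pods = json.loads(out)["items"]
    # An empty list means no NCP pod was found, which is treated as a failure.
    return bool(pods) and all(containers_running(p) for p in pods)
```

`check_ncp_pods()` returns `False` while either container is still being created, so it can be polled until it returns `True`.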
Step 2: Start a runbook invocation and get the result
In NCP, you perform runbook operations by running sha-agent CLI commands inside the sha-agent container. In the sha-agent container, sha-appctl is responsible for executing runbook commands. When you see a pod stuck in the Pending state, first get the name and namespace of that pod. Then start a runbook invocation with the following command:
kubectl exec NCP_POD -c sha-agent -- /opt/vmware/sha/bin/sha-appctl -c start_invocation --runbook NCPPendingPod --pod_ns POD_NS --pod_name POD_NAME
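The two actions described above (finding the pods stuck in Pending, then starting an NCPPendingPod invocation for each of them) can be scripted. The following minimal Python sketch assumes that `kubectl` is configured for the cluster and that you already know the NCP pod name (`NCP_POD`). Like the command above, it runs `kubectl exec` in the current namespace. The helper names are hypothetical. Because the output format of `start_invocation` is not described in this section, the sketch prints that output unparsed rather than guessing how to extract the invocation ID.

```python
import json
import subprocess

# Path to sha-appctl inside the sha-agent container, as given in this section.
SHA_APPCTL = "/opt/vmware/sha/bin/sha-appctl"


def pending_pods(pod_list: dict) -> list[tuple[str, str]]:
    """Return (namespace, name) for every pod whose phase is Pending.

    `pod_list` is the parsed output of `kubectl get pods --all-namespaces -o json`.
    """
    return [
        (p["metadata"]["namespace"], p["metadata"]["name"])
        for p in pod_list.get("items", [])
        if p.get("status", {}).get("phase") == "Pending"
    ]


def start_invocation_argv(ncp_pod: str, pod_ns: str, pod_name: str) -> list[str]:
    # Mirrors the documented command. Insert "-n", "nsx-system" after "exec"
    # if the NCP pod is not in your current kubectl namespace.
    return [
        "kubectl", "exec", ncp_pod, "-c", "sha-agent", "--",
        SHA_APPCTL, "-c", "start_invocation",
        "--runbook", "NCPPendingPod",
        "--pod_ns", pod_ns, "--pod_name", pod_name,
    ]


def start_invocations(ncp_pod: str) -> None:
    out = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    for ns, name in pending_pods(json.loads(out)):
        result = subprocess.run(start_invocation_argv(ncp_pod, ns, name),
                                check=True, capture_output=True, text=True)
        # Print the raw output so the invocation ID can be read off it.
        print(f"{ns}/{name}: {result.stdout.strip()}")
```

Building the argument list separately from running it keeps the command easy to inspect and avoids shell quoting problems with pod names.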
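The start_invocation command outputs an invocation ID that you can use to track the status of the invocation. Use this ID to get the invocation result with the following command: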
kubectl exec NCP_POD -c sha-agent -- /opt/vmware/sha/bin/sha-appctl -c get_invocation_result --invocation INVOCATION_ID
You can get the runbook result in JSON format from the command output and use it to review the debugging results.
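Fetching and reading the JSON result can also be scripted. The following minimal Python sketch assumes that `kubectl` is configured for the cluster. Because the exact output format is not shown here, it also assumes that the JSON document may be surrounded by other text. `extract_json` returns the first position in the output at which a JSON object or array decodes successfully. The runbook result schema is not documented in this section, so the sketch only pretty-prints the result. All function names are hypothetical.

```python
import json
import subprocess

# Path to sha-appctl inside the sha-agent container, as given in this section.
SHA_APPCTL = "/opt/vmware/sha/bin/sha-appctl"


def get_result_argv(ncp_pod: str, invocation_id: str) -> list[str]:
    # Mirrors: kubectl exec NCP_POD -c sha-agent -- sha-appctl -c get_invocation_result --invocation ID
    return [
        "kubectl", "exec", ncp_pod, "-c", "sha-agent", "--",
        SHA_APPCTL, "-c", "get_invocation_result",
        "--invocation", invocation_id,
    ]


def extract_json(output: str):
    """Decode the first JSON object or array embedded in `output`.

    Assumption: the payload may be preceded or followed by non-JSON text,
    so each '{' or '[' is tried as a start position until one decodes.
    """
    decoder = json.JSONDecoder()
    for i, ch in enumerate(output):
        if ch in "{[":
            try:
                value, _ = decoder.raw_decode(output, i)
                return value
            except json.JSONDecodeError:
                continue
    raise ValueError("no JSON found in output")


def fetch_invocation_result(ncp_pod: str, invocation_id: str):
    out = subprocess.run(get_result_argv(ncp_pod, invocation_id),
                         check=True, capture_output=True, text=True).stdout
    result = extract_json(out)
    # The schema is not documented here, so just pretty-print the whole result.
    print(json.dumps(result, indent=2))
    return result
```

If the command turns out to print nothing but JSON, `json.loads(out)` is enough, and `extract_json` can be dropped.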