Troubleshooting

This topic lists troubleshooting steps to use when you encounter issues with AKO.

AKO Pod Does Not Run

To check why the pod is not running, do the following:
kubectl get pods -n avi-system
NAME                 READY   STATUS             RESTARTS   AGE
ako-f776577b-5zpxh   0/1     ImagePullBackOff   0          15s
Ensure that:
  • Your Docker registry is correctly configured and reachable from the cluster.
  • The image is available locally.
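To see why the image pull failed, describe the pod and review its events. A minimal sketch; the pod name is taken from the example output above:
kubectl describe pod ako-f776577b-5zpxh -n avi-system
kubectl get events -n avi-system --sort-by=.lastTimestamp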

AKO Pod Automatically Restarts and Enters CrashLoopBackOff State

  1. Check the AKO logs for any invalid input. If an invalid input is detected, AKO restarts and retries.
  2. Check the connectivity between the AKO Pod and the Avi Load Balancer Controller.
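Both checks can be run from the command line. A minimal sketch, assuming AKO runs as the usual ako-0 pod in the avi-system namespace:
# Look for invalid-input messages in the AKO logs
kubectl logs ako-0 -n avi-system | grep -i "invalid input"
# Review restart counts and the last termination reason
kubectl describe pod ako-0 -n avi-system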

AKO Does Not Respond to Ingress Object Creation

Check the AKO container logs to see whether a reason is logged for syncing being disabled, as in the following example:
2020-06-26T10:27:26.032+0530 INFO lib/lib.go:56 Setting AKOUser: ako-my-cluster for Avi Objects
2020-06-26T10:27:26.337+0530 ERROR cache/controller_obj_cache.go:1814 Required param networkName not specified, syncing will be disabled.
2020-06-26T10:27:26.337+0530 WARN cache/controller_obj_cache.go:1770 Invalid input detected, syncing will be disabled.
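If a required parameter such as networkName is missing, set it in the Helm values and restart AKO. The following fragment is a sketch only; the exact key layout varies across AKO chart versions, so verify it against your values.yaml:
NetworkSettings:
  networkName: VM-Network   # assumed key layout; the VIP network name is a placeholder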

Ingress Object Does Not Sync in VMware Avi Load Balancer

  1. The ingress class is set to something other than avi while the defaultIngController parameter is set to True (see the checks after this list).
  2. For TLS ingress, the Secret object does not exist. Ensure that the Secret object is pre-created.
  3. Check the connectivity between your AKO Pod and the Avi Load Balancer Controller.
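A quick way to verify points 1 and 2; the ingress, namespace, and Secret names are placeholders:
kubectl get ingress my-ingress -n my-namespace -o jsonpath='{.spec.ingressClassName}'
kubectl get secret my-tls-secret -n my-namespace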

Virtual Service Returns the Message CONNECTION REFUSED After Some Time

This is generally due to a duplicate IP in use in the network.
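One way to confirm a duplicate IP from a Linux host on the same network segment is duplicate address detection with arping; the interface and VIP below are placeholders:
arping -D -I eth0 10.10.10.5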

Virtual Service Settings Changed Directly on the Avi Load Balancer Controller Are Overwritten

It is not recommended to change the properties of a virtual service created by AKO outside of AKO. If AKO has an ingress update that is related to this shared virtual service, AKO overwrites the out-of-band configuration.

Static Routes are Populated, but the Pools are Down

Check if you have a dual network interface card (NIC) Kubernetes worker node setup. In a dual NIC setup, AKO populates the static routes using the default gateway network. However, the default gateway network might not be the port group network that you want to use as the data network. As a result, the service engines might not be able to reach the pod CIDRs through the default gateway network. If it is not possible to make your data networks routable through the default gateway, disable static route sync in AKO (as sketched below) and edit your static routes with the correct network.
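A minimal sketch of the Helm values change, assuming the AKOSettings.disableStaticRouteSync flag used by recent AKO charts:
AKOSettings:
  disableStaticRouteSync: 'true'   # stop AKO from syncing static routes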

AKO Pod Restarts in a Loop Due to an AKO Version Mismatch in the AKO Config File

If the Controller Version field in the AKO config file does not match a supported version, the AKO Pod events display unsupported-version logs, because the AKO version is outside the supported range between min_version and max_version.
The version mismatch causes AKO to restart. Each restart consumes memory allocated to the Pod: during each bootup, AKO initializes several objects and then quickly enters the restart loop again, and the Go garbage collector fails to reclaim memory before the Pod runs out of memory.
AKO prints error messages as part of AKO events, making it unnecessary to always refer to the logs. An early assessment of the events and logs can help avoid out-of-memory (OOM) issues.
For best practices, see Compatibility Matrix.
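Because the errors surface as Pod events, a check like the following is often enough; the pod name is a placeholder:
kubectl get events -n avi-system --field-selector involvedObject.name=ako-0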

Helm Install Throws a "would violate PodSecurity" Warning

Check if the securityContext is set correctly in values.yaml. The following is a sample securityContext:
securityContext:
  runAsNonRoot: true
  runAsGroup: 1000
  readOnlyRootFilesystem: false
  runAsUser: 1000
  seccompProfile:
    type: 'RuntimeDefault'
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
To select the most suitable configuration, see SecurityContext v1 core.

Log Collection

For every log collection, collect the following information:
  1. Which Kubernetes distribution are you using? For example, RKE, PKS, and so on.
  2. Which CNI are you using, and which version? For example, Calico v3.15.
  3. Which Avi Load Balancer Controller version are you using? For example, VMware Avi Load Balancer version 18.2.8.

Collecting AKO Logs

To collect the logs, use the log_collector.py script, which gathers all relevant information for the AKO pod. To download the script, go to https://github.com/avinetworks/devops/blob/master/tools/ako/log_collector.py.
To run the script, open an SSH terminal and run the following command:
root@vm:/var/# python3 log_collector.py --help
The output provides usage information on how to run the script.
The script does the following:
  1. Collects the log file of the AKO pod.
  2. Collects the configmap in a YAML file.
  3. Zips the folder and returns it.
The following three cases are considered for log collection:
  1. A running AKO pod logging to a Persistent Volume Claim; in this case, the logs are collected from the PVC that the pod uses.
  2. A running AKO pod logging to the console; in this case, the logs are collected from the pod directly.
  3. A dead AKO pod that uses a Persistent Volume Claim; in this case, a backup pod is created with the same PVC attached to the AKO pod, and the logs are collected from it.

Configuring PVC for the AKO Pod

It is recommended to use a Persistent Volume Claim for the AKO pod. For more information on creating a persistent volume (PV) and a Persistent Volume Claim (PVC), see Configure a Pod to Use a Persistent Volume for Storage.
The following is an example of a hostPath persistent volume. Use a PV based on the storage class of your Kubernetes environment.
  1. To create a persistent volume, use the following:
     #persistent-volume.yaml
     apiVersion: v1
     kind: PersistentVolume
     metadata:
       name: ako-pv
       namespace: avi-system
       labels:
         type: local
     spec:
       storageClassName: manual
       capacity:
         storage: 10Gi
       accessModes:
         - ReadWriteOnce
       hostPath:
         path: <any-host-path-dir> # make sure that the directory exists
     A persistent volume claim can be created using the following:
     #persistent-volume-claim.yaml
     apiVersion: v1
     kind: PersistentVolumeClaim
     metadata:
       name: ako-pvc
       namespace: avi-system
     spec:
       storageClassName: manual
       accessModes:
         - ReadWriteOnce
       resources:
         requests:
           storage: 3Gi
  2. Add the PVC name to ako/helm/ako/values.yaml before creating the AKO pod, as shown below:
     persistentVolumeClaim: ako-pvc
     mountPath: /log
     logFile: avi.log
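After creating the manifests above, apply them and confirm that the claim is bound before installing AKO; a minimal sketch:
kubectl apply -f persistent-volume.yaml
kubectl apply -f persistent-volume-claim.yaml
kubectl get pvc ako-pvc -n avi-system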

Using the Script for AKO

  1. Use case 1: with PVC.
     --akoNamespace (-ako) (mandatory): the namespace in which the AKO pod is present.
     python3 log_collector.py -ako avi-system
  2. Use case 2: without PVC.
     --since (-s) (optional): the time duration before the present for which logs are collected.
     python3 log_collector.py -ako avi-system -s 24h
Sample Run:
At each stage of execution, the commands being executed are logged on the screen. The results are stored in a zip file with the following name format:
ako-<helm chart name>-<current time>.zip
Sample Output with PVC:
2020-06-25 13:20:37,141 - ******************** AKO ********************
2020-06-25 13:20:37,141 - For AKO : helm list -n avi-system
2020-06-25 13:20:38,974 - kubectl get pod -n avi-system -l app.kubernetes.io/instance=my-ako-release
2020-06-25 13:20:41,850 - kubectl describe pod ako-56887bd5b7-c2t6n -n avi-system
2020-06-25 13:20:44,019 - helm get all my-ako-release -n avi-system
2020-06-25 13:20:46,360 - PVC name is my-pvc
2020-06-25 13:20:46,361 - PVC mount point found - /log
2020-06-25 13:20:46,361 - Log file name is avi.log
2020-06-25 13:20:46,362 - Creating directory ako-my-ako-release-2020-06-25-132046
2020-06-25 13:20:46,373 - kubectl cp avi-system/ako-56887bd5b7-c2t6n:log/avi.log ako-my-ako-release-2020-06-25-132046/ako.log
2020-06-25 13:21:02,098 - kubectl get cm -n avi-system -o yaml > ako-my-ako-release-2020-06-25-132046/config-map.yaml
2020-06-25 13:21:03,495 - Zipping directory ako-my-ako-release-2020-06-25-132046
2020-06-25 13:21:03,525 - Clean up: rm -r ako-my-ako-release-2020-06-25-132046
Success, Logs zipped into ako-my-ako-release-2020-06-25-132046.zip

OpenShift Route Objects Did Not Sync with VMware Avi Load Balancer

This could be due to different reasons. Some common issues are as follows:
  1. The problem affects all routes. A configuration parameter is missing. Check for logs such as Invalid input detected, syncing will be deactivated, make the necessary configuration changes indicated by the logs, and restart AKO.
  2. Some routes are not handled by AKO. Check whether the sub-domain of the route is valid as per the Avi Load Balancer Controller configuration: Didn't find match for hostname :foo.abc.com Available sub-domains:avi.internal.
  3. The problem affects only one or a few routes. Check the status of the route (see the query after this list). If you see the message MultipleBackendsWithSameServiceError, the same service has been added multiple times in the backends. This configuration is incorrect, and the route configuration has to be changed.
  4. The route that is not getting synced is a secure route with edge/re-encrypt termination. Check whether both the key and the certificate are specified in the route spec. If either of them is missing, AKO does not sync the route.
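To inspect a route's status and TLS spec when checking cases 3 and 4, a query like the following helps; the route and namespace names are placeholders:
oc get route my-route -n my-namespace -o yaml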

How do I debug an issue in AKO in EVH mode as VMware Avi Load Balancer object names are encoded?

Even though the EVH objects are encoded, AKO labels each EVH object on the Controller with a set of key/value pairs that act as metadata for the object. These markers can be used to determine the corresponding Kubernetes/OpenShift identifiers for the object. For more information on the list of markers associated with each object, see Markers.

The Policy Defined in the CRD Was Not Applied to the Corresponding Ingress/Route Objects

  1. Make sure that the policy object referred to by the CRD is present in VMware Avi Load Balancer.
  2. Ensure that the connectivity between the AKO pod and the Avi Load Balancer Controller is intact. For example, if the Controller is rebooting, connectivity might go down and cause this issue.

The Service is Annotated with "nodeportlocal.antrea.io/enabled": "true", but the Backend Pod is not Getting Annotated with nodeportlocal.antrea.io

Check the version of Antrea used in the cluster. If the Antrea version is lower than 1.2.0, the Pod definition must specify container ports that match the target port of the Service. For example, given the following Service:
apiVersion: v1
kind: Service
metadata:
  labels:
    svc: avisvc1
  name: avisvc1
spec:
  ports:
    - name: 8080-tcp
      port: 8080
      protocol: TCP
      targetPort: 8080
  selector:
    app: dep1
The following pod will not be annotated with the NPL annotation:
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: dep1
  name: pod1
  namespace: default
spec:
  containers:
    - image: avinetworks/server-os
      name: dep1
Instead, use the following Pod definition:
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: dep1
  name: pod1
  namespace: default
spec:
  containers:
    - image: avinetworks/server-os
      name: dep1
      ports:
        - containerPort: 8080
          protocol: TCP
This restriction is removed in Antrea version 1.2.0.
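To verify that the NPL annotation was added after fixing the Pod definition, a jsonpath query like the following can be used (the dots in the annotation key must be escaped):
kubectl get pod pod1 -n default -o jsonpath='{.metadata.annotations.nodeportlocal\.antrea\.io}'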