Troubleshooting

This topic lists troubleshooting steps to use when you encounter issues with AKO.

AKO Pod Does Not Run

To check why the pod is not running, do the following:
kubectl get pods -n avi-system
NAME                 READY   STATUS             RESTARTS   AGE
ako-f776577b-5zpxh   0/1     ImagePullBackOff   0          15s
Ensure that:
  • Your Docker registry is correctly configured and reachable from the cluster.
  • The image is available locally.
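To see why the image pull failed, describe the pod and review its events. A minimal sketch; the pod name is taken from the example output above:
kubectl describe pod ako-f776577b-5zpxh -n avi-system
kubectl get events -n avi-system --sort-by=.lastTimestamp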

AKO Pod Automatically Restarts and Enters CrashLoopBackOff State

  1. Check the AKO logs for any invalid input. If an invalid input is detected, AKO restarts and retries.
  2. Check the connectivity between the AKO Pod and the Avi Load Balancer Controller.
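Both checks can be run from the command line. A minimal sketch, assuming AKO runs as the usual ako-0 pod in the avi-system namespace:
# Look for invalid-input messages in the AKO logs
kubectl logs ako-0 -n avi-system | grep -i "invalid input"
# Review restart counts and the last termination reason
kubectl describe pod ako-0 -n avi-system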

AKO Does Not Respond to Ingress Object Creation

Check the AKO container logs to see whether a reason is logged for syncing being disabled, as in the following example:
2020-06-26T10:27:26.032+0530 INFO lib/lib.go:56 Setting AKOUser: ako-my-cluster for Avi Objects
2020-06-26T10:27:26.337+0530 ERROR cache/controller_obj_cache.go:1814 Required param networkName not specified, syncing will be disabled.
2020-06-26T10:27:26.337+0530 WARN cache/controller_obj_cache.go:1770 Invalid input detected, syncing will be disabled.
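If a required parameter such as networkName is missing, set it in the Helm values and restart AKO. The following fragment is a sketch only; the exact key layout varies across AKO chart versions, so verify it against your values.yaml:
NetworkSettings:
  networkName: VM-Network   # assumed key layout; the VIP network name is a placeholder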

Ingress Object Does Not Sync in VMware Avi Load Balancer

  1. The ingress class is set to something other than avi while the defaultIngController parameter is set to True (see the checks after this list).
  2. For TLS ingress, the Secret object does not exist. Ensure that the Secret object is pre-created.
  3. Check the connectivity between your AKO Pod and the Avi Load Balancer Controller.
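A quick way to verify points 1 and 2; the ingress, namespace, and Secret names are placeholders:
kubectl get ingress my-ingress -n my-namespace -o jsonpath='{.spec.ingressClassName}'
kubectl get secret my-tls-secret -n my-namespace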

Virtual Service Returns the Message CONNECTION REFUSED After Some Time

This is generally due to a duplicate IP in use in the network.
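One way to confirm a duplicate IP from a Linux host on the same network segment is duplicate address detection with arping; the interface and VIP below are placeholders:
arping -D -I eth0 10.10.10.5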

Virtual Service Settings Changed Directly on the Avi Load Balancer Controller Are Overwritten

It is not recommended to change the properties of a virtual service created by AKO outside of AKO. If AKO has an ingress update that is related to this shared virtual service, AKO overwrites the out-of-band configuration.

Static Routes are Populated, but the Pools are Down

Check if you have a dual network interface card (NIC) Kubernetes worker node setup. In a dual NIC setup, AKO populates the static routes using the default gateway network. However, the default gateway network might not be the port group network that you want to use as the data network. As a result, the service engines might not be able to reach the pod CIDRs through the default gateway network. If it is not possible to make your data networks routable through the default gateway, disable static route sync in AKO (as sketched below) and edit your static routes with the correct network.
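A minimal sketch of the Helm values change, assuming the AKOSettings.disableStaticRouteSync flag used by recent AKO charts:
AKOSettings:
  disableStaticRouteSync: 'true'   # stop AKO from syncing static routes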

AKO Pod Restarts in a Loop Due to an AKO Version Mismatch in the AKO Config File

If the Controller Version field in the AKO config file does not match a supported version, the AKO Pod events display unsupported-version logs, because the AKO version is outside the supported range between min_version and max_version.
The version mismatch causes AKO to restart. Each restart consumes memory allocated to the Pod: during each bootup, AKO initializes several objects and then quickly enters the restart loop again, and the Go garbage collector fails to reclaim memory before the Pod runs out of memory.
AKO prints error messages as part of AKO events, making it unnecessary to always refer to the logs. An early assessment of the events and logs can help avoid out-of-memory (OOM) issues.
For best practices, see Compatibility Matrix.
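Because the errors surface as Pod events, a check like the following is often enough; the pod name is a placeholder:
kubectl get events -n avi-system --field-selector involvedObject.name=ako-0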

Helm Install Throws a "would violate PodSecurity" Warning

Check if the securityContext is set correctly in values.yaml. The following is a sample securityContext:
securityContext:
  runAsNonRoot: true
  runAsGroup: 1000
  readOnlyRootFilesystem: false
  runAsUser: 1000
  seccompProfile:
    type: 'RuntimeDefault'
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
To select the most suitable configuration, see SecurityContext v1 core.

Log Collection

For every log collection, collect the following information:
  1. Which Kubernetes distribution are you using? For example, RKE, PKS, and so on.
  2. Which CNI are you using, and which version? For example, Calico v3.15.
  3. Which Avi Load Balancer Controller version are you using? For example, VMware Avi Load Balancer version 18.2.8.

Collecting AKO Logs

To collect the logs, use the log_collector.py script, which gathers all relevant information for the AKO pod. To download the script, go to https://github.com/avinetworks/devops/blob/master/tools/ako/log_collector.py.
To run the script, open an SSH terminal and run the following command:
root@vm:/var/# python3 log_collector.py --help
The output provides usage information on how to run the script.
The script does the following:
  1. Collects the log file of the AKO pod.
  2. Collects the configmap in a YAML file.
  3. Zips the folder and returns it.
The following three cases are considered for log collection:
  1. A running AKO pod logging to a Persistent Volume Claim; in this case, the logs are collected from the PVC that the pod uses.
  2. A running AKO pod logging to the console; in this case, the logs are collected from the pod directly.
  3. A dead AKO pod that uses a Persistent Volume Claim; in this case, a backup pod is created with the same PVC attached to the AKO pod, and the logs are collected from it.

Configuring PVC for the AKO Pod

It is recommended to use a Persistent Volume Claim for the AKO pod. For more information on creating a persistent volume (PV) and a Persistent Volume Claim (PVC), see Configure a Pod to Use a Persistent Volume for Storage.
The following is an example of a hostPath persistent volume. Use a PV based on the storage class of your Kubernetes environment.
  1. To create a persistent volume, use the following:
     #persistent-volume.yaml
     apiVersion: v1
     kind: PersistentVolume
     metadata:
       name: ako-pv
       namespace: avi-system
       labels:
         type: local
     spec:
       storageClassName: manual
       capacity:
         storage: 10Gi
       accessModes:
         - ReadWriteOnce
       hostPath:
         path: <any-host-path-dir> # make sure that the directory exists
     A persistent volume claim can be created using the following:
     #persistent-volume-claim.yaml
     apiVersion: v1
     kind: PersistentVolumeClaim
     metadata:
       name: ako-pvc
       namespace: avi-system
     spec:
       storageClassName: manual
       accessModes:
         - ReadWriteOnce
       resources:
         requests:
           storage: 3Gi
  2. Add the PVC name to ako/helm/ako/values.yaml before creating the AKO pod, as shown below:
     persistentVolumeClaim: ako-pvc
     mountPath: /log
     logFile: avi.log
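After creating the manifests above, apply them and confirm that the claim is bound before installing AKO; a minimal sketch:
kubectl apply -f persistent-volume.yaml
kubectl apply -f persistent-volume-claim.yaml
kubectl get pvc ako-pvc -n avi-system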

Using the Script for AKO

  1. Use case 1: with PVC.
     --akoNamespace (-ako) (mandatory): the namespace in which the AKO pod is present.
     python3 log_collector.py -ako avi-system
  2. Use case 2: without PVC.
     --since (-s) (optional): the time duration before the present for which logs are collected.
     python3 log_collector.py -ako avi-system -s 24h
Sample Run:
At each stage of execution, the commands being executed are logged on the screen. The results are stored in a zip file with the following name format:
ako-<helm chart name>-<current time>.zip
Sample Output with PVC:
2020-06-25 13:20:37,141 - ******************** AKO ********************
2020-06-25 13:20:37,141 - For AKO : helm list -n avi-system
2020-06-25 13:20:38,974 - kubectl get pod -n avi-system -l app.kubernetes.io/instance=my-ako-release
2020-06-25 13:20:41,850 - kubectl describe pod ako-56887bd5b7-c2t6n -n avi-system
2020-06-25 13:20:44,019 - helm get all my-ako-release -n avi-system
2020-06-25 13:20:46,360 - PVC name is my-pvc
2020-06-25 13:20:46,361 - PVC mount point found - /log
2020-06-25 13:20:46,361 - Log file name is avi.log
2020-06-25 13:20:46,362 - Creating directory ako-my-ako-release-2020-06-25-132046
2020-06-25 13:20:46,373 - kubectl cp avi-system/ako-56887bd5b7-c2t6n:log/avi.log ako-my-ako-release-2020-06-25-132046/ako.log
2020-06-25 13:21:02,098 - kubectl get cm -n avi-system -o yaml > ako-my-ako-release-2020-06-25-132046/config-map.yaml
2020-06-25 13:21:03,495 - Zipping directory ako-my-ako-release-2020-06-25-132046
2020-06-25 13:21:03,525 - Clean up: rm -r ako-my-ako-release-2020-06-25-132046
Success, Logs zipped into ako-my-ako-release-2020-06-25-132046.zip

OpenShift Route Objects Did Not Sync with VMware Avi Load Balancer

This could be due to different reasons. Some common issues are as follows:
  1. The problem affects all routes. A configuration parameter is missing. Check for logs such as Invalid input detected, syncing will be deactivated, make the necessary configuration changes indicated by the logs, and restart AKO.
  2. Some routes are not handled by AKO. Check whether the sub-domain of the route is valid as per the Avi Load Balancer Controller configuration: Didn't find match for hostname :foo.abc.com Available sub-domains:avi.internal.
  3. The problem affects only one or a few routes. Check the status of the route (see the query after this list). If you see the message MultipleBackendsWithSameServiceError, the same service has been added multiple times in the backends. This configuration is incorrect, and the route configuration has to be changed.
  4. The route that is not getting synced is a secure route with edge/re-encrypt termination. Check whether both the key and the certificate are specified in the route spec. If either of them is missing, AKO does not sync the route.
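To inspect a route's status and TLS spec when checking cases 3 and 4, a query like the following helps; the route and namespace names are placeholders:
oc get route my-route -n my-namespace -o yaml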

How do I debug an issue in AKO in EVH mode as VMware Avi Load Balancer object names are encoded?

Even though the EVH objects are encoded, AKO labels each EVH object on the Controller with a set of key/value pairs that act as metadata for the object. These markers can be used to determine the corresponding Kubernetes/OpenShift identifiers for the object. For more information on the list of markers associated with each object, see Markers.

The Policy Defined in the CRD Was Not Applied to the Corresponding Ingress/Route Objects

  1. Make sure that the policy object referred to by the CRD is present in VMware Avi Load Balancer.
  2. Ensure that the connectivity between the AKO pod and the Avi Load Balancer Controller is intact. For example, if the Controller is rebooting, connectivity might go down and cause this issue.

The Service is Annotated with "nodeportlocal.antrea.io/enabled": "true", but the Backend Pod is not Getting Annotated with nodeportlocal.antrea.io

Check the version of Antrea used in the cluster. If the Antrea version is lower than 1.2.0, the Pod definition must specify container ports that match the target port of the Service. For example, given the following Service:
apiVersion: v1
kind: Service
metadata:
  labels:
    svc: avisvc1
  name: avisvc1
spec:
  ports:
    - name: 8080-tcp
      port: 8080
      protocol: TCP
      targetPort: 8080
  selector:
    app: dep1
The following pod will not be annotated with the NPL annotation:
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: dep1
  name: pod1
  namespace: default
spec:
  containers:
    - image: avinetworks/server-os
      name: dep1
Instead, use the following Pod definition:
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: dep1
  name: pod1
  namespace: default
spec:
  containers:
    - image: avinetworks/server-os
      name: dep1
      ports:
        - containerPort: 8080
          protocol: TCP
This restriction is removed in Antrea version 1.2.0.
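To verify that the NPL annotation was added after fixing the Pod definition, a jsonpath query like the following can be used (the dots in the annotation key must be escaped):
kubectl get pod pod1 -n default -o jsonpath='{.metadata.annotations.nodeportlocal\.antrea\.io}'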