# Troubleshooting the Trivy Operator
The Trivy Operator installs several resources into your Kubernetes cluster. This section describes how to check whether the operator is running correctly and how to troubleshoot common issues.

In addition to this section, you might want to check the project's issue tracker, discussion forum, or Slack to see whether someone from the community has run into a similar problem before.

Also note that the Trivy Operator is based on an existing Aqua OSS project, [Starboard], and shares some of its design, principles, and code. Existing content that relates to the Starboard Operator may also apply to the Trivy Operator, and Starboard's issues, discussion forum, or Slack can be worth checking as well. In some cases you might also want to refer to Starboard's design documents.
## Installation
Make sure that the latest version of the Trivy Operator is installed. For this, have a look at the installation options.

For instance, if you are using the Helm deployment, check the Helm chart version deployed to your cluster with the following command:

```
helm list -n trivy-system
```
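If your release is behind the latest chart version, you can update it from the chart repository. A minimal sketch, assuming you added Aqua's Helm repository under the name `aqua` and installed the release as `trivy-operator`:

```
# Refresh the local chart index and see which chart versions are available
helm repo update
helm search repo aqua/trivy-operator

# Upgrade the existing release in the trivy-system namespace
helm upgrade trivy-operator aqua/trivy-operator -n trivy-system
```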
## Operator Pod Not Running
The Trivy Operator runs a pod inside your cluster. If you have followed the installation guide, you will have installed the Operator into the `trivy-system` namespace.

Make sure that the pod is in the `Running` status:

```
kubectl get pods -n trivy-system
```

This is how it will look if it is running correctly:

```
NAMESPACE      NAME                              READY   STATUS    RESTARTS      AGE
trivy-system   trivy-operator-6c9bd97d58-hsz4g   1/1     Running   5 (19m ago)   30h
```
If the pod is in the `Failed`, `Pending`, or `Unknown` state, check the events and the logs of the pod.
First, check the events, since they might be more descriptive of the problem. If the events do not give a clear reason why the pod cannot spin up, check the logs, which provide more detail.

```
kubectl describe pod <POD-NAME> -n trivy-system
```
To check the logs, use the following command:

```
kubectl logs deployment/trivy-operator -n trivy-system
```
If your pod is not running, look for errors in the output, as they can give an indication of the problem.

If there are too many log messages, try deleting the Trivy Operator pod and observing its behavior upon restart; a new pod should spin up automatically after you delete the failed one, as shown below.
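A minimal sketch of restarting the operator pod; the pod name is a placeholder and will differ in your cluster:

```
# Delete the failing pod; the Deployment's ReplicaSet recreates it automatically
kubectl delete pod <POD-NAME> -n trivy-system

# Watch the replacement pod come up, then follow its logs from the start
kubectl get pods -n trivy-system -w
kubectl logs deployment/trivy-operator -n trivy-system -f
```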
## ImagePullBackOff or ErrImagePull
Check the status of the Trivy Operator pod running inside your Kubernetes cluster. If the status is `ImagePullBackOff` or `ErrImagePull`, the Operator either

- is trying to access the wrong image, or
- cannot pull the image from the registry.

Make sure that you are providing the right resources (for example, the correct image reference and registry credentials) when installing the Trivy Operator.
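One way to narrow this down is to check which image the operator Deployment actually references and whether the registry can be reached. A hedged sketch, assuming the default Deployment name `trivy-operator`:

```
# Print the image reference the operator Deployment is configured with
kubectl get deployment trivy-operator -n trivy-system \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'

# If the image lives in a private registry, confirm that the expected pull secret exists
kubectl get secrets -n trivy-system
```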
## CrashLoopBackOff
If your pod is in `CrashLoopBackOff`, its container keeps crashing shortly after starting. In this case, you want to investigate whether there is an issue with the node the pod is running on; for instance, the node might not have sufficient resources.
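To check the node directly, you can use standard kubectl commands; a minimal sketch (the last command requires metrics-server):

```
# Find out which node the operator pod is running on
kubectl get pod <POD-NAME> -n trivy-system -o wide

# Inspect the node's conditions, capacity, and recent events
kubectl describe node <NODE-NAME>

# If metrics-server is installed, check the node's current resource usage
kubectl top node <NODE-NAME>
```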
## Reconciliation Error
It can happen that the pod appears to be running normally but does not reconcile the resources in your Kubernetes cluster.

Check the logs for reconciliation errors:

```
kubectl logs deployment/trivy-operator -n trivy-system
```

If you see such errors, the Trivy Operator likely does not have the right permissions or configuration to access your resources.
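Two quick checks, as a hedged sketch: filter the logs for reconciliation errors, and verify what the operator's service account is allowed to do (assuming the default service account `trivy-operator` in the `trivy-system` namespace):

```
# Filter the operator logs for reconciliation errors
kubectl logs deployment/trivy-operator -n trivy-system | grep -i reconcil

# Check whether the operator's service account may list pods cluster-wide
kubectl auth can-i list pods --all-namespaces \
  --as=system:serviceaccount:trivy-system:trivy-operator
```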
## Operator Does Not Create VulnerabilityReports
VulnerabilityReports are owned and controlled by the immediate Kubernetes workload; for example, the VulnerabilityReport for a pod managed by a Deployment is linked to the corresponding ReplicaSet. If the Trivy Operator does not create VulnerabilityReports for your workloads, it might not be monitoring the namespace that your workloads are running in.
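To see which reports already exist and where, you can list them across all namespaces; a minimal sketch:

```
# List VulnerabilityReports in every namespace
kubectl get vulnerabilityreports --all-namespaces -o wide

# Reports for a Deployment are attached to its current ReplicaSet,
# so check the workload's namespace explicitly
kubectl get vulnerabilityreports -n <NAMESPACE>
```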
An easy way to check this is by looking for the `ClusterRoleBinding` for the Trivy Operator:

```
kubectl get ClusterRoleBinding | grep "trivy-operator"
```
Alternatively, you could use the `kubectl-who-can` plugin by Aqua:

```
$ kubectl who-can list vulnerabilityreports
No subjects found with permissions to list vulnerabilityreports assigned through RoleBindings

CLUSTERROLEBINDING                             SUBJECT                           TYPE            SA-NAMESPACE
cluster-admin                                  system:masters                    Group
trivy-operator                                 trivy-operator                    ServiceAccount  trivy-system
system:controller:generic-garbage-collector    generic-garbage-collector         ServiceAccount  kube-system
system:controller:namespace-controller         namespace-controller              ServiceAccount  kube-system
system:controller:resourcequota-controller     resourcequota-controller          ServiceAccount  kube-system
system:kube-controller-manager                 system:kube-controller-manager    User
```
If the `ClusterRoleBinding` does not exist, Trivy currently cannot monitor any namespace outside of the `trivy-system` namespace.

For instance, if you are using the Helm chart, make sure to set the `targetNamespace` value to the namespace that you want the Operator to monitor.
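A hedged sketch of how this could look with Helm; the exact value name can differ between chart versions, so check the chart's values first:

```
# Show the namespace-related values supported by the installed chart version
helm show values aqua/trivy-operator | grep -i namespace

# Example: restrict the operator to the "default" namespace
# (value name assumed here; confirm it against the output above)
helm upgrade trivy-operator aqua/trivy-operator -n trivy-system \
  --set="targetNamespaces=default"
```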
The operator might also not be configured to scan the workload types you expect. Check that `OPERATOR_TARGET_WORKLOADS` is set correctly in your configuration; it specifies which workload types are scanned.

For example, by default the Helm chart values configure the following Kubernetes workloads to be scanned: `pod,replicaset,replicationcontroller,statefulset,daemonset,cronjob,job`.
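To confirm what the running operator is actually configured with, you can list the environment variables on its Deployment; a minimal sketch:

```
# List the environment variables set on the operator container
# and filter for the workload setting
kubectl set env deployment/trivy-operator -n trivy-system --list | grep OPERATOR_TARGET_WORKLOADS
```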