# Run Vulnerability Scan Jobs in the Same Namespace as the Workload
## Overview
When a user runs a workload with an image from a private managed registry (e.g. ECR or ACR) and does not use the ImagePullSecret method to provide registry access, the Starboard operator has difficulty scanning that workload.
- Consider the ECR registry as an example. There is an option to associate an IAM role with a service account; workloads that run with this service account are then authorized to pull images from that registry. If the user wants these images scanned by the Starboard operator, there is currently only one way to do it: the user has to associate the IAM role with the Starboard service account, so that when the scan job runs with the `starboard-operator` service account, Trivy gets the permissions needed to pull the image. To learn more about how this mechanism works, refer to the ECR registry configuration and IAM roles for service accounts documents. Starboard cannot use the permissions set on the workload's own service account. An example of this workaround is sketched below.
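As a minimal sketch of this workaround on EKS (account ID and role name are placeholders), the operator's own service account is annotated with the role ARN:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: starboard-operator
  namespace: starboard-operator
  annotations:
    # IRSA: scan jobs running with this service account assume the IAM role
    # below and can therefore pull images from ECR.
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/IAM_ROLE_NAME
```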
Recently, an option was added to the Trivy plugin for the `trivy fs` command, in which Trivy scans the image that is cached on a node. To do that, the scan job is scheduled on the same node where the workload is running, so that Trivy can use the cached image. But if we want to schedule such scan jobs on an arbitrary node, there is currently no way to do so, because the image might not be available on that node. Also, Starboard cannot attach the ImagePullSecret available on the workload to pull the image.

We also considered that when an ImagePullSecret is available on a workload, we could use the existing Trivy image scan option to scan it. To do that, the Starboard operator creates another secret from the existing ImagePullSecret so that registry credentials are provided to Trivy as environment variables (sketched below). But again, we cannot reuse the same ImagePullSecret that is available on the workload.
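For illustration, such a derived secret might look like the following; `TRIVY_USERNAME` and `TRIVY_PASSWORD` are the environment variables Trivy reads registry credentials from, and the secret name is a placeholder:

```yaml
apiVersion: v1
kind: Secret
metadata:
  # Placeholder name; the operator would generate a unique name per scan job.
  name: scan-vulnerabilityreport-ab3134-regcred
  namespace: starboard-operator
type: Opaque
stringData:
  # Credentials extracted from the workload's ImagePullSecret
  # (.dockerconfigjson) and exposed to the Trivy container as env vars.
  TRIVY_USERNAME: <registry username>
  TRIVY_PASSWORD: <registry password>
```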
## Solution
Suppose an option is provided to enable running vulnerability scan jobs in the same namespace as the workload. The operator detects it and can schedule and monitor scan jobs in the namespace where the workload is running, and the plugins act accordingly to utilize the service account and ImagePullSecret available on the workload.
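A minimal sketch of how such an option might be exposed on the operator deployment; the environment variable name here is illustrative, not a confirmed setting:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: starboard-operator
  namespace: starboard-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: starboard-operator
  template:
    metadata:
      labels:
        app: starboard-operator
    spec:
      containers:
        - name: operator
          image: aquasec/starboard-operator:0.12.0
          env:
            # Illustrative flag: when enabled, scan jobs are created in the
            # scanned workload's namespace instead of the operator's.
            - name: OPERATOR_VULNERABILITY_SCANS_IN_SAME_NAMESPACE
              value: "true"
```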
## Example
### Example 1
Consider the Starboard operator running in the Trivy image scan mode, and assume there is an `nginx` deployment in the `poc-ns` namespace. It is running with the image `12344534.dkr.ecr.us-west-2.amazonaws.com/amazon/nginx:1.16` and with the service account `poc-sa`, which is annotated with the ARN `arn:aws:iam::<ACCOUNT_ID>:role/IAM_ROLE_NAME`:
```yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: poc-ns
---
apiVersion: v1
automountServiceAccountToken: true
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/IAM_ROLE_NAME
  name: poc-sa
  namespace: poc-ns
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
  namespace: poc-ns
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      serviceAccountName: poc-sa
      containers:
        - name: nginx
          image: 12344534.dkr.ecr.us-west-2.amazonaws.com/amazon/nginx:1.16
```
When a pod (`nginx-65b78bbbd4-nb5kl`) from the above deployment comes into the running state, the pod has these environment variables to get access to the ECR registry: `AWS_REGION`, `AWS_ROLE_ARN`, and `AWS_WEB_IDENTITY_TOKEN_FILE`.
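For reference, the injected variables look roughly like this in the pod spec (an excerpt, assuming the standard EKS pod identity webhook and its default token path):

```yaml
# Excerpt from the pod spec of nginx-65b78bbbd4-nb5kl
env:
  - name: AWS_REGION
    value: us-west-2
  - name: AWS_ROLE_ARN
    value: arn:aws:iam::ACCOUNT_ID:role/IAM_ROLE_NAME
  - name: AWS_WEB_IDENTITY_TOKEN_FILE
    # Projected service account token mounted by the webhook.
    value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
```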
To scan the `nginx` deployment, `starboard-operator` creates the following scan job in the `poc-ns` namespace. `starboard-operator` monitors this job and parses the result based on the completion state of the job. The job runs with the same service account (`poc-sa`) as the workload:
```yaml
---
apiVersion: batch/v1
kind: Job
metadata:
  name: scan-vulnerabilityreport-ab3134
  namespace: poc-ns
spec:
  backoffLimit: 0
  template:
    spec:
      serviceAccountName: poc-sa
      restartPolicy: Never
      containers:
      # containers from pod spec returned from existing Trivy plugin
```
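For illustration only, the main container returned by the Trivy plugin might look roughly like the sketch below; the exact flags and environment variables depend on the plugin version:

```yaml
containers:
  - name: nginx
    image: aquasec/trivy:0.19.2
    command:
      - trivy
    args:
      - --cache-dir
      - /tmp/trivy/.cache
      - image
      - --format
      - json
      - 12344534.dkr.ecr.us-west-2.amazonaws.com/amazon/nginx:1.16
    # No registry credentials are set here: the AWS_* variables injected via
    # the poc-sa service account let Trivy pull the image from ECR.
```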
When a pod (`scan-vulnerabilityreport-ab3134-nfkst`) gets created from the above job spec, it is injected with the same environment variables that grant access to the registry image: `AWS_REGION`, `AWS_ROLE_ARN`, and `AWS_WEB_IDENTITY_TOKEN_FILE`. The Trivy scanner uses these credentials to pull the image for scanning.
### Example 2
Consider another example, in which we want to perform the vulnerability scan using the `trivy fs` command. The deployment `demo-nginx` is running in the `poc-ns` namespace with the image `example.registry.com/nginx:1.16` from the private registry `example.registry.com`. The registry credentials are stored in the ImagePullSecret `private-registry`:
```yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: poc-ns
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: demo-nginx
  name: demo-nginx
  namespace: poc-ns
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      imagePullSecrets:
        - name: private-registry
      containers:
        - name: nginx
          image: example.registry.com/nginx:1.16
```
To scan the `demo-nginx` deployment, `starboard-operator` creates the following scan job in the `poc-ns` namespace. `starboard-operator` monitors the job and parses the result based on the completion state of the job:
```yaml
---
apiVersion: batch/v1
kind: Job
metadata:
  name: scan-vulnerabilityreport-ab3134
  namespace: poc-ns
spec:
  backoffLimit: 0
  template:
    spec:
      # ImagePullSecret value will be copied from the workload we are scanning.
      imagePullSecrets:
        - name: private-registry
      restartPolicy: Never
      volumes:
        - name: scan-volume
          emptyDir: {}
      initContainers:
        # Copy the trivy binary to the shared volume so that it can be
        # executed inside the workload image, which does not ship trivy.
        - name: trivy-get-binary
          image: aquasec/trivy:0.19.2
          command:
            - cp
            - -v
            - /usr/local/bin/trivy
            - /var/starboard/trivy
          volumeMounts:
            - name: scan-volume
              mountPath: /var/starboard
        # Pre-download the vulnerability database to the shared volume.
        - name: trivy-download-db
          image: aquasec/trivy:0.19.2
          command:
            - /var/starboard/trivy
            - --download-db-only
            - --cache-dir
            - /var/starboard/trivy-db
          volumeMounts:
            - name: scan-volume
              mountPath: /var/starboard
      containers:
        # The scan container runs the workload image itself and scans its
        # filesystem with `trivy fs`.
        - name: nginx
          image: example.registry.com/nginx:1.16
          imagePullPolicy: IfNotPresent
          securityContext:
            # Trivy must run as root, so we set the UID here.
            runAsUser: 0
          command:
            - /var/starboard/trivy
            - --cache-dir
            - /var/starboard/trivy-db
            - fs
            - --format
            - json
            - /
          volumeMounts:
            - name: scan-volume
              mountPath: /var/starboard
```
If you look at the job spec, the scan job runs in the `poc-ns` namespace with the image `example.registry.com/nginx:1.16`, and it uses the ImagePullSecret `private-registry`, which is available in the same namespace. With this approach, the Starboard operator does not have to manage (create/delete) the secret required for scanning.
## Notes
There are some points to consider before using this option:

- Scan jobs will run in different namespaces, which creates some activity in each namespace in the cluster. If we don't use this option, all scan jobs run only in the `starboard-operator` namespace, and the user sees all activity confined to that single namespace.
- As the scan job runs with the workload's service account, a very strict PodSecurityPolicy defined in the cluster may block the scan job; see the sketch after this list.
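For instance, a restrictive PodSecurityPolicy like the following sketch would block the `trivy fs` scan job from Example 2, because that job must run as root (`runAsUser: 0`); the policy name is illustrative:

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  # Illustrative policy name.
  name: restricted
spec:
  privileged: false
  runAsUser:
    # Rejects pods that request UID 0, such as the trivy fs scan job.
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
    - '*'
```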