Author: Sascha Grunert, Red Hat
This blog post is about a new Kubernetes feature introduced in v1.22, which adds an additional security layer on top of the existing seccomp support. Seccomp is a security mechanism for Linux processes to filter system calls (syscalls) based on a set of defined rules. Applying seccomp profiles to containerized workloads is one of the key tasks when it comes to enhancing the security of the application deployment. Developers, site reliability engineers and infrastructure administrators have to work hand in hand to create, distribute and maintain the profiles over the application's life cycle.
You can use the securityContext field of Pods and their containers to adjust security-related configurations of the workload. Kubernetes introduced dedicated seccomp-related API fields in this SecurityContext with the graduation of seccomp to General Availability (GA) in v1.19.0. This enhancement made it easier to specify whether the whole pod or a specific container should run as:
- Unconfined: seccomp will not be enabled
- RuntimeDefault: the container runtime's default profile will be used
- Localhost: a node-local profile will be applied, which is referenced by a path relative to the seccomp profile root (<kubelet-root-dir>/seccomp) of the kubelet
With the graduation of seccomp, nothing has changed from an overall security perspective, because Unconfined is still the default. This is perfectly reasonable from the perspective of upgrade paths and backwards compatibility between Kubernetes releases. But it also means that a workload is more likely to run without seccomp at all, which should be fixed in the long term.
SeccompDefault to the rescue
Kubernetes v1.22.0 introduces a new kubelet feature gate, SeccompDefault, which has been added in alpha state like every other new feature. This means that it is disabled by default and can be enabled manually on each Kubernetes node.
What does the feature do? It changes the default seccomp profile from Unconfined to RuntimeDefault. Unless the pod manifest specifies otherwise, the feature applies a stricter set of security constraints by using the default profile of the container runtime. These profiles may differ between runtimes like CRI-O or containerd. They also differ between hardware architectures. But generally speaking, those default profiles allow a common set of syscalls while blocking the more dangerous ones, which are unlikely to be needed or unsafe to use in a containerized application.
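The defaulting behavior described above can be sketched as a small decision function. This is a simplified model and not the kubelet's actual code: the function name and its parameters are hypothetical, and it assumes the usual rule that a container-level securityContext setting takes precedence over the pod-level one.

```python
from typing import Optional

UNCONFINED = "Unconfined"
RUNTIME_DEFAULT = "RuntimeDefault"


def effective_seccomp_type(
    pod_level: Optional[str],
    container_level: Optional[str],
    seccomp_default_enabled: bool,
) -> str:
    """Return the seccomp profile type a container ends up with (simplified)."""
    # An explicit container-level setting wins over the pod-level one.
    if container_level is not None:
        return container_level
    if pod_level is not None:
        return pod_level
    # Nothing was specified: this is the only case the feature changes.
    return RUNTIME_DEFAULT if seccomp_default_enabled else UNCONFINED


if __name__ == "__main__":
    print(effective_seccomp_type(None, None, seccomp_default_enabled=False))
    print(effective_seccomp_type(None, None, seccomp_default_enabled=True))
    print(effective_seccomp_type(None, UNCONFINED, seccomp_default_enabled=True))
```

The point of the sketch is that explicitly configured workloads are untouched; only workloads without any seccomp setting get a different profile.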
Enabling the feature
Two kubelet configuration changes have to be made to enable the feature:
- Enable the feature gate by setting SeccompDefault=true via the command line (--feature-gates) or the kubelet configuration file.
- Turn on the feature by adding the --seccomp-default command line flag or by setting seccompDefault: true in the kubelet configuration file.
The kubelet will error on startup if only one of the above steps has been done.
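As a rough illustration of the configuration file variant, the following sketch builds a minimal KubeletConfiguration fragment containing both settings and prints it as JSON, which the kubelet configuration file loader should accept alongside YAML. The helper name is hypothetical, and the check inside it only mirrors the startup rule stated above; it is not the kubelet's own validation.

```python
import json


def kubelet_seccomp_default_config(gate_enabled: bool, seccomp_default: bool) -> dict:
    """Build a minimal KubeletConfiguration fragment (illustrative helper)."""
    # Mirror the rule from the text: the two settings belong together.
    if gate_enabled != seccomp_default:
        raise ValueError("feature gate and seccompDefault must be set together")
    return {
        "apiVersion": "kubelet.config.k8s.io/v1beta1",
        "kind": "KubeletConfiguration",
        "featureGates": {"SeccompDefault": gate_enabled},
        "seccompDefault": seccomp_default,
    }


if __name__ == "__main__":
    print(json.dumps(kubelet_seccomp_default_config(True, True), indent=2))
```

A real kubelet configuration file will contain many more fields; only the two relevant to this feature are shown here.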
Trying it out
If the feature is enabled on a node, then you can create a new workload like this:
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: test-container
      image: nginx:1.21
Now you can inspect the seccomp profile in use by examining the container's runtime specification with crictl:
CONTAINER_ID=$(sudo crictl ps -q --name=test-container)
sudo crictl inspect $CONTAINER_ID | jq .info.runtimeSpec.linux.seccomp
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
  "syscalls": [
    {
      "names": ["_llseek", "_newselect", "accept", …, "write", "writev"],
      "action": "SCMP_ACT_ALLOW"
    },
    …
  ]
}
You can see that the lower-level container runtime (CRI-O and runc in our case) successfully applied the default seccomp profile. This profile denies all syscalls by default, while allowing commonly used ones like accept or write.
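To make the deny-by-default structure more concrete, here is a minimal sketch of how such a profile could be evaluated for a single syscall name. It is deliberately simplified: real profiles can also carry argument conditions and other fields that this sketch ignores, and the sample profile is an abbreviated, hand-written stand-in rather than a real runtime default.

```python
# Abbreviated, hypothetical profile in the same shape as the output above.
SAMPLE_PROFILE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {"names": ["accept", "write", "writev"], "action": "SCMP_ACT_ALLOW"},
    ],
}


def is_allowed(profile: dict, syscall: str) -> bool:
    """Return True if the (simplified) profile allows the named syscall."""
    for rule in profile.get("syscalls", []):
        if syscall in rule.get("names", []):
            # The first rule that names the syscall decides the outcome.
            return rule["action"] == "SCMP_ACT_ALLOW"
    # No rule matched, so the default action applies.
    return profile["defaultAction"] == "SCMP_ACT_ALLOW"


if __name__ == "__main__":
    print(is_allowed(SAMPLE_PROFILE, "write"))
    # "reboot" is simply not listed in this sample, so the default applies.
    print(is_allowed(SAMPLE_PROFILE, "reboot"))
```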
Please note that the feature will not influence any Kubernetes API for now. Therefore, it is not possible to retrieve the seccomp profile in use via kubectl get or describe if the SeccompProfile field is unset within the SecurityContext.
The feature also works when using multiple containers within a pod, for example if you create a pod like this:
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: test-container-nginx
      image: nginx:1.21
      securityContext:
        seccompProfile:
          type: Unconfined
    - name: test-container-redis
      image: redis:6.2
then you should see that the test-container-nginx runs without a seccomp profile:
sudo crictl inspect $(sudo crictl ps -q --name=test-container-nginx) |
jq '.info.runtimeSpec.linux.seccomp == null'
true
Whereas the container test-container-redis runs with RuntimeDefault:
sudo crictl inspect $(sudo crictl ps -q --name=test-container-redis) |
jq '.info.runtimeSpec.linux.seccomp != null'
true
The same applies to the pod itself, which also runs with the default profile:
sudo crictl inspectp $(sudo crictl pods -q --name test-pod) |
jq '.info.runtimeSpec.linux.seccomp != null'
true
Upgrade strategy
It is recommended to enable the feature in multiple steps, each of which comes with its own risks and mitigations.
Feature gate enabling
Enabling the feature gate at the kubelet level will not turn on the feature, but will make it possible to do so by using the SeccompDefault kubelet configuration or the --seccomp-default CLI flag. This can be done by an administrator for the whole cluster or only a set of nodes.
Testing the Application
If you're trying this within a dedicated test environment, you have to ensure that the application code does not trigger syscalls blocked by the RuntimeDefault profile before enabling the feature on a node. This can be done by:
- Recommended: Analyzing the code (manually or by running the application with strace) for any executed syscalls which may be blocked by the default profiles. If that's the case, then you can override the default by explicitly setting the pod or container to run as Unconfined. Alternatively, you can create a custom seccomp profile (see optional step below) based on the default by adding the additional syscalls to the "action": "SCMP_ACT_ALLOW" section.
- Recommended: Manually set the profile on the target workload and use a rolling upgrade to deploy into production. Roll back the deployment if the application does not work as intended.
- Optional: Run the application against an end-to-end test suite to trigger all relevant code paths with RuntimeDefault enabled. If a test fails, use the same mitigation as mentioned above.
- Optional: Create a custom seccomp profile based on the default and change its default action from SCMP_ACT_ERRNO to SCMP_ACT_LOG. This means that the seccomp filter for unknown syscalls will have no effect on the application at all, but the system logs will now indicate which syscalls may be blocked. This requires at least kernel version 4.14 as well as a recent runc release. Monitor the application host's audit logs (defaults to /var/log/audit/audit.log) or syslog entries (defaults to /var/log/syslog) for syscalls via type=SECCOMP (for audit) or type=1326 (for syslog). Compare the syscall ID with those listed in the Linux kernel sources and add them to the custom profile. Be aware that custom audit policies may lead to missing syscalls, depending on the configuration of auditd.
- Optional: Use cluster additions like the Security Profiles Operator for profiling the application via its log enrichment capabilities or recording a profile by using its recording feature. This makes the manual log investigation mentioned above obsolete.
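The manual SCMP_ACT_LOG workflow from the list above could be scripted roughly as follows. This is a minimal sketch under stated assumptions: both function names are hypothetical, the sample log lines are made up and shortened for illustration, and the parser only assumes that seccomp records carry a syscall=<number> field. Mapping the numbers to syscall names still requires the kernel's syscall table for the node's architecture.

```python
import copy
import re
from collections import Counter
from typing import Iterable

SYSCALL_FIELD = re.compile(r"\bsyscall=(\d+)\b")


def to_log_profile(profile: dict) -> dict:
    """Return a copy of the profile whose default action only logs."""
    logged = copy.deepcopy(profile)
    if logged.get("defaultAction") == "SCMP_ACT_ERRNO":
        logged["defaultAction"] = "SCMP_ACT_LOG"
    return logged


def seccomp_syscall_ids(lines: Iterable[str]) -> Counter:
    """Count syscall IDs found in seccomp records of audit or syslog output."""
    counts: Counter = Counter()
    for line in lines:
        # Only look at seccomp records (audit: type=SECCOMP, syslog: type=1326).
        if "type=SECCOMP" not in line and "type=1326" not in line:
            continue
        match = SYSCALL_FIELD.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Made-up, shortened sample records for illustration only.
    sample = [
        'type=SECCOMP msg=audit(1.0:1): pid=1 comm="app" arch=c000003e syscall=35 code=0x7ffc0000',
        "type=SYSCALL msg=audit(1.0:2): arch=c000003e syscall=1",
        "kernel: audit: type=1326 audit(1.0:3): pid=1 arch=c000003e syscall=35",
    ]
    print(to_log_profile({"defaultAction": "SCMP_ACT_ERRNO"})["defaultAction"])
    print(dict(seccomp_syscall_ids(sample)))
```

The resulting IDs are candidates for the allow list of the custom profile, after checking that the application really needs them.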
Deploying the modified application
Based on the outcome of the application tests, it may be required to change the application deployment by either specifying Unconfined or a custom seccomp profile. This is not the case if the application works as intended with RuntimeDefault.
Enable the kubelet configuration
If everything went well, then the feature is ready to be enabled via the kubelet configuration or its corresponding CLI flag. This should be done on a per-node basis to reduce the overall risk of having missed a syscall during the investigation while running the application tests. If it's possible to monitor audit logs within the cluster, then it's recommended to do so in order to catch any seccomp events that were missed. If the application works as intended, then the feature can be enabled on further nodes within the cluster.
Conclusion
Thank you for reading this blog post! I hope you enjoyed seeing how the usage of seccomp profiles has evolved in Kubernetes over the past releases as much as I did. On your own cluster, change the default seccomp profile to RuntimeDefault (using this new feature) and see the security benefits, and, of course, feel free to reach out any time for feedback or questions.
Editor's note: If you have any questions or feedback about this blog post, feel free to reach out via the Kubernetes slack in #sig-node.