prepare for cks exam with me 6: Seccomp in Linux, Docker and Kubernetes

1. Purpose

In this post, I will continue writing about preparing for the CKS (Certified Kubernetes Security Specialist) exam. These are my own notes for the exam, and you can refer to these articles when preparing for yours.

List of the series of posts:

-prepare for cks exam with me 1: Linux user and group management

-prepare for cks exam with me 2: Linux ssh hardening

-prepare for cks exam with me 3: Linux remove obsolete packages and services

-prepare for cks exam with me 4: Linux kernel hardening

-prepare for cks exam with me 5: Linux UFW (Uncomplicated Firewall)

-prepare for cks exam with me 6: Seccomp in Linux, Docker and Kubernetes

-prepare for cks exam with me 7: Apparmor in Linux, Docker and Kubernetes

-prepare for cks exam with me 8: Security context in Kubernetes

-prepare for cks exam with me 9: Admission controllers in Kubernetes

-prepare for cks exam with me 10: Pod security policy in Kubernetes

-prepare for cks exam with me 11: Open policy agent in Kubernetes

-prepare for cks exam with me 12: Secrets in Kubernetes

-prepare for cks exam with me 13: Container runtimes (gVisor/Kata Containers) in Kubernetes

-prepare for cks exam with me 14: Container Image security in Docker and Kubernetes

-prepare for cks exam with me 15: How to print docker images of all pods in kubernetes

2. Environment

  • CKS
  • Ubuntu System

3. Seccomp

3.1 What is Seccomp?

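Seccomp stands for "secure computing mode".

Seccomp is a computer security facility in the Linux kernel. In its original (strict) mode, seccomp allows a process to make a one-way transition into a “secure” state in which it cannot make any system calls except exit(), sigreturn(), and read() and write() on already-open file descriptors. The later filter mode lets a process install its own filter that decides which system calls are allowed.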

[Figure: how seccomp works]

The idea behind seccomp is very simple, as the picture above shows. First, the process sets its seccomp policy to strict or filter mode. This causes the kernel to set the seccomp flag in the process's task_struct; if the process chose filter mode, the kernel also adds the filter program to the seccomp filter list in task_struct. From then on, the kernel checks every system call the process makes against the seccomp filter.

Caution: Seccomp is only available in Linux kernel version 2.6.12 or later, and filter mode requires kernel 3.5 or later.
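As a quick way to check the version requirement above, here is a minimal sketch that parses a kernel release string and compares it with 2.6.12. The helper names kernel_version and supports_seccomp are my own, not part of any library, and the check only looks at the version number; it does not tell you whether the kernel was actually built with seccomp enabled.

```python
import platform
import re


def kernel_version(release: str) -> tuple:
    """Extract (major, minor, patch) from a release string like '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def supports_seccomp(release: str) -> bool:
    """True if the release is at least 2.6.12 (version check only)."""
    return kernel_version(release) >= (2, 6, 12)


if __name__ == "__main__":
    release = platform.release()
    print(release, supports_seccomp(release))
```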

3.2 Seccomp in linux

Suppose we have a process whose PID is 1. We can check its seccomp status by running this command:

$ grep -i seccomp /proc/1/status
Seccomp:    2

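What does the '2' above mean?

There are three modes in seccomp:

* 0 disabled: no seccomp restrictions apply to the process
* 1 strict: the process can only use a few fixed system calls (read, write, exit and sigreturn)
* 2 filter: every system call is checked against a filter, and only the system calls the filter permits are allowed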
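The same check can be scripted. The following is a minimal sketch that reads the Seccomp field from /proc/<pid>/status and maps it to the mode names listed above. The function names and the SECCOMP_MODES table are my own choices for illustration.

```python
# Mode numbers as reported in the "Seccomp:" field of /proc/<pid>/status.
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def parse_seccomp_mode(status_text: str) -> int:
    """Return the numeric seccomp mode found in the text of a status file."""
    for line in status_text.splitlines():
        # Match "Seccomp:" exactly so that "Seccomp_filters:" is skipped.
        if line.startswith("Seccomp:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no Seccomp field found")


def describe_pid(pid: int) -> str:
    """Read /proc/<pid>/status (Linux only) and describe the seccomp mode."""
    with open(f"/proc/{pid}/status") as fh:
        mode = parse_seccomp_mode(fh.read())
    return f"pid {pid}: seccomp mode {mode} ({SECCOMP_MODES.get(mode, 'unknown')})"
```

On a Linux host, calling describe_pid(1) reports the mode of PID 1, which corresponds to the grep command shown earlier.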

3.3 Seccomp in docker

We can set custom seccomp settings when running a docker container:

$ docker run -it --rm --security-opt seccomp=/root/custom.json docker/whalesay /bin/sh

Or, if we want to allow the docker container to make any system call, we can disable seccomp for it:

$ docker run -it --rm --security-opt seccomp=unconfined docker/whalesay /bin/sh
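The post does not show what /root/custom.json contains, so here is only a sketch of one possible profile: it allows everything by default and makes a few chosen syscalls fail. The helper name build_denylist_profile and the syscalls mkdir and chmod are assumptions picked for illustration, not the contents of any real profile.

```python
import json


def build_denylist_profile(denied_syscalls):
    """Build a profile that allows all syscalls except the given ones."""
    return {
        "defaultAction": "SCMP_ACT_ALLOW",
        "syscalls": [
            # The listed syscalls return an error instead of running.
            {"names": sorted(denied_syscalls), "action": "SCMP_ACT_ERRNO"}
        ],
    }


if __name__ == "__main__":
    # Illustrative choice of syscalls; redirect the output to a file
    # and pass that file to --security-opt seccomp=<file>.
    print(json.dumps(build_denylist_profile({"mkdir", "chmod"}), indent=4))
```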

3.4 Seccomp in kubernetes

If you want to set seccomp for a Pod in Kubernetes, you need to add a seccompProfile setting under securityContext in the Pod manifest:

apiVersion: v1
kind: Pod
metadata:
  name: audit-pod
  labels:
    app: audit-pod
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: test-container
    image: hashicorp/http-echo:0.2.3
    args:
    - "-text=just made some syscalls!"
    securityContext:
      allowPrivilegeEscalation: false

The type can be one of:

  • RuntimeDefault uses the default seccomp profile of the container runtime
  • Unconfined applies no seccomp restrictions
  • Localhost uses a custom seccomp profile (a JSON file) on the node, and must be used together with localhostProfile
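To make these rules concrete, here is a small sketch that works on a Pod manifest already loaded into a Python dict. It checks a seccompProfile against the three types above and works out which profile applies to a container, assuming (as Kubernetes documents) that a container-level seccompProfile overrides the Pod-level one. The function names are hypothetical helpers, not part of any Kubernetes client.

```python
VALID_TYPES = {"RuntimeDefault", "Unconfined", "Localhost"}


def profile_errors(profile: dict) -> list:
    """Return a list of problems found in a seccompProfile dict."""
    errors = []
    kind = profile.get("type")
    if kind not in VALID_TYPES:
        errors.append(f"unknown type: {kind!r}")
    has_path = bool(profile.get("localhostProfile"))
    if kind == "Localhost" and not has_path:
        errors.append("Localhost requires localhostProfile")
    if kind != "Localhost" and has_path:
        errors.append("localhostProfile is only valid with type Localhost")
    return errors


def effective_seccomp_profile(pod: dict, container_name: str):
    """Return the profile that applies to a container, or None if unset."""
    spec = pod.get("spec") or {}
    pod_level = (spec.get("securityContext") or {}).get("seccompProfile")
    for container in spec.get("containers") or []:
        if container.get("name") == container_name:
            own = (container.get("securityContext") or {}).get("seccompProfile")
            # Container-level settings take precedence over Pod-level ones.
            return own if own is not None else pod_level
    raise KeyError(f"no container named {container_name!r}")
```

For the audit-pod manifest above, effective_seccomp_profile(pod, "test-container") would return the Pod-level RuntimeDefault profile, because the container itself sets only allowPrivilegeEscalation.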

If you want to use a custom seccomp profile, you can do as follows:

  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/audit.json

The localhostProfile above must be a relative path on the host, relative to the kubelet's default seccomp profile directory, /var/lib/kubelet/seccomp. In other words, the following file must exist on the host:

$ mkdir -p /var/lib/kubelet/seccomp/profiles/
$ cd /var/lib/kubelet/seccomp/profiles
$ touch audit.json
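As a sanity check for the path rule above, this sketch resolves a localhostProfile value against the default directory and rejects values that are not plain relative paths. The function name resolve_localhost_profile is hypothetical, and the directory is the default mentioned in this post; a kubelet configured with a different root directory would use a different location.

```python
from pathlib import PurePosixPath

# Default kubelet seccomp profile directory, as described above.
SECCOMP_ROOT = PurePosixPath("/var/lib/kubelet/seccomp")


def resolve_localhost_profile(localhost_profile: str,
                              root: PurePosixPath = SECCOMP_ROOT) -> PurePosixPath:
    """Map a localhostProfile value to the file expected on the node."""
    relative = PurePosixPath(localhost_profile)
    # Absolute paths and ".." components are rejected by this sketch.
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError(f"not a plain relative path: {localhost_profile!r}")
    return root / relative
```

For example, resolve_localhost_profile("profiles/audit.json") gives /var/lib/kubelet/seccomp/profiles/audit.json, the file created by the commands above. Note that touch only creates an empty file, so audit.json still has to be filled with a profile.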

For the content of audit.json, here are three typical examples:

Content of audit.json #1: Allow every syscall
{
    "defaultAction": "SCMP_ACT_LOG"
}

Using SCMP_ACT_LOG as the default action means that a syscall that matches none of the configured seccomp filter rules is still allowed, so the filter has no effect on the calling thread, but the syscall is logged.

Content of audit.json #2: Deny every syscall
{
    "defaultAction": "SCMP_ACT_ERRNO"
}

Using SCMP_ACT_ERRNO as the default action means that a syscall that matches none of the configured seccomp filter rules is not executed; the calling thread receives an error (errno) as the return value instead. Since this profile has no rules, every syscall fails.

Content of audit.json #3: Filter the syscalls
{
    "defaultAction": "SCMP_ACT_ERRNO",
    "architectures": [
        "SCMP_ARCH_X86_64",
        "SCMP_ARCH_X86",
        "SCMP_ARCH_X32"
    ],
    "syscalls": [
        {
            "names": [
                "accept4",
                "epoll_wait",
                "pselect6",
                "futex",
                "madvise",
                "epoll_ctl",
                "getsockname",
                "setsockopt",
                "vfork",
                "mmap",
                "read",
                "poll",
                "recvfrom",
                "sendto",
                "set_tid_address",
                "setitimer",
                "writev"
            ],
            "action": "SCMP_ACT_ALLOW"
        }
    ]
}

The above configuration is a whitelist: only the syscalls listed in names are allowed, and all others are denied.
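To see how the three profiles differ, here is a simplified model of how a profile decides the action for a syscall. It is only a sketch: it looks at syscall names alone and ignores architectures and argument filters, which real seccomp filters also take into account. The function name action_for and the shortened sample profile are my own.

```python
def action_for(profile: dict, syscall: str) -> str:
    """Return the action a profile assigns to a syscall name (simplified)."""
    for rule in profile.get("syscalls", []):
        if syscall in rule.get("names", []):
            return rule["action"]
    # No rule matched, so the default action applies.
    return profile["defaultAction"]


# Shortened version of the whitelist profile above, for illustration.
WHITELIST_PROFILE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {"names": ["read", "writev", "futex", "mmap"],
         "action": "SCMP_ACT_ALLOW"}
    ],
}
```

With this model, action_for(WHITELIST_PROFILE, "read") resolves to the rule's SCMP_ACT_ALLOW, while a syscall that is not listed falls back to SCMP_ACT_ERRNO. A profile with only a defaultAction, like the first two examples, always returns that default.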

When a profile uses SCMP_ACT_LOG, you can view the logged syscalls on the node by running:

$ grep syscall /var/log/syslog
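If the log is long, a small script can summarize it. This sketch assumes that the audit records in syslog carry a syscall=<number> field, which is how such records usually look; the exact line format depends on your system, so treat the pattern as an assumption. The function names are my own.

```python
import re
from collections import Counter

# Assumed field format in audit records: "syscall=<number>".
SYSCALL_RE = re.compile(r"\bsyscall=(\d+)\b")


def count_logged_syscalls(lines) -> Counter:
    """Count how often each syscall number appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def summarize(path: str = "/var/log/syslog", top: int = 10) -> list:
    """Return the most frequent (syscall number, count) pairs from a log file."""
    with open(path, errors="replace") as fh:
        return count_logged_syscalls(fh).most_common(top)
```

The numbers are architecture-specific syscall numbers, so they still need to be mapped to names for your platform.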

4. Summary

In this post, I showed some examples of how to harden Linux syscalls with seccomp in Linux, Docker and Kubernetes.