This is the second part of the Docker security article. In this part, we will look at the processes of granting and revoking permissions for users and applications in Docker. It is important to know how to properly manage access to ensure the maximum level of security of your containers and protect them from unauthorized access.
By default, Docker starts containers with a standard set of permissions, known as Linux capabilities. Table 1 lists these permissions.
These permissions allow the container to perform certain operations in the context of the system kernel (Docker Host). A current list of all permissions is in the documentation.
For example, by default every container is granted the NET_RAW privilege. This means that from the container level we can send, among other things, an ICMP packet using the ping or traceroute command (Listing 13).
# Command to be executed in the context of Docker Host.
docker run -it --rm --name ubuntu-ping ubuntu:22.04 bash

# Commands to be executed in the context of the ubuntu-ping container.
apt update && apt install -y iputils-ping
ping -c 3 8.8.8.8
Listing 13. Starting the container, installing packages, and executing the ping command.
A simple task, and the result matches expectations: the ping succeeds (Fig. 21).
Now we’ll run the container a second time, but this time with the NET_RAW capability turned off, that is, with the --cap-drop=CAP_NET_RAW flag (Listing 14).
# Command to be executed in the context of Docker Host.
docker run -it --rm --cap-drop=CAP_NET_RAW --name ubuntu-ping ubuntu:22.04 bash

# Commands to be executed in the context of the ubuntu-ping container.
apt update && apt install -y iputils-ping
ping -c 3 8.8.8.8
Listing 14. Restarting the container, this time with the permissions revoked.
Again, everything went according to plan (Fig. 22, Fig. 23). We have removed the permissions required to send, in particular, ICMP packets from the container.
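One way to confirm which capabilities a process actually holds is to read the CapEff field in /proc/<pid>/status, a hexadecimal bitmask in which each bit corresponds to one capability number. The sketch below decodes a single bit of such a mask. The helper name has_cap is made up for illustration, and the sample mask is an assumption: it is the value often reported for a container started with Docker's default set, so your own output may differ.

```shell
#!/bin/sh
# has_cap MASK BIT: print "yes" if capability number BIT is set in the
# hexadecimal MASK (the format used by CapEff in /proc/<pid>/status).
# Hypothetical helper, not part of Docker or any standard tool.
has_cap() {
  mask=$1
  bit=$2
  if [ $(( (0x$mask >> $bit) & 1 )) -eq 1 ]; then
    echo yes
  else
    echo no
  fi
}

# Assumed sample mask for a container with Docker's default capabilities.
default_mask=00000000a80425fb

has_cap "$default_mask" 13   # CAP_NET_RAW is capability number 13, prints "yes"
has_cap "$default_mask" 12   # CAP_NET_ADMIN is capability number 12, prints "no"
```

Inside a container, the real mask can be read with grep CapEff /proc/self/status and passed to the helper in place of the sample value.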
We don’t need to manually determine which container permissions to revoke. We can revoke all of them at once by passing the special value ALL, that is, by adding the --cap-drop=ALL flag (Listing 15, Figure 24).
docker run -it --rm --cap-drop=ALL --name ubuntu-drop-all-cap ubuntu:22.04 bash
Listing 15. Running a container with all permissions revoked.
We should be especially careful if we notice constructs that use the --cap-add flag. It is the opposite of --cap-drop and is used to grant permissions. As a reminder, Table #1 only shows the default permissions provided by Docker. The actual list is much larger. Our goal should be to limit container permissions, so granting additional permissions should be alarming.
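A common way to combine the two flags is to drop everything and then add back only what the application demonstrably needs. The following is a minimal sketch of that idea. The function name least_privilege_flags is hypothetical, and NET_BIND_SERVICE is only an assumed example of a capability an application might require.

```shell
#!/bin/sh
# least_privilege_flags CAP...: print "--cap-drop=ALL" followed by one
# "--cap-add=<CAP>" for every capability named on the command line.
# Hypothetical helper for illustration only.
least_privilege_flags() {
  printf '%s' '--cap-drop=ALL'
  for cap in "$@"; do
    printf ' --cap-add=%s' "$cap"
  done
  printf '\n'
}

# Assumed example: an application that only needs to bind to a low port.
least_privilege_flags NET_BIND_SERVICE
# prints: --cap-drop=ALL --cap-add=NET_BIND_SERVICE
```

The printed flags can be passed to docker run, for example docker run --rm $(least_privilege_flags NET_BIND_SERVICE) ubuntu:22.04 bash. Keeping the allow-list explicit makes every granted capability visible during review.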
Docker allows you to run containers in privileged mode. Privileged mode is roughly the opposite of the --cap-drop=ALL flag: it gives the container virtually every possible privilege.
The easiest way to explain it is with an example. Let’s try to run two containers: one in privileged mode, the other in standard (default) mode. Then we’ll run a series of tests, including some quick reconnaissance to see what can be achieved at the container level with an elevated privilege set.
Before proceeding, make sure that user namespace remapping is disabled (the userns-remap setting in the /etc/docker/daemon.json file). Otherwise, after executing the command, you will see an error message: docker: Error response from daemon: privileged mode is incompatible with user namespaces. You must run the container in the host namespace when running privileged mode. Also, don’t forget to restart the daemon after changing the file: sudo systemctl restart docker.
# Commands to be executed in the context of Docker Host.
docker run -itd --privileged --name ubuntu-privileged ubuntu:22.04
docker run -itd --name ubuntu-unprivileged ubuntu:22.04
docker exec -it ubuntu-unprivileged bash

# Commands to be executed in the context of the ubuntu-unprivileged container.
ls /dev
ls /dev | grep sda
exit

# Command to be executed in the context of Docker Host.
docker exec -it ubuntu-privileged bash

# Commands to be executed in the context of the ubuntu-privileged container.
ls /dev
ls /dev | grep sda
exit
Listing 16. Running containers in standard and privileged modes.
Do you see a significant difference (Figure 25)? When running a container without the privileged flag, we only have access to a limited list of devices in the /dev directory. The situation is quite different for the privileged container. How can a potential attacker take advantage of this? Let’s look for something more interesting. We should be especially interested in the presence of devices with sd* identifiers (in this case sda), which usually denote hard drives. Let’s see what else can be found (Fig. 26).
The /dev/mapper directory and its contents suggest that we are dealing with LVM (Logical Volume Manager) volumes on the device. LVM allows you to create logical volumes that can be easily resized and moved between hard drives and partitions.
By default, our container may not have the tools needed to handle LVM. For this reason, we need to install them with apt install lvm2 (Figure 27).
After installing all the necessary tools, we can complete the exploration using the lvscan command (Figure 28).
We’ve just gained access to the Docker host files, which are only available to admin users!
We can even go further and use the chroot command to start executing commands directly in the context of the Docker host.
From a practical point of view, we have taken control of Docker Host (Figure 29)!
We need to be alert to more than just the --privileged option. Other constructs can also be unsafe, especially those using the --device option.
Docker’s --device option allows you to map devices from the host to the container. This is used when an application inside a container needs direct access to the physical hardware of the host system. This can apply to different types of devices, such as graphics processing units (GPUs), hard drives, printers, and other peripherals.
Using this option, containerized applications can interact with a specific device as if they were running directly on the host system. This is especially useful in cases where performance and access to specialized hardware functions are key.
If we use this construct to grant disk access, we will effectively achieve the same effect as we discussed in the context of the --privileged option.
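Because these options are easy to overlook in scripts and documentation, a quick text scan can help during review. The sketch below reads command lines on standard input and prints which of the three options discussed so far (--privileged, --cap-add, --device) appear. The function name flag_risky_options is made up for this example, and the check is a rough heuristic rather than a complete audit: it does not look at Compose files, API calls, or option values.

```shell
#!/bin/sh
# flag_risky_options: read text on stdin and print, one per line, each of
# --privileged, --cap-add and --device that occurs as a complete option
# (followed by "=", a space, or the end of the line).
# Hypothetical helper; a heuristic, not a full security audit.
flag_risky_options() {
  grep -oE -- '--(privileged|cap-add|device)([= ]|$)' | tr -d '= ' | sort -u
}

# The two commands from Listing 16 as sample input.
flag_risky_options <<'EOF'
docker run -itd --privileged --name ubuntu-privileged ubuntu:22.04
docker run -itd --name ubuntu-unprivileged ubuntu:22.04
EOF
# prints: --privileged
```

A script can be piped through the function in the same way, for example flag_risky_options < deploy.sh, where deploy.sh stands for any file of your own that contains docker run commands.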
Docker’s no-new-privileges option is used to control the granting of permissions in a container. If this option is set to true, processes in the container will not be able to obtain new permissions other than those granted to them at startup. This can help improve system security by limiting the possibility of privilege escalation by potentially malicious processes.
Let’s see how it works in practice. First, let’s prepare the test environment. To elevate privileges, we will use a copy of the bash executable that is owned by root and has the setuid bit set.
setuid is a permission flag in the Linux and Unix family of operating systems that allows programs to run with the permissions of another user, usually the root user. This means that when a program with the setuid flag is run, it runs with the permissions of the owner of the file, not the user who ran it.
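Since setuid files are a typical starting point for privilege escalation, it can be worth listing them in an image before shipping it. The sketch below wraps a find call that selects regular files with the setuid bit set and demonstrates it in a scratch directory. The function name list_setuid and the demo file names are assumptions made for illustration.

```shell
#!/bin/sh
# list_setuid DIR: print regular files under DIR that have the setuid bit
# set, without crossing into other file systems.
# Hypothetical helper for illustration only.
list_setuid() {
  find "$1" -xdev -type f -perm -4000 2>/dev/null
}

# Self-contained demonstration in a temporary directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/givemeroot" "$demo_dir/harmless"
chmod 4755 "$demo_dir/givemeroot"   # set the setuid bit on one file only
list_setuid "$demo_dir"             # prints only the path ending in /givemeroot
rm -rf "$demo_dir"
```

Run inside a container as list_setuid /, the same function shows every setuid binary on the root file system, which gives a concrete list of files to review or strip.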
Let’s prepare a simple Dockerfile (Listing 17).
FROM ubuntu:22.04
RUN cp /bin/bash /bin/givemeroot
RUN chmod 4755 /bin/givemeroot
RUN useradd -ms /bin/bash unprivilegeduser
USER unprivilegeduser
CMD ["/bin/bash"]
Listing 17. Dockerfile prepared to demonstrate privilege escalation.
The Dockerfile we prepared creates an image based on Ubuntu 22.04, copies /bin/bash to a new file, /bin/givemeroot, and sets the setuid bit on that file. Because the file is owned by root, this allows it to be run with root privileges. A new user, unprivilegeduser, is then created with bash as the default shell. Finally, the user context is switched to unprivilegeduser (with the USER instruction), which means that all further instructions will be executed with the privileges of that user.
Now it is time to save the file to disk, in any directory. Remember to name the file correctly, that is, Dockerfile (Listing 18, Figure 30).
mkdir test-priv-esc && cd test-priv-esc
nano Dockerfile
cat Dockerfile
docker build -t ubuntu-setuid-escalation .
Listing 18. The process of building a new image.
Time to check. Let’s create a new container using the image we just created and see whether our attempt to elevate privileges succeeds (Listing 19).
# Command to be executed in the context of Docker Host.
docker run --rm -it ubuntu-setuid-escalation bash

# Commands to be executed in the context of the container.
id
head -n 1 /etc/shadow
/bin/givemeroot -p
id
head -n 1 /etc/shadow
Listing 19. Running a container using a prebuilt image.
Great (Figure 31)! Of course, “great” from the point of view of someone who wants to take control of a vulnerable container. We confirmed that we were able to elevate permissions to the root user level (in the context of the container).
Now let’s try to protect ourselves from such a possibility. We’ll run another container, but this time with an additional option, which is --security-opt="no-new-privileges=true" (Listing 20).
# Commands to be executed in the context of the "old" container.
exit
exit

# Command to be executed in the context of Docker Host.
docker run --rm -it --security-opt="no-new-privileges=true" ubuntu-setuid-escalation bash

# Commands to be executed in the context of the new container.
id
/bin/givemeroot -p
id
Listing 20. Running a container with restrictions imposed.
As you can see (Fig. 32), this time the attempt to increase privileges failed!
Both --security-opt="no-new-privileges=true" and --cap-drop=ALL increase the security of the Docker environment. However, they function differently and can be used together to provide an additional level of protection. In short, no-new-privileges prevents privilege escalation, while --cap-drop=ALL limits the permissions of the running container by cutting off all privileges.
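Whether the no-new-privileges restriction is active for a process can be read from the NoNewPrivs field of /proc/<pid>/status, which the Linux kernel exposes in reasonably recent versions. The following sketch extracts that field. The function name no_new_privs_of is hypothetical, and the example assumes a kernel that provides the field.

```shell
#!/bin/sh
# no_new_privs_of FILE: print the NoNewPrivs value (0 or 1) found in a
# file in /proc/<pid>/status format. Prints nothing if the field is absent.
# Hypothetical helper for illustration only.
no_new_privs_of() {
  awk '/^NoNewPrivs:/ { print $2 }' "$1"
}

# Report the value for the current process tree, if /proc is available.
# The flag is inherited by child processes, so the helper sees the same value.
if [ -r /proc/self/status ]; then
  echo "NoNewPrivs: $(no_new_privs_of /proc/self/status)"
fi
```

In a container started with --security-opt="no-new-privileges=true" the value should be 1, while in a container started without the option it should be 0.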
During the process of securing Docker, one of the most important aspects is to limit the risk of privilege escalation on the Docker host that can occur due to possible errors in the Docker Daemon configuration. We have already mentioned this in several previous paragraphs. The Linux namespaces mechanism built into the Linux kernel is an effective tool for this purpose.
Namespaces in a Linux system allow you to isolate and separate system resources for processes, which in turn allows you to effectively protect the host system from the potentially harmful effects of applications running in containers. Each namespace functions as an independent layer, limiting visibility and access to system resources for processes running in it.
Docker, using Linux namespaces, gives each container its own isolated environment with specific network settings, file systems, users, and processes. All this allows containers to work in parallel without affecting each other, thus providing a high level of security and flexibility in managing the resources of the host machine.
Despite existing security measures, there are situations when applications in a container must be run as root. In the context of the container itself, this may be necessary and safe, but it is very important to ensure that these processes do not have root privileges on the Docker host. This is possible by configuring the Docker Daemon to run containers as unprivileged users in the context of the host system.
Let’s check out what it’s all about.
We will start the first container by entering the command docker run -itd --name ubuntu1 ubuntu:22.04.
We can access the system shell and check the list of running processes. To do this, you need to enter the following commands: docker exec -it ubuntu1 bash and ps -u (Figure 33).
As you can see, processes running in a Docker container run in the context of the root user. Although such a decision is not recommended, in some cases it is necessary. As mentioned earlier, there are processes that must work in this mode.
We will now use another Docker command, namely docker container top ubuntu1, to check how the processes running in the container map to the processes on the Docker host (Figure 34).
Our discovery, unfortunately, does not inspire much optimism. Processes running in the container also run in the context of the root user on the Docker host. This poses a significant risk if a security vulnerability is discovered: an attacker who finds a way to “break out” of the container could gain root access to the Docker host. So how can we minimize this risk? The answer is to apply a container isolation mechanism known as user namespace “remapping”.
To enable this mechanism, we need to use the userns-remap configuration option and save the corresponding value in the daemon.json file. Note that this file may not exist by default, so you may need to create it in the /etc/docker/daemon.json path. Docker Desktop users will be able to find the file in the $HOME/.docker/daemon.json path. The correct configuration is shown in Listing 21 (cat /etc/docker/daemon.json).
{ "userns-remap": "default" }
Listing 21. The contents of the daemon.json file
According to the Docker documentation, after setting userns-remap to default and restarting Docker, the system will automatically create a user named dockremap. Container processes are then mapped to this user’s subordinate UID range on the host instead of running as the host’s root user.
After restarting the Docker service (sudo service docker restart), it’s a good idea to check if the dockremap user was actually created and if the namespace configuration was saved in the Docker host configuration files. First of all, this concerns the /etc/subuid file (Fig. 35).
Everything in its place!
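Each line in /etc/subuid has the form name:start:count, and a UID inside the container is mapped to start plus that UID on the host. The sketch below performs this arithmetic for a single entry. The function name host_uid is hypothetical, and the entry dockremap:100000:65536 is an assumed example, so the numbers on your host may differ.

```shell
#!/bin/sh
# host_uid SUBUID_LINE CONTAINER_UID: print the host UID that a UID inside
# the container maps to, given one line in /etc/subuid format
# (name:start:count). Hypothetical helper for illustration only.
host_uid() {
  start=$(printf '%s\n' "$1" | cut -d: -f2)
  count=$(printf '%s\n' "$1" | cut -d: -f3)
  if [ "$2" -ge "$count" ]; then
    echo "UID $2 is outside the mapped range" >&2
    return 1
  fi
  echo $(( $start + $2 ))
}

# Assumed example entry; check your own /etc/subuid for the real values.
entry='dockremap:100000:65536'
host_uid "$entry" 0      # root inside the container, prints 100000
host_uid "$entry" 1000   # an ordinary container user, prints 101000
```

With this mapping, a process that is root inside the container appears on the host under an unprivileged UID from the subordinate range.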
Now we’ll repeat the exercise of starting a container and running a few commands (Listing 22).
docker run -itd --name ubuntu1 ubuntu:22.04
docker exec -it ubuntu1 bash
ps u
exit
docker container top ubuntu1
Listing 22. Restarting the container.
Again it seems that everything is in its place (Fig. 36)!
We started the ubuntu1 container and then checked whether the processes started in the container were still running in the context of the root user (inside the container). They were. However, the output of the docker container top ubuntu1 command has changed significantly. We observe that now the container process is running on the Docker host in the context of an unprivileged user taken from the subordinate UID range of the newly created dockremap user (Figure 37).
This configuration significantly limits the ability to elevate privileges on the Docker Host system.