Continuing our guide to securing large-scale infrastructure, we now move to the next stage: hardening the core of the system. We start with the control plane components that make all the important decisions in your environment. In this part, we cover how to block anonymous access, why default Kubernetes Secrets provide no real security benefit, and how to correctly implement data encryption in your cluster. Note: for the full picture of what we’ve covered so far, check out the first post as well.
The API server is the center of gravity for the cluster: every operation passes through it. Leaving its configuration at the defaults is an enormous and unwarranted risk.
The first and most basic task, and also a critical one, is to disable anonymous access completely. The API server is often set up to accept anonymous requests on some endpoints (for example, health check endpoints). Attackers take advantage of this to perform reconnaissance and obtain component version numbers without hindrance.
Disabling it takes a single flag in the API server’s static pod manifest:
```yaml
# Fragment of /etc/kubernetes/manifests/kube-apiserver.yaml
spec:
  containers:
  - command:
    - kube-apiserver
    - --anonymous-auth=false
```
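After you save the file, the kubelet automatically restarts the component with the new, stricter settings applied. Before rolling this out, confirm that nothing legitimate depends on unauthenticated requests, such as liveness and readiness probes against the API server’s health endpoints.
It is also crucial to keep all core components (the API server, etcd, and the kubelet) updated regularly, because new releases continuously patch newly discovered vulnerabilities.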
Once anonymous access is out of the picture, Role-Based Access Control (RBAC) becomes the main mechanism for controlling access to your clusters. The one hard-and-fast rule in RBAC is “least privilege”: each service account and each engineer must be granted just enough privileges to complete the tasks they are responsible for, and no more.
For example, if a CI/CD pipeline only needs to update the container image for articles deployed on the Hack Your Mom portal, that pipeline must not be able to read Secrets or, in the worst case, hold the cluster-admin role.
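The least-privilege rule for such a pipeline can be sketched as a namespaced Role and RoleBinding. This is a minimal sketch, not a definitive setup: the namespace "portal", the service account "ci-deployer", and the object names are hypothetical, and the verbs assume the pipeline updates images by patching Deployments.

```yaml
# Hypothetical names throughout: namespace "portal", service account "ci-deployer".
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-image-updater
  namespace: portal
rules:
  # Deployments in this one namespace only; no Secrets, no cluster-wide access.
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ci-deployer-image-updater
  namespace: portal
subjects:
  - kind: ServiceAccount
    name: ci-deployer
    namespace: portal
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deployment-image-updater
```

RBAC works at the resource level, so the patch verb still lets the pipeline change other Deployment fields. Restricting it to the image field alone would need an admission policy on top.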
Auditing the roles and permissions assigned within a project can rapidly become unmanageable as the project grows. Tools such as the kubectl-who-can command-line plugin are very useful when you need a quick, transparent check of who holds which access rights. It lets you immediately determine who is allowed to perform a specific action:
```shell
# Example: check who can read Secrets in the production namespace
kubectl who-can get secrets -n production
```
For a deeper understanding of how to design roles and role bindings correctly, the official RBAC documentation is well worth reviewing.
Every cluster must also have an emergency access plan, commonly referred to as a “break-glass” strategy. Most of today’s large-scale enterprise clusters rely on third-party identity providers (IdPs) connected over the OIDC protocol, and these integrations are usually highly reliable.
However, the unfortunate truth is that these third-party services do go down from time to time. When the external IdP is unavailable, nobody can authenticate against the cluster, which means that exactly when you need access most (during a crisis), you are locked out.
For this reason, generate a static admin certificate with full admin privileges for each cluster ahead of time. Store the certificate file in the safest possible place (for example, on an encrypted flash drive in a physical safe), and use it only as a last resort, after all other forms of authentication have failed.
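One way to wire such a certificate into RBAC is sketched below, under assumptions. Kubernetes takes the username from the certificate’s Common Name and the groups from its Organization fields, so a certificate issued with the hypothetical organization "break-glass-admins" could be bound to the built-in cluster-admin role like this:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: break-glass-cluster-admin   # hypothetical name
subjects:
  - kind: Group
    # Hypothetical group; it must match the O= field of the certificate subject.
    name: break-glass-admins
    apiGroup: rbac.authorization.k8s.io
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
```

Binding a dedicated group keeps the emergency identity easy to spot in audit logs, and deleting the binding withdraws its privileges if the certificate is ever exposed.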
The etcd database is the core of your infrastructure: it holds all configuration, the cluster state, and your environment’s most sensitive secrets. The default treatment of secrets may cause some discomfort. The standard Secret resource does not provide any cryptographic encryption; its values are merely base64-encoded in the API and stored unencrypted in etcd. Therefore, anyone with direct access to the etcd files on a physical server, or anyone who stumbles on an etcd backup in a publicly accessible Amazon S3 bucket, can read production database passwords with nothing more than basic tools.
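To see why base64 offers no protection, consider a minimal, hypothetical Secret manifest. The name and the value are made up purely for illustration.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials   # hypothetical name
  namespace: production
type: Opaque
data:
  # "cGFzc3dvcmQ=" is simply the base64 encoding of the string "password".
  # Anyone can reverse it; no key is involved.
  password: cGFzc3dvcmQ=
```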
To close this architectural gap, encryption at rest must be configured. It ensures that even if a physical disk is pulled from a server rack or etcd backups are completely compromised, sensitive data remains unreadable.
The most secure, reliable, and modern way to do this is Key Management Service (KMS) v2 with an external hardware or software encryption provider. A widely used option for this purpose is HashiCorp Vault. When you configure your cluster to work with Vault, your key encryption keys (KEKs) are never stored within the cluster itself; they reside in a completely independent, highly secure system. Your cluster talks to Vault only to encrypt and decrypt data encryption keys (DEKs).
Below is a sample EncryptionConfiguration that sets up basic local encryption using the AES-CBC algorithm. It can serve as a stopgap while you work toward a full Vault integration. Keep in mind that with a local provider the key sits in a file on the control plane host, so it protects etcd data and backups but not against a compromise of that host.
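For orientation, a KMS v2 provider entry might look like the sketch below. It assumes a Vault-backed KMS plugin is already running on the control plane nodes and listening on a Unix socket. The plugin name and socket path are hypothetical, and the plugin itself is a separate component that this fragment does not deploy.

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - kms:
          apiVersion: v2
          name: vault-kms                            # hypothetical plugin name
          endpoint: unix:///var/run/kms/vault.sock   # hypothetical socket path
          timeout: 3s
      # Fallback so data written before encryption was enabled stays readable.
      - identity: {}
```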
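```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      # The first provider in the list is used to encrypt new writes.
      - aescbc:
          keys:
            - name: primary-key
              secret: <generated-strong-key-in-base64-format>
      # Fallback so data written before encryption was enabled stays readable.
      - identity: {}
```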
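An EncryptionConfiguration only takes effect once the API server is pointed at it. The fragment below is a sketch of that wiring in the static pod manifest. The directory /etc/kubernetes/enc and the file name are assumptions, so adjust them to wherever you keep the file.

```yaml
# Fragment of /etc/kubernetes/manifests/kube-apiserver.yaml
spec:
  containers:
  - command:
    - kube-apiserver
    # Assumed location of the EncryptionConfiguration file on the host.
    - --encryption-provider-config=/etc/kubernetes/enc/encryption-config.yaml
    volumeMounts:
    - name: enc
      mountPath: /etc/kubernetes/enc
      readOnly: true
  volumes:
  - name: enc
    hostPath:
      path: /etc/kubernetes/enc
      type: DirectoryOrCreate
# Existing Secrets are only encrypted when they are next written. To rewrite them all:
#   kubectl get secrets --all-namespaces -o json | kubectl replace -f -
```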
By default, Kubernetes allows unrestricted communication between all pods, with no internal firewalls, which gives developers complete freedom. The downside is that an attacker who exploits a minor flaw in one pod (say, a “test” pod) can then scan the entire internal network for vulnerabilities, including those in a critical payment gateway running in another pod at the opposite end of the system.
Network Policies address this design flaw by acting as an intelligent internal firewall for each microservice. Note that they are only enforced when the cluster’s network plugin supports them.
Implementing Network Policies requires a strict transition from the legacy model of implicit trust to a zero-trust model.
There are several steps to follow, but the first and most important is a default-deny policy that blocks all inbound and outbound traffic. Once it is in place, engineers must create explicit, deliberate policies for every allowed connection.
Below is an example manifest that blocks all network communication for a specific namespace:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: secure-production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```
With everything denied by default, you then open only precisely defined connections between components. For example, ingress traffic is allowed into the web server pods only from the Ingress controller, and egress traffic from the web server may go only to a specific database pod, and only on a specified port.
Developing these rules takes time and effort, but in return a breach of a single service is far less likely to cascade across your entire infrastructure. More practical “allow” rules can be found in the Network Policies documentation.
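The web server example can be sketched as a single policy. Everything specific here is an assumption made for illustration: the labels app=web and app=db, the ingress controller namespace "ingress-nginx", and the ports 8080 and 5432.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-allow-ingress-and-db   # hypothetical name
  namespace: secure-production
spec:
  podSelector:
    matchLabels:
      app: web                     # hypothetical label on the web server pods
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Accept traffic only from pods in the ingress controller's namespace.
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
  egress:
    # Reach only the database pods, and only on the database port.
    - to:
        - podSelector:
            matchLabels:
              app: db              # hypothetical label on the database pods
      ports:
        - protocol: TCP
          port: 5432
```

Under a default-deny policy, the database pods need a matching ingress rule of their own. If the web server resolves the database by service name, it also needs an egress rule for DNS.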
The primary doors of the infrastructure are now secured. Anonymous access to the API server is disabled, permissions follow the principle of least privilege, and an emergency access plan is ready in case immediate access is needed during a critical incident. Secrets are encrypted with symmetric cryptography so they are no longer stored in plaintext, and the internal network is segmented with zero-trust policies that deny any unauthorized traffic.
However, locking down the control plane is only the first step. The cluster exists to run applications, and application code is where attackers typically begin.
In Part Three, we shift our focus to workloads and container security. Specifically, we will explore how to use minimal base images that include only what is required, configure Security Contexts to limit the system calls containers can make, implement Admission Controllers (like OPA Gatekeeper) to block maliciously configured deployments, and establish continuous auditing and real-time threat detection with Falco. The material gets significantly more technical from here.