How to hack Google Kubernetes Engine (GKE)? Securing against GKE threats

A recent investigation by Unit 42 of Palo Alto Networks has uncovered a dual privilege escalation chain in Google Kubernetes Engine (GKE). This vulnerability, stemming from specific configurations in GKE’s logging agent FluentBit and Anthos Service Mesh (ASM), presents a significant security risk, potentially allowing attackers unauthorized access to Kubernetes clusters.

Kubernetes and GKE Overview: Kubernetes, the most widely adopted open-source container orchestration platform, is used to deploy and manage containerized applications. GKE, Google’s managed Kubernetes service, adds features that simplify the deployment and management of Kubernetes clusters. However, the complexity of Kubernetes environments often leaves them open to security breaches caused by misconfiguration and excessive privileges.

Issues in FluentBit and Anthos Service Mesh:

  • FluentBit: The default configuration of FluentBit, a lightweight log processor and forwarder, includes a volume mount that provides unnecessary access to the pod directory, including projected service account tokens.
  • Anthos Service Mesh (ASM): ASM’s Container Network Interface (CNI) DaemonSet retains excessive permissions post-installation, which can be exploited to create a new pod with elevated privileges.

FluentBit Flaw

The FluentBit issue arises from the way the FluentBit container is configured to access volumes on the node it runs on. Let’s break down this vulnerability and its implications:

Understanding the Vulnerability

  1. FluentBit’s Volume Mount Configuration:
    • Misconfiguration: The FluentBit pod mounts the node’s /var/lib/kubelet/pods directory as a hostPath volume. This directory contains a subdirectory for each pod running on the node.
    • Access to Sensitive Data: Within each pod’s directory, there is a kube-api-access volume that stores projected service account tokens. These tokens are used for authenticating with the Kubernetes API and are highly sensitive.
  2. Exploitation of the Misconfiguration:
    • Compromise of FluentBit: If an attacker gains access to the FluentBit container, they can exploit this misconfiguration.
    • Access to Tokens: The attacker can access any service account token of the pods on the same node.
    • Impersonation and Unauthorized Access: Using these tokens, the attacker can impersonate pods with varying levels of privileges, potentially gaining unauthorized access to the Kubernetes API server.
  3. Scope of the Attack:
    • Mapping the Cluster: With a token that is allowed to list pods, the attacker could enumerate all running pods in the cluster (for example, with kubectl get pods) and use the result to map the entire cluster.
    • Potential for Privilege Escalation: Depending on the permissions associated with the compromised tokens, the attacker could escalate their privileges within the cluster.
    • Harmful Actions: The attacker could perform various harmful actions, such as data theft, service disruption, or further exploitation of cluster resources.
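
The misconfiguration described above can be checked for in a pod manifest. The following is a minimal defensive sketch, not Google's or Unit 42's tooling: it assumes the pod spec has already been loaded into a Python dict (for example from YAML), and the function names are hypothetical. It flags any hostPath volume that would include the /var/lib/kubelet/pods directory, whether the mount targets that directory, one of its parents, or a subdirectory.

```python
from pathlib import PurePosixPath

# Directory the article identifies as exposing other pods' projected tokens.
SENSITIVE_ROOT = PurePosixPath("/var/lib/kubelet/pods")


def exposes_pod_tokens(host_path: str) -> bool:
    """Return True if a hostPath mount overlaps the kubelet pods directory."""
    mounted = PurePosixPath(host_path)
    # Overlap means: the directory itself, a parent such as /var/lib/kubelet
    # or /, or a subdirectory belonging to a single pod.
    return (
        mounted == SENSITIVE_ROOT
        or mounted in SENSITIVE_ROOT.parents
        or SENSITIVE_ROOT in mounted.parents
    )


def risky_host_paths(pod_spec: dict) -> list:
    """List the names of volumes in a pod spec whose hostPath exposes tokens."""
    risky = []
    for volume in pod_spec.get("volumes", []):
        host_path = volume.get("hostPath")
        if host_path and exposes_pod_tokens(host_path.get("path", "")):
            risky.append(volume["name"])
    return risky
```

A check like this only inspects the manifest; it says nothing about whether the container is otherwise hardened, so it is best treated as one signal among several.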

The Role of the Sidecar Container

  • Functionality of Sidecar Container: In a typical Kubernetes setup, a sidecar container like FluentBit is used for log collection. It operates within the context of its pod, collecting, parsing, and forwarding logs from the main application container.
  • No Direct API Access Needed: The sidecar container generally doesn’t require direct access to the Kubernetes API server. It uses the Kubernetes infrastructure to access log files and container runtime metadata.
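
If a container does not need the Kubernetes API, its pod does not need a service account token at all. The sketch below is an illustration under stated assumptions, not part of the original research: it takes a pod spec and, optionally, its ServiceAccount object as Python dicts and reports whether a token would be mounted automatically. It relies on the standard automountServiceAccountToken field, where the pod-level setting takes precedence over the ServiceAccount-level one and the default is to mount. The function name is hypothetical.

```python
from typing import Optional


def mounts_api_token(pod_spec: dict, service_account: Optional[dict] = None) -> bool:
    """Return True if the pod would get a service account token automatically."""
    # The pod-level setting wins when it is present.
    pod_setting = pod_spec.get("automountServiceAccountToken")
    if pod_setting is not None:
        return bool(pod_setting)
    # Otherwise fall back to the ServiceAccount-level setting.
    sa_setting = (service_account or {}).get("automountServiceAccountToken")
    if sa_setting is not None:
        return bool(sa_setting)
    # With neither set, Kubernetes mounts a token by default.
    return True
```

Pods that return True here but never call the API are candidates for setting automountServiceAccountToken to false, which leaves fewer tokens on the node for an attacker to find.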

Anthos Service Mesh (ASM) Flaw

Imagine you are managing a Kubernetes cluster that utilizes Anthos Service Mesh (ASM) with Istio’s CNI plugin. The cluster hosts various applications critical to your organization.

Initial Setup

  • ASM Installation: During the setup of ASM, the Istio-cni-node DaemonSet is installed on the cluster.
  • DaemonSet’s Role: This DaemonSet is responsible for installing the Istio CNI plugin on each node. It also has a repair mode to handle misconfigured pods.

The Flaw

  • Excessive Permissions: After the installation, the Istio-cni-node DaemonSet retains high-level permissions, which are no longer necessary for its daily operation. This is where the flaw lies.

Exploitation Example

  1. Attacker’s Entry: An attacker, who already has limited access to the cluster (maybe as a low-privileged user), discovers the excessive permissions of the Istio-cni-node DaemonSet.
  2. Creating a Powerful Pod:
    • The attacker creates a new pod in the cluster, assigning it the same permissions as the Istio-cni-node DaemonSet. This is possible due to the excessive permissions that the DaemonSet still holds.
    • This new pod, which we can call a “powerful pod,” now has abilities far beyond what a regular pod should have.
  3. Misuse of Permissions:
    • The attacker uses the powerful pod to perform actions that are normally restricted, like accessing sensitive data or modifying critical configurations.
    • The pod could also manipulate other pods or services, disrupt operations, or even spread to other nodes, escalating the attack’s impact.
  4. Privilege Escalation:
    • Leveraging the capabilities of the powerful pod, the attacker escalates their privileges to that of a cluster administrator.
    • With admin-level access, they gain complete control over the Kubernetes cluster, leading to a severe security breach.
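
The scenario above depends on a component holding permission to create pods. As a rough defensive sketch, the function below checks whether a list of RBAC rules, in the shape used by the rules field of a Role or ClusterRole, allows creating pods. The helper names are hypothetical, and the sketch does not reproduce ASM's actual ClusterRole, which the article does not list. It also ignores indirect paths such as creating Deployments or DaemonSets, which produce pods as well.

```python
CORE_GROUP = ""  # pods live in the core ("") API group


def _matches(values, wanted):
    """True if an RBAC rule field lists the wanted value or the '*' wildcard."""
    return wanted in values or "*" in values


def can_create_pods(rules):
    """Return True if any rule in a Role/ClusterRole allows creating pods."""
    return any(
        _matches(rule.get("apiGroups", []), CORE_GROUP)
        and _matches(rule.get("resources", []), "pods")
        and _matches(rule.get("verbs", []), "create")
        for rule in rules
    )
```

Running a check like this across the roles bound to DaemonSet service accounts is one way to spot components that hold more than they need after installation.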

The Privilege Escalation Chain

The combination of these two issues can be exploited in a second-stage attack to gain full control of a Kubernetes cluster. The attack involves exploiting FluentBit permissions to read projected service account tokens and then leveraging ASM’s post-installation permissions to escalate privileges.

Let’s break down this attack chain to understand how an attacker could escalate privileges to become a cluster admin:

Step-by-Step Breakdown of the Attack Chain

1. Initial Access via FluentBit Container

  • Prerequisite: The attacker needs the Anthos Service Mesh feature to be enabled in the Kubernetes cluster.
  • Exploiting FluentBit: The attacker gains control of the FluentBit container. FluentBit, being a logging tool, often has extensive access within a cluster for log collection purposes.
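  • Reading the Sensitive Volume: FluentBit already mounts the /var/lib/kubelet/pods directory, so the attacker can read it from inside the compromised container. Each pod’s subdirectory contains a kube-api-access-<random-suffix> directory holding that pod’s projected service account token, which gives the attacker the tokens of every pod on the node.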

2. Token Harvesting Across the Cluster

  • Leveraging DaemonSet Nature of FluentBit: Since FluentBit runs as a DaemonSet (a pod on every node), the attacker replicates the initial compromise on each node.
  • Harvesting Tokens: By doing so, the attacker can read the mounted tokens of other pods on every node in the cluster.
  • Targeting Istio-Installer-container Token: Among these tokens, the attacker specifically looks for the Istio-Installer-container token.

3. Exploiting ASM CNI DaemonSet’s Permissions

  • Creating a New Pod: Utilizing the ASM CNI DaemonSet’s retained permissions, the attacker creates a new pod in the kube-system namespace.
  • Targeting a Powerful Service Account: The aim is to associate this pod with a service account that has extensive privileges.
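
Pod creation in kube-system by an identity that does not normally create pods is a useful detection signal for this step. The sketch below is an assumption-laden illustration: it expects audit records as JSON lines in the upstream Kubernetes audit event shape (verb, user.username, objectRef.resource, objectRef.namespace), and the allowlist of expected creators is something the operator would have to supply. GKE surfaces audit records through Cloud Logging in a different shape, so the field names would need adapting there. The function name is hypothetical.

```python
import json


def suspicious_pod_creations(audit_lines, allowed_users):
    """Return (username, pod name) for kube-system pod creations by unexpected users."""
    findings = []
    for line in audit_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        ref = event.get("objectRef") or {}
        is_pod_create = (
            event.get("verb") == "create"
            and ref.get("resource") == "pods"
            and ref.get("namespace") == "kube-system"
            # Skip subresources such as pods/exec, which are also "create".
            and not ref.get("subresource")
        )
        if not is_pod_create:
            continue
        user = (event.get("user") or {}).get("username", "")
        if user not in allowed_users:
            findings.append((user, ref.get("name", "")))
    return findings
```

Service accounts appear in audit records as system:serviceaccount:<namespace>:<name>, so an allowlist would typically contain the controllers that are expected to create pods in kube-system.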

4. Choosing the CRAC Service Account

  • Selecting CRAC: The ClusterRoleAggregationController (CRAC) service account is a prime target due to its ability to add permissions to cluster roles.
  • Updating Cluster Role: The attacker modifies the cluster role bound to the CRAC service account to gain full privileges.
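
Kubernetes normally stops an identity from granting permissions it does not already hold, unless that identity has the escalate or bind verb on RBAC role resources. A service account that can add permissions to cluster roles is therefore worth flagging. The sketch below, with hypothetical names, scans a list of RBAC rules for those verbs on roles and clusterroles in the rbac.authorization.k8s.io group. It is a simplified audit aid and does not model every way RBAC can be abused.

```python
RBAC_GROUP = "rbac.authorization.k8s.io"
ROLE_RESOURCES = {"roles", "clusterroles"}
ESCALATION_VERBS = {"escalate", "bind"}


def escalation_grants(rules):
    """Return sorted (resource, verb) pairs that permit RBAC escalation."""
    grants = set()
    for rule in rules:
        groups = set(rule.get("apiGroups", []))
        # Only rules covering the RBAC API group (or all groups) matter here.
        if not groups & {RBAC_GROUP, "*"}:
            continue
        resources = set(rule.get("resources", []))
        verbs = set(rule.get("verbs", []))
        # A '*' wildcard expands to every resource or verb of interest.
        hit_resources = ROLE_RESOURCES if "*" in resources else resources & ROLE_RESOURCES
        hit_verbs = ESCALATION_VERBS if "*" in verbs else verbs & ESCALATION_VERBS
        grants.update((r, v) for r in hit_resources for v in hit_verbs)
    return sorted(grants)
```

Any non-empty result marks a role whose token, if stolen, could be used to rewrite cluster roles in the way this step describes.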

5. Final Steps to Gain Cluster Admin Access

  • Mounting CRAC Token: The CRAC service account token is mounted onto the newly created pod.
  • Exploiting FluentBit Again: The attacker then exploits the FluentBit misconfiguration to extract the CRAC token from their pod.
  • Using CRAC Token: With the CRAC token, which carries cluster admin permissions once its cluster role has been updated, the attacker can operate with full control over the Kubernetes cluster.

Google’s Response and Fixes:

Google addressed these configuration issues on December 14, 2023, with the release of security bulletin GCP-2023-047. The fixes involved removing the /var/lib/kubelet/pods volume mount from the FluentBit pod and modifying ASM’s ClusterRole to remove excessive RBAC permissions.

Fixes and Mitigations Implemented

1. FluentBit Configuration Update

  • Issue: Initially, FluentBit had excessive access due to a hostPath volume mount of the /var/lib/kubelet/pods directory, which included access to sensitive service account tokens.
  • Fix: Google’s security team restricted FluentBit’s access, removing the unnecessary volume mount. This change ensures that FluentBit can only access the logs it requires for its operation, significantly reducing the risk of token compromise.

2. Anthos Service Mesh (ASM) Permissions Adjustment

  • Issue: ASM’s CNI DaemonSet had high privileges, as identified in an internal report.
  • Action Taken: Before the external report, Google was already working on reducing these permissions.
  • Fix: Google modified the ASM’s ClusterRole and restructured some functionalities to eliminate unnecessary RBAC permissions. This change addresses the excessive permissions that previously allowed for potential exploitation.

Impact of the Fixes

  • Security Hardening: These updates significantly enhance the security of both FluentBit and ASM within Kubernetes clusters, mitigating the specific vulnerabilities and strengthening the overall security posture against similar threats.
  • Preventing Privilege Escalation: By rectifying these issues, Google has effectively closed the attack vector that allowed for escalation to cluster admin privileges.
  • Proactive Vulnerability Management: Google’s response, especially their pre-emptive work on ASM’s permissions, highlights the importance of ongoing security assessments and proactive vulnerability management.

Broader Implications for Kubernetes Security

  • Continuous Monitoring and Auditing: Kubernetes environments should be continuously monitored and audited for misconfigurations and excessive permissions, especially for components with wide-ranging access like DaemonSets.
  • Principle of Least Privilege: This principle should be rigorously applied to all Kubernetes components, ensuring that each component has only the permissions necessary for its function.
  • Prompt Patching and Updates: Regularly updating Kubernetes and its associated components is crucial for maintaining security, as vulnerabilities can be discovered and exploited rapidly.

This discovery highlights the importance of vigilant security practices in cloud environments. Kubernetes, while powerful, can be vulnerable to sophisticated attacks due to misconfigurations and excessive privileges in system pods. The proactive response from Google and the detailed analysis by Palo Alto Networks underscore the ongoing need for robust security measures in cloud infrastructures.