Enhancing Kubernetes Scalability and Responsiveness with Pod Priority and Over-Provisioning — Terraform

NIRAV SHAH
4 min read · Sep 1, 2023

Introduction

As organizations increasingly adopt Kubernetes for container orchestration, ensuring efficient resource utilization and responsiveness becomes a critical consideration. In this blog, we’ll explore how to eliminate Kubernetes node scaling lag by leveraging pod priority and over-provisioning strategies. By implementing these techniques, you can enhance your cluster’s scalability, optimize resource allocation, and maintain consistent performance even during peak demand.

The Challenge of Node Scaling Lag

Kubernetes node scaling lag is a common issue that arises when clusters experience sudden spikes in workload demand. When new pods are scheduled, especially high-priority ones, it can take time for the cluster autoscaler to react and provision additional nodes. This lag can lead to performance degradation and delays in deploying critical workloads.

Pod Priority and Its Benefits

Pod priority is a Kubernetes feature that lets you assign priority classes to pods according to their importance. Higher-priority pods are preferred during scheduling and can preempt lower-priority pods when the cluster is full, ensuring that critical workloads are accommodated promptly. By leveraging pod priority, you can ensure that mission-critical applications receive the resources they need, reducing the risk of performance bottlenecks.
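As a quick illustration (the class name and value here are hypothetical, not part of the setup built later in this post), a priority class is defined once and then referenced by name from a pod spec. In Terraform that looks like:

```hcl
# Hypothetical example: a priority class for latency-sensitive services.
resource "kubernetes_priority_class" "critical" {
  metadata {
    name = "critical-services"
  }

  value          = 100000  # higher value = scheduled first, preempted last
  global_default = false
}

# Any pod template can then opt in by name:
#   spec {
#     priority_class_name = "critical-services"
#     ...
#   }
```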

Over-Provisioning for Rapid Scaling

Over-provisioning involves deliberately allocating more resources to your Kubernetes cluster than what is strictly required. This surplus capacity provides a buffer that absorbs sudden spikes in workload demand, eliminating the need for immediate node scaling. Over-provisioning, in combination with pod priority, creates a safety net that prevents performance degradation during peak periods.
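How much to over-provision is a sizing decision. One rule of thumb (the numbers below are purely illustrative) is to reserve enough placeholder capacity to absorb a typical spike for the time a new node takes to join the cluster:

```hcl
# Illustrative sizing only: reserve one spike's worth of CPU in placeholder pods.
locals {
  spike_cpu       = 4  # vCPUs a typical burst needs before new nodes are ready
  pause_pod_cpu   = 1  # CPU request of each placeholder pod
  pause_pod_count = ceil(local.spike_cpu / local.pause_pod_cpu)
}
```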

Implementing Pod Priority and Over-Provisioning

  1. Defining Priority Classes: Identify different priority classes based on the criticality of your workloads. Assign priority values and labels to your pods to indicate their importance.
  2. Taints and Tolerations: Taint nodes with varying degrees of priority to signal their suitability for different classes of pods. Pods can then be configured with tolerations to match the taints.
  3. Pod Disruption Budgets (PDBs): Implement PDBs to control the maximum number of evictions that can occur simultaneously. This prevents disruptions to critical applications.
  4. Resource Requests and Limits: Accurately define resource requests and limits for your pods. This information helps the scheduler allocate resources effectively.
  5. Monitoring and Tuning: Regularly monitor cluster performance and adjust priority class settings and over-provisioning levels based on real-time demands.
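Steps 1, 2, and 4 are wired up in the Terraform section below; step 3 can be sketched separately. A minimal example (the name and label values are hypothetical) of a PDB that keeps voluntary evictions bounded:

```hcl
# Hypothetical PDB: voluntary evictions may never take my-app below 80% capacity.
resource "kubernetes_pod_disruption_budget_v1" "my_app" {
  metadata {
    name = "my-app-pdb"
  }

  spec {
    min_available = "80%"
    selector {
      match_labels = {
        service = "my-app"
      }
    }
  }
}
```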

Implementation with Terraform

We will create three simple files: 1. prioritypod.tf, 2. variable.tf, 3. stage.tfvars. Together they create the priority classes and the low-priority placeholder pods.

#--------------------
# prioritypod.tf
#--------------------

# High-priority class for real service pods. Because global_default is true,
# any pod without an explicit priorityClassName receives this priority.
resource "kubernetes_priority_class" "service_priority" {
  count = var.kubernetes_enable_priority_pod ? 1 : 0

  metadata {
    name = "service-priority"
  }

  description    = "This priority class should be used for high priority service pods only."
  value          = 1000000
  global_default = true
}

# Negative-priority class for the placeholder (over-provisioning) pods, so the
# scheduler evicts them first whenever real workloads need capacity.
resource "kubernetes_priority_class" "dummy_service_priority" {
  count = var.kubernetes_enable_priority_pod ? 1 : 0

  metadata {
    name = "low-priority"
  }

  description    = "This priority class should be used for dummy service pods only."
  value          = -1
  global_default = false
}

# Deployments of "pause" pods that reserve capacity. They request real CPU and
# memory but do nothing, so evicting them is instant and side-effect free.
resource "kubernetes_deployment" "low_priority_pods" {
  for_each = var.kubernetes_enable_priority_pod ? var.priority_pods : {}

  metadata {
    name   = "${each.key}-low-priority-pods-deployment"
    labels = each.value.labels
  }

  spec {
    replicas = each.value.count

    selector {
      match_labels = each.value.labels
    }

    template {
      metadata {
        labels = each.value.labels
      }

      spec {
        priority_class_name             = "low-priority"
        node_selector                   = each.value.labels
        automount_service_account_token = false

        dynamic "toleration" {
          for_each = each.value.tolerations
          content {
            key      = toleration.value.key
            operator = toleration.value.operator
            value    = toleration.value.value
            effect   = toleration.value.effect
          }
        }

        container {
          image = "registry.k8s.io/pause"
          name  = "pause"

          resources {
            limits = {
              cpu    = each.value.cpu
              memory = each.value.memory
            }
            requests = {
              cpu    = each.value.cpu
              memory = each.value.memory
            }
          }
        }
      }
    }
  }
}

variable.tf:

variable "kubernetes_enable_priority_pod" {
  description = "Enable priority pods on Kubernetes environments"
  type        = bool
  default     = false
}

variable "priority_pods" {
  description = "Map of priority pods with their sizing, labels, and tolerations"
  type        = any
  default     = {}
}

Sample stage.tfvars (applied with terraform apply -var-file=stage.tfvars):

kubernetes_enable_priority_pod = true

priority_pods = {
  "my-app" = {
    cpu    = "1"
    memory = "4Gi"
    count  = 5
    labels = {
      service = "my-app"
    }
    tolerations = [
      {
        key      = "service"
        operator = "Equal"
        value    = "my-app"
        effect   = "NoSchedule"
      }
    ]
  }
}

Real-World Benefits

  1. Reduced Scaling Lag: By prioritizing pods and having surplus resources available through over-provisioning, you can significantly reduce node scaling lag during workload spikes.
  2. Improved Responsiveness: High-priority pods are immediately scheduled and granted resources, ensuring that critical applications remain responsive at all times.
  3. Enhanced Resource Utilization: Over-provisioning allows for efficient resource usage by accommodating temporary load surges without the need for immediate node scaling.
  4. Consistent Performance: Through a combination of pod priority and over-provisioning, you can maintain consistent performance levels even during peak traffic periods.

Considerations and Best Practices

  1. Balancing Priority Levels: Strike a balance between different priority classes to ensure that both critical and less critical workloads are adequately addressed.
  2. Regular Review: Continuously assess the performance of your priority-based scheduling strategy and over-provisioning levels to adapt to changing demands.
  3. Testing and Simulation: Simulate load spikes and evaluate how well your cluster responds. Use testing to fine-tune your priority and over-provisioning settings.

Conclusion

The combination of pod priority and over-provisioning offers a powerful approach to eliminating Kubernetes node scaling lag and ensuring the optimal use of cluster resources. By strategically assigning priority classes, implementing tolerations, and embracing over-provisioning, you can create a more resilient and responsive environment that meets the needs of your critical workloads. As organizations strive for efficient and consistent performance in their Kubernetes clusters, these strategies stand as valuable tools in achieving that goal.
