EKSCluster Custom Resource

An EKSCluster provisions an EKS cluster with dedicated node groups for GPU inference and system workloads. It outputs a Secret containing the cluster kubeconfig, which consumers use to target the cluster. The kubeconfig embeds a static bearer token that ClusterAuth refreshes every 10 minutes using the AWS provider’s credentials.

#Metadata

API version: infrastructure.modelplane.ai/v1alpha1
Kind: EKSCluster
Scope: Namespaced
Short names: eks

#Example

Manifest

apiVersion: infrastructure.modelplane.ai/v1alpha1
kind: EKSCluster
metadata:
  name: inference-west
  namespace: platform
spec:
  region: us-west-2
  kubernetesVersion: "1.36"
  nodePools:
    - name: system
      role: System
      instanceType: m6i.xlarge
      nodeCount: 2
    - name: gpu-a10g
      role: GPU
      instanceType: g5.12xlarge
      diskSizeGb: 200
      nodeCount: 0
      maxNodeCount: 8
      zones: [us-west-2a, us-west-2b]
      gpu:
        acceleratorType: nvidia-a10g

#Spec

EKSClusterSpec defines the desired state of EKSCluster.

EKS cluster Kubernetes version. Must be a version EKS currently supports. Defaults to a version where Dynamic Resource Allocation (how GPUs bind to pods) is generally available.

VPC networking configuration. Defaults give a /16 VPC carved into three /20 subnets, one per Availability Zone. Override when VPC-peering multiple clusters to avoid CIDR collisions.

Primary CIDR block for the VPC.
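
As a rough illustration of the default layout described above, the following sketch carves a /16 into /20 subnets with Python's standard ipaddress module and checks a candidate CIDR for collisions before peering. The 10.0.0.0/16 block, the zone names, and the choice of the first three /20 blocks are assumptions for illustration; this page does not state the actual default CIDR or which blocks Modelplane selects.

```python
import ipaddress

# Assumed example values; the real default CIDR is not documented on this page.
vpc_cidr = ipaddress.ip_network("10.0.0.0/16")
zones = ["us-west-2a", "us-west-2b", "us-west-2c"]

# A /16 contains sixteen /20 blocks. Taking the first three, one per zone,
# is an assumption about the layout, not confirmed Modelplane behavior.
subnets = dict(zip(zones, vpc_cidr.subnets(new_prefix=20)))


def overlaps_any(candidate: str, existing: list[str]) -> bool:
    """Return True if candidate overlaps any CIDR in existing."""
    net = ipaddress.ip_network(candidate)
    return any(net.overlaps(ipaddress.ip_network(e)) for e in existing)


if __name__ == "__main__":
    for zone, subnet in subnets.items():
        print(zone, subnet)
    # A second cluster reusing the same /16 would collide when peered.
    print(overlaps_any("10.0.0.0/16", ["10.0.0.0/16"]))
```

Running a check like overlaps_any against the CIDRs of clusters you plan to peer is one way to pick a non-colliding override.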

Capacity Block reservation backing this node group. Large GPU instances (e.g. p5en.48xlarge) are rarely available on demand; AWS allocates them via Capacity Blocks for ML. Set this to back the node group with a Capacity Block you have purchased. When set, Modelplane composes a launch template targeting the reservation and creates the node group with CAPACITY_BLOCK capacity type. The node group’s zones must match the reservation’s Availability Zone, and nodeCount must not exceed the reserved instance count. Omit for on-demand node groups.
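
The two constraints above (zones matching the reservation's Availability Zone, and nodeCount not exceeding the reserved instance count) can be checked before applying a manifest. The sketch below is a hypothetical pre-flight helper, not part of Modelplane; the Reservation type and function name are invented for illustration, and it reads "zones must match" as "zones contains exactly the reservation's Availability Zone", which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    """Hypothetical record of what you know about a purchased Capacity Block."""
    availability_zone: str
    instance_count: int


def check_capacity_block(zones, node_count, reservation):
    """Return a list of human-readable violations; empty means both checks pass."""
    problems = []
    # Assumed interpretation: the pool must be pinned to the reservation's single AZ.
    if set(zones) != {reservation.availability_zone}:
        problems.append(
            f"zones {sorted(zones)} do not match {reservation.availability_zone}"
        )
    if node_count > reservation.instance_count:
        problems.append(
            f"nodeCount {node_count} exceeds reserved count {reservation.instance_count}"
        )
    return problems


if __name__ == "__main__":
    block = Reservation(availability_zone="us-west-2a", instance_count=4)
    print(check_capacity_block(["us-west-2a"], 2, block))
    print(check_capacity_block(["us-west-2a", "us-west-2b"], 8, block))
```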

The ID of the Capacity Reservation backing the Capacity Block (e.g. cr-0123456789abcdef0). Purchasing a Capacity Block yields this ID.

pattern: ^cr-[0-9a-f]+$
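
For a quick local check of an ID against the pattern above before submitting a manifest, a minimal sketch follows. The function name is hypothetical, and the API server remains the authoritative validator.

```python
import re

# Same pattern as the field's schema validation shown above.
CAPACITY_RESERVATION_ID = re.compile(r"^cr-[0-9a-f]+$")


def is_valid_reservation_id(value: str) -> bool:
    """Return True if value is 'cr-' followed by one or more lowercase hex digits."""
    # fullmatch also rejects a trailing newline, which a bare '$' would tolerate.
    return CAPACITY_RESERVATION_ID.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["cr-0123456789abcdef0", "cr-0123ABC", "cr-"]:
        print(candidate, is_valid_reservation_id(candidate))
```

Note that the pattern accepts only lowercase hexadecimal digits, so an ID pasted in uppercase would be rejected.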

Root volume size in GB.

High-performance node-to-node fabric for multi-node engines. None uses standard VPC networking (ENA/TCP). EFA attaches Elastic Fabric Adapter interfaces to each node via the launch template and an all-self-traffic security group, for GPUDirect RDMA across nodes. EFA is only useful on EFA-capable instance types (e.g. p5en.48xlarge) and needs the EFA DRA driver on the cluster, which Modelplane installs when any pool sets EFA.

GPU configuration. Required when role is GPU.

GPU accelerator type (e.g. nvidia-a10g, nvidia-h100, nvidia-l4). Used to label GPU nodes; the actual GPU and count are determined by the instance type.

EC2 instance type (e.g. m6i.large, g6.xlarge, p4d.24xlarge).

Maximum number of nodes for autoscaling.

Minimum number of nodes for autoscaling. Set to 1 or higher for groups that must always be available.

Unique name for this node group. Used as a suffix in the EKS NodeGroup resource name.

Initial number of nodes.

Determines what workloads this group runs. System groups host controllers, gateways, and infrastructure. GPU groups host inference workloads and use a GPU-enabled AMI.

AWS region for the cluster (e.g. us-west-2, eu-west-1).

#Status

Observed ModelCache RWX storage state.

Name of the Modelplane-managed ReadWriteMany StorageClass composed on this cluster for ModelCache PVCs. ModelCache reads this to target the cache PVC.

Key within the Secret that holds the credential data.

Name of the Secret.

The type of credential this secret contains. Kubeconfig contains a kubeconfig file with the cluster endpoint, CA certificate, and a static bearer token that ClusterAuth refreshes every 10 minutes using the AWS provider’s credentials.
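
A consumer locates the kubeconfig through the Secret name and key reported in status. As a minimal sketch using only the Python standard library, the helper below decodes one key from a Secret serialized as JSON (for example, the output of kubectl get secret -o json). The Secret name inference-west-kubeconfig and the key kubeconfig are hypothetical placeholders; use the values your cluster's status reports.

```python
import base64
import json


def extract_credential(secret_json: str, key: str) -> str:
    """Decode one entry of a Secret's base64-encoded data field."""
    secret = json.loads(secret_json)
    return base64.b64decode(secret["data"][key]).decode("utf-8")


if __name__ == "__main__":
    # Fabricated sample Secret for illustration only; not real cluster output.
    sample = json.dumps({
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "inference-west-kubeconfig", "namespace": "platform"},
        "data": {
            "kubeconfig": base64.b64encode(b"apiVersion: v1\nkind: Config\n").decode("ascii"),
        },
    })
    print(extract_credential(sample, "kubeconfig"))
```

Because the embedded token is refreshed every 10 minutes, a long-running consumer would likely want to re-read the Secret periodically rather than cache the decoded kubeconfig indefinitely.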