Modelplane docs

EKSCluster Custom Resource

An EKSCluster provisions an EKS cluster with dedicated node groups for GPU inference and system workloads. It outputs a Secret containing the cluster kubeconfig, which consumers use to target the cluster. The kubeconfig embeds a static bearer token that ClusterAuth refreshes every 10 minutes using the AWS provider’s credentials.

#Metadata

API version: infrastructure.modelplane.ai/v1alpha1
Kind: EKSCluster
Scope: Namespaced
Short names: eks

#Example

Manifest
apiVersion: infrastructure.modelplane.ai/v1alpha1
kind: EKSCluster
metadata:
  name: inference-west
  namespace: platform
spec:
  region: us-west-2
  kubernetesVersion: "1.36"
  nodePools:
    - name: system
      role: System
      instanceType: m6i.xlarge
      nodeCount: 2
    - name: gpu-a10g
      role: GPU
      instanceType: g5.12xlarge
      diskSizeGb: 200
      nodeCount: 0
      maxNodeCount: 8
      zones: [us-west-2a, us-west-2b]
      gpu:
        acceleratorType: nvidia-a10g

#Spec

EKSClusterSpec defines the desired state of EKSCluster.

# kubernetesVersion optional string 1–16 chars default: 1.36

EKS cluster Kubernetes version. Must be a version EKS currently supports. Defaults to a version where Dynamic Resource Allocation (how GPUs bind to pods) is generally available.

# networking optional object

VPC networking configuration. Defaults give a /16 VPC carved into three /20 subnets, one per Availability Zone. Override when VPC-peering multiple clusters to avoid CIDR collisions.

# subnetCidrs optional string[] 2–6 items ≤ 18 chars default: [10.0.0.0/20, 10.0.16.0/20, 10.0.32.0/20]
# vpcCidr optional string ≤ 18 chars default: 10.0.0.0/16

Primary CIDR block for the VPC.
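
As a rough sketch of the override described above, a cluster that will be peered with another cluster using the defaults could move to a non-overlapping range. The 10.1.0.0/16 range and its three /20 subnets are illustrative values chosen for this example, not defaults; other required spec fields are omitted.

```yaml
spec:
  networking:
    # Illustrative range that does not overlap the default 10.0.0.0/16
    vpcCidr: 10.1.0.0/16
    # Three /20 blocks inside the VPC range, mirroring the default layout
    subnetCidrs:
      - 10.1.0.0/20
      - 10.1.16.0/20
      - 10.1.32.0/20
```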

# nodePools required object[] 1–8 items
# capacityBlock optional object

Capacity Block reservation backing this node group. Large GPU instances (e.g. p5en.48xlarge) are rarely available on demand; AWS allocates them via Capacity Blocks for ML. Set this to back the node group with a Capacity Block you have purchased. When set, Modelplane composes a launch template targeting the reservation and creates the node group with CAPACITY_BLOCK capacity type. The node group’s zones must match the reservation’s Availability Zone, and nodeCount must not exceed the reserved instance count. Omit for on-demand node groups.

# capacityReservationId required string 4–64 chars

The ID of the Capacity Reservation backing the Capacity Block (e.g. cr-0123456789abcdef0). Purchasing a Capacity Block yields this ID.

pattern: ^cr-[0-9a-f]+$
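
A node pool backed by a Capacity Block might look like the fragment below. This is a sketch under stated assumptions: the pool name is hypothetical, the reservation ID is the placeholder from the description above, and the reservation is assumed to live in us-west-2a and to cover at least two instances.

```yaml
spec:
  nodePools:
    - name: gpu-h100-reserved        # hypothetical pool name
      role: GPU
      instanceType: p5.48xlarge
      # Must not exceed the reserved instance count (assumed to be at least 2)
      nodeCount: 2
      # Must match the reservation's Availability Zone (assumed to be us-west-2a)
      zones: [us-west-2a]
      capacityBlock:
        # Placeholder ID in the documented cr-<hex> format
        capacityReservationId: cr-0123456789abcdef0
      gpu:
        acceleratorType: nvidia-h100
```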

# diskSizeGb optional integer 10–65536 default: 100

Root volume size in GB.

# fabric optional enum: None | EFA default: None

High-performance node-to-node fabric for multi-node engines. None uses standard VPC networking (ENA/TCP). EFA attaches Elastic Fabric Adapter interfaces to each node through the launch template and adds a security group that allows all traffic between its own members, enabling GPUDirect RDMA across nodes. EFA only helps on EFA-capable instance types (e.g. p5en.48xlarge) and requires the EFA DRA driver on the cluster, which Modelplane installs when any pool sets fabric to EFA.
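
For illustration, a multi-node GPU pool that opts into EFA could be declared roughly as follows. The pool name and node count are hypothetical; p5.48xlarge is used here as another EFA-capable instance type.

```yaml
spec:
  nodePools:
    - name: gpu-h100-efa             # hypothetical pool name
      role: GPU
      # An EFA-capable instance type; EFA has no benefit on types without it
      instanceType: p5.48xlarge
      # Omit this field (or set None) for standard VPC networking
      fabric: EFA
      nodeCount: 2
      gpu:
        acceleratorType: nvidia-h100
```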

# gpu optional object

GPU configuration. Required when role is GPU.

# acceleratorType required string 1–63 chars

GPU accelerator type (e.g. nvidia-a10g, nvidia-h100, nvidia-l4). Used to label GPU nodes; the actual GPU and count are determined by the instance type.

# instanceType required string 1–63 chars

EC2 instance type (e.g. m6i.large, g6.xlarge, p4d.24xlarge).

# maxNodeCount optional integer ≤ 1000 default: 8

Maximum number of nodes for autoscaling.

# minNodeCount optional integer ≤ 1000

Minimum number of nodes for autoscaling. Set to 1 or higher for groups that must always be available.

# name required string 1–40 chars

Unique name for this node group. Used as a suffix in the EKS NodeGroup resource name.

# nodeCount optional integer ≤ 1000 default: 1

Initial number of nodes.
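
The three sizing fields can be combined as in the sketch below. The values are illustrative, and keeping nodeCount between minNodeCount and maxNodeCount is an assumption of this example rather than a documented constraint.

```yaml
spec:
  nodePools:
    - name: gpu-l4                   # hypothetical pool name
      role: GPU
      instanceType: g6.xlarge
      minNodeCount: 1                # keep at least one node available
      nodeCount: 2                   # initial size
      maxNodeCount: 6                # autoscaling ceiling
      gpu:
        acceleratorType: nvidia-l4
```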

# role required enum: System | GPU

Determines what workloads this group runs. System groups host controllers, gateways, and infrastructure. GPU groups host inference workloads and use a GPU-enabled AMI.

# zones optional string[] 1–8 items 1–63 chars
# region required string 1–32 chars

AWS region for the cluster (e.g. us-west-2, eu-west-1).

#Status

# cache optional object

Observed ModelCache RWX storage state.

# storageClassName optional string ≤ 253 chars

Name of the Modelplane-managed ReadWriteMany StorageClass composed on this cluster for ModelCache PVCs. ModelCache reads this to target the cache PVC.

# secrets optional object[]
# key required string ≤ 253 chars

Key within the Secret that holds the credential data.

# name required string ≤ 253 chars

Name of the Secret.

# type required enum: Kubeconfig

The type of credential this secret contains. Kubeconfig contains a kubeconfig file with the cluster endpoint, CA certificate, and a static bearer token that ClusterAuth refreshes every 10 minutes using the AWS provider’s credentials.
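
Put together, the status of a provisioned cluster might resemble the fragment below. Only the field names and the Kubeconfig type come from the reference above; the StorageClass name, Secret name, and key are hypothetical values for illustration.

```yaml
status:
  cache:
    # Hypothetical name of the ReadWriteMany StorageClass
    storageClassName: modelplane-cache-rwx
  secrets:
    - name: inference-west-kubeconfig   # hypothetical Secret name
      key: kubeconfig                   # hypothetical key within the Secret
      type: Kubeconfig
```

A consumer would read the Secret named in name and use the data under key as the kubeconfig for the cluster.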