EKSCluster Custom Resource
An EKSCluster provisions an Amazon EKS cluster with dedicated node groups for GPU inference and system workloads. It outputs a Secret containing the cluster kubeconfig, which consumers use to target the cluster. The kubeconfig embeds a static bearer token that ClusterAuth refreshes every 10 minutes using the AWS provider's credentials.
#Metadata
#Example
Manifest
apiVersion: infrastructure.modelplane.ai/v1alpha1
kind: EKSCluster
metadata:
  name: inference-west
  namespace: platform
spec:
  region: us-west-2
  kubernetesVersion: "1.36"
  nodePools:
    - name: system
      role: System
      instanceType: m6i.xlarge
      nodeCount: 2
    - name: gpu-a10g
      role: GPU
      instanceType: g5.12xlarge
      diskSizeGb: 200
      nodeCount: 0
      maxNodeCount: 8
      zones: [us-west-2a, us-west-2b]
      gpu:
        acceleratorType: nvidia-a10g
#Spec
EKSClusterSpec defines the desired state of EKSCluster.
EKS cluster Kubernetes version. Must be a version EKS currently supports. Defaults to a version where Dynamic Resource Allocation (how GPUs bind to pods) is generally available.
VPC networking configuration. Defaults give a /16 VPC carved into three /20 subnets, one per Availability Zone. Override when VPC-peering multiple clusters to avoid CIDR collisions.
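To make the default layout concrete, the sketch below uses Python's standard `ipaddress` module to carve a /16 into /20 blocks, take one per zone, and check a candidate CIDR for collisions before peering. The `10.0.0.0/16` block, the zone names, and the `overlaps_any` helper are assumptions for illustration; this page does not state the actual default CIDR or which /20 blocks Modelplane selects.

```python
import ipaddress
from itertools import islice

# Assumed VPC CIDR for illustration; the real default is not stated on this page.
vpc = ipaddress.ip_network("10.0.0.0/16")
# Assumed Availability Zones for illustration.
zones = ["us-west-2a", "us-west-2b", "us-west-2c"]

# A /16 contains sixteen /20 blocks; take the first three, one per zone.
subnets = dict(zip(zones, islice(vpc.subnets(new_prefix=20), len(zones))))


def overlaps_any(candidate: str, existing: list[str]) -> bool:
    """Return True if the candidate CIDR collides with any existing CIDR."""
    net = ipaddress.ip_network(candidate)
    return any(net.overlaps(ipaddress.ip_network(e)) for e in existing)


for zone, net in subnets.items():
    print(zone, net, net.num_addresses)
```

Under these assumptions each zone gets 4096 addresses, and thirteen /20 blocks of the /16 stay unallocated. When peering, a second cluster on `10.1.0.0/16` passes `overlaps_any("10.1.0.0/16", ["10.0.0.0/16"])`, while reusing `10.0.0.0/16` does not.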
Capacity Block reservation backing this node group. Large GPU instances (e.g. p5en.48xlarge) are rarely available on demand; AWS allocates them via Capacity Blocks for ML. Set this to back the node group with a Capacity Block you have purchased. When set, Modelplane composes a launch template targeting the reservation and creates the node group with CAPACITY_BLOCK capacity type. The node group’s zones must match the reservation’s Availability Zone, and nodeCount must not exceed the reserved instance count. Omit for on-demand node groups.
The ID of the Capacity Reservation backing the Capacity Block (e.g. cr-0123456789abcdef0). Purchasing a Capacity Block yields this ID.
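The two constraints on a Capacity Block backed node group (matching zone, node count within the reservation) can be checked before applying a manifest. The following is a minimal sketch, not Modelplane's validation logic: the `CapacityBlock` and `NodePool` classes, their field names, and `check_capacity_block` are hypothetical, and "zones must match" is interpreted here as the pool listing exactly the reservation's one Availability Zone.

```python
from dataclasses import dataclass


@dataclass
class CapacityBlock:
    # Hypothetical model of a purchased reservation; not Modelplane's schema.
    reservation_id: str
    availability_zone: str
    instance_count: int


@dataclass
class NodePool:
    # Hypothetical subset of a node pool's fields.
    name: str
    zones: list[str]
    node_count: int


def check_capacity_block(pool: NodePool, block: CapacityBlock) -> list[str]:
    """Return a message for each documented constraint the pool violates."""
    problems = []
    # Assumed reading: the pool must list exactly the reservation's zone.
    if set(pool.zones) != {block.availability_zone}:
        problems.append(
            f"{pool.name}: zones {pool.zones} do not match "
            f"reservation zone {block.availability_zone}"
        )
    # nodeCount must not exceed the reserved instance count.
    if pool.node_count > block.instance_count:
        problems.append(
            f"{pool.name}: nodeCount {pool.node_count} exceeds "
            f"reserved count {block.instance_count}"
        )
    return problems


block = CapacityBlock("cr-0123456789abcdef0", "us-west-2a", 4)
pool = NodePool(name="gpu-large", zones=["us-west-2a", "us-west-2b"], node_count=5)
for message in check_capacity_block(pool, block):
    print(message)
```

The example pool fails both checks: it spans two zones and asks for five nodes against a reservation of four.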
Root volume size in GB.
High-performance node-to-node fabric for multi-node engines. None uses standard VPC networking (ENA/TCP). EFA attaches Elastic Fabric Adapter interfaces to each node via the launch template and an all-self-traffic security group, for GPUDirect RDMA across nodes. EFA is only useful on EFA-capable instance types (e.g. p5en.48xlarge) and needs the EFA DRA driver on the cluster, which Modelplane installs when any pool sets EFA.
GPU configuration. Required when role is GPU.
GPU accelerator type (e.g. nvidia-a10g, nvidia-h100, nvidia-l4). Used to label GPU nodes; the actual GPU and count are determined by the instance type.
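Because the accelerator type is only a label, nothing in the description above prevents it from disagreeing with the hardware the instance type actually provides. The sketch below shows one way to catch that before applying a manifest. The lookup table is a small hand-written sample of EC2 GPU instance types, and `label_matches` is a hypothetical helper, not part of Modelplane; a real check would need a complete, maintained table.

```python
from typing import Optional

# Hand-written sample of EC2 instance types: (accelerator label, GPU count).
# Illustrative only; extend or replace with an authoritative source.
INSTANCE_GPUS = {
    "g5.12xlarge": ("nvidia-a10g", 4),
    "g6.xlarge": ("nvidia-l4", 1),
    "g6.12xlarge": ("nvidia-l4", 4),
    "p5.48xlarge": ("nvidia-h100", 8),
}


def label_matches(instance_type: str, accelerator_type: str) -> Optional[bool]:
    """Compare a label against the sample table.

    Returns None when the instance type is not in the table, so callers
    can tell "unknown" apart from "mismatch".
    """
    entry = INSTANCE_GPUS.get(instance_type)
    if entry is None:
        return None
    return entry[0] == accelerator_type


print(label_matches("g5.12xlarge", "nvidia-a10g"))
print(label_matches("g6.12xlarge", "nvidia-a10g"))
```

The second call reports a mismatch because G6 instances carry L4 GPUs, while A10G GPUs ship in the G5 family.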
EC2 instance type (e.g. m6i.large, g6.xlarge, p4d.24xlarge).
Maximum number of nodes for autoscaling.
Minimum number of nodes for autoscaling. Set to 1 or higher for groups that must always be available.
Unique name for this node group. Used as a suffix in the EKS NodeGroup resource name.
Initial number of nodes.
Determines what workloads this group runs. System groups host controllers, gateways, and infrastructure. GPU groups host inference workloads and use a GPU-enabled AMI.
AWS region for the cluster (e.g. us-west-2, eu-west-1).
#Status
Observed ModelCache RWX storage state.
Name of the Modelplane-managed ReadWriteMany StorageClass composed on this cluster for ModelCache PVCs. ModelCache reads this to target the cache PVC.
Key within the Secret that holds the credential data.
Name of the Secret.
The type of credential this secret contains. Kubeconfig contains a kubeconfig file with the cluster endpoint, CA certificate, and a static bearer token that ClusterAuth refreshes every 10 minutes using the AWS provider’s credentials.
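A consumer needs the Secret name and key from the status to find the kubeconfig. The sketch below shows only the decoding step, using the fact that Kubernetes stores Secret `data` values base64-encoded. The Secret is a stand-in dictionary rather than a live API response, and the Secret name `inference-west-kubeconfig`, the key `kubeconfig`, and the helper `extract_credential` are hypothetical.

```python
import base64


def extract_credential(secret: dict, key: str) -> str:
    """Decode one entry of a Secret's `data` map (values are base64-encoded)."""
    try:
        encoded = secret["data"][key]
    except KeyError:
        raise KeyError(f"Secret has no data key {key!r}") from None
    return base64.b64decode(encoded).decode("utf-8")


# Stand-in for a Secret fetched from the API server; name and key are hypothetical.
sample_kubeconfig = "apiVersion: v1\nkind: Config\n"
secret = {
    "metadata": {"name": "inference-west-kubeconfig", "namespace": "platform"},
    "data": {"kubeconfig": base64.b64encode(sample_kubeconfig.encode()).decode()},
}

kubeconfig_text = extract_credential(secret, "kubeconfig")
print(kubeconfig_text)
```

Since the embedded token is refreshed every 10 minutes, a long-running consumer would presumably need to re-read the Secret rather than cache the decoded kubeconfig indefinitely.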