InferenceCluster Custom Resource

A Kubernetes cluster registered with Modelplane for model serving.

Concept guide: Register a Cluster →

#Metadata

API version: modelplane.ai/v1alpha1
Kind: InferenceCluster
Scope: Cluster
Short names: ic

#Example

Manifest
apiVersion: modelplane.ai/v1alpha1
kind: InferenceCluster
metadata:
  name: west-gke
spec:
  cluster:
    source: GKE
    gke:
      project: my-gcp-project
      region: us-central1
  nodePools:
    - name: h100-pool
      className: h100-8x-byo
      nodeCount: 2
      maxNodeCount: 10
      zones: [us-central1-a]

#Spec

# cluster required object
# eks optional object

EKS cluster configuration. Required when source is EKS.

# kubernetesVersion optional string default: 1.36

Kubernetes version for the EKS cluster. The default is a version in which Dynamic Resource Allocation (DRA, the mechanism that binds GPUs to pods) is generally available.

# region required string 1–32 chars

AWS region for the cluster (e.g. us-west-2).
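
As a minimal sketch of the eks block, the manifest below registers an EKS-backed cluster. The resource name west-eks is a placeholder rather than a value from this reference, and kubernetesVersion is left out so that the documented default applies.

```yaml
apiVersion: modelplane.ai/v1alpha1
kind: InferenceCluster
metadata:
  name: west-eks            # placeholder name
spec:
  cluster:
    source: EKS             # makes the eks block required
    eks:
      region: us-west-2     # required, 1 to 32 chars
      # kubernetesVersion omitted: the documented default applies
```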

# existing optional object

Bring-your-own cluster configuration. Required when source is Existing. Modelplane manages the inference stack on the cluster but does not provision the cluster itself.

# cache optional object

ModelCache configuration for this cluster.

# storageClassName optional string 1–253 chars

Name of an existing ReadWriteMany StorageClass for ModelCache PVCs. Modelplane does not provision storage on an existing cluster, so a cluster administrator must create this StorageClass beforehand, and it must support dynamic provisioning of ReadWriteMany volumes.

# identitySecretRef optional object

Optional reference to a Secret containing cloud provider credentials for IAM-based authentication.

# key optional string 1–253 chars default: private_key
# name required string 1–253 chars
# secretRef required object

Reference to a Secret containing a kubeconfig for the existing cluster. The Secret must exist in the modelplane-system namespace.

# key optional string 1–253 chars default: kubeconfig
# name required string 1–253 chars
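
A sketch of registering an existing cluster, assuming a kubeconfig for it is already at hand. The names byo-kubeconfig, byo-cluster, and shared-rwx are hypothetical. The Secret is placed in modelplane-system and uses the default kubeconfig key, as the secretRef description requires.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: byo-kubeconfig            # hypothetical name
  namespace: modelplane-system    # required namespace for this Secret
stringData:
  kubeconfig: |                   # matches the default secretRef.key
    # paste the kubeconfig for the existing cluster here
---
apiVersion: modelplane.ai/v1alpha1
kind: InferenceCluster
metadata:
  name: byo-cluster               # hypothetical name
spec:
  cluster:
    source: Existing              # makes the existing block required
    existing:
      secretRef:
        name: byo-kubeconfig      # key defaults to kubeconfig
      cache:
        storageClassName: shared-rwx   # hypothetical; must already exist and support ReadWriteMany
```

The identitySecretRef block is omitted here because it is optional; when used, it takes the same name and key shape, with key defaulting to private_key.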
# gke optional object

GKE cluster configuration. Required when source is GKE.

# kubernetesVersion optional string default: 1.35
# project required string 6–30 chars
# region required string 1–32 chars
# source required enum: GKE | EKS | Existing

Cluster provisioning method.

# nodePools optional object[] 1–8 items
# capacityBlock optional object

Capacity Block reservation backing this node pool. EKS only. Large GPU instances (e.g. p5en.48xlarge) are rarely available on demand; AWS allocates them via Capacity Blocks for ML. Set this to back the pool with a Capacity Block you have purchased. The pool’s zones must match the reservation’s Availability Zone, and nodeCount must not exceed the reserved instance count. Omit for on-demand pools.

# capacityReservationId required string 4–64 chars

The ID of the Capacity Reservation backing the Capacity Block (e.g. cr-0123456789abcdef0). Purchasing a Capacity Block yields this ID.

pattern: ^cr-[0-9a-f]+$
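
The constraints above can be sketched as a single pool entry. This fragment belongs under spec of an InferenceCluster whose source is EKS. The pool name, class name, zone, and the assumption that the reservation covers at least two instances are illustrative, and the reservation ID is the example ID from this page, not a real reservation.

```yaml
nodePools:
  - name: p5en-block                  # hypothetical pool name
    className: p5en-48x               # hypothetical InferenceClass name
    nodeCount: 2                      # must not exceed the reserved instance count
    zones: [us-west-2b]               # assumed to be the reservation's Availability Zone
    capacityBlock:
      capacityReservationId: cr-0123456789abcdef0   # matches ^cr-[0-9a-f]+$
```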

# className required string 1–253 chars

Name of the InferenceClass describing this pool’s hardware.

# fabric optional enum: None | EFA default: None

High-performance node-to-node fabric for multi-node engines. None uses standard VPC networking (ENA/TCP). EFA attaches Elastic Fabric Adapter interfaces to each node for GPUDirect RDMA across nodes, so a gang’s tensor-parallel traffic isn’t capped by TCP. EKS only. Only useful on EFA-capable instance types (e.g. p5en.48xlarge). When any pool sets EFA, Modelplane installs the EFA DRA driver on the cluster and the gang’s pods claim EFA devices alongside their GPUs.
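
A minimal sketch of an EFA-enabled pool, written as a fragment under spec of an EKS-backed InferenceCluster. The pool and class names are hypothetical, and the class is assumed to describe an EFA-capable instance type such as p5en.48xlarge.

```yaml
nodePools:
  - name: efa-pool            # hypothetical pool name
    className: p5en-48x       # hypothetical InferenceClass, assumed EFA-capable
    nodeCount: 2              # fixed-size pool: maxNodeCount omitted
    fabric: EFA               # EKS only; the default is None
```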

# maxNodeCount optional integer ≥ 1

Maximum node count for autoscaling. Omit for fixed-size pools.

# minNodeCount optional integer
# name required string ≤ 40 chars
# nodeCount optional integer default: 1
# zones optional string[]
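
To contrast the sizing fields, the fragment below shows one fixed-size pool and one autoscaled pool under spec. Pool names are hypothetical. This reference does not state how minNodeCount and nodeCount interact, so treating nodeCount as a starting size between the two bounds is an assumption.

```yaml
nodePools:
  - name: fixed-pool          # hypothetical
    className: h100-8x-byo    # class name taken from the example manifest
    nodeCount: 4              # maxNodeCount omitted, so the pool is fixed-size
  - name: elastic-pool        # hypothetical
    className: h100-8x-byo
    minNodeCount: 1           # assumed lower bound for autoscaling
    nodeCount: 2              # assumed starting size
    maxNodeCount: 10          # upper bound for autoscaling
    zones: [us-central1-a]
```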

#Status

# cache optional object

Observed state of the ReadWriteMany (RWX) storage used by ModelCache.

# storageClassName optional string ≤ 253 chars

Effective ReadWriteMany StorageClass name for ModelCache PVCs on this cluster. ModelCache reads this to target the cache PVC.

# gateway optional object
# address optional string

External IP of the inference gateway on the remote cluster. Used by ModelDeployment for unified endpoint routing.

# gpuPools optional object[] ≤ 8 items
# devices optional object[] ≤ 16 items
# attributes optional map[string]object
# capacity optional map[string]object
# claim optional enum: DRA | Synthetic
# count optional integer
# deviceClassName optional string ≤ 253 chars
# driver required string ≤ 253 chars
# name required string ≤ 63 chars
# name required string

Node pool name, matching spec.nodePools[].name. Used to pin a ModelReplica to a specific pool via spec.nodePoolName.

# nodes optional integer

Number of nodes in this pool. Derived from spec.nodePools[].maxNodeCount when the pool autoscales, otherwise from nodeCount.

# namespace optional string

Namespace where the internal XRs (cluster, backend) were created.

# providerConfigRef optional object
# name optional string

Name of the ProviderConfig targeting the remote cluster. Used by ModelReplica to create resources on the cluster.
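
For orientation, a status block for the example manifest might look roughly like the following. Every value is illustrative rather than captured from a real cluster: the StorageClass name, the IP address (taken from a documentation range), the device, driver, and device class names, the namespace, and the ProviderConfig name are all placeholders. The nodes value follows the rule stated for that field: the example pool sets maxNodeCount, so nodes reflects it.

```yaml
status:
  cache:
    storageClassName: modelcache-rwx       # placeholder
  gateway:
    address: 203.0.113.10                  # placeholder external IP
  gpuPools:
    - name: h100-pool                      # matches spec.nodePools[].name
      nodes: 10                            # derived from maxNodeCount: 10
      devices:
        - name: gpu                        # placeholder device name
          driver: gpu.example.com          # placeholder driver name
          deviceClassName: gpu.example.com # placeholder
          claim: DRA                       # enum: DRA | Synthetic
          count: 8                         # placeholder
  namespace: example-namespace             # placeholder
  providerConfigRef:
    name: west-gke                         # placeholder
```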