ModelReplica Custom Resource

#Metadata

API version
modelplane.ai/v1alpha1
Kind
ModelReplica
Scope
Namespaced
Short names
mr

#Example

Manifest
# ModelReplicas are composed automatically by ModelDeployment, with the
# scheduler's placement resolved onto each member (nodePoolName, deviceRequests).
# Avoid creating them directly.
apiVersion: modelplane.ai/v1alpha1
kind: ModelReplica
metadata:
  name: qwen3-8b-eks-l4-0
  namespace: ml-team
spec:
  clusterName: eks-l4
  engines:
    - name: qwen3-8b
      members:
        - role: Standalone
          nodePoolName: gpu-l4
          deviceRequests:
            - name: gpu
              count: 1
              deviceClassName: gpu.nvidia.com
          template:
            spec:
              containers:
                - name: engine
                  image: vllm/vllm-openai:v0.23.0
                  args:
                    - --model=Qwen/Qwen3-8B

#Spec

# clusterName required string ≥ 1 chars

Name of the InferenceCluster this replica is pinned to. Replicas are pinned at creation time. If the cluster is temporarily unavailable, the replica stays pinned and the parent ModelDeployment surfaces the degraded state through its conditions. If the cluster is deleted entirely, the parent ModelDeployment re-places the replica on another viable cluster.

# engines required object[] 1–8 items
# copies optional integer 1–64 default: 1
# members required EngineMember[] → 1–2 items
# name required string 1–63 chars
# phase optional enum: Prefill | Decode
# modelCacheRef optional object

Optional reference to a ModelCache mounted into the engine pods. Inherited verbatim from the parent ModelDeployment.

# name required string ≥ 1 chars
# serving optional object

Serving mode, inherited verbatim from the parent ModelDeployment. PrefillDecode fronts the engines with an InferencePool and an endpoint picker, and labels each engine with its phase; Unified (or absent) fronts them with a Service.

# mode optional enum: Unified | PrefillDecode default: Unified
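
The following manifest is a minimal sketch of how the fields above might combine in PrefillDecode mode. It assumes that copies and phase are set per engine and that modelCacheRef and serving sit directly under spec; the field list does not state that nesting explicitly. The engine names, the copies value, and the ModelCache name are hypothetical, and any engine flags needed for disaggregated serving are omitted. As with the example above, ModelReplicas are composed by ModelDeployment, so this is for reading rather than applying.

```yaml
# Illustrative sketch only: the nesting of copies/phase under each engine
# is an assumption, and the names below are hypothetical.
apiVersion: modelplane.ai/v1alpha1
kind: ModelReplica
metadata:
  name: qwen3-8b-eks-l4-0
  namespace: ml-team
spec:
  clusterName: eks-l4
  serving:
    mode: PrefillDecode
  modelCacheRef:
    name: qwen3-8b-cache          # hypothetical ModelCache name
  engines:
    - name: prefill               # hypothetical engine name
      phase: Prefill
      copies: 2
      members:
        - role: Standalone
          nodePoolName: gpu-l4
          deviceRequests:
            - name: gpu
              count: 1
              deviceClassName: gpu.nvidia.com
          template:
            spec:
              containers:
                - name: engine
                  image: vllm/vllm-openai:v0.23.0
                  args:
                    - --model=Qwen/Qwen3-8B
    - name: decode                # hypothetical engine name
      phase: Decode
      members:
        - role: Standalone
          nodePoolName: gpu-l4
          deviceRequests:
            - name: gpu
              count: 1
              deviceClassName: gpu.nvidia.com
          template:
            spec:
              containers:
                - name: engine
                  image: vllm/vllm-openai:v0.23.0
                  args:
                    - --model=Qwen/Qwen3-8B
```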

#EngineMember object

One member of an engine’s gang: a role (Standalone, Leader, or Worker) with its hardware and pod template.

# deviceRequests optional object[] 1–16 items
# count optional integer 1–64 default: 1

How many devices to claim.

# deviceClassName required string 1–253 chars

Cluster-scoped DRA DeviceClass through which the devices are claimed, taken from the matched InferenceClass device.

# name required string 1–63 chars

Request name; becomes the DeviceRequest name.

# selectors optional object[] ≤ 8 items
# cel optional string 1–10240 chars
# nodePoolName required string ≥ 1 chars

Name of the node pool on the pinned InferenceCluster that the scheduler selected for this member. The scheduler pins every member to a specific pool, so this field is always set: a member with no device requests of its own is pinned to its engine’s pool.

# role optional enum: Standalone | Leader | Worker default: Standalone
# template required PodTemplate →
# worker optional object
# nodes required integer 1–63
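
The fragment below sketches what a two-member gang might look like under spec.engines. It assumes that the worker object belongs on the member with role Worker and that nodes is the number of worker nodes; the field list does not spell out either point. The device counts and the CEL expression are illustrative, and the selector uses the plain-string cel form documented above.

```yaml
# Illustrative sketch only: placing worker.nodes on the Worker member is an
# assumption, and the counts and selector are hypothetical.
engines:
  - name: qwen3-8b
    members:
      - role: Leader
        nodePoolName: gpu-l4
        deviceRequests:
          - name: gpu
            count: 4
            deviceClassName: gpu.nvidia.com
            selectors:
              - cel: 'device.driver == "gpu.nvidia.com"'
        template:
          spec:
            containers:
              - name: engine
                image: vllm/vllm-openai:v0.23.0
                args:
                  - --model=Qwen/Qwen3-8B
      - role: Worker
        nodePoolName: gpu-l4
        worker:
          nodes: 1
        deviceRequests:
          - name: gpu
            count: 4
            deviceClassName: gpu.nvidia.com
        template:
          spec:
            containers:
              - name: engine
                image: vllm/vllm-openai:v0.23.0
                args:
                  - --model=Qwen/Qwen3-8B
```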

#PodTemplate object

A curated subset of the Kubernetes PodTemplateSpec: the pod shape Modelplane uses for a member’s engine pods.

# metadata optional object
# annotations optional map[string]string
# labels optional map[string]string
# spec optional object
# containers required object[] 1–1 items
# args optional string[]
# command optional string[]
# env optional object[]
# name required string
# value optional string
# valueFrom optional object
# configMapKeyRef optional object
# key required string
# name required string
# optional optional boolean
# fieldRef optional object

Reference a pod field via the downward API, e.g. status.podIP, metadata.name, or metadata.namespace.

# apiVersion optional string
# fieldPath required string
# secretKeyRef optional object
# key required string
# name required string
# optional optional boolean
# image required string ≥ 1 chars
# name required string ≥ 1 chars
# imagePullSecrets optional object[]
# name required string
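
As a rough illustration of the PodTemplate fields above, the fragment below shows one member's template with labels, an image pull secret, and the three ways of setting an environment variable. It assumes that imagePullSecrets sits next to containers under spec, as in the Kubernetes PodSpec. The label, the Secret names, and the environment variable names are hypothetical.

```yaml
# Illustrative sketch only: label, Secret names, and env var names are
# hypothetical; the nesting follows the Kubernetes PodSpec by assumption.
template:
  metadata:
    labels:
      team: ml-team
  spec:
    imagePullSecrets:
      - name: registry-creds        # hypothetical image pull Secret
    containers:
      - name: engine
        image: vllm/vllm-openai:v0.23.0
        args:
          - --model=Qwen/Qwen3-8B
        env:
          # Literal value.
          - name: LOG_LEVEL
            value: INFO
          # Pod field through the downward API.
          - name: POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
          # Key from a Secret; optional, so the pod starts without it.
          - name: HF_TOKEN
            valueFrom:
              secretKeyRef:
                name: hf-credentials  # hypothetical Secret
                key: token
                optional: true
```

Because containers accepts exactly one item, sidecars are not expressible here; everything the engine needs has to be configured on that single container.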