ModelReplica Custom Resource

Metadata

API version: modelplane.ai/v1alpha1
Kind: ModelReplica
Scope: Namespaced
Short names: mr

Example

```yaml
# ModelReplicas are composed automatically by ModelDeployment, with the
# scheduler's placement resolved onto each member (nodePoolName, deviceRequests).
# Avoid creating them directly.
apiVersion: modelplane.ai/v1alpha1
kind: ModelReplica
metadata:
  name: qwen3-8b-eks-l4-0
  namespace: ml-team
spec:
  clusterName: eks-l4
  engines:
    - name: qwen3-8b
      members:
        - role: Standalone
          nodePoolName: gpu-l4
          deviceRequests:
            - name: gpu
              count: 1
              deviceClassName: gpu.nvidia.com
          template:
            spec:
              containers:
                - name: engine
                  image: vllm/vllm-openai:v0.23.0
                  args:
                    - --model=Qwen/Qwen3-8B
```

Spec

clusterName: Name of the InferenceCluster this replica is pinned to. Replicas are pinned at creation time. If the cluster becomes temporarily unavailable, the replica stays pinned and the parent ModelDeployment surfaces the degraded state through its conditions. If the cluster is deleted entirely, the parent ModelDeployment re-places the replica on another viable cluster.

Optional reference to a ModelCache mounted into the engine pods. Inherited verbatim from the parent ModelDeployment.

Serving mode, inherited verbatim from the parent ModelDeployment. PrefillDecode fronts the engines with an InferencePool and an endpoint picker, and role-labels each engine's pods with its phase; Unified (or an absent value) fronts them with a Service.

EngineMember object

One member of an engine's gang: a role (Standalone, Leader, or Worker) together with its hardware requests and pod template.
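Where a model spans several nodes, the gang is made up of several members with different roles rather than a single Standalone member. The fragment below is a hedged sketch of what such an engine could look like under spec.engines; the engine name large-model, the pool name gpu-h100, and the device counts are hypothetical, and the exact member layout ModelDeployment composes for multi-node engines is an assumption not documented on this page.

```yaml
# Hypothetical fragment of spec.engines: one Leader and one Worker member.
# Engine name, pool name, and counts are illustrative, not Modelplane defaults.
engines:
  - name: large-model
    members:
      - role: Leader
        nodePoolName: gpu-h100
        deviceRequests:
          - name: gpu
            count: 8
            deviceClassName: gpu.nvidia.com
        template:
          spec:
            containers:
              - name: engine
                image: vllm/vllm-openai:v0.23.0
      - role: Worker
        nodePoolName: gpu-h100
        deviceRequests:
          - name: gpu
            count: 8
            deviceClassName: gpu.nvidia.com
        template:
          spec:
            containers:
              - name: engine
                image: vllm/vllm-openai:v0.23.0
```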

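deviceRequests.count: Number of devices to claim.

deviceRequests.deviceClassName: Cluster-scoped DRA DeviceClass to claim the devices through, taken from the matched InferenceClass device.

deviceRequests.name: Name of the request; becomes the name of the DRA DeviceRequest.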
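Each deviceRequests entry lines up with a request in a Kubernetes DRA claim. As a rough sketch, assuming Modelplane emits a resource.k8s.io/v1 ResourceClaim (the object kind, API version, and claim name it actually generates are not documented here, and qwen3-8b-gpu is a hypothetical name), the entry from the example manifest would correspond to something like:

```yaml
# Sketch only: the claim name is hypothetical, and Modelplane may instead
# create a ResourceClaimTemplate or target a different resource.k8s.io version.
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: qwen3-8b-gpu
  namespace: ml-team
spec:
  devices:
    requests:
      - name: gpu                          # from deviceRequests.name
        exactly:
          deviceClassName: gpu.nvidia.com  # from deviceRequests.deviceClassName
          allocationMode: ExactCount
          count: 1                         # from deviceRequests.count
```

In the older resource.k8s.io/v1beta1 API, deviceClassName, allocationMode, and count sit directly on the request instead of under exactly.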

nodePoolName: Name of the node pool on the pinned InferenceCluster that the scheduler selected for this member. The scheduler pins every member to a specific pool, so this field is always set; a member with no device requests of its own is pinned to its engine's pool.
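To illustrate the pinning rule, the hypothetical fragment below shows a member that requests no devices. Assuming its engine's pool is gpu-l4 (as in the example manifest), nodePoolName is still populated, with that same pool. The role chosen here and the absence of deviceRequests are illustrative only.

```yaml
# Hypothetical member with no device requests of its own.
# nodePoolName is still set: the member is pinned to its engine's pool.
- role: Leader
  nodePoolName: gpu-l4
  template:
    spec:
      containers:
        - name: engine
          image: vllm/vllm-openai:v0.23.0
```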

PodTemplate object

A curated subset of the Kubernetes PodTemplateSpec: the pod shape Modelplane uses for a member’s engine pods.

Reference a pod field via the downward API, e.g. status.podIP, metadata.name, or metadata.namespace.
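As an illustration, assuming the curated PodTemplate keeps the standard Kubernetes env and valueFrom.fieldRef shape (this page does not list the full subset), an engine container could expose its pod IP and pod name as environment variables. The variable names POD_IP and POD_NAME are hypothetical.

```yaml
template:
  spec:
    containers:
      - name: engine
        image: vllm/vllm-openai:v0.23.0
        env:
          # Downward API: values are filled in from the running pod's fields.
          - name: POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
        args:
          - --model=Qwen/Qwen3-8B
          # Kubernetes expands $(VAR) references to the container's env in args.
          - --host=$(POD_IP)
```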