ModelCache Custom Resource

On this page

Stage model weights on cluster storage before serving.

#Metadata

API version: modelplane.ai/v1alpha1
Kind: ModelCache
Scope: Namespaced
Short names: mc

#Example

Manifest

apiVersion: modelplane.ai/v1alpha1
kind: ModelCache
metadata:
  name: qwen-72b
  namespace: ml-team
spec:
  source: HuggingFace
  huggingFace:
    repo: Qwen/Qwen2.5-72B-Instruct
    sizeGiB: 150
    authSecret:
      name: hf-token
      key: HF_TOKEN

#Spec

Label selector to pick the InferenceClusters that stage this artifact. If omitted, the cache replicates to every cluster.

HuggingFace source. Required when source is HuggingFace.

Optional Secret holding an HF token for gated or private repos. Names a Secret in the ModelCache’s own namespace; Modelplane propagates it to each matched cluster for the hydration Job to read.

HuggingFace repository ID.

Branch, tag, or commit SHA. Defaults to the repo’s default branch.

Capacity to allocate for the staged artifact on each matched cluster.

Which kind of artifact source to stage from. The matching source object (e.g. spec.huggingFace) must be set.

#Status

Per-cluster ready / total counts.

e.g. “2/3”