Modelplane docs

ModelCache Custom Resource

Stage model weights on cluster storage before serving.

Concept guide: Cache Model Weights

#Metadata

API version: modelplane.ai/v1alpha1
Kind: ModelCache
Scope: Namespaced
Short names: mc

#Example

Manifest
apiVersion: modelplane.ai/v1alpha1
kind: ModelCache
metadata:
  name: qwen-72b
  namespace: ml-team
spec:
  source: HuggingFace
  huggingFace:
    repo: Qwen/Qwen2.5-72B-Instruct
    sizeGiB: 150
    authSecret:
      name: hf-token
      key: HF_TOKEN

#Spec

# clusterSelector optional object

Label selector to pick the InferenceClusters that stage this artifact. If omitted, the cache replicates to every cluster.

# clusterSelector.matchLabels optional map[string]string
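
As an illustration, the manifest below restricts staging to clusters that carry a particular label. This is only a sketch: the label key and value (region: us-east) are hypothetical, and it assumes the InferenceClusters in your environment are labeled that way.

```yaml
apiVersion: modelplane.ai/v1alpha1
kind: ModelCache
metadata:
  name: qwen-72b
  namespace: ml-team
spec:
  # Hypothetical label; only InferenceClusters carrying it stage the weights.
  clusterSelector:
    matchLabels:
      region: us-east
  source: HuggingFace
  huggingFace:
    repo: Qwen/Qwen2.5-72B-Instruct
    sizeGiB: 150
```

Leaving clusterSelector out entirely would instead replicate the cache to every cluster, as described above.
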
# huggingFace optional object

HuggingFace source. Required when source is HuggingFace.

# huggingFace.authSecret optional object

Optional reference to a Secret holding an HF token for gated or private repos. The Secret must live in the ModelCache’s own namespace; Modelplane propagates it to each matched cluster, where the hydration Job reads it.

# huggingFace.authSecret.key optional string default: HF_TOKEN
# huggingFace.authSecret.name required string ≥ 1 chars
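
For reference, a Secret that the example manifest's authSecret could point at might look like the sketch below. It uses the standard Kubernetes Secret format; the name hf-token, the key HF_TOKEN, and the namespace ml-team are taken from the example manifest, and the token value is a placeholder.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: hf-token        # referenced by huggingFace.authSecret.name
  namespace: ml-team    # same namespace as the ModelCache
type: Opaque
stringData:
  # Key matches huggingFace.authSecret.key (default: HF_TOKEN).
  HF_TOKEN: "REPLACE_WITH_YOUR_TOKEN"   # placeholder value
```
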
# huggingFace.repo required string ≥ 1 chars

HuggingFace repository ID.

# huggingFace.revision optional string

Branch, tag, or commit SHA. Defaults to the repo’s default branch.
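
A minimal sketch of setting the revision explicitly. The value main is illustrative and assumes the repo has a branch of that name; a tag or full commit SHA can be used in the same position.

```yaml
spec:
  source: HuggingFace
  huggingFace:
    repo: Qwen/Qwen2.5-72B-Instruct
    revision: main   # illustrative; a tag or commit SHA also works
    sizeGiB: 150
```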

# huggingFace.sizeGiB required integer 1–100000

Capacity to allocate for the staged artifact on each matched cluster.

# source required enum: HuggingFace

Which kind of artifact source to stage from. The matching source object (e.g. spec.huggingFace) must be set.

#Status

# clusters optional object[]
# clusters[].message optional string
# clusters[].name required string
# clusters[].phase optional enum: Pending | Hydrating | Ready | Failed
# conditions optional object[]
# summary optional object

Per-cluster ready / total counts.

# summary.ready optional string

e.g. “2/3”
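
Putting the status fields together, a ModelCache matched to three clusters might report something like the sketch below. The cluster names and the message text are hypothetical, and conditions is omitted because its item schema is not described on this page.

```yaml
status:
  clusters:
    - name: cluster-a                # hypothetical cluster name
      phase: Ready
    - name: cluster-b
      phase: Ready
    - name: cluster-c
      phase: Hydrating
      message: downloading weights   # hypothetical message text
  summary:
    ready: "2/3"                     # two of the three clusters are Ready
```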