ModelCache Custom Resource
Stage model weights on cluster storage before serving.
Concept guide: Cache Model Weights →
#Metadata
#Example
Manifest
apiVersion: modelplane.ai/v1alpha1
kind: ModelCache
metadata:
  name: qwen-72b
  namespace: ml-team
spec:
  source: HuggingFace
  huggingFace:
    repo: Qwen/Qwen2.5-72B-Instruct
    sizeGiB: 150
    authSecret:
      name: hf-token
      key: HF_TOKEN
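The manifest above points authSecret at a Secret named hf-token with the key HF_TOKEN. As a minimal sketch, that Secret could look like the following. It assumes a standard Kubernetes Opaque Secret is acceptable (this page does not state a required Secret type), and the token value is a placeholder.

```yaml
# Sketch only: assumes a plain Opaque Secret is what Modelplane expects.
apiVersion: v1
kind: Secret
metadata:
  name: hf-token          # matches authSecret.name in the ModelCache
  namespace: ml-team      # same namespace as the ModelCache
type: Opaque
stringData:
  HF_TOKEN: "REPLACE_WITH_HF_TOKEN"   # key matches authSecret.key; value is a placeholder
```

Because the Secret is looked up in the ModelCache’s own namespace, it only needs to exist there; it does not have to be created on each matched cluster by hand.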
#Spec
Label selector to pick the InferenceClusters that stage this artifact. If omitted, the cache replicates to every cluster.
huggingFace: HuggingFace source. Required when source is HuggingFace.
authSecret: Optional Secret holding a HuggingFace token for gated or private repos. Names a Secret in the ModelCache’s own namespace; Modelplane propagates it to each matched cluster for the hydration Job to read.
repo: HuggingFace repository ID, for example Qwen/Qwen2.5-72B-Instruct.
Branch, tag, or commit SHA. Defaults to the repo’s default branch.
sizeGiB: Capacity, in GiB, to allocate for the staged artifact on each matched cluster.
source: The kind of artifact source to stage from. The matching source object (e.g. spec.huggingFace) must also be set.