# Register a Cluster A Kubernetes cluster registered with Modelplane for model serving. Source: https://docs.modelplane.ai/platform/inference-cluster/ **API:** [`modelplane.ai/v1alpha1` · InferenceCluster]({{< ref "/reference/inferenceclusters" >}}) An `InferenceCluster` represents a Kubernetes cluster configured for model serving. Platform teams create these to provide GPU capacity. Each cluster has: - A **cluster source**: `GKE` or `EKS` (Modelplane provisions the full cluster) or `Existing` (bring a cluster you manage yourself). See [Supported Providers]({{< ref "platform/providers.md" >}}) for the clouds and neoclouds Modelplane runs on. - One or more **node pools**, each referencing an `InferenceClass` for its hardware capabilities and provisioning recipe. - **Labels** for organizational metadata: tier, region, provider. These are the matching surface for `ModelDeployment.clusterSelector`. Modelplane installs the serving stack it needs on every cluster it manages, including existing clusters, which it assumes are solely for its use. ## Ownership and requirements Modelplane assumes exclusive ownership of every `InferenceCluster`. The fleet scheduler's capacity accounting relies on Modelplane being the only thing placing GPU workloads on the cluster, so dedicate each cluster to Modelplane rather than sharing it with other workloads. Modelplane also has opinions about how a cluster is set up: its Kubernetes version, the components it installs, and required features like DRA for binding GPUs to pods. On provisioned clusters Modelplane handles this for you. On an existing cluster the platform team must meet the requirements. ## Provisioned and existing clusters The `cluster.source` discriminator picks one of two models: - **Provisioned (`GKE`, `EKS`).** Modelplane creates the cluster and its GPU node pools from each pool's `InferenceClass`, labels the pool's nodes so the scheduler's placement is enforced, and provisions the storage class for model weights. It also injects a non-GPU **system pool** with opinionated defaults to run the inference stack, so you only declare the GPU pools you want. - **Existing (`Existing`).** A kubeconfig `Secret` provides access to a cluster you run yourself. Modelplane installs the serving stack it needs but doesn't provision infrastructure, and each pool's `InferenceClass` provides hardware capabilities for scheduling only. You're responsible for the cluster meeting Modelplane's requirements, including labeling each pool's nodes `modelplane.ai/pool=` (see [how scheduling pins placement]({{< ref "/architecture/scheduling.md#pinning-placement-to-a-pool" >}})). ## Examples {{< tabs >}} {{< tab "GKE" >}} {{< manifests path="concepts/inference-cluster-gke.yaml" apply="false" >}} {{< /tab >}} {{< tab "EKS" >}} {{< manifests path="concepts/inference-cluster-eks.yaml" apply="false" >}} {{< /tab >}} {{< tab "Existing" >}} {{< manifests path="concepts/inference-cluster-existing.yaml" apply="false" >}} {{< /tab >}} {{< /tabs >}} ## Cache storage A [ModelCache]({{< ref "/models/model-cache.md" >}}) stages model weights on a `ReadWriteMany` (RWX) StorageClass on the workload cluster. Where that comes from depends on the source: - **`GKE`** (Filestore Enterprise) and **`EKS`** (EFS): auto-provisioned. Those classes are fixed; nothing for the admin to do. - **`Existing`**: bring your own. Create an RWX StorageClass on the cluster, with any backend that supports automatic PVC provisioning (WekaIO, NetApp Trident, `FSx` for NetApp, and similar), and name it in `cluster.existing.cache.storageClassName`. The ML team's `ModelCache` and `ModelDeployment` specs are the same regardless of which backing storage a cluster uses.