Glossary

On this page

Modelplane

The open source control plane software. You install Modelplane on a Kubernetes cluster (the control cluster). Modelplane never serves tokens itself; it orchestrates the clusters and engines that do.

Control cluster

The Kubernetes cluster where Modelplane runs. It needs no GPUs. It holds Modelplane’s Crossplane-based components and the API resources you apply to declare your fleet.

Inference cluster

A GPU cluster in the fleet where serving engines run and tokens are produced. Modelplane can provision inference clusters on EKS, GKE, and other providers, or you can bring your own through an InferenceCluster with source: Existing.

Fleet

All inference clusters managed by a single Modelplane control cluster.

Platform

The inference infrastructure the platform team provisions using InferenceGateway, InferenceClass, and InferenceCluster resources. This is distinct from Modelplane itself, which runs on the control cluster above the fleet.

Platform team

The infrastructure team responsible for GPU capacity. They create InferenceCluster, InferenceClass, and InferenceGateway resources, provisioning the fleet that ML teams deploy against.

ML team

The development team deploying models. They create ModelDeployment, ModelService, and ModelCache resources, declaring what a model needs without knowing which cluster it runs on.