Get started on Modelplane Docs

Installation

The control plane is where everything in Modelplane runs. In this step you’ll install it on a local kind cluster: Crossplane for reconciliation, plus the Modelplane APIs. No cloud resources are needed yet; those come in the next step.

This step takes about five minutes.

Prerequisites

Install kind, kubectl, and Helm on your machine.
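A quick way to confirm the three tools are on your PATH is a small check like the one below. This is a minimal sketch; the helper name check_prereqs is made up for illustration and is not part of Modelplane.

```shell
# Report which of the given CLIs are available on PATH.
# Returns non-zero if at least one is missing.
check_prereqs() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

check_prereqs kind kubectl helm || echo "Install the missing tools before continuing."
```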

Note

You can run your Modelplane control plane anywhere. This tour uses kind for illustration.

Install the control plane

Crossplane provides the reconciliation engine and package management. Create the kind cluster, then install Crossplane with Helm:
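A minimal sketch of this step, using the standard kind command and the upstream Crossplane Helm chart. The cluster name modelplane and the wrapper function install_crossplane are assumptions for illustration; crossplane-system is Crossplane's conventional namespace. The sketch covers only the cluster and Crossplane, not the Modelplane APIs.

```shell
# Assumed cluster name; override by exporting CLUSTER_NAME.
CLUSTER_NAME="${CLUSTER_NAME:-modelplane}"

install_crossplane() {
  # Create a local Kubernetes cluster with kind.
  kind create cluster --name "$CLUSTER_NAME" || return 1

  # Add the upstream Crossplane chart repository and refresh the index.
  helm repo add crossplane-stable https://charts.crossplane.io/stable || return 1
  helm repo update || return 1

  # Install Crossplane into its own namespace and wait until it is ready.
  helm install crossplane crossplane-stable/crossplane \
    --namespace crossplane-system \
    --create-namespace \
    --wait || return 1
}
```

Run install_crossplane once, then confirm the pods are up with kubectl get pods -n crossplane-system.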

Build the platform

This is the platform team’s side of Modelplane. You set up the gateway that fronts your models, give the control plane cloud credentials, and register your first GPU cluster: a hardware profile published as an InferenceClass and an InferenceCluster that offers it.

In the next step, the ML team will create a model deployment that schedules against this capacity without knowing which cluster it runs on.

Prerequisites

An AWS account with permissions to create EKS clusters, VPCs, and IAM roles
AWS access key ID and secret access key

A GCP account with permissions to create GKE clusters, VPCs, and IAM roles
A GCP service account JSON key

Set up the InferenceGateway

The InferenceGateway installs Traefik Proxy and MetalLB on the control plane. Traefik routes inference traffic to model replicas. MetalLB assigns Traefik’s LoadBalancer service an external IP on kind, which has no cloud load balancer of its own. Each control plane needs one InferenceGateway named default.
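Once the gateway is installed, one way to confirm that MetalLB handed out an address is to list every LoadBalancer service with its external IP. This is a sketch only: the namespace and name of the Traefik service are not stated here, so the helper below (list_loadbalancer_ips, a made-up name) scans all namespaces instead of assuming them.

```shell
# Print "namespace/name external-ip" for every LoadBalancer service.
# With --all-namespaces --no-headers the columns are:
# NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
list_loadbalancer_ips() {
  kubectl get services --all-namespaces --no-headers |
    awk '$3 == "LoadBalancer" { print $1 "/" $2, $5 }'
}
```

An external IP of <pending> means no address has been assigned yet.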

Deploying a model

Now that the platform is provisioned, the ML team can declare what a model needs with a ModelDeployment. Describe the hardware requirements, and the scheduler places the deployment on the capacity the platform team published.

Create a deployment

Create a namespace for the model:

kubectl create namespace ml-team

The device selector matches against the capacity declared in the InferenceClass, not the pod’s resource requests. Any L4 node satisfies >= 20Gi, so this deployment runs on the cluster you just added:

Scale the platform

You have one L4 cluster with a running model. In this guide, you’ll add two larger-GPU clusters in different regions to grow the fleet available to the ML team.

Provisioning two more clusters takes about 10 to 15 minutes.

Register more clusters

Scale the model

A ModelService can front more than one ModelDeployment. Here you add a second deployment, pinned to a different region, and point the same service at both. The endpoint you already curled stays the same. Behind it, traffic now load-balances across two regions.

Figure: The fleet has two clusters, us-east (L4) and us-west (larger GPU). On the ML team side, ModelDeployment qwen-demo runs on us-east and ModelDeployment qwen-west (clusterSelector: us-west) runs on us-west. Both deployments feed ModelService qwen, served at /ml-team/qwen/v1/...

Deploy to a second region

The new deployment uses a clusterSelector to pin its replica to the us-west cluster you added in the last step, and selects the larger GPU there:

Clean up

Delete the model resources first, then the clusters, and finally the control plane.

Delete model resources

Delete model resources before clusters. Deleting a cluster first leaves the deployments reconciling against infrastructure that no longer exists.

kubectl delete md --all -n ml-team
kubectl delete ms --all -n ml-team

Wait for all model replicas to finish terminating:
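One way to wait is to poll until nothing of a given kind is left in the namespace. The helper below is a sketch under assumptions: wait_until_gone is a made-up name, and because the resource kind that represents a model replica is not stated here, the kind is passed as a parameter.

```shell
# Poll until no resources of the given kind remain in the namespace.
# Usage: wait_until_gone <resource> <namespace> [interval-seconds]
# Caveat: a kubectl error also yields empty output, so check connectivity first.
wait_until_gone() {
  local resource="$1" namespace="$2" interval="${3:-5}"
  while [ -n "$(kubectl get "$resource" -n "$namespace" -o name 2>/dev/null)" ]; do
    echo "waiting for $resource in $namespace to be deleted..."
    sleep "$interval"
  done
  echo "no $resource left in $namespace"
}
```

For example, wait_until_gone md ml-team blocks until the deployments deleted above are gone, using the same md short name as the delete command.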