Build the platform
This is the platform team’s side of Modelplane. You set up the gateway that
fronts your models, give the control plane cloud credentials, and register your
first GPU cluster: a hardware profile published as an InferenceClass and an
InferenceCluster that offers it.
In the next step, the ML team will create a model deployment that schedules against this capacity without knowing which cluster it runs on.
Prerequisites
- An AWS account with permissions to create EKS clusters, VPCs, and IAM roles
- AWS access key ID and secret access key
- A GCP account with permissions to create GKE clusters, VPCs, and IAM roles
- A GCP service account JSON key
Set up the InferenceGateway
The InferenceGateway installs Traefik Proxy and MetalLB on the control plane.
Traefik routes inference traffic to model replicas. MetalLB assigns Traefik’s
LoadBalancer service an external IP on kind, which has no cloud load balancer.
Create exactly one InferenceGateway per control plane, and name it default.
If you run the control plane on a cloud cluster with native LoadBalancer
support, omit the loadBalancer field.
# The InferenceGateway creates a unified, OpenAI-compatible endpoint on the
# control plane cluster. It installs Traefik Proxy and creates a Gateway that
# routes traffic to model replicas on remote inference clusters.
#
# Create one InferenceGateway per control plane. It must be named "default".
#
# For kind or bare-metal clusters, set loadBalancer to MetalLB and configure an
# address pool. For cloud clusters with native LoadBalancer support, omit the
# loadBalancer field entirely.
apiVersion: modelplane.ai/v1alpha1
kind: InferenceGateway
metadata:
  name: default
spec:
  backend: Traefik
  traefik:
    version: "40.2.0"
  # Remove the loadBalancer section if your cluster supports LoadBalancer
  # services natively (e.g. GKE, EKS).
  loadBalancer: MetalLB
  metallb:
    addressPool: "172.18.255.200-172.18.255.250"
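The addressPool value above has to fall inside the subnet of the Docker network that kind uses, and that subnet can differ between machines. As a rough sketch, the helper below derives a pool in the same .255.200 to .255.250 range from a subnet string. The function name metallb_pool_from_subnet is hypothetical, and the sketch assumes an IPv4 /16 subnet in which that range is unused.

```shell
# Derive a MetalLB address pool from an IPv4 /16 subnet such as 172.18.0.0/16.
# Assumption: addresses .255.200 through .255.250 in that subnet are free.
# The function body runs in a subshell so its variables do not leak.
metallb_pool_from_subnet() (
  subnet="$1"
  addr="${subnet%%/*}"     # part before the slash, e.g. 172.18.0.0
  mask="${subnet##*/}"     # part after the slash, e.g. 16
  if [ "$mask" != "16" ]; then
    echo "expected a /16 subnet, got: $subnet" >&2
    exit 1
  fi
  a="${addr%%.*}"          # first octet
  rest="${addr#*.}"
  b="${rest%%.*}"          # second octet
  echo "${a}.${b}.255.200-${a}.${b}.255.250"
)

# Usage on a machine running kind. The first command prints every subnet of
# the "kind" Docker network; pass the IPv4 one to the helper:
#   docker network inspect kind -f '{{range .IPAM.Config}}{{.Subnet}} {{end}}'
#   metallb_pool_from_subnet 172.18.0.0/16
```

If the helper prints a range different from the one in the manifest, update addressPool before applying it.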
Wait until the gateway is ready:
kubectl wait --for=condition=Ready ig/default --timeout=5m
Configure cloud credentials
Give the control plane credentials so it can provision clusters in your cloud account.
Create an AWS credentials file:
[default]
aws_access_key_id =
aws_secret_access_key =
Create a Kubernetes secret:
kubectl create secret generic aws-creds \
  --from-file=credentials= \
  -n crossplane-system
Apply the ClusterProviderConfig referencing your secret:
# Points the AWS provider at the credentials Secret you created. Named default,
# so InferenceClusters with an EKS source use it without further configuration.
apiVersion: aws.m.upbound.io/v1beta1
kind: ClusterProviderConfig
metadata:
  name: default
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: aws-creds
      key: credentials
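The AWS credential steps above can be combined into two small shell helpers. This is only a sketch: the function names and the file path ./aws-credentials.txt are hypothetical, and it assumes the access key pair is already exported as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

```shell
# Write an AWS shared-credentials file with a [default] profile.
# Arguments: access key ID, secret access key, output path.
write_aws_credentials() (
  key_id="$1"; secret="$2"; out="$3"
  umask 077   # newly created files get no group/other permissions
  printf '[default]\naws_access_key_id = %s\naws_secret_access_key = %s\n' \
    "$key_id" "$secret" > "$out"
)

# Create the aws-creds Secret from that file, under the key "credentials"
# that the ClusterProviderConfig's secretRef points at.
create_aws_secret() {
  kubectl create secret generic aws-creds \
    --from-file=credentials="$1" \
    -n crossplane-system
}

# Usage (hypothetical path; remove the file once the Secret exists):
#   write_aws_credentials "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY" ./aws-credentials.txt
#   create_aws_secret ./aws-credentials.txt
#   rm ./aws-credentials.txt
```

Deleting the local file afterwards keeps the key pair from lingering on disk outside the cluster.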
For GCP, create a Kubernetes secret from your service account JSON key:
kubectl create secret generic gcp-creds \
  --from-file=credentials=.json \
  -n crossplane-system
Apply the ClusterProviderConfig, setting projectID to your GCP project:
# Points the GCP provider at the credentials Secret you created. Named default,
# so InferenceClusters with a GKE source use it without further configuration.
apiVersion: gcp.m.upbound.io/v1beta1
kind: ClusterProviderConfig
metadata:
  name: default
spec:
  projectID: my-gcp-project # replace with your GCP project
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: gcp-creds
      key: credentials
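The manifest above uses my-gcp-project as a placeholder. One way to swap in a real project ID without editing the file by hand is a small filter like the sketch below. The name set_gcp_project and the PROJECT_ID variable are hypothetical, and the check assumes a standard project ID made of lowercase letters, digits, and hyphens.

```shell
# Replace the placeholder project ID in a manifest read from stdin.
# Argument: the GCP project ID to substitute.
set_gcp_project() {
  # Reject empty values and characters outside a-z, 0-9, and "-",
  # which also keeps the value safe to embed in the sed expression.
  case "$1" in
    ''|*[!a-z0-9-]*) echo "invalid project ID: $1" >&2; return 1 ;;
  esac
  sed "s/my-gcp-project/$1/g"
}

# Usage with the example manifest:
#   curl -fsSL https://docs.modelplane.ai/examples/getting-started/clusterproviderconfig-gke.yaml \
#     | set_gcp_project "$PROJECT_ID" \
#     | kubectl apply -f -
```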
curl -fsSL https://docs.modelplane.ai/examples/getting-started/clusterproviderconfig-gke.yaml \
  | sed 's/my-gcp-project//' \
  | kubectl apply -f -
Publish hardware and register the cluster
The InferenceClass describes a hardware profile and how to provision it. The
InferenceCluster registers a cluster that offers it. Apply both:
apiVersion: modelplane.ai/v1alpha1
kind: InferenceClass
metadata:
name: l4-1x-g6
spec:
description: "EKS g6.xlarge, 1x NVIDIA L4"
provisioning:
provider: EKS
eks:
instanceType: g6.xlarge
diskSizeGb: 50
accelerator:
type: nvidia-l4
count: 1
devices:
- name: gpu
claim: DRA
driver: gpu.nvidia.com
deviceClassName: gpu.nvidia.com
count: 1
attributes:
architecture: { string: Ada Lovelace }
capacity:
memory: { value: "23034Mi" } # L4's real reported VRAM (not the nominal 24GB)
---
apiVersion: modelplane.ai/v1alpha1
kind: InferenceCluster
metadata:
name: eks-us-east
labels:
modelplane.ai/region: us-east
spec:
cluster:
source: EKS
eks:
region: us-east-1
nodePools:
- name: gpu-l4
className: l4-1x-g6
nodeCount: 1
minNodeCount: 1
maxNodeCount: 1
zones:
- us-east-1b
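The capacity.memory value in the InferenceClass above is the VRAM the device reports, not its nominal size. If you publish a class for different hardware, a sketch like the following can turn the reported figure into the same quantity format. It assumes the value should match what nvidia-smi reports in MiB; the function name to_mi_quantity is hypothetical.

```shell
# Convert integer MiB values (one per line on stdin) into quantities such as
# "23034Mi", the format used for capacity.memory in the InferenceClass.
to_mi_quantity() {
  while read -r mib; do
    case "$mib" in
      ''|*[!0-9]*) echo "not an integer MiB value: $mib" >&2; return 1 ;;
    esac
    echo "${mib}Mi"
  done
}

# Usage on a GPU node with the NVIDIA driver installed (one line per GPU):
#   nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | to_mi_quantity
```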
Modelplane provisions the cluster. This takes about 15 minutes:
kubectl wait --for=condition=Ready ic/eks-us-east --timeout=20m
Apply the manifest, setting the cluster’s project to your GCP project:
apiVersion: modelplane.ai/v1alpha1
kind: InferenceClass
metadata:
name: gke-l4-1x-g2
spec:
description: "GKE g2-standard-8, 1x NVIDIA L4"
provisioning:
provider: GKE
gke:
machineType: g2-standard-8
diskSizeGb: 100
accelerator:
type: nvidia-l4
count: 1
devices:
- name: gpu
claim: DRA
driver: gpu.nvidia.com
deviceClassName: gpu.nvidia.com
count: 1
attributes:
architecture: { string: Ada Lovelace }
capacity:
memory: { value: "23034Mi" } # L4's real reported VRAM (not the nominal 24GB)
---
apiVersion: modelplane.ai/v1alpha1
kind: InferenceCluster
metadata:
name: starter
labels:
modelplane.ai/region: us-central
spec:
cluster:
source: GKE
gke:
project: my-gcp-project
region: us-central1
nodePools:
- name: gpu-l4
className: gke-l4-1x-g2
nodeCount: 1
minNodeCount: 1 # keep >=1; the autoscaler can't scale a GPU pool up from 0 for DRA pods
maxNodeCount: 2
zones:
- us-central1-a
curl -fsSL https://docs.modelplane.ai/examples/getting-started/gke/platform.yaml \
  | sed 's/my-gcp-project//' \
  | kubectl apply -f -
Modelplane provisions the cluster. This takes about 15 minutes:
kubectl wait --for=condition=Ready ic/starter --timeout=20m
Modelplane is reconciling the infrastructure against the source of truth, the manifest you just applied.
While you wait, Modelplane creates the EKS or GKE cluster and its GPU node pool, then installs the inference stack: LeaderWorkerSet for multi-node serving, llm-d for inference-aware routing, Envoy Gateway for traffic management, and the storage class for model weights. This is the same reconciliation loop Crossplane uses to configure other infrastructure, extended to the inference layer.
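kubectl wait stays silent until the condition is met or the timeout expires. To see progress during provisioning, you could poll the Ready condition instead. The sketch below is one way to do that; the function names are hypothetical, and the defaults of 80 attempts at 15-second intervals are chosen only to match the 20-minute timeout used above.

```shell
# Print the status of the Ready condition for a resource such as
# ic/eks-us-east: "True", "False", "Unknown", or nothing if it is not set yet.
ready_status() {
  kubectl get "$1" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Poll until Ready is True or the attempts run out.
# Arguments: resource, attempts (default 80), seconds between polls (default 15).
wait_ready() (
  resource="$1"; attempts="${2:-80}"; interval="${3:-15}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Treat a failed lookup as "no status yet" rather than aborting.
    status="$(ready_status "$resource" 2>/dev/null || true)"
    if [ "$status" = "True" ]; then
      echo "$resource is Ready"
      exit 0
    fi
    echo "$resource Ready=${status:-<none>} (attempt $((i + 1))/$attempts)"
    i=$((i + 1))
    sleep "$interval"
  done
  echo "$resource not Ready after $attempts attempts" >&2
  exit 1
)

# Usage:
#   wait_ready ic/eks-us-east
#   wait_ready ic/starter 40 30
```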
Once the cluster is Ready, the ML team can deploy a model on it.
Next step
Now that the platform is provisioned, the ML team can deploy a model by describing what the model needs rather than the infrastructure it runs on.