<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Get started on Modelplane Docs</title><link>https://docs.modelplane.ai/getting-started/</link><description>Recent content in Get started on Modelplane Docs</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Mon, 01 Jan 0001 00:00:00 +0000</lastBuildDate><atom:link href="https://docs.modelplane.ai/getting-started/index.xml" rel="self" type="application/rss+xml"/><item><title>Installation</title><link>https://docs.modelplane.ai/getting-started/installation/</link><pubDate/><guid>https://docs.modelplane.ai/getting-started/installation/</guid><description>&lt;p&gt;The control plane is where everything in Modelplane runs. In this step you&amp;rsquo;ll install it on a local kind cluster, using Crossplane for reconciliation and the Modelplane APIs. No cloud yet; that comes next.&lt;/p&gt;
&lt;p&gt;This step takes about five minutes.&lt;/p&gt;
&lt;h2 id="prerequisites"&gt;Prerequisites &lt;a class="anchor-link" id="prerequisites" href="#prerequisites" aria-label="Link to this section: Prerequisites"&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Install &lt;a href="https://kind.sigs.k8s.io/"&gt;kind&lt;/a&gt;,
&lt;a href="https://kubernetes.io/docs/tasks/tools/"&gt;kubectl&lt;/a&gt;, and
&lt;a href="https://helm.sh/docs/intro/install/"&gt;Helm&lt;/a&gt; on your machine.&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;div class="admonition-title"&gt;
&lt;svg class="bi flex-shrink-0" role="img" aria-label="note:"&gt;&lt;use
xlink:href="#info"/&gt;&lt;/svg&gt;
&lt;span class="ps-1"&gt;Note&lt;/span&gt;
&lt;/div&gt;
&lt;div class="admonition-content"&gt;
You can run your Modelplane control plane anywhere. This tour uses kind for
illustration.
&lt;/div&gt;
&lt;/div&gt;
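&lt;p&gt;Before continuing, it can help to confirm that all three tools are on your &lt;code&gt;PATH&lt;/code&gt;. The following is a minimal sketch, not part of Modelplane: &lt;code&gt;missing_tools&lt;/code&gt; is a hypothetical helper name, and it relies only on the POSIX &lt;code&gt;command -v&lt;/code&gt; builtin. It prints the name of each tool it can&amp;rsquo;t find and prints nothing when everything is installed.&lt;/p&gt;

```shell
# Hypothetical helper (not a Modelplane command): print each tool
# from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    # command -v prints the resolved path when the tool exists and
    # prints nothing otherwise, so an empty result means "missing".
    if ! command -v "$tool" | grep -q .; then
      echo "$tool"
    fi
  done
}

# Check the three prerequisites named in this step.
missing_tools kind kubectl helm
```

&lt;p&gt;If the last command prints any names, install those tools from the links in this section before you continue.&lt;/p&gt;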
&lt;h2 id="install-the-control-plane"&gt;Install the control plane &lt;a class="anchor-link" id="install-the-control-plane" href="#install-the-control-plane" aria-label="Link to this section: Install the control plane"&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Crossplane provides the reconciliation engine and package management. Create the
kind cluster and install it with Helm:&lt;/p&gt;</description></item><item><title>Build the platform</title><link>https://docs.modelplane.ai/getting-started/build-the-platform/</link><pubDate/><guid>https://docs.modelplane.ai/getting-started/build-the-platform/</guid><description>&lt;p&gt;This is the platform team&amp;rsquo;s side of Modelplane. You set up the gateway that
fronts your models, give the control plane cloud credentials, and register your
first GPU cluster: a hardware profile published as an &lt;code&gt;InferenceClass&lt;/code&gt; and an
&lt;code&gt;InferenceCluster&lt;/code&gt; that offers it.&lt;/p&gt;
&lt;p&gt;In the next step, the ML team will create a model deployment that schedules
against this capacity without knowing which cluster it runs on.&lt;/p&gt;
&lt;h2 id="prerequisites"&gt;Prerequisites &lt;a class="anchor-link" id="prerequisites" href="#prerequisites" aria-label="Link to this section: Prerequisites"&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;ul class="nav nav-tabs" id="665" role="tablist"&gt;
&lt;li class="nav-item" role="presentation"&gt;
&lt;button class="nav-link active "
id="tab-334"
data-bs-toggle="tab"
data-bs-target="#tab-pane-334"
type="button"
role="tab"
aria-controls="tab-pane-334" aria-selected="false"&gt;EKS&lt;/button&gt;
&lt;/li&gt;
&lt;li class="nav-item" role="presentation"&gt;
&lt;button class="nav-link "
id="tab-1645"
data-bs-toggle="tab"
data-bs-target="#tab-pane-1645"
type="button"
role="tab"
aria-controls="tab-pane-1645" aria-selected="false"&gt;GKE&lt;/button&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="tab-content" id="Content-665"&gt;
&lt;div
class="tab-pane fade active show pt-sm-3"
id="tab-pane-334"
role="tabpanel"
aria-labelledby="tab-334"
tabindex="0"&gt;&lt;ul&gt;
&lt;li&gt;An AWS account with permissions to create EKS clusters, VPCs, and IAM roles&lt;/li&gt;
&lt;li&gt;AWS access key ID and secret access key&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div
class="tab-pane fade pt-sm-3"
id="tab-pane-1645"
role="tabpanel"
aria-labelledby="tab-1645"
tabindex="0"&gt;&lt;ul&gt;
&lt;li&gt;A GCP account with permissions to create GKE clusters, VPCs, and IAM roles&lt;/li&gt;
&lt;li&gt;A GCP service account JSON key&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h2 id="set-up-the-inferencegateway"&gt;Set up the InferenceGateway &lt;a class="anchor-link" id="set-up-the-inferencegateway" href="#set-up-the-inferencegateway" aria-label="Link to this section: Set up the InferenceGateway"&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;InferenceGateway&lt;/code&gt; installs Traefik Proxy and MetalLB on the control plane.
Traefik routes inference traffic to model replicas. MetalLB assigns Traefik&amp;rsquo;s
&lt;code&gt;LoadBalancer&lt;/code&gt; service an external IP on kind, which doesn&amp;rsquo;t have a cloud load
balancer. You need one named &lt;code&gt;default&lt;/code&gt; per control plane.&lt;/p&gt;</description></item><item><title>Deploying a model</title><link>https://docs.modelplane.ai/getting-started/deploying-a-model/</link><pubDate/><guid>https://docs.modelplane.ai/getting-started/deploying-a-model/</guid><description>&lt;p&gt;Now that the platform is provisioned, the ML team can declare what a model needs
with a &lt;code&gt;ModelDeployment&lt;/code&gt;. Describe the hardware requirements, and the scheduler
places the deployment on the capacity the platform team published.&lt;/p&gt;
&lt;h2 id="create-a-deployment"&gt;Create a deployment &lt;a class="anchor-link" id="create-a-deployment" href="#create-a-deployment" aria-label="Link to this section: Create a deployment"&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Create a namespace for the model:&lt;/p&gt;
&lt;div class="code-card"&gt;
&lt;div class="code-card__header"&gt;
&lt;span class="code-card__name"&gt;bash&lt;/span&gt;
&lt;div class="code-card__actions"&gt;
&lt;button class="code-card__btn code-card__copy" type="button" aria-label="Copy contents" title="Copy contents"&gt;
&lt;svg class="icon-main" width="15" height="15" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" focusable="false"&gt;&lt;path d="M4 1.5H3a2 2 0 0 0-2 2V14a2 2 0 0 0 2 2h10a2 2 0 0 0 2-2V3.5a2 2 0 0 0-2-2h-1v1h1a1 1 0 0 1 1 1V14a1 1 0 0 1-1 1H3a1 1 0 0 1-1-1V3.5a1 1 0 0 1 1-1h1z"/&gt;&lt;path d="M9.5 1a.5.5 0 0 1 .5.5v1a.5.5 0 0 1-.5.5h-3a.5.5 0 0 1-.5-.5v-1a.5.5 0 0 1 .5-.5zm-3-1A1.5 1.5 0 0 0 5 1.5v1A1.5 1.5 0 0 0 6.5 4h3A1.5 1.5 0 0 0 11 2.5v-1A1.5 1.5 0 0 0 9.5 0z"/&gt;&lt;/svg&gt;
&lt;svg class="icon-check" width="16" height="16" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" focusable="false"&gt;&lt;path d="M13.485 1.929a.75.75 0 0 1 .086 1.057l-7 8.5a.75.75 0 0 1-1.117.063l-3.5-3.5a.75.75 0 0 1 1.06-1.06l2.92 2.92 6.494-7.894a.75.75 0 0 1 1.057-.086z"/&gt;&lt;/svg&gt;
&lt;/button&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="code-card__body"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;kubectl create namespace ml-team&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The device selector matches against the capacity declared in the
&lt;code&gt;InferenceClass&lt;/code&gt;, not the pod&amp;rsquo;s resource requests. Any L4 node satisfies
&lt;code&gt;&amp;gt;= 20Gi&lt;/code&gt;, so this deployment runs on the cluster you just added:&lt;/p&gt;</description></item><item><title>Scale the platform</title><link>https://docs.modelplane.ai/getting-started/scale-the-platform/</link><pubDate/><guid>https://docs.modelplane.ai/getting-started/scale-the-platform/</guid><description>&lt;p&gt;You have one L4 cluster with a running model. In this guide, you&amp;rsquo;ll add two
larger-GPU clusters in different regions to grow the fleet available to the ML team.&lt;/p&gt;
&lt;p&gt;Provisioning two more clusters takes about 10 to 15 minutes.&lt;/p&gt;
&lt;h2 id="register-more-clusters"&gt;Register more clusters &lt;a class="anchor-link" id="register-more-clusters" href="#register-more-clusters" aria-label="Link to this section: Register more clusters"&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;ul class="nav nav-tabs" id="1011" role="tablist"&gt;
&lt;li class="nav-item" role="presentation"&gt;
&lt;button class="nav-link active "
id="tab-1987"
data-bs-toggle="tab"
data-bs-target="#tab-pane-1987"
type="button"
role="tab"
aria-controls="tab-pane-1987" aria-selected="false"&gt;EKS&lt;/button&gt;
&lt;/li&gt;
&lt;li class="nav-item" role="presentation"&gt;
&lt;button class="nav-link "
id="tab-1632"
data-bs-toggle="tab"
data-bs-target="#tab-pane-1632"
type="button"
role="tab"
aria-controls="tab-pane-1632" aria-selected="false"&gt;GKE&lt;/button&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="tab-content" id="Content-1011"&gt;
&lt;div
class="tab-pane fade active show pt-sm-3"
id="tab-pane-1987"
role="tabpanel"
aria-labelledby="tab-1987"
tabindex="0"&gt;&lt;p&gt;Register two more clusters with a bigger hardware class: &lt;code&gt;L40S&lt;/code&gt; (&lt;code&gt;48 Gi&lt;/code&gt;) in
&lt;code&gt;us-west&lt;/code&gt; and &lt;code&gt;eu-central&lt;/code&gt;:&lt;/p&gt;</description></item><item><title>Scale the model</title><link>https://docs.modelplane.ai/getting-started/scale-the-model/</link><pubDate/><guid>https://docs.modelplane.ai/getting-started/scale-the-model/</guid><description>&lt;p&gt;A &lt;code&gt;ModelService&lt;/code&gt; can front more than one &lt;code&gt;ModelDeployment&lt;/code&gt;. Here you add a second
deployment, pinned to a different region, and point the same service at both. The
endpoint you already curled stays the same. Behind it, traffic now load-balances
across two regions.&lt;/p&gt;
&lt;pre class="mermaid"&gt;graph LR
subgraph fleet ["Fleet"]
IC1["us-east\nL4"]
IC2["us-west\nlarger GPU"]
end
subgraph ml ["ML team"]
MD1["ModelDeployment\nqwen-demo"]
MD2["ModelDeployment\nqwen-west\nclusterSelector: us-west"]
MS["ModelService qwen\n/ml-team/qwen/v1/..."]
end
IC1 --&gt; MD1
IC2 --&gt; MD2
MD1 --&gt; MS
MD2 --&gt; MS
&lt;/pre&gt;
&lt;h2 id="deploy-to-a-second-region"&gt;Deploy to a second region &lt;a class="anchor-link" id="deploy-to-a-second-region" href="#deploy-to-a-second-region" aria-label="Link to this section: Deploy to a second region"&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The new deployment uses a &lt;code&gt;clusterSelector&lt;/code&gt; to pin its replica to the &lt;code&gt;us-west&lt;/code&gt;
cluster you added in the last step, and selects the larger GPU there:&lt;/p&gt;</description></item><item><title>Clean up</title><link>https://docs.modelplane.ai/getting-started/clean-up/</link><pubDate/><guid>https://docs.modelplane.ai/getting-started/clean-up/</guid><description>&lt;p&gt;Delete the model resources, clusters, and finally the control plane.&lt;/p&gt;
&lt;h2 id="delete-model-resources"&gt;Delete model resources &lt;a class="anchor-link" id="delete-model-resources" href="#delete-model-resources" aria-label="Link to this section: Delete model resources"&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Delete model resources before clusters. Deleting a cluster first leaves the
deployments reconciling against infrastructure that no longer exists.&lt;/p&gt;
&lt;div class="code-card"&gt;
&lt;div class="code-card__header"&gt;
&lt;span class="code-card__name"&gt;bash&lt;/span&gt;
&lt;div class="code-card__actions"&gt;
&lt;button class="code-card__btn code-card__copy" type="button" aria-label="Copy contents" title="Copy contents"&gt;
&lt;svg class="icon-main" width="15" height="15" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" focusable="false"&gt;&lt;path d="M4 1.5H3a2 2 0 0 0-2 2V14a2 2 0 0 0 2 2h10a2 2 0 0 0 2-2V3.5a2 2 0 0 0-2-2h-1v1h1a1 1 0 0 1 1 1V14a1 1 0 0 1-1 1H3a1 1 0 0 1-1-1V3.5a1 1 0 0 1 1-1h1z"/&gt;&lt;path d="M9.5 1a.5.5 0 0 1 .5.5v1a.5.5 0 0 1-.5.5h-3a.5.5 0 0 1-.5-.5v-1a.5.5 0 0 1 .5-.5zm-3-1A1.5 1.5 0 0 0 5 1.5v1A1.5 1.5 0 0 0 6.5 4h3A1.5 1.5 0 0 0 11 2.5v-1A1.5 1.5 0 0 0 9.5 0z"/&gt;&lt;/svg&gt;
&lt;svg class="icon-check" width="16" height="16" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" focusable="false"&gt;&lt;path d="M13.485 1.929a.75.75 0 0 1 .086 1.057l-7 8.5a.75.75 0 0 1-1.117.063l-3.5-3.5a.75.75 0 0 1 1.06-1.06l2.92 2.92 6.494-7.894a.75.75 0 0 1 1.057-.086z"/&gt;&lt;/svg&gt;
&lt;/button&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="code-card__body"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;kubectl delete md --all -n ml-team
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;kubectl delete ms --all -n ml-team&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Wait for all model replicas to finish:&lt;/p&gt;</description></item></channel></rss>