<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Overview on Modelplane Docs</title><link>https://docs.modelplane.ai/overview/</link><description>Recent content in Overview on Modelplane Docs</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Mon, 01 Jan 0001 00:00:00 +0000</lastBuildDate><atom:link href="https://docs.modelplane.ai/overview/index.xml" rel="self" type="application/rss+xml"/><item><title>Why Modelplane</title><link>https://docs.modelplane.ai/overview/why/</link><pubDate/><guid>https://docs.modelplane.ai/overview/why/</guid><description>&lt;!-- vale write-good.TooWordy = NO --&gt;
&lt;!-- vale write-good.Passive = NO --&gt;
&lt;p&gt;Open-weight models are becoming the model of choice for many organizations: they can be
post-trained, including with reinforcement learning, to compete with frontier
models, and they put cost, governance, and data sovereignty back under the
organization&amp;rsquo;s control. As adoption grows, platform teams are
increasingly asked to provide GPU inference to their ML and development teams the
same way they already provide cloud infrastructure.&lt;/p&gt;</description></item><item><title>How Modelplane works</title><link>https://docs.modelplane.ai/overview/how-it-works/</link><pubDate/><guid>https://docs.modelplane.ai/overview/how-it-works/</guid><description>&lt;!-- vale write-good.Passive = NO --&gt;
&lt;p&gt;Modelplane runs as a control plane on its own cluster, the &lt;strong&gt;control cluster&lt;/strong&gt;,
above the &lt;strong&gt;inference clusters&lt;/strong&gt; that actually serve models. It&amp;rsquo;s built on
&lt;a href="https://crossplane.io"&gt;Crossplane&lt;/a&gt;: platform teams and developers describe what
they want as Kubernetes resources, and Modelplane continuously reconciles the
fleet to match, composing the clusters, scheduling replicas, and exposing
endpoints. This page is the full tour. It covers the architecture and resources, then walks through what happens when you deploy a model.&lt;/p&gt;</description></item><item><title>FAQ</title><link>https://docs.modelplane.ai/overview/faq/</link><pubDate/><guid>https://docs.modelplane.ai/overview/faq/</guid><description>&lt;!-- vale write-good.TooWordy = NO --&gt;
&lt;!-- vale write-good.Passive = NO --&gt;
&lt;p&gt;Short answers to the questions that come up first, with links to the full
treatment. If you&amp;rsquo;re new here, read the &lt;a href="https://docs.modelplane.ai/overview/"&gt;Introduction&lt;/a&gt;
and &lt;a href="https://docs.modelplane.ai/overview/how-it-works/"&gt;How Modelplane works&lt;/a&gt; first.&lt;/p&gt;
&lt;h2 id="what-modelplane-is"&gt;What Modelplane is &lt;a class="anchor-link" id="what-modelplane-is" href="#what-modelplane-is" aria-label="Link to this section: What Modelplane is"&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;details class="mp-faq"&gt;
&lt;summary class="mp-faq-q"&gt;Is Modelplane a serving engine like vLLM?&lt;/summary&gt;
&lt;div class="mp-faq-a"&gt;
No, Modelplane is the control plane &lt;em&gt;above&lt;/em&gt; the engine. It composes serving
engines like vLLM, SGLang, and NVIDIA TensorRT-LLM, and operates them across a
fleet of clusters. It doesn&amp;rsquo;t serve tokens itself. You bring the engine; Modelplane schedules
it, routes to it, scales it, and caches its weights across your inference fleet.
&lt;/div&gt;
&lt;/details&gt;
&lt;details class="mp-faq"&gt;
&lt;summary class="mp-faq-q"&gt;Does Modelplane replace vLLM or SGLang?&lt;/summary&gt;
&lt;div class="mp-faq-a"&gt;
No. vLLM and SGLang run the model; Modelplane runs the fleet. A &lt;code&gt;ModelDeployment&lt;/code&gt; carries
your engine container and its flags, and Modelplane composes it onto the right
cluster. Switching or upgrading engines is a change to your deployment, not to
Modelplane.
&lt;/div&gt;
&lt;/details&gt;
&lt;details class="mp-faq"&gt;
&lt;summary class="mp-faq-q"&gt;How is Modelplane different from KServe or NVIDIA Dynamo?&lt;/summary&gt;
&lt;div class="mp-faq-a"&gt;
Scope. KServe and Dynamo are cluster orchestrators: they schedule, scale, route,
and cache within a single Kubernetes cluster. Modelplane performs those operations across a
fleet of clusters, clouds, and regions. Like KServe and Dynamo, Modelplane uses llm-d for
multi-node serving and KV-cache management. Modelplane plans deeper integration
with NVIDIA Dynamo in future releases.
&lt;/div&gt;
&lt;/details&gt;
&lt;details class="mp-faq"&gt;
&lt;summary class="mp-faq-q"&gt;How is Modelplane different from a managed provider like Baseten or Fireworks?&lt;/summary&gt;
&lt;div class="mp-faq-a"&gt;
Managed providers run fleet-scale serving inside their own closed platforms.
Modelplane is the open equivalent, running in infrastructure you own. The
difference isn&amp;rsquo;t scope: Modelplane is open, community-driven, and neutral
across the stack. You can still route to a managed provider from Modelplane.
&lt;/div&gt;
&lt;/details&gt;
&lt;h2 id="what-it-supports"&gt;What it supports &lt;a class="anchor-link" id="what-it-supports" href="#what-it-supports" aria-label="Link to this section: What it supports"&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;details class="mp-faq"&gt;
&lt;summary class="mp-faq-q"&gt;What models does Modelplane support?&lt;/summary&gt;
&lt;div class="mp-faq-a"&gt;
Modelplane supports open-weight models, custom models, and just about
anything you can download from Hugging Face, NVIDIA NGC, and other registries.
&lt;/div&gt;
&lt;/details&gt;
&lt;details class="mp-faq"&gt;
&lt;summary class="mp-faq-q"&gt;Does Modelplane support NVIDIA?&lt;/summary&gt;
&lt;div class="mp-faq-a"&gt;
&lt;p&gt;Yes, across the stack. NVIDIA GPUs are the most widely available accelerators on the
clouds Modelplane runs on and its primary target today. Modelplane binds NVIDIA
GPUs to pods through Dynamic Resource Allocation (DRA), matching devices by
attributes such as GPU memory and architecture with CEL selectors.&lt;/p&gt;</description></item><item><title>Glossary</title><link>https://docs.modelplane.ai/overview/glossary/</link><pubDate/><guid>https://docs.modelplane.ai/overview/glossary/</guid><description>&lt;h2 id="modelplane"&gt;Modelplane &lt;a class="anchor-link" id="modelplane" href="#modelplane" aria-label="Link to this section: Modelplane"&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The open source control plane software. You install Modelplane on a Kubernetes
cluster (the &lt;strong&gt;control cluster&lt;/strong&gt;). Modelplane never serves tokens itself; it
orchestrates the clusters and engines that do.&lt;/p&gt;
&lt;h2 id="control-cluster"&gt;Control cluster &lt;a class="anchor-link" id="control-cluster" href="#control-cluster" aria-label="Link to this section: Control cluster"&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The Kubernetes cluster where Modelplane runs. It needs no GPUs. It holds
Modelplane&amp;rsquo;s Crossplane-based components and the API resources you apply to
declare your fleet.&lt;/p&gt;
&lt;h2 id="inference-cluster"&gt;Inference cluster &lt;a class="anchor-link" id="inference-cluster" href="#inference-cluster" aria-label="Link to this section: Inference cluster"&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;A GPU cluster in the fleet where serving engines run and tokens are produced.
Modelplane can provision inference clusters on EKS, GKE, and other providers, or
you can bring your own through an &lt;code&gt;InferenceCluster&lt;/code&gt; with &lt;code&gt;source: Existing&lt;/code&gt;.&lt;/p&gt;</description></item><item><title>AI tools</title><link>https://docs.modelplane.ai/overview/ai-tools/</link><pubDate/><guid>https://docs.modelplane.ai/overview/ai-tools/</guid><description>&lt;!-- vale write-good.TooWordy = NO --&gt;
&lt;p&gt;The Modelplane docs are built to be read by AI assistants as well as people. You
can connect a coding agent directly to this site, pull any page as Markdown, or
point a model at a single index file that lists the whole documentation set.
Every page also carries a &lt;strong&gt;Copy page&lt;/strong&gt; menu next to its title with the same
shortcuts.&lt;/p&gt;
&lt;h2 id="connect-to-the-mcp-server"&gt;Connect to the MCP server &lt;a class="anchor-link" id="connect-to-the-mcp-server" href="#connect-to-the-mcp-server" aria-label="Link to this section: Connect to the MCP server"&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The documentation MCP server lets an assistant search these docs and read any
page in real time, so its answers track the current content instead of its
training data. It exposes two tools:&lt;/p&gt;</description></item></channel></rss>