feat(router): add model router integration with complexity-based routing (#2202)

## Summary

- Add NVIDIA LLM Router v3 (prefill-based) as an inference provider, enabling complexity-based routing that automatically picks the most efficient model for each query
- Add Model Router as a provider option in the onboard wizard, following the same pattern as Ollama/NIM (onboard starts the service, configures the provider)
- Router runs on the host; sandbox reaches it through the OpenShell gateway L7 proxy

## Changes

- Add llm-router v3 as a git submodule at nemoclaw-blueprint/router/llm-router/
- Add pool-config.yaml defining the model pool and routing parameters
- Add routed inference profile to blueprint.yaml with model name nvidia-routed
- Add router startup logic to onboard wizard (startModelRouter in onboard.ts)
- Add router schema to schemas/blueprint.schema.json
- Remove router lifecycle code from blueprint runner (onboard owns service startup)
- Remove model-router-toolkit install from Dockerfile (router is host-only)
- Add model-router-toolkit install to scripts/install.sh
- Update README with Model Router documentation and architecture

## Verification

- [x] npx prek run --all-files passes (hadolint and CLI test failures are pre-existing)
- [x] npm test passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes

## Test plan

- [x] Unit tests pass (322 plugin tests, 10 files)
- [x] nemoclaw onboard --non-interactive with NEMOCLAW_PROVIDER=routed completes all 8 steps
- [x] Router starts on port 4000, both pool models healthy
- [x] Inference works end-to-end: sandbox to OpenShell gateway to router to NVIDIA API
- [x] nvidia-routed model name resolves via model_group_alias in LiteLLM proxy config
- [x] Routing strategy injects eagerly at startup (no lazy first-request 400 error)

Made with [Cursor](https://cursor.com)

Signed-off-by: Vinay Bagade <vbagade@nvidia.com>

## Summary by CodeRabbit

* **New Features**
  * Optional Model Router: complexity-based routed inference selectable during onboarding; host router runs on port 4000 with configurable model pool and provider mapping; onboarding now starts and monitors the router and persists its process ID.
  * Installer detects and installs the Model Router when present.
* **Documentation**
  * README updated with Model Router architecture, enablement, and pool configuration guidance.
* **Tests**
  * New tests covering onboarding and routed/blueprint scenarios.

---------

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Co-authored-by: Vinay Bagade <vbagade@nvidia.com>
Co-authored-by: Aaron Erickson <aerickson@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-07-03 03:37:16 +00:00 · 2026-05-06 15:09:30 -07:00 · 2026-05-06 15:09:30 -07:00 · ca1d6b84a5
commit ca1d6b84a5
parent 283ee50d83
14 changed files with 992 additions and 3 deletions
--- a/.agents/skills/nemoclaw-user-triage-instructions/SKILL.md
+++ b/.agents/skills/nemoclaw-user-triage-instructions/SKILL.md
@ -0,0 +1,144 @@
+---
+name: "nemoclaw-user-triage-instructions"
+description: "AI-assisted label triage instructions for NVIDIA/NemoClaw issues and PRs. Single source of truth for the nemoclaw-maintainer-triage CLI skill and the nvoss-velocity dashboard."
+---
+
+<!-- SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -->
+<!-- SPDX-License-Identifier: Apache-2.0 -->
+
+# NemoClaw User Triage Instructions
+
+AI-assisted label triage instructions for NVIDIA/NemoClaw issues and PRs. Single source of truth for the nemoclaw-maintainer-triage CLI skill and the nvoss-velocity dashboard.
+
+This document is the single source of truth for AI-assisted label triage on NVIDIA/NemoClaw issues and PRs.
+It is read at runtime by the `nemoclaw-maintainer-triage` CLI skill and fetched at generation time by the nvoss-velocity dashboard.
+
+---
+
+## Step 1: Role
+
+You are a GitHub issue and PR labeler for NemoClaw, NVIDIA's open-source agentic AI assistant framework.
+
+For each item:
+
+1. Assign 1–5 labels from the provided list that best match the content. Be thorough — if a bug also involves a specific platform and is a good first issue, assign all applicable labels. Only skip a label if it genuinely does not apply.
+2. Write a short triage comment appropriate to the item's tier (see Comment Tiers below).
+
+---
+
+## Step 2: Output Format
+
+Return ONLY valid JSON — no markdown fences, no explanation:
+
+```json
+{"results": [{"number": 123, "labels": ["bug", "good first issue"], "reason": "One sentence explaining label choices.", "comment": "Comment text."}]}
+```
+
+Fields:
+
+- `number` — the issue or PR number
+- `labels` — array of label names, exactly as provided in the label list
+- `reason` — one concise sentence explaining why these labels apply
+- `comment` — triage comment text (see Comment Tiers)
+
+---
+
+## Step 3: Label Assignment Rules
+
+- Use only label names exactly as provided in the label list
+- Assign 1–5 labels per item — apply every label that genuinely fits
+- If a specific `enhancement: *` sub-label is assigned, do NOT also assign the bare `enhancement` label — the sub-label is sufficient
+- If genuinely unclear, assign `question`
+
+---
+
+## Step 4: Skip Labels
+
+Never assign these — they require human judgment:
+
+- `duplicate`
+- `invalid`
+- `wontfix`
+- `priority: medium`
+- `priority: low`
+- `status: triage`
+- `NV QA`
+
+`priority: high` is allowed ONLY when the issue clearly blocks critical functionality, causes data loss, or describes a production outage — not based on the author's frustration or urgency language alone.
+
+---
+
+## Step 5: Label Guide
+
+Use these descriptions to match labels to issue/PR content:
+
+- `bug`: User reports something broken — unexpected error, crash, exception, traceback, "not working", "fails", "broken", unexpected behavior
+- `enhancement`: Generic enhancement — use only if none of the specific `enhancement: *` sub-types clearly apply
+- `enhancement: feature`: Request for a new capability — "would be great if", "feature request", "add support for", "please add"
+- `enhancement: inference`: Inference routing, model support, provider configuration
+- `enhancement: security`: Security controls, policies, audit logging
+- `enhancement: policy`: Network policy, egress rules, sandbox policy
+- `enhancement: ui`: CLI UX, output formatting, terminal display
+- `enhancement: platform`: Cross-platform support (pair with a `Platform: *` label)
+- `enhancement: provider`: Cloud or inference provider support (pair with a `Provider: *` label)
+- `enhancement: performance`: Speed, resource usage, memory, latency
+- `enhancement: reliability`: Stability, error handling, recovery, retries
+- `enhancement: testing`: Test coverage, CI/CD quality, test infrastructure
+- `enhancement: MCP`: MCP protocol support, tool integration
+- `enhancement: CI/CD`: Pipeline, build system, automation
+- `enhancement: documentation`: Docs improvements, examples, guides
+- `question`: Asking how to do something — "how do I", "is it possible", "does X support"
+- `documentation`: Missing or incorrect docs, README errors, API doc gaps
+- `good first issue`: Small well-scoped fix, doc typo, clear simple change — easy entry point for new contributors
+- `help wanted`: Clear fix or improvement that needs a community contribution
+- `security`: Auth issues, API key exposure, CVE, vulnerability, unauthorized access
+- `status: needs-info`: Issue or PR has no description, no reproduction steps, or so little detail the team cannot act on it
+- `priority: high`: Issue blocks critical functionality, causes data loss, or describes a production outage — apply only when the report clearly describes severe, reproducible impact
+- `Platform: MacOS`: Issue specific to macOS, Mac OS X, or Apple Silicon (M1/M2/M3/M4). Apply when the user mentions macOS, Darwin, Homebrew, or Mac-specific behavior
+- `Platform: Windows`: Issue specific to Windows OS. Apply when the user mentions Windows, Win32, PowerShell, WSL, or Windows-specific errors
+- `Platform: Linux`: Issue specific to Linux. Apply when the user mentions a Linux distro (Ubuntu, CentOS, RHEL, Debian, etc.) or Linux-specific behavior
+- `Platform: DGX Spark`: Issue specific to DGX Spark hardware or software environment
+- `Platform: Brev`: Issue specific to the Brev.dev cloud environment
+- `Platform: ARM64`: Issue specific to ARM64 / aarch64 architecture
+- `Integration: Slack`: Issue or feature involving the Slack integration or Slack bridge
+- `Integration: Discord`: Issue or feature involving the Discord integration
+- `Integration: Telegram`: Issue or feature involving the Telegram integration
+- `Integration: GitHub`: Issue or feature involving GitHub-specific behavior (not the repo itself)
+- `Provider: NVIDIA`: Issue or feature specific to NVIDIA inference endpoints or NIM
+- `Provider: OpenAI`: Issue or feature specific to OpenAI API or models
+- `Provider: Anthropic`: Issue or feature specific to Anthropic / Claude models
+- `Provider: Azure`: Issue or feature specific to Azure OpenAI or Azure cloud
+- `Provider: AWS`: Issue or feature specific to AWS Bedrock or AWS cloud
+- `Provider: GCP`: Issue or feature specific to Google Cloud / Vertex AI
+
+---
+
+## Step 6: Comment Tiers
+
+Items are classified as `quality_tier` or `standard_tier` before generation. This is passed in the item metadata.
+
+- **quality_tier** (influencer author, company-affiliated author, or body > 800 chars): Write 2–3 sentences. Start with "Thanks," then naturally reference specific details from the body. Avoid "I've taken a look at", "I've reviewed", "it appears to", "I can see that" — these sound bot-generated. Write like a human maintainer giving a warm, specific response.
+- **standard_tier**: Write 1 sentence acknowledging the report and mentioning the labels applied.
+
+---
+
+## Step 7: Tone Rules (strictly enforced)
+
+- Use "could" not "should"; use "may" not "will" — this is a first response, not a commitment
+- Never say "Thanks for fixing" — say "Thanks for the proposed fix" or "Thanks for submitting this"
+- Never say "Thanks for adding" — say "Thanks for the suggested addition"
+- Never claim the submission accomplishes something before review
+- Do not say "I'll" or "we'll"
+- For issues (bugs, questions, enhancements): use "this identifies a..." or "this reports a..."
+- For PRs: use "this proposes a way to..."
+- For security-related items: never confirm a vulnerability is real; use neutral language
+- Do NOT open with praise about detail or thoroughness. Only reference the quality of the report if the body is genuinely exceptional — multiple reproduction steps, version info, logs, and clear expected vs actual behavior. For most reports, skip the praise entirely and go straight to the triage acknowledgment.
+- Do not add generic closing filler phrases
+- If a "Spam signal:" line is present in the item metadata, assign only `status: needs-info` and ask for more detail politely
+- If a "Note: Author also opened..." line is present, briefly acknowledge if the relationship is plausible
+
+---
+
+## Related Skills
+
+- `nemoclaw-user-skills-coding` — Agent Skills — all available maintainer and user skills
--- a/.gitmodules
+++ b/.gitmodules
@ -0,0 +1,4 @@
+[submodule "nemoclaw-blueprint/router/llm-router"]
+	path = nemoclaw-blueprint/router/llm-router
+	url = https://github.com/NVIDIA-AI-Blueprints/llm-router.git
+	branch = v3
--- a/README.md
+++ b/README.md
@ -138,6 +138,58 @@ Alternatively, send a single message and print the response:
 openclaw agent --agent main --local -m "hello" --session-id test
 ```

+### Model Router (Complexity-Based Routing)
+
+NemoClaw includes an optional model router that automatically picks the most efficient model for each query. Instead of sending every request to a single large model, the router uses a lightweight encoder to predict which model in a pool can handle each query correctly, then routes to the cheapest one that meets an accuracy threshold.
+
+The router uses the [NVIDIA LLM Router v3](https://github.com/NVIDIA-AI-Blueprints/llm-router/tree/v3) prefill routing engine and runs on the host as a LiteLLM proxy. The sandbox reaches it through the OpenShell gateway and continues to call `https://inference.local/v1`; do not probe `localhost:4000` or `host.openshell.internal` directly from inside the sandbox.
+
+#### Enable during onboard
+
+Select **Model Router (complexity-based routing)** during the onboard wizard, or set `NEMOCLAW_PROVIDER=routed` for non-interactive mode:
+
+```bash
+NEMOCLAW_PROVIDER=routed nemoclaw onboard --non-interactive
+```
+
+The onboard wizard starts the router, configures the OpenShell provider, and creates the sandbox. The router process runs on the host on port 4000.
+
+#### Configure the model pool
+
+Edit `nemoclaw-blueprint/router/pool-config.yaml` to define which models the router can choose from:
+
+```yaml
+routing:
+  method: prefill
+  checkpoint: llm-router/checkpoints/prefill_router_qwen08b.pt
+  tolerance: 0.20
+  encoder: Qwen/Qwen3.5-0.8B
+
+models:
+  - name: nano
+    litellm_model: "openai/nvidia/nvidia/Nemotron-3-Nano-30B-A3B"
+    cost_per_m_input_tokens: 0.05
+    api_base: "https://inference-api.nvidia.com"
+
+  - name: super
+    litellm_model: "openai/nvidia/nvidia/nemotron-3-super-v3"
+    cost_per_m_input_tokens: 0.10
+    api_base: "https://inference-api.nvidia.com"
+```
+
+The `tolerance` parameter controls the accuracy-cost tradeoff: 0.0 always picks the most accurate model, 1.0 always picks the cheapest, and 0.20 (the default) lets the router pick a cheaper model whose predicted accuracy is up to 20 percentage points below the best model's.
+
+#### Architecture
+
+The router runs on the host, not inside the sandbox:
+
+```text
+Sandbox (OpenClaw) ──> OpenShell Gateway (L7 proxy) ──> Model Router (:4000) ──> NVIDIA API
+                                                         └── PrefillRouter selects model
+```
+
+Credentials flow through the OpenShell provider system. The sandbox never sees raw API keys.
+
 ### Uninstall

 To remove NemoClaw and all resources created during setup, run the CLI's built-in uninstall command:
@ -197,6 +249,9 @@ NemoClaw/
 │       ├── commands/     # Slash commands, migration state
 │       └── onboard/      # Onboarding config
 ├── nemoclaw-blueprint/   # Blueprint YAML and network policies
+│   └── router/
+│       ├── pool-config.yaml  # Model pool and routing config
+│       └── llm-router/      # LLM Router v3 submodule (prefill routing engine)
 ├── scripts/          # Install helpers, setup, automation
 ├── test/             # Integration and E2E tests
 └── docs/             # User-facing docs (Sphinx/MyST)
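
Reviewer note: the `tolerance` rule documented in this commit (cheapest model whose predicted accuracy is within `tolerance` of the best) can be sketched as follows. This is an illustrative sketch only. The actual PrefillRouter lives in the llm-router submodule and is not part of this diff, so the `Candidate` shape, the `selectModel` name, and the sample probabilities are all assumptions.

```typescript
// Hypothetical candidate shape; field names are illustrative, not the router's API.
interface Candidate {
  name: string;
  pCorrect: number; // predicted P(correct) for this query, in [0, 1]
  costPerMInputTokens: number;
}

// Pick the cheapest model whose predicted accuracy is within `tolerance`
// of the best prediction in the pool.
function selectModel(candidates: Candidate[], tolerance: number): Candidate {
  if (candidates.length === 0) {
    throw new Error("model pool is empty");
  }
  const best = candidates.reduce((m, c) => (c.pCorrect > m ? c.pCorrect : m), 0);
  const eligible = candidates.filter((c) => c.pCorrect >= best - tolerance);
  // On equal cost the earlier pool entry wins.
  return eligible.reduce((a, b) =>
    b.costPerMInputTokens < a.costPerMInputTokens ? b : a,
  );
}

// Sample pool using the costs from the README example; pCorrect values are made up.
const pool: Candidate[] = [
  { name: "nano", pCorrect: 0.8, costPerMInputTokens: 0.05 },
  { name: "super", pCorrect: 0.9, costPerMInputTokens: 0.1 },
];

const strict = selectModel(pool, 0.0).name; // only the best prediction qualifies
const relaxed = selectModel(pool, 0.2).name; // nano is within 20pp and cheaper
const cheapest = selectModel(pool, 1.0).name; // every model qualifies
```

With these made-up predictions, a tolerance of 0.0 keeps only `super`, while 0.20 and 1.0 both admit `nano`, which then wins on cost.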
--- a/nemoclaw-blueprint/blueprint.yaml
+++ b/nemoclaw-blueprint/blueprint.yaml
@ -18,6 +18,7 @@ profiles:
  - ncp
  - nim-local
  - vllm
+  - routed

 description: |
  NemoClaw blueprint: orchestrates OpenClaw sandbox creation, migration,
@ -70,6 +71,19 @@ components:
        credential_default: "dummy"
        timeout_secs: 180

+      routed:
+        provider_type: "openai"
+        provider_name: "nvidia-router"
+        endpoint: "http://localhost:4000/v1"
+        model: "nvidia-routed"
+        credential_env: "NVIDIA_API_KEY"
+        timeout_secs: 180
+
+  router:
+    enabled: true
+    port: 4000
+    pool_config_path: "router/pool-config.yaml"
+
  policy:
    base: "sandboxes/openclaw/policy.yaml"
    additions:
--- a/nemoclaw-blueprint/router/llm-router
+++ b/nemoclaw-blueprint/router/llm-router
@ -0,0 +1 @@
+Subproject commit 2bd8dfaa751efb60aa4e7e49b270490dfbc0a68a
--- a/nemoclaw-blueprint/router/pool-config.yaml
+++ b/nemoclaw-blueprint/router/pool-config.yaml
@ -0,0 +1,36 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Model Router Toolkit pool config for complexity-based routing.
+# Uses the NVIDIA LLM Router v3 (https://github.com/NVIDIA-AI-Blueprints/llm-router)
+# to route requests to the most efficient model that meets an accuracy threshold.
+#
+# The router runs an encoder (Qwen3.5-0.8B) on each query to predict P(correct)
+# per model, then selects the cheapest model above the tolerance threshold.
+#
+# tolerance controls the accuracy-cost tradeoff:
+#   0.0  = always pick the highest-confidence model (most expensive)
+#   0.20 = allow up to 20pp below the best for a cheaper model (default)
+#   1.0  = always pick the cheapest model
+
+routing:
+  method: prefill
+  checkpoint: llm-router/checkpoints/prefill_router_qwen08b.pt
+  tolerance: 0.20
+  encoder: Qwen/Qwen3.5-0.8B
+  encoder_backend: transformers
+
+models:
+  - name: nemotron-3-nano-reasoning
+    display_name: "Nemotron 3 Nano (Reasoning)"
+    litellm_model: "openai/nvidia/nvidia/Nemotron-3-Nano-30B-A3B"
+    cost_per_m_input_tokens: 0.05
+    cost_per_m_output_tokens: 0.20
+    api_base: "https://inference-api.nvidia.com"
+
+  - name: nemotron-3-super
+    display_name: "Nemotron 3 Super 120B"
+    litellm_model: "openai/nvidia/nvidia/nemotron-3-super-v3"
+    cost_per_m_input_tokens: 0.10
+    cost_per_m_output_tokens: 0.40
+    api_base: "https://inference-api.nvidia.com"
--- a/nemoclaw/src/blueprint/runner.test.ts
+++ b/nemoclaw/src/blueprint/runner.test.ts
@ -122,6 +122,38 @@ function minimalBlueprint(overrides?: Record<string, unknown>): Record<string, u
  };
 }

+function routedBlueprint(): Record<string, unknown> {
+  return {
+    version: "1.0",
+    components: {
+      inference: {
+        profiles: {
+          routed: {
+            provider_type: "openai",
+            provider_name: "nvidia-router",
+            endpoint: "http://localhost:4000/v1",
+            model: "routed",
+            credential_env: "NVIDIA_API_KEY",
+            credential_default: "router-local",
+            timeout_secs: 180,
+          },
+        },
+      },
+      sandbox: {
+        image: "openclaw",
+        name: "test-sandbox",
+        forward_ports: [18789],
+      },
+      router: {
+        enabled: true,
+        port: 4000,
+        pool_config_path: "router/pool-config.yaml",
+      },
+      policy: { additions: {} },
+    },
+  };
+}
+
 function seedBlueprintFile(bp?: Record<string, unknown>): void {
  addFile("blueprint.yaml", YAML.stringify(bp ?? minimalBlueprint()));
 }
@ -378,6 +410,25 @@ describe("runner", () => {
      expect(out).toContain("PROGRESS:10:Validating blueprint");
      expect(out).toContain("PROGRESS:100:Plan complete");
    });
+
+    it("includes router info when router is enabled", async () => {
+      captureStdout();
+      mockExeca.mockResolvedValue({ exitCode: 0 });
+
+      const plan = await actionPlan("routed", routedBlueprint());
+      expect(plan.router.enabled).toBe(true);
+      expect(plan.router.port).toBe(4000);
+      expect(plan.router.pool_config_path).toBe("router/pool-config.yaml");
+    });
+
+    it("defaults router to disabled when not in blueprint", async () => {
+      captureStdout();
+      mockExeca.mockResolvedValue({ exitCode: 0 });
+
+      const plan = await actionPlan("default", minimalBlueprint());
+      expect(plan.router.enabled).toBe(false);
+      expect(plan.router.port).toBe(4000);
+    });
  });

  describe("actionApply", () => {
@ -679,6 +730,24 @@ describe("runner", () => {
      if (!inferenceCall) throw new Error("inference set call not found");
      expect(inferenceCall[1]).not.toContain("--timeout");
    });
+
+    it("passes endpoint as-is from blueprint (no rewriting)", async () => {
+      process.env.NVIDIA_API_KEY = "test-key";
+      try {
+        await actionApply("routed", routedBlueprint());
+
+        const providerCall = mockExeca.mock.calls.find(
+          (c) => Array.isArray(c[1]) && c[1].includes("provider"),
+        );
+        if (!providerCall) throw new Error("provider create call not found");
+        const configArg = (providerCall[1] as string[]).find((a: string) =>
+          a.startsWith("OPENAI_BASE_URL="),
+        );
+        expect(configArg).toBe("OPENAI_BASE_URL=http://localhost:4000/v1");
+      } finally {
+        delete process.env.NVIDIA_API_KEY;
+      }
+    });
  });

  describe("actionStatus", () => {
--- a/nemoclaw/src/blueprint/runner.ts
+++ b/nemoclaw/src/blueprint/runner.ts
@ -50,6 +50,10 @@ function isOptionalFiniteNumber(value: unknown): value is number | undefined {
  return value === undefined || (typeof value === "number" && Number.isFinite(value));
 }

+function isOptionalBoolean(value: unknown): value is boolean | undefined {
+  return value === undefined || typeof value === "boolean";
+}
+
 function isValidPort(value: unknown): value is number {
  return typeof value === "number" && Number.isInteger(value) && value >= 1 && value <= 65535;
 }
@ -142,6 +146,20 @@ function isBlueprint(value: unknown): value is Blueprint {
    }
  }

+  const router = components.router;
+  if (router !== undefined) {
+    if (!isObjectLike(router)) {
+      return false;
+    }
+    if (
+      !isOptionalBoolean(router.enabled) ||
+      !(router.port === undefined || isValidPort(router.port)) ||
+      !isOptionalString(router.pool_config_path)
+    ) {
+      return false;
+    }
+  }
+
  const policy = components.policy;
  if (policy !== undefined) {
    if (!isObjectLike(policy)) {
@ -199,6 +217,7 @@ interface Blueprint {
      profiles?: InferenceProfileMap;
    };
    sandbox?: SandboxConfig;
+    router?: RouterConfig;
    policy?: {
      additions?: PolicyAdditions;
    };
@ -221,6 +240,14 @@ interface SandboxConfig {
  forward_ports?: number[];
 }

+interface RouterConfig {
+  enabled?: boolean;
+  port?: number;
+  pool_config_path?: string;
+}
+
+const DEFAULT_ROUTER_PORT = 4000;
+
 export function loadBlueprint(): Blueprint {
  const blueprintPath = process.env.NEMOCLAW_BLUEPRINT_PATH ?? ".";
  const bpFile = join(blueprintPath, "blueprint.yaml");
@ -272,6 +299,7 @@ async function resolveRunConfig(
  inferenceProfiles: InferenceProfileMap;
  inferenceCfg: InferenceProfile;
  sandboxCfg: SandboxConfig;
+  routerCfg: RouterConfig;
 }> {
  const inferenceProfiles = blueprint.components?.inference?.profiles ?? {};
  if (!(profile in inferenceProfiles)) {
@ -297,7 +325,8 @@ async function resolveRunConfig(
  }

  const sandboxCfg = blueprint.components?.sandbox ?? {};
-  return { inferenceProfiles, inferenceCfg, sandboxCfg };
+  const routerCfg = blueprint.components?.router ?? {};
+  return { inferenceProfiles, inferenceCfg, sandboxCfg, routerCfg };
 }

 // ── Actions ─────────────────────────────────────────────────────
@ -317,6 +346,11 @@ export interface RunPlan {
    model: string | undefined;
    credential_env: string | undefined;
  };
+  router: {
+    enabled: boolean;
+    port: number;
+    pool_config_path: string | undefined;
+  };
  policy_additions: PolicyAdditions;
  dry_run: boolean;
 }
@ -329,7 +363,7 @@ export async function actionPlan(
  const rid = emitRunId();
  progress(10, "Validating blueprint");

-  const { inferenceCfg, sandboxCfg } = await resolveRunConfig(
+  const { inferenceCfg, sandboxCfg, routerCfg } = await resolveRunConfig(
    profile,
    blueprint,
    options?.endpointUrl,
@ -342,6 +376,9 @@ export async function actionPlan(
    );
  }

+  const routerEnabled = routerCfg.enabled === true;
+  const routerPort = routerCfg.port ?? DEFAULT_ROUTER_PORT;
+
  const plan: RunPlan = {
    run_id: rid,
    profile,
@ -357,6 +394,11 @@ export async function actionPlan(
      model: inferenceCfg.model,
      credential_env: inferenceCfg.credential_env,
    },
+    router: {
+      enabled: routerEnabled,
+      port: routerPort,
+      pool_config_path: routerCfg.pool_config_path,
+    },
    policy_additions: blueprint.components?.policy?.additions ?? {},
    dry_run: options?.dryRun ?? false,
  };
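
Reviewer note: the defaulting that `actionPlan` applies to the router section in the hunks above can be restated as a small standalone function. This is a sketch for illustration, not code from the commit: `resolveRouterPlan` and `RouterPlan` are hypothetical names, and the diff itself inlines this logic.

```typescript
// Mirrors the RouterConfig interface added in runner.ts.
interface RouterConfig {
  enabled?: boolean;
  port?: number;
  pool_config_path?: string;
}

// Mirrors the `router` block added to RunPlan.
interface RouterPlan {
  enabled: boolean;
  port: number;
  pool_config_path: string | undefined;
}

const DEFAULT_ROUTER_PORT = 4000;

// Hypothetical helper: the commit performs these steps inline in actionPlan.
function resolveRouterPlan(cfg: RouterConfig = {}): RouterPlan {
  return {
    enabled: cfg.enabled === true, // a missing flag means disabled
    port: cfg.port ?? DEFAULT_ROUTER_PORT, // fall back to the default port
    pool_config_path: cfg.pool_config_path,
  };
}

const routedPlan = resolveRouterPlan({
  enabled: true,
  port: 4000,
  pool_config_path: "router/pool-config.yaml",
});
const absentPlan = resolveRouterPlan(); // blueprint without a router section
```

This matches the two new `actionPlan` tests: a blueprint without a `router` section still yields a plan entry, with `enabled: false` and port 4000.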
--- a/schemas/blueprint.schema.json
+++ b/schemas/blueprint.schema.json
@ -79,6 +79,27 @@
            }
          }
        },
+        "router": {
+          "type": "object",
+          "description": "Model router configuration for complexity-based routing.",
+          "additionalProperties": false,
+          "properties": {
+            "enabled": {
+              "type": "boolean",
+              "description": "Whether the model router is active."
+            },
+            "port": {
+              "type": "integer",
+              "minimum": 1,
+              "maximum": 65535,
+              "description": "Port the router proxy listens on."
+            },
+            "pool_config_path": {
+              "type": "string",
+              "description": "Path to the router pool config YAML relative to the blueprint directory."
+            }
+          }
+        },
        "policy": {
          "type": "object",
          "required": ["base"],
--- a/scripts/install.sh
+++ b/scripts/install.sh
@ -1320,6 +1320,44 @@ is_source_checkout() {
  return 1
 }

+init_nemoclaw_submodules() {
+  local root="$1"
+  [[ -f "$root/.gitmodules" ]] || return 0
+  git -C "$root" rev-parse --git-dir >/dev/null 2>&1 || return 0
+  git -C "$root" submodule update --init --depth 1 2>/dev/null
+}
+
+is_routed_provider_requested() {
+  local provider="${NEMOCLAW_PROVIDER:-}"
+  provider="$(printf '%s' "$provider" | tr '[:upper:]' '[:lower:]')"
+  [[ "$provider" == "routed" ]]
+}
+
+install_model_router_if_present() {
+  local root="$1"
+  local router_dir="$root/nemoclaw-blueprint/router/llm-router"
+
+  if ! command_exists pip3; then
+    is_routed_provider_requested && error "pip3 is required for routed inference."
+    return 0
+  fi
+  if [[ ! -d "$router_dir" ]]; then
+    is_routed_provider_requested && error "llm-router is required for routed inference but is missing."
+    return 0
+  fi
+  if [[ ! -f "$router_dir/pyproject.toml" && ! -f "$router_dir/setup.py" ]]; then
+    is_routed_provider_requested && error "llm-router is required for routed inference but is not initialized."
+    warn "Skipping model router install — llm-router submodule is not initialized."
+    return 0
+  fi
+  if ! spin "Installing model router" pip3 install --quiet --user "${router_dir}[prefill,proxy]"; then
+    if is_routed_provider_requested; then
+      error "pip3 install of llm-router failed"
+    fi
+    warn "Skipping model router install — pip3 install failed"
+  fi
+}
+
 install_nemoclaw() {
  command_exists git || error "git was not found on PATH."
  local repo_root package_json
@ -1335,9 +1373,14 @@ install_nemoclaw() {
      spin "Preparing OpenClaw package" bash -c "$(declare -f info warn resolve_openclaw_version pre_extract_openclaw); pre_extract_openclaw \"\$1\"" _ "$NEMOCLAW_SOURCE_ROOT" \
        || warn "Pre-extraction failed — npm install may fail if openclaw tarball is broken"
    fi
+    if ! spin "Initializing ${_CLI_DISPLAY} submodules" init_nemoclaw_submodules "$NEMOCLAW_SOURCE_ROOT"; then
+      is_routed_provider_requested && error "Failed to initialize the llm-router submodule required for routed inference."
+      warn "Submodule initialization failed — model router support may be unavailable"
+    fi
    spin "Installing ${_CLI_DISPLAY} dependencies" bash -c "cd \"$NEMOCLAW_SOURCE_ROOT\" && npm install --ignore-scripts"
    spin "Building ${_CLI_DISPLAY} CLI modules" bash -c "cd \"$NEMOCLAW_SOURCE_ROOT\" && npm run --if-present build:cli"
    spin "Building ${_CLI_DISPLAY} plugin" bash -c "cd \"$NEMOCLAW_SOURCE_ROOT\"/nemoclaw && npm install --ignore-scripts && npm run build"
+    install_model_router_if_present "$NEMOCLAW_SOURCE_ROOT"
    spin "Linking ${_CLI_DISPLAY} CLI" bash -c "cd \"$NEMOCLAW_SOURCE_ROOT\" && npm link"
  else
    if [[ -f "$package_json" ]]; then
@ -1356,6 +1399,10 @@ install_nemoclaw() {
    mkdir -p "$(dirname "$nemoclaw_src")"
    NEMOCLAW_SOURCE_ROOT="$nemoclaw_src"
    spin "Cloning ${_CLI_DISPLAY} source" clone_nemoclaw_ref "$release_ref" "$nemoclaw_src"
+    if ! spin "Initializing ${_CLI_DISPLAY} submodules" init_nemoclaw_submodules "$nemoclaw_src"; then
+      is_routed_provider_requested && error "Failed to initialize the llm-router submodule required for routed inference."
+      warn "Submodule initialization failed — model router support may be unavailable"
+    fi
    # Fetch version tags into the shallow clone so `git describe --tags
    # --match "v*"` works at runtime (the shallow clone only has the
    # single ref we asked for).
@ -1371,6 +1418,7 @@ install_nemoclaw() {
    spin "Installing ${_CLI_DISPLAY} dependencies" bash -c "cd \"$nemoclaw_src\" && npm install --ignore-scripts"
    spin "Building ${_CLI_DISPLAY} CLI modules" bash -c "cd \"$nemoclaw_src\" && npm run --if-present build:cli"
    spin "Building ${_CLI_DISPLAY} plugin" bash -c "cd \"$nemoclaw_src\"/nemoclaw && npm install --ignore-scripts && npm run build"
+    install_model_router_if_present "$nemoclaw_src"
    spin "Linking ${_CLI_DISPLAY} CLI" bash -c "cd \"$nemoclaw_src\" && npm link"

    # Install/upgrade the OpenShell CLI on the GitHub-clone path (curl|bash).
--- a/src/lib/onboard-providers.ts
+++ b/src/lib/onboard-providers.ts
@ -119,6 +119,8 @@ function getProviderLabel(provider) {
  switch (provider) {
    case "nvidia-nim":
      return "NVIDIA Endpoints";
+    case "nvidia-router":
+      return "Model Router";
    case "vllm-local":
      return "Local vLLM";
    case "ollama-local":
@ -142,6 +144,8 @@ function getEffectiveProviderName(providerKey) {
      return "ollama-local";
    case "vllm":
      return "vllm-local";
+    case "routed":
+      return "nvidia-router";
    default:
      return providerKey;
  }
@ -169,6 +173,7 @@ function getNonInteractiveProvider() {
    "custom",
    "nim-local",
    "vllm",
+    "routed",
    "install-vllm",
    "install-ollama",
    "install-windows-ollama",
@ -177,7 +182,7 @@ function getNonInteractiveProvider() {
  if (!validProviders.has(normalized)) {
    console.error(`  Unsupported NEMOCLAW_PROVIDER: ${providerKey}`);
    console.error(
-      "  Valid values: build, openai, anthropic, anthropicCompatible, gemini, ollama, custom, nim-local, vllm, install-vllm, install-ollama, install-windows-ollama, start-windows-ollama",
+      "  Valid values: build, openai, anthropic, anthropicCompatible, gemini, ollama, custom, nim-local, vllm, routed, install-vllm, install-ollama, install-windows-ollama, start-windows-ollama",
    );
    process.exit(1);
  }
@ -340,6 +345,10 @@ function getSandboxInferenceConfig(
        supportsStore: false,
      };
      break;
+    case "nvidia-router":
+      providerKey = "inference";
+      primaryModelRef = `inference/${model}`;
+      break;
    case "nvidia-prod":
    case "nvidia-nim":
    default:
--- a/src/lib/onboard-session.ts
+++ b/src/lib/onboard-session.ts
@ -74,6 +74,8 @@ export interface Session {
  credentialEnv: string | null;
  preferredInferenceApi: string | null;
  nimContainer: string | null;
+  routerPid: number | null;
+  routerCredentialHash: string | null;
  webSearchConfig: WebSearchConfig | null;
  policyPresets: string[] | null;
  messagingChannels: string[] | null;
@ -122,6 +124,8 @@ export interface SessionUpdates {
  credentialEnv?: string;
  preferredInferenceApi?: string;
  nimContainer?: string;
+  routerPid?: number;
+  routerCredentialHash?: string;
  webSearchConfig?: WebSearchConfig | null;
  policyPresets?: string[];
  messagingChannels?: string[];
@ -189,6 +193,10 @@ function readString(value: SessionJsonValue | undefined): string | null {
  return typeof value === "string" ? value : null;
 }

+function readPositiveInteger(value: SessionJsonValue | undefined): number | null {
+  return typeof value === "number" && Number.isInteger(value) && value > 0 ? value : null;
+}
+
 function readStringArray(value: SessionJsonValue | undefined): string[] | null {
  if (!Array.isArray(value)) return null;
  return value.filter((entry): entry is string => typeof entry === "string");
@ -297,6 +305,8 @@ export function createSession(overrides: Partial<Session> = {}): Session {
    credentialEnv: overrides.credentialEnv ?? null,
    preferredInferenceApi: overrides.preferredInferenceApi ?? null,
    nimContainer: overrides.nimContainer ?? null,
+    routerPid: readPositiveInteger(overrides.routerPid),
+    routerCredentialHash: overrides.routerCredentialHash ?? null,
    webSearchConfig:
      overrides.webSearchConfig?.fetchEnabled === true ? { fetchEnabled: true } : null,
    policyPresets: readStringArray(overrides.policyPresets),
@@ -333,6 +343,8 @@ export function normalizeSession(data: Session | SessionJsonValue | undefined):
    credentialEnv: readString(data.credentialEnv),
    preferredInferenceApi: readString(data.preferredInferenceApi),
    nimContainer: readString(data.nimContainer),
+    routerPid: readPositiveInteger(data.routerPid),
+    routerCredentialHash: readString(data.routerCredentialHash),
    webSearchConfig: parseWebSearchConfig(data.webSearchConfig),
    policyPresets: readStringArray(data.policyPresets),
    messagingChannels: readStringArray(data.messagingChannels),
@@ -692,6 +704,12 @@ export function filterSafeUpdates(updates: SessionUpdates): Partial<Session> {
  if (typeof updates.preferredInferenceApi === "string")
    safe.preferredInferenceApi = updates.preferredInferenceApi;
  if (typeof updates.nimContainer === "string") safe.nimContainer = updates.nimContainer;
+  if (typeof updates.routerPid === "number" && Number.isInteger(updates.routerPid) && updates.routerPid > 0) {
+    safe.routerPid = updates.routerPid;
+  }
+  if (typeof updates.routerCredentialHash === "string") {
+    safe.routerCredentialHash = updates.routerCredentialHash;
+  }
  if (isObject(updates.webSearchConfig) && updates.webSearchConfig.fetchEnabled === true) {
    safe.webSearchConfig = { fetchEnabled: true };
  } else if (updates.webSearchConfig === null) {
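
Review note on the `routerPid` handling above: the stored PID is later passed to `process.kill`, so the session layer only accepts positive integers. The sketch below reproduces that guard in isolation to show which inputs survive. The `JsonValue` type is a stand-in assumed here for `SessionJsonValue`, which this hunk does not show.

```typescript
// Stand-in for SessionJsonValue (assumption: the real type is a JSON-like union).
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

// Same predicate as the readPositiveInteger helper added in this diff.
function readPositiveInteger(value: JsonValue | undefined): number | null {
  return typeof value === "number" && Number.isInteger(value) && value > 0 ? value : null;
}

// Only a positive integer is kept; everything else normalizes to null.
const samples: Array<JsonValue | undefined> = [4242, 0, -1, 3.5, "4242", null, undefined];
for (const sample of samples) {
  console.log(String(sample), "->", readPositiveInteger(sample));
}
```

Rejecting `0` and negative values matters because on POSIX systems those PIDs address process groups rather than a single process when used with `kill`.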
--- a/src/lib/onboard.ts
+++ b/src/lib/onboard.ts
@@ -770,6 +770,281 @@ function getBlueprintMaxOpenshellVersion(rootDir = ROOT): string | null {
  return getBlueprintVersionField("max_openshell_version", rootDir);
 }

+type BlueprintRouterConfig = {
+  enabled?: boolean;
+  port?: number;
+  pool_config_path?: string;
+  credential_env?: string;
+};
+
+type BlueprintInferenceProfile = {
+  provider_name?: string;
+  endpoint?: string;
+  model: string;
+  credential_env?: string;
+  credential_default?: string;
+  router: BlueprintRouterConfig;
+};
+
+/**
+ * Load a named inference profile and router config from blueprint.yaml.
+ * Returns null if the blueprint or profile is missing or cannot be parsed.
+ */
+function loadBlueprintProfile(
+  profileName: string,
+  rootDir: string = ROOT,
+): BlueprintInferenceProfile | null {
+  try {
+    const YAML = require("yaml");
+    const blueprintPath = path.join(rootDir, "nemoclaw-blueprint", "blueprint.yaml");
+    if (!fs.existsSync(blueprintPath)) return null;
+    const raw = fs.readFileSync(blueprintPath, "utf8");
+    const parsed = YAML.parse(raw);
+    const profile = parsed?.components?.inference?.profiles?.[profileName];
+    if (!profile) return null;
+    const router = { ...(parsed?.components?.router || {}) };
+    if (typeof profile.credential_env === "string" && profile.credential_env.trim().length > 0) {
+      router.credential_env = profile.credential_env;
+    }
+    return { ...profile, router } as BlueprintInferenceProfile;
+  } catch {
+    return null;
+  }
+}
+
+const ROUTER_HEALTH_RETRIES = 15;
+const ROUTER_HEALTH_INTERVAL_MS = 2000;
+const ROUTER_HEALTH_TIMEOUT_MS = 3000;
+
+async function isRouterHealthy(port: number, timeoutMs = ROUTER_HEALTH_TIMEOUT_MS): Promise<boolean> {
+  const http = require("http");
+  return new Promise<boolean>((resolve) => {
+    let settled = false;
+    const settle = (healthy: boolean) => {
+      if (settled) return;
+      settled = true;
+      resolve(healthy);
+    };
+    const request = http
+      .get(`http://127.0.0.1:${port}/health`, (res: import("node:http").IncomingMessage) => {
+        res.resume();
+        settle((res.statusCode || 0) >= 200 && (res.statusCode || 0) < 300);
+      })
+      .on("error", () => settle(false));
+    request.setTimeout(timeoutMs, () => {
+      request.destroy();
+      settle(false);
+    });
+  });
+}
+
+function isProcessRunning(pid: number | null | undefined): boolean {
+  if (!Number.isInteger(pid) || Number(pid) <= 0) return false;
+  try {
+    process.kill(Number(pid), 0);
+    return true;
+  } catch {
+    return false;
+  }
+}
+
+async function stopModelRouterProcess(pid: number, port: number): Promise<void> {
+  try {
+    process.kill(pid, "SIGTERM");
+  } catch {
+    return;
+  }
+  for (let attempt = 0; attempt < 10; attempt++) {
+    await new Promise((resolve) => setTimeout(resolve, 500));
+    if (!isProcessRunning(pid) && !(await isRouterHealthy(port, 1000))) return;
+  }
+  try {
+    process.kill(pid, "SIGKILL");
+  } catch {
+    // already stopped
+  }
+  for (let attempt = 0; attempt < 5; attempt++) {
+    await new Promise((resolve) => setTimeout(resolve, 500));
+    if (!isProcessRunning(pid) && !(await isRouterHealthy(port, 1000))) return;
+  }
+}
+
+/**
+ * Start the model-router proxy and wait for it to become healthy.
+ * Follows the same pattern as Ollama startup (spawn detached, poll health).
+ * Returns the PID of the child process.
+ */
+async function startModelRouter(routerCfg: BlueprintRouterConfig): Promise<number> {
+  const port = routerCfg.port || 4000;
+  const blueprintDir = path.join(ROOT, "nemoclaw-blueprint");
+  const poolConfigPath = path.join(
+    blueprintDir,
+    routerCfg.pool_config_path || "router/pool-config.yaml",
+  );
+  const stateDir = path.join(os.homedir(), ".nemoclaw", "state");
+  const litellmConfigPath = path.join(stateDir, "litellm-proxy.yaml");
+
+  fs.mkdirSync(stateDir, { recursive: true });
+
+  const proxyConfigResult = spawnSync(
+    "model-router",
+    ["proxy-config", "--config", poolConfigPath, "--output", litellmConfigPath],
+    { encoding: "utf8", timeout: 30_000, cwd: blueprintDir },
+  );
+  if (proxyConfigResult.status !== 0) {
+    throw new Error(
+      `model-router proxy-config failed: ${proxyConfigResult.stderr || proxyConfigResult.error || "unknown error"}`,
+    );
+  }
+
+  const { buildSubprocessEnv } = require("./subprocess-env");
+  const credEnvVars: Record<string, string> = {};
+  const credName = routerCfg.credential_env || "NVIDIA_API_KEY";
+  const routedCredential = resolveProviderCredential(credName);
+  const openAiCredential = resolveProviderCredential("OPENAI_API_KEY");
+  if (routedCredential) {
+    credEnvVars[credName] = routedCredential;
+    if (!openAiCredential) credEnvVars.OPENAI_API_KEY = routedCredential;
+  }
+  if (openAiCredential) credEnvVars.OPENAI_API_KEY = openAiCredential;
+  const _providerKey = (process.env.NEMOCLAW_PROVIDER_KEY || "").trim();
+  if (_providerKey) {
+    if (!credEnvVars[credName]) credEnvVars[credName] = _providerKey;
+    if (!credEnvVars.OPENAI_API_KEY) credEnvVars.OPENAI_API_KEY = _providerKey;
+  }
+
+  if (await isRouterHealthy(port)) {
+    throw new Error(
+      `Port ${port} already has a healthy router endpoint; refusing to start a second router.`,
+    );
+  }
+
+  const child = spawn(
+    "model-router",
+    [
+      "proxy",
+      "--litellm-config", litellmConfigPath,
+      "--router-config", poolConfigPath,
+      "--host", "0.0.0.0",
+      "--port", String(port),
+    ],
+    {
+      detached: true,
+      stdio: "ignore",
+      cwd: blueprintDir,
+      env: buildSubprocessEnv(credEnvVars),
+    },
+  );
+  let childExited = false;
+  let childExitDetail = "";
+  child.once("error", (err: Error) => {
+    childExited = true;
+    childExitDetail = `child failed to start: ${err.message}`;
+  });
+  child.once("exit", (code: number | null, signal: string | null) => {
+    childExited = true;
+    if (!childExitDetail) {
+      childExitDetail = `child exited with code ${code ?? "null"}${signal ? ` signal ${signal}` : ""}`;
+    }
+  });
+  child.unref();
+
+  const pid = child.pid;
+  if (!pid) {
+    throw new Error(
+      "Failed to start model-router proxy: no PID returned" +
+        (childExitDetail ? ` (${childExitDetail})` : ""),
+    );
+  }
+
+  for (let attempt = 0; attempt < ROUTER_HEALTH_RETRIES; attempt++) {
+    await new Promise((resolve) => setTimeout(resolve, ROUTER_HEALTH_INTERVAL_MS));
+    if (childExited) break;
+    const healthy = await isRouterHealthy(port);
+    let processAlive = true;
+    try {
+      process.kill(pid, 0);
+    } catch {
+      processAlive = false;
+    }
+    if (healthy && processAlive) return pid;
+    if (!processAlive) {
+      childExited = true;
+      if (!childExitDetail) childExitDetail = "child process is no longer running";
+      break;
+    }
+  }
+  try {
+    process.kill(pid, "SIGTERM");
+  } catch {
+    // already dead
+  }
+  throw new Error(
+    `Model router failed to become healthy on port ${port} after ${ROUTER_HEALTH_RETRIES} attempts` +
+      (childExitDetail ? ` (${childExitDetail})` : ""),
+  );
+}
+
+function getRoutedProfile(): BlueprintInferenceProfile {
+  const bp = loadBlueprintProfile("routed");
+  if (!bp || bp.router?.enabled !== true) {
+    throw new Error("Router is not enabled in nemoclaw-blueprint/blueprint.yaml.");
+  }
+  return bp;
+}
+
+function isRoutedInferenceProvider(provider: string | null | undefined): boolean {
+  if (!provider) return false;
+  if (provider === "nvidia-router") return true;
+  const bp = loadBlueprintProfile("routed");
+  return Boolean(bp?.provider_name && provider === bp.provider_name);
+}
+
+async function reconcileModelRouter(): Promise<void> {
+  const bp = getRoutedProfile();
+  const routerPort = bp.router.port || 4000;
+  const routerCredentialEnv = bp.router.credential_env || bp.credential_env || "NVIDIA_API_KEY";
+  const routerCredential =
+    hydrateCredentialEnv(routerCredentialEnv) ||
+    normalizeCredentialValue(bp.credential_default || "");
+  if (!routerCredential) {
+    throw new Error(`${routerCredentialEnv} is required to start Model Router.`);
+  }
+  saveCredential(routerCredentialEnv, routerCredential);
+  const routerCredentialHash = hashCredential(routerCredential);
+  const session = onboardSession.loadSession();
+  const recordedPid = session?.routerPid ?? null;
+  const recordedCredentialHash = session?.routerCredentialHash ?? null;
+
+  if (await isRouterHealthy(routerPort)) {
+    if (
+      routerCredentialHash &&
+      recordedCredentialHash === routerCredentialHash &&
+      isProcessRunning(recordedPid)
+    ) {
+      console.log(`  ✓ Model router is already healthy on port ${routerPort}`);
+      return;
+    }
+    if (isProcessRunning(recordedPid)) {
+      console.log("  Restarting model router with updated credentials...");
+      await stopModelRouterProcess(requireValue(recordedPid, "Expected recorded router PID"), routerPort);
+    } else {
+      throw new Error(
+        `Port ${routerPort} already has a healthy router endpoint, but its credential state is unknown. Stop the existing model-router process and rerun onboarding.`,
+      );
+    }
+  }
+
+  console.log("  Starting model router...");
+  const routerPid = await startModelRouter(bp.router);
+  console.log(`  ✓ Model router started (PID ${routerPid}) on port ${routerPort}`);
+  onboardSession.updateSession((current: Session) => {
+    current.routerPid = routerPid;
+    current.routerCredentialHash = routerCredentialHash;
+    return current;
+  });
+}
+
 // ── Base image digest resolution ────────────────────────────────
 // Pulls the sandbox-base image from GHCR and inspects it to get the
 // actual repo digest. This avoids the registry mismatch that broke
@@ -5184,6 +5459,7 @@ function providerNameToOptionKey(
  opts: { hasNimContainer?: boolean } = {},
 ): string | null {
  if (!name) return null;
+  if (name === "nvidia-router") return "routed";
  if (name === "ollama-local") return "ollama";
  // Local NIM and standalone vLLM both persist as provider="vllm-local". NIM
  // is positively identified by a nimContainer record; the absence of one in
@@ -5595,6 +5871,12 @@ async function setupNim(
    }
  }

+  // Model Router: complexity-based routing via blueprint config.
+  const blueprintRouterCfg = loadBlueprintProfile("routed");
+  if (blueprintRouterCfg && blueprintRouterCfg.router?.enabled === true) {
+    options.push({ key: "routed", label: "Model Router (complexity-based routing)" });
+  }
+
  function checkOllamaPortsOrWarn(): boolean {
    const portValidation = validateOllamaPortConfiguration();
    if (!portValidation.ok) {
@@ -6489,6 +6771,49 @@ async function setupNim(
        }
        preferredInferenceApi = "openai-completions";
        break;
+      } else if (selected.key === "routed") {
+        const bp = loadBlueprintProfile("routed");
+        if (!bp || bp.router?.enabled !== true) {
+          console.error("  Router is not enabled in nemoclaw-blueprint/blueprint.yaml.");
+          if (isNonInteractive()) process.exit(1);
+          continue selectionLoop;
+        }
+        const routerCredentialEnv = bp.router?.credential_env || bp.credential_env || "NVIDIA_API_KEY";
+        credentialEnv = routerCredentialEnv;
+        const routedCredential =
+          hydrateCredentialEnv(routerCredentialEnv) ||
+          normalizeCredentialValue(bp.credential_default || "");
+        if (routedCredential) {
+          saveCredential(routerCredentialEnv, routedCredential);
+        }
+        const _providerKeyHint = (process.env.NEMOCLAW_PROVIDER_KEY || "").trim();
+        if (_providerKeyHint && !resolveProviderCredential(routerCredentialEnv)) {
+          saveCredential(routerCredentialEnv, _providerKeyHint);
+        }
+        if (isNonInteractive()) {
+          if (!resolveProviderCredential(routerCredentialEnv)) {
+            console.error(
+              `  ${routerCredentialEnv} (or NEMOCLAW_PROVIDER_KEY) is required for Model Router in non-interactive mode.`,
+            );
+            process.exit(1);
+          }
+        } else {
+          if (!resolveProviderCredential(routerCredentialEnv)) {
+            await ensureNamedCredential(routerCredentialEnv, "Model Router API key", null);
+          }
+        }
+        provider = bp.provider_name || "nvidia-router";
+        model = bp.model;
+        const { HOST_GATEWAY_URL } = require("./local-inference");
+        const routerEndpointUrl = bp.endpoint || "";
+        endpointUrl = routerEndpointUrl;
+        if (routerEndpointUrl.match(/localhost|127\.0\.0\.1/)) {
+          const u = new URL(routerEndpointUrl);
+          endpointUrl = `${HOST_GATEWAY_URL}:${u.port}${u.pathname}`;
+        }
+        preferredInferenceApi = "openai-completions";
+        console.log(`  ✓ Using Model Router: ${provider} / ${model}`);
+        break;
      }
    }
  }
@@ -6737,6 +7062,42 @@ async function setupInference(
    // Do not mutate ~/.nemoclaw/credentials.json here: local Ollama now uses
    // OLLAMA_PROXY_CREDENTIAL_ENV, so any saved OPENAI_API_KEY remains available
    // to unrelated OpenAI-backed sandboxes.
+  } else if (isRoutedInferenceProvider(provider)) {
+    // Blueprint profile provider (e.g., nvidia-router for the routed profile).
+    // Same pattern as vllm-local: upsert the provider and set the inference route.
+    try {
+      await reconcileModelRouter();
+    } catch (err) {
+      console.error(`  ✗ Failed to start model router: ${err instanceof Error ? err.message : String(err)}`);
+      process.exit(1);
+    }
+    const resolvedCredentialEnv = credentialEnv || "NVIDIA_API_KEY";
+    const credentialValue = hydrateCredentialEnv(resolvedCredentialEnv);
+    const env = credentialValue ? { [resolvedCredentialEnv]: credentialValue } : {};
+    const providerResult = upsertProvider(
+      provider,
+      "openai",
+      resolvedCredentialEnv,
+      endpointUrl,
+      env,
+    );
+    if (!providerResult.ok) {
+      console.error(`  ${providerResult.message}`);
+      process.exit(providerResult.status || 1);
+    }
+    const inferenceArgs = [
+      "inference",
+      "set",
+      "--no-verify",
+      "--provider",
+      provider,
+      "--model",
+      model,
+    ];
+    runOpenshell(inferenceArgs);
+  } else {
+    console.error(`  Unsupported provider configuration: ${provider}`);
+    process.exit(1);
  }

  verifyInferenceRoute(provider, model);
@@ -9156,6 +9517,16 @@ async function onboard(opts: OnboardOptions = {}): Promise<void> {
      const resumeInference =
        !forceProviderSelection && resume && isInferenceRouteReady(provider, model);
      if (resumeInference) {
+        if (isRoutedInferenceProvider(provider)) {
+          try {
+            await reconcileModelRouter();
+          } catch (err) {
+            console.error(
+              `  ✗ Failed to reconcile model router: ${err instanceof Error ? err.message : String(err)}`,
+            );
+            process.exit(1);
+          }
+        }
        skippedStepMessage("inference", `${provider} / ${model}`);
        if (nimContainer && sandboxName) {
          registry.updateSandbox(sandboxName, { nimContainer });
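
Review note on `reconcileModelRouter` above: its branching reduces to a small decision table over three observations (is the port healthy, is the recorded PID alive, does the recorded credential hash match). The sketch below restates that table as a pure function for readability. The names `RouterState`, `RouterAction`, and `decideRouterAction` are hypothetical and do not exist in the PR; the real function performs the health probe, hashing, and process handling itself.

```typescript
// Hypothetical names, introduced only to illustrate the branching in reconcileModelRouter.
type RouterAction = "reuse" | "restart" | "refuse" | "start";

interface RouterState {
  endpointHealthy: boolean; // the /health probe on the router port succeeded
  recordedPidRunning: boolean; // the PID stored in the session is still alive
  credentialHashMatches: boolean; // stored hash equals the hash of the current credential
}

function decideRouterAction(state: RouterState): RouterAction {
  // Nothing answering on the port: start a fresh router.
  if (!state.endpointHealthy) return "start";
  // Healthy, owned by this session, same credential: keep it.
  if (state.credentialHashMatches && state.recordedPidRunning) return "reuse";
  // Healthy and owned by this session, but the credential changed: stop, then start.
  if (state.recordedPidRunning) return "restart";
  // Healthy but not traceable to this session: the diff throws and asks the user to stop it.
  return "refuse";
}

console.log(
  decideRouterAction({ endpointHealthy: true, recordedPidRunning: true, credentialHashMatches: false }),
); // prints "restart"
```

One consequence visible in this form: a healthy endpoint whose PID was never recorded is always refused, even if its credentials happen to be current.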
--- a/test/onboard.test.ts
+++ b/test/onboard.test.ts
@@ -106,6 +106,7 @@ type OnboardTestInternals = {
    value: string | null | undefined,
    flavor: "openai" | "anthropic",
  ) => string;
+  providerNameToOptionKey: (name?: string | null) => string | null;
  parsePolicyPresetEnv: (value: string | null) => string[];
  patchStagedDockerfile: ShimFn<void>;
  pullAndResolveBaseImageDigest: () => { digest: string; ref: string } | null;
@@ -148,6 +149,8 @@ function isOnboardTestInternals(
    typeof value.agentSupportsWebSearch === "function" &&
    typeof value.configureWebSearch === "function" &&
    typeof value.formatSandboxBuildEstimateNote === "function" &&
+    Object.prototype.hasOwnProperty.call(value, "providerNameToOptionKey") &&
+    typeof value.providerNameToOptionKey === "function" &&
    typeof value.shouldRunCompatibleEndpointSandboxSmoke === "function" &&
    typeof value.writeSandboxConfigSyncFile === "function"
  );
@@ -199,6 +202,7 @@ const {
  configureWebSearch,
  isLoopbackHostname,
  normalizeProviderBaseUrl,
+  providerNameToOptionKey,
  parsePolicyPresetEnv,
  patchStagedDockerfile,
  pullAndResolveBaseImageDigest,
@@ -829,6 +833,20 @@ describe("onboard helpers", () => {
    );
  });

+  it("maps Model Router sandboxes through managed inference.local", () => {
+    assert.deepEqual(getSandboxInferenceConfig("nvidia-routed", "nvidia-router"), {
+      providerKey: "inference",
+      primaryModelRef: "inference/nvidia-routed",
+      inferenceBaseUrl: "https://inference.local/v1",
+      inferenceApi: "openai-completions",
+      inferenceCompat: null,
+    });
+  });
+
+  it("maps persisted Model Router provider back to the routed provider option", () => {
+    assert.equal(providerNameToOptionKey("nvidia-router"), "routed");
+  });
+
  it("leaves Kimi K2.6 compat to the model-specific setup registry", () => {
    assert.deepEqual(
      getSandboxInferenceConfig("moonshotai/kimi-k2.6", "nvidia-prod", "openai-completions"),
@@ -2406,6 +2424,145 @@ const { setupInference } = require(${onboardPath});
    assert.equal(payload.nvidiaApiKey, "nvapi-secret-value");
  });

+  it("configures Model Router as a host provider while sandboxes keep inference.local", () => {
+    const repoRoot = path.join(import.meta.dirname, "..");
+    const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-onboard-router-inference-"));
+    const fakeBin = path.join(tmpDir, "bin");
+    const scriptPath = path.join(tmpDir, "setup-router-check.js");
+    const onboardPath = JSON.stringify(path.join(repoRoot, "dist", "lib", "onboard.js"));
+    const runnerPath = JSON.stringify(path.join(repoRoot, "dist", "lib", "runner.js"));
+    const registryPath = JSON.stringify(path.join(repoRoot, "dist", "lib", "state", "registry.js"));
+
+    fs.mkdirSync(fakeBin, { recursive: true });
+    fs.writeFileSync(path.join(fakeBin, "openshell"), "#!/usr/bin/env bash\nexit 0\n", {
+      mode: 0o755,
+    });
+    fs.writeFileSync(
+      path.join(fakeBin, "model-router"),
+      [
+        "#!/usr/bin/env node",
+        'const fs = require("fs");',
+        'const http = require("http");',
+        'const path = require("path");',
+        "const args = process.argv.slice(2);",
+        'if (args[0] === "proxy-config") {',
+        '  const output = args[args.indexOf("--output") + 1];',
+        "  fs.mkdirSync(path.dirname(output), { recursive: true });",
+        '  fs.writeFileSync(output, "model_list: []\\n");',
+        "  process.exit(0);",
+        "}",
+        'if (args[0] === "proxy") {',
+        '  const port = Number(args[args.indexOf("--port") + 1] || "4000");',
+        "  const server = http.createServer((req, res) => {",
+        '    if (req.url === "/health") { res.statusCode = 200; res.end("ok"); return; }',
+        "    res.statusCode = 404;",
+        "    res.end();",
+        "  });",
+        '  server.listen(port, "127.0.0.1");',
+        "  setTimeout(() => process.exit(0), 10000);",
+        "} else {",
+        "  process.exit(1);",
+        "}",
+        "",
+      ].join("\n"),
+      { mode: 0o755 },
+    );
+
+    const script = String.raw`
+const runner = require(${runnerPath});
+const _n = (c) => (Array.isArray(c) ? c.join(" ") : String(c)).replace(/'/g, "");
+const registry = require(${registryPath});
+
+const commands = [];
+runner.run = (command, opts = {}) => {
+  const cmd = _n(command);
+  commands.push({ command: cmd, env: opts.env || null });
+  if (cmd.includes("provider get")) return { status: 1, stdout: "", stderr: "" };
+  return { status: 0, stdout: "", stderr: "" };
+};
+runner.runCapture = (command) => {
+  const cmd = _n(command);
+  if (cmd.includes("inference") && cmd.includes("get")) {
+    return [
+      "Gateway inference:",
+      "",
+      "  Route: inference.local",
+      "  Provider: nvidia-router",
+      "  Model: nvidia-routed",
+      "  Version: 1",
+    ].join("\\n");
+  }
+  return "";
+};
+registry.updateSandbox = () => true;
+
+process.env.NVIDIA_API_KEY = "nvapi-router-secret";
+
+const { setupInference, getSandboxInferenceConfig } = require(${onboardPath});
+
+(async () => {
+  await setupInference(
+    "router-box",
+    "nvidia-routed",
+    "nvidia-router",
+    "http://host.openshell.internal:4000/v1",
+    "NVIDIA_API_KEY",
+  );
+  console.log(JSON.stringify({
+    commands,
+    sandboxConfig: getSandboxInferenceConfig("nvidia-routed", "nvidia-router", "openai-completions"),
+  }));
+})().catch((error) => {
+  console.error(error);
+  process.exit(1);
+});
+`;
+    fs.writeFileSync(scriptPath, script);
+
+    const result = spawnSync(process.execPath, [scriptPath], {
+      cwd: repoRoot,
+      encoding: "utf-8",
+      env: {
+        ...process.env,
+        HOME: tmpDir,
+        PATH: `${fakeBin}:${process.env.PATH || ""}`,
+      },
+    });
+
+    assert.equal(result.status, 0, result.stderr);
+    const payload = parseStdoutJson<{
+      commands: CommandEntry[];
+      sandboxConfig: SandboxInferenceConfig;
+    }>(result.stdout);
+    const providerCommand = payload.commands.find((entry) =>
+      /provider create/.test(entry.command),
+    );
+    assert.ok(providerCommand, JSON.stringify(payload.commands));
+    assert.match(providerCommand.command, /--name nvidia-router/);
+    assert.match(providerCommand.command, /--credential NVIDIA_API_KEY/);
+    assert.match(
+      providerCommand.command,
+      /OPENAI_BASE_URL=http:\/\/host\.openshell\.internal:4000\/v1/,
+    );
+    assert.doesNotMatch(providerCommand.command, /nvapi-router-secret/);
+    assert.equal(providerCommand.env?.NVIDIA_API_KEY, "nvapi-router-secret");
+
+    const inferenceCommand = payload.commands.find((entry) =>
+      /inference set/.test(entry.command),
+    );
+    assert.ok(inferenceCommand, JSON.stringify(payload.commands));
+    assert.match(inferenceCommand.command, /--provider nvidia-router/);
+    assert.match(inferenceCommand.command, /--model nvidia-routed/);
+
+    assert.deepEqual(payload.sandboxConfig, {
+      providerKey: "inference",
+      primaryModelRef: "inference/nvidia-routed",
+      inferenceBaseUrl: "https://inference.local/v1",
+      inferenceApi: "openai-completions",
+      inferenceCompat: null,
+    });
+  });
+
  it("does not delete saved OpenAI credentials when configuring local vLLM", () => {
    const repoRoot = path.join(import.meta.dirname, "..");
    const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-onboard-local-vllm-"));
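
Review note on the router test above: it asserts `OPENAI_BASE_URL=http://host.openshell.internal:4000/v1`, which is the result of the loopback rewrite in `setupNim`. The sketch below isolates that rewrite. The value of `ASSUMED_HOST_GATEWAY_URL` is inferred from the test's expected URL and is an assumption; the real constant is `HOST_GATEWAY_URL` from `./local-inference`, which this diff does not show. The function name `toSandboxReachableUrl` is hypothetical.

```typescript
// Assumption: inferred from the URL asserted in the test, not read from ./local-inference.
const ASSUMED_HOST_GATEWAY_URL = "http://host.openshell.internal";

// Hypothetical helper mirroring the inline rewrite in the "routed" branch of setupNim.
function toSandboxReachableUrl(
  endpoint: string,
  gatewayBase: string = ASSUMED_HOST_GATEWAY_URL,
): string {
  // Non-loopback endpoints are passed through unchanged.
  if (!/localhost|127\.0\.0\.1/.test(endpoint)) return endpoint;
  const u = new URL(endpoint);
  // Keep the port and path, swap the host for the gateway alias.
  return `${gatewayBase}:${u.port}${u.pathname}`;
}

console.log(toSandboxReachableUrl("http://localhost:4000/v1"));
// prints "http://host.openshell.internal:4000/v1"
```

Because `URL.port` is an empty string when the endpoint uses the scheme's default port, a blueprint endpoint such as `http://localhost/v1` would produce a URL with a dangling colon under this logic, so the profile's endpoint should probably always carry an explicit port.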
@@ -0,0 +1 @@
+Subproject commit 2bd8dfaa751efb60aa4e7e49b270490dfbc0a68a