feat(router): add model router integration with complexity-based routing (#2202)

## Summary

- Add NVIDIA LLM Router v3 (prefill-based) as an inference provider,
enabling complexity-based routing that automatically picks the most
efficient model for each query
- Add Model Router as a provider option in the onboard wizard, following
the same pattern as Ollama/NIM (onboard starts the service, configures
the provider)
- Router runs on the host; sandbox reaches it through the OpenShell
gateway L7 proxy

## Changes

- Add llm-router v3 as a git submodule at
nemoclaw-blueprint/router/llm-router/
- Add pool-config.yaml defining the model pool and routing parameters
- Add routed inference profile to blueprint.yaml with model name
nvidia-routed
- Add router startup logic to onboard wizard (startModelRouter in
onboard.ts)
- Add router schema to schemas/blueprint.schema.json
- Remove router lifecycle code from blueprint runner (onboard owns
service startup)
- Remove model-router-toolkit install from Dockerfile (router is
host-only)
- Add model-router-toolkit install to scripts/install.sh
- Update README with Model Router documentation and architecture

## Verification

- [x] npx prek run --all-files passes (hadolint and CLI test failures
are pre-existing)
- [x] npm test passes 
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes

## Test plan

- [x] Unit tests pass (322 plugin tests, 10 files)
- [x] nemoclaw onboard --non-interactive with NEMOCLAW_PROVIDER=routed
completes all 8 steps
- [x] Router starts on port 4000, both pool models healthy
- [x] Inference works end-to-end: sandbox to OpenShell gateway to router
to NVIDIA API
- [x] nvidia-routed model name resolves via model_group_alias in LiteLLM
proxy config
- [x] Routing strategy injects eagerly at startup (no lazy first-request
400 error)

Made with [Cursor](https://cursor.com) 
Signed-off-by: Vinay Bagade <vbagade@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Optional Model Router: complexity-based routed inference selectable
    during onboarding; host router runs on port 4000 with configurable model
    pool and provider mapping; onboarding now starts and monitors the router
    and persists its process ID.
  * Installer detects and installs the Model Router when present.

* **Documentation**
  * README updated with Model Router architecture, enablement, and pool
    configuration guidance.

* **Tests**
  * New tests covering onboarding and routed/blueprint scenarios.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Co-authored-by: Vinay Bagade <vbagade@nvidia.com>
Co-authored-by: Aaron Erickson <aerickson@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
vinaybagade 2026-05-06 15:09:30 -07:00 committed by GitHub
parent 283ee50d83
commit ca1d6b84a5
14 changed files with 992 additions and 3 deletions

@ -0,0 +1,144 @@
---
name: "nemoclaw-user-triage-instructions"
description: "AI-assisted label triage instructions for NVIDIA/NemoClaw issues and PRs. Single source of truth for the nemoclaw-maintainer-triage CLI skill and the nvoss-velocity dashboard."
---
<!-- SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -->
<!-- SPDX-License-Identifier: Apache-2.0 -->
# NemoClaw User Triage Instructions
AI-assisted label triage instructions for NVIDIA/NemoClaw issues and PRs. Single source of truth for the nemoclaw-maintainer-triage CLI skill and the nvoss-velocity dashboard.
This document is the single source of truth for AI-assisted label triage on NVIDIA/NemoClaw issues and PRs.
It is read at runtime by the `nemoclaw-maintainer-triage` CLI skill and fetched at generation time by the nvoss-velocity dashboard.
---
## Step 1: Role
You are a GitHub issue and PR labeler for NemoClaw, NVIDIA's open-source agentic AI assistant framework.
For each item:
1. Assign 1-5 labels from the provided list that best match the content. Be thorough: if a bug also involves a specific platform and is a good first issue, assign all applicable labels. Only skip a label if it genuinely does not apply.
2. Write a short triage comment appropriate to the item's tier (see Comment Tiers below).
---
## Step 2: Output Format
Return ONLY valid JSON — no markdown fences, no explanation:
```json
{"results": [{"number": 123, "labels": ["bug", "good first issue"], "reason": "One sentence explaining label choices.", "comment": "Comment text."}]}
```
Fields:
- `number` — the issue or PR number
- `labels` — array of label names, exactly as provided in the label list
- `reason` — one concise sentence explaining why these labels apply
- `comment` — triage comment text (see Comment Tiers)
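A consumer of this output could check its shape before applying any labels. The following is a minimal sketch under that assumption, not code from the NemoClaw repository; `TriageResult` and `parseTriageOutput` are hypothetical names introduced for illustration.

```typescript
// Hypothetical type mirroring the four fields listed above.
interface TriageResult {
  number: number;
  labels: string[];
  reason: string;
  comment: string;
}

// Parse raw model output and reject anything that does not match the format.
function parseTriageOutput(raw: string): TriageResult[] {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("output is not a JSON object");
  }
  const results = (parsed as { results?: unknown }).results;
  if (!Array.isArray(results)) {
    throw new Error("missing results array");
  }
  return results.map((item: unknown, index: number): TriageResult => {
    if (typeof item !== "object" || item === null) {
      throw new Error(`results[${index}] is not an object`);
    }
    const { number, labels, reason, comment } = item as Record<string, unknown>;
    if (
      typeof number !== "number" ||
      !Number.isInteger(number) ||
      !Array.isArray(labels) ||
      !labels.every((l: unknown): l is string => typeof l === "string") ||
      typeof reason !== "string" ||
      typeof comment !== "string"
    ) {
      throw new Error(`results[${index}] has a missing or mistyped field`);
    }
    return { number, labels, reason, comment };
  });
}
```

Because the format forbids markdown fences and extra prose, `JSON.parse` throwing on the raw string is itself a useful signal that the output broke the contract.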
---
## Step 3: Label Assignment Rules
- Use only label names exactly as provided in the label list
- Assign 1-5 labels per item; apply every label that genuinely fits
- If a specific `enhancement: *` sub-label is assigned, do NOT also assign the bare `enhancement` label — the sub-label is sufficient
- If genuinely unclear, assign `question`
---
## Step 4: Skip Labels
Never assign these — they require human judgment:
- `duplicate`
- `invalid`
- `wontfix`
- `priority: medium`
- `priority: low`
- `status: triage`
- `NV QA`
`priority: high` is allowed ONLY when the issue clearly blocks critical functionality, causes data loss, or describes a production outage — not based on the author's frustration or urgency language alone.
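The mechanical parts of Steps 3 and 4 can also be enforced after generation. This is a minimal sketch assuming a post-processing hook exists; `SKIP_LABELS` and `sanitizeLabels` are hypothetical names, and the `priority: high` condition is left out because it needs judgment about the report's content.

```typescript
// Labels from Step 4 that automated triage must never assign.
const SKIP_LABELS: ReadonlySet<string> = new Set([
  "duplicate",
  "invalid",
  "wontfix",
  "priority: medium",
  "priority: low",
  "status: triage",
  "NV QA",
]);

// Hypothetical helper: drop skip labels, drop the bare `enhancement` label
// when a specific `enhancement: *` sub-label is present, and remove duplicates.
function sanitizeLabels(labels: readonly string[]): string[] {
  const hasSubLabel = labels.some((l) => l.startsWith("enhancement: "));
  const kept = labels.filter(
    (l) => !SKIP_LABELS.has(l) && !(hasSubLabel && l === "enhancement"),
  );
  return Array.from(new Set(kept));
}
```

Filtering after the fact is a safety net only; the instructions above still ask the model not to produce these labels in the first place.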
---
## Step 5: Label Guide
Use these descriptions to match labels to issue/PR content:
- `bug`: User reports something broken — unexpected error, crash, exception, traceback, "not working", "fails", "broken", unexpected behavior
- `enhancement`: Generic enhancement — use only if none of the specific `enhancement: *` sub-types clearly apply
- `enhancement: feature`: Request for a new capability — "would be great if", "feature request", "add support for", "please add"
- `enhancement: inference`: Inference routing, model support, provider configuration
- `enhancement: security`: Security controls, policies, audit logging
- `enhancement: policy`: Network policy, egress rules, sandbox policy
- `enhancement: ui`: CLI UX, output formatting, terminal display
- `enhancement: platform`: Cross-platform support (pair with a `Platform: *` label)
- `enhancement: provider`: Cloud or inference provider support (pair with a `Provider: *` label)
- `enhancement: performance`: Speed, resource usage, memory, latency
- `enhancement: reliability`: Stability, error handling, recovery, retries
- `enhancement: testing`: Test coverage, CI/CD quality, test infrastructure
- `enhancement: MCP`: MCP protocol support, tool integration
- `enhancement: CI/CD`: Pipeline, build system, automation
- `enhancement: documentation`: Docs improvements, examples, guides
- `question`: Asking how to do something — "how do I", "is it possible", "does X support"
- `documentation`: Missing or incorrect docs, README errors, API doc gaps
- `good first issue`: Small well-scoped fix, doc typo, clear simple change — easy entry point for new contributors
- `help wanted`: Clear fix or improvement that needs a community contribution
- `security`: Auth issues, API key exposure, CVE, vulnerability, unauthorized access
- `status: needs-info`: Issue or PR has no description, no reproduction steps, or so little detail the team cannot act on it
- `priority: high`: Issue blocks critical functionality, causes data loss, or describes a production outage — apply only when the report clearly describes severe, reproducible impact
- `Platform: MacOS`: Issue specific to macOS, Mac OS X, or Apple Silicon (M1/M2/M3/M4). Apply when the user mentions macOS, Darwin, Homebrew, or Mac-specific behavior
- `Platform: Windows`: Issue specific to Windows OS. Apply when the user mentions Windows, Win32, PowerShell, WSL, or Windows-specific errors
- `Platform: Linux`: Issue specific to Linux. Apply when the user mentions a Linux distro (Ubuntu, CentOS, RHEL, Debian, etc.) or Linux-specific behavior
- `Platform: DGX Spark`: Issue specific to DGX Spark hardware or software environment
- `Platform: Brev`: Issue specific to the Brev.dev cloud environment
- `Platform: ARM64`: Issue specific to ARM64 / aarch64 architecture
- `Integration: Slack`: Issue or feature involving the Slack integration or Slack bridge
- `Integration: Discord`: Issue or feature involving the Discord integration
- `Integration: Telegram`: Issue or feature involving the Telegram integration
- `Integration: GitHub`: Issue or feature involving GitHub-specific behavior (not the repo itself)
- `Provider: NVIDIA`: Issue or feature specific to NVIDIA inference endpoints or NIM
- `Provider: OpenAI`: Issue or feature specific to OpenAI API or models
- `Provider: Anthropic`: Issue or feature specific to Anthropic / Claude models
- `Provider: Azure`: Issue or feature specific to Azure OpenAI or Azure cloud
- `Provider: AWS`: Issue or feature specific to AWS Bedrock or AWS cloud
- `Provider: GCP`: Issue or feature specific to Google Cloud / Vertex AI
---
## Step 6: Comment Tiers
Items are classified as `quality_tier` or `standard_tier` before generation. This is passed in the item metadata.
- **quality_tier** (influencer author, company-affiliated author, or body > 800 chars): Write 2-3 sentences. Start with "Thanks," then naturally reference specific details from the body. Avoid "I've taken a look at", "I've reviewed", "it appears to", "I can see that"; these sound bot-generated. Write like a human maintainer giving a warm, specific response.
- **standard_tier**: Write 1 sentence acknowledging the report and mentioning the labels applied.
---
## Step 7: Tone Rules (strictly enforced)
- Use "could" not "should"; use "may" not "will" — this is a first response, not a commitment
- Never say "Thanks for fixing" — say "Thanks for the proposed fix" or "Thanks for submitting this"
- Never say "Thanks for adding" — say "Thanks for the suggested addition"
- Never claim the submission accomplishes something before review
- Do not say "I'll" or "we'll"
- For issues (bugs, questions, enhancements): use "this identifies a..." or "this reports a..."
- For PRs: use "this proposes a way to..."
- For security-related items: never confirm a vulnerability is real; use neutral language
- Do NOT open with praise about detail or thoroughness. Only reference the quality of the report if the body is genuinely exceptional — multiple reproduction steps, version info, logs, and clear expected vs actual behavior. For most reports, skip the praise entirely and go straight to the triage acknowledgment.
- Do not add generic closing filler phrases
- If a "Spam signal:" line is present in the item metadata, assign only `status: needs-info` and ask for more detail politely
- If a "Note: Author also opened..." line is present, briefly acknowledge if the relationship is plausible
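Some of the tone rules above are simple phrase bans and could be checked automatically. The sketch below is an illustration only, assuming comments are plain ASCII text; `BANNED_PATTERNS` and `findToneViolations` are hypothetical names, and the rules that need judgment (praise, security wording, closing filler) are not covered.

```typescript
// Phrase-level rules from the list above, each paired with a detection pattern.
// Patterns match straight apostrophes only; curly apostrophes are not handled.
const BANNED_PATTERNS: ReadonlyArray<{ rule: string; pattern: RegExp }> = [
  { rule: 'use "could" not "should"', pattern: /\bshould\b/i },
  { rule: 'use "may" not "will"', pattern: /\bwill\b/i },
  { rule: 'never say "Thanks for fixing"', pattern: /thanks for fixing/i },
  { rule: 'never say "Thanks for adding"', pattern: /thanks for adding/i },
  { rule: 'do not say "I\'ll" or "we\'ll"', pattern: /\b(?:I|we)'ll\b/i },
];

// Return the rules a draft comment violates, in the order listed above.
function findToneViolations(comment: string): string[] {
  return BANNED_PATTERNS.filter(({ pattern }) => pattern.test(comment)).map(
    ({ rule }) => rule,
  );
}
```

A non-empty result would be a reason to regenerate or hand-edit the comment rather than post it.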
---
## Related Skills
- `nemoclaw-user-skills-coding` — Agent Skills — all available maintainer and user skills

.gitmodules

@ -0,0 +1,4 @@
[submodule "nemoclaw-blueprint/router/llm-router"]
path = nemoclaw-blueprint/router/llm-router
url = https://github.com/NVIDIA-AI-Blueprints/llm-router.git
branch = v3

@ -138,6 +138,58 @@ Alternatively, send a single message and print the response:
openclaw agent --agent main --local -m "hello" --session-id test
```
### Model Router (Complexity-Based Routing)
NemoClaw includes an optional model router that automatically picks the most efficient model for each query. Instead of sending every request to a single large model, the router uses a lightweight encoder to predict which model in a pool can handle each query correctly, then routes to the cheapest one that meets an accuracy threshold.
The router uses the [NVIDIA LLM Router v3](https://github.com/NVIDIA-AI-Blueprints/llm-router/tree/v3) prefill routing engine and runs on the host as a LiteLLM proxy. The sandbox reaches it through the OpenShell gateway and continues to call `https://inference.local/v1`; do not probe `localhost:4000` or `host.openshell.internal` directly from inside the sandbox.
#### Enable during onboard
Select **Model Router (complexity-based routing)** during the onboard wizard, or set `NEMOCLAW_PROVIDER=routed` for non-interactive mode:
```bash
NEMOCLAW_PROVIDER=routed nemoclaw onboard --non-interactive
```
The onboard wizard starts the router, configures the OpenShell provider, and creates the sandbox. The router process runs on the host on port 4000.
#### Configure the model pool
Edit `nemoclaw-blueprint/router/pool-config.yaml` to define which models the router can choose from:
```yaml
routing:
method: prefill
checkpoint: llm-router/checkpoints/prefill_router_qwen08b.pt
tolerance: 0.20
encoder: Qwen/Qwen3.5-0.8B
models:
- name: nano
litellm_model: "openai/nvidia/nvidia/Nemotron-3-Nano-30B-A3B"
cost_per_m_input_tokens: 0.05
api_base: "https://inference-api.nvidia.com"
- name: super
litellm_model: "openai/nvidia/nvidia/nemotron-3-super-v3"
cost_per_m_input_tokens: 0.10
api_base: "https://inference-api.nvidia.com"
```
The `tolerance` parameter controls the accuracy-cost tradeoff: 0.0 always picks the most accurate model, 1.0 always picks the cheapest, and 0.20 (default) lets the router choose a cheaper model whose predicted accuracy is up to 20 percentage points below the best model's.
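The selection rule can be sketched as follows. This is an illustration of the behavior described here, not the LLM Router implementation; `Candidate`, `selectModel`, and the `pCorrect` field are hypothetical names, and the real router may break ties or weigh output-token cost differently.

```typescript
// One pool entry together with the encoder's prediction for the current query.
interface Candidate {
  name: string;
  pCorrect: number; // predicted probability of a correct answer, 0..1 (assumed field)
  costPerMInputTokens: number; // mirrors cost_per_m_input_tokens in the pool config
}

// Pick the cheapest model whose predicted accuracy is within `tolerance`
// of the best prediction in the pool.
function selectModel(candidates: readonly Candidate[], tolerance: number): Candidate {
  if (candidates.length === 0) throw new Error("model pool is empty");
  if (tolerance < 0 || tolerance > 1) throw new Error("tolerance must be in [0, 1]");
  const best = Math.max(...candidates.map((c) => c.pCorrect));
  // The best model always satisfies this filter, so `eligible` is never empty.
  const eligible = candidates.filter((c) => c.pCorrect >= best - tolerance);
  return eligible.reduce((cheapest, c) =>
    c.costPerMInputTokens < cheapest.costPerMInputTokens ? c : cheapest,
  );
}
```

With predictions of 0.75 for `nano` and 0.90 for `super`, a tolerance of 0.20 selects `nano`, because it is cheaper and within 20 points of the best, while a tolerance of 0.0 selects `super`.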
#### Architecture
The router runs on the host, not inside the sandbox:
```text
Sandbox (OpenClaw) ──> OpenShell Gateway (L7 proxy) ──> Model Router (:4000) ──> NVIDIA API
└── PrefillRouter selects model
```
Credentials flow through the OpenShell provider system. The sandbox never sees raw API keys.
### Uninstall
To remove NemoClaw and all resources created during setup, run the CLI's built-in uninstall command:
@ -197,6 +249,9 @@ NemoClaw/
│ ├── commands/ # Slash commands, migration state
│ └── onboard/ # Onboarding config
├── nemoclaw-blueprint/ # Blueprint YAML and network policies
│ └── router/
│ ├── pool-config.yaml # Model pool and routing config
│ └── llm-router/ # LLM Router v3 submodule (prefill routing engine)
├── scripts/ # Install helpers, setup, automation
├── test/ # Integration and E2E tests
└── docs/ # User-facing docs (Sphinx/MyST)

@ -18,6 +18,7 @@ profiles:
- ncp
- nim-local
- vllm
- routed
description: |
NemoClaw blueprint: orchestrates OpenClaw sandbox creation, migration,
@ -70,6 +71,19 @@ components:
credential_default: "dummy"
timeout_secs: 180
routed:
provider_type: "openai"
provider_name: "nvidia-router"
endpoint: "http://localhost:4000/v1"
model: "nvidia-routed"
credential_env: "NVIDIA_API_KEY"
timeout_secs: 180
router:
enabled: true
port: 4000
pool_config_path: "router/pool-config.yaml"
policy:
base: "sandboxes/openclaw/policy.yaml"
additions:

@ -0,0 +1 @@
Subproject commit 2bd8dfaa751efb60aa4e7e49b270490dfbc0a68a

@ -0,0 +1,36 @@
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Model Router Toolkit pool config for complexity-based routing.
# Uses the NVIDIA LLM Router v3 (https://github.com/NVIDIA-AI-Blueprints/llm-router)
# to route requests to the most efficient model that meets an accuracy threshold.
#
# The router runs an encoder (Qwen3.5-0.8B) on each query to predict P(correct)
# per model, then selects the cheapest model above the tolerance threshold.
#
# tolerance controls the accuracy-cost tradeoff:
# 0.0 = always pick the highest-confidence model (most expensive)
# 0.20 = allow up to 20pp below the best for a cheaper model (default)
# 1.0 = always pick the cheapest model
routing:
method: prefill
checkpoint: llm-router/checkpoints/prefill_router_qwen08b.pt
tolerance: 0.20
encoder: Qwen/Qwen3.5-0.8B
encoder_backend: transformers
models:
- name: nemotron-3-nano-reasoning
display_name: "Nemotron 3 Nano (Reasoning)"
litellm_model: "openai/nvidia/nvidia/Nemotron-3-Nano-30B-A3B"
cost_per_m_input_tokens: 0.05
cost_per_m_output_tokens: 0.20
api_base: "https://inference-api.nvidia.com"
- name: nemotron-3-super
display_name: "Nemotron 3 Super 120B"
litellm_model: "openai/nvidia/nvidia/nemotron-3-super-v3"
cost_per_m_input_tokens: 0.10
cost_per_m_output_tokens: 0.40
api_base: "https://inference-api.nvidia.com"

@ -122,6 +122,38 @@ function minimalBlueprint(overrides?: Record<string, unknown>): Record<string, u
};
}
function routedBlueprint(): Record<string, unknown> {
return {
version: "1.0",
components: {
inference: {
profiles: {
routed: {
provider_type: "openai",
provider_name: "nvidia-router",
endpoint: "http://localhost:4000/v1",
model: "routed",
credential_env: "NVIDIA_API_KEY",
credential_default: "router-local",
timeout_secs: 180,
},
},
},
sandbox: {
image: "openclaw",
name: "test-sandbox",
forward_ports: [18789],
},
router: {
enabled: true,
port: 4000,
pool_config_path: "router/pool-config.yaml",
},
policy: { additions: {} },
},
};
}
function seedBlueprintFile(bp?: Record<string, unknown>): void {
addFile("blueprint.yaml", YAML.stringify(bp ?? minimalBlueprint()));
}
@ -378,6 +410,25 @@ describe("runner", () => {
expect(out).toContain("PROGRESS:10:Validating blueprint");
expect(out).toContain("PROGRESS:100:Plan complete");
});
it("includes router info when router is enabled", async () => {
captureStdout();
mockExeca.mockResolvedValue({ exitCode: 0 });
const plan = await actionPlan("routed", routedBlueprint());
expect(plan.router.enabled).toBe(true);
expect(plan.router.port).toBe(4000);
expect(plan.router.pool_config_path).toBe("router/pool-config.yaml");
});
it("defaults router to disabled when not in blueprint", async () => {
captureStdout();
mockExeca.mockResolvedValue({ exitCode: 0 });
const plan = await actionPlan("default", minimalBlueprint());
expect(plan.router.enabled).toBe(false);
expect(plan.router.port).toBe(4000);
});
});
describe("actionApply", () => {
@ -679,6 +730,24 @@ describe("runner", () => {
if (!inferenceCall) throw new Error("inference set call not found");
expect(inferenceCall[1]).not.toContain("--timeout");
});
it("passes endpoint as-is from blueprint (no rewriting)", async () => {
process.env.NVIDIA_API_KEY = "test-key";
try {
await actionApply("routed", routedBlueprint());
const providerCall = mockExeca.mock.calls.find(
(c) => Array.isArray(c[1]) && c[1].includes("provider"),
);
if (!providerCall) throw new Error("provider create call not found");
const configArg = (providerCall[1] as string[]).find((a: string) =>
a.startsWith("OPENAI_BASE_URL="),
);
expect(configArg).toBe("OPENAI_BASE_URL=http://localhost:4000/v1");
} finally {
delete process.env.NVIDIA_API_KEY;
}
});
});
describe("actionStatus", () => {

@ -50,6 +50,10 @@ function isOptionalFiniteNumber(value: unknown): value is number | undefined {
return value === undefined || (typeof value === "number" && Number.isFinite(value));
}
function isOptionalBoolean(value: unknown): value is boolean | undefined {
return value === undefined || typeof value === "boolean";
}
function isValidPort(value: unknown): value is number {
return typeof value === "number" && Number.isInteger(value) && value >= 1 && value <= 65535;
}
@ -142,6 +146,20 @@ function isBlueprint(value: unknown): value is Blueprint {
}
}
const router = components.router;
if (router !== undefined) {
if (!isObjectLike(router)) {
return false;
}
if (
!isOptionalBoolean(router.enabled) ||
!(router.port === undefined || isValidPort(router.port)) ||
!isOptionalString(router.pool_config_path)
) {
return false;
}
}
const policy = components.policy;
if (policy !== undefined) {
if (!isObjectLike(policy)) {
@ -199,6 +217,7 @@ interface Blueprint {
profiles?: InferenceProfileMap;
};
sandbox?: SandboxConfig;
router?: RouterConfig;
policy?: {
additions?: PolicyAdditions;
};
@ -221,6 +240,14 @@ interface SandboxConfig {
forward_ports?: number[];
}
interface RouterConfig {
enabled?: boolean;
port?: number;
pool_config_path?: string;
}
const DEFAULT_ROUTER_PORT = 4000;
export function loadBlueprint(): Blueprint {
const blueprintPath = process.env.NEMOCLAW_BLUEPRINT_PATH ?? ".";
const bpFile = join(blueprintPath, "blueprint.yaml");
@ -272,6 +299,7 @@ async function resolveRunConfig(
inferenceProfiles: InferenceProfileMap;
inferenceCfg: InferenceProfile;
sandboxCfg: SandboxConfig;
routerCfg: RouterConfig;
}> {
const inferenceProfiles = blueprint.components?.inference?.profiles ?? {};
if (!(profile in inferenceProfiles)) {
@ -297,7 +325,8 @@ async function resolveRunConfig(
}
const sandboxCfg = blueprint.components?.sandbox ?? {};
-return { inferenceProfiles, inferenceCfg, sandboxCfg };
+const routerCfg = blueprint.components?.router ?? {};
+return { inferenceProfiles, inferenceCfg, sandboxCfg, routerCfg };
}
// ── Actions ─────────────────────────────────────────────────────
@ -317,6 +346,11 @@ export interface RunPlan {
model: string | undefined;
credential_env: string | undefined;
};
router: {
enabled: boolean;
port: number;
pool_config_path: string | undefined;
};
policy_additions: PolicyAdditions;
dry_run: boolean;
}
@ -329,7 +363,7 @@ export async function actionPlan(
const rid = emitRunId();
progress(10, "Validating blueprint");
-const { inferenceCfg, sandboxCfg } = await resolveRunConfig(
+const { inferenceCfg, sandboxCfg, routerCfg } = await resolveRunConfig(
profile,
blueprint,
options?.endpointUrl,
@ -342,6 +376,9 @@ export async function actionPlan(
);
}
const routerEnabled = routerCfg.enabled === true;
const routerPort = routerCfg.port ?? DEFAULT_ROUTER_PORT;
const plan: RunPlan = {
run_id: rid,
profile,
@ -357,6 +394,11 @@ export async function actionPlan(
model: inferenceCfg.model,
credential_env: inferenceCfg.credential_env,
},
router: {
enabled: routerEnabled,
port: routerPort,
pool_config_path: routerCfg.pool_config_path,
},
policy_additions: blueprint.components?.policy?.additions ?? {},
dry_run: options?.dryRun ?? false,
};

@ -79,6 +79,27 @@
}
}
},
"router": {
"type": "object",
"description": "Model router configuration for complexity-based routing.",
"additionalProperties": false,
"properties": {
"enabled": {
"type": "boolean",
"description": "Whether the model router is active."
},
"port": {
"type": "integer",
"minimum": 1,
"maximum": 65535,
"description": "Port the router proxy listens on."
},
"pool_config_path": {
"type": "string",
"description": "Path to the router pool config YAML relative to the blueprint directory."
}
}
},
"policy": {
"type": "object",
"required": ["base"],

@ -1320,6 +1320,44 @@ is_source_checkout() {
return 1
}
init_nemoclaw_submodules() {
local root="$1"
[[ -f "$root/.gitmodules" ]] || return 0
git -C "$root" rev-parse --git-dir >/dev/null 2>&1 || return 0
git -C "$root" submodule update --init --depth 1 2>/dev/null
}
is_routed_provider_requested() {
local provider="${NEMOCLAW_PROVIDER:-}"
provider="$(printf '%s' "$provider" | tr '[:upper:]' '[:lower:]')"
[[ "$provider" == "routed" ]]
}
install_model_router_if_present() {
local root="$1"
local router_dir="$root/nemoclaw-blueprint/router/llm-router"
if ! command_exists pip3; then
is_routed_provider_requested && error "pip3 is required for routed inference."
return 0
fi
if [[ ! -d "$router_dir" ]]; then
is_routed_provider_requested && error "llm-router is required for routed inference but is missing."
return 0
fi
if [[ ! -f "$router_dir/pyproject.toml" && ! -f "$router_dir/setup.py" ]]; then
is_routed_provider_requested && error "llm-router is required for routed inference but is not initialized."
warn "Skipping model router install — llm-router submodule is not initialized."
return 0
fi
if ! spin "Installing model router" pip3 install --quiet --user "${router_dir}[prefill,proxy]"; then
if is_routed_provider_requested; then
error "pip3 install of llm-router failed"
fi
warn "Skipping model router install — pip3 install failed"
fi
}
install_nemoclaw() {
command_exists git || error "git was not found on PATH."
local repo_root package_json
@ -1335,9 +1373,14 @@ install_nemoclaw() {
spin "Preparing OpenClaw package" bash -c "$(declare -f info warn resolve_openclaw_version pre_extract_openclaw); pre_extract_openclaw \"\$1\"" _ "$NEMOCLAW_SOURCE_ROOT" \
|| warn "Pre-extraction failed — npm install may fail if openclaw tarball is broken"
fi
if ! spin "Initializing ${_CLI_DISPLAY} submodules" init_nemoclaw_submodules "$NEMOCLAW_SOURCE_ROOT"; then
is_routed_provider_requested && error "Failed to initialize the llm-router submodule required for routed inference."
warn "Submodule initialization failed — model router support may be unavailable"
fi
spin "Installing ${_CLI_DISPLAY} dependencies" bash -c "cd \"$NEMOCLAW_SOURCE_ROOT\" && npm install --ignore-scripts"
spin "Building ${_CLI_DISPLAY} CLI modules" bash -c "cd \"$NEMOCLAW_SOURCE_ROOT\" && npm run --if-present build:cli"
spin "Building ${_CLI_DISPLAY} plugin" bash -c "cd \"$NEMOCLAW_SOURCE_ROOT\"/nemoclaw && npm install --ignore-scripts && npm run build"
install_model_router_if_present "$NEMOCLAW_SOURCE_ROOT"
spin "Linking ${_CLI_DISPLAY} CLI" bash -c "cd \"$NEMOCLAW_SOURCE_ROOT\" && npm link"
else
if [[ -f "$package_json" ]]; then
@ -1356,6 +1399,10 @@ install_nemoclaw() {
mkdir -p "$(dirname "$nemoclaw_src")"
NEMOCLAW_SOURCE_ROOT="$nemoclaw_src"
spin "Cloning ${_CLI_DISPLAY} source" clone_nemoclaw_ref "$release_ref" "$nemoclaw_src"
if ! spin "Initializing ${_CLI_DISPLAY} submodules" init_nemoclaw_submodules "$nemoclaw_src"; then
is_routed_provider_requested && error "Failed to initialize the llm-router submodule required for routed inference."
warn "Submodule initialization failed — model router support may be unavailable"
fi
# Fetch version tags into the shallow clone so `git describe --tags
# --match "v*"` works at runtime (the shallow clone only has the
# single ref we asked for).
@ -1371,6 +1418,7 @@ install_nemoclaw() {
spin "Installing ${_CLI_DISPLAY} dependencies" bash -c "cd \"$nemoclaw_src\" && npm install --ignore-scripts"
spin "Building ${_CLI_DISPLAY} CLI modules" bash -c "cd \"$nemoclaw_src\" && npm run --if-present build:cli"
spin "Building ${_CLI_DISPLAY} plugin" bash -c "cd \"$nemoclaw_src\"/nemoclaw && npm install --ignore-scripts && npm run build"
install_model_router_if_present "$nemoclaw_src"
spin "Linking ${_CLI_DISPLAY} CLI" bash -c "cd \"$nemoclaw_src\" && npm link"
# Install/upgrade the OpenShell CLI on the GitHub-clone path (curl|bash).

@ -119,6 +119,8 @@ function getProviderLabel(provider) {
switch (provider) {
case "nvidia-nim":
return "NVIDIA Endpoints";
case "nvidia-router":
return "Model Router";
case "vllm-local":
return "Local vLLM";
case "ollama-local":
@ -142,6 +144,8 @@ function getEffectiveProviderName(providerKey) {
return "ollama-local";
case "vllm":
return "vllm-local";
case "routed":
return "nvidia-router";
default:
return providerKey;
}
@ -169,6 +173,7 @@ function getNonInteractiveProvider() {
"custom",
"nim-local",
"vllm",
"routed",
"install-vllm",
"install-ollama",
"install-windows-ollama",
@ -177,7 +182,7 @@ function getNonInteractiveProvider() {
if (!validProviders.has(normalized)) {
console.error(` Unsupported NEMOCLAW_PROVIDER: ${providerKey}`);
console.error(
-" Valid values: build, openai, anthropic, anthropicCompatible, gemini, ollama, custom, nim-local, vllm, install-vllm, install-ollama, install-windows-ollama, start-windows-ollama",
+" Valid values: build, openai, anthropic, anthropicCompatible, gemini, ollama, custom, nim-local, vllm, routed, install-vllm, install-ollama, install-windows-ollama, start-windows-ollama",
);
process.exit(1);
}
@ -340,6 +345,10 @@ function getSandboxInferenceConfig(
supportsStore: false,
};
break;
case "nvidia-router":
providerKey = "inference";
primaryModelRef = `inference/${model}`;
break;
case "nvidia-prod":
case "nvidia-nim":
default:

@ -74,6 +74,8 @@ export interface Session {
credentialEnv: string | null;
preferredInferenceApi: string | null;
nimContainer: string | null;
routerPid: number | null;
routerCredentialHash: string | null;
webSearchConfig: WebSearchConfig | null;
policyPresets: string[] | null;
messagingChannels: string[] | null;
@ -122,6 +124,8 @@ export interface SessionUpdates {
credentialEnv?: string;
preferredInferenceApi?: string;
nimContainer?: string;
routerPid?: number;
routerCredentialHash?: string;
webSearchConfig?: WebSearchConfig | null;
policyPresets?: string[];
messagingChannels?: string[];
@ -189,6 +193,10 @@ function readString(value: SessionJsonValue | undefined): string | null {
return typeof value === "string" ? value : null;
}
function readPositiveInteger(value: SessionJsonValue | undefined): number | null {
return typeof value === "number" && Number.isInteger(value) && value > 0 ? value : null;
}
function readStringArray(value: SessionJsonValue | undefined): string[] | null {
if (!Array.isArray(value)) return null;
return value.filter((entry): entry is string => typeof entry === "string");
@ -297,6 +305,8 @@ export function createSession(overrides: Partial<Session> = {}): Session {
credentialEnv: overrides.credentialEnv ?? null,
preferredInferenceApi: overrides.preferredInferenceApi ?? null,
nimContainer: overrides.nimContainer ?? null,
routerPid: readPositiveInteger(overrides.routerPid),
routerCredentialHash: overrides.routerCredentialHash ?? null,
webSearchConfig:
overrides.webSearchConfig?.fetchEnabled === true ? { fetchEnabled: true } : null,
policyPresets: readStringArray(overrides.policyPresets),
@ -333,6 +343,8 @@ export function normalizeSession(data: Session | SessionJsonValue | undefined):
credentialEnv: readString(data.credentialEnv),
preferredInferenceApi: readString(data.preferredInferenceApi),
nimContainer: readString(data.nimContainer),
routerPid: readPositiveInteger(data.routerPid),
routerCredentialHash: readString(data.routerCredentialHash),
webSearchConfig: parseWebSearchConfig(data.webSearchConfig),
policyPresets: readStringArray(data.policyPresets),
messagingChannels: readStringArray(data.messagingChannels),
@ -692,6 +704,12 @@ export function filterSafeUpdates(updates: SessionUpdates): Partial<Session> {
if (typeof updates.preferredInferenceApi === "string")
safe.preferredInferenceApi = updates.preferredInferenceApi;
if (typeof updates.nimContainer === "string") safe.nimContainer = updates.nimContainer;
if (typeof updates.routerPid === "number" && Number.isInteger(updates.routerPid) && updates.routerPid > 0) {
safe.routerPid = updates.routerPid;
}
if (typeof updates.routerCredentialHash === "string") {
safe.routerCredentialHash = updates.routerCredentialHash;
}
if (isObject(updates.webSearchConfig) && updates.webSearchConfig.fetchEnabled === true) {
safe.webSearchConfig = { fetchEnabled: true };
} else if (updates.webSearchConfig === null) {

@ -770,6 +770,281 @@ function getBlueprintMaxOpenshellVersion(rootDir = ROOT): string | null {
return getBlueprintVersionField("max_openshell_version", rootDir);
}
/**
* Load a named inference profile and router config from blueprint.yaml.
* Returns null if the blueprint or profile is missing.
*/
type BlueprintRouterConfig = {
enabled?: boolean;
port?: number;
pool_config_path?: string;
credential_env?: string;
};
type BlueprintInferenceProfile = {
provider_name?: string;
endpoint?: string;
model: string;
credential_env?: string;
credential_default?: string;
router: BlueprintRouterConfig;
};
function loadBlueprintProfile(
profileName: string,
rootDir: string = ROOT,
): BlueprintInferenceProfile | null {
try {
const YAML = require("yaml");
const blueprintPath = path.join(rootDir, "nemoclaw-blueprint", "blueprint.yaml");
if (!fs.existsSync(blueprintPath)) return null;
const raw = fs.readFileSync(blueprintPath, "utf8");
const parsed = YAML.parse(raw);
const profile = parsed?.components?.inference?.profiles?.[profileName];
if (!profile) return null;
const router = { ...(parsed?.components?.router || {}) };
if (typeof profile.credential_env === "string" && profile.credential_env.trim().length > 0) {
router.credential_env = profile.credential_env;
}
return { ...profile, router } as BlueprintInferenceProfile;
} catch {
return null;
}
}
const ROUTER_HEALTH_RETRIES = 15;
const ROUTER_HEALTH_INTERVAL_MS = 2000;
const ROUTER_HEALTH_TIMEOUT_MS = 3000;
// Probe GET /health on the loopback interface; any 2xx status counts as healthy.
async function isRouterHealthy(port: number, timeoutMs = ROUTER_HEALTH_TIMEOUT_MS): Promise<boolean> {
const http = require("http");
return new Promise<boolean>((resolve) => {
let settled = false;
const settle = (healthy: boolean) => {
if (settled) return;
settled = true;
resolve(healthy);
};
const request = http
.get(`http://127.0.0.1:${port}/health`, (res: import("node:http").IncomingMessage) => {
res.resume();
settle((res.statusCode || 0) >= 200 && (res.statusCode || 0) < 300);
})
.on("error", () => settle(false));
request.setTimeout(timeoutMs, () => {
request.destroy();
settle(false);
});
});
}
// Signal 0 tests whether the process exists without actually signalling it.
function isProcessRunning(pid: number | null | undefined): boolean {
if (!Number.isInteger(pid) || Number(pid) <= 0) return false;
try {
process.kill(Number(pid), 0);
return true;
} catch {
return false;
}
}
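// Send SIGTERM and poll up to 10 times at 500 ms intervals for the process and
// its /health endpoint to disappear; escalate to SIGKILL if it is still around.
async function stopModelRouterProcess(pid: number, port: number): Promise<void> {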
try {
process.kill(pid, "SIGTERM");
} catch {
return;
}
for (let attempt = 0; attempt < 10; attempt++) {
await new Promise((resolve) => setTimeout(resolve, 500));
if (!isProcessRunning(pid) && !(await isRouterHealthy(port, 1000))) return;
}
try {
process.kill(pid, "SIGKILL");
} catch {
// already stopped
}
for (let attempt = 0; attempt < 5; attempt++) {
await new Promise((resolve) => setTimeout(resolve, 500));
if (!isProcessRunning(pid) && !(await isRouterHealthy(port, 1000))) return;
}
}
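/**
* Start the model-router proxy and wait for it to become healthy.
* Follows the same pattern as Ollama startup (spawn detached, poll health).
* Returns the PID of the child process. Throws if proxy-config generation
* fails, if the port already serves a healthy router, or if the child exits
* or stays unhealthy for ROUTER_HEALTH_RETRIES polls.
*/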
async function startModelRouter(routerCfg: BlueprintRouterConfig): Promise<number> {
const port = routerCfg.port || 4000;
const blueprintDir = path.join(ROOT, "nemoclaw-blueprint");
const poolConfigPath = path.join(
blueprintDir,
routerCfg.pool_config_path || "router/pool-config.yaml",
);
const stateDir = path.join(os.homedir(), ".nemoclaw", "state");
const litellmConfigPath = path.join(stateDir, "litellm-proxy.yaml");
fs.mkdirSync(stateDir, { recursive: true });
const proxyConfigResult = spawnSync(
"model-router",
["proxy-config", "--config", poolConfigPath, "--output", litellmConfigPath],
{ encoding: "utf8", timeout: 30_000, cwd: blueprintDir },
);
if (proxyConfigResult.status !== 0) {
throw new Error(
`model-router proxy-config failed: ${proxyConfigResult.stderr || proxyConfigResult.error || "unknown error"}`,
);
}
const { buildSubprocessEnv } = require("./subprocess-env");
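// Credential precedence: an explicit OPENAI_API_KEY wins for that variable,
// otherwise the routed credential is reused for it; NEMOCLAW_PROVIDER_KEY
// only fills variables that are still unset.
const credEnvVars: Record<string, string> = {};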
const credName = routerCfg.credential_env || "NVIDIA_API_KEY";
const routedCredential = resolveProviderCredential(credName);
const openAiCredential = resolveProviderCredential("OPENAI_API_KEY");
if (routedCredential) {
credEnvVars[credName] = routedCredential;
if (!openAiCredential) credEnvVars.OPENAI_API_KEY = routedCredential;
}
if (openAiCredential) credEnvVars.OPENAI_API_KEY = openAiCredential;
const _providerKey = (process.env.NEMOCLAW_PROVIDER_KEY || "").trim();
if (_providerKey) {
if (!credEnvVars[credName]) credEnvVars[credName] = _providerKey;
if (!credEnvVars.OPENAI_API_KEY) credEnvVars.OPENAI_API_KEY = _providerKey;
}
if (await isRouterHealthy(port)) {
throw new Error(
`Port ${port} already has a healthy router endpoint; refusing to start a second router.`,
);
}
const child = spawn(
"model-router",
[
"proxy",
"--litellm-config", litellmConfigPath,
"--router-config", poolConfigPath,
"--host", "0.0.0.0",
"--port", String(port),
],
{
detached: true,
stdio: "ignore",
cwd: blueprintDir,
env: buildSubprocessEnv(credEnvVars),
},
);
let childExited = false;
let childExitDetail = "";
child.once("error", (err: Error) => {
childExited = true;
childExitDetail = `child failed to start: ${err.message}`;
});
child.once("exit", (code: number | null, signal: string | null) => {
childExited = true;
if (!childExitDetail) {
childExitDetail = `child exited with code ${code ?? "null"}${signal ? ` signal ${signal}` : ""}`;
}
});
child.unref();
const pid = child.pid;
if (!pid) {
throw new Error(
"Failed to start model-router proxy: no PID returned" +
(childExitDetail ? ` (${childExitDetail})` : ""),
);
}
for (let attempt = 0; attempt < ROUTER_HEALTH_RETRIES; attempt++) {
await new Promise((resolve) => setTimeout(resolve, ROUTER_HEALTH_INTERVAL_MS));
if (childExited) break;
const healthy = await isRouterHealthy(port);
const processAlive = isProcessRunning(pid);
if (healthy && processAlive) return pid;
if (!processAlive) {
childExited = true;
if (!childExitDetail) childExitDetail = "child process is no longer running";
break;
}
}
try {
process.kill(pid, "SIGTERM");
} catch {
// already dead
}
throw new Error(
`Model router failed to become healthy on port ${port} after ${ROUTER_HEALTH_RETRIES} attempts` +
(childExitDetail ? ` (${childExitDetail})` : ""),
);
}
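The credential wiring inside `startModelRouter` may be easier to review as a pure function. The block below is an illustrative sketch, not code from this PR: `CredentialSources` and `resolveRouterCredentialEnv` are hypothetical names, and the three inputs stand in for `resolveProviderCredential(credName)`, `resolveProviderCredential("OPENAI_API_KEY")`, and the `NEMOCLAW_PROVIDER_KEY` environment variable.

```typescript
// Hypothetical container for the three credential inputs (not part of the PR).
type CredentialSources = {
  credName: string; // env var named by the blueprint, e.g. "NVIDIA_API_KEY"
  routed?: string | null; // value resolved for credName
  openAi?: string | null; // value resolved for OPENAI_API_KEY
  providerKey?: string | null; // raw NEMOCLAW_PROVIDER_KEY
};

// Mirrors the order of assignments in startModelRouter.
function resolveRouterCredentialEnv(src: CredentialSources): Record<string, string> {
  const env: Record<string, string> = {};
  if (src.routed) {
    env[src.credName] = src.routed;
    // Reuse the routed credential for OPENAI_API_KEY only when none is set.
    if (!src.openAi) env.OPENAI_API_KEY = src.routed;
  }
  // An explicit OpenAI credential always wins for OPENAI_API_KEY.
  if (src.openAi) env.OPENAI_API_KEY = src.openAi;
  // The provider key is a last resort for anything still unset.
  const providerKey = (src.providerKey || "").trim();
  if (providerKey) {
    if (!env[src.credName]) env[src.credName] = providerKey;
    if (!env.OPENAI_API_KEY) env.OPENAI_API_KEY = providerKey;
  }
  return env;
}
```

Under these assumptions, a caller that supplies only `NEMOCLAW_PROVIDER_KEY` ends up with the same value in both variables, while a caller that supplies both named credentials keeps them separate.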
function getRoutedProfile(): BlueprintInferenceProfile {
const bp = loadBlueprintProfile("routed");
if (!bp || bp.router?.enabled !== true) {
throw new Error("Router is not enabled in nemoclaw-blueprint/blueprint.yaml.");
}
return bp;
}
function isRoutedInferenceProvider(provider: string | null | undefined): boolean {
if (!provider) return false;
if (provider === "nvidia-router") return true;
const bp = loadBlueprintProfile("routed");
return Boolean(bp?.provider_name && provider === bp.provider_name);
}
async function reconcileModelRouter(): Promise<void> {
const bp = getRoutedProfile();
const routerPort = bp.router.port || 4000;
const routerCredentialEnv = bp.router.credential_env || bp.credential_env || "NVIDIA_API_KEY";
const routerCredential =
hydrateCredentialEnv(routerCredentialEnv) ||
normalizeCredentialValue(bp.credential_default || "");
if (!routerCredential) {
throw new Error(`${routerCredentialEnv} is required to start Model Router.`);
}
saveCredential(routerCredentialEnv, routerCredential);
const routerCredentialHash = hashCredential(routerCredential);
const session = onboardSession.loadSession();
const recordedPid = session?.routerPid ?? null;
const recordedCredentialHash = session?.routerCredentialHash ?? null;
if (await isRouterHealthy(routerPort)) {
if (
routerCredentialHash &&
recordedCredentialHash === routerCredentialHash &&
isProcessRunning(recordedPid)
) {
console.log(` ✓ Model router is already healthy on port ${routerPort}`);
return;
}
if (isProcessRunning(recordedPid)) {
console.log(" Restarting model router with updated credentials...");
await stopModelRouterProcess(requireValue(recordedPid, "Expected recorded router PID"), routerPort);
} else {
throw new Error(
`Port ${routerPort} already has a healthy router endpoint, but its credential state is unknown. Stop the existing model-router process and rerun onboarding.`,
);
}
}
console.log(" Starting model router...");
const routerPid = await startModelRouter(bp.router);
console.log(` ✓ Model router started (PID ${routerPid}) on port ${routerPort}`);
onboardSession.updateSession((current: Session) => {
current.routerPid = routerPid;
current.routerCredentialHash = routerCredentialHash;
return current;
});
}
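Read as a decision table, `reconcileModelRouter` picks one of four outcomes from three observations. A minimal sketch of that branching, assuming hypothetical names (`RouterObservation`, `RouterAction`, `decideRouterAction`) that do not exist in the PR:

```typescript
// Hypothetical summary of what reconcileModelRouter observes before acting.
type RouterObservation = {
  healthy: boolean; // /health on the router port answered with 2xx
  hashMatches: boolean; // recorded credential hash equals the current one
  recordedPidRunning: boolean; // the PID stored in the session is alive
};

type RouterAction = "start" | "reuse" | "restart" | "refuse";

function decideRouterAction(obs: RouterObservation): RouterAction {
  // Nothing healthy on the port: spawn a new router.
  if (!obs.healthy) return "start";
  // Our own router with the same credential: keep it.
  if (obs.hashMatches && obs.recordedPidRunning) return "reuse";
  // Our own router with a different credential: stop it, then start again.
  if (obs.recordedPidRunning) return "restart";
  // Healthy endpoint we did not record: credential state unknown, bail out.
  return "refuse";
}
```

The `refuse` case corresponds to the error asking the user to stop the existing model-router process and rerun onboarding.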
// ── Base image digest resolution ────────────────────────────────
// Pulls the sandbox-base image from GHCR and inspects it to get the
// actual repo digest. This avoids the registry mismatch that broke
@ -5184,6 +5459,7 @@ function providerNameToOptionKey(
opts: { hasNimContainer?: boolean } = {},
): string | null {
if (!name) return null;
if (name === "nvidia-router") return "routed";
if (name === "ollama-local") return "ollama";
// Local NIM and standalone vLLM both persist as provider="vllm-local". NIM
// is positively identified by a nimContainer record; the absence of one in
@ -5595,6 +5871,12 @@ async function setupNim(
}
}
// Model Router: complexity-based routing via blueprint config.
const blueprintRouterCfg = loadBlueprintProfile("routed");
if (blueprintRouterCfg && blueprintRouterCfg.router?.enabled === true) {
options.push({ key: "routed", label: "Model Router (complexity-based routing)" });
}
function checkOllamaPortsOrWarn(): boolean {
const portValidation = validateOllamaPortConfiguration();
if (!portValidation.ok) {
@ -6489,6 +6771,49 @@ async function setupNim(
}
preferredInferenceApi = "openai-completions";
break;
} else if (selected.key === "routed") {
const bp = loadBlueprintProfile("routed");
if (!bp || bp.router?.enabled !== true) {
console.error(" Router is not enabled in nemoclaw-blueprint/blueprint.yaml.");
if (isNonInteractive()) process.exit(1);
continue selectionLoop;
}
// Default matches reconcileModelRouter and startModelRouter, which fall back to NVIDIA_API_KEY.
const routerCredentialEnv = bp.router?.credential_env || bp.credential_env || "NVIDIA_API_KEY";
credentialEnv = routerCredentialEnv;
const routedCredential =
hydrateCredentialEnv(routerCredentialEnv) ||
normalizeCredentialValue(bp.credential_default || "");
if (routedCredential) {
saveCredential(routerCredentialEnv, routedCredential);
}
const _providerKeyHint = (process.env.NEMOCLAW_PROVIDER_KEY || "").trim();
if (_providerKeyHint && !resolveProviderCredential(routerCredentialEnv)) {
saveCredential(routerCredentialEnv, _providerKeyHint);
}
if (isNonInteractive()) {
if (!resolveProviderCredential(routerCredentialEnv)) {
console.error(
` ${routerCredentialEnv} (or NEMOCLAW_PROVIDER_KEY) is required for Model Router in non-interactive mode.`,
);
process.exit(1);
}
} else {
if (!resolveProviderCredential(routerCredentialEnv)) {
await ensureNamedCredential(routerCredentialEnv, "Model Router API key", null);
}
}
provider = bp.provider_name || "nvidia-router";
model = bp.model;
const { HOST_GATEWAY_URL } = require("./local-inference");
const routerEndpointUrl = bp.endpoint || "";
endpointUrl = routerEndpointUrl;
if (routerEndpointUrl.match(/localhost|127\.0\.0\.1/)) {
const u = new URL(routerEndpointUrl);
endpointUrl = `${HOST_GATEWAY_URL}:${u.port}${u.pathname}`;
}
preferredInferenceApi = "openai-completions";
console.log(` ✓ Using Model Router: ${provider} / ${model}`);
break;
}
}
}
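The endpoint handling in the `routed` branch swaps a loopback URL for the host gateway so that the sandbox can reach a router running on the host. A rough standalone sketch follows. `rewriteLoopbackEndpoint` is a hypothetical helper, and the gateway value `http://host.openshell.internal` used in the checks is an assumption inferred from the URL in this PR's test, not the confirmed value of `HOST_GATEWAY_URL`. Unlike the diff, the sketch omits the colon when the URL carries no explicit port.

```typescript
// Hypothetical helper: rewrite localhost/127.0.0.1 endpoints to the gateway host.
function rewriteLoopbackEndpoint(endpoint: string, gatewayUrl: string): string {
  // Same substring test as the diff; non-loopback endpoints pass through untouched.
  if (!/localhost|127\.0\.0\.1/.test(endpoint)) return endpoint;
  const u = new URL(endpoint);
  // URL.port is "" when the URL has no explicit port.
  const portSuffix = u.port ? `:${u.port}` : "";
  return `${gatewayUrl}${portSuffix}${u.pathname}`;
}
```

With the assumed gateway value, `http://localhost:4000/v1` becomes the same `host.openshell.internal:4000/v1` address that the test passes to `setupInference`.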
@ -6737,6 +7062,42 @@ async function setupInference(
// Do not mutate ~/.nemoclaw/credentials.json here: local Ollama now uses
// OLLAMA_PROXY_CREDENTIAL_ENV, so any saved OPENAI_API_KEY remains available
// to unrelated OpenAI-backed sandboxes.
} else if (isRoutedInferenceProvider(provider)) {
// Blueprint profile provider (e.g., nvidia-router for the routed profile).
// Make sure the host-side router is running first, then follow the same
// pattern as vllm-local: upsert the provider and set the inference route.
try {
await reconcileModelRouter();
} catch (err) {
console.error(` ✗ Failed to start model router: ${err instanceof Error ? err.message : String(err)}`);
process.exit(1);
}
const resolvedCredentialEnv = credentialEnv || "NVIDIA_API_KEY";
const credentialValue = hydrateCredentialEnv(resolvedCredentialEnv);
const env = credentialValue ? { [resolvedCredentialEnv]: credentialValue } : {};
const providerResult = upsertProvider(
provider,
"openai",
resolvedCredentialEnv,
endpointUrl,
env,
);
if (!providerResult.ok) {
console.error(` ${providerResult.message}`);
process.exit(providerResult.status || 1);
}
const inferenceArgs = [
"inference",
"set",
"--no-verify",
"--provider",
provider,
"--model",
model,
];
runOpenshell(inferenceArgs);
} else {
console.error(` Unsupported provider configuration: ${provider}`);
process.exit(1);
}
verifyInferenceRoute(provider, model);
@ -9156,6 +9517,16 @@ async function onboard(opts: OnboardOptions = {}): Promise<void> {
const resumeInference =
!forceProviderSelection && resume && isInferenceRouteReady(provider, model);
if (resumeInference) {
if (isRoutedInferenceProvider(provider)) {
try {
await reconcileModelRouter();
} catch (err) {
console.error(
` ✗ Failed to reconcile model router: ${err instanceof Error ? err.message : String(err)}`,
);
process.exit(1);
}
}
skippedStepMessage("inference", `${provider} / ${model}`);
if (nimContainer && sandboxName) {
registry.updateSandbox(sandboxName, { nimContainer });


@ -106,6 +106,7 @@ type OnboardTestInternals = {
value: string | null | undefined,
flavor: "openai" | "anthropic",
) => string;
providerNameToOptionKey: (name?: string | null) => string | null;
parsePolicyPresetEnv: (value: string | null) => string[];
patchStagedDockerfile: ShimFn<void>;
pullAndResolveBaseImageDigest: () => { digest: string; ref: string } | null;
@ -148,6 +149,8 @@ function isOnboardTestInternals(
typeof value.agentSupportsWebSearch === "function" &&
typeof value.configureWebSearch === "function" &&
typeof value.formatSandboxBuildEstimateNote === "function" &&
Object.prototype.hasOwnProperty.call(value, "providerNameToOptionKey") &&
typeof value.providerNameToOptionKey === "function" &&
typeof value.shouldRunCompatibleEndpointSandboxSmoke === "function" &&
typeof value.writeSandboxConfigSyncFile === "function"
);
@ -199,6 +202,7 @@ const {
configureWebSearch,
isLoopbackHostname,
normalizeProviderBaseUrl,
providerNameToOptionKey,
parsePolicyPresetEnv,
patchStagedDockerfile,
pullAndResolveBaseImageDigest,
@ -829,6 +833,20 @@ describe("onboard helpers", () => {
);
});
it("maps Model Router sandboxes through managed inference.local", () => {
assert.deepEqual(getSandboxInferenceConfig("nvidia-routed", "nvidia-router"), {
providerKey: "inference",
primaryModelRef: "inference/nvidia-routed",
inferenceBaseUrl: "https://inference.local/v1",
inferenceApi: "openai-completions",
inferenceCompat: null,
});
});
it("maps persisted Model Router provider back to the routed provider option", () => {
assert.equal(providerNameToOptionKey("nvidia-router"), "routed");
});
it("leaves Kimi K2.6 compat to the model-specific setup registry", () => {
assert.deepEqual(
getSandboxInferenceConfig("moonshotai/kimi-k2.6", "nvidia-prod", "openai-completions"),
@ -2406,6 +2424,145 @@ const { setupInference } = require(${onboardPath});
assert.equal(payload.nvidiaApiKey, "nvapi-secret-value");
});
it("configures Model Router as a host provider while sandboxes keep inference.local", () => {
const repoRoot = path.join(import.meta.dirname, "..");
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-onboard-router-inference-"));
const fakeBin = path.join(tmpDir, "bin");
const scriptPath = path.join(tmpDir, "setup-router-check.js");
const onboardPath = JSON.stringify(path.join(repoRoot, "dist", "lib", "onboard.js"));
const runnerPath = JSON.stringify(path.join(repoRoot, "dist", "lib", "runner.js"));
const registryPath = JSON.stringify(path.join(repoRoot, "dist", "lib", "state", "registry.js"));
fs.mkdirSync(fakeBin, { recursive: true });
fs.writeFileSync(path.join(fakeBin, "openshell"), "#!/usr/bin/env bash\nexit 0\n", {
mode: 0o755,
});
fs.writeFileSync(
path.join(fakeBin, "model-router"),
[
"#!/usr/bin/env node",
'const fs = require("fs");',
'const http = require("http");',
'const path = require("path");',
"const args = process.argv.slice(2);",
'if (args[0] === "proxy-config") {',
' const output = args[args.indexOf("--output") + 1];',
" fs.mkdirSync(path.dirname(output), { recursive: true });",
' fs.writeFileSync(output, "model_list: []\\n");',
" process.exit(0);",
"}",
'if (args[0] === "proxy") {',
' const port = Number(args[args.indexOf("--port") + 1] || "4000");',
" const server = http.createServer((req, res) => {",
' if (req.url === "/health") { res.statusCode = 200; res.end("ok"); return; }',
" res.statusCode = 404;",
" res.end();",
" });",
' server.listen(port, "127.0.0.1");',
" setTimeout(() => process.exit(0), 10000);",
"} else {",
" process.exit(1);",
"}",
"",
].join("\n"),
{ mode: 0o755 },
);
const script = String.raw`
const runner = require(${runnerPath});
const _n = (c) => (Array.isArray(c) ? c.join(" ") : String(c)).replace(/'/g, "");
const registry = require(${registryPath});
const commands = [];
runner.run = (command, opts = {}) => {
const cmd = _n(command);
commands.push({ command: cmd, env: opts.env || null });
if (cmd.includes("provider get")) return { status: 1, stdout: "", stderr: "" };
return { status: 0, stdout: "", stderr: "" };
};
runner.runCapture = (command) => {
const cmd = _n(command);
if (cmd.includes("inference") && cmd.includes("get")) {
return [
"Gateway inference:",
"",
" Route: inference.local",
" Provider: nvidia-router",
" Model: nvidia-routed",
" Version: 1",
].join("\\n");
}
return "";
};
registry.updateSandbox = () => true;
process.env.NVIDIA_API_KEY = "nvapi-router-secret";
const { setupInference, getSandboxInferenceConfig } = require(${onboardPath});
(async () => {
await setupInference(
"router-box",
"nvidia-routed",
"nvidia-router",
"http://host.openshell.internal:4000/v1",
"NVIDIA_API_KEY",
);
console.log(JSON.stringify({
commands,
sandboxConfig: getSandboxInferenceConfig("nvidia-routed", "nvidia-router", "openai-completions"),
}));
})().catch((error) => {
console.error(error);
process.exit(1);
});
`;
fs.writeFileSync(scriptPath, script);
const result = spawnSync(process.execPath, [scriptPath], {
cwd: repoRoot,
encoding: "utf-8",
env: {
...process.env,
HOME: tmpDir,
PATH: `${fakeBin}:${process.env.PATH || ""}`,
},
});
assert.equal(result.status, 0, result.stderr);
const payload = parseStdoutJson<{
commands: CommandEntry[];
sandboxConfig: SandboxInferenceConfig;
}>(result.stdout);
const providerCommand = payload.commands.find((entry) =>
/provider create/.test(entry.command),
);
assert.ok(providerCommand, JSON.stringify(payload.commands));
assert.match(providerCommand.command, /--name nvidia-router/);
assert.match(providerCommand.command, /--credential NVIDIA_API_KEY/);
assert.match(
providerCommand.command,
/OPENAI_BASE_URL=http:\/\/host\.openshell\.internal:4000\/v1/,
);
assert.doesNotMatch(providerCommand.command, /nvapi-router-secret/);
assert.equal(providerCommand.env?.NVIDIA_API_KEY, "nvapi-router-secret");
const inferenceCommand = payload.commands.find((entry) =>
/inference set/.test(entry.command),
);
assert.ok(inferenceCommand, JSON.stringify(payload.commands));
assert.match(inferenceCommand.command, /--provider nvidia-router/);
assert.match(inferenceCommand.command, /--model nvidia-routed/);
assert.deepEqual(payload.sandboxConfig, {
providerKey: "inference",
primaryModelRef: "inference/nvidia-routed",
inferenceBaseUrl: "https://inference.local/v1",
inferenceApi: "openai-completions",
inferenceCompat: null,
});
});
it("does not delete saved OpenAI credentials when configuring local vLLM", () => {
const repoRoot = path.join(import.meta.dirname, "..");
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-onboard-local-vllm-"));