mirror of
https://github.com/NVIDIA/NemoClaw.git
synced 2026-07-03 03:37:16 +00:00
feat(router): add model router integration with complexity-based routing (#2202)
## Summary

- Add NVIDIA LLM Router v3 (prefill-based) as an inference provider, enabling complexity-based routing that automatically picks the most efficient model for each query
- Add Model Router as a provider option in the onboard wizard, following the same pattern as Ollama/NIM (onboard starts the service, configures the provider)
- Router runs on the host; sandbox reaches it through the OpenShell gateway L7 proxy

## Changes

- Add llm-router v3 as a git submodule at nemoclaw-blueprint/router/llm-router/
- Add pool-config.yaml defining the model pool and routing parameters
- Add routed inference profile to blueprint.yaml with model name nvidia-routed
- Add router startup logic to onboard wizard (startModelRouter in onboard.ts)
- Add router schema to schemas/blueprint.schema.json
- Remove router lifecycle code from blueprint runner (onboard owns service startup)
- Remove model-router-toolkit install from Dockerfile (router is host-only)
- Add model-router-toolkit install to scripts/install.sh
- Update README with Model Router documentation and architecture

## Verification

- [x] npx prek run --all-files passes (hadolint and CLI test failures are pre-existing)
- [x] npm test passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes

## Test plan

- [x] Unit tests pass (322 plugin tests, 10 files)
- [x] nemoclaw onboard --non-interactive with NEMOCLAW_PROVIDER=routed completes all 8 steps
- [x] Router starts on port 4000, both pool models healthy
- [x] Inference works end-to-end: sandbox to OpenShell gateway to router to NVIDIA API
- [x] nvidia-routed model name resolves via model_group_alias in LiteLLM proxy config
- [x] Routing strategy injects eagerly at startup (no lazy first-request 400 error)

Made with [Cursor](https://cursor.com)

Signed-off-by: Vinay Bagade <vbagade@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **New Features**
  * Optional Model Router: complexity-based routed inference selectable during onboarding; host router runs on port 4000 with configurable model pool and provider mapping; onboarding now starts and monitors the router and persists its process ID.
  * Installer detects and installs the Model Router when present.
* **Documentation**
  * README updated with Model Router architecture, enablement, and pool configuration guidance.
* **Tests**
  * New tests covering onboarding and routed/blueprint scenarios.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Co-authored-by: Vinay Bagade <vbagade@nvidia.com>
Co-authored-by: Aaron Erickson <aerickson@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
parent
283ee50d83
commit
ca1d6b84a5
14 changed files with 992 additions and 3 deletions
144
.agents/skills/nemoclaw-user-triage-instructions/SKILL.md
Normal file
@@ -0,0 +1,144 @@
---
name: "nemoclaw-user-triage-instructions"
description: "AI-assisted label triage instructions for NVIDIA/NemoClaw issues and PRs. Single source of truth for the nemoclaw-maintainer-triage CLI skill and the nvoss-velocity dashboard."
---

<!-- SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -->
<!-- SPDX-License-Identifier: Apache-2.0 -->

# NemoClaw User Triage Instructions

AI-assisted label triage instructions for NVIDIA/NemoClaw issues and PRs. Single source of truth for the nemoclaw-maintainer-triage CLI skill and the nvoss-velocity dashboard.

This document is the single source of truth for AI-assisted label triage on NVIDIA/NemoClaw issues and PRs.
It is read at runtime by the `nemoclaw-maintainer-triage` CLI skill and fetched at generation time by the nvoss-velocity dashboard.

---

## Step 1: Role

You are a GitHub issue and PR labeler for NemoClaw, NVIDIA's open-source agentic AI assistant framework.

For each item:

1. Assign 1–5 labels from the provided list that best match the content. Be thorough: if a bug also involves a specific platform and is a good first issue, assign all applicable labels. Only skip a label if it genuinely does not apply.
2. Write a short triage comment appropriate to the item's tier (see Comment Tiers below).

---

## Step 2: Output Format

Return ONLY valid JSON, with no markdown fences and no explanation:

```json
{"results": [{"number": 123, "labels": ["bug", "good first issue"], "reason": "One sentence explaining label choices.", "comment": "Comment text."}]}
```

Fields:

- `number`: the issue or PR number
- `labels`: array of label names, exactly as provided in the label list
- `reason`: one concise sentence explaining why these labels apply
- `comment`: triage comment text (see Comment Tiers)
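
A consumer of this output may want to check the shape before using it. The following is a minimal sketch of such a check, assuming only the four fields listed above; the names `TriageResult`, `isTriageResult`, and `parseTriageOutput` are hypothetical and are not part of the triage CLI or the dashboard.

```typescript
// Hypothetical type mirroring the documented fields of one result entry.
interface TriageResult {
  number: number;
  labels: string[];
  reason: string;
  comment: string;
}

// Type guard: true only when every documented field has the expected type.
function isTriageResult(value: unknown): value is TriageResult {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const labels = v.labels;
  return (
    typeof v.number === "number" &&
    Array.isArray(labels) &&
    labels.every((l) => typeof l === "string") &&
    typeof v.reason === "string" &&
    typeof v.comment === "string"
  );
}

// Parses the raw model output and returns the results, or throws on any mismatch.
function parseTriageOutput(raw: string): TriageResult[] {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("triage output is not a JSON object");
  }
  const results = (parsed as Record<string, unknown>).results;
  if (!Array.isArray(results) || !results.every(isTriageResult)) {
    throw new Error("triage output has a missing or malformed results array");
  }
  return results as TriageResult[];
}
```

Because the instructions forbid markdown fences around the JSON, `JSON.parse` is applied to the raw text directly; a fenced response fails fast instead of being silently repaired.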

---

## Step 3: Label Assignment Rules

- Use only label names exactly as provided in the label list
- Assign 1–5 labels per item, applying every label that genuinely fits
- If a specific `enhancement: *` sub-label is assigned, do NOT also assign the bare `enhancement` label; the sub-label is sufficient
- If genuinely unclear, assign `question`

---

## Step 4: Skip Labels

Never assign these labels, because they require human judgment:

- `duplicate`
- `invalid`
- `wontfix`
- `priority: medium`
- `priority: low`
- `status: triage`
- `NV QA`

`priority: high` is allowed ONLY when the issue clearly blocks critical functionality, causes data loss, or describes a production outage, not based on the author's frustration or urgency language alone.
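
The mechanical parts of Steps 3 and 4 can also be enforced after generation. The sketch below is one possible post-filter, not the project's implementation: `sanitizeLabels` is a hypothetical helper, and both truncating to five labels and falling back to `question` when nothing survives are assumptions about how to handle rule violations. The judgment call behind `priority: high` is not something code like this can decide, so that label passes through untouched.

```typescript
// Labels from Step 4 that must never be assigned automatically.
const SKIP_LABELS: string[] = [
  "duplicate",
  "invalid",
  "wontfix",
  "priority: medium",
  "priority: low",
  "status: triage",
  "NV QA",
];

// Hypothetical post-filter applying the Step 3 and Step 4 rules to proposed labels.
function sanitizeLabels(proposed: string[], allowed: string[]): string[] {
  // Keep only known, non-skip labels, dropping repeats.
  const valid = proposed.filter(
    (label, i) =>
      allowed.indexOf(label) !== -1 &&
      SKIP_LABELS.indexOf(label) === -1 &&
      proposed.indexOf(label) === i,
  );
  // A specific `enhancement: *` sub-label makes the bare `enhancement` label redundant.
  const hasSubLabel = valid.some((label) => label.indexOf("enhancement: ") === 0);
  const kept = valid.filter((label) => !(hasSubLabel && label === "enhancement"));
  // Assumption: fall back to `question` when nothing usable remains.
  if (kept.length === 0 && allowed.indexOf("question") !== -1) return ["question"];
  // Assumption: enforce the 1–5 limit by truncation.
  return kept.slice(0, 5);
}
```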

---

## Step 5: Label Guide

Use these descriptions to match labels to issue/PR content:

- `bug`: User reports something broken: unexpected error, crash, exception, traceback, "not working", "fails", "broken", unexpected behavior
- `enhancement`: Generic enhancement; use only if none of the specific `enhancement: *` sub-types clearly apply
- `enhancement: feature`: Request for a new capability, such as "would be great if", "feature request", "add support for", "please add"
- `enhancement: inference`: Inference routing, model support, provider configuration
- `enhancement: security`: Security controls, policies, audit logging
- `enhancement: policy`: Network policy, egress rules, sandbox policy
- `enhancement: ui`: CLI UX, output formatting, terminal display
- `enhancement: platform`: Cross-platform support (pair with a `Platform: *` label)
- `enhancement: provider`: Cloud or inference provider support (pair with a `Provider: *` label)
- `enhancement: performance`: Speed, resource usage, memory, latency
- `enhancement: reliability`: Stability, error handling, recovery, retries
- `enhancement: testing`: Test coverage, CI/CD quality, test infrastructure
- `enhancement: MCP`: MCP protocol support, tool integration
- `enhancement: CI/CD`: Pipeline, build system, automation
- `enhancement: documentation`: Docs improvements, examples, guides
- `question`: Asking how to do something, such as "how do I", "is it possible", "does X support"
- `documentation`: Missing or incorrect docs, README errors, API doc gaps
- `good first issue`: Small well-scoped fix, doc typo, clear simple change; an easy entry point for new contributors
- `help wanted`: Clear fix or improvement that needs a community contribution
- `security`: Auth issues, API key exposure, CVE, vulnerability, unauthorized access
- `status: needs-info`: Issue or PR has no description, no reproduction steps, or so little detail the team cannot act on it
- `priority: high`: Issue blocks critical functionality, causes data loss, or describes a production outage; apply only when the report clearly describes severe, reproducible impact
- `Platform: MacOS`: Issue specific to macOS, Mac OS X, or Apple Silicon (M1/M2/M3/M4). Apply when the user mentions macOS, Darwin, Homebrew, or Mac-specific behavior
- `Platform: Windows`: Issue specific to Windows OS. Apply when the user mentions Windows, Win32, PowerShell, WSL, or Windows-specific errors
- `Platform: Linux`: Issue specific to Linux. Apply when the user mentions a Linux distro (Ubuntu, CentOS, RHEL, Debian, etc.) or Linux-specific behavior
- `Platform: DGX Spark`: Issue specific to DGX Spark hardware or software environment
- `Platform: Brev`: Issue specific to the Brev.dev cloud environment
- `Platform: ARM64`: Issue specific to ARM64 / aarch64 architecture
- `Integration: Slack`: Issue or feature involving the Slack integration or Slack bridge
- `Integration: Discord`: Issue or feature involving the Discord integration
- `Integration: Telegram`: Issue or feature involving the Telegram integration
- `Integration: GitHub`: Issue or feature involving GitHub-specific behavior (not the repo itself)
- `Provider: NVIDIA`: Issue or feature specific to NVIDIA inference endpoints or NIM
- `Provider: OpenAI`: Issue or feature specific to OpenAI API or models
- `Provider: Anthropic`: Issue or feature specific to Anthropic / Claude models
- `Provider: Azure`: Issue or feature specific to Azure OpenAI or Azure cloud
- `Provider: AWS`: Issue or feature specific to AWS Bedrock or AWS cloud
- `Provider: GCP`: Issue or feature specific to Google Cloud / Vertex AI

---

## Step 6: Comment Tiers

Items are classified as `quality_tier` or `standard_tier` before generation. This is passed in the item metadata.

- **quality_tier** (influencer author, company-affiliated author, or body > 800 chars): Write 2–3 sentences. Start with "Thanks," then naturally reference specific details from the body. Avoid "I've taken a look at", "I've reviewed", "it appears to", "I can see that"; these sound bot-generated. Write like a human maintainer giving a warm, specific response.
- **standard_tier**: Write 1 sentence acknowledging the report and mentioning the labels applied.
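
The tier is computed upstream, but the rule stated above is simple enough to sketch. This is an illustration only: the `ItemSignals` fields are hypothetical names for whatever the metadata actually carries, and counting "chars" with `String.length` (UTF-16 code units) is an assumption.

```typescript
type CommentTier = "quality_tier" | "standard_tier";

// Hypothetical per-item signals; real metadata field names may differ.
interface ItemSignals {
  isInfluencer: boolean;
  isCompanyAffiliated: boolean;
  body: string;
}

// quality_tier when any one of the three conditions holds, otherwise standard_tier.
function classifyTier(item: ItemSignals): CommentTier {
  const longBody = item.body.length > 800; // strictly more than 800 characters
  return item.isInfluencer || item.isCompanyAffiliated || longBody
    ? "quality_tier"
    : "standard_tier";
}
```

Note that the body threshold is strict: a body of exactly 800 characters from an unaffiliated author stays in `standard_tier`.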

---

## Step 7: Tone Rules (strictly enforced)

- Use "could" not "should"; use "may" not "will", because this is a first response, not a commitment
- Never say "Thanks for fixing"; say "Thanks for the proposed fix" or "Thanks for submitting this"
- Never say "Thanks for adding"; say "Thanks for the suggested addition"
- Never claim the submission accomplishes something before review
- Do not say "I'll" or "we'll"
- For issues (bugs, questions, enhancements): use "this identifies a..." or "this reports a..."
- For PRs: use "this proposes a way to..."
- For security-related items: never confirm a vulnerability is real; use neutral language
- Do NOT open with praise about detail or thoroughness. Only reference the quality of the report if the body is genuinely exceptional: multiple reproduction steps, version info, logs, and clear expected vs actual behavior. For most reports, skip the praise entirely and go straight to the triage acknowledgment.
- Do not add generic closing filler phrases
- If a "Spam signal:" line is present in the item metadata, assign only `status: needs-info` and ask for more detail politely
- If a "Note: Author also opened..." line is present, briefly acknowledge if the relationship is plausible

---

## Related Skills

- `nemoclaw-user-skills-coding`: Agent Skills (all available maintainer and user skills)
4
.gitmodules
vendored
Normal file
@@ -0,0 +1,4 @@
[submodule "nemoclaw-blueprint/router/llm-router"]
	path = nemoclaw-blueprint/router/llm-router
	url = https://github.com/NVIDIA-AI-Blueprints/llm-router.git
	branch = v3
55
README.md
@@ -138,6 +138,58 @@ Alternatively, send a single message and print the response:
openclaw agent --agent main --local -m "hello" --session-id test
```

### Model Router (Complexity-Based Routing)

NemoClaw includes an optional model router that automatically picks the most efficient model for each query. Instead of sending every request to a single large model, the router uses a lightweight encoder to predict which model in a pool can handle each query correctly, then routes to the cheapest one that meets an accuracy threshold.

The router uses the [NVIDIA LLM Router v3](https://github.com/NVIDIA-AI-Blueprints/llm-router/tree/v3) prefill routing engine and runs on the host as a LiteLLM proxy. The sandbox reaches it through the OpenShell gateway and continues to call `https://inference.local/v1`; do not probe `localhost:4000` or `host.openshell.internal` directly from inside the sandbox.

#### Enable during onboard

Select **Model Router (complexity-based routing)** during the onboard wizard, or set `NEMOCLAW_PROVIDER=routed` for non-interactive mode:

```bash
NEMOCLAW_PROVIDER=routed nemoclaw onboard --non-interactive
```

The onboard wizard starts the router, configures the OpenShell provider, and creates the sandbox. The router process runs on the host on port 4000.

#### Configure the model pool

Edit `nemoclaw-blueprint/router/pool-config.yaml` to define which models the router can choose from:

```yaml
routing:
  method: prefill
  checkpoint: llm-router/checkpoints/prefill_router_qwen08b.pt
  tolerance: 0.20
  encoder: Qwen/Qwen3.5-0.8B

models:
  - name: nano
    litellm_model: "openai/nvidia/nvidia/Nemotron-3-Nano-30B-A3B"
    cost_per_m_input_tokens: 0.05
    api_base: "https://inference-api.nvidia.com"

  - name: super
    litellm_model: "openai/nvidia/nvidia/nemotron-3-super-v3"
    cost_per_m_input_tokens: 0.10
    api_base: "https://inference-api.nvidia.com"
```

The `tolerance` parameter controls the accuracy-cost tradeoff: 0.0 always picks the most accurate model, 1.0 always picks the cheapest, and 0.20 (the default) lets the router pick a cheaper model whose predicted accuracy is up to 20 percentage points below the best.
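
To make the tradeoff concrete, here is a minimal sketch of the selection rule as the paragraph above describes it: keep every model whose predicted accuracy is within `tolerance` of the best, then take the cheapest. It is an illustration under stated assumptions, not the LLM Router's code; `PoolModel`, `selectModel`, and the `pCorrect` values are hypothetical, and the real router obtains its predictions from the encoder.

```typescript
// Hypothetical view of one pool entry together with its predicted P(correct) for a query.
interface PoolModel {
  name: string;
  pCorrect: number; // predicted probability of a correct answer, in [0, 1]
  costPerMInputTokens: number;
}

// Picks the cheapest model whose predicted accuracy is within `tolerance` of the best.
function selectModel(models: PoolModel[], tolerance: number): PoolModel {
  if (models.length === 0) throw new Error("model pool is empty");
  if (tolerance < 0 || tolerance > 1) throw new RangeError("tolerance must be in [0, 1]");
  const best = Math.max(...models.map((m) => m.pCorrect));
  // The best model always qualifies, so `eligible` is never empty.
  const eligible = models.filter((m) => m.pCorrect >= best - tolerance);
  return eligible.reduce((cheapest, m) =>
    m.costPerMInputTokens < cheapest.costPerMInputTokens ? m : cheapest,
  );
}

// Illustrative predictions only; costs match the pool config above.
const examplePool: PoolModel[] = [
  { name: "nano", pCorrect: 0.75, costPerMInputTokens: 0.05 },
  { name: "super", pCorrect: 0.9, costPerMInputTokens: 0.1 },
];
console.log(selectModel(examplePool, 0.2).name);
console.log(selectModel(examplePool, 0.0).name);
```

With these illustrative numbers, `nano` trails `super` by 15 percentage points, so it is selected at the default tolerance of 0.20 but not at 0.0.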

#### Architecture

The router runs on the host, not inside the sandbox:

```text
Sandbox (OpenClaw) ──> OpenShell Gateway (L7 proxy) ──> Model Router (:4000) ──> NVIDIA API
                                                              └── PrefillRouter selects model
```

Credentials flow through the OpenShell provider system. The sandbox never sees raw API keys.
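
From inside the sandbox, a routed request looks like any other call to the inference endpoint. The sketch below only builds such a request and does not send it. The base URL and the `nvidia-routed` model name come from this README and the blueprint; the `/chat/completions` path and body shape assume the usual OpenAI-compatible API, and `buildChatRequest` and `ChatRequest` are hypothetical names.

```typescript
// Hypothetical description of an HTTP request; nothing is sent here.
interface ChatRequest {
  url: string;
  method: "POST";
  headers: { "Content-Type": string };
  body: string;
}

const INFERENCE_BASE_URL = "https://inference.local/v1"; // sandbox-side endpoint from the README
const ROUTED_MODEL = "nvidia-routed"; // model name from the routed blueprint profile

// Builds an OpenAI-style chat completion request (assumed shape) for the routed model.
function buildChatRequest(prompt: string): ChatRequest {
  return {
    url: INFERENCE_BASE_URL + "/chat/completions",
    method: "POST",
    // The sandbox holds no raw API key, so no Authorization header is set here
    // (assumption: credentials are supplied outside the sandbox).
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: ROUTED_MODEL,
      messages: [{ role: "user", content: prompt }],
    }),
  };
}
```

The request targets `inference.local` rather than `localhost:4000`, in line with the guidance above not to reach the router directly from the sandbox.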

### Uninstall

To remove NemoClaw and all resources created during setup, run the CLI's built-in uninstall command:

@@ -197,6 +249,9 @@ NemoClaw/
│   ├── commands/                 # Slash commands, migration state
│   └── onboard/                  # Onboarding config
├── nemoclaw-blueprint/           # Blueprint YAML and network policies
│   └── router/
│       ├── pool-config.yaml      # Model pool and routing config
│       └── llm-router/           # LLM Router v3 submodule (prefill routing engine)
├── scripts/                      # Install helpers, setup, automation
├── test/                         # Integration and E2E tests
└── docs/                         # User-facing docs (Sphinx/MyST)

@@ -18,6 +18,7 @@ profiles:
  - ncp
  - nim-local
  - vllm
  - routed

description: |
  NemoClaw blueprint: orchestrates OpenClaw sandbox creation, migration,
@@ -70,6 +71,19 @@ components:
        credential_default: "dummy"
        timeout_secs: 180

      routed:
        provider_type: "openai"
        provider_name: "nvidia-router"
        endpoint: "http://localhost:4000/v1"
        model: "nvidia-routed"
        credential_env: "NVIDIA_API_KEY"
        timeout_secs: 180

  router:
    enabled: true
    port: 4000
    pool_config_path: "router/pool-config.yaml"

  policy:
    base: "sandboxes/openclaw/policy.yaml"
    additions:

1
nemoclaw-blueprint/router/llm-router
Submodule
@@ -0,0 +1 @@
Subproject commit 2bd8dfaa751efb60aa4e7e49b270490dfbc0a68a
36
nemoclaw-blueprint/router/pool-config.yaml
Normal file
@@ -0,0 +1,36 @@
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Model Router Toolkit pool config for complexity-based routing.
# Uses the NVIDIA LLM Router v3 (https://github.com/NVIDIA-AI-Blueprints/llm-router)
# to route requests to the most efficient model that meets an accuracy threshold.
#
# The router runs an encoder (Qwen3.5-0.8B) on each query to predict P(correct)
# per model, then selects the cheapest model above the tolerance threshold.
#
# tolerance controls the accuracy-cost tradeoff:
#   0.0  = always pick the highest-confidence model (most expensive)
#   0.20 = allow up to 20pp below the best for a cheaper model (default)
#   1.0  = always pick the cheapest model

routing:
  method: prefill
  checkpoint: llm-router/checkpoints/prefill_router_qwen08b.pt
  tolerance: 0.20
  encoder: Qwen/Qwen3.5-0.8B
  encoder_backend: transformers

models:
  - name: nemotron-3-nano-reasoning
    display_name: "Nemotron 3 Nano (Reasoning)"
    litellm_model: "openai/nvidia/nvidia/Nemotron-3-Nano-30B-A3B"
    cost_per_m_input_tokens: 0.05
    cost_per_m_output_tokens: 0.20
    api_base: "https://inference-api.nvidia.com"

  - name: nemotron-3-super
    display_name: "Nemotron 3 Super 120B"
    litellm_model: "openai/nvidia/nvidia/nemotron-3-super-v3"
    cost_per_m_input_tokens: 0.10
    cost_per_m_output_tokens: 0.40
    api_base: "https://inference-api.nvidia.com"

@@ -122,6 +122,38 @@ function minimalBlueprint(overrides?: Record<string, unknown>): Record<string, u
  };
}

function routedBlueprint(): Record<string, unknown> {
  return {
    version: "1.0",
    components: {
      inference: {
        profiles: {
          routed: {
            provider_type: "openai",
            provider_name: "nvidia-router",
            endpoint: "http://localhost:4000/v1",
            model: "routed",
            credential_env: "NVIDIA_API_KEY",
            credential_default: "router-local",
            timeout_secs: 180,
          },
        },
      },
      sandbox: {
        image: "openclaw",
        name: "test-sandbox",
        forward_ports: [18789],
      },
      router: {
        enabled: true,
        port: 4000,
        pool_config_path: "router/pool-config.yaml",
      },
      policy: { additions: {} },
    },
  };
}

function seedBlueprintFile(bp?: Record<string, unknown>): void {
  addFile("blueprint.yaml", YAML.stringify(bp ?? minimalBlueprint()));
}
@@ -378,6 +410,25 @@ describe("runner", () => {
      expect(out).toContain("PROGRESS:10:Validating blueprint");
      expect(out).toContain("PROGRESS:100:Plan complete");
    });

    it("includes router info when router is enabled", async () => {
      captureStdout();
      mockExeca.mockResolvedValue({ exitCode: 0 });

      const plan = await actionPlan("routed", routedBlueprint());
      expect(plan.router.enabled).toBe(true);
      expect(plan.router.port).toBe(4000);
      expect(plan.router.pool_config_path).toBe("router/pool-config.yaml");
    });

    it("defaults router to disabled when not in blueprint", async () => {
      captureStdout();
      mockExeca.mockResolvedValue({ exitCode: 0 });

      const plan = await actionPlan("default", minimalBlueprint());
      expect(plan.router.enabled).toBe(false);
      expect(plan.router.port).toBe(4000);
    });
  });

  describe("actionApply", () => {
@@ -679,6 +730,24 @@ describe("runner", () => {
      if (!inferenceCall) throw new Error("inference set call not found");
      expect(inferenceCall[1]).not.toContain("--timeout");
    });

    it("passes endpoint as-is from blueprint (no rewriting)", async () => {
      process.env.NVIDIA_API_KEY = "test-key";
      try {
        await actionApply("routed", routedBlueprint());

        const providerCall = mockExeca.mock.calls.find(
          (c) => Array.isArray(c[1]) && c[1].includes("provider"),
        );
        if (!providerCall) throw new Error("provider create call not found");
        const configArg = (providerCall[1] as string[]).find((a: string) =>
          a.startsWith("OPENAI_BASE_URL="),
        );
        expect(configArg).toBe("OPENAI_BASE_URL=http://localhost:4000/v1");
      } finally {
        delete process.env.NVIDIA_API_KEY;
      }
    });
  });

  describe("actionStatus", () => {

@@ -50,6 +50,10 @@ function isOptionalFiniteNumber(value: unknown): value is number | undefined {
  return value === undefined || (typeof value === "number" && Number.isFinite(value));
}

function isOptionalBoolean(value: unknown): value is boolean | undefined {
  return value === undefined || typeof value === "boolean";
}

function isValidPort(value: unknown): value is number {
  return typeof value === "number" && Number.isInteger(value) && value >= 1 && value <= 65535;
}
@@ -142,6 +146,20 @@ function isBlueprint(value: unknown): value is Blueprint {
    }
  }

  const router = components.router;
  if (router !== undefined) {
    if (!isObjectLike(router)) {
      return false;
    }
    if (
      !isOptionalBoolean(router.enabled) ||
      !(router.port === undefined || isValidPort(router.port)) ||
      !isOptionalString(router.pool_config_path)
    ) {
      return false;
    }
  }

  const policy = components.policy;
  if (policy !== undefined) {
    if (!isObjectLike(policy)) {
@@ -199,6 +217,7 @@ interface Blueprint {
      profiles?: InferenceProfileMap;
    };
    sandbox?: SandboxConfig;
    router?: RouterConfig;
    policy?: {
      additions?: PolicyAdditions;
    };
@@ -221,6 +240,14 @@ interface SandboxConfig {
  forward_ports?: number[];
}

interface RouterConfig {
  enabled?: boolean;
  port?: number;
  pool_config_path?: string;
}

const DEFAULT_ROUTER_PORT = 4000;

export function loadBlueprint(): Blueprint {
  const blueprintPath = process.env.NEMOCLAW_BLUEPRINT_PATH ?? ".";
  const bpFile = join(blueprintPath, "blueprint.yaml");
@@ -272,6 +299,7 @@ async function resolveRunConfig(
  inferenceProfiles: InferenceProfileMap;
  inferenceCfg: InferenceProfile;
  sandboxCfg: SandboxConfig;
  routerCfg: RouterConfig;
}> {
  const inferenceProfiles = blueprint.components?.inference?.profiles ?? {};
  if (!(profile in inferenceProfiles)) {
@@ -297,7 +325,8 @@ async function resolveRunConfig(
   }

   const sandboxCfg = blueprint.components?.sandbox ?? {};
-  return { inferenceProfiles, inferenceCfg, sandboxCfg };
+  const routerCfg = blueprint.components?.router ?? {};
+  return { inferenceProfiles, inferenceCfg, sandboxCfg, routerCfg };
 }

 // ── Actions ─────────────────────────────────────────────────────
@@ -317,6 +346,11 @@ export interface RunPlan {
    model: string | undefined;
    credential_env: string | undefined;
  };
  router: {
    enabled: boolean;
    port: number;
    pool_config_path: string | undefined;
  };
  policy_additions: PolicyAdditions;
  dry_run: boolean;
}
@@ -329,7 +363,7 @@ export async function actionPlan(
   const rid = emitRunId();
   progress(10, "Validating blueprint");

-  const { inferenceCfg, sandboxCfg } = await resolveRunConfig(
+  const { inferenceCfg, sandboxCfg, routerCfg } = await resolveRunConfig(
     profile,
     blueprint,
     options?.endpointUrl,
@@ -342,6 +376,9 @@ export async function actionPlan(
    );
  }

  const routerEnabled = routerCfg.enabled === true;
  const routerPort = routerCfg.port ?? DEFAULT_ROUTER_PORT;

  const plan: RunPlan = {
    run_id: rid,
    profile,
@@ -357,6 +394,11 @@ export async function actionPlan(
      model: inferenceCfg.model,
      credential_env: inferenceCfg.credential_env,
    },
    router: {
      enabled: routerEnabled,
      port: routerPort,
      pool_config_path: routerCfg.pool_config_path,
    },
    policy_additions: blueprint.components?.policy?.additions ?? {},
    dry_run: options?.dryRun ?? false,
  };

@@ -79,6 +79,27 @@
          }
        }
      },
      "router": {
        "type": "object",
        "description": "Model router configuration for complexity-based routing.",
        "additionalProperties": false,
        "properties": {
          "enabled": {
            "type": "boolean",
            "description": "Whether the model router is active."
          },
          "port": {
            "type": "integer",
            "minimum": 1,
            "maximum": 65535,
            "description": "Port the router proxy listens on."
          },
          "pool_config_path": {
            "type": "string",
            "description": "Path to the router pool config YAML relative to the blueprint directory."
          }
        }
      },
      "policy": {
        "type": "object",
        "required": ["base"],

@@ -1320,6 +1320,44 @@ is_source_checkout() {
  return 1
}

init_nemoclaw_submodules() {
  local root="$1"
  [[ -f "$root/.gitmodules" ]] || return 0
  git -C "$root" rev-parse --git-dir >/dev/null 2>&1 || return 0
  git -C "$root" submodule update --init --depth 1 2>/dev/null
}

is_routed_provider_requested() {
  local provider="${NEMOCLAW_PROVIDER:-}"
  provider="$(printf '%s' "$provider" | tr '[:upper:]' '[:lower:]')"
  [[ "$provider" == "routed" ]]
}

install_model_router_if_present() {
  local root="$1"
  local router_dir="$root/nemoclaw-blueprint/router/llm-router"

  if ! command_exists pip3; then
    is_routed_provider_requested && error "pip3 is required for routed inference."
    return 0
  fi
  if [[ ! -d "$router_dir" ]]; then
    is_routed_provider_requested && error "llm-router is required for routed inference but is missing."
    return 0
  fi
  if [[ ! -f "$router_dir/pyproject.toml" && ! -f "$router_dir/setup.py" ]]; then
    is_routed_provider_requested && error "llm-router is required for routed inference but is not initialized."
    warn "Skipping model router install — llm-router submodule is not initialized."
    return 0
  fi
  if ! spin "Installing model router" pip3 install --quiet --user "${router_dir}[prefill,proxy]"; then
    if is_routed_provider_requested; then
      error "pip3 install of llm-router failed"
    fi
    warn "Skipping model router install — pip3 install failed"
  fi
}

install_nemoclaw() {
  command_exists git || error "git was not found on PATH."
  local repo_root package_json
@@ -1335,9 +1373,14 @@ install_nemoclaw() {
      spin "Preparing OpenClaw package" bash -c "$(declare -f info warn resolve_openclaw_version pre_extract_openclaw); pre_extract_openclaw \"\$1\"" _ "$NEMOCLAW_SOURCE_ROOT" \
        || warn "Pre-extraction failed — npm install may fail if openclaw tarball is broken"
    fi
    if ! spin "Initializing ${_CLI_DISPLAY} submodules" init_nemoclaw_submodules "$NEMOCLAW_SOURCE_ROOT"; then
      is_routed_provider_requested && error "Failed to initialize the llm-router submodule required for routed inference."
      warn "Submodule initialization failed — model router support may be unavailable"
    fi
    spin "Installing ${_CLI_DISPLAY} dependencies" bash -c "cd \"$NEMOCLAW_SOURCE_ROOT\" && npm install --ignore-scripts"
    spin "Building ${_CLI_DISPLAY} CLI modules" bash -c "cd \"$NEMOCLAW_SOURCE_ROOT\" && npm run --if-present build:cli"
    spin "Building ${_CLI_DISPLAY} plugin" bash -c "cd \"$NEMOCLAW_SOURCE_ROOT\"/nemoclaw && npm install --ignore-scripts && npm run build"
    install_model_router_if_present "$NEMOCLAW_SOURCE_ROOT"
    spin "Linking ${_CLI_DISPLAY} CLI" bash -c "cd \"$NEMOCLAW_SOURCE_ROOT\" && npm link"
  else
    if [[ -f "$package_json" ]]; then
@@ -1356,6 +1399,10 @@ install_nemoclaw() {
    mkdir -p "$(dirname "$nemoclaw_src")"
    NEMOCLAW_SOURCE_ROOT="$nemoclaw_src"
    spin "Cloning ${_CLI_DISPLAY} source" clone_nemoclaw_ref "$release_ref" "$nemoclaw_src"
    if ! spin "Initializing ${_CLI_DISPLAY} submodules" init_nemoclaw_submodules "$nemoclaw_src"; then
      is_routed_provider_requested && error "Failed to initialize the llm-router submodule required for routed inference."
      warn "Submodule initialization failed — model router support may be unavailable"
    fi
    # Fetch version tags into the shallow clone so `git describe --tags
    # --match "v*"` works at runtime (the shallow clone only has the
    # single ref we asked for).
@@ -1371,6 +1418,7 @@ install_nemoclaw() {
    spin "Installing ${_CLI_DISPLAY} dependencies" bash -c "cd \"$nemoclaw_src\" && npm install --ignore-scripts"
    spin "Building ${_CLI_DISPLAY} CLI modules" bash -c "cd \"$nemoclaw_src\" && npm run --if-present build:cli"
    spin "Building ${_CLI_DISPLAY} plugin" bash -c "cd \"$nemoclaw_src\"/nemoclaw && npm install --ignore-scripts && npm run build"
    install_model_router_if_present "$nemoclaw_src"
    spin "Linking ${_CLI_DISPLAY} CLI" bash -c "cd \"$nemoclaw_src\" && npm link"

    # Install/upgrade the OpenShell CLI on the GitHub-clone path (curl|bash).
|
|
|
|||
|
|
@ -119,6 +119,8 @@ function getProviderLabel(provider) {
|
|||
switch (provider) {
|
||||
case "nvidia-nim":
|
||||
return "NVIDIA Endpoints";
|
||||
case "nvidia-router":
|
||||
return "Model Router";
|
||||
case "vllm-local":
|
||||
return "Local vLLM";
|
||||
case "ollama-local":
|
||||
|
|
@ -142,6 +144,8 @@ function getEffectiveProviderName(providerKey) {
|
|||
return "ollama-local";
|
||||
case "vllm":
|
||||
return "vllm-local";
|
||||
case "routed":
|
||||
return "nvidia-router";
|
||||
default:
|
||||
return providerKey;
|
||||
}
|
||||
|
|
@ -169,6 +173,7 @@ function getNonInteractiveProvider() {
|
|||
"custom",
|
||||
"nim-local",
|
||||
"vllm",
|
||||
"routed",
|
||||
"install-vllm",
|
||||
"install-ollama",
|
||||
"install-windows-ollama",
|
||||
|
|
@ -177,7 +182,7 @@ function getNonInteractiveProvider() {
|
|||
if (!validProviders.has(normalized)) {
|
||||
console.error(` Unsupported NEMOCLAW_PROVIDER: ${providerKey}`);
|
||||
console.error(
-  " Valid values: build, openai, anthropic, anthropicCompatible, gemini, ollama, custom, nim-local, vllm, install-vllm, install-ollama, install-windows-ollama, start-windows-ollama",
+  " Valid values: build, openai, anthropic, anthropicCompatible, gemini, ollama, custom, nim-local, vllm, routed, install-vllm, install-ollama, install-windows-ollama, start-windows-ollama",
);
|
||||
process.exit(1);
|
||||
}
|
||||
|
|
@@ -340,6 +345,10 @@ function getSandboxInferenceConfig(
|
|||
supportsStore: false,
|
||||
};
|
||||
break;
|
||||
case "nvidia-router":
|
||||
providerKey = "inference";
|
||||
primaryModelRef = `inference/${model}`;
|
||||
break;
|
||||
case "nvidia-prod":
|
||||
case "nvidia-nim":
|
||||
default:
|
||||
|
|
|
|||
|
|
@@ -74,6 +74,8 @@ export interface Session {
|
|||
credentialEnv: string | null;
|
||||
preferredInferenceApi: string | null;
|
||||
nimContainer: string | null;
|
||||
routerPid: number | null;
|
||||
routerCredentialHash: string | null;
|
||||
webSearchConfig: WebSearchConfig | null;
|
||||
policyPresets: string[] | null;
|
||||
messagingChannels: string[] | null;
|
||||
|
|
@@ -122,6 +124,8 @@ export interface SessionUpdates {
|
|||
credentialEnv?: string;
|
||||
preferredInferenceApi?: string;
|
||||
nimContainer?: string;
|
||||
routerPid?: number;
|
||||
routerCredentialHash?: string;
|
||||
webSearchConfig?: WebSearchConfig | null;
|
||||
policyPresets?: string[];
|
||||
messagingChannels?: string[];
|
||||
|
|
@@ -189,6 +193,10 @@ function readString(value: SessionJsonValue | undefined): string | null {
|
|||
return typeof value === "string" ? value : null;
|
||||
}
|
||||
|
||||
function readPositiveInteger(value: SessionJsonValue | undefined): number | null {
|
||||
return typeof value === "number" && Number.isInteger(value) && value > 0 ? value : null;
|
||||
}
|
||||
|
||||
function readStringArray(value: SessionJsonValue | undefined): string[] | null {
|
||||
if (!Array.isArray(value)) return null;
|
||||
return value.filter((entry): entry is string => typeof entry === "string");
|
||||
|
|
@@ -297,6 +305,8 @@ export function createSession(overrides: Partial<Session> = {}): Session {
|
|||
credentialEnv: overrides.credentialEnv ?? null,
|
||||
preferredInferenceApi: overrides.preferredInferenceApi ?? null,
|
||||
nimContainer: overrides.nimContainer ?? null,
|
||||
routerPid: readPositiveInteger(overrides.routerPid),
|
||||
routerCredentialHash: overrides.routerCredentialHash ?? null,
|
||||
webSearchConfig:
|
||||
overrides.webSearchConfig?.fetchEnabled === true ? { fetchEnabled: true } : null,
|
||||
policyPresets: readStringArray(overrides.policyPresets),
|
||||
|
|
@@ -333,6 +343,8 @@ export function normalizeSession(data: Session | SessionJsonValue | undefined):
|
|||
credentialEnv: readString(data.credentialEnv),
|
||||
preferredInferenceApi: readString(data.preferredInferenceApi),
|
||||
nimContainer: readString(data.nimContainer),
|
||||
routerPid: readPositiveInteger(data.routerPid),
|
||||
routerCredentialHash: readString(data.routerCredentialHash),
|
||||
webSearchConfig: parseWebSearchConfig(data.webSearchConfig),
|
||||
policyPresets: readStringArray(data.policyPresets),
|
||||
messagingChannels: readStringArray(data.messagingChannels),
|
||||
|
|
@@ -692,6 +704,12 @@ export function filterSafeUpdates(updates: SessionUpdates): Partial<Session> {
|
|||
if (typeof updates.preferredInferenceApi === "string")
|
||||
safe.preferredInferenceApi = updates.preferredInferenceApi;
|
||||
if (typeof updates.nimContainer === "string") safe.nimContainer = updates.nimContainer;
|
||||
if (typeof updates.routerPid === "number" && Number.isInteger(updates.routerPid) && updates.routerPid > 0) {
|
||||
safe.routerPid = updates.routerPid;
|
||||
}
|
||||
if (typeof updates.routerCredentialHash === "string") {
|
||||
safe.routerCredentialHash = updates.routerCredentialHash;
|
||||
}
|
||||
if (isObject(updates.webSearchConfig) && updates.webSearchConfig.fetchEnabled === true) {
|
||||
safe.webSearchConfig = { fetchEnabled: true };
|
||||
} else if (updates.webSearchConfig === null) {
|
||||
|
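The session hunks above route every `routerPid` read through `readPositiveInteger`, so a corrupted or hand-edited session file cannot smuggle in a value that is later passed to `process.kill`. A small sketch of how the guard behaves on typical inputs; the `SessionJsonValue` alias here is an assumed stand-in for the real type, which the diff does not show:

```typescript
// Assumed shape of the JSON value type; the real definition is not in the diff.
type SessionJsonValue =
  | string
  | number
  | boolean
  | null
  | SessionJsonValue[]
  | { [key: string]: SessionJsonValue };

// Same body as the helper added in this change.
function readPositiveInteger(value: SessionJsonValue | undefined): number | null {
  return typeof value === "number" && Number.isInteger(value) && value > 0 ? value : null;
}

const pidSamples: Array<SessionJsonValue | undefined> = [4321, 0, -7, 3.5, "4321", null, undefined];
const parsedPids = pidSamples.map((sample) => readPositiveInteger(sample));

// Only the first sample is a usable PID; everything else collapses to null.
console.log(parsedPids);
```

Collapsing invalid values to `null` means callers only need a single "no recorded router" branch.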
|
|
|||
|
|
@@ -770,6 +770,281 @@ function getBlueprintMaxOpenshellVersion(rootDir = ROOT): string | null {
|
|||
return getBlueprintVersionField("max_openshell_version", rootDir);
|
||||
}
|
||||
|
||||
type BlueprintRouterConfig = {
  enabled?: boolean;
  port?: number;
  pool_config_path?: string;
  credential_env?: string;
};

type BlueprintInferenceProfile = {
  provider_name?: string;
  endpoint?: string;
  model: string;
  credential_env?: string;
  credential_default?: string;
  router: BlueprintRouterConfig;
};

/**
 * Load a named inference profile and router config from blueprint.yaml.
 * Returns null if the blueprint or profile is missing, or if the file
 * cannot be read or parsed.
 */
function loadBlueprintProfile(
|
||||
profileName: string,
|
||||
rootDir: string = ROOT,
|
||||
): BlueprintInferenceProfile | null {
|
||||
try {
|
||||
const YAML = require("yaml");
|
||||
const blueprintPath = path.join(rootDir, "nemoclaw-blueprint", "blueprint.yaml");
|
||||
if (!fs.existsSync(blueprintPath)) return null;
|
||||
const raw = fs.readFileSync(blueprintPath, "utf8");
|
||||
const parsed = YAML.parse(raw);
|
||||
const profile = parsed?.components?.inference?.profiles?.[profileName];
|
||||
if (!profile) return null;
|
||||
const router = { ...(parsed?.components?.router || {}) };
|
||||
if (typeof profile.credential_env === "string" && profile.credential_env.trim().length > 0) {
|
||||
router.credential_env = profile.credential_env;
|
||||
}
|
||||
return { ...profile, router } as BlueprintInferenceProfile;
|
||||
} catch {
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
const ROUTER_HEALTH_RETRIES = 15;
|
||||
const ROUTER_HEALTH_INTERVAL_MS = 2000;
|
||||
const ROUTER_HEALTH_TIMEOUT_MS = 3000;
|
||||
|
||||
async function isRouterHealthy(port: number, timeoutMs = ROUTER_HEALTH_TIMEOUT_MS): Promise<boolean> {
|
||||
const http = require("http");
|
||||
return new Promise<boolean>((resolve) => {
|
||||
let settled = false;
|
||||
const settle = (healthy: boolean) => {
|
||||
if (settled) return;
|
||||
settled = true;
|
||||
resolve(healthy);
|
||||
};
|
||||
const request = http
|
||||
.get(`http://127.0.0.1:${port}/health`, (res: import("node:http").IncomingMessage) => {
|
||||
res.resume();
|
||||
settle((res.statusCode || 0) >= 200 && (res.statusCode || 0) < 300);
|
||||
})
|
||||
.on("error", () => settle(false));
|
||||
request.setTimeout(timeoutMs, () => {
|
||||
request.destroy();
|
||||
settle(false);
|
||||
});
|
||||
});
|
||||
}
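The three `ROUTER_HEALTH_*` constants, combined with the polling loop in `startModelRouter`, bound how long onboarding waits for the router. Each attempt sleeps for the interval and then probes `/health`. As a rough worked calculation (ignoring scheduling overhead, and treating the request timeout as a hard cap on each probe, which is an approximation):

```typescript
const ROUTER_HEALTH_RETRIES = 15;
const ROUTER_HEALTH_INTERVAL_MS = 2000;
const ROUTER_HEALTH_TIMEOUT_MS = 3000;

// Lower bound: every probe fails immediately (for example, connection refused).
const minWaitMs = ROUTER_HEALTH_RETRIES * ROUTER_HEALTH_INTERVAL_MS;

// Upper bound: every probe runs into the request timeout before failing.
const maxWaitMs =
  ROUTER_HEALTH_RETRIES * (ROUTER_HEALTH_INTERVAL_MS + ROUTER_HEALTH_TIMEOUT_MS);

console.log(minWaitMs, maxWaitMs); // 30000 75000
```

So a router that never comes up is reported as failed after roughly 30 to 75 seconds, unless the child exits earlier and short-circuits the loop.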
|
||||
|
||||
function isProcessRunning(pid: number | null | undefined): boolean {
|
||||
if (!Number.isInteger(pid) || Number(pid) <= 0) return false;
|
||||
try {
|
||||
process.kill(Number(pid), 0);
|
||||
return true;
|
||||
} catch {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
async function stopModelRouterProcess(pid: number, port: number): Promise<void> {
|
||||
try {
|
||||
process.kill(pid, "SIGTERM");
|
||||
} catch {
|
||||
return;
|
||||
}
|
||||
for (let attempt = 0; attempt < 10; attempt++) {
|
||||
await new Promise((resolve) => setTimeout(resolve, 500));
|
||||
if (!isProcessRunning(pid) && !(await isRouterHealthy(port, 1000))) return;
|
||||
}
|
||||
try {
|
||||
process.kill(pid, "SIGKILL");
|
||||
} catch {
|
||||
// already stopped
|
||||
}
|
||||
for (let attempt = 0; attempt < 5; attempt++) {
|
||||
await new Promise((resolve) => setTimeout(resolve, 500));
|
||||
if (!isProcessRunning(pid) && !(await isRouterHealthy(port, 1000))) return;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Start the model-router proxy and wait for it to become healthy.
|
||||
* Follows the same pattern as Ollama startup (spawn detached, poll health).
|
||||
* Returns the PID of the child process.
|
||||
*/
|
||||
async function startModelRouter(routerCfg: BlueprintRouterConfig): Promise<number> {
|
||||
const port = routerCfg.port || 4000;
|
||||
const blueprintDir = path.join(ROOT, "nemoclaw-blueprint");
|
||||
const poolConfigPath = path.join(
|
||||
blueprintDir,
|
||||
routerCfg.pool_config_path || "router/pool-config.yaml",
|
||||
);
|
||||
const stateDir = path.join(os.homedir(), ".nemoclaw", "state");
|
||||
const litellmConfigPath = path.join(stateDir, "litellm-proxy.yaml");
|
||||
|
||||
fs.mkdirSync(stateDir, { recursive: true });
|
||||
|
||||
const proxyConfigResult = spawnSync(
|
||||
"model-router",
|
||||
["proxy-config", "--config", poolConfigPath, "--output", litellmConfigPath],
|
||||
{ encoding: "utf8", timeout: 30_000, cwd: blueprintDir },
|
||||
);
|
||||
if (proxyConfigResult.status !== 0) {
|
||||
throw new Error(
|
||||
`model-router proxy-config failed: ${proxyConfigResult.stderr || proxyConfigResult.error || "unknown error"}`,
|
||||
);
|
||||
}
|
||||
|
||||
const { buildSubprocessEnv } = require("./subprocess-env");
|
||||
const credEnvVars: Record<string, string> = {};
|
||||
const credName = routerCfg.credential_env || "NVIDIA_API_KEY";
|
||||
const routedCredential = resolveProviderCredential(credName);
|
||||
const openAiCredential = resolveProviderCredential("OPENAI_API_KEY");
|
||||
if (routedCredential) {
|
||||
credEnvVars[credName] = routedCredential;
|
||||
if (!openAiCredential) credEnvVars.OPENAI_API_KEY = routedCredential;
|
||||
}
|
||||
if (openAiCredential) credEnvVars.OPENAI_API_KEY = openAiCredential;
|
||||
const _providerKey = (process.env.NEMOCLAW_PROVIDER_KEY || "").trim();
|
||||
if (_providerKey) {
|
||||
if (!credEnvVars[credName]) credEnvVars[credName] = _providerKey;
|
||||
if (!credEnvVars.OPENAI_API_KEY) credEnvVars.OPENAI_API_KEY = _providerKey;
|
||||
}
|
||||
|
||||
if (await isRouterHealthy(port)) {
|
||||
throw new Error(
|
||||
`Port ${port} already has a healthy router endpoint; refusing to start a second router.`,
|
||||
);
|
||||
}
|
||||
|
||||
const child = spawn(
|
||||
"model-router",
|
||||
[
|
||||
"proxy",
|
||||
"--litellm-config", litellmConfigPath,
|
||||
"--router-config", poolConfigPath,
|
||||
"--host", "0.0.0.0",
|
||||
"--port", String(port),
|
||||
],
|
||||
{
|
||||
detached: true,
|
||||
stdio: "ignore",
|
||||
cwd: blueprintDir,
|
||||
env: buildSubprocessEnv(credEnvVars),
|
||||
},
|
||||
);
|
||||
let childExited = false;
|
||||
let childExitDetail = "";
|
||||
child.once("error", (err: Error) => {
|
||||
childExited = true;
|
||||
childExitDetail = `child failed to start: ${err.message}`;
|
||||
});
|
||||
child.once("exit", (code: number | null, signal: string | null) => {
|
||||
childExited = true;
|
||||
if (!childExitDetail) {
|
||||
childExitDetail = `child exited with code ${code ?? "null"}${signal ? ` signal ${signal}` : ""}`;
|
||||
}
|
||||
});
|
||||
child.unref();
|
||||
|
||||
const pid = child.pid;
|
||||
if (!pid) {
|
||||
throw new Error(
|
||||
"Failed to start model-router proxy: no PID returned" +
|
||||
(childExitDetail ? ` (${childExitDetail})` : ""),
|
||||
);
|
||||
}
|
||||
|
||||
for (let attempt = 0; attempt < ROUTER_HEALTH_RETRIES; attempt++) {
|
||||
await new Promise((resolve) => setTimeout(resolve, ROUTER_HEALTH_INTERVAL_MS));
|
||||
if (childExited) break;
|
||||
const healthy = await isRouterHealthy(port);
|
||||
let processAlive = true;
|
||||
try {
|
||||
process.kill(pid, 0);
|
||||
} catch {
|
||||
processAlive = false;
|
||||
}
|
||||
if (healthy && processAlive) return pid;
|
||||
if (!processAlive) {
|
||||
childExited = true;
|
||||
if (!childExitDetail) childExitDetail = "child process is no longer running";
|
||||
break;
|
||||
}
|
||||
}
|
||||
try {
|
||||
process.kill(pid, "SIGTERM");
|
||||
} catch {
|
||||
// already dead
|
||||
}
|
||||
throw new Error(
|
||||
`Model router failed to become healthy on port ${port} after ${ROUTER_HEALTH_RETRIES} attempts` +
|
||||
(childExitDetail ? ` (${childExitDetail})` : ""),
|
||||
);
|
||||
}
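The credential handling inside `startModelRouter` is easier to follow as a pure function. The sketch below mirrors the precedence in the code above: the configured credential first, an explicit `OPENAI_API_KEY` overriding the mirrored value, and `NEMOCLAW_PROVIDER_KEY` only filling gaps. The function name `buildRouterCredentialEnv` and the `lookup` callback are hypothetical stand-ins for `resolveProviderCredential`:

```typescript
type CredentialLookup = (name: string) => string | null;

// Hypothetical helper: same precedence as the inline logic in startModelRouter.
function buildRouterCredentialEnv(
  credName: string,
  lookup: CredentialLookup,
  providerKeyOverride: string,
): Record<string, string> {
  const env: Record<string, string> = {};
  const routed = lookup(credName);
  const openAi = lookup("OPENAI_API_KEY");
  if (routed) {
    env[credName] = routed;
    // Mirror the routed credential only when no OpenAI key is stored.
    if (!openAi) env["OPENAI_API_KEY"] = routed;
  }
  if (openAi) env["OPENAI_API_KEY"] = openAi;
  const fallback = providerKeyOverride.trim();
  if (fallback) {
    // The override never replaces a value that was already resolved.
    if (!env[credName]) env[credName] = fallback;
    if (!env["OPENAI_API_KEY"]) env["OPENAI_API_KEY"] = fallback;
  }
  return env;
}

// Example stores; the key values are made up.
const nvidiaOnly: Record<string, string> = { NVIDIA_API_KEY: "nvapi-example" };
const fromStore = buildRouterCredentialEnv(
  "NVIDIA_API_KEY",
  (name) => nvidiaOnly[name] ?? null,
  "",
);
const fromOverride = buildRouterCredentialEnv("NVIDIA_API_KEY", () => null, "  key-from-env  ");

console.log(fromStore, fromOverride);
```

In the first call both variables carry the stored NVIDIA key; in the second both carry the trimmed override.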
|
||||
|
||||
function getRoutedProfile(): BlueprintInferenceProfile {
|
||||
const bp = loadBlueprintProfile("routed");
|
||||
if (!bp || bp.router?.enabled !== true) {
|
||||
throw new Error("Router is not enabled in nemoclaw-blueprint/blueprint.yaml.");
|
||||
}
|
||||
return bp;
|
||||
}
|
||||
|
||||
function isRoutedInferenceProvider(provider: string | null | undefined): boolean {
|
||||
if (!provider) return false;
|
||||
if (provider === "nvidia-router") return true;
|
||||
const bp = loadBlueprintProfile("routed");
|
||||
return Boolean(bp?.provider_name && provider === bp.provider_name);
|
||||
}
|
||||
|
||||
async function reconcileModelRouter(): Promise<void> {
|
||||
const bp = getRoutedProfile();
|
||||
const routerPort = bp.router.port || 4000;
|
||||
const routerCredentialEnv = bp.router.credential_env || bp.credential_env || "NVIDIA_API_KEY";
|
||||
const routerCredential =
|
||||
hydrateCredentialEnv(routerCredentialEnv) ||
|
||||
normalizeCredentialValue(bp.credential_default || "");
|
||||
if (!routerCredential) {
|
||||
throw new Error(`${routerCredentialEnv} is required to start Model Router.`);
|
||||
}
|
||||
saveCredential(routerCredentialEnv, routerCredential);
|
||||
const routerCredentialHash = hashCredential(routerCredential);
|
||||
const session = onboardSession.loadSession();
|
||||
const recordedPid = session?.routerPid ?? null;
|
||||
const recordedCredentialHash = session?.routerCredentialHash ?? null;
|
||||
|
||||
if (await isRouterHealthy(routerPort)) {
|
||||
if (
|
||||
routerCredentialHash &&
|
||||
recordedCredentialHash === routerCredentialHash &&
|
||||
isProcessRunning(recordedPid)
|
||||
) {
|
||||
console.log(` ✓ Model router is already healthy on port ${routerPort}`);
|
||||
return;
|
||||
}
|
||||
if (isProcessRunning(recordedPid)) {
|
||||
console.log(" Restarting model router with updated credentials...");
|
||||
await stopModelRouterProcess(requireValue(recordedPid, "Expected recorded router PID"), routerPort);
|
||||
} else {
|
||||
throw new Error(
|
||||
`Port ${routerPort} already has a healthy router endpoint, but its credential state is unknown. Stop the existing model-router process and rerun onboarding.`,
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
console.log(" Starting model router...");
|
||||
const routerPid = await startModelRouter(bp.router);
|
||||
console.log(` ✓ Model router started (PID ${routerPid}) on port ${routerPort}`);
|
||||
onboardSession.updateSession((current: Session) => {
|
||||
current.routerPid = routerPid;
|
||||
current.routerCredentialHash = routerCredentialHash;
|
||||
return current;
|
||||
});
|
||||
}
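`reconcileModelRouter` effectively chooses between four outcomes based on three observations. The sketch below restates that branching as a pure decision function; the type and function names are hypothetical and not part of the change:

```typescript
type RouterAction = "reuse" | "restart" | "refuse" | "start";

interface RouterObservation {
  portHealthy: boolean;           // isRouterHealthy(routerPort)
  recordedPidRunning: boolean;    // isProcessRunning(session.routerPid)
  credentialHashMatches: boolean; // recorded hash equals the current hash
}

// Hypothetical restatement of the branching in reconcileModelRouter.
function decideRouterAction(o: RouterObservation): RouterAction {
  if (!o.portHealthy) return "start";
  if (o.recordedPidRunning && o.credentialHashMatches) return "reuse";
  if (o.recordedPidRunning) return "restart";
  // Something healthy owns the port, but it is not a process we recorded.
  return "refuse";
}

const routerDecisions = [
  decideRouterAction({ portHealthy: false, recordedPidRunning: false, credentialHashMatches: false }),
  decideRouterAction({ portHealthy: true, recordedPidRunning: true, credentialHashMatches: true }),
  decideRouterAction({ portHealthy: true, recordedPidRunning: true, credentialHashMatches: false }),
  decideRouterAction({ portHealthy: true, recordedPidRunning: false, credentialHashMatches: true }),
];

console.log(routerDecisions.join(", ")); // start, reuse, restart, refuse
```

The "refuse" case corresponds to the error asking the user to stop the existing model-router process: onboarding will not kill a process it did not start.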
|
||||
|
||||
// ── Base image digest resolution ────────────────────────────────
|
||||
// Pulls the sandbox-base image from GHCR and inspects it to get the
|
||||
// actual repo digest. This avoids the registry mismatch that broke
|
||||
|
|
@@ -5184,6 +5459,7 @@ function providerNameToOptionKey(
|
|||
opts: { hasNimContainer?: boolean } = {},
|
||||
): string | null {
|
||||
if (!name) return null;
|
||||
if (name === "nvidia-router") return "routed";
|
||||
if (name === "ollama-local") return "ollama";
|
||||
// Local NIM and standalone vLLM both persist as provider="vllm-local". NIM
|
||||
// is positively identified by a nimContainer record; the absence of one in
|
||||
|
|
@@ -5595,6 +5871,12 @@ async function setupNim(
|
|||
}
|
||||
}
|
||||
|
||||
// Model Router: complexity-based routing via blueprint config.
|
||||
const blueprintRouterCfg = loadBlueprintProfile("routed");
|
||||
if (blueprintRouterCfg && blueprintRouterCfg.router?.enabled === true) {
|
||||
options.push({ key: "routed", label: "Model Router (complexity-based routing)" });
|
||||
}
|
||||
|
||||
function checkOllamaPortsOrWarn(): boolean {
|
||||
const portValidation = validateOllamaPortConfiguration();
|
||||
if (!portValidation.ok) {
|
||||
|
|
@@ -6489,6 +6771,49 @@ async function setupNim(
|
|||
}
|
||||
preferredInferenceApi = "openai-completions";
|
||||
break;
|
||||
} else if (selected.key === "routed") {
|
||||
const bp = loadBlueprintProfile("routed");
|
||||
if (!bp || bp.router?.enabled !== true) {
|
||||
console.error(" Router is not enabled in nemoclaw-blueprint/blueprint.yaml.");
|
||||
if (isNonInteractive()) process.exit(1);
|
||||
continue selectionLoop;
|
||||
}
|
||||
const routerCredentialEnv = bp.router?.credential_env || bp.credential_env || "NVIDIA_API_KEY";
|
||||
credentialEnv = routerCredentialEnv;
|
||||
const routedCredential =
|
||||
hydrateCredentialEnv(routerCredentialEnv) ||
|
||||
normalizeCredentialValue(bp.credential_default || "");
|
||||
if (routedCredential) {
|
||||
saveCredential(routerCredentialEnv, routedCredential);
|
||||
}
|
||||
const _providerKeyHint = (process.env.NEMOCLAW_PROVIDER_KEY || "").trim();
|
||||
if (_providerKeyHint && !resolveProviderCredential(routerCredentialEnv)) {
|
||||
saveCredential(routerCredentialEnv, _providerKeyHint);
|
||||
}
|
||||
if (isNonInteractive()) {
|
||||
if (!resolveProviderCredential(routerCredentialEnv)) {
|
||||
console.error(
|
||||
` ${routerCredentialEnv} (or NEMOCLAW_PROVIDER_KEY) is required for Model Router in non-interactive mode.`,
|
||||
);
|
||||
process.exit(1);
|
||||
}
|
||||
} else {
|
||||
if (!resolveProviderCredential(routerCredentialEnv)) {
|
||||
await ensureNamedCredential(routerCredentialEnv, "Model Router API key", null);
|
||||
}
|
||||
}
|
||||
provider = bp.provider_name || "nvidia-router";
|
||||
model = bp.model;
|
||||
const { HOST_GATEWAY_URL } = require("./local-inference");
|
||||
const routerEndpointUrl = bp.endpoint || "";
|
||||
endpointUrl = routerEndpointUrl;
|
||||
if (routerEndpointUrl.match(/localhost|127\.0\.0\.1/)) {
|
||||
const u = new URL(routerEndpointUrl);
|
||||
endpointUrl = `${HOST_GATEWAY_URL}:${u.port}${u.pathname}`;
|
||||
}
|
||||
preferredInferenceApi = "openai-completions";
|
||||
console.log(` ✓ Using Model Router: ${provider} / ${model}`);
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
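The routed branch above rewrites a loopback endpoint so the sandbox reaches the host-side router through the gateway host name instead of its own loopback interface. A minimal sketch of that rewrite follows. The `HOST_GATEWAY_URL` value is an assumption inferred from the test fixture URL `http://host.openshell.internal:4000/v1`, and the function name is hypothetical:

```typescript
// Assumed value; the real constant is exported from ./local-inference.
const HOST_GATEWAY_URL = "http://host.openshell.internal";

// Hypothetical helper mirroring the inline rewrite in the routed branch.
function toSandboxReachableEndpoint(endpoint: string): string {
  if (!/localhost|127\.0\.0\.1/.test(endpoint)) return endpoint;
  const u = new URL(endpoint);
  // Keep the port and path, swap the loopback host for the gateway host.
  return `${HOST_GATEWAY_URL}:${u.port}${u.pathname}`;
}

const rewrittenEndpoint = toSandboxReachableEndpoint("http://localhost:4000/v1");
const untouchedEndpoint = toSandboxReachableEndpoint("https://example.com/v1");

console.log(rewrittenEndpoint); // http://host.openshell.internal:4000/v1
console.log(untouchedEndpoint); // https://example.com/v1
```

Like the original, this sketch assumes the blueprint endpoint carries an explicit port: `URL.port` is an empty string when the URL relies on the scheme's default port, which would leave a bare colon in the result.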
|
@@ -6737,6 +7062,42 @@ async function setupInference(
|
|||
// Do not mutate ~/.nemoclaw/credentials.json here: local Ollama now uses
|
||||
// OLLAMA_PROXY_CREDENTIAL_ENV, so any saved OPENAI_API_KEY remains available
|
||||
// to unrelated OpenAI-backed sandboxes.
|
||||
} else if (isRoutedInferenceProvider(provider)) {
|
||||
// Blueprint profile provider (e.g., nvidia-router for the routed profile).
|
||||
// Same pattern as vllm-local: upsert the provider and set the inference route.
|
||||
try {
|
||||
await reconcileModelRouter();
|
||||
} catch (err) {
|
||||
console.error(` ✗ Failed to start model router: ${err instanceof Error ? err.message : String(err)}`);
|
||||
process.exit(1);
|
||||
}
|
||||
const resolvedCredentialEnv = credentialEnv || "NVIDIA_API_KEY";
|
||||
const credentialValue = hydrateCredentialEnv(resolvedCredentialEnv);
|
||||
const env = credentialValue ? { [resolvedCredentialEnv]: credentialValue } : {};
|
||||
const providerResult = upsertProvider(
|
||||
provider,
|
||||
"openai",
|
||||
resolvedCredentialEnv,
|
||||
endpointUrl,
|
||||
env,
|
||||
);
|
||||
if (!providerResult.ok) {
|
||||
console.error(` ${providerResult.message}`);
|
||||
process.exit(providerResult.status || 1);
|
||||
}
|
||||
const inferenceArgs = [
|
||||
"inference",
|
||||
"set",
|
||||
"--no-verify",
|
||||
"--provider",
|
||||
provider,
|
||||
"--model",
|
||||
model,
|
||||
];
|
||||
runOpenshell(inferenceArgs);
|
||||
} else {
|
||||
console.error(` Unsupported provider configuration: ${provider}`);
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
verifyInferenceRoute(provider, model);
|
||||
|
|
@@ -9156,6 +9517,16 @@ async function onboard(opts: OnboardOptions = {}): Promise<void> {
|
|||
const resumeInference =
|
||||
!forceProviderSelection && resume && isInferenceRouteReady(provider, model);
|
||||
if (resumeInference) {
|
||||
if (isRoutedInferenceProvider(provider)) {
|
||||
try {
|
||||
await reconcileModelRouter();
|
||||
} catch (err) {
|
||||
console.error(
|
||||
` ✗ Failed to reconcile model router: ${err instanceof Error ? err.message : String(err)}`,
|
||||
);
|
||||
process.exit(1);
|
||||
}
|
||||
}
|
||||
skippedStepMessage("inference", `${provider} / ${model}`);
|
||||
if (nimContainer && sandboxName) {
|
||||
registry.updateSandbox(sandboxName, { nimContainer });
|
||||
|
|
|
|||
|
|
@@ -106,6 +106,7 @@ type OnboardTestInternals = {
|
|||
value: string | null | undefined,
|
||||
flavor: "openai" | "anthropic",
|
||||
) => string;
|
||||
providerNameToOptionKey: (name?: string | null) => string | null;
|
||||
parsePolicyPresetEnv: (value: string | null) => string[];
|
||||
patchStagedDockerfile: ShimFn<void>;
|
||||
pullAndResolveBaseImageDigest: () => { digest: string; ref: string } | null;
|
||||
|
|
@@ -148,6 +149,8 @@ function isOnboardTestInternals(
|
|||
typeof value.agentSupportsWebSearch === "function" &&
|
||||
typeof value.configureWebSearch === "function" &&
|
||||
typeof value.formatSandboxBuildEstimateNote === "function" &&
|
||||
Object.prototype.hasOwnProperty.call(value, "providerNameToOptionKey") &&
|
||||
typeof value.providerNameToOptionKey === "function" &&
|
||||
typeof value.shouldRunCompatibleEndpointSandboxSmoke === "function" &&
|
||||
typeof value.writeSandboxConfigSyncFile === "function"
|
||||
);
|
||||
|
|
@@ -199,6 +202,7 @@ const {
|
|||
configureWebSearch,
|
||||
isLoopbackHostname,
|
||||
normalizeProviderBaseUrl,
|
||||
providerNameToOptionKey,
|
||||
parsePolicyPresetEnv,
|
||||
patchStagedDockerfile,
|
||||
pullAndResolveBaseImageDigest,
|
||||
|
|
@@ -829,6 +833,20 @@ describe("onboard helpers", () => {
|
|||
);
|
||||
});
|
||||
|
||||
it("maps Model Router sandboxes through managed inference.local", () => {
|
||||
assert.deepEqual(getSandboxInferenceConfig("nvidia-routed", "nvidia-router"), {
|
||||
providerKey: "inference",
|
||||
primaryModelRef: "inference/nvidia-routed",
|
||||
inferenceBaseUrl: "https://inference.local/v1",
|
||||
inferenceApi: "openai-completions",
|
||||
inferenceCompat: null,
|
||||
});
|
||||
});
|
||||
|
||||
it("maps persisted Model Router provider back to the routed provider option", () => {
|
||||
assert.equal(providerNameToOptionKey("nvidia-router"), "routed");
|
||||
});
|
||||
|
||||
it("leaves Kimi K2.6 compat to the model-specific setup registry", () => {
|
||||
assert.deepEqual(
|
||||
getSandboxInferenceConfig("moonshotai/kimi-k2.6", "nvidia-prod", "openai-completions"),
|
||||
|
|
@@ -2406,6 +2424,145 @@ const { setupInference } = require(${onboardPath});
|
|||
assert.equal(payload.nvidiaApiKey, "nvapi-secret-value");
|
||||
});
|
||||
|
||||
it("configures Model Router as a host provider while sandboxes keep inference.local", () => {
|
||||
const repoRoot = path.join(import.meta.dirname, "..");
|
||||
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-onboard-router-inference-"));
|
||||
const fakeBin = path.join(tmpDir, "bin");
|
||||
const scriptPath = path.join(tmpDir, "setup-router-check.js");
|
||||
const onboardPath = JSON.stringify(path.join(repoRoot, "dist", "lib", "onboard.js"));
|
||||
const runnerPath = JSON.stringify(path.join(repoRoot, "dist", "lib", "runner.js"));
|
||||
const registryPath = JSON.stringify(path.join(repoRoot, "dist", "lib", "state", "registry.js"));
|
||||
|
||||
fs.mkdirSync(fakeBin, { recursive: true });
|
||||
fs.writeFileSync(path.join(fakeBin, "openshell"), "#!/usr/bin/env bash\nexit 0\n", {
|
||||
mode: 0o755,
|
||||
});
|
||||
fs.writeFileSync(
|
||||
path.join(fakeBin, "model-router"),
|
||||
[
|
||||
"#!/usr/bin/env node",
|
||||
'const fs = require("fs");',
|
||||
'const http = require("http");',
|
||||
'const path = require("path");',
|
||||
"const args = process.argv.slice(2);",
|
||||
'if (args[0] === "proxy-config") {',
|
||||
' const output = args[args.indexOf("--output") + 1];',
|
||||
" fs.mkdirSync(path.dirname(output), { recursive: true });",
|
||||
' fs.writeFileSync(output, "model_list: []\\n");',
|
||||
" process.exit(0);",
|
||||
"}",
|
||||
'if (args[0] === "proxy") {',
|
||||
' const port = Number(args[args.indexOf("--port") + 1] || "4000");',
|
||||
" const server = http.createServer((req, res) => {",
|
||||
' if (req.url === "/health") { res.statusCode = 200; res.end("ok"); return; }',
|
||||
" res.statusCode = 404;",
|
||||
" res.end();",
|
||||
" });",
|
||||
' server.listen(port, "127.0.0.1");',
|
||||
" setTimeout(() => process.exit(0), 10000);",
|
||||
"} else {",
|
||||
" process.exit(1);",
|
||||
"}",
|
||||
"",
|
||||
].join("\n"),
|
||||
{ mode: 0o755 },
|
||||
);
|
||||
|
||||
const script = String.raw`
|
||||
const runner = require(${runnerPath});
|
||||
const _n = (c) => (Array.isArray(c) ? c.join(" ") : String(c)).replace(/'/g, "");
|
||||
const registry = require(${registryPath});
|
||||
|
||||
const commands = [];
|
||||
runner.run = (command, opts = {}) => {
|
||||
const cmd = _n(command);
|
||||
commands.push({ command: cmd, env: opts.env || null });
|
||||
if (cmd.includes("provider get")) return { status: 1, stdout: "", stderr: "" };
|
||||
return { status: 0, stdout: "", stderr: "" };
|
||||
};
|
||||
runner.runCapture = (command) => {
|
||||
const cmd = _n(command);
|
||||
if (cmd.includes("inference") && cmd.includes("get")) {
|
||||
return [
|
||||
"Gateway inference:",
|
||||
"",
|
||||
" Route: inference.local",
|
||||
" Provider: nvidia-router",
|
||||
" Model: nvidia-routed",
|
||||
" Version: 1",
|
||||
].join("\\n");
|
||||
}
|
||||
return "";
|
||||
};
|
||||
registry.updateSandbox = () => true;
|
||||
|
||||
process.env.NVIDIA_API_KEY = "nvapi-router-secret";
|
||||
|
||||
const { setupInference, getSandboxInferenceConfig } = require(${onboardPath});
|
||||
|
||||
(async () => {
|
||||
await setupInference(
|
||||
"router-box",
|
||||
"nvidia-routed",
|
||||
"nvidia-router",
|
||||
"http://host.openshell.internal:4000/v1",
|
||||
"NVIDIA_API_KEY",
|
||||
);
|
||||
console.log(JSON.stringify({
|
||||
commands,
|
||||
sandboxConfig: getSandboxInferenceConfig("nvidia-routed", "nvidia-router", "openai-completions"),
|
||||
}));
|
||||
})().catch((error) => {
|
||||
console.error(error);
|
||||
process.exit(1);
|
||||
});
|
||||
`;
|
||||
fs.writeFileSync(scriptPath, script);
|
||||
|
||||
const result = spawnSync(process.execPath, [scriptPath], {
|
||||
cwd: repoRoot,
|
||||
encoding: "utf-8",
|
||||
env: {
|
||||
...process.env,
|
||||
HOME: tmpDir,
|
||||
PATH: `${fakeBin}:${process.env.PATH || ""}`,
|
||||
},
|
||||
});
|
||||
|
||||
assert.equal(result.status, 0, result.stderr);
|
||||
const payload = parseStdoutJson<{
|
||||
commands: CommandEntry[];
|
||||
sandboxConfig: SandboxInferenceConfig;
|
||||
}>(result.stdout);
|
||||
const providerCommand = payload.commands.find((entry) =>
|
||||
/provider create/.test(entry.command),
|
||||
);
|
||||
assert.ok(providerCommand, JSON.stringify(payload.commands));
|
||||
assert.match(providerCommand.command, /--name nvidia-router/);
|
||||
assert.match(providerCommand.command, /--credential NVIDIA_API_KEY/);
|
||||
assert.match(
|
||||
providerCommand.command,
|
||||
/OPENAI_BASE_URL=http:\/\/host\.openshell\.internal:4000\/v1/,
|
||||
);
|
||||
assert.doesNotMatch(providerCommand.command, /nvapi-router-secret/);
|
||||
assert.equal(providerCommand.env?.NVIDIA_API_KEY, "nvapi-router-secret");
|
||||
|
||||
const inferenceCommand = payload.commands.find((entry) =>
|
||||
/inference set/.test(entry.command),
|
||||
);
|
||||
assert.ok(inferenceCommand, JSON.stringify(payload.commands));
|
||||
assert.match(inferenceCommand.command, /--provider nvidia-router/);
|
||||
assert.match(inferenceCommand.command, /--model nvidia-routed/);
|
||||
|
||||
assert.deepEqual(payload.sandboxConfig, {
|
||||
providerKey: "inference",
|
||||
primaryModelRef: "inference/nvidia-routed",
|
||||
inferenceBaseUrl: "https://inference.local/v1",
|
||||
inferenceApi: "openai-completions",
|
||||
inferenceCompat: null,
|
||||
});
|
||||
});
|
||||
|
||||
it("does not delete saved OpenAI credentials when configuring local vLLM", () => {
|
||||
const repoRoot = path.join(import.meta.dirname, "..");
|
||||
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-onboard-local-vllm-"));
|
||||
|
|
|
|||