Mirror of https://github.com/NVIDIA/NemoClaw.git (synced 2026-07-01 18:57:15 +00:00)
<!-- markdownlint-disable MD041 -->
## Summary

`openclaw doctor --fix` can collapse mutable OpenClaw paths from NemoClaw's multi-UID `2770/660` contract to OpenClaw's single-user `700/600` defaults. This change restores that contract after both entrypoint one-shot commands and the documented `nemoclaw <name> exec` boundary, with deterministic child-versus-cleanup exit-status precedence.

## Related Issue

Fixes #6047

## Changes

- Supervise entrypoint one-shot commands so `TERM` and `INT` are forwarded, the direct child is reaped, permission cleanup always runs, and the remote status is preserved when cleanup succeeds.
- After public OpenShell exec returns, inspect registered OpenClaw sandboxes, repair only detected mutable-permission drift through the installed descriptor-safe normalizer while holding the timer-bound shields mutation lock, and require successful re-inspection. Hermes, custom agents, unregistered sandboxes, and active shields locks remain untouched.
- Move mutable-tree normalization, baseline capture, and empty-config recovery into an installed root-trusted Python helper that operates through pinned, no-follow descriptors.
- Authenticate the permanently privilege-dropped owner child with a private Unix socket, `SO_PASSCRED`, exact credentials, and `SCM_RIGHTS`; retain the exact directory/config descriptors across the privilege boundary.
- Replace recovery baseline/config/hash entries with fresh inodes so hardlinks, symlink swaps, directory replacement, and inode-reuse races cannot turn root into a confused deputy.
- Fail closed on missing trusted helper, unexpected ownership, unsafe links, metadata changes, malformed descriptor handoff, or incomplete verification.
- Document host-side cleanup behavior, atomic recovery, failure precedence, and safe operator recovery guidance.
- Add unit, integration, container, and live-target coverage for permission drift, signals, cleanup precedence, capability loss, hardlink safety, path swaps, protected symlinks, and trusted-helper selection.

## Type of Change

- [ ] Code change (feature, bug fix, or refactor)
- [x] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Quality Gates

- [x] Tests added or updated for changed behavior
- [ ] Existing tests cover changed behavior — justification:
- [ ] Tests not applicable — justification:
- [x] Docs updated for user-facing behavior changes
- [ ] Docs not applicable — justification:
- [x] Sensitive paths changed (security, policy, credentials, preflight, onboarding, inference, runner, sandbox, or messaging)
- [x] Sensitive-path review completed or maintainer-approved waiver recorded — reviewer/approval link/justification: independent review through runtime head `85b0744310553ec4a141bdc031cdc881e8c57422` confirmed the previously reproducible cross-phase ABA, external-hardlink ownership/mode mutation, path-based recovery TOCTOU, root helper-selection, and earlier-tree hardlink-alias issues are fixed. Follow-up `64d6234a9198e14e582155ebae4c75c362373f10` changes test control flow only. The owner child is permanently privilege-dropped before recursive mutation and root performs no work without an authenticated descriptor handoff. No runtime privilege-boundary blocker remains; the same-UID child retains only authority that the sandbox user already has over its own inode.
- [ ] Non-success, skipped, or missing CI check accepted by maintainer — check name, approval link, and follow-up issue:

## Verification

- [x] PR description includes the DCO sign-off declaration and every commit appears as `Verified` in GitHub
- [x] Git hooks passed during commit and push, or `npx prek run --from-ref main --to-ref HEAD` passes
- [x] Targeted tests pass for changed behavior
- [ ] Full `npm test` passes (broad runtime changes only)
- [x] Quality Gates section completed with required justifications or waivers
- [x] No secrets, API keys, or credentials committed
- [ ] `npm run docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

### Evidence

- Signed commit and push hooks: repository checks, Biome, ShellCheck, Hadolint, gitleaks, CLI typecheck, full CLI test hook, source-shape budget, and test-size budget passed.
- Focused host tests: the final runtime correction passed 67 CLI and 145 integration tests after the earlier focused suites; CLI type-checking passed.
- Fresh production image: build passed; new E2E cases 30–30f all passed. The full script reported 40 passes and two unrelated stale base-image profile assertions.
- Independent container probes passed normal repair, exact-config capture, empty-config recovery, hardlink/protected-target invariants, and the final `700/600` to `2770/660` production-image repair path.
- Full CLI/coverage hooks, repository checks, source-shape budget, and test-size budget passed under the repository's expected `umask 022`.
- Docs build completed with 0 errors and 2 pre-existing warnings; agent-variant synchronization and docs checks passed.
- Security test-depth follow-ups remain non-blocking: malformed ancillary-message variants, an isolated missing-`CAP_SETUID` case, 16 MiB boundary/source-mutation/temp-cleanup cases, exact-image provenance when reusing an existing E2E tag, and a same-UID concurrent post-check hardlink race that cannot increase the child process's existing authority.

---

Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **New Features**
  * Added safer one-shot command handling with post-command config permission cleanup.
  * Improved config recovery and permission repair behavior for OpenClaw environments.
  * Updated documentation to describe the new cleanup and recovery behavior more clearly.
* **Bug Fixes**
  * Hardened config permission handling against symlinks, ownership mismatches, and concurrent changes.
  * Improved failure handling so unsafe or incomplete repairs now fail closed with clearer status reporting.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Aaron Erickson <aerickson@nvidia.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
| Name |
|---|
| _components |
| _ext |
| _templates |
| about |
| deployment |
| get-started |
| inference |
| manage-sandboxes |
| monitoring |
| network-policy |
| reference |
| resources |
| security |
| .docs-skip |
| AGENTS.md |
| CONTRIBUTING.md |
| index.mdx |
| index.yml |
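
The commit summary above describes repairing mutable-path drift from `700/600` back to `2770/660` through pinned, no-follow descriptors, failing closed on unsafe links, and requiring successful re-inspection. The following is a minimal Python sketch of that idea only, not the installed helper. The names `normalize_tree`, `repair`, `UnsafeEntry`, `DIR_MODE`, and `FILE_MODE` are hypothetical, and the sketch omits ownership checks, baseline capture, the shields mutation lock, and the privilege-dropped owner child.

```python
import os
import stat

# Assumed target modes, taken from the 2770/660 contract in the summary.
DIR_MODE = 0o2770   # setgid directory, rwx for owner and group
FILE_MODE = 0o660   # rw for owner and group


class UnsafeEntry(Exception):
    """Hypothetical error type: raised instead of touching a suspicious entry."""


def _pin(name, parent_fd):
    # Open relative to the parent descriptor and refuse to follow a symlink
    # in the final component. O_NONBLOCK avoids hanging on a FIFO.
    flags = os.O_RDONLY | os.O_NOFOLLOW | os.O_NONBLOCK
    try:
        return os.open(name, flags, dir_fd=parent_fd)
    except OSError as exc:
        raise UnsafeEntry(f"cannot pin {name!r}: {exc}") from exc


def normalize_tree(dir_fd, fix=True):
    """Count entries whose mode has drifted; repair them when fix is true."""
    drift = 0
    if stat.S_IMODE(os.fstat(dir_fd).st_mode) != DIR_MODE:
        drift += 1
        if fix:
            os.fchmod(dir_fd, DIR_MODE)
    for name in sorted(os.listdir(dir_fd)):
        fd = _pin(name, dir_fd)
        try:
            st = os.fstat(fd)  # inspect the pinned inode, never the path
            if stat.S_ISDIR(st.st_mode):
                drift += normalize_tree(fd, fix)
            elif stat.S_ISREG(st.st_mode) and st.st_nlink == 1:
                if stat.S_IMODE(st.st_mode) != FILE_MODE:
                    drift += 1
                    if fix:
                        os.fchmod(fd, FILE_MODE)
            else:
                # Hardlinked files, devices, sockets, FIFOs: fail closed.
                raise UnsafeEntry(f"refusing to touch {name!r}")
        finally:
            os.close(fd)
    return drift


def repair(root_path):
    """Repair drift under root_path, then require a clean re-inspection."""
    # O_NOFOLLOW here only guards the final path component of root_path.
    root_fd = os.open(root_path, os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW)
    try:
        fixed = normalize_tree(root_fd, fix=True)
        if normalize_tree(root_fd, fix=False) != 0:
            raise UnsafeEntry("re-inspection still reports drift")
        return fixed
    finally:
        os.close(root_fd)
```

Because every `fstat` and `fchmod` acts on a descriptor that is already open, swapping a path for a symlink after the check cannot redirect the mode change to another inode. The `st_nlink == 1` test is a simplified stand-in for the hardlink protections the summary mentions. It does not cover the same-UID post-check hardlink race that the evidence list records as a non-blocking follow-up.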