
gpumod Security Model

This document defines the threat model, input validation specification, and security requirements for gpumod. It covers MCP tool exposure (Phase 4) and LLM-facing AI planning (Phase 5).

All implementation tickets must reference this document and follow its specifications.


1. Threat Model

MCP tools are invoked by LLMs, which may relay untrusted user input or be subject to prompt injection. Every tool argument, resource URI parameter, and return value is an attack surface.

| # | Threat | Vector | Impact | Mitigation | Ref |
|---|---|---|---|---|---|
| T1 | Shell injection via service/mode IDs | LLM passes "; rm -rf /" as service_id to a tool that eventually reaches systemctl | Arbitrary command execution | SEC-V1: Strict regex validation on all ID args at MCP boundary; SEC-D1: systemd.py allowlist + unit name regex as defense-in-depth | SEC-V1, SEC-D1 |
| T2 | SQL injection via string args | LLM passes "'; DROP TABLE services--" as tool arg | Data loss, DB corruption | SEC-V1: Regex rejects SQL metacharacters; SEC-D2: aiosqlite parameterized queries as defense-in-depth | SEC-V1, SEC-D2 |
| T3 | Path traversal via model_id | LLM passes "../../etc/passwd" as model_id | File system read/write | SEC-V1: model_id regex allows only `[a-zA-Z0-9_\-./]`; no file path construction from model_id in MCP layer | SEC-V1 |
| T4 | Jinja2 template injection | LLM passes "{{7*7}}" as a string arg | Code execution in sandboxed environment | SEC-V1: Regex rejects { and } in IDs; SEC-D3: SandboxedEnvironment as defense-in-depth | SEC-V1, SEC-D3 |
| T5 | Information disclosure via errors | Internal exception bubbles up with DB path, traceback | Leaks internal architecture | SEC-E1: Error sanitization middleware strips paths and tracebacks | SEC-E1 |
| T6 | Information disclosure via resources | Resource output includes file paths or internal config | Leaks deployment details | SEC-E2: Resource output must not contain absolute paths | SEC-E2 |
| T7 | Unauthorized mutating operations | LLM calls switch_mode or stop_service without user intent | Service disruption | SEC-A1: Tool classification (read-only vs mutating); LLM client must confirm mutating actions | SEC-A1 |
| T8 | Resource exhaustion via simulation | LLM calls simulate_mode in a loop, generating max alternatives each time | CPU/memory spike | SEC-R1: Cap max alternatives at 10; SEC-R2: Rate limit middleware | SEC-R1, SEC-R2 |
| T9 | Resource exhaustion via MCP flooding | LLM sends rapid-fire tool calls | Server overload | SEC-R2: Rate limit middleware (default 10 req/s) | SEC-R2 |
| T10 | Terminal escape injection | Service/mode names with ANSI escapes returned to LLM, then displayed to user | Terminal manipulation | SEC-E3: Sanitize names in all output (existing _sanitize_name() pattern) | SEC-E3 |
| T11 | Extra fields in tool input | LLM sends unexpected kwargs that bypass validation | Logic bypass | SEC-V2: FastMCP strict_input_validation=True; Pydantic extra="forbid" on all models | SEC-V2 |

1.2 Container Security Threats (Phase 7)

| # | Threat | Vector | Impact | Mitigation | Ref |
|---|---|---|---|---|---|
| T18 | Docker privilege escalation | Container runs with --privileged flag | Full host access | SEC-D7: DockerDriver blocks privileged mode at config validation | SEC-D7 |
| T19 | Container host network escape | Container uses host or macvlan network mode | Network-level host access | SEC-D8: DockerDriver rejects unsafe network modes | SEC-D8 |
| T20 | Unsafe volume mounts | Container mounts /, /etc, or Docker socket | Host filesystem access | SEC-D9: DockerDriver rejects critical host paths | SEC-D9 |
| T21 | Environment variable injection | Malicious env var keys with = character | Container config manipulation | SEC-D10: DockerDriver sanitizes env var keys | SEC-D10 |

1.3 LLM Integration Threats (Phase 5)

The gpumod plan command calls external LLM APIs (OpenAI, Anthropic, Ollama) and uses their responses to generate VRAM allocation plans. This creates additional attack surfaces.

| # | Threat | Vector | Impact | Mitigation | Ref |
|---|---|---|---|---|---|
| T12 | Indirect prompt injection via DB-stored names | Attacker stores malicious service/mode name in DB (e.g., "ignore previous instructions...") that gets included in LLM prompt | LLM produces manipulated plan output | SEC-L2: Prompt template hardening with clear instruction boundaries; SEC-L1: validate all IDs in response regardless of LLM output | SEC-L1, SEC-L2 |
| T13 | LLM response manipulation | LLM returns malicious plan (e.g., stop all services, allocate impossible VRAM) | Service disruption if auto-executed | SEC-L4: LLM output is advisory only, never auto-executed; SimulationEngine validates plan feasibility | SEC-L4 |
| T14 | API key leakage | API key appears in error messages, logs, or LLM prompt context | Credential theft, unauthorized API usage | SEC-L3: API key stored as SecretStr, never logged, never in exception messages | SEC-L3 |
| T15 | LLM response parsing injection | LLM returns crafted JSON with unexpected fields or injection payloads in ID fields | ID validation bypass, potential shell injection downstream | SEC-L1: All IDs in LLM response validated via validation.py before use; Pydantic strict parsing | SEC-L1 |
| T16 | Excessive LLM API calls (cost exhaustion) | Automated or rapid-fire gpumod plan suggest calls | Unexpected API billing costs | SEC-R2: Rate limiting; CLI is interactive (human-speed); MCP rate limit middleware | SEC-R2 |
| T17 | Sensitive data sent to external LLM APIs | Full DB dump or internal config sent as prompt context | Data exfiltration to third-party API | SEC-L5: Data minimization: only service IDs, VRAM numbers, and mode names sent; no paths, credentials, or full configs | SEC-L5 |

2. Input Validation Specification (SEC-V1)

Every string argument to an MCP tool or resource template must be validated against the corresponding regex at the MCP boundary, before any business logic executes.

2.1 ID Validation Regexes

| Argument | Regex | Max Length | Examples (valid) | Examples (rejected) |
|---|---|---|---|---|
| service_id | `^[a-zA-Z0-9][a-zA-Z0-9_\-]{0,63}$` | 64 | vllm-chat-01, fastapi_app | ../etc, '; DROP, {{7*7}} |
| mode_id | `^[a-zA-Z0-9][a-zA-Z0-9_\-]{0,63}$` | 64 | chat-mode, code_dev | ; rm -rf /, empty string |
| model_id | `^[a-zA-Z0-9][a-zA-Z0-9_\-./]{0,127}$` | 128 | meta-llama/Llama-3.1-8B, my-gguf-q4 | ../../passwd, $(cmd) |

2.2 Numeric Validation

| Argument | Type | Min | Max | Notes |
|---|---|---|---|---|
| context_overrides values | int | 1 | 131072 | Context window size in tokens |
| context_overrides keys | str | | | Validated as service_id |

2.3 Validation Helper

All MCP modules should use a shared validation module to avoid duplication:

# src/gpumod/validation.py
import re

SERVICE_ID_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_\-]{0,63}$")
MODE_ID_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_\-]{0,63}$")
MODEL_ID_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_\-./]{0,127}$")
MAX_CONTEXT_TOKENS = 131072

def validate_service_id(value: str) -> str: ...
def validate_mode_id(value: str) -> str: ...
def validate_model_id(value: str) -> str: ...
def validate_context_override(key: str, value: int) -> tuple[str, int]: ...
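The stubs above leave the validator bodies open. One way they might be filled in is sketched below; it is an illustration under stated assumptions, not the project's actual implementation. The `_validate` helper and the error messages are hypothetical. The sketch uses `fullmatch()` because in Python `$` also matches just before a trailing newline, so `match()` with these patterns would accept `"abc\n"`. The extra `..` segment check on model IDs goes beyond the spec and is labeled as such.

```python
from __future__ import annotations

import re

# Patterns copied verbatim from the spec above.
SERVICE_ID_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_\-]{0,63}$")
MODEL_ID_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_\-./]{0,127}$")
MAX_CONTEXT_TOKENS = 131072


def _validate(value: str, pattern: re.Pattern[str], label: str) -> str:
    """Hypothetical shared helper: return value unchanged or raise ValueError."""
    # fullmatch() refuses a trailing newline, which "$" alone would let through.
    if not isinstance(value, str) or pattern.fullmatch(value) is None:
        # The message deliberately does not echo the rejected input.
        raise ValueError(f"Invalid {label}")
    return value


def validate_service_id(value: str) -> str:
    return _validate(value, SERVICE_ID_RE, "service_id")


def validate_model_id(value: str) -> str:
    value = _validate(value, MODEL_ID_RE, "model_id")
    # Assumption beyond the spec: refuse ".." path segments, because the
    # character class alone still permits strings such as "a/../b".
    if ".." in value.split("/"):
        raise ValueError("Invalid model_id")
    return value


def validate_context_override(key: str, value: int) -> tuple[str, int]:
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("Invalid context size")
    if not 1 <= value <= MAX_CONTEXT_TOKENS:
        raise ValueError("Invalid context size")
    return validate_service_id(key), value
```

A mode ID validator would follow the same shape as `validate_service_id`, since both arguments share one pattern in the spec.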

2.4 Strict Input Mode (SEC-V2)

  • FastMCP server must be created with strict_input_validation=True
  • All Pydantic models must use ConfigDict(extra="forbid") (already enforced)

3. Tool Classification (SEC-A1)

3.1 Read-Only Tools (always safe)

These tools perform no mutations and can be called freely:

| Tool | Description | Risk Level |
|---|---|---|
| gpu_status | Current GPU status, VRAM, services | None |
| list_services | All registered services | None |
| list_modes | All available modes | None |
| service_info | Single service detail | None |
| model_info | Single model detail | None |
| simulate_mode | VRAM simulation (non-destructive) | Low (CPU) |

3.2 Mutating Tools (require confirmation)

These tools change system state. LLM clients (Claude Desktop, etc.) should present confirmation UIs before executing. The MCP server marks these in tool descriptions.

| Tool | Description | Risk Level | Side Effects |
|---|---|---|---|
| switch_mode | Start/stop services | High | Starts/stops systemd units |
| start_service | Start a service | Medium | Starts a systemd unit |
| stop_service | Stop a service | Medium | Stops a systemd unit |

3.3 Audit Logging

All mutating tool calls must be logged at INFO level with:

  • Tool name
  • Arguments (after validation)
  • Caller context (if available from MCP)
  • Result (success/failure)
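The audit record described above could be emitted with the standard library alone. The sketch below is illustrative only: the logger name `gpumod.audit`, the `log_mutation` function, and the message layout are assumptions, not names taken from the codebase.

```python
import logging
from typing import Any, Mapping, Optional

# Hypothetical logger name; the spec only requires INFO level.
audit_log = logging.getLogger("gpumod.audit")


def log_mutation(
    tool: str,
    args: Mapping[str, Any],
    ok: bool,
    caller: Optional[str] = None,
) -> None:
    """Emit one INFO record per mutating tool call.

    args should be the already-validated arguments, so the log never
    stores raw attacker-controlled strings.
    """
    audit_log.info(
        "tool=%s args=%r caller=%s result=%s",
        tool,
        dict(args),
        caller or "unknown",
        "success" if ok else "failure",
    )
```

Calling `log_mutation("stop_service", {"service_id": "vllm-chat-01"}, ok=True)` after the tool returns produces one line carrying all four required fields.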


4. Output Sanitization

4.1 Error Responses (SEC-E1)

Error messages returned to the LLM must not contain:

  • Absolute file paths (DB path, model file path, template path)
  • Python tracebacks or stack frames
  • Internal module names or line numbers

Pattern:

# Bad:  "Database error: /home/user/.config/gpumod/gpumod.db: table not found"
# Good: "Database error: operation failed"

# Bad:  "Traceback (most recent call last):\n  File '/home/...'"
# Good: "Internal error: please check server logs"

Implementation: ErrorSanitizationMiddleware detects paths matching common patterns (/home/, /tmp/, /var/, .db, .py) and replaces the affected message with a generic one.
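The core of such a middleware might be a single string transform, sketched here under assumptions: `sanitize_error` and its regex are hypothetical, only POSIX-style paths with at least two segments are detected, and anything that looks like a traceback is replaced wholesale.

```python
import re

# Assumed heuristic: two or more "/segment" components form a path.
_PATH_RE = re.compile(r"(?:/[\w.\-]+){2,}")
_TRACEBACK_MARK = "Traceback (most recent call last)"
_GENERIC = "Internal error: please check server logs"


def sanitize_error(message: str) -> str:
    """Return a message that is safe to hand back to the LLM."""
    if _TRACEBACK_MARK in message:
        return _GENERIC
    if _PATH_RE.search(message) is None:
        return message  # nothing path-like, pass through unchanged
    # Keep only the category before the first colon, if it is itself clean.
    category = message.split(":", 1)[0].strip()
    if not category or "/" in category:
        return _GENERIC
    return f"{category}: operation failed"
```

Over-matching is the safer failure mode here: a harmless message that happens to contain slashes is merely made more generic, and the full text stays available in server logs.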

4.2 Resource Output (SEC-E2)

MCP resource content must not contain:

  • Absolute file paths to DB, config, or model files
  • Environment variable values
  • Credentials or tokens

Resources may contain:

  • Service/mode/model IDs
  • VRAM numbers
  • Configuration parameters (port, context_size, etc.)

4.3 Name Sanitization (SEC-E3)

All service, mode, and model names in MCP output must be sanitized:

  • Strip ANSI escape sequences
  • Strip Rich markup tags
  • Remove control characters

Use the existing visualization._sanitize_name() pattern, extracted to validation.py.
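The existing `_sanitize_name()` body is not reproduced in this document, so the following is only a sketch of what a shared `sanitize_name()` covering the three rules might look like. The regular expressions are assumptions: the ANSI pattern covers CSI sequences, and the Rich pattern treats any bracketed tag starting with a letter (or `[/]`) as markup.

```python
import re

_ANSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")  # CSI escape sequences
_CONTROL_RE = re.compile(r"[\x00-\x1f\x7f]")       # C0 controls and DEL
_RICH_TAG_RE = re.compile(r"\[/?[a-zA-Z][^\[\]]*\]|\[/\]")


def sanitize_name(name: str) -> str:
    """Strip ANSI escapes, control characters, and Rich-style markup tags."""
    name = _ANSI_RE.sub("", name)
    # Removing controls next also drops any stray ESC bytes left behind.
    name = _CONTROL_RE.sub("", name)
    # Repeat tag removal until stable so nested input such as
    # "[[bold]bold]" cannot reassemble into a tag after one pass.
    previous = None
    while previous != name:
        previous = name
        name = _RICH_TAG_RE.sub("", name)
    return name
```

For display paths that render through Rich, escaping brackets instead of deleting them is an alternative design; deletion is shown here because the spec says "strip".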


5. LLM Security Controls (SEC-L1 through SEC-L5)

5.1 LLM Response Validation (SEC-L1)

All IDs in LLM-generated plans must be validated via validation.py before use:

  • Service IDs validated against SERVICE_ID_RE
  • Mode IDs validated against MODE_ID_RE
  • Model IDs validated against MODEL_ID_RE
  • VRAM values validated as positive integers within GPU capacity
  • Invalid IDs in an LLM response must raise LLMResponseError with a user-friendly message; they are never silently accepted

Implementation: llm/response_validator.py parses LLM JSON output against Pydantic models, then validates every ID field via the shared validators.
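The parse-then-validate flow can be illustrated with the standard library. This sketch stands in for the Pydantic-based implementation named above and makes several assumptions: the response shape (`{"services": [{"service_id": ..., "vram_mb": ...}]}`), the `validate_plan` function name, and the error messages are all hypothetical.

```python
import json
import re

SERVICE_ID_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_\-]{0,63}$")


class LLMResponseError(ValueError):
    """Raised when an LLM-generated plan fails validation."""


def validate_plan(raw: str, gpu_capacity_mb: int) -> list:
    """Parse raw LLM output and return only fully validated entries."""
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError:
        raise LLMResponseError("LLM response was not valid JSON") from None
    entries = plan.get("services") if isinstance(plan, dict) else None
    if not isinstance(entries, list):
        raise LLMResponseError("LLM response has an unexpected structure")
    checked = []
    for entry in entries:
        # Exact key set: unexpected fields are rejected, not ignored.
        if not isinstance(entry, dict) or set(entry) != {"service_id", "vram_mb"}:
            raise LLMResponseError("LLM response has an unexpected structure")
        sid, vram = entry["service_id"], entry["vram_mb"]
        if not isinstance(sid, str) or SERVICE_ID_RE.fullmatch(sid) is None:
            raise LLMResponseError("LLM response contained an invalid service ID")
        if isinstance(vram, bool) or not isinstance(vram, int):
            raise LLMResponseError("LLM response contained an invalid VRAM value")
        if not 0 < vram <= gpu_capacity_mb:
            raise LLMResponseError("LLM response contained an invalid VRAM value")
        checked.append({"service_id": sid, "vram_mb": vram})
    return checked
```

The error messages never echo the offending value, so an injected ID cannot reach the terminal or logs through the exception text.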

5.2 Prompt Template Hardening (SEC-L2)

LLM prompt templates must follow these rules:

  • All templates stored in llm/prompts.py as Python constants (not user-configurable files)
  • Clear instruction boundaries: system prompt separated from user data
  • DB-sourced values (service names, mode names) placed in a clearly delimited data section, not interpolated into instruction text
  • No f-string or .format() with user-controlled values in prompt construction
  • Template variables limited to: service IDs, mode IDs, VRAM numbers, GPU capacity

Pattern:

import json

SYSTEM_PROMPT = """You are a GPU resource planner. Analyze the following GPU state
and suggest an optimal service allocation plan.

RULES:
- Only use service IDs from the PROVIDED list
- VRAM allocations must be positive integers
- Total VRAM must not exceed GPU capacity
"""

# Data section constructed separately, never interpolated into instructions.
# Only the serialized JSON blob is placed into the fixed frame below;
# json.dumps() escapes quotes and control characters in DB-sourced values.
def build_user_prompt(services: list[dict], gpu_capacity: int) -> str:
    data = json.dumps({"services": services, "gpu_capacity_mb": gpu_capacity})
    return f"GPU STATE:\n{data}\n\nProvide your plan as JSON."

5.3 API Key Management (SEC-L3)

LLM API keys must be handled securely:

  • Stored as SecretStr in GpumodSettings (pydantic-settings)
  • Loaded exclusively from environment variables (GPUMOD_LLM_API_KEY)
  • Never logged at any level (str() of a SecretStr returns '**********', and its repr is masked the same way)
  • Never included in exception messages or error responses
  • Never sent to the LLM as prompt context
  • Ollama backend (local) does not require an API key

5.4 LLM Output Sandboxing (SEC-L4)

LLM-generated plans are advisory only and never auto-executed:

  • gpumod plan suggest displays a table of recommendations
  • Output includes suggested CLI commands the user can copy-paste
  • No service start/stop/switch operations triggered by LLM output
  • The plan is validated against SimulationEngine to show feasibility
  • --dry-run flag shows the prompt without calling the LLM API

5.5 Data Minimization (SEC-L5)

Only the minimum necessary data is sent to external LLM APIs:

  • Sent: Service IDs, mode IDs, VRAM allocations (MB), GPU total capacity
  • Never sent: Database paths, file paths, API keys, environment variables, full service configurations, user credentials, systemd unit contents
  • Ollama backend runs locally, so data does not leave the machine


6. Rate Limiting (SEC-R1, SEC-R2)

6.1 Simulation Alternatives Cap (SEC-R1)

SimulationEngine._generate_alternatives() must cap output at 10 alternatives. This prevents CPU exhaustion from combinatorial exploration.
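One way to enforce the cap without first materializing every combination is to generate candidates lazily and slice the stream. The enumeration strategy below (dropping services, largest subsets first) is purely hypothetical; only the cap of 10 comes from this spec.

```python
from itertools import combinations, islice
from typing import Iterator, List, Tuple

MAX_ALTERNATIVES = 10  # SEC-R1


def generate_alternatives(service_ids: List[str]) -> List[Tuple[str, ...]]:
    """Return at most MAX_ALTERNATIVES candidate service subsets."""

    def candidates() -> Iterator[Tuple[str, ...]]:
        # Hypothetical strategy: try the largest proper subsets first.
        for size in range(len(service_ids) - 1, 0, -1):
            yield from combinations(service_ids, size)

    # islice stops pulling from the generator once the cap is reached,
    # so the exponential tail of the search space is never visited.
    return list(islice(candidates(), MAX_ALTERNATIVES))
```

Slicing a finished list would bound the output size but not the CPU cost; slicing the generator bounds both.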

6.2 Request Rate Limit (SEC-R2)

The MCP server should implement rate limiting middleware:

  • Default: 10 requests per second per connection
  • Configurable via environment variable GPUMOD_MCP_RATE_LIMIT
  • Returns clear error when limit exceeded: {"error": "Rate limit exceeded", "code": "RATE_LIMITED"}
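The spec does not prescribe an algorithm for the limiter. A token bucket is one common choice and is sketched here as an assumption; the `TokenBucket` class, its injectable clock, and `check_rate_limit` are hypothetical names. One bucket per connection would give the per-connection behavior described above.

```python
import os
import time
from typing import Callable, Optional

RATE_LIMITED = {"error": "Rate limit exceeded", "code": "RATE_LIMITED"}


class TokenBucket:
    """Allows up to `rate` requests per second, with bursts up to `rate`."""

    def __init__(
        self,
        rate: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if rate is None:
            # Environment override, falling back to the documented default.
            rate = float(os.environ.get("GPUMOD_MCP_RATE_LIMIT", "10"))
        self.rate = rate
        self.tokens = rate
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def check_rate_limit(bucket: TokenBucket) -> Optional[dict]:
    """Return the error payload when the caller should be rejected."""
    return None if bucket.allow() else dict(RATE_LIMITED)
```

The injectable `clock` exists so tests can advance time deterministically instead of sleeping.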


7. Existing Security Controls Audit

7.1 Strengths (defense-in-depth)

| Control | Location | Status | Notes |
|---|---|---|---|
| SEC-D1: systemctl command allowlist | services/systemd.py:14-25 | Good | 8 commands whitelisted, all others rejected |
| SEC-D1: Unit name regex | services/systemd.py:27 | Good | `^[a-zA-Z0-9_@:.\-]+$` rejects shell metacharacters |
| SEC-D1: No shell=True | services/systemd.py:95 | Good | Uses create_subprocess_exec throughout |
| SEC-D2: Parameterized SQL queries | db.py (all methods) | Good | All queries use ? placeholders, no string interpolation |
| SEC-D3: Jinja2 SandboxedEnvironment | templates/engine.py:44 | Good | Prevents template code execution |
| SEC-D3: Template name validation | templates/engine.py:57-72 | Good | Rejects path traversal and absolute paths |
| SEC-D4: Pydantic extra="forbid" | models.py (all models) | Good | Rejects unexpected fields |
| SEC-D5: Name sanitization | visualization.py:38-59 | Good | Strips ANSI, Rich markup, control chars |
| SEC-D7: No privileged containers | services/drivers/docker.py | Good | Blocks --privileged flag |
| SEC-D8: No host network | services/drivers/docker.py | Good | Blocks host and macvlan modes |
| SEC-D9: Volume mount validation | services/drivers/docker.py | Good | Rejects /, /etc, Docker socket mounts |
| SEC-D10: Env var sanitization | services/drivers/docker.py | Good | Rejects = in env var keys |

7.2 Gaps (addressed by this spec)

| Gap | Severity | Addressed By |
|---|---|---|
| No input validation at business logic boundary | Medium | SEC-V1: MCP tools validate all args; validation.py module |
| Error messages may leak internal paths | Medium | SEC-E1: ErrorSanitizationMiddleware |
| No rate limiting | Low | SEC-R2: RateLimitMiddleware |
| No audit logging for mutations | Low | SEC-A1: Audit log for mutating tools |
| _sanitize_name not reusable outside visualization | Low | SEC-E3: Extract to validation.py |

8.1 Transport

| Mode | Use Case | Security |
|---|---|---|
| stdio (default) | Claude Desktop, local LLM clients | Process-level isolation, no network exposure |
| SSE | Remote or multi-client | Must bind to localhost only; use reverse proxy + TLS for remote access |

8.2 MCP Client Configuration

{
  "mcpServers": {
    "gpumod": {
      "command": "python",
      "args": ["-m", "gpumod.mcp_main"],
      "env": {
        "GPUMOD_DB_PATH": "~/.config/gpumod/gpumod.db",
        "GPUMOD_MCP_RATE_LIMIT": "10"
      }
    }
  }
}

9. Implementation Checklist

Tickets must check off relevant items before closing.

Phase 4: Input Validation

  • validation.py module created with shared regexes and validators
  • All MCP tool string args validated via SEC-V1 before business logic
  • All MCP resource template params validated via SEC-V1
  • SimulationEngine validates IDs via SEC-V1 before DB lookups
  • CLI simulate command validates IDs (delegates to SimulationEngine)
  • FastMCP created with strict_input_validation=True (SEC-V2)

Phase 4: Output Sanitization

  • ErrorSanitizationMiddleware strips internal paths (SEC-E1)
  • MCP resources contain no absolute paths (SEC-E2)
  • Service/mode/model names sanitized in all MCP output (SEC-E3)

Phase 4: Authorization & Audit

  • Tools classified as read-only or mutating in descriptions (SEC-A1)
  • Mutating tools log operations at INFO level (SEC-A1)

Phase 4: Rate Limiting & Resource Protection

  • Simulation alternatives capped at 10 (SEC-R1)
  • Rate limit middleware implemented (SEC-R2)

Phase 4: Testing

  • Input validation tests: shell injection, SQL injection, template injection, path traversal
  • Error sanitization tests: no internal paths in error responses
  • Rate limit tests: excess requests rejected
  • Integration tests verify end-to-end security controls

Phase 5: LLM Security Controls

  • LLM response IDs validated via validation.py (SEC-L1)
  • Prompt templates hardened with instruction boundaries (SEC-L2)
  • API key stored as SecretStr, never logged or in errors (SEC-L3)
  • LLM output advisory only, never auto-executed (SEC-L4)
  • Data minimization — only IDs and VRAM sent to LLM (SEC-L5)

Phase 5: Security Hardening

  • RateLimitMiddleware registered on MCP server (SEC-R2)
  • sanitize_name() called in mcp_tools.py and mcp_resources.py (SEC-E3)
  • visualization._sanitize_name() replaced with import from validation.py (DRY)
  • run_async() properly typed — zero type: ignore in CLI modules

Phase 5: Integration Testing

  • Rate limiter rejects excess MCP requests end-to-end
  • LLM response with injected IDs rejected (SEC-L1)
  • API key never appears in any error output (SEC-L3)
  • Config flows through CLI -> services -> DB path

Phase 6: Security Hardening & Observability

Rate Limiting (SEC-R3)

  • Per-client rate limiting with independent quotas (SEC-R3)
  • Resource reads (on_read_resource) enforce rate limit (SEC-R2 update)

JSON & LLM Response Safety (SEC-P1, SEC-P2)

  • safe_json_loads() rejects payloads >1MB (SEC-P1)
  • safe_json_loads() rejects nesting depth >50 (SEC-P1)
  • PlanSuggestion.reasoning capped at 10,000 chars (SEC-P2)
  • Reasoning field sanitized (terminal escapes stripped) (SEC-P2)
  • All json.loads() in LLM backends replaced with safe_json_loads() (SEC-P1)
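A `safe_json_loads()` meeting the two SEC-P1 limits could look like the sketch below. It is an illustration, not the shipped function: "1MB" is read here as 1,000,000 UTF-8 bytes, depth counts nested lists and objects, and the error messages are made up.

```python
import json
from typing import Any

MAX_BYTES = 1_000_000  # assumed interpretation of the ">1MB" limit
MAX_DEPTH = 50


def _depth(value: Any) -> int:
    """Nesting depth of lists/dicts; scalars are depth 0."""
    # Iterative walk so the check itself cannot hit the recursion limit.
    deepest, stack = 0, [(value, 1)]
    while stack:
        node, level = stack.pop()
        if isinstance(node, dict):
            children = list(node.values())
        elif isinstance(node, list):
            children = node
        else:
            continue
        deepest = max(deepest, level)
        stack.extend((child, level + 1) for child in children)
    return deepest


def safe_json_loads(raw: str) -> Any:
    # The size check runs before parsing so oversized input is never decoded.
    if len(raw.encode("utf-8")) > MAX_BYTES:
        raise ValueError("JSON payload too large")
    try:
        parsed = json.loads(raw)
    except RecursionError:
        # Extremely deep input can exhaust the parser's own recursion budget.
        raise ValueError("JSON payload nested too deeply") from None
    if _depth(parsed) > MAX_DEPTH:
        raise ValueError("JSON payload nested too deeply")
    return parsed
```

Malformed JSON still surfaces as `json.JSONDecodeError`, which is itself a `ValueError` subclass, so callers can handle all three failure modes with one `except ValueError`.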

URL & Path Validation (SEC-V3, SEC-V4)

  • llm_base_url rejects file:// scheme (SEC-V3)
  • llm_base_url rejects metadata IP ranges (169.254.x.x) (SEC-V3)
  • db_path must resolve under $HOME or /tmp (SEC-V4)
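The three checks above might be implemented roughly as follows. This is a sketch with assumptions called out: it allows only http and https (stricter than merely rejecting file://), it inspects IP literals only and does not resolve hostnames, and the `roots` parameter is a hypothetical hook for tests.

```python
from __future__ import annotations

import ipaddress
from pathlib import Path
from typing import Optional, Sequence
from urllib.parse import urlsplit


def validate_llm_base_url(url: str) -> str:
    parts = urlsplit(url)
    # Assumption: an http(s) allowlist, which also excludes file://.
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("llm_base_url must be an http(s) URL")
    try:
        addr = ipaddress.ip_address(parts.hostname)
    except ValueError:
        return url  # a hostname, not an IP literal; DNS is not checked here
    # 169.254.0.0/16 (and fe80::/10) are link-local, which covers the
    # cloud metadata address 169.254.169.254.
    if addr.is_link_local:
        raise ValueError("llm_base_url must not target link-local addresses")
    return url


def validate_db_path(path: str, roots: Optional[Sequence[Path]] = None) -> Path:
    if roots is None:
        roots = (Path.home(), Path("/tmp"))
    # resolve() collapses ".." and follows symlinks before the comparison.
    resolved = Path(path).expanduser().resolve()
    allowed = [Path(root).resolve() for root in roots]
    if not any(resolved.is_relative_to(root) for root in allowed):
        raise ValueError("db_path must resolve under $HOME or /tmp")
    return resolved
```

Because a hostname can still resolve to a link-local address, a production check would likely need to validate the resolved address at connection time as well.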

Sanitization & Prompt Defense (SEC-E3, SEC-L2 updates)

  • Mode description sanitized in MCP resource output (SEC-E3)
  • Service/mode names sanitized before LLM prompt JSON (SEC-L2)
  • cli_mode.py sanitizes names in Rich output (SEC-E3)

DB Validation (SEC-D6, SEC-V5)

  • extra_config validated against allowed key set (SEC-D6)
  • vram_mb validated with upper bound (SEC-V5)
  • Model IDs validated at DB boundary (SEC-V1 update)

HTTP Hardening (SEC-N1, SEC-N2)

  • Explicit per-phase timeouts on httpx clients (SEC-N1)
  • Response Content-Type validated as application/json (SEC-N2)
  • Total lifecycle timeout via asyncio.wait_for (SEC-N1)
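Per-phase limits (connect, read, write, pool) are typically set on the httpx client itself, while the total lifecycle limit wraps the whole call. The wrapper half can be shown with the standard library alone; `with_total_timeout` is a hypothetical helper name, and the error message is illustrative.

```python
import asyncio
from typing import Awaitable, TypeVar

T = TypeVar("T")


async def with_total_timeout(awaitable: Awaitable[T], seconds: float) -> T:
    """Bound the full request lifecycle, regardless of per-phase timeouts."""
    try:
        return await asyncio.wait_for(awaitable, timeout=seconds)
    except asyncio.TimeoutError:
        # Normalize to the builtin TimeoutError with a path-free message;
        # before Python 3.11 asyncio.TimeoutError was a separate class.
        raise TimeoutError("request exceeded its total time budget") from None
```

The outer bound matters because a server that trickles one byte just inside each read timeout can otherwise keep a request alive indefinitely.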

Observability (SEC-A2, SEC-A3)

  • RequestIDMiddleware generates UUID per MCP request (SEC-A2)
  • Structured logging in ServiceManager, Database, LifecycleManager (SEC-A3)
  • Request ID propagates through tool call → response (SEC-A2)
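Request ID propagation is often built on `contextvars`, which keeps the ID attached to the current async task without threading it through every call. The sketch below assumes that approach; `request_id_var`, `new_request_id`, and `RequestIDFilter` are hypothetical names, not the actual RequestIDMiddleware.

```python
import contextvars
import logging
import uuid

# Holds the ID for the request being handled in the current context.
request_id_var: contextvars.ContextVar = contextvars.ContextVar(
    "request_id", default="-"
)


def new_request_id() -> str:
    """Generate a UUID4 and bind it to the current context."""
    rid = str(uuid.uuid4())
    request_id_var.set(rid)
    return rid


class RequestIDFilter(logging.Filter):
    """Attach the current request ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True  # never drops records, only annotates them
```

With the filter installed on a handler, a format string such as `"%(request_id)s %(message)s"` ties each structured log line back to the originating tool call.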

Code Quality

  • cli_context() async context manager in cli.py (DRY)
  • All 6 CLI modules use cli_context() (DRY)
  • Zero type: ignore comments in src/gpumod/ (type safety)
  • Dependencies pinned with upper bounds in pyproject.toml

Integration Testing

  • Integration tests cover all 15 audit findings
  • 900+ total tests, 97%+ coverage

Phase 7: Production Readiness

Container Security (SEC-D7, SEC-D8, SEC-D9, SEC-D10)

  • DockerDriver blocks --privileged mode (SEC-D7)
  • DockerDriver blocks host and macvlan network modes (SEC-D8)
  • DockerDriver rejects unsafe volume mounts (/, /etc, /var/run/docker.sock) (SEC-D9)
  • DockerDriver sanitizes environment variable keys (no = in keys) (SEC-D10)
  • Container image names validated via SEC-V1 regex
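The four container checks can be pictured as a single validation pass over the container config. The sketch below is not DockerDriver's code: the config keys (`privileged`, `network_mode`, `volumes`, `environment`) and the function name are assumptions, and host paths are compared literally after normalization without resolving symlinks.

```python
import posixpath
from typing import Any, Mapping

UNSAFE_NETWORK_MODES = frozenset({"host", "macvlan"})
CRITICAL_HOST_PATHS = frozenset({"/", "/etc", "/var/run/docker.sock"})


def _normalize(path: str) -> str:
    # Collapse "..", ".", and duplicate or trailing slashes before comparing.
    return "/" + posixpath.normpath(path).lstrip("/")


def validate_container_config(config: Mapping[str, Any]) -> None:
    """Raise ValueError on the first unsafe setting (SEC-D7 to SEC-D10)."""
    if config.get("privileged", False):
        raise ValueError("privileged mode is not allowed")
    if config.get("network_mode") in UNSAFE_NETWORK_MODES:
        raise ValueError("unsafe network mode")
    # Assumed shape: volumes maps host path (or volume name) to mount point.
    for host_path in config.get("volumes", {}):
        # Named volumes have no leading slash and are not host paths.
        if host_path.startswith("/") and _normalize(host_path) in CRITICAL_HOST_PATHS:
            raise ValueError("unsafe volume mount")
    for key in config.get("environment", {}):
        if not key or "=" in key:
            raise ValueError("invalid environment variable key")
```

Normalizing before the comparison is what stops spellings such as `/etc/` or `/var/run/../run/docker.sock` from slipping past an exact-match blocklist.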

Health Monitor Security (SEC-H1 through SEC-H5)

  • Health checks restricted to localhost (SEC-H1)
  • Health endpoint paths validated against SEC-V1 (SEC-H2)
  • Health check timeouts enforced per request (SEC-H3)
  • Health responses validated (Content-Type, size limits) (SEC-H4)
  • Monitoring task cleanup on shutdown (SEC-H5)

TUI Security (SEC-T1 through SEC-T4)

  • Untrusted text rendered via rich.text.Text (no markup injection) (SEC-T1)
  • Service/model names passed through sanitize_name() before display (SEC-T2)
  • TUI commands validated before dispatch (SEC-T3)
  • No credentials or sensitive config displayed in TUI (SEC-T4)

E2E Testing Infrastructure

  • gpu_required and docker_required pytest markers for CI portability
  • Hardware detection skips tests gracefully on CPU-only machines
  • Real SQLite database fixtures for E2E tests
  • Cleanup verification tests for fixture isolation

Documentation

  • README.md updated with DockerDriver, HealthMonitor, TUI, project structure
  • ARCHITECTURE.md updated with component details and E2E testing
  • SECURITY.md updated with SEC-D7 through SEC-D10 container controls
  • Copyright updated to 2024-2026