MCP Server Integration
gpumod includes an MCP (Model Context Protocol) server that lets AI assistants manage GPU services directly. The server exposes tools for querying status, simulating VRAM, and switching modes.
All IDE configurations below assume you cloned gpumod and installed it
with uv sync. Adjust command paths if you used pip instead.
OpenTelemetry stdout pollution
gpumod depends on opentelemetry. Without OTEL_SDK_DISABLED=true in
the env block, the SDK may print a startup message to stdout on some
systems. This corrupts the JSON-RPC stream and causes MCP clients to
fail with Failed to parse JSONRPC message from server. Always include
"OTEL_SDK_DISABLED": "true" in your MCP server config.
Claude Code
Claude Code discovers MCP servers from .mcp.json in the project root.
Create this file in your project (or home directory for global access):
{
  "mcpServers": {
    "gpumod": {
      "command": "uv",
      "args": ["--directory", "/path/to/gpumod", "run", "python", "-m", "gpumod.mcp_main"],
      "env": {
        "GPUMOD_DB_PATH": "~/.config/gpumod/gpumod.db",
        "OTEL_SDK_DISABLED": "true"
      }
    }
  }
}
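If the config file already lists other MCP servers, the gpumod entry can be merged in rather than pasted by hand. The following is a minimal sketch under that assumption; `gpumod_server_entry` and `add_gpumod_server` are hypothetical helper names, not part of gpumod, and the demo writes to a temporary directory instead of a real project.

```python
import json
import tempfile
from pathlib import Path


def gpumod_server_entry(repo_path):
    """Build the gpumod server entry shown above for a given checkout path."""
    return {
        "command": "uv",
        "args": ["--directory", str(repo_path), "run", "python", "-m", "gpumod.mcp_main"],
        "env": {
            "GPUMOD_DB_PATH": "~/.config/gpumod/gpumod.db",
            "OTEL_SDK_DISABLED": "true",
        },
    }


def add_gpumod_server(config_path, repo_path):
    """Add or update the gpumod entry in an MCP config file, keeping other servers."""
    config_path = Path(config_path)
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})["gpumod"] = gpumod_server_entry(repo_path)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo in a temporary directory: an existing config with one unrelated server.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / ".mcp.json"
    path.write_text(json.dumps({"mcpServers": {"other": {"command": "echo"}}}))
    merged = add_gpumod_server(path, "/path/to/gpumod")

print(sorted(merged["mcpServers"]))  # ['gpumod', 'other']
```

The same helper should work for the other clients on this page, since they share the `mcpServers` layout; only the config file path differs.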
Or register the same command, arguments, and environment variables from the CLI with claude mcp add.
Cursor
Cursor reads MCP configuration from .cursor/mcp.json in the project
root. Create the file:
{
  "mcpServers": {
    "gpumod": {
      "command": "uv",
      "args": ["--directory", "/path/to/gpumod", "run", "python", "-m", "gpumod.mcp_main"],
      "env": {
        "GPUMOD_DB_PATH": "~/.config/gpumod/gpumod.db",
        "OTEL_SDK_DISABLED": "true"
      }
    }
  }
}
After saving, restart the Cursor agent or open Settings > MCP to verify the server is connected.
Claude Desktop
Add to ~/.config/claude/claude_desktop_config.json (Linux) or
~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
{
  "mcpServers": {
    "gpumod": {
      "command": "uv",
      "args": ["--directory", "/path/to/gpumod", "run", "python", "-m", "gpumod.mcp_main"],
      "env": {
        "GPUMOD_DB_PATH": "~/.config/gpumod/gpumod.db",
        "OTEL_SDK_DISABLED": "true"
      }
    }
  }
}
Restart Claude Desktop after editing the config.
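For scripts that need to locate the Claude Desktop config, the two paths above can be selected by platform. This is a small sketch that covers only the Linux and macOS locations given on this page; `claude_desktop_config_path` is a hypothetical helper, and other platforms are deliberately left unsupported.

```python
import platform
from pathlib import Path


def claude_desktop_config_path(system=None, home=None):
    """Return the Claude Desktop config path listed above for Linux or macOS."""
    system = system or platform.system()
    home = Path(home) if home is not None else Path.home()
    if system == "Linux":
        return home / ".config" / "claude" / "claude_desktop_config.json"
    if system == "Darwin":  # platform.system() reports macOS as "Darwin"
        return home / "Library" / "Application Support" / "Claude" / "claude_desktop_config.json"
    raise ValueError(f"No config path documented here for platform: {system}")


print(claude_desktop_config_path("Linux", "/home/user").as_posix())
```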
Antigravity (Google)
Antigravity stores MCP config in mcp_config.json. To edit it:
- Open the ... dropdown at the top of the agent panel
- Click Manage MCP Servers
- Click View raw config
- Add the gpumod entry:
{
  "mcpServers": {
    "gpumod": {
      "command": "uv",
      "args": ["--directory", "/path/to/gpumod", "run", "python", "-m", "gpumod.mcp_main"],
      "env": {
        "GPUMOD_DB_PATH": "~/.config/gpumod/gpumod.db",
        "OTEL_SDK_DISABLED": "true"
      }
    }
  }
}
Save and the server will connect automatically.
Running the MCP server manually
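For testing or debugging, run the server directly with the same command and arguments used in the configurations above:
uv --directory /path/to/gpumod run python -m gpumod.mcp_main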
The server starts in stdio mode, which is the standard transport for
MCP clients. Set GPUMOD_LOG_LEVEL=DEBUG for verbose output.
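As a rough way to exercise the stdio transport by hand, the sketch below starts the server, sends an MCP initialize request, and parses the first line the server writes to stdout. Stdio MCP messages are newline-delimited JSON-RPC, so a non-JSON first line (for example a stray startup banner) surfaces as a parse error. The /path/to/gpumod placeholder, the client name, and the protocol version string are assumptions to adjust for your setup; `smoke_test` and `parse_message` are hypothetical helpers, not part of gpumod.

```python
import json
import os
import subprocess

# Same command and arguments as the client configurations; adjust the path.
SERVER_CMD = ["uv", "--directory", "/path/to/gpumod", "run", "python", "-m", "gpumod.mcp_main"]


def initialize_request(request_id=1):
    """Build the JSON-RPC initialize request that opens an MCP session."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # assumed; use the version your client speaks
            "capabilities": {},
            "clientInfo": {"name": "smoke-test", "version": "0.0.0"},
        },
    }


def parse_message(line):
    """Parse one stdout line as a JSON-RPC message, as a stdio client would."""
    try:
        return json.loads(line)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Failed to parse JSONRPC message from server: {line!r}") from exc


def smoke_test(cmd=SERVER_CMD):
    """Start the server, send initialize, and return its first stdout message."""
    env = {**os.environ, "OTEL_SDK_DISABLED": "true"}
    proc = subprocess.Popen(
        cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True, env=env
    )
    try:
        proc.stdin.write(json.dumps(initialize_request()) + "\n")
        proc.stdin.flush()
        # Blocks until the server writes a line (or exits and closes stdout).
        return parse_message(proc.stdout.readline())
    finally:
        proc.terminate()
        proc.wait()
```

Calling `smoke_test()` from a Python shell should return a JSON-RPC response whose `id` is 1 when the server is healthy; a `ValueError` means something other than JSON reached stdout first.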
Available MCP Tools
The MCP server exposes 16 tools in four categories (read-only, mutating, discovery, and consulting):
| Tool | Description | Type |
|---|---|---|
| gpu_status | Get current GPU status, VRAM usage, running services | Read-only |
| list_services | List all registered services with driver type and VRAM | Read-only |
| list_modes | List all available GPU modes | Read-only |
| service_info | Get detailed info for a specific service | Read-only |
| model_info | Get model metadata and VRAM estimates | Read-only |
| simulate_mode | Simulate VRAM for a mode with optional changes | Read-only |
| switch_mode | Switch to a different GPU mode (starts/stops services) | Mutating |
| start_service | Start a specific service | Mutating |
| stop_service | Stop a specific service | Mutating |
| search_hf_models | Search HuggingFace for models by author/keyword/task/driver | Discovery |
| list_gguf_files | List GGUF files in a repo with size and VRAM estimates | Discovery |
| list_model_files | List model files (GGUF or Safetensors) with format detection | Discovery |
| fetch_model_config | Fetch config.json from a HuggingFace repo | Discovery |
| generate_preset | Generate preset YAML configuration for a GGUF model | Discovery |
| fetch_driver_docs | Fetch driver documentation (llama.cpp or vLLM) | Discovery |
| consult | Multi-step reasoning for complex GPU/model questions | Consulting |
Mutating tools are clearly marked in their descriptions and should trigger confirmation prompts in MCP clients.
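For clients or scripts that do not prompt on their own, a confirmation gate can be approximated on the calling side. The sketch below hard-codes the three tools marked Mutating in the table and builds a standard MCP tools/call request; `guarded_call` is a hypothetical helper, and the `mode_id` argument name is an assumption, since this page does not list the parameters of switch_mode.

```python
# Tools marked "Mutating" in the tools table.
MUTATING_TOOLS = frozenset({"switch_mode", "start_service", "stop_service"})


def tools_call_request(name, arguments, request_id=1):
    """Build an MCP tools/call JSON-RPC request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def guarded_call(name, arguments, confirm, request_id=1):
    """Return a tools/call request, or None if a mutating call is declined."""
    if name in MUTATING_TOOLS and not confirm(name, arguments):
        return None
    return tools_call_request(name, arguments, request_id)


def deny(name, arguments):
    """Stand-in for a user who answers 'no' to every prompt."""
    return False


status = guarded_call("gpu_status", {}, confirm=deny)
switch = guarded_call("switch_mode", {"mode_id": "coding"}, confirm=deny)  # hypothetical argument
print(status["params"]["name"], switch)  # gpu_status None
```

Read-only tools pass straight through, while a declined mutating call never produces a request.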
Discovery Tools
The discovery tools help AI assistants find and configure new models:
search_hf_models
Parameters:
  author: str | None       # HuggingFace org (default: all)
  search: str | None       # Keyword search in model names
  task: str | None         # Filter: code, chat, embed, reasoning
  driver: str | None       # Filter: llamacpp (GGUF), vllm (Safetensors), any
  limit: int = 20          # Max results (1-100)
  no_cache: bool = False   # Bypass cache
Returns: { models: [...], count: int }
  # When driver param used, models include model_format and driver_hint
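The parameter list above can be turned into a small argument builder that rejects out-of-range values before the call is sent. This is a sketch only: `search_hf_models_args` is a hypothetical helper, and treating the three listed driver values as the complete set is an assumption based on the comment above.

```python
VALID_DRIVERS = ("llamacpp", "vllm", "any")  # values named in the parameter list


def search_hf_models_args(author=None, search=None, task=None, driver=None,
                          limit=20, no_cache=False):
    """Build the arguments dict for search_hf_models, omitting unset filters."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if driver is not None and driver not in VALID_DRIVERS:
        raise ValueError(f"driver must be one of {VALID_DRIVERS}")
    args = {"limit": limit, "no_cache": no_cache}
    optional = {"author": author, "search": search, "task": task, "driver": driver}
    # Leave out filters that were not set so the server applies its defaults.
    args.update({key: value for key, value in optional.items() if value is not None})
    return args


print(search_hf_models_args(author="unsloth", driver="llamacpp", limit=5))
```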
list_gguf_files
Parameters:
  repo_id: str                 # e.g., "unsloth/Qwen3-Coder-Next-GGUF"
  vram_budget_mb: int | None   # Filter files that fit in VRAM
Returns: { repo_id, files: [...], count: int }
list_model_files (unified format support)
Parameters:
  repo_id: str                 # e.g., "unsloth/Qwen3-Coder-Next-GGUF"
  vram_budget_mb: int | None   # Filter files that fit in VRAM
Returns: { repo_id, files: [...], count, model_format, driver_hint }
  # model_format: "gguf" | "safetensors" | "unknown"
  # driver_hint: "llamacpp" | "vllm" | null
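A caller might use the two extra fields to decide which driver to configure. The sketch below prefers driver_hint and falls back to model_format, using the GGUF to llamacpp and Safetensors to vllm pairing stated for search_hf_models; `pick_driver` is a hypothetical helper and the sample results are made up for illustration.

```python
# Format-to-driver pairing as described for the search_hf_models driver filter.
FORMAT_TO_DRIVER = {"gguf": "llamacpp", "safetensors": "vllm"}


def pick_driver(result):
    """Choose a driver from a list_model_files result, or None if undetermined."""
    hint = result.get("driver_hint")
    if hint in ("llamacpp", "vllm"):
        return hint
    # driver_hint is null: fall back to the detected model format, if known.
    return FORMAT_TO_DRIVER.get(result.get("model_format"))


# Hypothetical results; the files list is left empty on purpose.
with_hint = {"repo_id": "org/model-a", "files": [], "count": 0,
             "model_format": "safetensors", "driver_hint": "vllm"}
no_hint = {"repo_id": "org/model-b", "files": [], "count": 0,
           "model_format": "gguf", "driver_hint": None}
unknown = {"repo_id": "org/model-c", "files": [], "count": 0,
           "model_format": "unknown", "driver_hint": None}

print(pick_driver(with_hint), pick_driver(no_hint), pick_driver(unknown))  # vllm llamacpp None
```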
generate_preset
Parameters:
  repo_id: str               # HuggingFace repo ID
  gguf_file: str             # GGUF filename to use
  context_size: int = 8192   # Context window size
  service_id: str | None     # Custom service ID
Returns: { preset: str, service_id: str }
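Since generate_preset returns the YAML as a string, the caller decides where to store it. The sketch below writes the preset to a file named after the returned service_id. The target directory and the one-file-per-service layout are assumptions (this page does not say where gpumod loads presets from), `save_preset` is a hypothetical helper, and the sample result is a placeholder.

```python
import tempfile
from pathlib import Path


def save_preset(result, presets_dir):
    """Write the YAML from a generate_preset result to <presets_dir>/<service_id>.yaml."""
    service_id = result["service_id"]
    if "/" in service_id or "\\" in service_id:
        raise ValueError(f"service_id must not contain path separators: {service_id!r}")
    presets_dir = Path(presets_dir)
    presets_dir.mkdir(parents=True, exist_ok=True)
    path = presets_dir / f"{service_id}.yaml"
    path.write_text(result["preset"])
    return path


# Hypothetical tool result; a real preset would contain the generated YAML.
sample = {"service_id": "qwen3-coder", "preset": "# preset YAML returned by the tool\n"}

with tempfile.TemporaryDirectory() as tmp:
    saved = save_preset(sample, tmp)
    saved_name = saved.name
    saved_text = saved.read_text()

print(saved_name)  # qwen3-coder.yaml
```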
Available MCP Resources
The MCP server provides 8 browsable resources:
| URI | Description |
|---|---|
| gpumod://help | Overview of gpumod capabilities |
| gpumod://config | Current configuration and settings |
| gpumod://modes | List all defined modes |
| gpumod://modes/{mode_id} | Detail view of a specific mode |
| gpumod://services | List all registered services |
| gpumod://services/{service_id} | Detail view of a specific service |
| gpumod://models | List all registered models |
| gpumod://models/{model_id} | Detail view of a specific model |
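The templated URIs can be filled in programmatically before issuing an MCP resources/read request. The sketch below builds URIs for the eight resources in the table; `resource_uri` and `read_resource_request` are hypothetical helpers, the "coding" mode ID is invented for the example, and rejecting IDs that contain a slash is a conservative assumption rather than documented server behaviour.

```python
STATIC_RESOURCES = ("help", "config")
COLLECTIONS = ("modes", "services", "models")


def resource_uri(name, item_id=None):
    """Build a gpumod:// URI for a static resource, a collection, or one item in it."""
    if name in STATIC_RESOURCES:
        if item_id is not None:
            raise ValueError(f"{name} does not take an ID")
        return f"gpumod://{name}"
    if name in COLLECTIONS:
        if item_id is None:
            return f"gpumod://{name}"
        if "/" in item_id:
            raise ValueError(f"ID must not contain '/': {item_id!r}")
        return f"gpumod://{name}/{item_id}"
    raise ValueError(f"Unknown resource: {name}")


def read_resource_request(uri, request_id=1):
    """Build an MCP resources/read JSON-RPC request for the given URI."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    }


print(resource_uri("modes"))            # gpumod://modes
print(resource_uri("modes", "coding"))  # gpumod://modes/coding
```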