gpumod

gpumod is a GPU service manager for inference services (vLLM, llama.cpp, FastAPI, Docker) running on NVIDIA GPUs. It tracks memory allocation, switches between named service configurations, simulates resource requirements before deployment, and exposes an MCP interface for AI assistant integration.

Service lifecycle management with multiple driver support
Mode-based switching that bundles services within VRAM constraints
Pre-deployment simulation with capacity suggestions
Model metadata tracking from HuggingFace and GGUF sources
MCP server for IDE integration with Claude, Cursor, and similar tools
Interactive terminal dashboard and CLI
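
To illustrate what a pre-deployment simulation with capacity suggestions could look like, here is a minimal Python sketch. It is not gpumod's actual API: the names `Service`, `simulate_mode`, and `suggest_removals`, the `vram_mb` field, and the greedy "drop the largest service first" strategy are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical service record: a name plus its estimated VRAM need in MB."""
    name: str
    vram_mb: int


def simulate_mode(services, gpu_total_mb, reserved_mb=0):
    """Check whether a bundle of services fits on the GPU without starting anything."""
    required = sum(s.vram_mb for s in services) + reserved_mb
    headroom = gpu_total_mb - required
    return {"fits": headroom >= 0, "required_mb": required, "headroom_mb": headroom}


def suggest_removals(services, gpu_total_mb):
    """Greedy suggestion (an assumed strategy): drop the largest services until the rest fit."""
    kept = sorted(services, key=lambda s: s.vram_mb)  # ascending, largest last
    dropped = []
    while kept and sum(s.vram_mb for s in kept) > gpu_total_mb:
        dropped.append(kept.pop())  # remove the current largest service
    return [s.name for s in dropped]


# Example mode: three illustrative services with made-up VRAM estimates.
mode = [
    Service("vllm-chat", 16000),
    Service("llama-embed", 6000),
    Service("api", 500),
]

on_24gb = simulate_mode(mode, gpu_total_mb=24000)
on_16gb = simulate_mode(mode, gpu_total_mb=16000)
to_drop = suggest_removals(mode, gpu_total_mb=16000)
```

With these made-up numbers the mode needs 22500 MB, so it fits on a 24000 MB GPU with 1500 MB of headroom, does not fit on a 16000 MB GPU, and the sketch suggests dropping `vllm-chat` in that case. A real simulation would also need to account for things this sketch ignores, such as KV cache growth and memory already in use by other processes.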
