MCP Layer
The Model Context Protocol (MCP) is the bridge between AI assistants (Claude Code, claude.ai, Claude mobile) and the infrastructure. Instead of pasting log output or querying services manually, the AI connects directly to live systems: it can search semantic memory, trigger n8n workflows, inspect containers, query the database, and control Home Assistant — all through a unified protocol.
chris-os runs 14 MCP servers exposing several hundred tools in total (the per-server counts below sum to roughly 350).
Server Inventory
| Server | Transport | Purpose | Tools |
|---|---|---|---|
| memory | stdio (bridge) | Persistent semantic memory — store, search, relate, and analyze knowledge across sessions | 28 |
| home-assistant | stdio (uvx) | Home Assistant control — devices, automations, entities, scenes | 89 |
| github | stdio | Repo management, issues, PRs, GitHub Projects | ~30 |
| n8n | SSE | n8n workflow management — list, create, update, test workflows | ~20 |
| unifi | SSE | UniFi network control — device inventory, VLAN inspection, client management | ~15 |
| docker | stdio | Docker management on the production host — containers, images, networks | ~15 |
| google-workspace | stdio (uvx) | Google Docs, Sheets, Gmail, Calendar, Drive, Tasks | ~60 |
| docs-mcp-server | SSE (local) | Local documentation index — 159 libraries, full-text + semantic search | ~10 |
| ollama | stdio (npx) | Ollama model management on the inference host | ~10 |
| discord | stdio (npx) | Discord server management — messages, channels, roles | ~50 |
| context7 | stdio (npx) | Live library documentation via Upstash | ~5 |
| d2 | stdio | Diagram creation and export using D2 language | ~8 |
| brewersfriend | stdio | Brewing data — recipes, fermentation sessions | ~5 |
| gcloud / google-workspace | stdio | Google Cloud and Google Workspace operations | varies |
Connection Topology
Servers connect via two transports: stdio (process spawned locally, communicates over stdin/stdout) or SSE (Server-Sent Events over HTTP, persistent stream for server push).
LAN (Claude Code on the dev workstation)
```
Claude Code
├── memory            stdio → bridge script → HTTP/SSE → production host (MCP auth)
├── n8n               SSE   → production host (MCP auth)
├── unifi             SSE   → production host (SSH tunnel, loopback only)
├── home-assistant    stdio → uvx ha-mcp → HA API via long-lived token
├── github            stdio → GitHub API via PAT
├── docker            stdio → uvx mcp-server-docker (SSH DOCKER_HOST to prod host)
├── ollama            stdio → npx ollama-mcp → inference host (Ollama API)
├── docs-mcp-server   SSE   → localhost (local process only)
└── d2 / discord / context7 / brewersfriend / gcloud / workspace
                      stdio → local or cloud APIs
```

For n8n and unifi, the SSE transport connects directly to the production host over the LAN. Each connection passes through a Caddy listener that routes to the appropriate auth middleware container.
Remote (claude.ai via Cloudflare)
```
claude.ai → Cloudflare Worker (OAuth provider)
              GitHub OAuth → issues scoped Bearer tokens
              Routes to:
                /api/db     → Cloudflare Tunnel → mcp-auth-postgres
                /api/n8n    → Cloudflare Tunnel → mcp-auth-n8n
                /api/memory → Cloudflare Tunnel → mcp-auth-memory
                /api/ha     → Cloudflare Tunnel → mcp-auth-ha
```

The OAuth Worker uses dedicated Cloudflare Tunnel hostnames to avoid a Cloudflare same-zone fetch loop — a Worker cannot fetch a hostname on the same zone directly.
Internal chain (per MCP service)
Every MCP service on the production host follows the same four-layer chain:
```
Internet / LAN
  → Caddy (net-frontend)
  → mcp-auth-{service}   (net-mcp + net-frontend)     ← validates credentials
  → mcp-proxy-{service}  (net-mcp + net-data/net-app) ← protocol bridge
  → upstream service     (postgres / n8n / mcp-ai-memory)
```

This separation means the upstream services (postgres, n8n) are never directly reachable from outside net-data or net-app. The auth containers are the only point of credential validation.
Authentication Model
Two credential types are accepted at each MCP auth middleware container:
API key (LAN / Claude Desktop / scripts / n8n):
- Per-service keys for blast-radius isolation
- Multiple keys per service supported for zero-downtime rotation
- On LAN: the auth middleware rewrites the SSE `endpoint` event URL so that credentials are carried automatically on subsequent MCP SDK POST calls
Cloudflare Access JWT (claude.ai / remote):
- Issued by the OAuth Worker after GitHub OAuth completes
- Validated against the Cloudflare Access team and per-service audience claims
- Only the GitHub account on the allowlist can obtain tokens
On LAN, the auth middleware terminates the MCP SDK’s OAuth discovery flow and forces immediate fallback to API key authentication.
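A minimal sketch of the dual credential check each auth container performs. All names here (`SERVICE_KEYS`, the header keys, `validateAccessJwt`) are illustrative assumptions, not the real middleware code:

```typescript
// Sketch of the per-service credential check in an mcp-auth-{service} container.
// Hypothetical names throughout; only the two-credential-type design is from the docs.

type AuthResult = { ok: boolean; method?: "api-key" | "access-jwt" };

// Multiple active keys per service allow zero-downtime rotation:
// add the new key, roll clients over, then retire the old one.
const SERVICE_KEYS = new Set(["key-current", "key-next"]);

function authenticate(headers: Record<string, string | undefined>): AuthResult {
  // Path 1: per-service API key (LAN / scripts / n8n)
  const apiKey = headers["x-api-key"];
  if (apiKey && SERVICE_KEYS.has(apiKey)) {
    return { ok: true, method: "api-key" };
  }
  // Path 2: Cloudflare Access JWT (claude.ai via the OAuth Worker)
  const jwt = headers["cf-access-jwt-assertion"];
  if (jwt && validateAccessJwt(jwt, "expected-service-audience")) {
    return { ok: true, method: "access-jwt" };
  }
  return { ok: false };
}

// Placeholder for real verification: signature check against the Access team's
// public keys plus a per-service audience-claim match. Stand-in logic only.
function validateAccessJwt(token: string, audience: string): boolean {
  return token === `valid-for-${audience}`;
}
```

The per-service key set is what gives the blast-radius isolation described above: leaking one service's key exposes only that service.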
Memory Server
The memory server is the most architecturally complex component. It provides persistent, searchable semantic memory across all AI sessions — storing decisions, insights, project state, and conversation context.
Container chain
```
Claude Code / claude.ai
  → stdio bridge (LAN) or OAuth Worker + Tunnel (remote)
  → mcp-auth-memory
  → mcp-proxy-memory
  → mcp-ai-memory (upstream MCP server, stdio child of proxy)
       ├── PostgreSQL (memory schema, app role)
       ├── Redis (BullMQ job queue + search cache)
       └── Ollama on inference host (embeddings)
```

mcp-ai-memory is a heavily patched fork of an upstream MCP memory server — 48 patches applied against the original. The proxy wraps it as a stdio child process and exposes an HTTP/SSE interface to the auth middleware.
Three-tier cache
| Tier | Technology | Contents | Invalidation |
|---|---|---|---|
| L1 | JS Map (in-process) | Search result cache, keyed by query hash | On every successful memory_store call |
| L2 | Redis | BullMQ job queue (embedding + clustering), search result cache | TTL-based (default 300s), flushed on embedding model change |
| L3 | PostgreSQL | All memory records, vector embeddings, entity graph, relationships | Never expired (soft-delete via decay scoring) |
Vector search
Embeddings use qwen3-embedding:8b running on the dedicated inference host — a 4096-dimension model served via Ollama.
Vectors are stored using binary quantization with an HNSW bit index (migration 261). This enables approximate nearest-neighbor search on 4096-dimensional vectors without full float32 storage overhead.
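A toy illustration of the idea behind binary quantization: each float dimension collapses to its sign bit, and candidate distance becomes Hamming distance over the packed bits. The index's actual on-disk format differs; this only shows why 4096 floats shrink to 512 bytes:

```typescript
// Binary quantization sketch: 1 bit per dimension (sign), packed into bytes.
// A 4096-dim float32 vector (16 KiB) becomes a 512-byte bit vector.

function quantize(vec: number[]): Uint8Array {
  const bits = new Uint8Array(Math.ceil(vec.length / 8));
  vec.forEach((v, i) => {
    if (v > 0) bits[i >> 3] |= 1 << (i & 7);
  });
  return bits;
}

// Hamming distance between two packed bit vectors: the ANN stage ranks
// candidates by this cheap metric before float32 reranking.
function hamming(a: Uint8Array, b: Uint8Array): number {
  let d = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i];
    while (x) {
      d += x & 1;
      x >>= 1;
    }
  }
  return d;
}
```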
Retrieval uses a two-stage approach:
- Stage 1 (ANN): the HNSW bit index retrieves a broad candidate set (`ef_search=100`)
- Stage 2 (rerank): candidates are reranked by full float32 cosine similarity fused with the BM25 full-text score via Reciprocal Rank Fusion (`RRF_K=60`)
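The Stage 2 fusion can be sketched as plain Reciprocal Rank Fusion with the same K=60 constant: an id's fused score is the sum, over both ranked lists, of 1/(K + rank). The function shape is illustrative:

```typescript
// Minimal Reciprocal Rank Fusion over two ranked id lists
// (cosine-similarity order and BM25 order), rank starting at 1.

function rrf(vectorRanked: string[], bm25Ranked: string[], K = 60): string[] {
  const score = new Map<string, number>();
  for (const list of [vectorRanked, bm25Ranked]) {
    list.forEach((id, i) => {
      // 1 / (K + rank): high K flattens the curve so neither list dominates.
      score.set(id, (score.get(id) ?? 0) + 1 / (K + i + 1));
    });
  }
  return Array.from(score.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

An id ranked moderately well in both lists can beat one ranked first in only one list, which is the point of fusing the vector and full-text signals.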
Additional search features: AutoCut (dynamic tail trimming), MMR diversity (deduplication by word overlap), date range filtering, scope/source filtering, and a federated bridge search for structured data fallback.
Similarity threshold: 0.45 (trajectory: 0.25 → 0.30 → 0.45 as the embedding model improved).
Memory lifecycle
Memories decay over time unless preserved. The decay system runs hourly via BullMQ cron:
| State | Score | Behavior |
|---|---|---|
| Active | ≥ 0.50 | Normal retrieval weight |
| Dormant | ≥ 0.10 | Reduced retrieval weight |
| Archived | ≥ 0.01 | Near-expiry |
| Expired | < 0.01 | Soft-deleted |
Memories tagged with permanent, important, decision, architecture, or preference bypass decay entirely.
Two memory types: episode (subject to decay, access-dependent) and knowledge (permanent).
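The lifecycle rules above can be condensed into a small classifier. The thresholds, bypass tags, and the episode/knowledge split are the documented ones; the function itself is an illustrative sketch, not the real implementation:

```typescript
// Decay-state classifier matching the table: active / dormant / archived / expired.

type DecayState = "active" | "dormant" | "archived" | "expired";

const BYPASS_TAGS = new Set([
  "permanent", "important", "decision", "architecture", "preference",
]);

function decayState(
  score: number,
  tags: string[] = [],
  type: "episode" | "knowledge" = "episode",
): DecayState {
  // knowledge-type memories and bypass-tagged memories never decay
  if (type === "knowledge" || tags.some((t) => BYPASS_TAGS.has(t))) {
    return "active";
  }
  if (score >= 0.5) return "active";    // normal retrieval weight
  if (score >= 0.1) return "dormant";   // reduced retrieval weight
  if (score >= 0.01) return "archived"; // near-expiry
  return "expired";                     // soft-deleted, never hard-removed
}
```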
Tool surface (28 tools)
| Category | Tools |
|---|---|
| Core | memory_store, memory_search, memory_update, memory_delete, memory_list, memory_batch, memory_batch_delete |
| Relationships | memory_relate, memory_unrelate, memory_get_relations, memory_traverse |
| Entity | memory_entity_search, memory_entity_graph, memory_find_similar |
| Analysis | memory_pattern_search, memory_graph_search, memory_graph_analysis, memory_counter_narrative, memory_synthesis_status |
| Identity | memory_identity_claim |
| Lifecycle | memory_preserve, memory_supersede, memory_consolidate, memory_decay_status |
| Core scratch | core_memory_read, core_memory_write, core_memory_trim |
| Stats | memory_stats |
The stdio Bridge
The memory server uses a custom stdio bridge (scripts/mcp-memory-bridge.cjs) instead of a direct SSE connection. This exists to work around a bug in the Claude Code MCP SDK — when SSE transport is used for memory, the SDK triggers an OAuth discovery flow that fails on LAN. The stdio bridge bypasses this entirely.
Protocol translation:
```
Claude Code (stdio JSON-RPC)
  → bridge buffers messages in memory
  → HTTP GET /sse (SSE connection to mcp-auth-memory, with credentials)
  → on "endpoint" event: receives the POST endpoint URL for this session
  → on stdin message: HTTP POST to the endpoint URL
  → on "message" SSE event: write JSON-RPC response to stdout
  → Claude Code reads the response from stdout
```

Reliability features (bridge v3):
- Durable spool (on-disk) — messages spooled during a crash are replayed on restart
- Map-based retry tracking keyed by JSON-RPC id — retries survive requeue round-trips (max 3, exponential backoff)
- Stale endpoint handling — 404/410 response triggers reconnect and re-queues the message
- Graceful shutdown — spool is persisted on `SIGINT`/`SIGTERM`
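The Map-based retry tracking can be sketched as follows. Only the JSON-RPC-id keying, the cap of 3, and the exponential backoff are from the text; the delay values and function names are illustrative:

```typescript
// Retry tracker keyed by JSON-RPC id, so a message's attempt count
// survives being requeued through the spool.

const retries = new Map<string | number, number>();
const MAX_RETRIES = 3;

// Returns the backoff delay in ms for the next attempt, or null when the
// message has exhausted its retries and should be dropped to the spool.
function nextRetryDelay(id: string | number): number | null {
  const attempt = (retries.get(id) ?? 0) + 1;
  if (attempt > MAX_RETRIES) {
    retries.delete(id);
    return null;
  }
  retries.set(id, attempt);
  return 250 * 2 ** (attempt - 1); // exponential: 250, 500, 1000 ms (illustrative)
}

// On successful delivery, forget the id so a reused id starts fresh.
function onDelivered(id: string | number): void {
  retries.delete(id);
}
```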
Known Gotchas
SSE header forwarding: Claude Code’s SSE transport does not forward custom headers on POST calls after the initial SSE connection. The auth middleware rewrites the stream’s endpoint event to carry the credentials forward in the URL.
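The rewrite can be sketched as a URL transformation on the session's `endpoint` event payload. The query parameter name `api_key` and the host used as a parsing base are assumptions:

```typescript
// Sketch: append the credential to the per-session POST endpoint URL that the
// middleware sends in the SSE "endpoint" event, so later header-less POSTs
// still authenticate. Parameter name is hypothetical.

function rewriteEndpointEvent(endpointUrl: string, apiKey: string): string {
  // The base host only anchors relative-URL parsing; it is not emitted.
  const u = new URL(endpointUrl, "http://mcp-auth-memory");
  u.searchParams.set("api_key", apiKey); // credential now travels in the URL
  return u.pathname + u.search;
}
```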
mcp-proxy-memory child death: If the mcp-ai-memory stdio child crashes inside the proxy, the proxy returns 5xx. The auth middleware’s health check probes the upstream; if the upstream is unhealthy, Docker restarts the auth container, clearing the stale connection pool. When recovering manually, always restart the proxy container before the auth container.
HNSW bypass via CTE materialization (fixed): A shared CTE in knowledge_search() caused PostgreSQL to materialize the CTE before the HNSW scan, bypassing the index. Fixed in migration 263 by rewriting to inline subqueries. Symptom: slow search + full sequential scan in EXPLAIN.
Embedding model change: After changing the embedding model, manually flush Redis keys matching mcp:embeddings:*. The JS Map cache clears on container restart; the HNSW index must be rebuilt.
Memory tags: Tags cannot contain / or : — slashes are stripped silently. Use hyphens.