Case study

Eleanor-AI

Agentic Security Operations Assistant

A Slack and Discord agent system that answers security questions, runs reporting tasks, and drives evidence workflows against structured operational data instead of manual console checks. Hardened ephemeral Docker containers, a proxy credential pattern, six-network segmentation between control and execution planes, and persistent conversations with session resume.

Problem

A small security team can’t keep up with the operational surface of a multi-region SaaS environment by clicking through consoles. AWS resources, CrowdStrike sensor coverage, vulnerability findings, network exposure, and SOC 2 evidence all live in different UIs, with different authentication, different latency, and different export formats. Every recurring question — “what’s our current sensor coverage on production EC2?”, “which RDS instances aren’t behind a private subnet?”, “give me a CrowdStrike rotation readiness report by role” — becomes an hour of cross-referencing spreadsheets.

Off-the-shelf LLM agents can sit on top of this, but the default deployment shape — agents running in-process, with shared memory, shared credentials, and flat network access — means a compromised or hallucinating agent reaches everything the agent process can reach. Loading every tool, every credential, and every dataset into one agent context isn’t a security boundary; it’s a single failure surface.

Eleanor was built to make agentic operations usable in a security context — meaning real isolation, real credential boundaries, and real fault containment, not just “an agent that calls APIs.”


System

Eleanor exposes Slack and Discord adapters over a structured operational data layer. A user asks a question in Slack. A routing rule maps (platform, channel/user) to the right orchestrator, gated by a per-platform channel/user allow-list. The control plane spawns an ephemeral Docker container for that orchestrator, materializes its skills and context documents from PostgreSQL + MinIO, runs the Claude Agent SDK session inside the container, and streams responses back to the platform adapter. After idle timeout, the container is torn down.
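
A minimal sketch of the routing step described above, to show the shape of the lookup the control plane performs before any container is spawned. Type names, fields, and the allow-list structure are illustrative, not the real schema.

  # Sketch only: names are illustrative, not the real implementation.
  from dataclasses import dataclass

  @dataclass
  class RoutingRule:
      platform: str          # "slack" or "discord"
      channel_id: str        # channel (or user) the rule applies to
      orchestrator_id: str   # orchestrator whose container gets spawned

  def resolve_orchestrator(platform: str, channel_id: str, user_id: str,
                           rules: list[RoutingRule],
                           allow_list: dict[str, set[str]]) -> str | None:
      # Per-platform allow-list gate: unknown channels/users are rejected
      # before any routing or container work happens.
      allowed = allow_list.get(platform, set())
      if channel_id not in allowed and user_id not in allowed:
          return None
      # First matching routing rule wins; the caller then asks the
      # ContainerManager to spawn (or reuse) that orchestrator's container.
      for rule in rules:
          if rule.platform == platform and rule.channel_id == channel_id:
              return rule.orchestrator_id
      return None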

The agent inside the container has:

  • A unique CLAUDE.md, skills, and context for its orchestrator (per-orchestrator MinIO prefix + global shared prefix, mounted read-only)
  • MCP tool connections routed through an in-cluster proxy, with per-MCP credentials encrypted at rest (Fernet keys derived from a service secret)
  • No real credentials — it holds dummy AWS keys and proxy URLs
  • No internet access — it attaches only to internal networks
  • A read-only filesystem with source compiled to .pyc bytecode
  • Seccomp restrictions — ~270 allowed syscalls, default deny
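
A minimal sketch of the per-MCP credential encryption mentioned in the list above, assuming the Fernet key is derived by hashing a service secret; the actual derivation scheme may differ.

  # Sketch only: the key-derivation step is an assumption.
  import base64, hashlib
  from cryptography.fernet import Fernet

  def fernet_from_service_secret(service_secret: bytes) -> Fernet:
      # Fernet expects a 32-byte url-safe base64 key; derive it from the secret.
      key = base64.urlsafe_b64encode(hashlib.sha256(service_secret).digest())
      return Fernet(key)

  f = fernet_from_service_secret(b"value-read-from-a-docker-secret")
  token = f.encrypt(b'{"MCP_API_KEY": "real-upstream-credential"}')  # stored at rest
  plaintext = f.decrypt(token)  # decrypted outside the agent container, never inside it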

The control plane (FastAPI, PostgreSQL, MinIO, Docker socket proxy) and the execution plane (per-orchestrator agent containers) communicate only through the message channel and the credential proxies, with service-to-service auth (X-Service-Key from a Docker secret) on the message channel. Neither plane trusts the other.
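
A minimal sketch of the service-to-service check on the message channel, assuming a FastAPI dependency and a secret mounted the way Docker secrets normally are; endpoint path and secret name are illustrative.

  # Sketch only: path, payload shape, and secret name are illustrative.
  import hmac
  from pathlib import Path
  from fastapi import Depends, FastAPI, Header, HTTPException

  app = FastAPI()
  # Docker secrets are mounted as files under /run/secrets/.
  SERVICE_KEY = Path("/run/secrets/service_key").read_text().strip()

  def require_service_key(x_service_key: str = Header(...)) -> None:
      # Constant-time comparison; mismatches are rejected with 401.
      if not hmac.compare_digest(x_service_key, SERVICE_KEY):
          raise HTTPException(status_code=401, detail="invalid service key")

  @app.post("/messages")
  def receive_message(payload: dict, _: None = Depends(require_service_key)):
      ...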

Conversations persist across container teardowns: turn-level history, token counts, costs, and durations are written to PostgreSQL keyed by conversation_id, and the Claude Agent SDK session can resume from a checkpoint when the container restarts. Every action — caller, engineer, orchestrator, action, cost, tokens, duration — also lands in a queryable audit log API.


Architecture

  CONTROL PLANE (app container)
  ├── FastAPI (~30 endpoints)
  ├── PlatformRegistry
  │     ├── Slack adapter (Socket Mode)
  │     └── Discord adapter (Gateway)
  ├── Routing rules (DB) → orchestrator_id
  ├── AccessControlConfig (channel/user allow-list)
  ├── AgentEngine (session management)
  ├── ContainerManager (Docker via socket proxy)
  ├── ProjectWriter (DB + MinIO → container disk)
  └── ConversationStore (PostgreSQL turns + cost/tokens)

        │  agent-net  (internal, no internet)

  EXECUTION PLANE (ephemeral agent containers)
  ├── FastAPI agent-service
  ├── Claude Agent SDK session
  ├── MCP server connections (via mcp-proxy)
  ├── Read-only /workspace, read-write /home/agent
  └── Resource limits: 4GB / 1 CPU / 256 PIDs
        │                    │
        │ bedrock-net        │ mcp-net
        ▼                    ▼
  bedrock-proxy          mcp-proxy
  (SigV4 / API key)      (nginx reverse proxy)
        │                    │
        │ egress-net         │ egress-net
        ▼                    ▼
  AWS Bedrock            Real MCP servers

  PERSISTENT STORAGE (db-net, internal)
  ├── PostgreSQL — orchestrators, MCP configs, conversations
  └── MinIO — context documents, skills (prefix-isolated)

  DOCKER API (docker-proxy-net, internal)
  └── docker-socket-proxy — filtered API
        (containers + networks only, no exec/images/volumes)

Six Docker networks, each blocking a specific threat:

  • agent-net: control plane ↔ agent containers. Blocks internet egress from agents and lateral movement to other services.
  • bedrock-net: agent ↔ bedrock-proxy. Blocks direct Bedrock access, forcing credential injection at the proxy.
  • mcp-net: agent ↔ mcp-proxy. Blocks direct MCP server access.
  • egress-net: proxies ↔ outside world. Blocks bypass of credential injection.
  • db-net: control plane ↔ PostgreSQL/MinIO. Blocks agent access to the database and tampering with the source of truth.
  • docker-proxy-net: control plane ↔ socket proxy. Blocks direct Docker API access (escape via volumes/exec).
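
A minimal sketch of how the networks above might be declared with the Docker SDK for Python, however they are actually declared in practice; the property that matters is internal=True, which removes the route to the outside world.

  # Sketch only: declarative tooling would normally own this.
  import docker

  client = docker.from_env()

  internal_networks = [
      "agent-net", "bedrock-net", "mcp-net", "db-net", "docker-proxy-net",
  ]
  for name in internal_networks:
      # internal=True: no default gateway, so members cannot reach the internet.
      client.networks.create(name, driver="bridge", internal=True)

  # Only the proxies sit on a network with a route out.
  client.networks.create("egress-net", driver="bridge", internal=False)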

Security model

Control plane vs execution plane separation

The single most important design decision. The control plane manages orchestrators, routing, credentials, and persistence; the execution plane runs agent code. Neither plane has the privileges of the other:

  • The control plane can spawn and tear down containers, but never executes agent prompts or tool calls itself.
  • The execution plane can call MCP tools and the LLM, but cannot reach the database, MinIO, the host Docker socket, or the internet directly.
  • The boundary is enforced by Docker networks, not by application logic. A bug in agent code can’t reach into the database; a bug in the control plane can’t accidentally leak a credential to an agent.

This is the difference between “my agent is sandboxed by my application” and “my agent is sandboxed by the kernel + Docker + iptables.” Only the second one is meaningful in a security context.

Proxy credential pattern

Agents never see real API keys. The flow looks like this:

  agent container       bedrock-proxy            AWS Bedrock
  ─────────────────────────────────────────────────────────
  curl bedrock-proxy
    -H "Authorization:
        Bearer DUMMY"
                ───►   nginx strips header
                       injects real SigV4
                       signs request          ───►  Bedrock API

                       passes response back   ◄────       │
  ◄────  response

Three proxies handle the three credential domains:

  • bedrock-proxy — holds AWS SigV4 credentials (or a Bedrock API key); agent sends a dummy token, proxy injects the real one
  • openrouter-proxy — same pattern, OpenRouter Bearer token
  • mcp-proxy — nginx reverse proxy that routes agent MCP requests to real upstream MCP servers, holding any auth those servers require

The agent container has no .env, no IAM role, no AWS profile. If a prompt injection convinces the agent to cat ~/.aws/credentials, there is nothing to read. If the container is compromised at the OS level, an attacker has dummy tokens that work nowhere except inside the cluster.

This pattern is not specific to security — it’s good agent hygiene generally. But in a security operations context where the agent is a high-value target because it has access to security data, removing real credentials from the blast radius is the difference between a contained incident and a real one.
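
The production proxies are nginx, but the injection step itself is small enough to sketch in a few lines of Python: drop whatever Authorization header the agent sent, SigV4-sign the request with credentials only the proxy holds, and forward it upstream. Hostnames, region, and library choices below are illustrative, not the actual configuration.

  # Sketch only: a credential-injecting forward proxy, reduced to its essence.
  import httpx
  from botocore.auth import SigV4Auth
  from botocore.awsrequest import AWSRequest
  from botocore.credentials import Credentials

  REAL_CREDS = Credentials(access_key="AKIA...", secret_key="...")  # held only by the proxy
  REGION, SERVICE = "us-east-1", "bedrock"
  UPSTREAM = "https://bedrock-runtime.us-east-1.amazonaws.com"

  def forward(path: str, body: bytes, incoming_headers: dict) -> httpx.Response:
      headers = {k: v for k, v in incoming_headers.items()
                 if k.lower() != "authorization"}           # strip the agent's dummy token
      req = AWSRequest(method="POST", url=UPSTREAM + path, data=body, headers=headers)
      SigV4Auth(REAL_CREDS, SERVICE, REGION).add_auth(req)   # inject the real signature
      return httpx.post(UPSTREAM + path, content=body, headers=dict(req.headers))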

Isolated container execution

Each orchestrator runs in its own ephemeral Docker container. The container is hardened:

  • Read-only root filesystem — only /home/agent is writable, and that’s wiped on teardown
  • Source compiled to .pyc — the .py files are deleted from the agent image. Claude Code’s Read tool, called against the agent’s own source, returns bytecode gibberish. Combined with no internet egress, exfiltration of agent logic is blocked at multiple layers.
  • Seccomp profile — ~270 syscalls allowed, default deny. ptrace, mount, sysctl, unshare, kexec_load all blocked. A successful RCE inside the container can’t escalate to host kernel attacks.
  • Resource limits — 4 GB RAM, 1 CPU, 256 PIDs. Fork bombs, runaway memory, and process-table exhaustion all bounded.
  • Idle teardown — containers are auto-killed after 5 minutes idle. The blast radius window is short.
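
A minimal sketch of what spawning one of these containers might look like through the Docker SDK for Python, routed via the socket proxy. Image name, seccomp profile path, and the tmpfs mount for /home/agent are assumptions; the limits mirror the list above.

  # Sketch only: the hardening flags above, expressed as docker-py options.
  import docker
  from pathlib import Path

  client = docker.DockerClient(base_url="tcp://docker-socket-proxy:2375")
  seccomp = Path("seccomp-agent.json").read_text()  # ~270 allowed syscalls, default deny

  container = client.containers.run(
      "eleanor/agent:latest",                 # image name is illustrative
      detach=True,
      read_only=True,                         # read-only root filesystem
      tmpfs={"/home/agent": "rw,size=512m"},  # only writable path, gone on teardown
      network="agent-net",                    # internal network, no internet route
      mem_limit="4g",
      nano_cpus=1_000_000_000,                # 1 CPU
      pids_limit=256,
      security_opt=[f"seccomp={seccomp}", "no-new-privileges"],
      environment={"AWS_ACCESS_KEY_ID": "DUMMY", "AWS_SECRET_ACCESS_KEY": "DUMMY"},
  )

Idle teardown then reduces to removing the container after the timeout; nothing of value lives on its filesystem.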

Filtered Docker socket access

The control plane needs to spawn and tear down containers, which means it needs Docker API access. Mounting /var/run/docker.sock directly is one of the most common container-escape vectors — anyone with the socket has root on the host.

Instead, a docker-socket-proxy sits between the control plane and the real socket. The proxy allows only the API surface the control plane actually needs (containers, networks) and blocks exec, images, volumes, swarm, and system. A compromise of the control plane can spawn an agent container, but can’t docker exec into another container, mount host paths, or push images.
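
A minimal sketch of the effect, assuming an HTTP-level socket proxy that allow-lists endpoint groups; the specific proxy image and its configuration are not part of this writeup.

  # Sketch only: what the control plane can and cannot do through the filter.
  import docker
  from docker.errors import APIError

  client = docker.DockerClient(base_url="tcp://docker-socket-proxy:2375")

  client.containers.list()   # containers: allowed
  client.networks.list()     # networks: allowed

  try:
      # exec is not in the allow-list; the proxy rejects it before dockerd sees it
      client.containers.get("some-agent-container").exec_run("id")
  except APIError:
      pass  # expected: an HTTP 403 from the proxy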

Persistent workspaces with SDK session resume

Containers are ephemeral; the conversations they run are not. Every turn writes to PostgreSQL — message content, tool calls, token counts, cost, duration — keyed by conversation_id (or an external thread ID for Slack/Discord). When a container is torn down or a host restarts, the next turn spins up a new container and the Claude Agent SDK resumes from the last checkpoint.

This decouples session lifetime from container lifetime, which matters for two reasons. First, it means idle teardown (5-minute default) is a real win — there’s no penalty to recycling a container if no one’s using it, because the conversation can pick up later from the same point. Second, it means the audit trail and the session state share the same backing store; a single SQL query reconstructs both what was said and what it cost per conversation.
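
A minimal sketch of the turn write and the resume lookup, against an illustrative schema; the exact resume mechanism is whatever the Claude Agent SDK exposes for picking up a stored session, so only the stored session id is shown here.

  # Sketch only: table and column names are illustrative, not the real schema.
  import psycopg

  def record_turn(db: psycopg.Connection, conversation_id: str, sdk_session_id: str,
                  role: str, content: str, tokens: int, cost_usd: float, duration_s: float):
      db.execute(
          """INSERT INTO conversation_turns
             (conversation_id, sdk_session_id, role, content, tokens, cost_usd, duration_s)
             VALUES (%s, %s, %s, %s, %s, %s, %s)""",
          (conversation_id, sdk_session_id, role, content, tokens, cost_usd, duration_s),
      )

  def session_to_resume(db: psycopg.Connection, conversation_id: str) -> str | None:
      # Assumes a serial id primary key. The new container hands this id back to
      # the Claude Agent SDK so the session continues from its last checkpoint.
      row = db.execute(
          """SELECT sdk_session_id FROM conversation_turns
             WHERE conversation_id = %s ORDER BY id DESC LIMIT 1""",
          (conversation_id,),
      ).fetchone()
      return row[0] if row else None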

Audit log API

Every meaningful action — message received, container spawned, tool call issued, response sent — is appended to a queryable audit log via a dedicated API. Each row carries caller, engineer (the user the action was attributed to), orchestrator, action type, cost, tokens, and duration. Compliance questions (“who asked Eleanor what, when, and what did it cost?”) get answered by SQL, not by piecing together container logs.
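
A minimal sketch of the kind of compliance query this enables, against an illustrative audit table.

  # Sketch only: column names are illustrative.
  import psycopg

  def activity_for_engineer(db: psycopg.Connection, engineer: str, since: str):
      # "Who asked Eleanor what, when, and what did it cost?"
      return db.execute(
          """SELECT created_at, orchestrator, action, cost_usd, tokens, duration_s
             FROM audit_log
             WHERE engineer = %s AND created_at >= %s
             ORDER BY created_at""",
          (engineer, since),
      ).fetchall()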


Impact

Q1 2026 production data:

  Metric                            Value
  Queries handled                   145 (Jan–Mar 2026)
  Avg response time                 98 seconds
  Total AI cost                     $151.64 (~$1.05/query)
  Reports generated                 29 (HTML security reports)
  Total Eleanor compute time        237 minutes
  Estimated human-equivalent time   ~1,866 hours
  Speed multiplier                  ~473× vs. manual
  Projected annual AI cost          ~$607

The 473× speedup is the wrong metric to lead with. The bigger story is that most of the work Eleanor does is extremely time-consuming or impractical to do at the cadence security operations actually needs. Cross-referencing fleet AWS inventory against CrowdStrike sensor versions, walking attack paths through flattened security-group chains, doing multi-region S3 compliance audits — these tasks are expensive enough manually that running them regularly is hard to justify against everything else competing for the same hours. Eleanor makes them cheap enough to be routine.

A representative example: a CrowdStrike certificate rotation readiness assessment across the AWS fleet. Eleanor pulled instance inventory, cross-referenced against current sensor versions, classified by role and environment type, identified the single instance still running an unsafe version, and generated a formatted HTML report with charts. End-to-end time: about two minutes. Manual equivalent: a junior engineer with access to both consoles for 1–2 weeks, assuming they don’t make a mistake. Without this stack, that kind of assessment is a project, not a routine query.


Tradeoffs and what I’d do differently

Cold start cost. Ephemeral containers spawn in 15–30 seconds. In-process agents (the alternative pattern) start instantly. For a security bot where users tolerate latency on the order of “I asked Slack a question and got a real answer,” 15 seconds is fine. For anything user-facing or interactive, this would be the wrong shape. The tradeoff is intentional: I bought a kernel-enforced security boundary in exchange for cold-start latency.

Database connection pool is the scaling bottleneck. With a 10-connection pool and per-orchestrator containers each opening their own DB connection, ~10 concurrent active orchestrators is the comfortable ceiling. The fix is a per-container connection budget plus a spawn queue; I haven’t built it yet because the team isn’t pushing past that limit.

No rate limiting on platform adapters. Slack and Discord adapters take messages as fast as they arrive. A misbehaving channel — or a prompt-injection-driven message storm — could drive container spawns until the host runs out of memory. This is on the next-up list.

The proxy credential pattern was the highest-leverage decision. If I were rebuilding from scratch, every other security control could change. This one wouldn’t. Once you accept that the agent is a high-value target and that prompt injection is a real attack vector, you have to remove credentials from the agent’s blast radius. The proxy pattern does that with less than 200 lines of nginx config.


Architecture and aggregate impact only. Implementation details, queries, prompts, and configuration are not part of this writeup.