LLM-Daily-Summary
Automated Finding Triage & Jira Reporting
Daily pipeline that pulls security findings from the SecOps-Platform database via MCP, fans out one Docker container per finding for parallel analysis through a Claude-via-Bedrock subagent, and creates Slack notifications and Jira tickets for the engineering team to act on. Every step lands in a verifiable HMAC-SHA256 audit chain.
Problem
A small security team running a multi-region SaaS environment receives a steady stream of security findings — Inspector V2 vulnerabilities, GuardDuty threats, CrowdStrike detections, network exposure issues. The volume isn’t crushing on any individual day. The cumulative volume over months is.
Each finding, triaged manually, takes 15–45 minutes: query the database for context, correlate against historical alerts, assess real risk, write up an analysis, format it into a Jira ticket. Multiply by hundreds of findings per quarter and the math doesn’t work for a team this size. The honest outcome — what was actually happening before this pipeline existed — is that most findings sat untriaged. They didn’t get ignored on purpose; they got ignored by accident, because nobody had the bandwidth.
SIEM-style alerting doesn’t fix this. It just produces more alerts without context. What’s needed is something that does the triage analysis itself — context-aware, database-aware, structured, ticketable — at a per-finding cost low enough that no finding has to wait.
System
The pipeline runs on a recurring schedule. Each run does this:
- Pulls findings via the SecOps-Platform’s MCP server (`get_security_findings`) — UNIONs five finding sources, applies exclusion patterns, returns new findings within a configurable lookback window.
- For each finding, spawns a Docker container running Claude Code with an Eleanor-style orchestrator agent and MCP PostgreSQL access for correlation. Up to 10 containers run concurrently via `asyncio.gather`. AWS credentials and project root are mounted read-only.
- The orchestrator inside the container delegates to a `security-finding-analyst` subagent (along with `devsecops-engineer` and `security-controller` subagents for specific finding classes), routing LLM calls through AWS Bedrock — no third-party API key sits inside the container.
- The subagent queries the database for context (related instances, historical similar findings, blast radius) and writes a structured analysis report.
- Markdown reports are converted to Atlassian Document Format via a dedicated containerised Node.js microservice, then posted as Jira tickets in the security project.
- Slack receives a start notification and a completion summary (success rate, duration, ticket links).
- Every step lands in a structured-logging stack with HMAC-SHA256-signed, hash-chained JSONL audit trail and PII/secret redaction.
The whole batch — typically 10–30 findings — completes in 2–5 minutes.
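In skeleton form, a run looks roughly like this. This is a sketch, not the production code: the MCP tool argument and the stub helpers (`has_existing_ticket`, `analyze_in_container`, `publish`) are illustrative stand-ins for the steps above.

```python
import asyncio
import json

from mcp import ClientSession  # MCP Python SDK client session


def has_existing_ticket(finding: dict) -> bool:
    """Stub: the real pipeline bulk-checks Jira before spawning anything."""
    return False


async def analyze_in_container(finding: dict) -> dict:
    """Stub: one Docker container per finding (see the fan-out sketch below)."""
    return {"finding": finding["id"], "report": "..."}


def publish(report: dict) -> None:
    """Stub: ADF conversion, Jira ticket, Slack notification, audit entry."""


async def daily_run(session: ClientSession) -> None:
    # 1. Fetch new findings through the SecOps-Platform MCP server
    #    (the argument name here is assumed, not taken from the project)
    result = await session.call_tool("get_security_findings", {"lookback_hours": 24})
    findings = json.loads(result.content[0].text)

    # 2. Skip findings that already have a Jira ticket
    findings = [f for f in findings if not has_existing_ticket(f)]

    # 3. Fan out one analysis container per finding, up to 10 in flight
    reports = await asyncio.gather(*(analyze_in_container(f) for f in findings))

    # 4. Convert, ticket, notify, audit
    for report in reports:
        publish(report)
```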
Architecture
Cron / scheduled trigger
│
▼
┌────────────────────────────────────────────────────┐
│ ORCHESTRATOR PROCESS │
│ │
│ • Fetch findings (get_security_findings) │
│ → SecOps-Platform PostgreSQL │
│ • Apply exclusion patterns │
│ • Bulk-check existing Jira tickets │
│ • Spawn analysis containers (asyncio.gather) │
└────────────────────────┬───────────────────────────┘
│
┌───────────┼───────────┐
▼ ▼ ▼
┌──────────────────┐ (up to 10 concurrent containers)
│ ANALYSIS CONTAINER (one per finding)
│
│ Claude Code + Eleanor orchestrator agent
│ ├── security-finding-analyst (subagent)
│ ├── devsecops-engineer (subagent)
│ └── security-controller (subagent)
│
│ LLM via AWS Bedrock (no API key in container)
│
│ MCP tools:
│ └── postgres → SecOps-Platform (read-only)
│
│ Mounts: AWS creds (ro), project root (ro)
│ Lifetime: 30–120 seconds typical
└──────────────────┬───────────────────
│ structured analysis
▼
┌────────────────────────────────────────────────────┐
│ POST-ANALYSIS PIPELINE │
│ │
│ • Markdown → ADF (Node.js microservice, │
│ with cache + circuit breaker + subprocess │
│ fallback) │
│ • Jira ticket creation (3 concurrent, exp backoff)│
│ • Slack notifications (30/min rate limit) │
│ • Audit log entry (HMAC-SHA256, hash-chained) │
└────────────────────────┬───────────────────────────┘
│
▼
┌────────────────────────────────────────────────────┐
│ FALLBACK / RESILIENCE │
│ │
│ Slack circuit breaker → JSONL fallback file │
│ Jira circuit breaker → JSONL fallback file │
│ Both: 5 failures → open, 60s recovery │
└────────────────────────────────────────────────────┘
Key design decisions
Per-finding container execution
Every finding analyzed gets its own Docker container. The split is process-level — Docker namespace separation, AWS credentials and project root mounted read-only — not kernel-hardened. (See Tradeoffs below: this container is not the same shape as Eleanor’s hardened agent containers.) The benefits that actually hold:
- Crash containment. A bad LLM response, a malformed finding, a runaway tool call — all bounded to one container, not the orchestrator.
- No cross-finding data leakage. Each container starts with its own filesystem namespace; the agent analyzing finding A never sees the context, intermediate state, or partial outputs of finding B running in parallel.
- Reproducibility. Re-running a single finding’s analysis is `docker run` against a known image, not “spin up the whole orchestrator and pray it picks the right one.”
- Easy parallelism. `asyncio.gather` over per-finding coroutines, with `container.wait()` running in an executor so multiple containers progress concurrently rather than serially.
For batch security work where the user is “the next morning’s Slack summary,” 15–30 seconds of cold start per finding is invisible. The split also means a future hardening pass (the same cap_drop + seccomp + read-only-rootfs treatment Eleanor’s containers get) can land without rewriting the orchestrator.
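A minimal sketch of that fan-out, assuming the Docker SDK for Python; the image tag, mount paths, and environment variable are illustrative, not the project’s actual names.

```python
import asyncio

import docker

MAX_CONCURRENT = 10

client = docker.from_env()
semaphore = asyncio.Semaphore(MAX_CONCURRENT)


async def analyze_finding(finding: dict) -> dict:
    """Run one analysis container to completion."""
    async with semaphore:  # cap concurrent containers at 10
        loop = asyncio.get_running_loop()
        container = await loop.run_in_executor(
            None,
            lambda: client.containers.run(
                "finding-analyst:latest",  # assumed image tag
                environment={"FINDING_ID": finding["id"]},
                volumes={
                    "/home/svc/.aws": {"bind": "/root/.aws", "mode": "ro"},
                    "/opt/pipeline": {"bind": "/workspace", "mode": "ro"},
                },
                detach=True,
            ),
        )
        # container.wait() blocks, so it also runs in the default executor;
        # this is what lets multiple containers make progress at once
        result = await loop.run_in_executor(None, container.wait)
        logs = await loop.run_in_executor(None, container.logs)
        return {"finding": finding["id"], "status": result["StatusCode"], "report": logs}


async def run_batch(findings: list[dict]) -> list:
    # return_exceptions=True: a crashed analysis surfaces as an exception
    # object in the results, not as a failure of the whole batch
    return await asyncio.gather(
        *(analyze_finding(f) for f in findings), return_exceptions=True
    )
```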
Verifiable HMAC-SHA256 audit chain
The audit trail isn’t just signed log lines — it’s a verifiable chain. Each event carries:
- A per-event HMAC-SHA256 signature over `event_id|timestamp|event_type|event_hash`, with a service-derived key held outside the application’s running environment.
- A `previous_hash` linking it to the prior event. The chain replay function (`verify_audit_chain`) walks every entry and flags `chain_break` with the expected vs. found previous hash if any line was tampered with after the fact.
- A separate sequence-number column so reordering and gap-detection queries are cheap.
This is standard tamper-evident logging done as a real chain: the verifier elevates “hash-chained” from a claim to an actual integrity guarantee, which matters for the audit story. Compliance regimes (SOX, GDPR, HIPAA, PCI-DSS, ISO 27001) all want some version of “you can prove this analysis was generated when you say it was, and that nobody changed it.” A signed-and-verifiable chain is the cheapest way to answer that.
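A sketch of the signing and replay logic, using the field names above; key derivation and the sequence-number gap check are elided.

```python
import hashlib
import hmac
import json


def sign_event(event: dict, key: bytes) -> dict:
    """Hash the payload, then HMAC the signing string for this event."""
    event["event_hash"] = hashlib.sha256(
        json.dumps(event["payload"], sort_keys=True).encode()
    ).hexdigest()
    message = f"{event['event_id']}|{event['timestamp']}|{event['event_type']}|{event['event_hash']}"
    event["signature"] = hmac.new(key, message.encode(), hashlib.sha256).hexdigest()
    return event


def verify_audit_chain(events: list[dict], key: bytes) -> list[dict]:
    """Replay the chain; report bad signatures and chain breaks."""
    problems, prev_hash = [], None
    for i, event in enumerate(events):
        # Recompute the per-event signature from the stored fields
        message = f"{event['event_id']}|{event['timestamp']}|{event['event_type']}|{event['event_hash']}"
        expected = hmac.new(key, message.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, event["signature"]):
            problems.append({"index": i, "error": "bad_signature"})
        # Check the back-link to the prior event
        if event.get("previous_hash") != prev_hash:
            problems.append({
                "index": i,
                "error": "chain_break",
                "expected": prev_hash,
                "found": event.get("previous_hash"),
            })
        prev_hash = event["event_hash"]
    return problems
```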
PII redaction at the structured-logging boundary
The audit chain and PII redactor live inside one structured-logging stack — async logging, a logger factory, health checks, log management, OpenTelemetry integration, and a production-logging orchestrator that runs the redactor as a late-priority log processor (priority 90), so structured logs pass through it just before emission.
The redactor handles ~13 categories: PII (email, phone, IPv4, SSN, credit card, DOB, driver’s license) plus security secrets (API keys, bearer tokens, AWS keys, JWT, private keys, passwords). Six redaction strategies are available — REMOVE, MASK, HASH, CORRELATION_HASH, TRUNCATE, REPLACE — and they’re applied per destination: Jira gets less aggressive redaction (it’s an internal tracker with access controls); Slack gets the strict version (channels can have wider audiences); the audit log gets full redaction (it leaves the system if pulled for compliance).
Framing it as one stack rather than two separate features (audit + redaction) matches how the code is actually organized. They share the same processor pipeline and health checks.
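A compressed illustration of the per-destination idea: three of the ~13 categories, three of the six strategies. The patterns and the policy map are simplified stand-ins, not the production rules.

```python
import hashlib
import re
from enum import Enum


class Strategy(Enum):
    REMOVE = "remove"
    MASK = "mask"
    CORRELATION_HASH = "correlation_hash"


# Illustrative subset of the detection categories
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "aws_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

# Per-destination policy: stricter where the audience is wider
DESTINATION_POLICY = {
    "jira": Strategy.MASK,               # internal tracker, access-controlled
    "slack": Strategy.CORRELATION_HASH,  # wider audience, no raw values
    "audit": Strategy.REMOVE,            # may leave the system for compliance
}


def redact(text: str, destination: str) -> str:
    strategy = DESTINATION_POLICY[destination]
    for category, pattern in PATTERNS.items():
        if strategy is Strategy.REMOVE:
            text = pattern.sub(f"[REDACTED:{category}]", text)
        elif strategy is Strategy.MASK:
            text = pattern.sub(lambda m: m.group()[:2] + "***", text)
        elif strategy is Strategy.CORRELATION_HASH:
            # Same value always maps to the same token, so analysts can
            # still correlate events without seeing the raw value
            text = pattern.sub(
                lambda m, c=category: f"[{c}:{hashlib.sha256(m.group().encode()).hexdigest()[:8]}]",
                text,
            )
    return text
```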
Markdown → ADF as a separate service
Atlassian Document Format is a structured JSON representation that Jira’s API requires, and converting markdown into it correctly — preserving headings, tables, code blocks, links, mixed inline formatting — is finicky enough that a third-party library does it badly and a hand-rolled converter does it worse.
The pipeline runs a dedicated Node.js microservice for the conversion, called over HTTP. The orchestrator caches converted reports (same finding, same content → same ADF, no need to reconvert), and the service has its own circuit breaker plus a subprocess-based fallback if the long-running service is wedged. The total complexity is more than I’d want for a one-off, but the conversion is on the critical path of every Jira ticket, and getting it right matters more than keeping it small.
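In outline, with the circuit-breaker wiring omitted; the endpoint and the fallback script name are assumptions, not the project’s actual values.

```python
import hashlib
import json
import subprocess

import requests

ADF_SERVICE_URL = "http://localhost:3100/convert"  # assumed endpoint
_cache: dict[str, dict] = {}


def markdown_to_adf(markdown: str) -> dict:
    """Convert markdown to ADF via the Node.js service, with cache and fallback."""
    key = hashlib.sha256(markdown.encode()).hexdigest()
    if key in _cache:  # same finding, same content -> same ADF
        return _cache[key]
    try:
        resp = requests.post(ADF_SERVICE_URL, json={"markdown": markdown}, timeout=5)
        resp.raise_for_status()
        adf = resp.json()
    except requests.RequestException:
        # Fallback: run the converter as a one-shot Node.js subprocess
        out = subprocess.run(
            ["node", "convert.js"],  # hypothetical script name
            input=markdown.encode(),
            capture_output=True,
            check=True,
        )
        adf = json.loads(out.stdout)
    _cache[key] = adf
    return adf
```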
Circuit breakers + JSONL fallback
Slack and Jira are both external dependencies. Either can be down at the worst possible moment. The pipeline has a circuit breaker per destination — five consecutive failures opens the breaker for 60 seconds — and when the breaker is open, output is written to a local JSONL fallback file instead of disappearing.
When the breaker closes again, queued entries are replayed in order. The result: an external outage might delay notifications, but never silently loses analysis output.
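The breaker itself is small. A sketch with the thresholds from above; the `send` callable and the fallback path are supplied by the caller.

```python
import json
import time


class CircuitBreaker:
    """Open after 5 consecutive failures; try again after a 60s recovery window."""

    def __init__(self, fallback_path: str, threshold: int = 5, recovery: float = 60.0):
        self.fallback_path = fallback_path
        self.threshold = threshold
        self.recovery = recovery
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, send, payload: dict) -> bool:
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.recovery:
                self._fallback(payload)  # breaker open: queue instead of dropping
                return False
            self.opened_at = None  # recovery window elapsed: half-open, try again
        try:
            send(payload)
            self.failures = 0
            return True
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            self._fallback(payload)  # the failed payload is never lost
            return False

    def _fallback(self, payload: dict) -> None:
        with open(self.fallback_path, "a") as f:
            f.write(json.dumps(payload) + "\n")

    def replay(self, send) -> None:
        """On close, re-send queued entries in file order, then clear the file."""
        with open(self.fallback_path) as f:
            for line in f:
                send(json.loads(line))
        open(self.fallback_path, "w").close()
```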
Impact
| Metric | Value |
|---|---|
| Findings processed since Aug 2025 | 2,259 |
| Concurrent analysis containers | 10 |
| Typical batch | 10 findings in 2–5 minutes |
| Per-finding infra cost (incl. AI) | ~$0.71 |
| Resource ceiling | ~5 GB RAM for full parallel batch |
| Audit compliance posture | SOX / GDPR / HIPAA / PCI-DSS / ISO 27001 alignable |
The 2,259 number understates the impact. A more honest framing is the counterfactual: at 15–45 minutes per manual triage, those findings would have represented 565–1,694 hours of analyst time over eight months. They wouldn’t have taken that time, because nobody had it to give. They would have piled up as untriaged backlog — the most common failure mode for security teams at this scale.
The pipeline doesn’t replace human judgment on the highest-risk findings; the analyst still reviews critical Jira tickets before they move into engineering work. What it does is attach a baseline analysis to every finding automatically, so the team’s manual time is spent on review and prioritization, not on the first-pass triage no one was doing.
Tradeoffs and what I’d do differently
Containers are not hardened. The current containers.run call uses Docker defaults — no cap_drop, no security_opt (no seccomp, no no-new-privileges), writeable rootfs, no mem_limit / nano_cpus / pids_limit, no network_mode lockdown. The mounts that are read-only (AWS creds, project root) help but don’t make this an isolation boundary in the Eleanor-AI sense. The fix is ~10 lines in container_manager.py — same cap_drop=["ALL"], security_opt=["no-new-privileges:true", "seccomp=…"], read_only=True with tmpfs for /tmp, plus memory and PID limits. It’s on the next-up list, but I deliberately didn’t ship it yet because the trust boundary here is different: the agent reads from a controlled MCP source (SecOps-Platform), there’s no user-controlled prompt input, and the container’s network reach is limited by what the host already permits. Worth doing anyway, but not load-bearing for safety today.
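For concreteness, that pass maps to roughly these `containers.run` keyword arguments in the Docker SDK for Python; the seccomp profile path and the resource limits are placeholders, not settled values.

```python
import docker

client = docker.from_env()

container = client.containers.run(
    "finding-analyst:latest",
    detach=True,
    cap_drop=["ALL"],  # drop every Linux capability
    security_opt=[
        "no-new-privileges:true",
        "seccomp=/etc/docker/seccomp-analyst.json",  # placeholder profile path
    ],
    read_only=True,                 # read-only rootfs...
    tmpfs={"/tmp": "rw,size=64m"},  # ...with a small writable /tmp
    mem_limit="512m",
    nano_cpus=1_000_000_000,        # 1 CPU
    pids_limit=256,
)
```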
The biggest design call that paid off: per-finding container execution with parallel orchestration. The model trades resource cost for three things at once. Speed — asyncio.gather over per-finding coroutines means the wall-clock time of a batch is set by the slowest finding, not the sum of all of them. Accuracy — each finding runs with its own filesystem namespace and its own analysis context, so there’s no shared state to bleed between findings running in parallel. Scale — adding throughput means adding concurrent containers, not stretching a serial run. The cost is RAM and CPU per batch, not latency. The scale path is straightforward: bigger host, or move to Kubernetes and let the scheduler place per-finding pods across nodes.
Architecture and aggregate impact only. Finding contents, analysis prompts, and Jira ticket templates are not part of this writeup.