# AegisFlow

**Enterprise AI Compliance Monitoring · System Whitepaper**

*Version 1.0 · February 2026*

---

## Table of Contents

1. [Executive Summary](#1-executive-summary)
2. [The Problem](#2-the-problem)
3. [What AegisFlow Is](#3-what-aegisflow-is)
4. [Users & Roles](#4-users--roles)
5. [Product Architecture](#5-product-architecture)
6. [Feature Walkthrough](#6-feature-walkthrough)
7. [Compliance Engine (Deep Dive)](#7-compliance-engine-deep-dive)
8. [Security Model](#8-security-model)
9. [Technology Stack](#9-technology-stack)
10. [Data Model](#10-data-model)
11. [API Surface](#11-api-surface)
12. [Operational Readiness](#12-operational-readiness)
13. [Deployment Options](#13-deployment-options)
14. [Roadmap](#14-roadmap)
15. [Appendix · Glossary](#15-appendix--glossary)

---

## 1. Executive Summary

AegisFlow is a **vendor-agnostic governance control plane for enterprise AI usage**. Companies that have rolled out OpenAI, Anthropic, Google Gemini, xAI and other large-language-model platforms across their workforce face a sudden, urgent question: *Who is using what model, for what purpose, on what data, at what cost — and is any of it leaking PII or violating policy?*

AegisFlow answers that question in a single console. It plugs into the **Admin APIs** of 12 major AI vendors, normalizes their usage / cost / safety telemetry into a unified event stream, runs every event through an inline **PII / PHI / prompt-injection compliance engine**, persists the result behind a **multi-key Fernet vault with KEK wrapping**, and surfaces it through dashboards, live WebSocket monitoring, per-vendor analytics, an immutable audit log, and SIEM forwarders.

The product is production-hardened — circuit breakers on every outbound call, Prometheus metrics, Kubernetes liveness/readiness probes, brute-force lockout, rate limits, and centralized exception envelopes — and ships with **51/51 automated tests passing**.

---

## 2. The Problem

### 2.1 The new attack surface

Generative AI tools have become the default interface for everything from sales follow-ups to legal-draft review. In the typical enterprise:

- Employees paste **customer PII** (emails, SSNs, credit cards, MRNs) into ChatGPT and Claude.
- Marketers expose **unannounced product details** to Gemini and Perplexity.
- Engineers leak **proprietary source** into Cursor and code assistants.
- Anyone can spin up a personal API key on OpenRouter / DeepSeek with zero corporate oversight.
- **Prompt-injection** payloads in third-party content can hijack agents and exfiltrate data.

There is no analog of CASB or DLP for this surface. The vendors give you a billing dashboard. They do not give you cross-vendor visibility, redaction, or audit-grade evidence for SOC 2 / ISO 27001 / HIPAA reviews.

### 2.2 The compliance gap

Security and compliance teams are now being asked, by their boards and their auditors:

> *"Show me, across every AI vendor in the company, who sent prompts containing PII or PHI in the last 30 days — and what controls fired."*

Without a control plane, the answer is "we cannot tell." That answer is increasingly unacceptable in regulated industries. AegisFlow is the answer.

---

## 3. What AegisFlow Is

AegisFlow is a **full-stack SaaS web application** with three deployment modes:

1. **Hosted preview / live demo** — a public URL the team can poke at instantly.
2. **Customer-managed Kubernetes** — full Helm + raw manifests included; runs in any CNCF-compliant cluster.
3. **On-premises / air-gapped** — Docker Compose + an egress allow-list of just the vendor Admin APIs.

Functionally, it is composed of seven planes:

| Plane | Responsibility |
| --- | --- |
| **Sources** | 12 LLM vendors via their Admin APIs |
| **Collector** | Pulls telemetry on demand or schedule; wraps every call in a circuit breaker |
| **Pipeline** | Normalizes raw vendor responses into a canonical `event` shape |
| **Compliance Engine** | Inline PII / PHI / prompt-injection scanning during ingest |
| **Storage** | MongoDB for events + Fernet KMS-wrapped vault for vendor credentials |
| **Surface** | REST API, WebSocket stream, immutable audit log |
| **Consumers** | Dashboards, Threat Intel, SIEM forwarders, exports |

---

## 4. Users & Roles

### 4.1 Day-to-day users

- **Compliance analysts** — review violations, export evidence binders.
- **Security engineers** — investigate prompt-injection or PII exfiltration.
- **Platform owners** — manage vendor keys, rotation, RBAC.
- **CFO / Procurement** — see cost roll-ups across vendors.

### 4.2 RBAC matrix

| Role | View KPIs | View events | Manage policies | Vault keys | Team / SSO | Encryption |
| --- | --- | --- | --- | --- | --- | --- |
| **Admin** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| **Analyst** | ✅ | ✅ | read-only | masked | — | — |
| **Viewer** | ✅ | ✅ | — | — | — | — |

Enforced server-side via `require_role()` on every protected route — frontend hiding is a UX nicety only.

---

## 5. Product Architecture

```
SOURCES              COLLECTOR        PIPELINE          STORAGE               SURFACE           CONSUMERS
─────────            ─────────        ────────          ───────               ───────           ─────────
12 LLM vendors ───►  Adapters   ───►  Ingest      ───►  MongoDB events  ───►  REST /api/* ───►  Dashboards
(OpenAI/Claude/…)    (real            Compliance        Fernet+KEK vault      WebSocket         Threat Intel
                     Admin APIs)      scan inline                             /ws/events        SIEM egress
                                      Policy engine                                             Policy engine
```

### 5.1 Vendor adapters

Every supported vendor implements a uniform Python `Adapter` contract:

```python
from __future__ import annotations  # annotations may name types defined elsewhere

from datetime import datetime


# RawEvent and CostPoint are the project's own types (definitions not shown here).
class Adapter:
    async def validate_credential(self, key: str) -> bool: ...
    async def fetch_usage(self, key: str, since: datetime) -> list[RawEvent]: ...
    async def fetch_cost(self, key: str, since: datetime) -> list[CostPoint]: ...
```

`integrations/registry.py` dispatches `(vendor_id, key_id) → Adapter`, and dispatch failures fall back to a deterministic mock so the dataset is never blank. **Every outbound call is wrapped in a circuit breaker** with a 5-fail / 30-second open state, half-open trial, and Prometheus metrics on transitions.

Concrete adapters:

- `openai_admin.py` — `sk-admin-…` keys against the five Usage endpoints + the Costs endpoint, 90-day look-back window.
- `anthropic_admin.py` — `sk-ant-admin-…` keys against `/v1/organizations/usage_report/messages` + `/cost_report`.
- `xai_admin.py` — `/v1/models` for key validation; usage shipped as mock until xAI publishes Admin endpoints.
- `gemini_admin.py` — validates against the generative API; BigQuery cost path stubbed pending SA-JSON upload.
- `custom_adapter.py` — POSTs the decrypted key to a customer-supplied `collector_url` and normalizes the response.

The other 8 vendors (Azure OpenAI, Mistral, Cohere, Hugging Face, OpenRouter, Perplexity, DeepSeek, Ollama) run through the mock fallback today; key-validation paths are roadmapped.

### 5.2 Pipeline

Raw vendor responses land in `raw_events`. The pipeline step normalizes them into the canonical event:

```jsonc
{
  "id": "uuid",
  "user_id": "uuid",                       // tenant
  "vendor_id": "openai",
  "model": "gpt-4o",
  "actor_email": "user@acme.io",
  "tokens_in": 932,
  "tokens_out": 412,
  "cost_usd": 0.0184,
  "violation_type": "pii_credit_card",     // optional
  "violation_label": "Credit Card Number",
  "severity": "critical",                  // low|medium|high|critical
  "prompt_excerpt": "Charge card 4111 ...",
  "compliance_hits": ["pii_credit_card","pii_email"],
  "created_at": "2026-02-17T14:31:09Z"
}
```

During normalization, every event passes through `core/compliance.evaluate_event()`. Detector hits write `violation_type`, `severity`, and an array of `compliance_hits` directly onto the event — no async queue, no downstream consumer needed.
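
To make the inline scan concrete, the following is a minimal sketch of what an `evaluate_event()`-style mutator could look like; it is not the actual contents of `core/compliance.py`. It assumes the scan runs over `prompt_excerpt`, implements only two of the detectors (email and Luhn-validated credit card), and promotes the most severe hit to the primary `violation_type`. The regexes, the helper names (`luhn_ok`, `detect_email`, `detect_credit_card`, `DETECTORS`, `SEVERITY_RANK`), and the 13 to 19 digit candidate length are illustrative assumptions.

```python
import re

# Severity ordering taken from the event schema comment (low|medium|high|critical).
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

# Assumed patterns for illustration only; the real detectors may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Candidate card numbers: 13-19 digits, optionally separated by a space or hyphen.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect_email(text: str) -> bool:
    return EMAIL_RE.search(text) is not None


def detect_credit_card(text: str) -> bool:
    # A digit run only counts as a card number if it also passes Luhn.
    for match in CARD_RE.finditer(text):
        if luhn_ok(re.sub(r"\D", "", match.group())):
            return True
    return False


# (code, label, severity, detector): codes, labels and severities follow the catalog.
DETECTORS = [
    ("pii_credit_card", "Credit Card Number", "critical", detect_credit_card),
    ("pii_email", "Email Address", "low", detect_email),
]


def evaluate_event(event: dict) -> dict:
    """Scan the prompt excerpt and write any hits directly onto the event."""
    text = event.get("prompt_excerpt", "")
    hits = [(code, label, sev) for code, label, sev, fn in DETECTORS if fn(text)]
    if not hits:
        return event
    event["compliance_hits"] = [code for code, _, _ in hits]
    code, label, sev = max(hits, key=lambda h: SEVERITY_RANK[h[2]])
    event["violation_type"] = code
    event["violation_label"] = label
    event["severity"] = sev
    return event
```

Because the function mutates and returns the same dictionary, it can be called inline inside the normalization step, which is what lets the pipeline avoid a separate queue or consumer.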

### 5.3 Storage

- **MongoDB**: events, raw_events, sync_runs, users, audit_log, login_attempts.
- **Vault**: `vault_keys` collection — vendor API keys encrypted at rest with `MultiFernet`, wrapped by a KEK derived from `FERNET_KEY`. Rotation is zero-downtime: add the new key to the head of `FERNET_KEYS_HISTORY`, restart, and a background self-test re-encrypts and proves the chain. **A backup leak alone is not sufficient to recover the keys.**
- **Redis** (optional): `/dashboard/overview` cache with a 30-second TTL. Falls back to an in-memory TTL dict so a missing Redis URL is non-fatal.

### 5.4 Surface

- **REST** at `/api/*` — JSON over httpOnly cookies. CORS allow-listed via `FRONTEND_URL` + regex for `*.preview.emergentagent.com` and `*.emergent.host`.
- **WebSocket** at `/api/ws/events` — cookie-authenticated stream that emits `hello → event* → heartbeat` frames. Unauthenticated upgrades are closed with policy violation 1008.
- **Immutable audit log** — every privileged action records actor, action, IP (XFF-aware), user-agent, and a millisecond timestamp.

### 5.5 Consumers

- **Frontend dashboards** — 22 React pages organized into a 6-section sidebar (Monitoring · Providers · Compliance · Security · Platform · Admin).
- **SIEM egress** — Splunk HEC, Datadog Logs, Sumo Logic HTTP source, or a generic JSON webhook. Test endpoint included.
- **Exports** — CSV / JSON pulls for any event filter, suitable as SOC 2 evidence binders.

---

## 6. Feature Walkthrough

### 6.1 Global Dashboard

Headline KPIs: 7-day events, total cost, violation count, average cost per event. A multi-series timeline chart breaks events down by vendor; the recent feed lists the last 10 high-severity violations. The page is **Redis-cached for 30 seconds** and invalidated on every sync.

### 6.2 AI Activity & Violations

A single filterable audit table (`/api/events`). The sidebar offers two entry points pointing at the same route — *AI Activity* (everything) and *Violations* (`?only_violations=1`).
The URL is bidirectionally bound to state, so deep-linking and bookmarking work.

### 6.3 PII / PHI Detection

The compliance-engine UI: four headline KPIs (scanned, flagged, detectors active, critical hits), an 8-card grid showing 7d/30d counts per detector, and a recent-detections feed showing the actual prompt excerpt that fired the detector. A **Re-scan last 14 days** button calls `POST /compliance/backfill` and refreshes counts in place.

### 6.4 Threat Intelligence

Posture tiles (active vendors, circuit-breaker state, audit volume, anomaly count) above a circuit-breaker panel showing live state transitions, a recent-anomalies panel, and an audit timeline. This is where a security analyst lives during an incident.

### 6.5 Live Monitoring

A WebSocket-fed event stream. Connection status pulses green when live; events scroll in framer-motion staggered cards. The pause and clear controls let an operator freeze the feed during triage.

### 6.6 Vendor Analytics

Four cross-vendor KPIs (total events 7d, total spend, average P50 latency, P95 hot-vendor) above a per-vendor table: events 7d/30d, tokens 7d, spend 7d, P50 / P95 latency, and last-sync status. Useful for procurement reviews and SLA enforcement.

### 6.7 API Key Vault

Add / rotate / remove vendor credentials. The key never leaves the server in plaintext after the initial POST: it is encrypted with the active Fernet chain head before insertion. UI shows only the masked tail. For Gemini, a separate Service Account JSON can be uploaded against the same key record.

### 6.8 Policy Engine

A list of keyword + regex rules with adjustable severity and hit counters. Rules apply during the compliance scan; matches enrich the existing `violation_type` if the engine's automatic detection misses something domain-specific.

### 6.9 Architecture

An animated SVG end-to-end pipeline diagram (Sources → Collector → Pipeline + Compliance → Storage → Surface → Consumers) with framer-motion flowing-dash edges.
Doubles as living onboarding documentation.

---

## 7. Compliance Engine (Deep Dive)

The compliance engine lives in `core/compliance.py` and consists of **eight pure-function detectors** plus a single `evaluate_event(event)` mutator.

### 7.1 Detector catalog

| Code | Label | Category | Severity | Mechanism |
| --- | --- | --- | --- | --- |
| `pii_email` | Email Address | pii | low | RFC-ish regex |
| `pii_ssn` | Social Security Number | pii | critical | US SSN regex with area / group / serial validity |
| `pii_credit_card` | Credit Card Number | pii | critical | Regex + **Luhn check** |
| `pii_phone` | Phone Number | pii | medium | US + intl regex |
| `pii_ip` | IP Address | pii | low | IPv4 octet validity |
| `phi_mrn` | Medical Record Number | phi | high | `MRN: 8821547` / `Patient ID 12345` |
| `phi_dob` | Date of Birth | phi | medium | `DOB: 1984-06-12` / `date of birth 6/12/84` |
| `prompt_injection` | Prompt Injection | security | high | 13 jailbreak signature patterns (DAN, "ignore previous instructions", "developer mode", role-tag injection, etc.) |

Credit-card detection is **Luhn-validated** to eliminate false positives on long digit runs (order numbers, hashes, etc.).

### 7.2 Where it runs

The engine is invoked synchronously in four places:

1. **Seed data generation** (`_gen_raw_event`) — guarantees the dataset has a realistic ~14% flag rate.
2. **Real adapter sync** (`routes/collector.py::sync_key`) — every event returned by a vendor Admin API.
3. **Mock fallback sync** — every synthetic event when an adapter is not wired.
4. **Pipeline ingest** — events promoted from `raw_events` to `events` get scanned a second time so a detector update + backfill cleans up history.

### 7.3 Severity precedence

The engine never *downgrades* a violation already set by upstream. If a vendor adapter or policy rule already labeled an event `critical`, the engine attaches its hits to `compliance_hits` but leaves the primary `violation_type` alone.
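
A minimal sketch of this precedence rule, under stated assumptions: the function name `apply_hits`, the `(code, severity)` hit shape, and the rule that upstream wins ties are all hypothetical, and the sketch assumes the engine does promote its own finding when it is strictly more severe than the upstream label.

```python
# Severity ordering from the event schema (low|medium|high|critical).
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def apply_hits(event: dict, hits: list[tuple[str, str]]) -> dict:
    """Attach detector hits without downgrading an upstream violation.

    `hits` is a list of (violation_code, severity) pairs from the detectors.
    """
    if not hits:
        return event

    # Always record every concurrent hit, de-duplicated, in first-seen order.
    recorded = list(event.get("compliance_hits", []))
    for code, _ in hits:
        if code not in recorded:
            recorded.append(code)
    event["compliance_hits"] = recorded

    top_code, top_severity = max(hits, key=lambda h: SEVERITY_RANK[h[1]])
    current = event.get("severity")
    # Replace the primary label only when none is set yet, or when the
    # engine's finding is strictly more severe (assumed: upstream wins ties).
    if current is None or SEVERITY_RANK[top_severity] > SEVERITY_RANK[current]:
        event["violation_type"] = top_code
        event["severity"] = top_severity
    return event
```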
This way the worst signal always wins and you keep an audit trail of every concurrent hit.

### 7.4 Backfill API

`POST /api/compliance/backfill?days=14` re-scans the last N days of events and updates anything the latest detectors catch. Useful when you ship a new regex.

---

## 8. Security Model

### 8.1 Identity

- **JWT** issued at login, signed with `JWT_SECRET` (HS256). Set in **two httpOnly cookies**: `access_token` (short-lived) and `refresh_token` (longer-lived), both `SameSite=None; Secure`.
- **bcrypt** for password hashing. The default cost factor is set high enough that hashing dominates the cost of each login attempt.
- **Brute-force lockout** — 5 failed attempts per email in 15 minutes triggers a temporary lock with a database-backed counter (`login_attempts` collection), preventing trivial credential stuffing.
- **Refresh token rotation** — every refresh issues a new pair and invalidates the previous pair.

### 8.2 Authorization

- **RBAC** with `Admin / Analyst / Viewer`. Enforced via `Depends(require_role("admin"))` on protected routes. The frontend hides UI but is not the security boundary.
- **Tenant isolation** — every Mongo query filters by `user_id`. There is no shared collection; multi-tenancy is enforced at the data layer.

### 8.3 Secrets at rest

- **MultiFernet chain** — list of keys, the first is the encrypt-with key; subsequent keys are decrypt-only for graceful rotation.
- **KEK wrapping** — the encrypted chain is *itself* persisted in MongoDB wrapped by a KEK derived from `FERNET_KEY`. A leaked DB backup, without the KEK, decrypts to noise.
- **Rotation flow**: prepend the new key to `FERNET_KEYS_HISTORY`, restart, the startup self-test re-encrypts the chain and emits a rotation event to the audit log. Operators can verify decryption depth via `/api/ready`.

### 8.4 Transport

- **HSTS preload** (`max-age=63072000; includeSubDomains; preload`)
- **Strict Content-Security-Policy** with `default-src 'self'`.
(Open backlog: drop `'unsafe-inline'` for nonce-based CSP once Shadcn supports the `nonce` hook.)
- **COOP / CORP** prevent cross-window leakage.
- **X-Frame-Options: DENY**, **Referrer-Policy: no-referrer**, **Permissions-Policy** locked down.

### 8.5 Application-layer

- **slowapi rate limits**, **XFF-aware** — `/auth/register` at 10/hr, `/leads` at 60/min.
- **Centralized exception handlers** — uniform `{detail, error:{code,message,request_id}}` envelope. **No stack traces in 5xx responses.**
- **Circuit breakers** on every outbound LLM call — protects you from a misbehaving vendor cascading into a backend outage.
- **Immutable audit log** captures actor + IP + user-agent + action + diff on every privileged operation.

### 8.6 What the engine does NOT do

We are deliberately transparent about coverage gaps:

- We do **not** offer at-rest encryption beyond the credential vault. The events themselves live in MongoDB cleartext, on the assumption you encrypt the disk / volume at the infrastructure layer (LUKS, EBS).
- We do **not** implement DLP-style outbound blocking. Detection is post-fact; in-line gating belongs in a forward proxy, not a usage governance plane.
- We do **not** ship a customer-data-exfil canary detector — that is a roadmap item.

---

## 9. Technology Stack

### 9.1 Frontend

- **React 19** + **React Router 7** — the latest stable.
- **Tailwind CSS** with custom design tokens; **Shadcn UI** primitives.
- **framer-motion** for entrance animations, sidebar transitions, and the animated architecture diagram.
- **Recharts** for time-series charts on the dashboard.
- **Sonner** for non-blocking toasts; **axios** with `withCredentials: true` for the API.
- Typography: **Cabinet Grotesk** (display) + **IBM Plex Sans / Mono**.

### 9.2 Backend

- **FastAPI** + **Motor** (async MongoDB) + **Pydantic v2** + **WebSockets** + **BackgroundTasks**.
- Async I/O end-to-end with **httpx** for outbound calls and **tuned connection pools**.
- **PyJWT** + **bcrypt**, **slowapi**, **cryptography (MultiFernet)**, **prometheus_client**, **structured JSON logging** with **request-id propagation** through middleware.

### 9.3 Infrastructure

- **MongoDB 5.0+** — replica-set optional, sharding not required at the typical scale.
- **Redis 7+** — optional, the app falls back to in-memory TTL caches.
- **Supervisor** in the preview environment; **Kubernetes** manifests + a sample **Helm chart values file** for production.

### 9.4 Test & CI

- **pytest** suite — **51 tests passing**, covering auth, brute-force lockout, RBAC, vendor adapter dispatch with mocked HTTP, dashboard cache hit / miss, encryption rotation, KEK wrap, SA upload, rate-limit headers, and sync-run growth.
- **Playwright** e2e for landing / sale page / UTM tracking / security headers.
- **GitHub Actions CI** — `ruff` lint, `pytest`, `pip-audit`, `yarn build`, `npm audit`, Playwright on every PR.

---

## 10. Data Model

Collections (MongoDB):

| Collection | Purpose | Key fields |
| --- | --- | --- |
| `users` | Tenants + auth | `id, email, password_hash, role, org_name, created_at` |
| `events` | Canonical event stream | as in §5.2 above |
| `raw_events` | Pre-normalization payloads | `id, user_id, vendor_id, run_id, ingested, raw, created_at` |
| `sync_runs` | Sync history | `id, user_id, vendor_id, status, pulled, errors, started_at, finished_at` |
| `vendors` | System + custom catalog | `id, name, product, color, models, user_id, collector_url` |
| `vault_keys` | Encrypted credentials | `key_id, vendor_id, api_key_encrypted, sa_json_encrypted, name, user_id, created_at` |
| `policies` | Customer policy rules | `id, user_id, name, pattern, severity, hits, enabled` |
| `audit_log` | Immutable trail | `id, user_id, action, target, actor_ip, actor_ua, diff, ts` |
| `login_attempts` | Brute-force tracker | `email, ip, failures, last_attempt, locked_until` |

Indexes are created on `(user_id, created_at)` for events and raw_events, and on `email` for
users / login_attempts.

---

## 11. API Surface

All routes are prefixed with `/api`. There are ~60 endpoints in total; the high-value ones:

### 11.1 Auth & Identity

```
POST   /auth/register               rate-limited 10/hr
POST   /auth/login                  brute-force protected
POST   /auth/logout                 clears cookies
POST   /auth/refresh                rotates pair
GET    /auth/me                     current user
```

### 11.2 Vendors & Vault

```
GET    /vendors                     system + custom
POST   /vendors                     add custom vendor
DELETE /vendors/{vendor_id}
GET    /keys                        masked list
POST   /keys                        Fernet-encrypted insert
DELETE /keys/{key_id}
POST   /keys/{key_id}/service-account
DELETE /keys/{key_id}/service-account
```

### 11.3 Collector & Pipeline

```
POST   /collector/sync/{key_id}     real adapter or mock
GET    /collector/runs              sync history
GET    /pipeline/status             raw vs structured counts
POST   /pipeline/ingest             promote raw → events (compliance-scanned)
```

### 11.4 Analytics

```
GET    /dashboard/overview          Redis-cached
GET    /events                      filter + paginate
GET    /vendors/analytics           per-vendor P50/P95/spend
```

### 11.5 Compliance Engine

```
GET    /compliance/detectors        catalog
GET    /compliance/summary          7d/30d counts + recent[]
POST   /compliance/scan/{id}        ad-hoc single-event re-scan
POST   /compliance/backfill?days=N  re-scan recent
```

### 11.6 Real-time & Ops

```
WS     /ws/events                   cookie or ?token= auth
GET    /health                      K8s liveness
GET    /ready                       Mongo + Redis + breaker state
GET    /metrics                     Prometheus scrape
```

### 11.7 Enterprise

```
GET / PUT   /team                   RBAC management (admin only)
GET / POST  /policies               policy engine
GET         /audit/export?fmt=csv|json
GET / POST  /integrations/{splunk|datadog|sumo|webhook}
POST        /integrations/{provider}/test
GET / POST  /sso/saml               SSO/SAML config
GET / POST  /sso/scim
GET         /deployment/{helm|manifests|onprem}
GET / POST  /encryption/rotate
GET         /encryption/audit
```

---

## 12. Operational Readiness

### 12.1 Observability

- **Structured JSON logs** on stdout with `request_id` propagated across handlers, adapters, and outbound vendor calls. Requests are traceable through the entire pipeline.
- **Prometheus metrics** at `/api/metrics`:
  - `aegisflow_http_requests_total{route,method,status}`
  - `aegisflow_request_duration_seconds_bucket{route}` (histogram)
  - `aegisflow_adapter_calls_total{vendor,outcome}`
  - `aegisflow_circuit_breaker_state{vendor}`
  - `aegisflow_db_pool_active`, `aegisflow_db_pool_idle`
  - `aegisflow_websocket_clients`
- **Kubernetes probes** — `/api/health` (liveness) and `/api/ready` (Mongo + Redis + circuit-breaker state).

### 12.2 Resilience

- **Circuit breaker** on every outbound vendor call: 5 failures within 60 seconds opens the breaker for 30 seconds, then a half-open trial decides whether to close.
- **Tuned async pools** — Motor `maxPoolSize=100, minPoolSize=10`; httpx `max_connections=50, max_keepalive=20`.
- **Centralized exception envelope** — unhandled errors return `{error:{code,message,request_id}}` with no stack trace. Stack traces are logged with the same `request_id` for offline correlation.
- **In-memory fallback cache** for the dashboard so a Redis outage degrades performance but never breaks the page.

### 12.3 Capacity expectations

The product is designed for **mid-market enterprise volumes** — tens of thousands of events per day across a handful of vendors. The MongoDB indexes and pagination model comfortably handle a 90-day window of ~3M events. For a Fortune-500-scale rollout (1M events / day per vendor), a horizontal-shard pattern on `(user_id, created_at)` is on the roadmap.

---

## 13. Deployment Options

### 13.1 Hosted (preview)

Supervisor-managed FastAPI + React on the Emergent platform. Two-click deploy to production at `*.emergent.host`.

### 13.2 Customer Kubernetes

Manifests served live from `/api/deployment/manifests`. Helm values stub at `/api/deployment/helm`. Required:

- One backend Deployment (FastAPI behind uvicorn).
- One frontend Deployment (NGINX serving the React build).
- One MongoDB StatefulSet (or a managed Atlas connection string).
- Optional Redis Deployment.
- ConfigMap with non-secret env; Secret with `JWT_SECRET`, `FERNET_KEY`, vendor creds.

### 13.3 On-prem / air-gapped

The `docker-compose.yml` template includes the backend, frontend, MongoDB, Redis, and an NGINX reverse proxy. Required egress allow-list:

```
api.openai.com
api.anthropic.com
generativelanguage.googleapis.com
api.x.ai
*.azure.com   (regional, e.g., eastus.api.cognitive.microsoft.com)
api.mistral.ai
api.cohere.com
api-inference.huggingface.co
openrouter.ai
api.perplexity.ai
api.deepseek.com
+ customer-supplied SIEM endpoints
```

For deeply air-gapped deployments, every outbound vendor adapter can be disabled and customer-managed mirrors substituted via the `EXTRA_ORIGINS` and `*_BASE_URL` overrides.

### 13.4 Environment variables

```
# Required
MONGO_URL             "mongodb://..."
DB_NAME               "aegisflow_db"
JWT_SECRET            <64-char hex>
FERNET_KEY
ADMIN_EMAIL, ADMIN_PASSWORD
DEMO_EMAIL, DEMO_PASSWORD

# Optional
FERNET_KEYS_HISTORY   ""               # rotation chain (comma-separated)
REDIS_URL             ""               # falls back to in-memory
FRONTEND_URL          "https://..."    # primary CORS origin
EXTRA_ORIGINS         ""               # additional comma-separated
```

---

## 14. Roadmap

### 14.1 Shipped

- Real Admin-API adapters for OpenAI, Anthropic, xAI, Gemini
- Multi-tenant org + RBAC (Admin · Analyst · Viewer)
- SSO / SCIM configuration (Okta, Entra ID, Google, SAML 2.0 / OIDC)
- 8-detector PII / PHI / prompt-injection engine
- Live monitoring (WebSocket stream)
- Per-vendor analytics, Threat Intelligence, animated architecture viz
- Multi-Fernet rotation with KEK-wrapped chain in MongoDB
- Circuit breakers on every outbound vendor call
- Prometheus metrics + Kubernetes probes
- CI pipeline (ruff · pytest · pip-audit · yarn build · npm audit · Playwright)

### 14.2 Backlog (P1)

- Gemini Service-Account JSON → BigQuery query path (storage is done; query path is the next step).
- Frontend Vitest unit tests for components.
- Real key-validation paths for Azure OpenAI, Mistral, Cohere, Hugging Face, OpenRouter, Perplexity, DeepSeek, Ollama.

### 14.3 Backlog (P2)

- Scheduled (cron) sync runs with exponential backoff.
- Slack and PagerDuty alert channels for critical violations.
- BYO Snowflake / BigQuery destination for downstream BI tooling.
- PDF compliance report export per organization.
- Light-mode adaptive toggle.
- Tighten CSP to nonce-based strategy (drop `'unsafe-inline'`).
- API versioning prefix (`/api/v1`).
- Customer-data-exfil canary detector.
- Horizontal-shard pattern on `events.(user_id, created_at)` for F500 scale.

---

## 15. Appendix · Glossary

| Term | Definition |
| --- | --- |
| **Admin API** | A vendor's tenancy-wide API (separate from user-facing chat APIs) that exposes usage, cost, and audit telemetry. Examples: `sk-admin-…` for OpenAI, `sk-ant-admin-…` for Anthropic. |
| **Compliance Engine** | The inline scanner in `core/compliance.py` that runs on every event during ingest. |
| **Detector** | One pure-function pattern in the compliance engine (e.g., `pii_ssn`). |
| **Event** | A normalized record of one LLM call: who, what model, how many tokens, what cost, what violations. |
| **Fernet** | The symmetric authenticated-encryption scheme from `cryptography`. **MultiFernet** chains multiple keys for rotation. |
| **KEK** | Key-Encryption Key — wraps the Fernet chain in storage so a backup leak alone is insufficient. |
| **Pipeline** | The step that normalizes `raw_events` into `events` and runs the compliance engine inline. |
| **PII / PHI** | Personally Identifiable Information / Protected Health Information. |
| **Prompt injection** | An adversarial payload embedded in a prompt or in third-party content that aims to hijack the model's instructions. |
| **RBAC** | Role-Based Access Control. AegisFlow ships three roles: Admin, Analyst, Viewer. |
| **SIEM** | Security Information & Event Management. Splunk / Datadog / Sumo / Elastic. |
| **Vault** | The encrypted credential store (`vault_keys` collection) holding vendor API keys. |

---

*End of document.*