Secure RAG agent

SecureRAGAgent composes the safety layers into one pipeline so you can put a RAG agent in front of real users without wiring each guard yourself:

query → RBAC gate → input guardrails (PII + injection) → hybrid retrieval (+ rerank)
      → trusted chunks → LLM (cost budget) → output guardrails → groundedness eval
      → citation validation → output sanitization → OTel trace + audit row → result

from largestack import SecureRAGAgent

rag = SecureRAGAgent(
    ["Refunds are available within 30 days of purchase.",
     "Warranty covers manufacturing defects for 12 months."],
    llm="deepseek/deepseek-chat",   # or "ollama/llama3.2:1b" for local/offline
    cost_budget=0.05,
)
res = await rag.answer("What is the refund window?")
print(res.answer)        # grounded, cited answer
print(res.grounded, res.citations, res.cost, res.trace_id)

answer() always returns a SecureRagResult — policy decisions never raise:

Field	Meaning
`answer`	the (sanitized, cited) answer text
`allowed` / `denied_reason`	RBAC outcome (False + reason if the caller lacks permission)
`blocked_by_guardrail`	set if an input/output guard blocked the request
`grounded` / `groundedness`	faithfulness of the answer vs the retrieved chunks
`citations` / `sources`	per-sentence citations and the cited sources
`sanitized`	True if output sanitization altered the answer
`cost` / `trace_id`	per-query cost and the trace id

RBAC gating

Pass any object exposing check(user_id, permission) -> bool (e.g. the built-in RBAC):

from largestack._enterprise.rbac import RBAC
rbac = RBAC(); rbac.add_role("support", ["rag.query"]); rbac.add_user("alice", roles=["support"])

rag = SecureRAGAgent(docs, llm="deepseek/deepseek-chat", rbac=rbac, required_permission="rag.query")
denied = await rag.answer("…", user_id="eve")     # allowed=False, no LLM call, audited
ok     = await rag.answer("…", user_id="alice")    # runs the full pipeline

Options

Arg	Default	Notes
`guardrails`	`("pii", "injection")`	guard names; run pre-retrieval and at the LLM step
`dense` / `embed_fn`	`False`	`dense=True` (local sentence-transformers) or a sync `embed_fn` enables hybrid BM25+dense retrieval
`reranker`	`None`	pass a `Reranker` to rerank candidates
`cost_budget`	`0.5`	per-query USD ceiling
`groundedness_threshold`	`0.5`	min faithfulness to mark `grounded=True`
`sanitize_output`	`True`	strip active HTML/script from the answer before returning
`audit`	`True`	write RBAC-deny / guard-block events to the audit trail

Deliberately not auto-wired (documented seams)

Vector DB (Qdrant/etc.): start with dense=True or your own embed_fn; swap the store via largestack._vectorstores when you outgrow in-memory/BM25.
SIEM export: every run writes an audit row — use largestack siem-export for your SIEM.
LangSmith: the engine emits Phoenix/OTel traces; LangSmith is not bundled.

See also: Guardrails · OWASP coverage & red-team · RAG.