SkillTotal
cd ~/labs
visitor@skilltotal:~$ cat ./labs/rag-poisoning/mission.txt

RAG / Data Poisoning

lab 08 · ST-PROMPT-INJECTION · LLM04
mission.txt
scenario

DocsBot answers questions by retrieving the most relevant knowledge-base article and summarizing it. On ingestion it strips lines that literally contain instruction keywords (ignore, system, instructions, reveal…). You can contribute an article; you cannot see the user’s question in advance.

objective

Poison an article so your directive survives ingestion and a benign, on-topic question retrieves it — making DocsBot carry your payload into its answer.

session — rag-poisoning
kb_article.md

# awaiting command — type a payload and press Enter

▚ Intel

No leads yet. Declassify intel one step at a time when you’re stuck.

▰ Dossierclassified — solve to unseal

How this attack works

The ingestion filter matched literal keywords, so a leet-speak or base64 directive passed straight into the vector store. When a benign question retrieved the document, the model treated the stored text as trusted context and followed the embedded instruction. The poison persists in the index and fires for anyone who asks about that topic.

Why it's dangerous

RAG and agent memory ingest documents, web pages, and tickets from many contributors. One poisoned record influences every future answer that retrieves it — a durable, fan-out injection the attacker plants once and never touches again.

OWASP mapping

Maps to OWASP Top 10 for LLM Applications (2025): LLM04: Data and Model Poisoning (with an LLM01 injection vector). SkillTotal flags the embedded directive in a component’s data as ST-PROMPT-INJECTION.

How to defend

  • Treat retrieved content as data, never instructions — enforce a provenance boundary.
  • Normalize before scanning (decode base64, de-leet, strip invisible chars), then re-scan.
  • Curate and sign knowledge sources; track provenance per chunk and quarantine untrusted contributions.
  • Constrain the model: a retrieved document cannot trigger actions or change system behavior.

SkillTotal catches this class of issue deterministically (rule ST-PROMPT-INJECTION).

Scan AI component (free)

FAQ

Why does keyword stripping fail on ingestion?
It checks raw bytes. Leet-speak, homoglyphs, zero-width characters, and base64 all encode the same intent with different bytes, so the filter sees clean text while the model reconstructs the directive.
What makes RAG poisoning especially dangerous?
It is persistent and indirect: the payload lives in the store and influences every retrieval on that topic, so a single poisoned document can affect many users and sessions.