Indirect Prompt Injection
ReaderBot summarizes a web page when you ask it to. Before reading, it runs a sanitizer that strips any line containing a banned word (ignore, reveal, secret, password, system, instruction, api key). You control the page content (left panel), but not ReaderBot’s own instructions.
Poison the Untrusted page content so your instruction survives the sanitizer, then ask ReaderBot to summarize the page — and it leaks the API key.
# awaiting command — type a payload and press Enter
No leads yet. Declassify intel one step at a time when you’re stuck.
How this attack works
The sanitizer matched literal keywords. Leet-speak or encoded text has different bytes — the filter sees safe content while the model reconstructs the intent. The attacker controls the source; the defender must anticipate every encoding.
Why it's dangerous
Agents must read external content to be useful. One poisoned document, cached web page, or RAG chunk can hijack every agent that ingests it — the attacker never interacts with the agent directly.
OWASP mapping
Maps to OWASP Top 10 for LLM Applications (2025): LLM01: Prompt Injection (indirect variant — instructions embedded in retrieved external content).
How to defend
- Never treat retrieved content as instructions; enforce a provenance trust boundary.
- Normalize text (decode base64, de-leet, strip invisible chars) before scanning — then re-scan.
- Require explicit user confirmation for any sensitive action triggered by retrieved content.
- Sandbox agents: read-only tools cannot trigger egress; egress tools cannot access secrets.
SkillTotal catches this class of issue deterministically (rule ST-PROMPT-INJECTION).
FAQ
- Why is keyword sanitization insufficient?
- The space of encodings (leet, homoglyphs, base64, unicode normalization forms) is unbounded. A sanitizer that checks raw bytes misses everything that isn't ASCII-identical.
- What is indirect prompt injection?
- Injecting instructions into content the agent retrieves (web pages, documents, tool results), so the attacker never talks to the agent directly.