Why is keyword sanitization insufficient?

The space of encodings (leet, homoglyphs, base64, unicode normalization forms) is unbounded. A sanitizer that checks raw bytes misses everything that isn't ASCII-identical.

What is indirect prompt injection?

Injecting instructions into content the agent retrieves (web pages, documents, tool results), so the attacker never talks to the agent directly.

cd ~/labs

visitor@skilltotal:~$ cat ./labs/indirect-prompt-injection/mission.txt

Indirect Prompt Injection

lab 02 · ST-PROMPT-INJECTION · LLM01

mission.txt

scenario

ReaderBot summarizes a web page when you ask it to. Before reading, it runs a sanitizer that strips any line containing a banned word (ignore, reveal, secret, password, system, instruction, api key). You control the page content (left panel), but not ReaderBot’s own instructions.

objective

Poison the Untrusted page content so your instruction survives the sanitizer, then ask ReaderBot to summarize the page — and it leaks the API key.

session — indirect-prompt-injection

untrusted_page.html

# awaiting command — type a payload and press Enter

▚ Intel

No leads yet. Declassify intel one step at a time when you’re stuck.

▰ Dossierclassified — solve to unseal

How this attack works

The sanitizer matched literal keywords. Leet-speak or encoded text has different bytes — the filter sees safe content while the model reconstructs the intent. The attacker controls the source; the defender must anticipate every encoding.

Why it's dangerous

Agents must read external content to be useful. One poisoned document, cached web page, or RAG chunk can hijack every agent that ingests it — the attacker never interacts with the agent directly.

OWASP mapping

Maps to OWASP Top 10 for LLM Applications (2025): LLM01: Prompt Injection (indirect variant — instructions embedded in retrieved external content).

How to defend

Never treat retrieved content as instructions; enforce a provenance trust boundary.
Normalize text (decode base64, de-leet, strip invisible chars) before scanning — then re-scan.
Require explicit user confirmation for any sensitive action triggered by retrieved content.
Sandbox agents: read-only tools cannot trigger egress; egress tools cannot access secrets.

SkillTotal catches this class of issue deterministically (rule ST-PROMPT-INJECTION).

Scan AI component (free)

FAQ

Why is keyword sanitization insufficient?: The space of encodings (leet, homoglyphs, base64, unicode normalization forms) is unbounded. A sanitizer that checks raw bytes misses everything that isn't ASCII-identical.
What is indirect prompt injection?: Injecting instructions into content the agent retrieves (web pages, documents, tool results), so the attacker never talks to the agent directly.