What is prompt injection in an AI component?

Text embedded in a component (a tool description, a skill file, a README) that instructs the agent to do something you did not ask for — ignore prior instructions, reveal a system prompt, or exfiltrate data. SkillTotal flags strong injection phrases as a malicious indicator, including obfuscated variants.

How does it detect exfiltration?

It correlates sensitive-data access (credential paths, embedded secrets) with network egress in the same component and raises a critical combined finding, rather than flagging filesystem or network use on its own.

What is SARIF and why does it matter?

SARIF is the standard format for static-analysis results. Exporting SARIF lets you import SkillTotal findings into GitHub code scanning, IDEs, and CI dashboards.

Is the engine open source?

Yes — Apache-2.0. The full static engine and CLI are free and offline; paid features are separate, server-side add-ons.

What is SkillTotal's false-positive rate?

Zero false-malicious verdicts across 119 trusted, widely-used components (popular MCP servers, npm/PyPI packages, and agent skills) in the 2026-06-17 calibration run. Capability findings (filesystem/network/shell) are reported but never scored as malicious, so powerful-but-legitimate components are not flagged. A benign false-positive gate — which must stay at zero — runs before every release. This is a benchmark on a curated trusted set, not a precision guarantee against every component in the wild.

← Back to home

How SkillTotal analysis works

SkillTotal answers one question about an AI component: does it look malicious, just powerful, or clean — and shows you why. Here is how it reaches that verdict.

Scan a component — free →

Deterministic, never executed

Detection is pure static analysis — regular expressions plus AST parsing. The engine never executes your code and never calls an LLM, so results are reproducible and it is safe to point at something dangerous. The same open-source engine runs on the website and locally via pipx install skilltotal. (Optional dynamic analysis is a separate paid service that runs only in our isolated sandbox — never on your machine.)

Every finding carries evidence

A confirmed finding always points at a file, a line range, and the exact snippet. Signals that cannot be anchored to evidence are surfaced separately as "needs review" and never affect the score. Everything is derived only from the component's own files — never from your environment.

Capability is not risk

The report separates malicious indicators — deliberate deception such as tool poisoning, hidden-Unicode smuggling, prompt injection, and decode-and-execute — from powerful capabilities like filesystem, network, and shell access, which are often legitimate. Only malicious indicators and genuinely risky constructs drive the risk score, so the scanner does not cry wolf.

What it detects

Install-time execution, obfuscated decode-and-execute droppers, credential and secret exfiltration (sensitive-data access combined with network egress), dangerous MCP tools and tool poisoning, prompt injection — including homoglyph, full-width and zero-width obfuscation — unsafe deserialization, and sensitive-path access.

Methodology

SkillTotal performs static security analysis of AI components — MCP servers, agent skills/plugins, npm and PyPI packages, and AI-generated projects. The engine combines capability analysis, dangerous-pattern detection, privilege analysis, supply-chain (install-time) analysis, prompt-surface analysis, and data-flow correlation (secret access combined with network egress). Findings are mapped to risk categories and contribute to a 0–100 risk score; capabilities are reported but never inflate it — capability ≠ risk.

How it differs from SAST and SCA

SAST tools (e.g. Semgrep) look for insecure patterns in your first-party code; SCA tools (e.g. Snyk, Socket) match your dependencies against known-vulnerability databases. Neither reads an MCP tool description, a SKILL.md, or a plugin manifest as instructions that steer your agent, and neither correlates "reads a credential path" with "sends data to the network" as an exfiltration signal inside a single component. SkillTotal treats the component as something your agent will trust and act on — analyzing its tool metadata, prompt/instruction text, capabilities, and data-flow. It complements SAST and SCA; it does not replace them.

Coverage by component type

Legend: ✅ analyzed by default for this component type · ⚠️ the engine detects this, but the surface is uncommon for this type — so it is flagged only when the component actually contains it (e.g. prompt-injection text inside an npm/PyPI package) · ❌ not applicable to this type · 🚧 planned (SkillTotal Cloud).

Columns are the component types SkillTotal scans. AI project = a scanned repository or folder — an agent skill/plugin, an AI-generated codebase, or a set of prompts/configs — that is not a published npm/PyPI package.

Category	MCP	npm	PyPI	AI project
Prompt injection / instruction override	✅	⚠️	⚠️	✅
Tool poisoning (MCP tool metadata)	✅	❌	❌	⚠️
Dangerous capabilities (shell / fs / network)	✅	✅	✅	⚠️
Data exfiltration (secret access + egress)	✅	✅	✅	⚠️
Secret theft / sensitive-path access	✅	✅	✅	⚠️
Dynamic code execution	✅	✅	✅	⚠️
Obfuscation (decode-and-execute)	✅	✅	✅	✅
Hidden-Unicode smuggling	✅	✅	✅	✅
Embedded secrets (hardcoded keys/tokens)	✅	✅	✅	✅
Install-time / supply-chain hooks	⚠️	✅	✅	❌
Overprivileged / auto-approved tools	✅	❌	❌	⚠️
Runtime behavior analysis	🚧	🚧	🚧	🚧
Sandbox analysis	🚧	🚧	🚧	🚧

Scan by component type

How SkillTotal analyzes each kind of AI component:

Typical findings

An MCP tool can execute arbitrary shell commands
A package downloads and runs code from an external URL
Access to credential locations (~/.aws, ~/.ssh, .env) detected
Dynamic code execution (eval / exec) detected
Prompt-injection / instruction-override phrasing in a tool description or skill
Sensitive-data access combined with outbound network egress
Hardcoded API keys or tokens
An MCP server with auto-approved or overprivileged tools

Out of scope

SkillTotal statically analyzes a single component's own files. It does not execute code, observe runtime behavior, or assess your environment, deployment, or infrastructure. It is not a substitute for:

a penetration test
an application-security (app-sec) review
an architecture / design review
a cloud-security or infrastructure assessment
a Kubernetes / container runtime audit
a business-logic review
a manual code review

Runtime behavior and sandbox analysis are planned for SkillTotal Cloud.

How accurate is it

The number that matters for a scanner is how often it cries wolf. In the latest calibration run (2026-06-17) SkillTotal raised zero false-malicious verdicts across 119 trusted, widely-used components — popular MCP servers, npm/PyPI packages, and agent skills. Powerful but legitimate capabilities — filesystem, network, shell — are reported, never scored as malicious. This is a false-positive benchmark on a curated trusted set — evidence the scanner does not cry wolf on popular software, not a precision guarantee for every component in the wild.

On detection, the engine flags every malicious archetype in its open test set: instruction override, MCP tool-poisoning, hidden-Unicode (ASCII smuggling), decode-and-execute droppers, install-time and postinstall credential exfiltration, import-time stealers, and typosquat droppers. We report detection as archetype coverage rather than a single percentage — real registry malware is taken down quickly, so a live network "detection rate" is noisy, and we will not headline a number that flatters us for the wrong reason.

It is reproducible. The engine, fixtures, and harness are open source (Apache-2.0): run the deterministic floor with pytest tests/test_offline_calibration.py, or re-run the real-world corpus with fetch_corpus.py then calibrate.py. A false-positive gate (benign false positives must stay zero) runs before every release. See the measured detection benchmark (recall, precision, and per-technique coverage — reproducible with one command), or the State of AI Component Security report for the aggregate risk and OWASP distribution across real components.

Export

Every report exports to JSON and to SARIF 2.1.0, so you can wire SkillTotal into code scanning and CI alongside your other tools.

FAQ

What is prompt injection in an AI component?: Text embedded in a component (a tool description, a skill file, a README) that instructs the agent to do something you did not ask for — ignore prior instructions, reveal a system prompt, or exfiltrate data. SkillTotal flags strong injection phrases as a malicious indicator, including obfuscated variants.
How does it detect exfiltration?: It correlates sensitive-data access (credential paths, embedded secrets) with network egress in the same component and raises a critical combined finding, rather than flagging filesystem or network use on its own.
What is SARIF and why does it matter?: SARIF is the standard format for static-analysis results. Exporting SARIF lets you import SkillTotal findings into GitHub code scanning, IDEs, and CI dashboards.
Is the engine open source?: Yes — Apache-2.0. The full static engine and CLI are free and offline; paid features are separate, server-side add-ons.
What is SkillTotal's false-positive rate?: Zero false-malicious verdicts across 119 trusted, widely-used components (popular MCP servers, npm/PyPI packages, and agent skills) in the 2026-06-17 calibration run. Capability findings (filesystem/network/shell) are reported but never scored as malicious, so powerful-but-legitimate components are not flagged. A benign false-positive gate — which must stay at zero — runs before every release. This is a benchmark on a curated trusted set, not a precision guarantee against every component in the wild.