Is nltk safe?
nltk-3.9.4 is a Python package analyzed by SkillTotal's deterministic static scanner. The scan found no malicious indicators, though 6 risky constructs are reported for review. Its detected capabilities are dynamic code execution, filesystem read, filesystem write, network egress, and shell execution; capabilities describe what the code can do and are not a verdict on intent. Risk score: 20/100 (low).
nltk 3.9.4
- Python shell/command execution
- Python dynamic code execution
- Unsafe deserialization
No malicious indicators found by static analysis.
Automated static-analysis result. It can contain false positives and false negatives, and is not a claim about the intent of nltk's authors.
Findings (6)
It loads data with a format that can rebuild arbitrary objects (e.g. pickle, or unsafe YAML).
db_out = shelve.open(db, "n")
db_in = shelve.open(db)
Why it matters: Feeding such a loader untrusted data can execute code hidden inside that data.
Fix: Deserialize untrusted data with a safe format/loader: JSON, or yaml.safe_load / Loader=SafeLoader. Reserve pickle/marshal for data you fully control.
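The fix above can be illustrated with a minimal sketch. Python's shelve module stores its values with pickle, so opening a shelf from an untrusted source carries the same risk as unpickling it. The sketch below assumes the stored data is plain (strings, numbers, lists, dicts) and can therefore be kept as JSON; the function names and the file name are hypothetical and are not taken from nltk.

```python
import json
import tempfile
from pathlib import Path


def save_records(path: Path, records: dict) -> None:
    """Write plain data as JSON; no executable object state is stored."""
    path.write_text(json.dumps(records), encoding="utf-8")


def load_records(path: Path) -> dict:
    """Parse JSON; the result can only contain dicts, lists, strings,
    numbers, booleans and None, never arbitrary objects."""
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


# Round trip through a temporary file (hypothetical file name).
with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "records.json"
    save_records(db_path, {"the": 3, "cat": 1})
    loaded = load_records(db_path)
```

This only works when the data has no custom classes; for data you fully control, pickle or shelve remain reasonable choices.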
The code turns strings into live code at runtime (eval / new Function / exec).
funcopy = eval(src, dict(_wrapper_=wrapper))
dec_func = eval(src, dict(_func_=func, _call_=caller))
return eval(s[start_position : match.end()]), match.end()
exec("import %s as model" % options.model)
w = eval("numpy." + window + "(window_len)")
Why it matters: If those strings aren't fixed and trusted, they become a way to run arbitrary code.
Fix: Avoid evaluating dynamically constructed code; if unavoidable, ensure the input is a trusted constant and never derived from external data.
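As a rough illustration of that fix, the patterns in the evidence (importing a module by name, looking up a function by name, parsing a literal) can often be expressed without eval or exec. The sketch below is not nltk's code: the allowlist, the helper names, and the use of the math module as a stand-in are all assumptions made for the example.

```python
import ast
import importlib
import math

# Hypothetical allowlist; a real one would name the modules you expect.
ALLOWED_MODULES = {"json", "math"}


def load_module(name: str):
    """Import a module by name without exec(), limited to an allowlist."""
    if name not in ALLOWED_MODULES:
        raise ValueError(f"module not allowed: {name!r}")
    return importlib.import_module(name)


def call_named_function(module, func_name: str, *args):
    """Look up a public function by name with getattr() instead of eval()."""
    if not func_name.isidentifier() or func_name.startswith("_"):
        raise ValueError(f"invalid function name: {func_name!r}")
    func = getattr(module, func_name)
    if not callable(func):
        raise TypeError(f"{func_name!r} is not callable")
    return func(*args)


def parse_literal(text: str):
    """Parse a Python literal (numbers, strings, lists, dicts, ...) only;
    ast.literal_eval rejects function calls and other expressions."""
    return ast.literal_eval(text)
```

For instance, call_named_function(load_module("math"), "sqrt", 16) performs the lookup and the call without ever building a source string.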
The component can run operating-system commands or spawn processes.
p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
p = Popen(_senna_cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE)
p = subprocess.Popen(cmd, stdout=sys.stdout)
p = subprocess.Popen(
    ["svn", "status", "-v", filename],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
p = subprocess.Popen(
    cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, stdin=subprocess.PIPE
)
p = subprocess.Popen(cmd, stdin=stdin, stdout=stdout, stderr=stderr)
p = subprocess.Popen(
    ["which", alternative],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
proc = subprocess.run(
    ["dot", "-T%s" % t],
    capture_output=True,
    input=dot_string,
    text=True,
)
proc = subprocess.run(
    ["dot", "-T%s" % t],
    input=bytes(dot_string, encoding="utf8"),
)
p = subprocess.Popen(cmd, stdout=output, stderr=output)
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
p = subprocess.Popen(
    cmd,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
self._hunpos = Popen(
    [self._hunpos_bin, self._hunpos_model],
    shell=False,
    stdin=PIPE,
    stdout=PIPE,
    stderr=PIPE,
)
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
process = subprocess.Popen(
    ["dot", "-T%s" % output_format],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
Why it matters: Powerful and often legitimate, but confirm the commands aren't built from untrusted input.
Fix: Confirm the command and its arguments are fully controlled and not derived from untrusted input; avoid shell=True.
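One way to apply that fix is to pass the command as an argument list (so no shell parses it) and to validate any variable part against a fixed set. The sketch below is modeled loosely on the "dot" evidence above, but the allowlist of formats and both helper names are hypothetical, not nltk's API.

```python
import subprocess
import sys

# Hypothetical allowlist of output formats accepted from callers.
ALLOWED_FORMATS = {"png", "svg", "pdf"}


def build_dot_command(output_format: str) -> list[str]:
    """Build the argv list for Graphviz 'dot', rejecting unexpected formats."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    return ["dot", f"-T{output_format}"]


def run_command(argv: list[str], input_text: str) -> str:
    """Run argv without a shell, feed it input_text, and return its stdout.
    check=True raises CalledProcessError on a non-zero exit status."""
    proc = subprocess.run(
        argv,
        input=input_text,
        capture_output=True,
        text=True,
        check=True,
        timeout=30,
    )
    return proc.stdout


# Demonstration with the current Python interpreter, so that the example
# does not depend on Graphviz being installed.
echoed = run_command(
    [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"],
    "abc",
)
```

Because the arguments are list elements, a value such as "svg; rm -rf /" would be rejected by the allowlist and, even without it, would reach the program as a single literal argument and not as shell syntax.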
The component reads files from disk.
with open(version_file) as infile:
with open(filename, "rb") as infile:
with open(filename, "rb") as infile:
with open(filename, "rb") as infile:
with open(filename) as infile:
with open(filename) as infile:
with open(usp) as infile:
with open(annfile) as infile:
with open(textfile) as infile:
"text": _ieer_read_text(m.group("text"), root_label),
"headline": _ieer_read_text(m.group("headline"), root_label),
return _ieer_read_text(s, root_label)
with open(weightfile_name) as weightfile:
with open(mapper_file, encoding="utf-8") as raw:
with open(ngram_file, encoding="utf-8") as f:
with open(f) as infile:
with open(path) as lin_file:
fr = open(self.read_file)
with self.abspath(framefile).open() as fp:
with self.abspath(framefile).open() as fp:
with open(self._textids) as fp:
with self.abspath(framefile).open() as fp:
with self.abspath(framefile).open() as fp:
open(self._fileid, "rb"), self._encoding
self._stream = open(self._fileid, "rb")
Why it matters: Usually legitimate, but worth confirming it can't be steered into reading sensitive files.
Fix: Confirm which files are read and that paths cannot be influenced by untrusted input to reach sensitive locations.
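A common way to keep reads away from sensitive locations is to resolve every requested path and check that it stays under a known base directory. The following is a minimal sketch under that assumption; the helper names and the file names in the demo are hypothetical, and Path.is_relative_to requires Python 3.9 or later.

```python
import tempfile
from pathlib import Path


def resolve_inside(base_dir: Path, user_path: str) -> Path:
    """Resolve user_path under base_dir and refuse anything that escapes it
    (for example via '..' segments, absolute paths, or symlinks)."""
    base = base_dir.resolve()
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate


def read_text_inside(base_dir: Path, user_path: str) -> str:
    """Read a UTF-8 text file only if it lives under base_dir."""
    return resolve_inside(base_dir, user_path).read_text(encoding="utf-8")


def demo() -> tuple[str, bool]:
    """Read one file inside the base directory, then try to escape it."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "corpus.txt").write_text("hello", encoding="utf-8")
        text = read_text_inside(base, "corpus.txt")
        try:
            read_text_inside(base, "../outside.txt")
            escaped = True
        except ValueError:
            escaped = False
    return text, escaped
```

The check runs on the resolved path, so a traversal such as "../outside.txt" is rejected before any file is opened.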
The component writes or deletes files on disk.
with open(filename, "wb") as outfile:
with open(filename, "wb") as outfile:
with open(filename, "wb") as outfile:
with open(filename, "w") as outfile:
with open(filename, "w") as outfile:
with open(filename, "w") as outfile:
logfile = open(logfilename, "a", 1) # 1 means 'line buffering'
with open(trainfile_name, "w") as trainfile:
os.remove(trainfile_name)
os.remove(trainfile_name)
os.remove(weightfile_name)
with open(f"{tab_dir}/weights.txt", "w") as f:
with open(f"{tab_dir}/mapping.tab", "w") as f:
with open(f"{tab_dir}/labels.txt", "w") as f:
with open(f"{tab_dir}/alwayson.tab", "w") as f:
os.remove(os.path.join(temp_dir, f))
os.remove(os.path.join(temp_dir, f))
outfile = open(outfile, "w")
os.remove(self.write_file.name)
with open(filename, "wb") as outfile:
os.remove(filepath)
with open(filepath, "wb") as outfile:
with open(filename, "wb") as f:
os.remove(input_file.name)
os.remove(output_file.name)
Why it matters: Usually legitimate, but worth confirming the paths can't be controlled by untrusted input.
Fix: Confirm which files are written/deleted and that paths cannot be influenced by untrusted input.
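Much of the evidence above follows a write-then-remove pattern for scratch files. One hedged way to keep such writes and deletions confined is to create them in a directory the code itself allocates, so that no caller-supplied path is written or removed. The sketch below assumes that approach; the function name and file name are hypothetical.

```python
import tempfile
from pathlib import Path


def count_tokens_via_scratch_file(lines: list[str]) -> tuple[int, Path]:
    """Write lines to a scratch file in a private temporary directory,
    read them back, and let the context manager delete everything."""
    with tempfile.TemporaryDirectory(prefix="scratch_") as tmp:
        scratch = Path(tmp) / "train.txt"
        scratch.write_text("\n".join(lines), encoding="utf-8")
        token_count = len(scratch.read_text(encoding="utf-8").split())
    # On leaving the 'with' block, the directory and its files are removed.
    return token_count, scratch


count, scratch_path = count_tokens_via_scratch_file(["a b", "c"])
```

The returned path is only there to show that the file no longer exists afterwards; cleanup also happens if the body raises an exception.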
The component makes outbound network requests.
from urllib.parse import unquote_plus
if unquote_plus(sp) == "SHUTDOWN THE SERVER":
usp = unquote_plus(sp)
import urllib.request
from urllib.request import url2pathname
p = os.path.join(path_, url2pathname(resource_name))
pkg_dir = os.path.join(path_, url2pathname(pkg))
pkg_zip = os.path.join(path_, url2pathname(pkg + ".zip"))
p = os.path.join(path_, url2pathname(zipfile))
local_path = url2pathname(path_)
from urllib.error import HTTPError, URLError
import requests
response = requests.get(requests.compat.urljoin(self.url, "live"))
response = requests.get(requests.compat.urljoin(self.url, "ready"))
import requests
self.session = requests.Session()
import urllib.request
from urllib.parse import unquote, urlparse
parsed = urlparse(raw)
raw = unquote(parsed.path)
parsed = urlparse(str(url_input))
validate_path(unquote(parsed.path), context=f"{context}.file_scheme")
opener = urllib.request.build_opener(_ValidatingRedirectHandler())
import requests
Why it matters: Usually legitimate, but confirm the destinations are expected and no sensitive data leaves.
Fix: Confirm the destination hosts are expected and that no sensitive data is sent off-host.
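Confirming destinations can be done in code by checking each URL against an allowlist before any request is made. The sketch below uses only urllib.parse from the standard library; the allowed hosts and the function name are hypothetical and would need to match your own deployment.

```python
from urllib.parse import urlparse

# Hypothetical allowlists; replace with the hosts your deployment expects.
ALLOWED_SCHEMES = {"http", "https"}
ALLOWED_HOSTS = {"localhost", "127.0.0.1"}


def check_destination(url: str) -> str:
    """Return url unchanged if its scheme and host are expected,
    otherwise raise ValueError before any request is sent."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"scheme not allowed: {parsed.scheme!r}")
    if parsed.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"host not allowed: {parsed.hostname!r}")
    return url
```

Checking parsed.hostname instead of searching the raw string means a URL such as "http://localhost@evil.example/" is judged by its real host, evil.example, and rejected.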
How we determine this: deterministic static analysis (regex and AST matching) that anchors each finding to source evidence and never executes the scanned code.
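To give a sense of what AST-based detection without code execution can look like, here is a small sketch that parses source text and reports direct calls to eval or exec. It is an illustration only and is not SkillTotal's implementation; the function name, the set of flagged names, and the sample snippet are assumptions.

```python
import ast

# Names flagged by this toy check; a real scanner would cover far more.
RISKY_CALLS = {"eval", "exec"}


def find_risky_calls(source: str) -> list[tuple[int, str]]:
    """Parse source (without running it) and return (line, name) pairs
    for direct calls to the names in RISKY_CALLS."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in RISKY_CALLS
        ):
            findings.append((node.lineno, node.func.id))
    return sorted(findings)


sample = 'x = eval("1 + 1")\nprint(x)\nexec("import os")\n'
findings = find_risky_calls(sample)
```

A check like this sees only syntax, which is why such reports list capabilities and can include false positives: it cannot tell whether the evaluated string is a trusted constant.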