Is nltk safe?

nltk-3.9.4 is a Python package analyzed by SkillTotal's deterministic static scanner. The scan found no malicious indicators, though it reports 6 risky constructs for review. Detected capabilities: dynamic code execution, filesystem read, filesystem write, network egress and shell execution. Capabilities describe what the code can do; they are not a verdict on intent. Risk score: 20/100 (low).

nltk-3.9.4

python_package · pypi:nltk

LOW · 20 / 100 risk score

Snapshot · scanned Jul 2, 2026 · nltk-3.9.4@3.9.4 · engine 0.24.0 / ruleset 25

No malicious indicators; review capabilities before installing

Notable — review in context (capabilities are not malware):

Python shell/command execution
Python dynamic code execution
Unsafe deserialization

No malicious indicators found by static analysis.

Automated static-analysis result. It can contain false positives and false negatives, and is not a claim about the intent of nltk's authors.

Capabilities — what this component can do (not a risk score):

dynamic code execution · filesystem read · filesystem write · network egress · shell execution

Findings (6)

HIGH · Unsafe deserialization · ST-DESERIALIZE-PY

It loads data with a format that can rebuild arbitrary objects (e.g. pickle, which also backs Python's shelve module, or unsafe YAML).

nltk/sem/chat80.py:615-615

db_out = shelve.open(db, "n")

nltk/sem/chat80.py:635-635

db_in = shelve.open(db)

Why it matters: Feeding such a loader untrusted data can execute code hidden inside that data.
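To make the risk concrete, here is a minimal, harmless Python sketch showing that unpickled data chooses which callable runs during loading. Because `shelve` stores its values with `pickle`, the same applies to a shelve file from an untrusted source. The `Payload` class is a hypothetical stand-in; it asks pickle to call the built-in `eval` on a fixed arithmetic string, where a real attacker could name something like `os.system` instead.

```python
import pickle


class Payload:
    """Any pickled object can tell pickle which callable to invoke on load."""

    def __reduce__(self):
        # At load time pickle will call eval("6 * 7"); the loader never
        # sees a Payload instance, only the result of that call.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # code chosen by the data runs here
print(result)  # 42
```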

Fix: Deserialize untrusted data with a safe format/loader: JSON, or yaml.safe_load / Loader=SafeLoader. Reserve pickle/marshal for data you fully control.
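A minimal sketch of that fix, assuming the stored data is limited to strings, numbers, lists and dicts: persist it as JSON instead of a pickle-backed `shelve` file. `save_table` and `load_table` are hypothetical helper names, not nltk APIs. `json.loads` can only build basic types, never arbitrary objects.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def save_table(path: Path, table: Dict[str, List[str]]) -> None:
    """Write a mapping of keys to string lists as UTF-8 JSON."""
    Path(path).write_text(
        json.dumps(table, ensure_ascii=False, indent=2), encoding="utf-8"
    )


def load_table(path: Path) -> Dict[str, List[str]]:
    """Read the mapping back; JSON yields only dicts, lists, str, numbers, bools, None."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


# Usage: round-trip a small table through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    table_path = Path(tmp) / "table.json"
    save_table(table_path, {"city": ["athens", "paris"]})
    print(load_table(table_path))  # {'city': ['athens', 'paris']}
```

The trade-off is that JSON cannot store custom classes directly; anything richer has to be converted to basic types explicitly, which is exactly what keeps the loader from running code.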

HIGH · Python dynamic code execution · ST-DYN-PY

The code turns strings into live code at runtime (eval / exec).

nltk/decorators.py:136-136

funcopy = eval(src, dict(_wrapper_=wrapper))

nltk/decorators.py:204-204

dec_func = eval(src, dict(_func_=func, _call_=caller))

nltk/internals.py:231-231

return eval(s[start_position : match.end()]), match.end()

nltk/sem/util.py:274-274

exec("import %s as model" % options.model)

nltk/tokenize/texttiling.py:531-531

w = eval("numpy." + window + "(window_len)")

Why it matters: If those strings aren't fixed and trusted, they become a way to run arbitrary code.

Fix: Avoid evaluating dynamically constructed code; if unavoidable, ensure the input is a trusted constant and never derived from external data.
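A sketch of eval/exec-free alternatives for three of the patterns listed above, under stated assumptions. `parse_string_literal` assumes the text evaluated at internals.py:231 is meant to be a quoted string literal; `import_by_name` replaces `exec("import %s as model" % ...)`; `window_function` replaces `eval("numpy." + window + "(window_len)")` with an attribute lookup. All helper names and the `ALLOWED_WINDOWS` allowlist are hypothetical, not nltk code.

```python
import ast
import importlib
from types import ModuleType
from typing import Callable


def parse_string_literal(text: str) -> str:
    """Parse a quoted Python string literal without eval().

    ast.literal_eval accepts only literals (strings, numbers, tuples, ...),
    so input such as "__import__('os')" raises ValueError instead of running.
    """
    value = ast.literal_eval(text)
    if not isinstance(value, str):
        raise ValueError(f"not a string literal: {text!r}")
    return value


def import_by_name(name: str) -> ModuleType:
    """Replacement for exec("import %s as model" % name).

    Importing still runs the module's top-level code, so the name itself
    must come from a trusted source or an allowlist.
    """
    return importlib.import_module(name)


# Hypothetical allowlist of window names a caller may request.
ALLOWED_WINDOWS = frozenset({"hanning", "hamming", "bartlett", "blackman"})


def window_function(module: object, name: str) -> Callable:
    """Look up module.<name> by attribute, but only for allowlisted names."""
    if name not in ALLOWED_WINDOWS:
        raise ValueError(f"unsupported window: {name!r}")
    return getattr(module, name)
```

With numpy installed, `window_function(numpy, "hanning")(window_len)` would stand in for the original `eval` call, and an unexpected name fails loudly instead of being executed.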

HIGH · Python shell/command execution · ST-SHELL-PY

The component can run operating-system commands or spawn processes.

nltk/classify/megam.py:172-172

p = subprocess.Popen(cmd, stdout=subprocess.PIPE)

nltk/classify/senna.py:138-138

p = Popen(_senna_cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE)

nltk/classify/tadm.py:85-85

p = subprocess.Popen(cmd, stdout=sys.stdout)

nltk/downloader.py:2564-2568

p = subprocess.Popen(
    ["svn", "status", "-v", filename],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

nltk/inference/prover9.py:207-209

p = subprocess.Popen(
    cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, stdin=subprocess.PIPE
)

nltk/internals.py:140-140

p = subprocess.Popen(cmd, stdin=stdin, stdout=stdout, stderr=stderr)

nltk/internals.py:590-594

p = subprocess.Popen(
    ["which", alternative],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

nltk/parse/dependencygraph.py:566-571

proc = subprocess.run(
    ["dot", "-T%s" % t],
    capture_output=True,
    input=dot_string,
    text=True,
)

nltk/parse/dependencygraph.py:573-576

proc = subprocess.run(
    ["dot", "-T%s" % t],
    input=bytes(dot_string, encoding="utf8"),
)

nltk/parse/malt.py:273-273

p = subprocess.Popen(cmd, stdout=output, stderr=output)

nltk/sem/boxer.py:267-267

p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

nltk/sem/boxer.py:271-276

p = subprocess.Popen(
    cmd,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

nltk/tag/hunpos.py:97-103

self._hunpos = Popen(
    [self._hunpos_bin, self._hunpos_model],
    shell=False,
    stdin=PIPE,
    stdout=PIPE,
    stderr=PIPE,
)

nltk/tokenize/repp.py:113-113

p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

nltk/translate/api.py:132-137

process = subprocess.Popen(
    ["dot", "-T%s" % output_format],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

Why it matters: Powerful and often legitimate — confirm the commands aren't built from untrusted input.

Fix: Confirm the command and its arguments are fully controlled and not derived from untrusted input; avoid shell=True.
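A minimal sketch of that pattern: pass an argument list with the default `shell=False`, resolve the program with `shutil.which` in-process rather than spawning `which` as a subprocess (as the internals.py:590-594 snippet does), and bound the run with a timeout. `run_tool` is a hypothetical helper, not an nltk API.

```python
import os
import shutil
import subprocess
import sys
from typing import Optional, Sequence


def run_tool(
    argv: Sequence[str], stdin_text: Optional[str] = None, timeout: float = 60.0
) -> str:
    """Run an external program from an argument list (no shell); return stdout."""
    if not argv:
        raise ValueError("empty command")
    program = argv[0]
    # Resolve bare names on PATH in-process instead of spawning `which`.
    exe = program if os.path.isabs(program) else shutil.which(program)
    if exe is None:
        raise FileNotFoundError(f"program not found on PATH: {program}")
    # A list argv with shell=False (the default) means no shell parses the
    # arguments, so metacharacters such as ';' or '$(...)' stay literal.
    result = subprocess.run(
        [exe, *argv[1:]],
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
        timeout=timeout,
    )
    return result.stdout


# Usage: the hostile-looking argument reaches the child process verbatim.
echoed = run_tool([sys.executable, "-c", "import sys; print(sys.argv[1])", "$(whoami); echo hi"])
print(echoed.strip())  # $(whoami); echo hi
```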

MEDIUM · Python filesystem read · ST-FS-PY-READ

The component reads files from disk.

nltk/__init__.py:33-33

with open(version_file) as infile:

nltk/app/chartparser_app.py:815-815

with open(filename, "rb") as infile:

nltk/app/chartparser_app.py:2272-2272

with open(filename, "rb") as infile:

nltk/app/chartparser_app.py:2310-2310

with open(filename, "rb") as infile:

nltk/app/chartparser_app.py:2313-2313

with open(filename) as infile:

nltk/app/chunkparser_app.py:1415-1415

with open(filename) as infile:

nltk/app/wordnet_app.py:110-110

with open(usp) as infile:

nltk/chunk/named_entity.py:235-235

with open(annfile) as infile:

nltk/chunk/named_entity.py:247-247

with open(textfile) as infile:

nltk/chunk/util.py:582-582

"text": _ieer_read_text(m.group("text"), root_label),

nltk/chunk/util.py:588-588

"headline": _ieer_read_text(m.group("headline"), root_label),

nltk/chunk/util.py:591-591

return _ieer_read_text(s, root_label)

nltk/classify/maxent.py:1542-1542

with open(weightfile_name) as weightfile:

nltk/corpus/reader/crubadan.py:78-78

with open(mapper_file, encoding="utf-8") as raw:

nltk/corpus/reader/crubadan.py:97-97

with open(ngram_file, encoding="utf-8") as f:

nltk/corpus/reader/ipipan.py:191-191

with open(f) as infile:

nltk/corpus/reader/lin.py:43-43

with open(path) as lin_file:

nltk/corpus/reader/nkjp.py:256-256

fr = open(self.read_file)

nltk/corpus/reader/nombank.py:110-110

with self.abspath(framefile).open() as fp:

nltk/corpus/reader/nombank.py:133-133

with self.abspath(framefile).open() as fp:

nltk/corpus/reader/pl196x.py:110-110

with open(self._textids) as fp:

nltk/corpus/reader/propbank.py:106-106

with self.abspath(framefile).open() as fp:

nltk/corpus/reader/propbank.py:129-129

with self.abspath(framefile).open() as fp:

nltk/corpus/reader/util.py:212-212

open(self._fileid, "rb"), self._encoding

nltk/corpus/reader/util.py:215-215

self._stream = open(self._fileid, "rb")

Why it matters: Usually legitimate, but worth confirming it can't be steered into reading sensitive files.

Fix: Confirm which files are read and that paths cannot be influenced by untrusted input to reach sensitive locations.
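One common way to make that guarantee is to confine reads to a base directory, sketched below in Python. `resolve_inside` and `read_text_inside` are hypothetical helpers, not nltk APIs. The check resolves `..` and symlinks before comparing, but it does not defend against a file being swapped between the check and the open.

```python
import tempfile
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def resolve_inside(base_dir: PathLike, user_path: PathLike) -> Path:
    """Resolve user_path against base_dir and refuse anything that escapes it.

    resolve() collapses '..' and follows symlinks, so '../secret' and a
    symlink pointing outside base_dir are both caught. An absolute
    user_path replaces base_dir when joined, and is then rejected too.
    """
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # Python 3.9+
        raise PermissionError(f"path escapes {base}: {user_path}")
    return candidate


def read_text_inside(base_dir: PathLike, user_path: PathLike, encoding: str = "utf-8") -> str:
    """Read a text file only if it lives under base_dir."""
    return resolve_inside(base_dir, user_path).read_text(encoding=encoding)


# Usage: a file inside the corpus directory is readable, '../' is not.
with tempfile.TemporaryDirectory() as tmp:
    corpus = Path(tmp) / "corpus"
    corpus.mkdir()
    (corpus / "a.txt").write_text("hello", encoding="utf-8")
    print(read_text_inside(corpus, "a.txt"))  # hello
```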

MEDIUM · Python filesystem write/delete · ST-FS-PY-WRITE

The component writes or deletes files on disk.

nltk/app/chartparser_app.py:798-798

with open(filename, "wb") as outfile:

nltk/app/chartparser_app.py:2295-2295

with open(filename, "wb") as outfile:

nltk/app/chartparser_app.py:2327-2327

with open(filename, "wb") as outfile:

nltk/app/chartparser_app.py:2330-2330

with open(filename, "w") as outfile:

nltk/app/chunkparser_app.py:1394-1394

with open(filename, "w") as outfile:

nltk/app/chunkparser_app.py:1430-1430

with open(filename, "w") as outfile:

nltk/app/wordnet_app.py:224-224

logfile = open(logfilename, "a", 1)  # 1 means 'line buffering'

nltk/classify/maxent.py:1440-1440

with open(trainfile_name, "w") as trainfile:

nltk/classify/maxent.py:1478-1478

os.remove(trainfile_name)

nltk/classify/maxent.py:1545-1545

os.remove(trainfile_name)

nltk/classify/maxent.py:1546-1546

os.remove(weightfile_name)

nltk/classify/maxent.py:1596-1596

with open(f"{tab_dir}/weights.txt", "w") as f:

nltk/classify/maxent.py:1598-1598

with open(f"{tab_dir}/mapping.tab", "w") as f:

nltk/classify/maxent.py:1600-1600

with open(f"{tab_dir}/labels.txt", "w") as f:

nltk/classify/maxent.py:1602-1602

with open(f"{tab_dir}/alwayson.tab", "w") as f:

nltk/classify/weka.py:135-135

os.remove(os.path.join(temp_dir, f))

nltk/classify/weka.py:242-242

os.remove(os.path.join(temp_dir, f))

nltk/classify/weka.py:277-277

outfile = open(outfile, "w")

nltk/corpus/reader/nkjp.py:280-280

os.remove(self.write_file.name)

nltk/data.py:769-769

with open(filename, "wb") as outfile:

nltk/downloader.py:707-707

os.remove(filepath)

nltk/downloader.py:719-719

with open(filepath, "wb") as outfile:

nltk/draw/util.py:1870-1870

with open(filename, "wb") as f:

nltk/parse/malt.py:219-219

os.remove(input_file.name)

nltk/parse/malt.py:220-220

os.remove(output_file.name)

Why it matters: Usually legitimate, but worth confirming the paths can't be controlled by untrusted input.

Fix: Confirm which files are written/deleted and that paths cannot be influenced by untrusted input.
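A minimal Python sketch of a confined write, assuming output must stay under one base directory: the target path is checked before anything touches disk, and the data goes to a temporary file that is atomically swapped in with `os.replace`. `write_text_inside` is a hypothetical helper, not an nltk API.

```python
import os
import tempfile
from pathlib import Path
from typing import Union


def write_text_inside(
    base_dir: Union[str, Path],
    user_path: Union[str, Path],
    text: str,
    encoding: str = "utf-8",
) -> Path:
    """Write text to base_dir/user_path, refusing paths that escape base_dir."""
    base = Path(base_dir).resolve()
    target = (base / user_path).resolve()
    if target == base or not target.is_relative_to(base):  # Python 3.9+
        raise PermissionError(f"refusing to write outside {base}: {user_path}")
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then swap it in with
    # os.replace, so readers never observe a half-written target.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as handle:
            handle.write(text)
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the leftover temp file; any previous target is untouched.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return target


# Usage: writes under the base directory succeed; '../' is rejected.
with tempfile.TemporaryDirectory() as tmp:
    out = write_text_inside(tmp, "sub/out.txt", "data")
    print(out.read_text(encoding="utf-8"))  # data
```

The same path check can be applied before an `os.remove` call, with the caveat that `resolve()` follows symlinks.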

MEDIUM · Python network egress · ST-NET-PY

The component makes outbound network requests.

nltk/app/wordnet_app.py:64-64

from urllib.parse import unquote_plus

nltk/app/wordnet_app.py:87-87

if unquote_plus(sp) == "SHUTDOWN THE SERVER":

nltk/app/wordnet_app.py:106-106

usp = unquote_plus(sp)

nltk/data.py:42-42

import urllib.request

nltk/data.py:48-48

from urllib.request import url2pathname

nltk/data.py:653-653

p = os.path.join(path_, url2pathname(resource_name))

nltk/data.py:669-669

pkg_dir = os.path.join(path_, url2pathname(pkg))

nltk/data.py:670-670

pkg_zip = os.path.join(path_, url2pathname(pkg + ".zip"))

nltk/data.py:676-676

p = os.path.join(path_, url2pathname(zipfile))

nltk/data.py:1129-1129

local_path = url2pathname(path_)

nltk/downloader.py:173-173

from urllib.error import HTTPError, URLError

nltk/parse/corenlp.py:116-116

import requests

nltk/parse/corenlp.py:151-151

response = requests.get(requests.compat.urljoin(self.url, "live"))

nltk/parse/corenlp.py:162-162

response = requests.get(requests.compat.urljoin(self.url, "ready"))

nltk/parse/corenlp.py:206-206

import requests

nltk/parse/corenlp.py:217-217

self.session = requests.Session()

nltk/pathsec.py:16-16

import urllib.request

nltk/pathsec.py:21-21

from urllib.parse import unquote, urlparse

nltk/pathsec.py:85-85

parsed = urlparse(raw)

nltk/pathsec.py:89-89

raw = unquote(parsed.path)

nltk/pathsec.py:207-207

parsed = urlparse(str(url_input))

nltk/pathsec.py:209-209

validate_path(unquote(parsed.path), context=f"{context}.file_scheme")

nltk/pathsec.py:249-249

opener = urllib.request.build_opener(_ValidatingRedirectHandler())

nltk/twitter/twitterclient.py:32-32

import requests

Why it matters: Usually legitimate, but confirm the destinations are expected and no sensitive data leaves.

Fix: Confirm the destination hosts are expected and that no sensitive data is sent off-host.
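A minimal sketch of a destination check that could run before any HTTP request is sent, using only `urllib.parse`. The allowlist (a local server on port 9000) and the `check_destination` helper are hypothetical examples, not nltk configuration; the URL is built with `urljoin`, mirroring the corenlp.py snippet above.

```python
from typing import AbstractSet
from urllib.parse import urljoin, urlparse

# Hypothetical allowlist: only a server running on this machine.
ALLOWED_HOSTS = frozenset({"localhost", "127.0.0.1"})
ALLOWED_SCHEMES = frozenset({"http", "https"})


def check_destination(url: str, allowed_hosts: AbstractSet[str] = ALLOWED_HOSTS) -> str:
    """Return url unchanged if its scheme and host are expected, else raise."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unexpected scheme: {parsed.scheme!r}")
    # .hostname is lowercased and excludes any port or user:password@ prefix.
    if parsed.hostname is None or parsed.hostname not in allowed_hosts:
        raise ValueError(f"unexpected host: {parsed.hostname!r}")
    return url


# Usage: build the health-check URL, then verify it before use.
health_url = check_destination(urljoin("http://localhost:9000", "live"))
print(health_url)  # http://localhost:9000/live
```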


How we determine this: deterministic static analysis (regex + AST), evidence-anchored, no code execution.