Traditional threat modeling — STRIDE, PASTA, attack trees — assumes you’re reasoning about deterministic systems. You enumerate inputs, trace data flows, identify trust boundaries, and the system behaves consistently within those constraints.
AI systems break that assumption at the foundation.
When the model is the attack surface, the threat landscape shifts in three fundamental ways:
1. The Input Space Is Unbounded
A SQL injection attack targets a parser with a known grammar. An LLM processes natural language: an effectively unbounded set of valid inputs, each potentially triggering different behavior. You can’t enumerate the attack surface; you can only probe it.
This means traditional coverage metrics lose most of their meaning. “We tested 1,000 inputs” tells you almost nothing about the 1,001st.
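One way to put a number on that is a minimal sketch under a strong assumption: the test inputs were drawn independently from the same distribution as production traffic, which an adversary is under no obligation to respect. Under that assumption, a clean test run only bounds the failure rate; it never proves it is zero. The function name here is illustrative, not from any library.

```python
def failure_rate_upper_bound(n_passed: int, confidence: float = 0.95) -> float:
    """Upper bound on the per-input failure rate after n_passed clean trials.

    Assumes independent trials sampled from the same distribution as real
    traffic. Solves (1 - p) ** n_passed = 1 - confidence for p.
    """
    if n_passed <= 0:
        raise ValueError("n_passed must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_passed)


# 1,000 clean tests at 95% confidence
bound = failure_rate_upper_bound(1_000)
print(f"true failure rate could still be as high as {bound:.3%}")
```

For 1,000 clean tests the bound works out to roughly 0.3%, which at a million requests is still on the order of 3,000 failures. Against an attacker who picks inputs deliberately, even that bound does not hold.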
2. Behavior Is Probabilistic, Not Deterministic
The same prompt, at temperature > 0, can produce different outputs. Security controls that rely on consistent output formatting (content filters, structured output validators) can have failure modes that never show up in a small test run and then surface unpredictably in production.
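A practical consequence is that a control should be tested by sampling the same prompt many times and measuring a failure rate, not by checking one response. The sketch below illustrates the idea with a hypothetical stand-in, `fake_model`, that simulates a sampled model with an assumed 2% formatting drift; in a real harness that function would be replaced by an actual model call.

```python
import json
import random


def fake_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a sampled LLM call (temperature > 0).

    Usually returns bare JSON, but occasionally wraps it in prose, the kind
    of formatting drift that a single test run rarely catches.
    """
    payload = json.dumps({"action": "summarize", "prompt_chars": len(prompt)})
    if rng.random() < 0.02:  # assumed drift rate, purely illustrative
        return "Sure! Here is the JSON: " + payload
    return payload


def is_valid_json_object(text: str) -> bool:
    """The control under test: accept only output that parses to a JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


def validator_failure_rate(prompt: str, trials: int, seed: int = 0) -> float:
    """Sample the same prompt repeatedly and report how often validation fails."""
    rng = random.Random(seed)
    failures = sum(
        not is_valid_json_object(fake_model(prompt, rng)) for _ in range(trials)
    )
    return failures / trials


print(validator_failure_rate("Summarize this ticket.", trials=5_000))
```

A single call would almost always pass here; only the repeated sampling exposes the small but nonzero failure rate.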
3. The Training Data Is Part of the Attack Surface
Data poisoning, backdoor attacks, and membership inference attacks operate at a layer most security teams have never had to think about. The “supply chain” for an AI system includes every document it was trained on.
What To Do About It
The honest answer: adapt your threat model to match probabilistic systems.
- Red team with adversarial prompts, not just functional test cases
- Treat model outputs as untrusted user input — validate downstream, always
- Monitor output distributions in production, not just errors
- Include the model card and training provenance in your supply chain review
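To make the second point concrete, here is a minimal sketch of downstream validation. It assumes the model is asked to return a JSON object with exactly two fields, `action` and `text`; the allowlist, the length limit, and every name in the block are hypothetical choices for illustration.

```python
import json

ALLOWED_ACTIONS = {"summarize", "translate"}  # hypothetical allowlist
MAX_TEXT_CHARS = 2_000  # hypothetical size limit


class UntrustedOutputError(ValueError):
    """Raised when model output fails downstream validation."""


def parse_model_action(raw_output: str) -> dict:
    """Validate raw model output before any downstream code acts on it."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise UntrustedOutputError("output is not valid JSON") from exc

    # Reject anything that is not exactly the expected shape.
    if not isinstance(data, dict) or set(data) != {"action", "text"}:
        raise UntrustedOutputError("unexpected structure or keys")
    action, text = data["action"], data["text"]
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise UntrustedOutputError("action is not on the allowlist")
    if not isinstance(text, str) or len(text) > MAX_TEXT_CHARS:
        raise UntrustedOutputError("text is mistyped or too long")
    return {"action": action, "text": text}


print(parse_model_action('{"action": "summarize", "text": "Q3 report"}'))
try:
    parse_model_action('{"action": "delete_all", "text": ""}')
except UntrustedOutputError as err:
    print("rejected:", err)
```

The design choice is the same one you would make for a web form: allowlist what is acceptable and reject everything else, instead of trying to enumerate bad outputs.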
The frameworks are catching up. MITRE ATLAS gives you an ATT&CK-style taxonomy of adversarial ML techniques. The OWASP Top 10 for LLM Applications covers the most common failure modes. But the mindset shift matters more than the framework: when behavior is probabilistic, security is statistical.