Day 5 of 21

AI Threat Modeling Frameworks

⏱ 20 min 📊 Medium CompTIA SecAI+ Prep

Welcome to Domain 2: Securing AI Systems — this domain represents 40% of the CY0-001 exam and spans the next 8 lessons. You will spend more time in Domain 2 than any other because this is where candidates pass or fail.

Today's lesson covers Objective 2.1: Threat Modeling and introduces the frameworks you will use throughout Domain 2 to identify, categorize, and prioritize AI threats. These are not theoretical constructs — they are actively used by security teams and actively tested on the exam.

OWASP LLM Top 10

The OWASP Top 10 for LLM Applications is the single most referenced framework on the SecAI+ exam. It identifies the ten most critical security risks specific to applications built on large language models.

LLM01: Prompt Injection. The most discussed LLM vulnerability. Attackers craft inputs that override the model's instructions. Direct injection manipulates user-facing prompts; indirect injection hides instructions in external data the model processes. This is the number one risk because it is difficult to fully prevent and can lead to data exfiltration, unauthorized actions, and complete bypass of safety controls.

LLM02: Insecure Output Handling. Applications that trust LLM output without validation are vulnerable. If model output is rendered as HTML, an attacker can achieve cross-site scripting through the model. If output is passed to a database query, SQL injection becomes possible. The fix is treating LLM output as untrusted user input — always validate and sanitize.

LLM03: Training Data Poisoning. Manipulating training data to introduce backdoors, biases, or vulnerabilities into the model. This is particularly dangerous because the effects are embedded in the model's weights and persist through fine-tuning.

LLM04: Model Denial of Service. Crafting inputs that consume excessive computational resources — extremely long prompts, recursive reasoning loops, or complex queries that maximize token generation. Unlike traditional DoS, model DoS can be achieved with a single well-crafted request.

LLM05: Supply Chain Vulnerabilities. Using compromised pre-trained models, poisoned datasets, or vulnerable plugins. The AI supply chain includes model registries, dataset repositories, and third-party integrations — each is an attack surface.

LLM06: Sensitive Information Disclosure. The model reveals confidential data from its training set, system prompts, or connected data sources. This can happen through direct questioning, prompt injection, or inference attacks that reconstruct training data.

LLM07: Insecure Plugin Design. Plugins extend LLM capabilities but often have excessive permissions, lack input validation, or fail to properly authenticate. A compromised or poorly designed plugin gives an attacker a bridge from the model to backend systems.

LLM08: Excessive Agency. The model has too many permissions, too much autonomy, or access to too many tools. When an LLM can execute code, send emails, or modify databases without human approval, a successful prompt injection becomes a full system compromise.

LLM09: Overreliance. Organizations trust LLM output without verification, leading to decisions based on hallucinated or incorrect information. Overreliance is a human factor vulnerability — the model does not need to be attacked; users simply trust it too much.

LLM10: Model Theft. Extracting the model's weights, architecture, or capabilities through repeated API queries. A stolen model can be analyzed to find vulnerabilities, used to generate training data, or deployed as a competing product.

Knowledge Check

An LLM-powered customer service bot generates a response containing a JavaScript snippet. The application renders this response in a web browser without sanitization, executing the script. Which OWASP LLM risk does this demonstrate?

This is LLM02: Insecure Output Handling. The vulnerability is not in the model itself but in the application's failure to sanitize model output before rendering it. The application treats LLM output as trusted HTML, enabling cross-site scripting. The fix is to treat all LLM output as untrusted input and sanitize it before rendering.

OWASP ML Security Top 10

While the LLM Top 10 focuses specifically on language model applications, the OWASP Machine Learning Security Top 10 covers broader ML attack vectors. These include classical ML attacks that do not require a language model.

Key entries include adversarial perturbation (subtle input modifications that cause misclassification), model inversion (reconstructing training data from model outputs), membership inference (determining if a specific data point was used in training), and model stealing (replicating a model through API queries).

The exam expects you to distinguish between the two OWASP lists. The LLM Top 10 is specific to generative AI applications. The ML Security Top 10 covers all machine learning models, including classification, regression, and clustering systems.

Knowledge Check

An attacker adds imperceptible noise to an image, causing a facial recognition system to misidentify the person. Which OWASP list addresses this attack type?

Adversarial perturbation against a computer vision model falls under the OWASP ML Security Top 10, not the LLM Top 10. The ML Security Top 10 covers all machine learning models including image classifiers. The LLM Top 10 is specific to language model applications.

MITRE ATLAS

MITRE ATLAS (Adversarial Threat Landscape for AI Systems) is the AI equivalent of MITRE ATT&CK. If you are familiar with ATT&CK for traditional cybersecurity, ATLAS follows the same structure but focuses on adversarial techniques targeting AI systems.

ATLAS organizes AI attacks into tactics (what the attacker is trying to achieve) and techniques (how they achieve it). Tactics include reconnaissance (gathering information about AI systems), resource development (acquiring tools and compute for attacks), initial access (gaining entry to AI infrastructure), and impact (the damage achieved).

ATLAS also includes case studies — real-world examples of adversarial AI attacks documented with specific techniques used, targets, and outcomes. These case studies are invaluable for threat modeling because they show how theoretical attacks play out in practice.

For the exam, know that ATLAS is the primary framework for mapping adversarial AI techniques to defensive controls. It provides a common language for describing AI attacks, just as ATT&CK provides a common language for traditional cyber attacks.

Knowledge Check

An organization wants to document AI-specific attack techniques using a framework that mirrors the structure of MITRE ATT&CK. Which resource should they use?

MITRE ATLAS follows the same tactics-and-techniques structure as MITRE ATT&CK but focuses specifically on adversarial AI attacks. It provides case studies and a common vocabulary for describing AI threats, making it the natural complement to ATT&CK for organizations already using that framework.

MIT AI Risk Repository and CVE AI Working Group

The MIT AI Risk Repository is an academic resource that categorizes AI risks by domain, severity, and likelihood. It provides a structured taxonomy of risks beyond just security — including societal, ethical, and operational risks. For the exam, understand that this repository is a risk identification resource, not a technical attack framework. It helps organizations understand the broader risk landscape when making governance decisions about AI deployment.

The CVE AI Working Group extends the Common Vulnerabilities and Exposures system to cover AI-specific vulnerabilities. Just as traditional software vulnerabilities get CVE identifiers, AI vulnerabilities — such as specific prompt injection techniques or model-specific weaknesses — are being cataloged in a standardized way. This enables security teams to track disclosed AI vulnerabilities using the same infrastructure they use for traditional software.

Side-by-side comparison of OWASP LLM Top 10 and MITRE ATLAS frameworks

The two most exam-tested AI security frameworks compared. Use both together for complete coverage.

STRIDE Applied to AI

STRIDE is a classic threat modeling framework that maps well to AI architectures. Each STRIDE category has specific AI implications.

Spoofing — An attacker impersonates a legitimate user, model, or data source. In AI contexts, this includes adversarial examples that trick a model into misidentifying inputs, or spoofed data sources that inject poisoned training data.

Tampering — Unauthorized modification of data or models. Data poisoning and model tampering are direct examples. An attacker who can modify model weights, training data, or inference inputs can fundamentally alter system behavior.

Repudiation — Actions that cannot be attributed to a specific actor. AI systems often lack comprehensive audit trails, making it difficult to determine who submitted a specific prompt, who modified training data, or who approved a model deployment.

Information Disclosure — Unauthorized access to sensitive information. AI-specific examples include training data extraction, system prompt leakage, membership inference attacks, and model inversion.

Denial of Service — Making the system unavailable. Model DoS attacks use carefully crafted inputs to exhaust computational resources. A single complex prompt can consume more GPU time than thousands of normal requests.

Elevation of Privilege — Gaining unauthorized capabilities. In AI systems, this often manifests as excessive agency — where a prompt injection causes the model to use tools or access data beyond its intended scope.

Knowledge Check

A security team is conducting STRIDE analysis on their AI chatbot. An attacker sends prompts that cause the chatbot to access an admin API it was not designed to use. Which STRIDE category does this fall under?

The attacker is gaining unauthorized capabilities — causing the chatbot to use an admin API beyond its intended scope. This maps to Elevation of Privilege in the STRIDE model. The chatbot's permissions are effectively escalated through prompt manipulation.

Knowledge Check

Which framework would BEST help an organization create a comprehensive list of AI-specific adversarial techniques with associated case studies?

MITRE ATLAS specifically catalogs adversarial AI techniques with real-world case studies. STRIDE is a general threat modeling framework, OWASP LLM Top 10 covers the top 10 risks but lacks detailed technique enumeration, and the MIT AI Risk Repository focuses on broader risk categorization rather than specific attack techniques.

🛡️

Day 5 Complete

"You now know the core frameworks: OWASP LLM Top 10 for language model risks, OWASP ML Security Top 10 for broader ML attacks, MITRE ATLAS for adversarial technique mapping, and STRIDE adapted for AI architectures. These frameworks are the foundation for everything in Domain 2."

Next Lesson

Implementing Security Controls for AI Models and Gateways

→