Welcome to Domain 2: Securing AI Systems — this domain represents 40% of the CY0-001 exam and spans the next 8 lessons. You will spend more time in Domain 2 than any other because this is where candidates pass or fail.
Today's lesson covers Objective 2.1: Threat Modeling and introduces the frameworks you will use throughout Domain 2 to identify, categorize, and prioritize AI threats. These are not theoretical constructs — they are actively used by security teams and actively tested on the exam.
The OWASP Top 10 for LLM Applications is the single most referenced framework on the SecAI+ exam. It identifies the ten most critical security risks specific to applications built on large language models.
LLM01: Prompt Injection. The most discussed LLM vulnerability. Attackers craft inputs that override the model's instructions. Direct injection manipulates user-facing prompts; indirect injection hides instructions in external data the model processes. This is the number one risk because it is difficult to fully prevent and can lead to data exfiltration, unauthorized actions, and complete bypass of safety controls.
LLM02: Insecure Output Handling. Applications that trust LLM output without validation are vulnerable. If model output is rendered as HTML, an attacker can achieve cross-site scripting through the model. If output is passed to a database query, SQL injection becomes possible. The fix is treating LLM output as untrusted user input — always validate and sanitize.
LLM03: Training Data Poisoning. Manipulating training data to introduce backdoors, biases, or vulnerabilities into the model. This is particularly dangerous because the effects are embedded in the model's weights and persist through fine-tuning.
LLM04: Model Denial of Service. Crafting inputs that consume excessive computational resources — extremely long prompts, recursive reasoning loops, or complex queries that maximize token generation. Unlike traditional DoS, model DoS can be achieved with a single well-crafted request.
LLM05: Supply Chain Vulnerabilities. Using compromised pre-trained models, poisoned datasets, or vulnerable plugins. The AI supply chain includes model registries, dataset repositories, and third-party integrations — each is an attack surface.
LLM06: Sensitive Information Disclosure. The model reveals confidential data from its training set, system prompts, or connected data sources. This can happen through direct questioning, prompt injection, or inference attacks that reconstruct training data.
LLM07: Insecure Plugin Design. Plugins extend LLM capabilities but often have excessive permissions, lack input validation, or fail to properly authenticate. A compromised or poorly designed plugin gives an attacker a bridge from the model to backend systems.
LLM08: Excessive Agency. The model has too many permissions, too much autonomy, or access to too many tools. When an LLM can execute code, send emails, or modify databases without human approval, a successful prompt injection becomes a full system compromise.
LLM09: Overreliance. Organizations trust LLM output without verification, leading to decisions based on hallucinated or incorrect information. Overreliance is a human factor vulnerability — the model does not need to be attacked; users simply trust it too much.
LLM10: Model Theft. Extracting the model's weights, architecture, or capabilities through repeated API queries. A stolen model can be analyzed to find vulnerabilities, used to generate training data, or deployed as a competing product.
While the LLM Top 10 focuses specifically on language model applications, the OWASP Machine Learning Security Top 10 covers broader ML attack vectors. These include classical ML attacks that do not require a language model.
Key entries include adversarial perturbation (subtle input modifications that cause misclassification), model inversion (reconstructing training data from model outputs), membership inference (determining if a specific data point was used in training), and model stealing (replicating a model through API queries).
The exam expects you to distinguish between the two OWASP lists. The LLM Top 10 is specific to generative AI applications. The ML Security Top 10 covers all machine learning models, including classification, regression, and clustering systems.
MITRE ATLAS (Adversarial Threat Landscape for AI Systems) is the AI equivalent of MITRE ATT&CK. If you are familiar with ATT&CK for traditional cybersecurity, ATLAS follows the same structure but focuses on adversarial techniques targeting AI systems.
ATLAS organizes AI attacks into tactics (what the attacker is trying to achieve) and techniques (how they achieve it). Tactics include reconnaissance (gathering information about AI systems), resource development (acquiring tools and compute for attacks), initial access (gaining entry to AI infrastructure), and impact (the damage achieved).
ATLAS also includes case studies — real-world examples of adversarial AI attacks documented with specific techniques used, targets, and outcomes. These case studies are invaluable for threat modeling because they show how theoretical attacks play out in practice.
For the exam, know that ATLAS is the primary framework for mapping adversarial AI techniques to defensive controls. It provides a common language for describing AI attacks, just as ATT&CK provides a common language for traditional cyber attacks.
The MIT AI Risk Repository is an academic resource that categorizes AI risks by domain, severity, and likelihood. It provides a structured taxonomy of risks beyond just security — including societal, ethical, and operational risks. For the exam, understand that this repository is a risk identification resource, not a technical attack framework. It helps organizations understand the broader risk landscape when making governance decisions about AI deployment.
The CVE AI Working Group extends the Common Vulnerabilities and Exposures system to cover AI-specific vulnerabilities. Just as traditional software vulnerabilities get CVE identifiers, AI vulnerabilities — such as specific prompt injection techniques or model-specific weaknesses — are being cataloged in a standardized way. This enables security teams to track disclosed AI vulnerabilities using the same infrastructure they use for traditional software.
STRIDE is a classic threat modeling framework that maps well to AI architectures. Each STRIDE category has specific AI implications.
Spoofing — An attacker impersonates a legitimate user, model, or data source. In AI contexts, this includes adversarial examples that trick a model into misidentifying inputs, or spoofed data sources that inject poisoned training data.
Tampering — Unauthorized modification of data or models. Data poisoning and model tampering are direct examples. An attacker who can modify model weights, training data, or inference inputs can fundamentally alter system behavior.
Repudiation — Actions that cannot be attributed to a specific actor. AI systems often lack comprehensive audit trails, making it difficult to determine who submitted a specific prompt, who modified training data, or who approved a model deployment.
Information Disclosure — Unauthorized access to sensitive information. AI-specific examples include training data extraction, system prompt leakage, membership inference attacks, and model inversion.
Denial of Service — Making the system unavailable. Model DoS attacks use carefully crafted inputs to exhaust computational resources. A single complex prompt can consume more GPU time than thousands of normal requests.
Elevation of Privilege — Gaining unauthorized capabilities. In AI systems, this often manifests as excessive agency — where a prompt injection causes the model to use tools or access data beyond its intended scope.