OWASP Top 10 For LLMs
The OWASP Top 10 for LLMs landed quietly in 2023 and has since become the reference document that security teams reach for when asked to evaluate an LLM deployment. Most organizations treat it as a checklist. The ones operating at scale treat it as a map of failure modes they've already started hitting. The gap between reading the list and understanding what each risk actually looks like in prod…
- Indirect prompt injection, injected through retrieved data, not user input, is more dangerous in practice than direct injection.
- LLM output is untrusted input for any downstream execution context, SQL, shell, API, rendering engine.
- Access control for sensitive data must live at the retrieval layer, not the generation layer.
- Excessive agency failures don't require adversarial input, they happen when capability exceeds defined task scope.
- Overreliance is a workflow design failure, not a model failure, it requires process controls, not just technical ones.
LLM01 Prompt Injection: The Attack Surface Is Your Data
Direct prompt injection, a user crafting malicious input to manipulate the model, gets most of the attention. Indirect prompt injection is more dangerous in practice. It happens when the LLM processes external data, documents, emails, web pages, database records, that contains embedded instructions designed to override system behavior. In a RAG deployment, this means every document in your corpus is a potential attack surface. An injected instruction in a PDF that your agent retrieves and summa
LLM02 Insecure Output Handling: The Response Is Not the Risk
Insecure output handling is misread as a content moderation problem. It isn't. It's a downstream processing problem. The risk is not what the LLM says, it's what happens to that output in the system that receives it. When LLM output is passed directly to a SQL query, a shell command, a JavaScript renderer, or an API call without sanitization, the output becomes an injection vector into that downstream system. I've seen this surface as indirect SQL injection through natural language query interf
LLM06 Sensitive Information Disclosure: Context Windows Leak
Sensitive information disclosure doesn't require an attacker. It happens because LLMs are trained on or given context that includes sensitive data, and they recite that data in response to queries that didn't ask for it directly. The model has no access control layer, if something is in the context window, it can come out in the response. In RAG systems, this manifests as the retrieval step pulling documents with sensitive content that are semantically relevant but not appropriate for the query
Frequently asked questions
- Which OWASP LLM risk causes the most production incidents?
- From what I've observed, LLM08 (Excessive Agency) and LLM06 (Sensitive Information Disclosure) generate the most actual production incidents, not LLM01 (Prompt Injection), which gets the most attention. Excessive agency failures happen in internal deployments with no adversarial activity. Sensitive information disclosure happens silently through m…
- Does the OWASP Top 10 for LLMs apply to internal-only deployments?
- Yes, and in some ways more so. The risks that require adversarial input, like direct prompt injection, are reduced in authenticated internal environments. But the risks that don't require adversarial input, excessive agency, sensitive information disclosure, overreliance, are just as present and often more severe because internal deployments tend …
- How do you prioritize which OWASP LLM risks to address first?
- Prioritize based on your deployment's specific capability profile. If your LLM has tool use or executes actions, address LLM08 (Excessive Agency) and LLM01 (Prompt Injection) first. If it processes sensitive data through RAG or retrieval, address LLM06 (Sensitive Information Disclosure) first. If it's a decision-support system for consequential ch…
- Is prompt injection solved by using a more capable model?
- No. More capable models are generally more susceptible to sophisticated prompt injection, not less, because they're better at following complex instructions, including injected ones. Model capability doesn't substitute for architectural separation of instruction and data processing paths. The mitigation is structural, not a model selection decisio…