π§ OWASP LLM Security Skill
π― Purpose
Enforce LLM-specific security controls aligned with OWASP LLM Top 10 2025.
Key Principle: "LLMs introduce unique security risks requiring specialized controls."
π Scope
-
π Prompt Injection Prevention
-
πΎ Insecure Output Handling
-
ποΈ Model Denial of Service
-
οΏ½οΏ½ Supply Chain Vulnerabilities
-
π Insecure Plugin Design
-
π Excessive Agency
-
π‘οΈ Data Leakage Prevention
βοΈ Security Rules
MUST Requirements
owasp_llm_top_10_controls: llm01_prompt_injection: input_validation: - sanitize_user_input: remove_injection_attempts - prompt_templates: use_parameterized_prompts - context_isolation: separate_user_system_contexts
detection:
- monitor_outputs: alert_on_suspicious_responses
- log_inputs: track_all_prompts_for_analysis
- anomaly_detection: flag_unusual_input_patterns
llm02_insecure_output_handling: output_validation: - sanitize_before_render: html_encode_llm_outputs - validate_format: check_expected_output_structure - xss_prevention: never_trust_llm_generated_html
secure_integration:
- escaped_rendering: use_safe_templating_engines
- csp_headers: content_security_policy_strict
llm03_training_data_poisoning: data_quality: - source_validation: verify_training_data_sources - adversarial_testing: test_for_backdoor_triggers - data_provenance: track_dataset_origins
monitoring:
- model_behavior: detect_unexpected_outputs
- regular_retraining: update_with_clean_datasets
llm04_model_denial_of_service: rate_limiting: - per_user_limits: 100_requests_per_hour - cost_caps: maximum_tokens_per_request - timeout_enforcement: 30_second_maximum_response_time
resource_management:
- queue_management: priority_queues_for_critical_users
- circuit_breakers: auto_disable_on_abuse_detection
llm05_supply_chain_vulnerabilities: third_party_models: - vendor_assessment: security_evaluation_before_use - model_provenance: verify_official_sources_only - sbom: software_bill_of_materials_for_ai_components
monitoring:
- dependency_scanning: check_for_vulnerable_libraries
- model_updates: track_security_patches_from_vendors
llm06_sensitive_info_disclosure: data_protection: - no_pii_training: never_train_on_personal_data - output_filtering: redact_potential_secrets_in_responses - context_limits: limit_context_window_to_reduce_leakage
testing:
- red_team: attempt_to_extract_training_data
- regression_tests: verify_no_memorization_of_secrets
llm07_insecure_plugin_design: plugin_security: - least_privilege: plugins_minimal_permissions_required - input_validation: validate_all_plugin_inputs - authentication: require_auth_for_plugin_execution
review_process:
- security_review: all_plugins_security_audited
- sandboxing: isolate_plugin_execution_environment
llm08_excessive_agency: authorization: - human_approval: require_approval_for_critical_actions - scope_limits: restrict_llm_to_read_only_operations - audit_trail: log_all_llm_initiated_actions
safeguards:
- action_validation: confirm_intended_action_before_execution
- rollback_capability: undo_mechanism_for_llm_actions
llm09_overreliance: human_oversight: - fact_checking: verify_llm_outputs_before_trust - disclaimer: inform_users_llm_may_hallucinate - critical_decisions: never_fully_automate_without_review
llm10_model_theft: access_controls: - api_authentication: require_strong_auth_for_model_access - rate_limiting: prevent_model_extraction_via_queries - watermarking: embed_watermarks_in_model_outputs
MUST NOT Prohibitions
prohibited_llm_practices:
- unvalidated_prompts: accepting_raw_user_input_to_llm
- trusting_outputs: using_llm_responses_without_validation
- no_rate_limits: allowing_unlimited_api_calls
- exposing_models: public_access_to_model_weights
- training_on_secrets: including_api_keys_in_training_data
- unmonitored_usage: no_logging_or_alerting_for_abuse
π‘ Examples
Example 1: Prompt Injection Prevention
Secure LLM Integration with Input Validation
import re from typing import Optional
def sanitize_user_input(user_input: str) -> str: """Remove potential prompt injection attempts""" # Remove system-like instructions injection_patterns = [ r"ignore previous instructions", r"disregard above", r"you are now", r"system:", r"admin:", ]
cleaned = user_input
for pattern in injection_patterns:
cleaned = re.sub(pattern, "", cleaned, flags=re.IGNORECASE)
# Limit length to prevent token exhaustion
max_length = 500
cleaned = cleaned[:max_length]
return cleaned.strip()
def safe_llm_query(user_question: str) -> Optional[str]: """Safely query LLM with validated input""" # Validate and sanitize input sanitized_input = sanitize_user_input(user_question)
if not sanitized_input:
return "Invalid input detected"
# Use parameterized prompt template
system_prompt = "You are a helpful assistant. Answer factually and concisely."
user_prompt = f"User question: {sanitized_input}"
# Query LLM with separated contexts
response = query_llm(
system=system_prompt,
user=user_prompt,
max_tokens=150,
temperature=0.7
)
# Validate output before returning
if contains_suspicious_content(response):
log_security_event("Suspicious LLM output detected", response)
return "Response blocked for security reasons"
return html_escape(response) # XSS prevention
Example 2: Rate Limiting and DoS Prevention
LLM API Rate Limiting
from flask import Flask, request, jsonify from flask_limiter import Limiter from flask_limiter.util import get_remote_address import time
app = Flask(name)
Rate limiter configuration
limiter = Limiter( app=app, key_func=get_remote_address, default_limits=["100 per hour", "10 per minute"], storage_uri="redis://localhost:6379" )
Cost-based limiting
MAX_TOKENS_PER_REQUEST = 1000 MAX_COST_PER_USER_PER_DAY = 100 # USD
@app.route('/api/llm/query', methods=['POST']) @limiter.limit("10 per minute") def llm_query(): """LLM API endpoint with comprehensive DoS protection""" user_id = request.headers.get('X-User-ID') user_input = request.json.get('prompt', '')
# Token limit enforcement
estimated_tokens = len(user_input.split()) * 1.3 # Rough estimate
if estimated_tokens > MAX_TOKENS_PER_REQUEST:
return jsonify({
"error": "Request exceeds maximum token limit"
}), 400
# Daily cost limit check
user_cost_today = get_user_cost_today(user_id)
if user_cost_today >= MAX_COST_PER_USER_PER_DAY:
return jsonify({
"error": "Daily cost limit exceeded"
}), 429
# Timeout enforcement
start_time = time.time()
timeout_seconds = 30
try:
response = query_llm_with_timeout(
user_input,
timeout=timeout_seconds
)
# Track usage cost
cost = calculate_usage_cost(user_input, response)
record_user_cost(user_id, cost)
return jsonify({
"response": response,
"cost": cost,
"remaining_daily_quota": MAX_COST_PER_USER_PER_DAY - user_cost_today
})
except TimeoutError:
return jsonify({
"error": "Request timed out"
}), 504
Example 3: Sensitive Data Leakage Prevention
Output Filtering to Prevent Data Leakage
import re from typing import List, Tuple
Patterns for sensitive data detection
SENSITIVE_PATTERNS = { 'api_key': r'[A-Za-z0-9]{32,}', 'email': r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+.[a-zA-Z]{2,}', 'ssn': r'\b\d{3}-\d{2}-\d{4}\b', 'credit_card': r'\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b', 'aws_key': r'AKIA[0-9A-Z]{16}', }
def detect_sensitive_data(text: str) -> List[Tuple[str, str]]: """Detect potential sensitive data in LLM output""" findings = []
for data_type, pattern in SENSITIVE_PATTERNS.items():
matches = re.findall(pattern, text)
for match in matches:
findings.append((data_type, match))
return findings
def redact_sensitive_data(text: str) -> str: """Redact sensitive data from LLM output""" redacted = text
for data_type, pattern in SENSITIVE_PATTERNS.items():
redacted = re.sub(pattern, f"[REDACTED_{data_type.upper()}]", redacted)
return redacted
def safe_llm_output_handler(llm_response: str) -> dict: """Validate and sanitize LLM output before display""" # Detect sensitive data sensitive_findings = detect_sensitive_data(llm_response)
if sensitive_findings:
# Log security incident
log_security_event(
"Sensitive data detected in LLM output",
findings=sensitive_findings
)
# Redact sensitive data
safe_response = redact_sensitive_data(llm_response)
return {
"response": safe_response,
"warning": "Some content was redacted for security",
"redacted_count": len(sensitive_findings)
}
# HTML escape to prevent XSS
return {
"response": html_escape(llm_response),
"warning": None
}
π Integration
Policies: AI Policy
Skills: ai-governance, secure-development, data-classification
Frameworks: OWASP LLM Top 10 2025, ISO 27001 A.14
π Document Control
-
Version: 1.0 | Updated: 2026-02-10
-
License: Apache-2.0
-
Classification: