Day 16 of 18

Security Controls and Monitoring for Deployed AI

⏱ 18 min 📊 Advanced ISACA AAISM Certification Prep

Deployment is where governance becomes operational. Today we cover the monitoring stack for production AI, controls for specific attack types, and how to design alert thresholds and escalation procedures that actually work.

The AI monitoring stack

Production AI monitoring extends beyond traditional application monitoring:

What to monitor:

- Model performance — Accuracy, precision, recall, F1 score. Track against baseline established at deployment.

- Data drift — Changes in input data distribution compared to training data. Drift indicates the model may be operating outside its validated parameters.

- Output distribution — Changes in the pattern of model outputs. A fraud detection model suddenly flagging 50% more transactions needs investigation.

- Fairness metrics — Ongoing monitoring for demographic disparity in model outcomes.

- Latency and throughput — Model inference performance. Degradation may indicate resource issues or attacks.

- Security events — Unusual access patterns, rate anomalies, potential adversarial inputs.

How often to monitor:

- Real-time: Security events, latency, availability

- Daily: Output distribution, performance trends

- Weekly: Data drift, fairness metrics

- Monthly: Comprehensive performance review against baselines
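One common way to implement the weekly data-drift check is the Population Stability Index (PSI) over binned feature distributions. A minimal sketch follows; the 0.1/0.25 cut-offs are widely used rules of thumb, not AAISM-mandated values, and the bin counts in the example are hypothetical:

```python
import math

def psi(baseline_counts, current_counts):
    """Population Stability Index between two binned distributions.
    Rule-of-thumb reading: < 0.1 stable, 0.1-0.25 moderate drift,
    > 0.25 significant drift (convention, not a standard)."""
    b_total = sum(baseline_counts)
    c_total = sum(current_counts)
    score = 0.0
    for b, c in zip(baseline_counts, current_counts):
        # Small floor avoids log/division problems for empty bins.
        p = max(b / b_total, 1e-6)
        q = max(c / c_total, 1e-6)
        score += (q - p) * math.log(q / p)
    return score

# Identical distributions produce a PSI of zero.
print(round(psi([25, 50, 25], [25, 50, 25]), 4))  # 0.0
```

A breach of the chosen cut-off would feed the weekly review rather than trigger automatic retraining, consistent with investigating root cause first.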

[Image: AI monitoring stack table showing metrics, frequency, and reviewer for each monitoring dimension.]
The complete AI monitoring stack — from real-time security events to monthly compliance reviews.

Controls by attack type

Implement controls targeted at specific AI attack vectors:

Prompt injection (for LLMs) — Input sanitization, system prompt protection, prompt firewalls that detect injection patterns, output filtering to prevent information leakage from successful injections.

Data poisoning — Data source validation, anomaly detection on training data, training pipeline access controls, model validation after retraining.

Model evasion — Adversarial robustness testing before deployment, input anomaly detection, ensemble methods that are harder to evade, monitoring for systematic evasion patterns.

Model extraction — API rate limiting, query pattern monitoring (detecting systematic probing), watermarking model outputs, limiting output precision (don't return exact probabilities).

Membership inference — Differential privacy in training, output perturbation, limiting the information returned with predictions.

For each attack type, implement preventive controls (stop the attack) and detective controls (detect the attack if prevention fails).
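As one concrete preventive/detective pairing for model extraction, a sliding-window query monitor can sit behind the API rate limiter. This is a sketch only; the thresholds and the source identifier are illustrative:

```python
from collections import defaultdict, deque
import time

class QueryPatternMonitor:
    """Detective control sketch: flag sources issuing many queries
    within a short window (possible model-extraction probing).
    Thresholds are illustrative policy choices."""
    def __init__(self, max_queries=100, window_seconds=60):
        self.max_queries = max_queries
        self.window = window_seconds
        self.history = defaultdict(deque)  # source_id -> timestamps

    def record(self, source_id, now=None):
        now = time.time() if now is None else now
        q = self.history[source_id]
        q.append(now)
        # Drop timestamps that have aged out of the sliding window.
        while q and now - q[0] > self.window:
            q.popleft()
        return len(q) > self.max_queries  # True = raise a security alert

mon = QueryPatternMonitor(max_queries=3, window_seconds=60)
flags = [mon.record("203.0.113.7", now=t) for t in (0, 1, 2, 3)]
print(flags)  # [False, False, False, True]
```

In practice the alert would also carry query similarity features, since extraction attempts tend to be systematic variations rather than raw volume alone.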

Knowledge Check
Monitoring detects an unusual pattern: thousands of carefully crafted queries to your AI model API from a single source, each slightly different. The queries seem designed to map the model's decision boundaries. What attack is MOST likely being attempted?
**Model extraction.** Systematic probing with carefully crafted queries that map decision boundaries is the signature of a model extraction attack. The attacker is trying to build a replica of your model by observing its responses to varied inputs. Controls: rate limiting, query pattern detection, output perturbation.

Performance drift monitoring

Performance drift is one of the most common and insidious AI risks:

Types of drift:

- Data drift — Input data distribution changes over time. Example: a model trained on pre-pandemic data encounters post-pandemic patterns.

- Concept drift — The relationship between inputs and correct outputs changes. Example: what constitutes "normal" network behavior evolves as infrastructure changes.

- Model decay — Gradual degradation in model performance without a specific cause. Entropy over time.

Monitoring approach:

- Establish baseline metrics at deployment

- Define acceptable drift ranges (thresholds)

- Monitor continuously against baselines

- Alert when thresholds are breached

- Investigate root cause before retraining

Key principle: Drift doesn't always mean the model is wrong — it means the model is operating outside its validated parameters. Investigation is needed before action.
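The baseline-and-threshold approach above can be sketched as a single check. Comparing cumulatively against the deployment baseline (rather than only period-over-period) keeps slow decay from slipping under the thresholds; the 2%/5% values mirror this lesson's examples:

```python
def drift_status(baseline, current, warn=0.02, action=0.05):
    """Compare a performance metric against its deployment baseline.
    Measures cumulative drift so gradual decay is not masked by
    small month-over-month steps. Thresholds are illustrative."""
    drop = baseline - current
    if drop >= action:
        return "action"
    if drop >= warn:
        return "warning"
    return "ok"

# Each monthly step from 94% may look small, but the cumulative
# drop from baseline crosses the warning threshold.
print(drift_status(0.94, 0.91))  # warning
```

The returned status drives investigation, not automatic retraining, per the key principle above.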

Alert thresholds and escalation

Effective alerting prevents both alert fatigue and missed incidents:

Threshold design:

- Warning threshold — Early indicator of potential issue. Triggers investigation but not immediate action. Example: model accuracy drops 2% from baseline.

- Action threshold — Significant deviation requiring response. Triggers predefined response procedure. Example: model accuracy drops 5% from baseline.

- Critical threshold — Severe deviation requiring immediate action. Triggers containment procedures. Example: model accuracy drops 10% from baseline, or a fairness metric is breached.

Escalation matrix:

- Warning → ML engineering team for investigation

- Action → Security manager + ML lead for response decision

- Critical → CISO + business stakeholders for containment and communication
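The threshold tiers and escalation matrix can be wired together in one routing function. The 2%/5%/10% values follow this lesson's examples; the routing strings simply mirror the matrix above:

```python
def escalate(accuracy_drop, fairness_breach=False):
    """Map a deviation to its threshold tier and escalation target.
    Thresholds and routing are illustrative, per the lesson's examples."""
    if fairness_breach or accuracy_drop >= 0.10:
        return ("critical", "CISO + business stakeholders")
    if accuracy_drop >= 0.05:
        return ("action", "Security manager + ML lead")
    if accuracy_drop >= 0.02:
        return ("warning", "ML engineering team")
    return ("none", None)

print(escalate(0.06))  # ('action', 'Security manager + ML lead')
```

Note that a fairness breach escalates straight to critical regardless of accuracy, matching the critical-threshold definition above.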

Alert fatigue prevention:

- Set thresholds based on statistical significance, not arbitrary numbers

- Use trend analysis in addition to point-in-time thresholds

- Correlate alerts across multiple metrics before escalating

- Review and adjust thresholds quarterly based on operational experience
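Alert correlation can be sketched as requiring multiple distinct metrics to breach within a shared window before escalating. Both parameters below are illustrative policy choices, not prescribed values:

```python
from collections import deque

class AlertCorrelator:
    """Sketch: escalate only when at least `min_metrics` distinct
    metrics alert within the same window, cutting false positives
    from any single noisy signal."""
    def __init__(self, min_metrics=2, window_seconds=3600):
        self.min_metrics = min_metrics
        self.window = window_seconds
        self.alerts = deque()  # (timestamp, metric_name)

    def add(self, metric, ts):
        self.alerts.append((ts, metric))
        # Keep only alerts inside the correlation window.
        while self.alerts and ts - self.alerts[0][0] > self.window:
            self.alerts.popleft()
        distinct = {m for _, m in self.alerts}
        return len(distinct) >= self.min_metrics  # True = escalate

c = AlertCorrelator()
print(c.add("accuracy", ts=0))       # False: one metric alone
print(c.add("output_dist", ts=600))  # True: two metrics in window
```

This is the same principle as SIEM correlation rules: one signal prompts a look, corroborated signals prompt an escalation.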

Knowledge Check
A production AI model's accuracy has been slowly declining: 94% → 93% → 92% → 91% over four months. Each monthly drop is below the 2% warning threshold. The cumulative decline from baseline is now 3%. How should this be handled?
**Trend monitoring catches gradual drift.** Individual period thresholds alone miss slow degradation. Monitoring should include cumulative drift from baseline, not just period-over-period changes. The investigation determines whether this is normal decay requiring retraining or a systematic issue requiring deeper analysis.

GenAI-specific controls

Generative AI requires controls beyond traditional AI monitoring:

Output filtering — Real-time screening of generated content for harmful, biased, inappropriate, or confidential content. Multiple filtering layers: keyword matching, semantic analysis, and policy-based rules.

Content classification — Classify generated outputs by risk level. Automated decisions about which outputs can be delivered directly and which require human review.

Conversation monitoring — For chatbots and conversational AI: monitor conversation patterns, detect attempts to manipulate the AI, and flag conversations that approach policy boundaries.

Hallucination detection — Monitor for factually incorrect but confidently stated outputs. Compare generated claims against verified sources where possible.

Audit logging — Log all inputs and outputs (with appropriate privacy controls). Essential for incident investigation, compliance, and understanding how the AI is being used.
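The keyword layer of output filtering can be sketched as a denylist scan over generated text. The patterns below are hypothetical examples; real deployments layer semantic analysis and policy rules on top of this first pass:

```python
import re

# Hypothetical denylist; production lists are policy-driven and maintained.
BLOCKED_PATTERNS = [
    re.compile(r"\b(?:ssn|social security number)\b", re.IGNORECASE),
    re.compile(r"\bconfidential\b", re.IGNORECASE),
]

def filter_output(text):
    """Keyword layer of an output filter: block generated text that
    matches any denylisted pattern; otherwise pass it through."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(text):
            return ("blocked", pattern.pattern)
    return ("allowed", None)

print(filter_output("The forecast looks stable."))            # ('allowed', None)
print(filter_output("Here is the CONFIDENTIAL roadmap.")[0])  # blocked
```

Blocked outputs would be routed to human review and the event written to the audit log, tying this control to the logging requirement above.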

Final Check
An organization has deployed comprehensive monitoring for their AI systems but receives over 500 alerts per week, of which fewer than 5% result in actual issues. What is the MOST effective improvement?
**Smarter alerting, not less monitoring.** Reducing monitoring coverage or raising all thresholds creates blind spots. Alert correlation (requiring multiple metrics to align) and statistical significance testing reduce false positives while maintaining detection capability. This is the same principle behind effective SIEM tuning.
📡
Day 16 Complete
"Monitoring isn't just watching dashboards — it's a system of defined thresholds, escalation procedures, and trend analysis designed to catch both sudden failures and gradual drift."
Next Lesson
Domain 3 Capstone: Controls Assessment