Today's capstone integrates everything from Domain IV. You'll work through building a monitoring and incident runbook for a deployed AI system and answer 10 scenario-based practice questions.
Background: MedTech Solutions has deployed SafeScreen AI, a high-risk AI system that screens mammography images for potential breast cancer. The system provides a risk score (0–100) and a recommendation (further review, routine follow-up, or clear). It operates in hospitals across the EU and US.
Deployment model: Human-on-the-loop — radiologists review all cases flagged as "further review" (score > 70) before patient notification. Cases scored "clear" (score < 30) proceed through the standard workflow. Cases in the middle range (30–70) are queued for radiologist review within 48 hours.
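The routing rule above can be written as a small function. This is a minimal sketch, not SafeScreen AI's actual implementation: the function name `route_case` and the lane labels are hypothetical, and it assumes scores of exactly 30 and exactly 70 fall in the middle range, as the stated cut-offs imply.

```python
def route_case(score: float) -> str:
    """Map a risk score (0-100) to a workflow lane (lane names are hypothetical)."""
    if not 0 <= score <= 100:
        raise ValueError(f"risk score out of range: {score}")
    if score > 70:
        # Radiologist reviews the case before the patient is notified.
        return "further_review"
    if score < 30:
        # Proceeds through the standard workflow.
        return "clear"
    # 30 to 70 inclusive: queued for radiologist review within 48 hours.
    return "queued_review_48h"
```

Writing the boundaries out this way makes it obvious which lane a borderline score lands in, which is worth confirming against the deployed system before monitoring is built on top of it.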
Current state: The system has been deployed for 3 months. No formal monitoring framework or incident response plan exists.
Define the monitoring framework for SafeScreen AI:
Performance KPIs:
- Sensitivity (true positive rate): Baseline 94.5%, threshold: never below 92%
- Specificity (true negative rate): Baseline 88.2%, threshold: never below 85%
- False negative rate: Baseline 5.5%, threshold: alert above 6%, halt above 8%
- Processing time: Baseline 12 seconds, threshold: alert above 30 seconds
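As a rough illustration, the performance KPIs can be computed from confusion-matrix counts and checked against the thresholds listed above. The class and function names here are hypothetical, and the sketch assumes ground-truth labels (for example, confirmed diagnoses) are available for the period being evaluated.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # cancers correctly flagged
    fn: int  # cancers missed
    tn: int  # healthy cases correctly cleared
    fp: int  # healthy cases flagged


def performance_kpis(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity and false negative rate, in percent."""
    if c.tp + c.fn == 0 or c.tn + c.fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = 100 * c.tp / (c.tp + c.fn)
    specificity = 100 * c.tn / (c.tn + c.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_negative_rate": 100 - sensitivity,
    }


def kpi_breaches(kpis: dict, processing_seconds: float) -> list:
    """List which of the thresholds from the framework are breached."""
    breaches = []
    if kpis["sensitivity"] < 92.0:
        breaches.append("sensitivity below 92%")
    if kpis["specificity"] < 85.0:
        breaches.append("specificity below 85%")
    if kpis["false_negative_rate"] > 8.0:
        breaches.append("false negative rate above 8% (halt)")
    elif kpis["false_negative_rate"] > 6.0:
        breaches.append("false negative rate above 6% (alert)")
    if processing_seconds > 30:
        breaches.append("processing time above 30 seconds")
    return breaches
```

Because the false negative rate is 100% minus sensitivity, the 92% sensitivity floor and the 8% halt level describe the same boundary; the 6% alert level is the earlier warning.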
Fairness metrics:
- Sensitivity by age group (under 40, 40–60, over 60): gap threshold ≤ 3 percentage points
- Sensitivity by ethnicity: gap threshold ≤ 3 percentage points
- False negative rate by demographics: gap threshold ≤ 2 percentage points
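The fairness checks can be sketched as follows. This example assumes the "gap" is the difference between the best and worst performing group, measured in percentage points; the function names and the sample counts are illustrative only.

```python
def sensitivity_by_group(counts: dict) -> dict:
    """counts maps group name -> (true_positives, false_negatives)."""
    return {g: 100 * tp / (tp + fn) for g, (tp, fn) in counts.items()}


def max_gap(rates: dict) -> float:
    """Largest difference between any two groups, in percentage points."""
    return max(rates.values()) - min(rates.values())


def fairness_flags(counts: dict,
                   sens_gap_limit: float = 3.0,
                   fnr_gap_limit: float = 2.0) -> dict:
    sens = sensitivity_by_group(counts)
    fnr = {g: 100 - s for g, s in sens.items()}
    return {
        "sensitivity_gap": max_gap(sens),
        "fnr_gap": max_gap(fnr),
        "sensitivity_gap_exceeded": max_gap(sens) > sens_gap_limit,
        "fnr_gap_exceeded": max_gap(fnr) > fnr_gap_limit,
    }


# Illustrative counts, not real SafeScreen AI data.
example = {"under_40": (185, 15), "40_60": (190, 10), "over_60": (285, 15)}
flags = fairness_flags(example)
```

With these made-up counts the sensitivity gap is 2.5 points, inside the 3-point limit, yet the false negative rate gap is also 2.5 points and exceeds the 2-point limit. Since the false negative rate is the complement of sensitivity, the two gaps are always equal for the same grouping, so the stricter 2-point limit is the one that binds in practice.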
Drift detection:
- Input data distribution comparison: weekly statistical tests
- Score distribution monitoring: flag if mean risk score shifts more than 10%
- Confidence score monitoring: flag if average confidence drops below 80%
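A minimal sketch of the three drift checks, using only the Python standard library. Several things here are assumptions rather than part of the framework: the source does not name a statistical test, so a two-sample Kolmogorov-Smirnov statistic is used as one possible choice; the 10% shift is read as relative to the baseline mean; and confidence is assumed to be on a 0 to 1 scale.

```python
from bisect import bisect_right
from statistics import fmean


def ks_statistic(baseline: list, current: list) -> float:
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(baseline), sorted(current)
    d = 0.0
    for x in a + b:  # the maximum gap is always reached at a data point
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def mean_shift_flag(baseline_scores: list, current_scores: list,
                    limit: float = 0.10) -> bool:
    """True if the mean risk score moved more than `limit` relative to baseline."""
    base = fmean(baseline_scores)  # assumed to be non-zero
    return abs(fmean(current_scores) - base) / base > limit


def low_confidence_flag(confidences: list, floor: float = 0.80) -> bool:
    """True if average confidence fell below the floor."""
    return fmean(confidences) < floor
```

In a weekly job, `ks_statistic` would be computed per input feature against a reference sample from validation; the cut-off for flagging it would need to be chosen and documented separately, since the framework above does not specify one.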
Alert levels:
- Green: all metrics within normal parameters
- Yellow: one or more metrics approaching thresholds — investigate within 24 hours
- Red: thresholds exceeded — escalate immediately, consider system halt
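The alert levels can be tied back to the performance KPIs with a small classifier. This is a sketch under stated assumptions: "approaching" a threshold is interpreted as being within a hypothetical 1-point margin of a floor, and for the false negative rate the 6% alert level is treated as Yellow and the 8% halt level as Red. All names are illustrative.

```python
from enum import IntEnum


class Alert(IntEnum):
    GREEN = 0
    YELLOW = 1
    RED = 2


def floor_alert(value: float, floor: float, margin: float = 1.0) -> Alert:
    """Classify a metric that must stay at or above `floor` (percent)."""
    if value < floor:
        return Alert.RED
    if value < floor + margin:  # margin is an assumed definition of "approaching"
        return Alert.YELLOW
    return Alert.GREEN


def fnr_alert(fnr_pct: float) -> Alert:
    """Classify the false negative rate using the 6% alert and 8% halt levels."""
    if fnr_pct > 8.0:
        return Alert.RED
    if fnr_pct > 6.0:
        return Alert.YELLOW
    return Alert.GREEN


def overall_alert(sensitivity: float, specificity: float, fnr_pct: float) -> Alert:
    """The overall level is the worst level of any single metric."""
    return max(
        floor_alert(sensitivity, 92.0),
        floor_alert(specificity, 85.0),
        fnr_alert(fnr_pct),
    )
```

Taking the worst individual level as the overall level means a single Red metric is never masked by the others being Green, which matches the "escalate immediately" intent of the Red level.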
In 2020, the UK exam grading controversy demonstrated the catastrophic consequences of deploying a high-risk AI system without adequate monitoring or incident response. When COVID-19 cancelled A-level exams, the Office of Qualifications and Examinations Regulation (Ofqual) deployed an algorithm to predict students' grades based on their schools' historical performance and teachers' predicted grades. The algorithm systematically downgraded nearly 40% of teacher-predicted grades, disproportionately affecting students at state schools and in disadvantaged areas while inflating grades at elite private schools. The system effectively perpetuated socioeconomic inequality at scale, affecting university admissions for hundreds of thousands of students.
The governance failures in this case map directly to the SafeScreen AI capstone scenario. First, there was no adequate monitoring framework: Ofqual did not track outcomes by school type or socioeconomic indicators in real time. Second, the incident response was slow: despite immediate public outcry and clear evidence of disparate impact, Ofqual initially defended the algorithm for several days before the UK government reversed the grades and reverted to teacher predictions. Third, the system lacked a rollback plan: the reversion to teacher predictions was an emergency measure, not a pre-planned fallback. The eventual U-turn affected university admissions that had already been processed, creating cascading administrative chaos.
For the AIGP exam, the UK grading algorithm case is a powerful example of why monitoring frameworks must include disaggregated fairness metrics, why incident response plans must define severity levels and response timelines before deployment, and why rollback procedures must be tested and ready. An AIGP auditor reviewing this system would have flagged the same critical gap identified in Scenario Question 10: the absence of an incident response plan for a high-risk AI system affecting fundamental rights.