Module 5 Assignment: Evaluation, uncertainty, and error analysis
Scenario
You are advising an AI review team that is evaluating a proposed applied AI system before pilot deployment. The stakeholders are the technical lead, the domain owner, the governance reviewer, and the end-user representative.
Task
Answer the module question: How do we know whether an AI system is useful, robust, and honest?
Use the module lab and course readings to produce an AI system review package covering architecture, evidence, limitations, and a deployment recommendation, with a focus on evaluation, uncertainty, and error analysis. As part of the package, build an evaluation table with accuracy, error slices, and uncertainty notes.
Required Evidence
Define the decision or system boundary in one paragraph.
Identify the dataset, proxy data, or evidence source you used. For this module, that is synthetic system evidence, including task features, model outputs, confidence signals, and review outcomes.
Compare at least two alternatives, baselines, policies, or designs.
Report one quantitative result or structured scoring table.
Explain two failure modes and one mitigation for each.
State what additional evidence would be required before real deployment.
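The evaluation table requested above could be sketched as follows. This is a minimal illustration, not the module lab's method: the slice names (`routine`, `rare_case`), the per-option accuracy rates, the sample size, and the 0.15 interval-width threshold are all assumptions invented for the sketch, and the data is generated from a seeded random generator rather than taken from any real system. It compares two options, reports accuracy per slice, and attaches a Wilson score interval as a simple uncertainty note.

```python
import math
import random
from collections import defaultdict


def wilson_interval(correct, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    if n == 0:
        return (0.0, 1.0)  # no data: the interval is uninformative
    p = correct / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def make_synthetic_records(n=400, seed=0):
    """Generate synthetic review outcomes; every rate here is an assumption."""
    rng = random.Random(seed)  # fixed seed so a reviewer can reproduce the table
    rates = {
        "routine": {"baseline": 0.80, "ai_assisted": 0.90},
        "rare_case": {"baseline": 0.70, "ai_assisted": 0.60},
    }
    records = []
    for _ in range(n):
        slice_name = "routine" if rng.random() < 0.85 else "rare_case"
        for option, rate in rates[slice_name].items():
            records.append(
                {"slice": slice_name, "option": option, "correct": rng.random() < rate}
            )
    return records


def evaluation_table(records, wide_threshold=0.15):
    """Aggregate accuracy per (option, slice), plus an 'ALL' row per option."""
    counts = defaultdict(lambda: [0, 0])  # (option, slice) -> [correct, total]
    for r in records:
        for key in ((r["option"], r["slice"]), (r["option"], "ALL")):
            counts[key][0] += int(r["correct"])
            counts[key][1] += 1
    rows = []
    for (option, slice_name), (correct, total) in sorted(counts.items()):
        low, high = wilson_interval(correct, total)
        note = "wide interval, needs more data" if high - low > wide_threshold else ""
        rows.append(
            {"option": option, "slice": slice_name, "n": total,
             "accuracy": correct / total, "ci_low": low, "ci_high": high, "note": note}
        )
    return rows


rows = evaluation_table(make_synthetic_records())
for row in rows:
    print(
        f"{row['option']:<12} {row['slice']:<10} n={row['n']:>4} "
        f"acc={row['accuracy']:.3f} "
        f"95% CI=({row['ci_low']:.3f}, {row['ci_high']:.3f}) {row['note']}"
    )
```

Reporting the `ALL` row next to the slices makes it easier to see when an option that looks better overall is worse on a small slice, which is one way to surface a failure mode for the memo. Intervals on small slices will be wide, and that width is itself worth stating as a proxy-data limit.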
Submission
Submit the completed notebook plus a 900-1200 word memo. The memo must include clear headings for context, method, evidence, risks, recommendation, and open questions.
# Assignment workspace for Module 5: Evaluation, uncertainty, and error analysis
module = 5
decision = "How do we know whether an AI system is useful, robust, and honest?"
artifact = "AI system review package with architecture, evidence, limitations, and deployment recommendation focused on evaluation, uncertainty, and error analysis: Build an evaluation table with accuracy, error slices, and uncertainty notes."
alternatives = [
    {"option": "baseline_or_manual_process", "strength": "", "risk": "", "evidence": ""},
    {"option": "ai_assisted_or_advanced_option", "strength": "", "risk": "", "evidence": ""},
]
recommendation = {
    "decision": decision,
    "recommended_option": "",
    "minimum_evidence_before_pilot": [],
    "monitoring_metric": "",
    "rollback_trigger": "",
}
{"module": module, "artifact": artifact, "alternatives": alternatives, "recommendation": recommendation}
{'module': 5,
 'artifact': 'AI system review package with architecture, evidence, limitations, and deployment recommendation focused on evaluation, uncertainty, and error analysis: Build an evaluation table with accuracy, error slices, and uncertainty notes.',
 'alternatives': [{'option': 'baseline_or_manual_process',
   'strength': '',
   'risk': '',
   'evidence': ''},
  {'option': 'ai_assisted_or_advanced_option',
   'strength': '',
   'risk': '',
   'evidence': ''}],
 'recommendation': {'decision': 'How do we know whether an AI system is useful, robust, and honest?',
  'recommended_option': '',
  'minimum_evidence_before_pilot': [],
  'monitoring_metric': '',
  'rollback_trigger': ''}}
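Before submitting, it may help to confirm that no field in the workspace template is still blank. The helper below is a hypothetical convenience, not part of the assignment workspace: the name `blank_fields` and the example values are assumptions, and the placeholder strings stand in for your own text rather than suggesting answers.

```python
def blank_fields(alternatives, recommendation):
    """Return the names of template fields that are still empty."""
    missing = []
    for i, alt in enumerate(alternatives):
        for key, value in alt.items():
            if not value:  # empty string or empty list counts as blank
                missing.append(f"alternatives[{i}].{key}")
    for key, value in recommendation.items():
        if not value:
            missing.append(f"recommendation.{key}")
    return missing


# Hypothetical, partially completed copy of the template (placeholder text only).
example_alternatives = [
    {"option": "baseline_or_manual_process", "strength": "placeholder",
     "risk": "", "evidence": "placeholder"},
    {"option": "ai_assisted_or_advanced_option", "strength": "",
     "risk": "", "evidence": ""},
]
example_recommendation = {
    "decision": "placeholder question",
    "recommended_option": "",
    "minimum_evidence_before_pilot": [],
    "monitoring_metric": "placeholder",
    "rollback_trigger": "",
}

missing = blank_fields(example_alternatives, example_recommendation)
for name in missing:
    print("still blank:", name)
```

In the notebook, the same function can be called on the real `alternatives` and `recommendation` objects; an empty result means every field has some content, not that the content is sufficient.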
Acceptance Criteria
Your submission is complete only if another reviewer can reproduce your reasoning from the evidence you provide. You do not need production-grade data, but you must be explicit about proxy-data limits and what would change with real institutional data.