Part VII: Practice and Teaching Kit
Appendix D

Evaluation Rubrics

Objective: Provide detailed rubrics for assessing AI product work.

D.1 AI Product Prototype Quality Rubric

Use this rubric to assess the quality of an AI product prototype.

Dimension | 1 - Inadequate | 2 - Developing | 3 - Competent | 4 - Exemplary
Problem Definition | Vague or unvalidated user need | Problem stated but not research-backed | Problem validated with evidence | Deep user insight with quantified impact
AI Integration | AI used as gimmick | AI added superficially | AI core to value proposition | AI essential and thoughtfully designed
Core Functionality | Does not run or core feature missing | Works partially with significant gaps | Core feature works reliably | Feature polished and robust
User Experience | No clear UX; confusing | Basic UX but friction exists | Clear, intuitive interaction | Delightful and accessible
Eval Coverage | No automated testing | Minimal test coverage | Core paths covered | Comprehensive with edge cases

AI product prototype quality rubric
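When grading many prototypes, the rubric above can be applied programmatically: record a 1-4 score per dimension and aggregate. A minimal sketch, where the function name and validation rules are assumptions, not a prescribed tool:

```python
# Hypothetical helper for recording and aggregating rubric scores.
# The five dimensions come from the prototype quality rubric above.
DIMENSIONS = [
    "Problem Definition",
    "AI Integration",
    "Core Functionality",
    "User Experience",
    "Eval Coverage",
]

def score_prototype(scores: dict) -> float:
    """Average the 1-4 scores across all five rubric dimensions."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"Missing scores for: {sorted(missing)}")
    for dim, s in scores.items():
        if not 1 <= s <= 4:
            raise ValueError(f"{dim}: score {s} is outside the 1-4 scale")
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)
```

Averaging treats the dimensions as equally weighted; a course or team could just as reasonably weight Problem Definition and Eval Coverage more heavily.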

D.2 Eval Pipeline Completeness Rubric

Component | 1 - Missing | 2 - Partial | 3 - Complete | 4 - Comprehensive
Test Dataset | No dataset | Under 50 examples | 100+ diverse examples | 500+ with expert verification
Automated Checks | None | Basic correctness only | Correctness + safety | Full quality dimensions
Baseline Metrics | No baseline | Single naive baseline | Multiple baselines | Industry benchmarks included
Regression Detection | No monitoring | Manual spot checks | Automated on PR | Continuous monitoring
Human Review Integration | None | Ad hoc reviews | Regular sampling | Systematic review program

Eval pipeline completeness rubric
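The "Automated on PR" level of regression detection can be as simple as comparing current eval metrics against stored baselines and failing the build on a drop. A sketch, assuming a generic metrics dictionary and an illustrative tolerance (both are assumptions, not a specific CI system's API):

```python
# Sketch of a PR-time regression gate: flag any tracked eval metric that
# falls more than `tolerance` below its stored baseline value.
def check_regression(current: dict, baseline: dict,
                     tolerance: float = 0.02) -> list:
    """Return the names of metrics that regressed beyond the tolerance.

    Metrics missing from `current` are treated as fully regressed, so a
    broken eval run cannot silently pass.
    """
    regressions = []
    for metric, base in baseline.items():
        if current.get(metric, 0.0) < base - tolerance:
            regressions.append(metric)
    return regressions
```

A CI job would call this after the eval run and exit nonzero when the returned list is non-empty; "Continuous monitoring" extends the same comparison to scheduled production samples.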

D.3 RAG System Performance Rubric

Metric | 1 - Poor | 2 - Acceptable | 3 - Good | 4 - Excellent
Retrieval Precision | <60% | 60-75% | 75-90% | >90%
Retrieval Recall | <50% | 50-70% | 70-85% | >85%
Hallucination Rate | >20% | 10-20% | 5-10% | <5%
Context Relevance | <40% | 40-60% | 60-80% | >80%
Answer Faithfulness | <50% | 50-70% | 70-85% | >85%

RAG system performance rubric
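The two retrieval metrics in the rubric have standard definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. A minimal sketch against a labeled set of relevant document IDs (function and variable names are illustrative):

```python
# Compute retrieval precision and recall for one query, given the list of
# retrieved document IDs and the labeled set of relevant IDs.
def retrieval_metrics(retrieved: list, relevant: set) -> tuple:
    """Return (precision, recall) for a single query.

    precision = relevant hits / documents retrieved
    recall    = relevant hits / relevant documents
    """
    hits = sum(1 for doc_id in retrieved if doc_id in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, retrieving ["a", "b", "c", "d"] when {"a", "b", "e"} are relevant gives precision 0.5 and recall 2/3. Hallucination rate, context relevance, and answer faithfulness typically require a judge (human or LLM-based) rather than set arithmetic.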

D.4 UX Design for AI Products Rubric

Based on the PEEK framework (Predict, Explain, Execute, Keep in Loop):

PEEK Element | 1 - Missing | 2 - Rudimentary | 3 - Implemented | 4 - Polished
Predict | No transparency about AI actions | Basic status indicators | Clear progress and confidence signals | Rich context and timing expectations
Explain | No explanation of AI reasoning | Minimal explanation provided | Clear reasoning when requested | Proactive, contextual explanations
Execute | User must do everything manually | AI does some work | AI handles routine tasks | Smart delegation with user control
Keep in Loop | AI acts without user awareness | Post-hoc notifications | Appropriate timing of updates | Seamless handoffs with context

PEEK UX design rubric for AI products

D.5 Team AI Readiness Assessment

Team AI Readiness Checklist

Technical Skills: the team can prompt-engineer effectively, understands AI limitations, can debug AI behavior, and is familiar with eval methodologies.

Process Maturity: the team uses structured product discovery, has rapid prototyping capability, practices iterative development, and conducts regular retrospectives.

Organizational Readiness: leadership supports AI experimentation, cross-functional collaboration exists, user feedback loops are established, and risk tolerance is balanced appropriately.
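The checklist above can be encoded so that an area passes only when every one of its criteria is met. A sketch; the criterion labels are shorthand for the checklist items, and the structure is an assumption rather than a prescribed format:

```python
# Hypothetical encoding of the team AI readiness checklist: each area maps
# to its criteria, and an area passes only when all criteria are met.
CHECKLIST = {
    "Technical Skills": [
        "prompt engineering", "understands AI limitations",
        "can debug AI behavior", "familiar with eval methodologies",
    ],
    "Process Maturity": [
        "structured product discovery", "rapid prototyping",
        "iterative development", "regular retrospectives",
    ],
    "Organizational Readiness": [
        "leadership supports experimentation", "cross-functional collaboration",
        "user feedback loops", "balanced risk tolerance",
    ],
}

def readiness_report(met: set) -> dict:
    """Map each area to True only when all of its criteria are in `met`."""
    return {area: all(c in met for c in criteria)
            for area, criteria in CHECKLIST.items()}
```

The all-or-nothing rule per area is deliberate: a team strong on three of four technical criteria still has a named gap to close before it counts as ready.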