Objective: Provide detailed rubrics for assessing AI product work.
D.1 AI Product Prototype Quality Rubric
Use this rubric to assess an AI product prototype, scoring each dimension from 1 (inadequate) to 4 (exemplary).
| Dimension | 1 - Inadequate | 2 - Developing | 3 - Competent | 4 - Exemplary |
|---|---|---|---|---|
| Problem Definition | Vague or unvalidated user need | Problem stated but not research-backed | Problem validated with evidence | Deep user insight with quantified impact |
| AI Integration | AI used as gimmick | AI added superficially | AI core to value proposition | AI essential and thoughtfully designed |
| Core Functionality | Does not run or core feature missing | Works partially with significant gaps | Core feature works reliably | Feature polished and robust |
| User Experience | No clear UX; confusing | Basic UX but friction exists | Clear, intuitive interaction | Delightful and accessible |
| Eval Coverage | No automated testing | Minimal test coverage | Core paths covered | Comprehensive with edge cases |
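One way to apply the rubric in practice is a small scoring helper. This is a minimal sketch; the dimension names come from the table above, but the example scores and the simple unweighted average are illustrative assumptions, not a prescribed method.

```python
# Minimal sketch of scoring a prototype against the rubric above.
# The example scores are hypothetical, and the unweighted average is
# one reasonable aggregation, not the only one.

RUBRIC_DIMENSIONS = [
    "Problem Definition",
    "AI Integration",
    "Core Functionality",
    "User Experience",
    "Eval Coverage",
]

def score_prototype(scores: dict[str, int]) -> float:
    """Average the 1-4 scores across all five dimensions.

    Raises ValueError if a dimension is missing or out of range.
    """
    for dim in RUBRIC_DIMENSIONS:
        if dim not in scores:
            raise ValueError(f"missing dimension: {dim}")
        if not 1 <= scores[dim] <= 4:
            raise ValueError(f"score out of range for {dim}: {scores[dim]}")
    return sum(scores[d] for d in RUBRIC_DIMENSIONS) / len(RUBRIC_DIMENSIONS)

example = {
    "Problem Definition": 3,
    "AI Integration": 4,
    "Core Functionality": 3,
    "User Experience": 2,
    "Eval Coverage": 2,
}
print(score_prototype(example))  # 2.8
```

Weighting dimensions differently (e.g. doubling Core Functionality for a first release) is a natural extension if the team cares more about some rows than others.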
D.2 Eval Pipeline Completeness Rubric
| Component | 1 - Missing | 2 - Partial | 3 - Complete | 4 - Comprehensive |
|---|---|---|---|---|
| Test Dataset | No dataset | Fewer than 100 examples | 100+ diverse examples | 500+ with expert verification |
| Automated Checks | None | Basic correctness only | Correctness + safety | Full quality dimensions |
| Baseline Metrics | No baseline | Single naive baseline | Multiple baselines | Industry benchmarks included |
| Regression Detection | No monitoring | Manual spot checks | Automated on PR | Continuous monitoring |
| Human Review Integration | None | Ad hoc reviews | Regular sampling | Systematic review program |
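The "Automated on PR" level of regression detection can be as simple as comparing a fresh eval run against a committed baseline and failing the build on any drop. A hedged sketch follows; the baseline file name, metric names, and tolerance are illustrative assumptions, not a specific tool's conventions.

```python
# Sketch of PR-time regression detection: compare current eval metrics
# against a committed baseline and report any metric that regressed
# beyond a noise tolerance. File name and tolerance are assumptions.

import json

TOLERANCE = 0.02  # absolute slack before a drop counts as a regression

def check_regressions(current: dict, baseline_path: str = "eval_baseline.json") -> list[str]:
    """Return a list of human-readable regression messages (empty = pass)."""
    with open(baseline_path) as f:
        baseline = json.load(f)
    failures = []
    for metric, base_value in baseline.items():
        cur = current.get(metric)
        if cur is None:
            failures.append(f"{metric}: missing from current run")
        elif cur < base_value - TOLERANCE:
            failures.append(f"{metric}: {cur:.3f} < baseline {base_value:.3f}")
    return failures
```

Wiring `check_regressions` into a CI job and exiting nonzero when the list is non-empty is what moves a team from "manual spot checks" to "automated on PR"; running the same check on a schedule approximates the continuous-monitoring level.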
D.3 RAG System Performance Rubric
| Metric | 1 - Poor | 2 - Acceptable | 3 - Good | 4 - Excellent |
|---|---|---|---|---|
| Retrieval Precision | <60% | 60-75% | >75-90% | >90% |
| Retrieval Recall | <50% | 50-70% | >70-85% | >85% |
| Hallucination Rate | >20% | >10-20% | 5-10% | <5% |
| Context Relevance | <40% | 40-60% | >60-80% | >80% |
| Answer Faithfulness | <50% | 50-70% | >70-85% | >85% |
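The retrieval metrics in this table reduce to simple ratios once each example is labeled. A minimal sketch, assuming per-example annotations (retrieved chunk IDs, relevant chunk IDs, and a hallucination judgment) that the team produces itself; nothing here is a specific library's API.

```python
# Sketch of computing the table's retrieval metrics from labeled
# examples. Chunk IDs and hallucination flags are assumed annotations.

def retrieval_precision(retrieved: set, relevant: set) -> float:
    """Fraction of retrieved chunks that are actually relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def retrieval_recall(retrieved: set, relevant: set) -> float:
    """Fraction of relevant chunks that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

def hallucination_rate(flags: list) -> float:
    """Fraction of answers judged to contain unsupported claims."""
    return sum(flags) / len(flags) if flags else 0.0

# Illustrative: 3 of 4 retrieved chunks are relevant, 3 of 5 relevant found.
print(retrieval_precision({"a", "b", "c", "d"}, {"a", "b", "c", "e", "f"}))  # 0.75
print(retrieval_recall({"a", "b", "c", "d"}, {"a", "b", "c", "e", "f"}))    # 0.6
```

Averaged over the eval dataset, these numbers map directly onto the rubric bands: the example above scores "Good" on precision (>75-90%) but "Acceptable" on recall (50-70%).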
D.4 UX Design for AI Products Rubric
Based on the PEEK framework (Predict, Explain, Execute, Keep in loop):
| PEEK Element | 1 - Missing | 2 - Rudimentary | 3 - Implemented | 4 - Polished |
|---|---|---|---|---|
| Predict | No transparency about AI actions | Basic status indicators | Clear progress and confidence signals | Rich context and timing expectations |
| Explain | No explanation of AI reasoning | Minimal explanation provided | Clear reasoning when requested | Proactive, contextual explanations |
| Execute | User must do everything manually | AI does some work | AI handles routine tasks | Smart delegation with user control |
| Keep in Loop | AI acts without user awareness | Post-hoc notifications | Appropriate timing of updates | Seamless handoffs with context |
D.5 Team AI Readiness Assessment
Technical Skills: the team
- Can prompt engineer effectively
- Understands AI limitations
- Can debug AI behavior
- Is familiar with eval methodologies

Process Maturity: the team
- Uses structured product discovery
- Has rapid prototyping capability
- Practices iterative development
- Conducts regular retrospectives

Organizational Readiness: the organization
- Has leadership that supports AI experimentation
- Practices cross-functional collaboration
- Has established user feedback loops
- Balances risk tolerance appropriately