Objective: Provide detailed rubrics for assessing AI product work.
D.1 AI Product Prototype Quality Rubric
Use this rubric to assess an AI product prototype, scoring each dimension from 1 (inadequate) to 4 (exemplary).
| Dimension | 1 - Inadequate | 2 - Developing | 3 - Competent | 4 - Exemplary |
|---|---|---|---|---|
| Problem Definition | Vague or unvalidated user need | Problem stated but not research-backed | Problem validated with evidence | Deep user insight with quantified impact |
| AI Integration | AI used as gimmick | AI added superficially | AI core to value proposition | AI essential and thoughtfully designed |
| Core Functionality | Does not run or core feature missing | Works partially with significant gaps | Core feature works reliably | Feature polished and robust |
| User Experience | No clear UX; confusing | Basic UX but friction exists | Clear, intuitive interaction | Delightful and accessible |
| Eval Coverage | No automated testing | Minimal test coverage | Core paths covered | Comprehensive with edge cases |
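One way to apply the rubric in practice is a small scoring helper. This is a minimal sketch; the dimension names come from the table above, but the example scores and the simple unweighted average are illustrative assumptions, not a prescribed method.

```python
# Minimal sketch of scoring a prototype against the rubric above.
# The example scores are hypothetical, and the unweighted average is
# one reasonable aggregation, not the only one.

RUBRIC_DIMENSIONS = [
    "Problem Definition",
    "AI Integration",
    "Core Functionality",
    "User Experience",
    "Eval Coverage",
]

def score_prototype(scores: dict[str, int]) -> float:
    """Average the 1-4 scores across all five dimensions.

    Raises ValueError if a dimension is missing or out of range.
    """
    for dim in RUBRIC_DIMENSIONS:
        if dim not in scores:
            raise ValueError(f"missing dimension: {dim}")
        if not 1 <= scores[dim] <= 4:
            raise ValueError(f"score out of range for {dim}: {scores[dim]}")
    return sum(scores[d] for d in RUBRIC_DIMENSIONS) / len(RUBRIC_DIMENSIONS)

example = {
    "Problem Definition": 3,
    "AI Integration": 4,
    "Core Functionality": 3,
    "User Experience": 2,
    "Eval Coverage": 2,
}
print(score_prototype(example))  # 2.8
```

Weighting dimensions differently (e.g. doubling Core Functionality for a first release) is a natural extension if the team cares more about some rows than others.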
D.2 Eval Pipeline Completeness Rubric
| Component | 1 - Missing | 2 - Partial | 3 - Complete | 4 - Comprehensive |
|---|---|---|---|---|
| Test Dataset | No dataset | Fewer than 100 examples | 100+ diverse examples | 500+ with expert verification |
| Automated Checks | None | Basic correctness only | Correctness + safety | Full quality dimensions |
| Baseline Metrics | No baseline | Single naive baseline | Multiple baselines | Industry benchmarks included |
| Regression Detection | No monitoring | Manual spot checks | Automated on PR | Continuous monitoring |
| Human Review Integration | None | Ad hoc reviews | Regular sampling | Systematic review program |
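The "Automated on PR" level of regression detection can be as simple as comparing a fresh eval run against a committed baseline and failing the build on any drop. A hedged sketch follows; the baseline file name, metric names, and tolerance are illustrative assumptions, not a specific tool's conventions.

```python
# Sketch of PR-time regression detection: compare current eval metrics
# against a committed baseline and report any metric that regressed
# beyond a noise tolerance. File name and tolerance are assumptions.

import json

TOLERANCE = 0.02  # absolute slack before a drop counts as a regression

def check_regressions(current: dict, baseline_path: str = "eval_baseline.json") -> list[str]:
    """Return a list of human-readable regression messages (empty = pass)."""
    with open(baseline_path) as f:
        baseline = json.load(f)
    failures = []
    for metric, base_value in baseline.items():
        cur = current.get(metric)
        if cur is None:
            failures.append(f"{metric}: missing from current run")
        elif cur < base_value - TOLERANCE:
            failures.append(f"{metric}: {cur:.3f} < baseline {base_value:.3f}")
    return failures
```

Wiring `check_regressions` into a CI job and exiting nonzero when the list is non-empty is what moves a team from "manual spot checks" to "automated on PR"; running the same check on a schedule approximates the continuous-monitoring level.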
D.3 RAG System Performance Rubric
| Metric | 1 - Poor | 2 - Acceptable | 3 - Good | 4 - Excellent |
|---|---|---|---|---|
| Retrieval Precision | <60% | 60-75% | >75-90% | >90% |
| Retrieval Recall | <50% | 50-70% | >70-85% | >85% |
| Hallucination Rate | >20% | >10-20% | 5-10% | <5% |
| Context Relevance | <40% | 40-60% | >60-80% | >80% |
| Answer Faithfulness | <50% | 50-70% | >70-85% | >85% |
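The retrieval metrics in this table reduce to simple ratios once each example is labeled. A minimal sketch, assuming per-example annotations (retrieved chunk IDs, relevant chunk IDs, and a hallucination judgment) that the team produces itself; nothing here is a specific library's API.

```python
# Sketch of computing the table's retrieval metrics from labeled
# examples. Chunk IDs and hallucination flags are assumed annotations.

def retrieval_precision(retrieved: set, relevant: set) -> float:
    """Fraction of retrieved chunks that are actually relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def retrieval_recall(retrieved: set, relevant: set) -> float:
    """Fraction of relevant chunks that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

def hallucination_rate(flags: list) -> float:
    """Fraction of answers judged to contain unsupported claims."""
    return sum(flags) / len(flags) if flags else 0.0

# Illustrative: 3 of 4 retrieved chunks are relevant, 3 of 5 relevant found.
print(retrieval_precision({"a", "b", "c", "d"}, {"a", "b", "c", "e", "f"}))  # 0.75
print(retrieval_recall({"a", "b", "c", "d"}, {"a", "b", "c", "e", "f"}))    # 0.6
```

Averaged over the eval dataset, these numbers map directly onto the rubric bands: the example above scores "Good" on precision (>75-90%) but "Acceptable" on recall (50-70%).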
D.4 UX Design for AI Products Rubric
Based on the PEEK framework (Predict, Explain, Execute, Keep in loop):
| PEEK Element | 1 - Missing | 2 - Rudimentary | 3 - Implemented | 4 - Polished |
|---|---|---|---|---|
| Predict | No transparency about AI actions | Basic status indicators | Clear progress and confidence signals | Rich context and timing expectations |
| Explain | No explanation of AI reasoning | Minimal explanation provided | Clear reasoning when requested | Proactive, contextual explanations |
| Execute | User must do everything manually | AI does some work | AI handles routine tasks | Smart delegation with user control |
| Keep in Loop | AI acts without user awareness | Post-hoc notifications | Appropriate timing of updates | Seamless handoffs with context |
D.5 Team AI Readiness Assessment
Technical Skills: the team
- Can prompt engineer effectively
- Understands AI limitations
- Can debug AI behavior
- Is familiar with eval methodologies

Process Maturity: the team
- Uses structured product discovery
- Has rapid prototyping capability
- Practices iterative development
- Conducts regular retrospectives

Organizational Readiness: the organization
- Has leadership that supports AI experimentation
- Practices cross-functional collaboration
- Has established user feedback loops
- Balances risk tolerance appropriately