Model selection and routing decisions involve all three disciplines: AI PM defines the performance requirements, cost constraints, and quality thresholds that guide model choice; Vibe-Coding tests different models and routing strategies against real workloads to find the best performance per dollar; AI Engineering implements the selection logic, fallback chains, and monitoring that ensure the right model serves each request.
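A fallback chain, one of the pieces AI Engineering implements, can be sketched in a few lines. This is a minimal sketch, assuming each model client is a plain function that raises an exception on failure. `ModelError`, `ModelEndpoint`, `call_with_fallback`, and the model names are hypothetical and do not correspond to any provider's API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ModelError(Exception):
    """Raised by a (hypothetical) model client when a call fails or times out."""


@dataclass
class ModelEndpoint:
    name: str                   # hypothetical model identifier
    call: Callable[[str], str]  # stand-in for a real provider client


def call_with_fallback(prompt: str, chain: Sequence[ModelEndpoint]) -> tuple[str, str]:
    """Try each model in order and return (model_name, output) from the first success."""
    failures: list[str] = []
    for endpoint in chain:
        try:
            return endpoint.name, endpoint.call(prompt)
        except ModelError as exc:
            # Collect failures so the final error reports which steps of the chain failed.
            failures.append(f"{endpoint.name}: {exc}")
    raise ModelError("all models in the chain failed: " + "; ".join(failures))


def flaky(prompt: str) -> str:
    # Simulates a primary model that always times out.
    raise ModelError("timeout")


chain = [ModelEndpoint("primary-model", flaky), ModelEndpoint("backup-model", str.upper)]
print(call_with_fallback("hello", chain))  # ('backup-model', 'HELLO')
```

Returning the name of the model that actually answered lets monitoring track how often traffic falls through to each step of the chain.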
Vibe-coding lets you test model behavior across providers and versions before you build routing infrastructure. You can quickly compare how different models handle your specific task types, edge cases, and failure modes. Vibe-coding against several model variants reveals which models genuinely excel at your use cases and which only seem adequate in theory, so routing decisions rest on data rather than guesswork.
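A comparison like this can start as a small harness that sends the same cases to every candidate and aggregates pass rate, latency, and cost per model and task type. This is a minimal sketch under stated assumptions: `Candidate` and `compare_models` are hypothetical names, each model is a plain function standing in for a real provider client, and cost is modeled as a flat price per call rather than per token.

```python
import time
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    name: str                   # hypothetical model identifier
    call: Callable[[str], str]  # stand-in for a real provider client
    usd_per_call: float         # assumed flat price per call, for illustration only


def compare_models(cases, candidates, grade):
    """Send every case to every candidate and aggregate results per (model, task type).

    cases: list of (task_type, prompt, expected) tuples drawn from real traffic.
    grade: task-specific check, grade(output, expected) -> bool.
    """
    raw = defaultdict(lambda: {"n": 0, "passed": 0, "latency_s": 0.0, "cost_usd": 0.0})
    for task_type, prompt, expected in cases:
        for cand in candidates:
            start = time.perf_counter()
            output = cand.call(prompt)
            elapsed = time.perf_counter() - start
            row = raw[(cand.name, task_type)]
            row["n"] += 1
            row["passed"] += int(grade(output, expected))
            row["latency_s"] += elapsed
            row["cost_usd"] += cand.usd_per_call
    return {
        key: {
            "pass_rate": row["passed"] / row["n"],
            "avg_latency_s": row["latency_s"] / row["n"],
            "cost_usd": row["cost_usd"],
        }
        for key, row in raw.items()
    }


# Toy stand-ins: the "small" model always answers "4"; the "large" one looks answers up.
cases = [("math", "2+2", "4"), ("math", "3+3", "6"), ("summary", "long text", "short")]
small = Candidate("small-model", lambda p: "4", 0.001)
large = Candidate("large-model", lambda p: {"2+2": "4", "3+3": "6"}.get(p, "short"), 0.01)
report = compare_models(cases, [small, large], lambda out, exp: out == exp)
for (model, task), stats in sorted(report.items()):
    print(model, task, stats)
```

Breaking results out by task type, rather than reporting one overall score, is what exposes the cases where a cheap model is fine for one kind of request and inadequate for another.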
PM decisions shaped by model routing include: Which tasks require premium model quality, and which can prioritize cost efficiency? How do routing failures affect user experience? What latency guarantees are acceptable for different task types? PMs should define quality thresholds for each level of task criticality and set budgets that align with user value. Cost savings from intelligent routing, such as a 60% reduction, only materialize if product requirements clearly specify where quality can be traded for speed and cost.
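One way to make these PM decisions enforceable is a per-tier policy that routing code checks directly. The sketch below assumes three hypothetical criticality tiers and invented quality scores, prices, and latencies; `TierPolicy`, `POLICIES`, `MODELS`, and `pick_model` are illustrative names, and real values would come from your own evals and product requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPolicy:
    min_quality: float          # minimum acceptable eval score (0-1), set by the PM
    max_usd_per_request: float  # budget per request
    max_latency_ms: int         # latency guarantee for this tier


# Hypothetical tiers and numbers; real values come from product requirements.
POLICIES = {
    "critical": TierPolicy(min_quality=0.95, max_usd_per_request=0.05, max_latency_ms=5000),
    "standard": TierPolicy(min_quality=0.85, max_usd_per_request=0.01, max_latency_ms=2000),
    "bulk": TierPolicy(min_quality=0.70, max_usd_per_request=0.001, max_latency_ms=10000),
}

# Hypothetical model profiles: (eval quality score, $ per request, p95 latency in ms).
MODELS = {
    "large-model": (0.96, 0.030, 4000),
    "medium-model": (0.88, 0.006, 1500),
    "small-model": (0.74, 0.0008, 600),
}


def pick_model(tier: str) -> str:
    """Return the cheapest model that satisfies every constraint of the tier's policy."""
    policy = POLICIES[tier]
    eligible = [
        (cost, name)
        for name, (quality, cost, latency) in MODELS.items()
        if quality >= policy.min_quality
        and cost <= policy.max_usd_per_request
        and latency <= policy.max_latency_ms
    ]
    if not eligible:
        raise LookupError(f"no model meets the '{tier}' policy; revisit thresholds or budget")
    return min(eligible)[1]


for tier in POLICIES:
    print(tier, pick_model(tier))
```

Raising `min_quality` for a tier pushes its traffic toward more expensive models, and an empty eligible set signals that the quality bar and the budget for that tier contradict each other.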
Objective: Master model selection, routing strategies, and capability-based allocation.
Chapter Overview
This chapter covers the engineering decisions that determine how models are selected, routed, and allocated to tasks. Model selection involves understanding open versus closed models, size versus capability trade-offs, and task-model matching. Model routers direct each request to an appropriate model based on task requirements, cost constraints, and quality targets. Ensembles and specialization combine multiple models and can produce better results than any single model. Structured outputs and tool compatibility enable reliable integration with external systems. The chapter concludes with the latency, cost, and quality trade-offs that guide optimization priorities.
Four Questions This Chapter Answers
- What are we trying to learn? How to match model capabilities to task requirements while optimizing for latency, cost, and quality trade-offs specific to our product.
- What is the fastest prototype that could teach it? A routing experiment sending the same requests to different models and comparing results, cost, and latency.
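- What would count as success or failure? Success is a routing strategy that consistently sends each request to the cheapest model that handles it adequately; failure is one that overspends on premium models for easy requests or degrades quality by sending hard requests to models that cannot handle them.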
- What engineering consequence follows from the result? Model routing is a high-leverage optimization; intelligent routing can dramatically reduce costs while maintaining quality.
Learning Objectives
- Select appropriate models for specific tasks and constraints
- Design intelligent routing strategies for multi-model systems
- Build ensembles and specialized models for improved quality
- Implement structured outputs and tool calling reliably
- Optimize for latency, cost, and quality trade-offs