Part IV: Engineering
Chapter 18

18.1 Prompts vs Workflow Graphs

Single prompts are powerful but limited. Workflow graphs transform isolated AI calls into coherent, stateful applications that can handle complex, multi-step tasks.

The Limitation of Stateless Prompts

A stateless prompt interaction follows a simple pattern: you send a request, receive a response, and start fresh on the next interaction. This works well for discrete tasks like translation, summarization, or simple question answering. It breaks down, however, when a task requires context continuity across multiple exchanges: the system must remember prior conversation history, accumulate intermediate state during problem solving, apply conditional logic based on previous outcomes, and coordinate multiple AI capabilities.
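For illustration, the stateless pattern might look like this in code; `call_llm` is a hypothetical stand-in for any model API, not a specific library:

```python
# Hypothetical stand-in for a model API call; a real client would go here.
def call_llm(prompt: str) -> str:
    return f"response to: {prompt}"

# Two independent calls: nothing from the first is visible to the second.
first = call_llm("Summarize this contract.")
followup = call_llm("What did the summary say about termination?")

# The second call has no memory of `first`. To simulate continuity, the
# caller must re-bundle all prior context into each new prompt:
followup_with_context = call_llm(
    f"Previous summary: {first}\n\nWhat does it say about termination?"
)
```

This manual re-bundling of context is exactly the burden that workflow graphs take over.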

The Stateless Trap

Teams often build impressive demos with stateless prompts, then discover the architecture cannot scale to production requirements. A chatbot that works perfectly in demo mode may fail spectacularly when users expect it to remember their preferences, incomplete transactions, or prior context.

Common Misconception

The misconception is that more structure is always better. Some teams default to building elaborate workflow architectures before confirming that simple prompts cannot handle the task. But workflow graphs add complexity that may not pay off: they are harder to debug, require more engineering investment, and introduce more failure points. Start with prompts and add workflow structure only when you encounter specific limitations that require it.

When to Choose Prompts vs Workflows

Complexity differs significantly between the two approaches. Stateless prompts handle single tasks with one response, while workflow graphs accommodate multi-step stateful processes.

Context handling follows similar logic. Prompts require all context bundled in a single prompt, whereas workflow graphs accumulate context across multiple interactions.

Failure recovery demonstrates the resilience advantage of workflow graphs. Stateless prompts require retrying the entire operation when something fails, but workflow graphs can resume from the last checkpoint, preserving intermediate progress.

Human oversight integrates differently as well. Prompts typically require minimal or no human intervention, while workflow graphs support approval gates and interruption points for human review.

Latency tolerance varies with architecture. A single stateless call is expected to return quickly, while workflow graphs that chain sequential calls are typically used in contexts with higher tolerance for latency.

Cost predictability also differs. Stateless prompts are easier to estimate since costs are straightforward per-call calculations, while workflow graph costs vary based on the execution path taken through the graph.

Workflow Graph Fundamentals

A workflow graph represents your AI application as a directed graph where nodes are operations (AI calls, tools, human tasks) and edges define the flow between them. Key concepts include:

Nodes

Operations that produce outputs: LLM calls, tool executions, data transformations, human approval steps.

Edges

Transitions between nodes, often conditional based on previous outputs or state.

State

Data accumulated as the workflow progresses, shared between nodes along execution paths.

Checkpoints

Saved points in execution that enable resumption after interruption.


# Simple workflow graph representation
class WorkflowGraph:
    def __init__(self):
        self.nodes = {}
        self.edges = []
        self.state = {}
    
    def add_node(self, name, operation):
        self.nodes[name] = operation
    
    def add_edge(self, from_node, to_node, condition=None):
        self.edges.append({
            'from': from_node,
            'to': to_node,
            'condition': condition  # Optional routing function of state
        })
    
    def route(self, from_node):
        # Follow the first outgoing edge whose condition passes (or is absent)
        for edge in self.edges:
            if edge['from'] == from_node:
                if edge['condition'] is None or edge['condition'](self.state):
                    return edge['to']
        return None  # No outgoing edge: the workflow is complete
    
    def execute(self, start_node, initial_input):
        self.state['input'] = initial_input
        current_node = start_node
        while current_node:
            # Each node reads and updates the shared state
            self.nodes[current_node](self.state)
            current_node = self.route(current_node)
        return self.state

# DataForge example: document processing workflow
def document_processing_workflow():
    wf = WorkflowGraph()
    
    wf.add_node('classify', classify_document)
    wf.add_node('extract', extract_entities)
    wf.add_node('validate', validate_extraction)
    wf.add_node('store', store_results)
    wf.add_node('notify', send_notification)
    
    wf.add_edge('classify', 'extract')
    wf.add_edge('extract', 'validate')
    wf.add_edge('validate', 'store', condition=lambda s: s['valid'])
    wf.add_edge('validate', 'notify', condition=lambda s: not s['valid'])
    wf.add_edge('store', 'notify')
    
    return wf
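The node operations referenced above (classify_document and so on) each read and update the shared state dict. A minimal set of stub implementations, hypothetical and purely for illustration, might look like:

```python
# Hypothetical stubs for the node operations wired into the graph above.
# Each takes the shared state dict and records its contribution; real
# implementations would call an LLM, a tool, or a datastore.
def classify_document(state):
    state['doc_type'] = 'invoice'           # e.g. output of an LLM classifier

def extract_entities(state):
    state['entities'] = {'total': '42.00'}  # e.g. structured extraction

def validate_extraction(state):
    # Business-rule check; downstream edges route on state['valid']
    state['valid'] = 'total' in state.get('entities', {})

def store_results(state):
    state['stored'] = True

def send_notification(state):
    state['notified'] = True
```

Note the contract this implies: nodes communicate only through the state dict, which is what lets edge conditions like `lambda s: s['valid']` route on a prior node's output.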
            

Benefits of Structured Workflow Graphs

Predictability and Testability

Workflow graphs are explicit about execution paths, making it possible to test each node independently and verify edge conditions. You can enumerate all possible paths through your application.

Observability

Each node can emit logs, metrics, and traces. You know exactly where a workflow is executing and can debug failures with precision.

Resumability

State is preserved at checkpoints. If a workflow fails midway, you can resume from the last successful checkpoint rather than restarting entirely.
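A checkpoint can be as simple as serializing the accumulated state together with the name of the next node to run. A minimal sketch, assuming JSON-serializable state (the helper names are illustrative, not a specific library's API):

```python
import json

# Illustrative checkpoint helpers: persist the state plus the resume point.
def save_checkpoint(path, state, next_node):
    with open(path, 'w') as f:
        json.dump({'state': state, 'next_node': next_node}, f)

def load_checkpoint(path):
    with open(path) as f:
        data = json.load(f)
    return data['state'], data['next_node']
```

On failure, the runner restores the state and resumes execution from `next_node` instead of replaying every earlier node.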

Human-in-the-Loop

Approval gates and interruption points integrate naturally. Human review becomes another node type in the graph.

Composability

Sub-workflows can be encapsulated and reused. Common patterns like approval-then-continue become reusable components.
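One way to sketch this composability, with illustrative names rather than any particular framework's API, is an adapter that lets a whole sub-workflow plug into a parent graph as a single node:

```python
# Sketch: wrap a sub-workflow (any callable from state to state) so it
# can be registered in a parent graph as one node.
def as_node(sub_workflow):
    def operation(state):
        # Run the sub-workflow on a copy, then merge its results back
        # into the parent workflow's shared state.
        state.update(sub_workflow(dict(state)))
    return operation

# A reusable "approval-then-continue" pattern expressed as a sub-workflow.
def approval_then_continue(state):
    state['approved'] = True   # In practice: pause here for human sign-off
    return state

approval_node = as_node(approval_then_continue)
```

The parent graph sees one opaque node; the approval logic stays encapsulated and reusable across workflows.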

DataForge: Multi-Stage Document Processing

DataForge processes incoming documents through a workflow graph that classifies content, extracts structured data, validates results against business rules, and routes for human review when confidence is low. The workflow graph ensures each document is processed consistently while allowing exceptions to be handled appropriately.

When the extraction confidence falls below 0.85, the workflow automatically pauses for human review rather than proceeding with potentially incorrect data.
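That gate can be expressed as an edge condition. A minimal sketch, where the state key `extraction_confidence` is an assumption for illustration:

```python
# Confidence gate from the text: below 0.85, route to human review.
CONFIDENCE_THRESHOLD = 0.85

def needs_human_review(state: dict) -> bool:
    # 'extraction_confidence' is an assumed state key for illustration;
    # a missing score is treated as zero confidence (fail safe).
    return state.get('extraction_confidence', 0.0) < CONFIDENCE_THRESHOLD

# Used as an edge condition when wiring the graph, e.g.:
#   wf.add_edge('extract', 'human_review', condition=needs_human_review)
#   wf.add_edge('extract', 'validate',
#               condition=lambda s: not needs_human_review(s))
```

Defaulting a missing score to zero is deliberate: an absent confidence value should pause for review, not proceed.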

Running Product: QuickShip Exception Handling Workflow

QuickShip built their exception handling system as a workflow graph with four stages: classify exception type, retrieve relevant context, generate resolution options, and present to human or auto-resolve.

When they initially built this as stateless prompts, the system would forget context across stages and make inconsistent decisions. Converting to a workflow graph with state preserved at each node solved this: the classifier's output feeds directly into the context retriever, ensuring consistent understanding throughout the workflow.

The workflow graph also enabled checkpointing: if a customer escalated mid-resolution, the agent could resume from the last checkpoint rather than starting over. Customer resolution time dropped from 8 minutes to 90 seconds for complex exceptions.

Hybrid Approaches

Most production AI applications benefit from combining prompts and workflows. Use prompts for flexible natural language generation within well-defined boundaries, rapid prototyping before committing to workflow structure, and components that genuinely benefit from maximum flexibility. Use workflow graphs for critical business logic that must execute consistently, multi-step processes with state dependencies, integration points requiring human oversight, and error recovery and retry logic.

Key Takeaway

Choose workflow graphs when reliability, observability, and resumability matter. Choose prompts when flexibility matters more than predictability. Most production systems use both.