AI-generated code needs tests just like human-written code. The difference is that AI can also help write the tests.
Testing AI-Generated Code
AI-generated code requires testing for two reasons: to verify correctness and to understand what the code does. Writing tests forces you to understand the generated code well enough to specify expected behavior.
The testing approach should differ from traditional development because AI-generated code may have unexpected behaviors that unit tests would not catch. Integration tests and property-based tests are particularly valuable.
When testing AI-generated code, prioritize in this order:
Integration tests: verify the code works with real dependencies and external systems.
Edge case tests: cover boundary conditions and unusual inputs that AI-generated code often handles poorly.
Property tests: verify invariants that should hold regardless of input.
Unit tests: cover individual functions with clear responsibilities.
Using AI for Test Generation
AI can help write tests for AI-generated code. This is not circular, but the AI that generated the code may not be the best choice for testing it, because it will share the same blind spots. Use a different model, a fresh session, or a human reviewer to test.
AI Test Generation Approaches
AI can help write tests using several approaches. Coverage-guided generation creates tests to cover existing code paths based on what code exists. Specification-based generation creates tests from specifications or documentation, deriving test cases from stated requirements. Property-based generation creates tests that verify invariants, checking that certain properties always hold regardless of input. Fuzzing generates random inputs to find crashes and unexpected behaviors.
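Fuzzing, the last approach above, needs little machinery. The sketch below is a minimal dependency-free fuzz harness; `parse_subject` is a hypothetical target function invented for illustration, and any pure function could be substituted:

```python
# Minimal fuzzing sketch: feed random inputs to a target and record crashes.
# parse_subject is a hypothetical function used only for illustration.
import random
import string

def parse_subject(subject: str) -> str:
    """Hypothetical target: strip a 'Re:'/'Fwd:' prefix from an email subject."""
    for prefix in ("Re:", "Fwd:"):
        if subject.startswith(prefix):
            return subject[len(prefix):].strip()
    return subject.strip()

def fuzz(target, runs: int = 1000) -> list:
    """Call target on random strings and collect any inputs that crash it."""
    failures = []
    for _ in range(runs):
        candidate = "".join(
            random.choices(string.printable, k=random.randint(0, 50))
        )
        try:
            target(candidate)
        except Exception as exc:  # any uncaught exception is a finding
            failures.append((candidate, exc))
    return failures

failures = fuzz(parse_subject)  # empty list if no crashing inputs were found
```

Dedicated fuzzers generate inputs far more cleverly than uniform random strings, but even this loop surfaces crashes that hand-picked test cases miss.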
QuickShip used AI to generate tests for their email classification code:
Unit tests: AI generated tests for the classification function with various email inputs
Property tests: AI generated tests verifying that classification always returns a valid category and confidence between 0 and 1
Edge case tests: AI identified edge cases including empty emails, very long emails, emails with no delivery terms, and emails with multiple exception types
Integration tests: AI generated tests for the full classification pipeline with mock email data
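The edge case tests above might look like the following sketch. The `classify_email` stub is a hypothetical stand-in for QuickShip's real function (its contract, a category plus a confidence score, is assumed from the description above); a real suite would import the actual implementation:

```python
# Sketch of the edge case tests described above. classify_email is a
# hypothetical stub; a real suite would import QuickShip's actual function.
def classify_email(body: str) -> dict:
    """Stub with the assumed contract: returns a category and a confidence."""
    if not body.strip():
        return {"category": "other", "confidence": 0.5}
    if "delayed" in body or "lost" in body:
        return {"category": "delivery_exception", "confidence": 0.9}
    if "delivered" in body:
        return {"category": "delivery_update", "confidence": 0.9}
    return {"category": "other", "confidence": 0.6}

VALID_CATEGORIES = {"delivery_update", "delivery_exception", "other"}

def test_empty_email():
    # Empty input must still yield a valid category, not crash.
    assert classify_email("")["category"] in VALID_CATEGORIES

def test_very_long_email():
    # Extreme input length must not break the confidence invariant.
    result = classify_email("delivered " * 10_000)
    assert 0.0 <= result["confidence"] <= 1.0

def test_no_delivery_terms():
    # An unrelated email should fall through to the catch-all category.
    assert classify_email("Lunch on Friday?")["category"] == "other"

def test_multiple_exception_types():
    # Several exception signals in one email should still classify cleanly.
    result = classify_email("Package delayed and possibly lost")
    assert result["category"] == "delivery_exception"
```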
What to Test
Happy Path Tests
Verify the code works for typical, expected inputs. These establish baseline functionality.
Edge Case Tests
Verify the code handles unusual inputs gracefully. AI-generated code is often weak on edge cases because training data skews toward common cases.
Error Handling Tests
Verify the code fails gracefully when things go wrong: invalid inputs, service unavailability, resource constraints.
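A minimal sketch of such tests, assuming a hypothetical `classify_email` whose contract says non-string input raises `TypeError`. The helper stands in for a framework facility like pytest's `pytest.raises` so the example has no dependencies:

```python
# Error-handling test sketch without a framework; with pytest you would use
# pytest.raises instead. classify_email is a hypothetical stub whose assumed
# contract is that non-string input raises TypeError.
def classify_email(body):
    if not isinstance(body, str):
        raise TypeError("body must be a string")
    return {"category": "other", "confidence": 0.5}

def assert_raises(exc_type, fn, *args):
    """Minimal stand-in for pytest.raises: pass only if fn raises exc_type."""
    try:
        fn(*args)
    except exc_type:
        return
    raise AssertionError(f"expected {exc_type.__name__}")

def test_rejects_none_input():
    assert_raises(TypeError, classify_email, None)

def test_rejects_bytes_input():
    assert_raises(TypeError, classify_email, b"raw bytes")
```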
Property Tests
Property tests verify invariants that should always hold regardless of input. These include output being always valid JSON, classification always being one of the valid categories, confidence always being between 0 and 1, and the function being idempotent such that calling it twice produces the same result.
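The invariants above can be checked with a dependency-free property-test sketch; a library like Hypothesis would generate inputs more systematically. The `classify_email` stub is hypothetical, with the assumed contract that it returns a JSON string:

```python
# Dependency-free property-test sketch for the invariants listed above.
# classify_email is a hypothetical stub returning a JSON string.
import json
import random
import string

VALID_CATEGORIES = {"delivery_update", "delivery_exception", "other"}

def classify_email(body: str) -> str:
    """Stub under test; the assumed contract is a JSON-encoded result."""
    category = "delivery_update" if "delivered" in body else "other"
    return json.dumps({"category": category, "confidence": 0.8})

def random_email() -> str:
    return "".join(random.choices(string.printable, k=random.randint(0, 200)))

for _ in range(500):
    email = random_email()
    first, second = classify_email(email), classify_email(email)
    result = json.loads(first)                     # output is always valid JSON
    assert result["category"] in VALID_CATEGORIES  # always a valid category
    assert 0.0 <= result["confidence"] <= 1.0      # confidence in [0, 1]
    assert first == second                         # idempotent on the same input
```

Each assertion maps to one invariant from the list; none depends on knowing the expected output for any specific email, which is what distinguishes property tests from example-based unit tests.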
A reusable prompt template for test generation: "Generate tests for [function/module] that cover: [coverage targets]. Include edge cases and error conditions. Use [test framework]."
Coverage Targets
For production readiness, aim for meaningful coverage, not necessarily 100% coverage. Coverage metrics can be gamed; what matters is that tests verify behavior.
Coverage targets vary by code importance. Critical path code should reach 90%+ coverage because these features must work reliably. Business logic should reach 80%+ coverage because complex decisions must be tested thoroughly. Integration tests should cover the key flows, since complete integration coverage is rarely practical. For utilities, focus effort on the complex ones; simple helpers can tolerate lower coverage.
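Coverage floors can be enforced in CI. As one illustrative sketch using coverage.py's pyproject.toml configuration (the 80 here is an assumption matching the business-logic tier; a separate job measuring only critical-path packages could use a higher floor):

```toml
# Illustrative coverage.py configuration (read by pytest-cov as well).
[tool.coverage.report]
# Fail the run if total coverage drops below this floor; run critical-path
# packages in a separate CI job with a higher floor such as 90.
fail_under = 80
show_missing = true
```

Enforcing the floor mechanically keeps coverage from silently eroding, while the tiered targets above keep the numbers meaningful rather than gamed.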
Test Maintenance
Tests require maintenance as code evolves, and AI-generated tests are not exempt from this reality. Build test maintenance into your workflow by updating affected tests when code changes, determining whether the change or the test is wrong when tests fail after code modifications, periodically reviewing test quality to remove brittle or low-value tests, and using test coverage as a guide rather than a rigid target.
Key Takeaways
AI-generated code needs tests to verify correctness and to understand the behavior of code you did not write yourself. Testing priorities are integration tests, edge cases, property tests, and unit tests in that order. AI can help write tests for AI-generated code, though using a different AI perspective than the one that generated the code often produces better results. Test coverage targets vary by code importance, focusing on critical path code while accepting lower coverage for utilities and less critical features. Tests require maintenance as code evolves, so build test maintenance into your workflow rather than treating it as optional.
Apply testing to a vibe-coded function by following a systematic process. First, identify the function's expected behavior by understanding what inputs produce what outputs. Second, generate edge case tests covering boundary conditions, unusual inputs, and extreme values. Third, generate property tests verifying invariants that should always hold. Fourth, run tests and fix any failures, determining whether the code or the test needs adjustment. Fifth, evaluate coverage and add tests for any gaps identified.
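As a compact worked sketch of these steps, consider a hypothetical vibe-coded `slugify` function (all names and expected values here are illustrative):

```python
# Worked sketch of the five steps applied to a hypothetical vibe-coded function.
import random
import string

def slugify(title: str) -> str:
    """Function under test: lowercase, hyphen-separated URL slug."""
    cleaned = "".join(ch for ch in title if ch.isalnum() or ch == " ")
    return "-".join(cleaned.lower().split())

# Step 1: pin down expected behavior on typical input.
assert slugify("Hello World") == "hello-world"

# Step 2: edge cases -- empty, whitespace-only, punctuation-heavy input.
assert slugify("") == ""
assert slugify("   ") == ""
assert slugify("C++ & Rust!") == "c-rust"

# Step 3: property tests -- invariants that must hold for any input.
for _ in range(200):
    title = "".join(random.choices(string.ascii_letters + "  -!?", k=20))
    slug = slugify(title)
    assert slug == slug.lower()                      # always lowercase
    assert " " not in slug                           # never contains spaces
    assert slugify(slug.replace("-", " ")) == slug   # stable under round-trip

# Steps 4 and 5: run the suite, fix failures, then check coverage for gaps.
```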
What's Next
In Section 14.3, we examine Architectural Hardening, exploring how to improve the structure of AI-generated code.