AI-generated code needs tests just like human-written code. The difference is that AI can also help write the tests.
Testing AI-Generated Code
AI-generated code requires testing for two reasons: to verify correctness and to understand what the code does. Writing tests forces you to understand the generated code well enough to specify expected behavior.
The testing approach should differ from traditional development because AI-generated code may have unexpected behaviors that unit tests would not catch. Integration tests and property-based tests are particularly valuable.
When testing AI-generated code, prioritize in this order:
Integration tests: verify the code works with real dependencies and external systems.
Edge case tests: cover boundary conditions and unusual inputs that AI-generated code often handles poorly.
Property tests: verify invariants that should hold regardless of input.
Unit tests: cover individual functions with clear responsibilities.
Using AI for Test Generation
AI can help write tests for AI-generated code. This is not circular, but the AI that generated the code may not be the best choice for testing it, because it will share the same blind spots. Use a different model, a fresh session, or a human reviewer to test.
AI Test Generation Approaches
AI can help write tests using several approaches. Coverage-guided generation creates tests to cover existing code paths based on what code exists. Specification-based generation creates tests from specifications or documentation, deriving test cases from stated requirements. Property-based generation creates tests that verify invariants, checking that certain properties always hold regardless of input. Fuzzing generates random inputs to find crashes and unexpected behaviors.
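Fuzzing, the last approach above, needs little machinery. The sketch below is a minimal dependency-free fuzz harness; `parse_subject` is a hypothetical target function invented for illustration, and any pure function could be substituted:

```python
# Minimal fuzzing sketch: feed random inputs to a target and record crashes.
# parse_subject is a hypothetical function used only for illustration.
import random
import string

def parse_subject(subject: str) -> str:
    """Hypothetical target: strip a 'Re:'/'Fwd:' prefix from an email subject."""
    for prefix in ("Re:", "Fwd:"):
        if subject.startswith(prefix):
            return subject[len(prefix):].strip()
    return subject.strip()

def fuzz(target, runs: int = 1000) -> list:
    """Call target on random strings and collect any inputs that crash it."""
    failures = []
    for _ in range(runs):
        candidate = "".join(
            random.choices(string.printable, k=random.randint(0, 50))
        )
        try:
            target(candidate)
        except Exception as exc:  # any uncaught exception is a finding
            failures.append((candidate, exc))
    return failures

failures = fuzz(parse_subject)  # empty list if no crashing inputs were found
```

Dedicated fuzzers generate inputs far more cleverly than uniform random strings, but even this loop surfaces crashes that hand-picked test cases miss.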
QuickShip used AI to generate tests for their email classification code:
Unit tests: AI generated tests for the classification function with various email inputs
Property tests: AI generated tests verifying that classification always returns a valid category and confidence between 0 and 1
Edge case tests: AI identified edge cases including empty emails, very long emails, emails with no delivery terms, and emails with multiple exception types
Integration tests: AI generated tests for the full classification pipeline with mock email data
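The edge case tests above might look like the following sketch. The `classify_email` stub is a hypothetical stand-in for QuickShip's real function (its contract, a category plus a confidence score, is assumed from the description above); a real suite would import the actual implementation:

```python
# Sketch of the edge case tests described above. classify_email is a
# hypothetical stub; a real suite would import QuickShip's actual function.
def classify_email(body: str) -> dict:
    """Stub with the assumed contract: returns a category and a confidence."""
    if not body.strip():
        return {"category": "other", "confidence": 0.5}
    if "delayed" in body or "lost" in body:
        return {"category": "delivery_exception", "confidence": 0.9}
    if "delivered" in body:
        return {"category": "delivery_update", "confidence": 0.9}
    return {"category": "other", "confidence": 0.6}

VALID_CATEGORIES = {"delivery_update", "delivery_exception", "other"}

def test_empty_email():
    # Empty input must still yield a valid category, not crash.
    assert classify_email("")["category"] in VALID_CATEGORIES

def test_very_long_email():
    # Extreme input length must not break the confidence invariant.
    result = classify_email("delivered " * 10_000)
    assert 0.0 <= result["confidence"] <= 1.0

def test_no_delivery_terms():
    # An unrelated email should fall through to the catch-all category.
    assert classify_email("Lunch on Friday?")["category"] == "other"

def test_multiple_exception_types():
    # Several exception signals in one email should still classify cleanly.
    result = classify_email("Package delayed and possibly lost")
    assert result["category"] == "delivery_exception"
```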
What to Test
Happy Path Tests
Verify the code works for typical, expected inputs. These establish baseline functionality.
Edge Case Tests
Verify the code handles unusual inputs gracefully. AI-generated code is often weak on edge cases because training data skews toward common cases.
Error Handling Tests
Verify the code fails gracefully when things go wrong: invalid inputs, service unavailability, resource constraints.
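A minimal sketch of such tests, assuming a hypothetical `classify_email` whose contract says non-string input raises `TypeError`. The helper stands in for a framework facility like pytest's `pytest.raises` so the example has no dependencies:

```python
# Error-handling test sketch without a framework; with pytest you would use
# pytest.raises instead. classify_email is a hypothetical stub whose assumed
# contract is that non-string input raises TypeError.
def classify_email(body):
    if not isinstance(body, str):
        raise TypeError("body must be a string")
    return {"category": "other", "confidence": 0.5}

def assert_raises(exc_type, fn, *args):
    """Minimal stand-in for pytest.raises: pass only if fn raises exc_type."""
    try:
        fn(*args)
    except exc_type:
        return
    raise AssertionError(f"expected {exc_type.__name__}")

def test_rejects_none_input():
    assert_raises(TypeError, classify_email, None)

def test_rejects_bytes_input():
    assert_raises(TypeError, classify_email, b"raw bytes")
```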
Property Tests
Property tests verify invariants that should always hold regardless of input. These include output being always valid JSON, classification always being one of the valid categories, confidence always being between 0 and 1, and the function being idempotent such that calling it twice produces the same result.
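The invariants above can be checked with a dependency-free property-test sketch; a library like Hypothesis would generate inputs more systematically. The `classify_email` stub is hypothetical, with the assumed contract that it returns a JSON string:

```python
# Dependency-free property-test sketch for the invariants listed above.
# classify_email is a hypothetical stub returning a JSON string.
import json
import random
import string

VALID_CATEGORIES = {"delivery_update", "delivery_exception", "other"}

def classify_email(body: str) -> str:
    """Stub under test; the assumed contract is a JSON-encoded result."""
    category = "delivery_update" if "delivered" in body else "other"
    return json.dumps({"category": category, "confidence": 0.8})

def random_email() -> str:
    return "".join(random.choices(string.printable, k=random.randint(0, 200)))

for _ in range(500):
    email = random_email()
    first, second = classify_email(email), classify_email(email)
    result = json.loads(first)                     # output is always valid JSON
    assert result["category"] in VALID_CATEGORIES  # always a valid category
    assert 0.0 <= result["confidence"] <= 1.0      # confidence in [0, 1]
    assert first == second                         # idempotent on the same input
```

Each assertion maps to one invariant from the list; none depends on knowing the expected output for any specific email, which is what distinguishes property tests from example-based unit tests.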
A reusable prompt template for test generation: "Generate tests for [function/module] that cover: [coverage targets]. Include edge cases and error conditions. Use [test framework]."
Coverage Targets
For production readiness, aim for meaningful coverage, not necessarily 100% coverage. Coverage metrics can be gamed; what matters is that tests verify behavior.
Coverage targets vary by code importance. Critical path code should reach 90%+ coverage because these features must work reliably. Business logic should reach 80%+ coverage because complex decisions must be tested thoroughly. Integration tests should cover the key flows, since complete integration coverage is rarely practical. For utilities, focus effort on the complex ones; simple helpers can tolerate lower coverage.
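Coverage floors can be enforced in CI. As one illustrative sketch using coverage.py's pyproject.toml configuration (the 80 here is an assumption matching the business-logic tier; a separate job measuring only critical-path packages could use a higher floor):

```toml
# Illustrative coverage.py configuration (read by pytest-cov as well).
[tool.coverage.report]
# Fail the run if total coverage drops below this floor; run critical-path
# packages in a separate CI job with a higher floor such as 90.
fail_under = 80
show_missing = true
```

Enforcing the floor mechanically keeps coverage from silently eroding, while the tiered targets above keep the numbers meaningful rather than gamed.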
Test Maintenance
Tests require maintenance as code evolves, and AI-generated tests are not exempt from this reality. Build test maintenance into your workflow by updating affected tests when code changes, determining whether the change or the test is wrong when tests fail after code modifications, periodically reviewing test quality to remove brittle or low-value tests, and using test coverage as a guide rather than a rigid target.
Key Takeaways
AI-generated code needs tests to verify correctness and to understand the behavior of code you did not write yourself. Testing priorities are integration tests, edge cases, property tests, and unit tests in that order. AI can help write tests for AI-generated code, though using a different AI perspective than the one that generated the code often produces better results. Test coverage targets vary by code importance, focusing on critical path code while accepting lower coverage for utilities and less critical features. Tests require maintenance as code evolves, so build test maintenance into your workflow rather than treating it as optional.
Apply testing to a vibe-coded function by following a systematic process. First, identify the function's expected behavior by understanding what inputs produce what outputs. Second, generate edge case tests covering boundary conditions, unusual inputs, and extreme values. Third, generate property tests verifying invariants that should always hold. Fourth, run tests and fix any failures, determining whether the code or the test needs adjustment. Fifth, evaluate coverage and add tests for any gaps identified.
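As a compact worked sketch of these steps, consider a hypothetical vibe-coded `slugify` function (all names and expected values here are illustrative):

```python
# Worked sketch of the five steps applied to a hypothetical vibe-coded function.
import random
import string

def slugify(title: str) -> str:
    """Function under test: lowercase, hyphen-separated URL slug."""
    cleaned = "".join(ch for ch in title if ch.isalnum() or ch == " ")
    return "-".join(cleaned.lower().split())

# Step 1: pin down expected behavior on typical input.
assert slugify("Hello World") == "hello-world"

# Step 2: edge cases -- empty, whitespace-only, punctuation-heavy input.
assert slugify("") == ""
assert slugify("   ") == ""
assert slugify("C++ & Rust!") == "c-rust"

# Step 3: property tests -- invariants that must hold for any input.
for _ in range(200):
    title = "".join(random.choices(string.ascii_letters + "  -!?", k=20))
    slug = slugify(title)
    assert slug == slug.lower()                      # always lowercase
    assert " " not in slug                           # never contains spaces
    assert slugify(slug.replace("-", " ")) == slug   # stable under round-trip

# Steps 4 and 5: run the suite, fix failures, then check coverage for gaps.
```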
What's Next
In Section 14.3, we examine Architectural Hardening, exploring how to improve the structure of AI-generated code.