At Assembled, engineering velocity is our competitive edge. We pride ourselves on delivering new features at a fast pace. But how do we maintain quality without slowing down? The answer lies in robust testing. As Martin Fowler aptly puts it:
[Testing] can drastically reduce the number of bugs that get into production… But the biggest benefit isn't about merely avoiding production bugs, it's about the confidence that you get to make changes to the system.
Despite this, writing comprehensive tests is often overlooked due to time constraints or the complexity involved. Large Language Models (LLMs) have shifted this dynamic by making it significantly easier and faster to generate robust tests. Tasks that previously required hours can now be completed in just 5–10 minutes.
We've observed tangible benefits within our team. In this blog post, we'll explore how we've used LLMs to enhance our testing practices.
To get started, you'll need access to a high-quality LLM for code generation like OpenAI's o1-preview or Anthropic's Claude 3.5 Sonnet.
Then, you should craft a precise prompt that guides the model to produce the desired output. Here's a sample prompt we've found effective for generating Go unit tests:
Help me write a comprehensive set of unit tests in Golang for the following function:
<function_to_test>
// Insert your function code here
</function_to_test>
Here are the definitions of the associated structs used in the function:
<struct_definitions>
// Optionally insert any relevant struct definitions here
</struct_definitions>
Please ensure that:
- The tests use the fixture pattern by defining different test cases in a slice.
- The tests follow Go's testing best practices, including proper naming conventions and code organization.
- Use the `testing` and `require` packages as shown in the example below.
- Cover various scenarios, including normal cases, edge cases, and error handling.
<test_example>
// Include an example of a good unit test from your codebase
</test_example>
In this prompt, you need to provide three things: the function you want tested, the definitions of any structs it uses, and an example of a well-written unit test from your codebase.
Once you've dropped this into an LLM and generated a result, review and refine the output: check for compilation issues, add any edge cases the LLM missed, and adjust the style to match your codebase's conventions. A few iterations of back and forth are sometimes necessary to arrive at an acceptable test suite. Once you're close enough, copy the resulting tests back into your codebase.
If you use an AI-assisted code editor like Copilot or Cursor, the principles remain the same, though because these tools draw context from your existing code, you can often get away with less detailed prompts.
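For instance, with the relevant file open in your editor, a prompt as short as this (our illustrative wording, not a vendor-recommended one) will often produce usable results:

Write table-driven unit tests for this function using testify/require.
Cover normal cases, edge cases, and error handling.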
Suppose you're building an e-commerce platform and have a function that calculates an order summary. Here's how you might apply the above approach.
// Struct definitions
type OrderItem struct {
    ProductID string
    Quantity  int
    UnitPrice float64
    Weight    float64 // Weight per unit in kg
    Category  string
}

type OrderSummary struct {
    TotalPrice      float64
    TotalWeight     float64
    ItemsByCategory map[string]int // Category name to total quantity
}

// Function to test
func CalculateOrderSummary(items []OrderItem) OrderSummary {
    itemsByCategory := make(map[string]int)
    totalPrice := 0.0
    totalWeight := 0.0

    for _, item := range items {
        totalItemPrice := float64(item.Quantity) * item.UnitPrice
        totalItemWeight := float64(item.Quantity) * item.Weight

        totalPrice += totalItemPrice
        totalWeight += totalItemWeight
        itemsByCategory[item.Category] += item.Quantity
    }

    summary := OrderSummary{
        TotalPrice:      totalPrice,
        TotalWeight:     totalWeight,
        ItemsByCategory: itemsByCategory,
    }
    return summary
}
Using the suggested prompt, we fed this code into ChatGPT's o1-preview and, in just 48 seconds, got back a comprehensive test suite that was ready to use straight out of the box. Here's the full prompt and results from ChatGPT.
You'll notice that the resulting tests are both comprehensive and well written. They use the `testify/require` library, which is prescribed in the original example.
import (
    "testing"

    "github.com/stretchr/testify/require"
)

func TestCalculateOrderSummary(t *testing.T) {
    fixtures := []struct {
        Name     string
        Items    []OrderItem
        Expected OrderSummary
    }{
        // ...
        {
            Name: "Multiple items in different categories",
            Items: []OrderItem{
                {
                    ProductID: "P1",
                    Quantity:  2,
                    UnitPrice: 5.0,
                    Weight:    0.2,
                    Category:  "Books",
                },
                {
                    ProductID: "P2",
                    Quantity:  1,
                    UnitPrice: 100.0,
                    Weight:    1.0,
                    Category:  "Electronics",
                },
            },
            Expected: OrderSummary{
                TotalPrice:  (2 * 5.0) + (1 * 100.0),
                TotalWeight: (2 * 0.2) + (1 * 1.0),
                ItemsByCategory: map[string]int{
                    "Books":       2,
                    "Electronics": 1,
                },
            },
        },
        // ...
    }

    for _, fixture := range fixtures {
        t.Run(fixture.Name, func(t *testing.T) {
            result := CalculateOrderSummary(fixture.Items)
            require.Equal(t, fixture.Expected.TotalPrice, result.TotalPrice, "TotalPrice mismatch")
            require.Equal(t, fixture.Expected.TotalWeight, result.TotalWeight, "TotalWeight mismatch")
            require.Equal(t, fixture.Expected.ItemsByCategory, result.ItemsByCategory, "ItemsByCategory mismatch")
        })
    }
}
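One refinement worth making by hand during review: the float fields are safer to compare with require.InDelta than require.Equal, since floating-point accumulation can introduce rounding error as the function grows. A minimal sketch of that change inside the t.Run closure (the 1e-9 tolerance is our assumption; pick one appropriate to your domain):

    result := CalculateOrderSummary(fixture.Items)
    // InDelta tolerates tiny floating-point drift that Equal would flag as a failure.
    require.InDelta(t, fixture.Expected.TotalPrice, result.TotalPrice, 1e-9, "TotalPrice mismatch")
    require.InDelta(t, fixture.Expected.TotalWeight, result.TotalWeight, 1e-9, "TotalWeight mismatch")
    require.Equal(t, fixture.Expected.ItemsByCategory, result.ItemsByCategory, "ItemsByCategory mismatch")

From there, go test -run TestCalculateOrderSummary -v runs the suite.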
The same approach can be applied to more complex testing scenarios. By adjusting the prompt and providing a different set of baseline test cases, you can generate tests for other parts of your codebase, as the sketch below illustrates.
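As one illustration, here is a hand-written sketch (not LLM output; healthHandler is a hypothetical stand-in for real application code) showing how the same fixture pattern extends to HTTP handlers using the standard net/http/httptest package:

import (
    "net/http"
    "net/http/httptest"
    "testing"

    "github.com/stretchr/testify/require"
)

// healthHandler is a hypothetical handler used only for this sketch.
func healthHandler(w http.ResponseWriter, r *http.Request) {
    if r.Method != http.MethodGet {
        w.WriteHeader(http.StatusMethodNotAllowed)
        return
    }
    w.WriteHeader(http.StatusOK)
    _, _ = w.Write([]byte("ok"))
}

func TestHealthHandler(t *testing.T) {
    fixtures := []struct {
        Name         string
        Method       string
        ExpectedCode int
    }{
        {Name: "GET succeeds", Method: http.MethodGet, ExpectedCode: http.StatusOK},
        {Name: "POST is rejected", Method: http.MethodPost, ExpectedCode: http.StatusMethodNotAllowed},
    }

    for _, fixture := range fixtures {
        t.Run(fixture.Name, func(t *testing.T) {
            req := httptest.NewRequest(fixture.Method, "/health", nil)
            rec := httptest.NewRecorder()
            healthHandler(rec, req)
            require.Equal(t, fixture.ExpectedCode, rec.Code, "status code mismatch")
        })
    }
}

The structure is identical to the earlier suite, fixtures in a slice, one t.Run per case, require assertions, so the same prompt skeleton applies with a different test example.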
At Assembled, we've been using LLMs to write tests for a few months now and have seen big boosts in engineering productivity. That said, there are a few considerations to keep in mind as you start using LLMs for test writing; above all, treat generated tests as a first draft and review them with the same rigor as hand-written code.
Using LLMs to generate comprehensive test suites in minutes has been a game changer at Assembled. It lowers the activation energy to write tests and makes it less likely that engineers skip them due to time constraints. The result is a cleaner, safer codebase and, in turn, higher development velocity.
We’ve got a lot of features to build and tests to write. If you’re interested in helping us transform customer support, check out our open roles.
Thanks to Jake Chao, Mae Cromwell, and Whitney Rose for helping with drafts of this post.