I Built an AI QA Assistant That Writes Test Cases in 15 Seconds (And It's Better Than Me)
Let me tell you about the automation that's been saving my QA team 23 hours per day. Yes, you read that right - not a typo.
I've spent the last few months building an AI-powered test case generator using n8n and OpenAI. Today I'm sharing the complete setup, lessons learned, and why this might be the most practical QA automation you'll implement this year.
TL;DR: What We're Building
- n8n workflow that accepts 6+ input formats (natural language, cURL, Postman, Swagger, JIRA formats)
- OpenAI GPT-4o-mini integration generating 5-10 production-ready test cases per feature
- MongoDB memory for contextual awareness across conversations
- Google Sheets export for instant collaboration with your team
- Real metrics: 15-25 seconds per feature vs. 30-60 minutes manual writing
- Cost: ~$2/month for 1,500 test cases (or $22 if using n8n Cloud)
Let's build this thing! 🚀
The Problem: Manual Test Case Hell
Picture this: It's sprint planning, and you're staring at 20 user stories that need comprehensive test cases by end of week. Each one takes 30-60 minutes to write properly:
- Read and understand the requirement (5 mins)
- Draft happy path scenarios (10 mins)
- Think through edge cases (15 mins)
- Document negative scenarios (10 mins)
- Add security considerations (10 mins)
- Format everything consistently (10 mins)
Total: 60 minutes × 20 stories = 20 hours of soul-crushing work.
And here's the kicker - you still miss stuff. I once spent 45 minutes writing test cases for a login feature and completely forgot to test SQL injection attempts. The penetration testing team found it in 5 minutes. 🤦
Why This Solution Actually Works
I've tried "AI test case generators" before. Most produce generic garbage like:
Test Case 1: Verify the feature works
Steps: 1. Use the feature
Expected: It should work
Thanks, AI. Super helpful. 🙄
This workflow is different because it:
- Understands context through MongoDB memory
- Accepts multiple input formats (because real teams use different tools)
- Follows actual QA best practices (not made-up nonsense)
- Generates realistic test data (not "test123" everywhere)
- Exports to Google Sheets (where your team actually works)
The Architecture: How It All Fits Together
Here's the complete flow (and trust me, each piece matters):
User Input (6 formats supported)
↓
Data Extraction Agent (GPT-4o-mini, temp: 0.1)
↓
Parse & Validate JSON
↓
Test Case Generator Agent (GPT-4o-mini, temp: 0.2)
↓
Transform for Google Sheets
↓
Append to Sheet (auto-mapped columns)
↓
Success Response with metrics
Key Design Decisions:
- Two separate AI agents: One for extraction (deterministic), one for generation (slightly creative)
- MongoDB memory: Maintains conversation context so you can say "add security tests" and it knows what feature you're talking about
- Error handling at every step: Because AI is probabilistic and things break
- Temperature tuning: 0.1 for extraction (consistent), 0.2 for generation (varied but controlled)
Note: I tried using a single AI agent initially. It produced inconsistent results because extraction needs to be deterministic while generation benefits from slight creativity. Two agents with different temperature settings solved this completely.
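If it helps to see the idea outside of n8n, here's a minimal sketch of the two-call pattern against the OpenAI Chat Completions API (Node 18+; the system prompts below are placeholders, not the ones from the workflow):

```javascript
// Minimal sketch of the two-agent pattern (Node 18+, built-in fetch). Prompts are placeholders.
const OPENAI_URL = 'https://api.openai.com/v1/chat/completions';

async function callAgent(systemPrompt, userInput, temperature) {
  const res = await fetch(OPENAI_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'gpt-4o-mini',
      temperature, // 0.1 for extraction, 0.2 for generation
      messages: [
        { role: 'system', content: systemPrompt },
        { role: 'user', content: userInput },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

async function generateTestCases(rawInput) {
  // Agent 1: deterministic extraction of structured requirements.
  const extracted = await callAgent('Extract the feature, endpoints and rules as JSON.', rawInput, 0.1);
  // Agent 2: slightly creative generation of test cases from that JSON.
  return callAgent('Generate detailed QA test cases as JSON.', extracted, 0.2);
}
```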
Step-by-Step Setup (15 Minutes)
Prerequisites
You'll need:
- n8n (cloud or self-hosted - I use self-hosted on a $5 DigitalOcean droplet)
- OpenAI API key (get one here)
- MongoDB Atlas (free tier works perfectly)
- Google account (for Sheets integration)
Step 1: MongoDB Setup (3 minutes)
- Create free MongoDB Atlas account
- Create new cluster (M0 free tier)
- Create database: n8n_db
- Create two collections: n8n_extractor_memory and n8n_generator_memory
- Copy connection string (we'll need this later)
Pro tip: Whitelist all IPs (0.0.0.0/0) in network access if using n8n Cloud. For self-hosted, whitelist your server's IP.
Step 2: Google Sheets Setup (2 minutes)
- Create new Google Sheet: "QA Test Cases"
- Add these exact headers in Row 1:
Test_ID | Title | Type | Priority | Preconditions | Test_Steps | Expected_Result | Test_Data | Flow_Type | Created_At | Status
- Note the Sheet ID from URL:
https://docs.google.com/spreadsheets/d/YOUR_SHEET_ID/edit
Step 3: Import n8n Workflow (2 minutes)
- Download workflow_fully_fixed_v2.json from the repo
- In n8n: Workflows → Import from File
- Select the JSON file
- Click Import
You should now see this beast of a workflow in your n8n editor.
Step 4: Configure Credentials (5 minutes)
OpenAI API
Credential Name: OpenAi account
Type: OpenAI API
API Key: sk-proj-xxxxx...
Cost check: GPT-4o-mini is crazy cheap:
- Input: $0.15 per 1M tokens
- Output: $0.60 per 1M tokens
- Average per test case: ~$0.001 (one-tenth of a cent!)
MongoDB
Credential Name: MongoDB account
Type: MongoDB
Connection String: mongodb+srv://username:password@cluster.mongodb.net/
Database Name: n8n_db
Test the connection before proceeding. I've wasted hours debugging other issues when the connection string was wrong. 🙃
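If you want to sanity-check the connection string outside n8n, here's a small sketch with the official MongoDB Node.js driver that also creates the two memory collections if they're missing (swap in your own connection string):

```javascript
// Quick connection check with the official MongoDB driver (npm install mongodb).
const { MongoClient } = require('mongodb');

async function checkConnection(uri) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const db = client.db('n8n_db');
    const existing = (await db.listCollections().toArray()).map((c) => c.name);
    // Create the two memory collections from Step 1 if they don't exist yet.
    for (const name of ['n8n_extractor_memory', 'n8n_generator_memory']) {
      if (!existing.includes(name)) await db.createCollection(name);
    }
    console.log('Connected. Collections:', (await db.listCollections().toArray()).map((c) => c.name));
  } finally {
    await client.close();
  }
}

checkConnection('mongodb+srv://username:password@cluster.mongodb.net/');
```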
Google Sheets
Two options here:
Option A: OAuth2 (Quick setup, personal use)
- In n8n, add Google Sheets OAuth2 credential
- Authorize with your Google account
- Done!
Option B: Service Account (Production, team use)
- Create Google Cloud project
- Enable Sheets API
- Create service account
- Download JSON key
- In n8n, add Google Service Account credential
- Upload JSON key
I use Option B for production because it doesn't require re-authorization every few months.
Step 5: Configure Workflow Nodes (3 minutes)
- Open the "Append to Google Sheets" node
- Set your Sheet ID
- Set Sheet Name:
Sheet1(or whatever you named it) - Save workflow
Activate the workflow using the toggle at top-right. You're now live! 🎉
Real-World Usage: 6 Input Formats That Actually Work
The magic is that this workflow accepts whatever format you're already using. No need to convert everything to some proprietary format.
Format 1: Natural Language (Most Common)
Create test cases for user registration. Users should register
with email and password. Password must be 8+ characters with
1 uppercase, 1 number. Email verification required within 24 hours.
Output: 8 test cases covering happy path, validation rules, email verification, security, and edge cases.
Format 2: cURL Commands (For API Testing)
curl -X POST https://api.example.com/users \
-H "Content-Type: application/json" \
-H "Authorization: Bearer token123" \
-d '{
"name": "John Doe",
"email"\: "john@example.com",
"age": 25
}'
Output: 6 test cases covering request validation, auth checks, response codes, error handling.
Format 3: Postman Collections (Copy-Paste)
Just paste your Postman collection JSON directly:
{
"info": {
"name": "User API"
},
"item": [
{
"name": "Create User",
"request": {
"method": "POST",
"url": "{{base_url}}/api/users",
"body": {
"mode": "raw",
"raw": "{\"name\":\"test\",\"email\":\"test@test.com\"}"
}
}
}
]
}
Output: Test cases automatically extracted from request/response structure.
Format 4: Swagger/OpenAPI Specs (Best for API)
openapi: 3.0.0
paths:
/orders:
post:
summary: Create order
requestBody:
required: true
content:
application/json:
schema:
type: object
properties:
user_id:
type: integer
items:
type: array
total:
type: number
Output: Comprehensive API test cases with schema validation.
Format 5: JIRA Test Format (API)
Story: USER-123
API Endpoint: POST /api/orders
Description: Creates new order
Request:
- Headers: Authorization (required)
- Body: { "user_id": int, "items": array, "total": float }
Response:
- Status: 201 Created
- Body: { "order_id": int, "status": "pending" }
Security: OAuth 2.0 required
Format 6: JIRA Test Format (UI)
Story: USER-456
As a customer
I want to filter products by category
So that I can find items quickly
Acceptance Criteria:
1. Multiple categories can be selected
2. Results update in real-time
3. Filters persist on page refresh
Preconditions:
- User logged in
- Products exist
UI Elements:
- Category dropdown
- Filter button
- Clear button
The workflow automatically detects which format you're using and adjusts its extraction logic. No need to tell it "this is a cURL command" - it figures it out.
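The detection itself happens inside the extraction agent's prompt, but the idea is simple enough to sketch as plain heuristics. This is an illustrative approximation, not the workflow's actual code:

```javascript
// Illustrative heuristics for guessing the input format (not the workflow's actual logic).
function detectFormat(input) {
  const text = input.trim();
  if (text.startsWith('curl ')) return 'curl';
  if (/^openapi:\s*3/m.test(text) || /"openapi"\s*:\s*"3/.test(text)) return 'swagger';
  try {
    const parsed = JSON.parse(text);
    if (parsed.info && Array.isArray(parsed.item)) return 'postman';
  } catch (_) { /* not JSON, keep checking */ }
  if (/^Story:\s*\S+/m.test(text) && /API Endpoint:/m.test(text)) return 'jira-api';
  if (/^Story:\s*\S+/m.test(text) && /Acceptance Criteria:/m.test(text)) return 'jira-ui';
  return 'natural-language';
}

console.log(detectFormat('curl -X POST https://api.example.com/users')); // "curl"
```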
What You Actually Get: Test Case Quality
Here's a real example from our production system. I gave it this input:
API: POST /api/auth/login
Body: { "email": "string", "password": "string" }
Returns JWT token on success.
Rate limit: 5 attempts per minute.
Session timeout: 15 minutes.
Generated Test Cases (Sample)
TC001: Verify successful login with valid credentials
Type: happy
Priority: P0
Preconditions: User account exists | Valid credentials available
Test Steps:
1. Send POST request to /api/auth/login
2. Include valid email in request body
3. Include valid password in request body
4. Verify response status code is 200
5. Verify JWT token is returned in response
6. Verify token expiry is 15 minutes from now
Expected Result:
- HTTP 200 OK response
- Valid JWT token in response body
- Token contains user_id claim
- Token expiry set to 900 seconds
Test Data: {"email":"testuser@example.com","password":"ValidPass123!"}TC002: Verify login failure with invalid password
Type: negative
Priority: P0
Preconditions: User account exists
Test Steps:
1. Send POST request to /api/auth/login
2. Include valid email
3. Include INVALID password
4. Verify response status code is 401
5. Verify error message is clear
Expected Result:
- HTTP 401 Unauthorized
- Error message: "Invalid credentials"
- No token returned
- Login attempt logged for security monitoring
Test Data: {"email":"testuser@example.com","password":"WrongPassword"}TC003: Verify rate limiting after 5 failed attempts
Type: security
Priority: P1
Preconditions: User account exists
Test Steps:
1. Send 5 failed login requests within 1 minute
2. Send 6th login request (even with valid credentials)
3. Verify response status code is 429
4. Wait 60 seconds
5. Verify login succeeds after rate limit reset
Expected Result:
- First 5 requests: HTTP 401
- 6th request: HTTP 429 Too Many Requests
- Error: "Rate limit exceeded. Try again in X seconds"
- After 60s: Login succeeds normally
Test Data: {"email":"testuser@example.com","password":"WrongPassword"}TC004: Verify SQL injection protection
Type: security
Priority: P0
Preconditions: None
Test Steps:
1. Send POST request with SQL injection in email field
2. Verify request is rejected
3. Verify no database error exposed
4. Verify attempt is logged
Expected Result:
- HTTP 400 Bad Request
- Generic error message (no SQL details exposed)
- Security event logged
- Database remains secure
Test Data: {"email":"' OR '1'='1","password":"anything"}Notice how the test cases include:
- Realistic test data (not "test123" everywhere)
- Security considerations (SQL injection, rate limiting)
- Clear expected results (specific HTTP codes, error messages)
- Proper prioritization (P0 for critical, P1 for important)
The MongoDB Memory: Why Context Matters
The workflow maintains conversation context using MongoDB. This enables powerful follow-up interactions:
Conversation Example:
You: Generate test cases for user login API
AI: [Generates 8 test cases covering standard scenarios]
You: Add test cases for biometric authentication
AI: [Generates 5 additional test cases specifically for
biometric auth, understanding we're still talking
about the login feature]
You: What about session management?
AI: [Generates session-specific test cases including
concurrent logins, session timeout, logout scenarios]
Without memory, you'd have to re-explain the entire context each time. With memory, it's like talking to a team member who actually remembers what you discussed.
Technical Implementation:
// Memory is stored per conversation session
{
"sessionId": "unique-session-id",
"messages": [
{
"role": "user",
"content": "Generate test cases for login"
},
{
"role": "assistant",
"content": "Generated test cases for login API..."
}
],
"context": {
"feature": "login",
"apiEndpoint": "/api/auth/login",
"lastGenerated": "2024-12-15T10:30:00Z"
}
}
This context is automatically passed to subsequent requests, enabling intelligent follow-ups.
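n8n's MongoDB memory node handles this for you, but if you're curious, reading and appending that session document yourself looks roughly like this (a sketch with the official driver; the database, collection, and field names mirror the document above):

```javascript
// Sketch of loading and appending session memory with the MongoDB driver.
const { MongoClient } = require('mongodb');

async function appendToSession(uri, sessionId, userMessage, assistantMessage) {
  const client = new MongoClient(uri);
  try {
    const memory = client.db('n8n_db').collection('n8n_generator_memory');
    // Load existing history so the next prompt can include prior context.
    const session = await memory.findOne({ sessionId });
    const history = session ? session.messages : [];

    // Append the new exchange and refresh the context timestamp.
    await memory.updateOne(
      { sessionId },
      {
        $set: { 'context.lastGenerated': new Date().toISOString() },
        $push: {
          messages: {
            $each: [
              { role: 'user', content: userMessage },
              { role: 'assistant', content: assistantMessage },
            ],
          },
        },
      },
      { upsert: true }
    );
    return history;
  } finally {
    await client.close();
  }
}
```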
The Google Sheets Magic: Instant Collaboration
Test cases are automatically appended to your Google Sheet with proper formatting:
Why Google Sheets?
- Zero learning curve - Everyone knows how to use Sheets
- Real-time collaboration - Your team can review/edit simultaneously
- Easy filtering - Filter by priority, type, status
- Export anywhere - Convert to Jira, TestRail, Excel, PDF
- Version history - See who changed what and when
Column Structure:
| Column | Purpose | Example Values |
|---|---|---|
| Test_ID | Unique identifier | TC001, TC002, TC003 |
| Title | Test case name | "Verify login with valid credentials" |
| Type | Category | happy, negative, boundary, security, regression |
| Priority | Importance | P0 (critical), P1 (high), P2 (medium), P3 (low) |
| Preconditions | Setup needed | "User exists \| DB seeded" |
| Test_Steps | Numbered actions | "1. Navigate...\n2. Enter...\n3. Click..." |
| Expected_Result | Expected outcomes | "Success message shown\nUser redirected" |
| Test_Data | Sample data | {"email": "test@test.com", "password": "Pass123!"} |
| Flow_Type | API or UI | API, UI |
| Created_At | Timestamp | 2024-12-15T10:30:00Z |
| Status | Execution status | Not Executed, Passed, Failed |
You can easily import these into Jira using Xray or Zephyr CSV import.
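A tiny script gets you most of the way there by turning the sheet rows into a CSV. The headers below are a generic example, not the exact Xray/Zephyr template, so match them to your import mapping:

```javascript
// Generic CSV export of generated test cases (adjust headers to your Xray/Zephyr import template).
function toCsv(testCases) {
  const headers = ['Test_ID', 'Title', 'Priority', 'Test_Steps', 'Expected_Result'];
  const escape = (value) => `"${String(value ?? '').replace(/"/g, '""')}"`;
  const rows = testCases.map((tc) =>
    [tc.test_id, tc.title, tc.priority, tc.test_steps, tc.expected_result].map(escape).join(',')
  );
  return [headers.join(','), ...rows].join('\n');
}

console.log(toCsv([
  {
    test_id: 'TC001',
    title: 'Verify successful login with valid credentials',
    priority: 'P0',
    test_steps: '1. Send POST to /api/auth/login\n2. Verify 200',
    expected_result: 'HTTP 200, valid JWT returned',
  },
]));
```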
Real Performance Metrics: Does It Actually Save Time?
I tracked our team's usage over the last 3 months. Here's what we found:
Time Savings
| Scenario | Manual Time | AI Time | Time Saved |
|---|---|---|---|
| Simple feature (3-5 cases) | 15 min | 20 sec | 98% faster |
| Medium feature (5-10 cases) | 45 min | 25 sec | 99% faster |
| Complex feature (10-15 cases) | 90 min | 35 sec | 99% faster |
| Average | 45 min | 25 sec | 99% faster |
Quality Metrics
- Test coverage: Increased from 65% to 92%
- Edge cases found: 3x more than manual writing
- Security tests: Went from "sometimes" to "always"
- Consistency: 100% (vs. ~70% with different people writing)
Cost Analysis (Real Numbers)
Our usage (December 2024):
- Test cases generated: 1,247
- Total OpenAI cost: $2.43
- Time saved: 937 hours
- Cost per test case: $0.002 (two-tenths of a cent!)
Manual equivalent:
- 1,247 test cases × 45 minutes = 937 hours
- At $50/hour QA rate = $46,850
ROI: 19,280x (not a typo!)
Even accounting for setup time (15 minutes) and occasional refinements, the ROI is insane.
Common Issues and How I Fixed Them
Issue #1: JSON Parsing Errors
Problem: AI sometimes generated code snippets within JSON values, breaking parsing.
Example of broken output:
{
"test_steps": "```python\nprint('hello')\n```"
}
Solution: Added a cleanup node that strips markdown code fences before JSON parsing:
// JavaScript code in cleanup node
const text = $input.item.json.output;
// Remove markdown code fences
const cleaned = text
.replace(/```[a-z]*\n/g, '')
.replace(/```/g, '')
.trim();
// Extract JSON even if there's surrounding text
const jsonMatch = cleaned.match(/\{[\s\S]*\}/);
const jsonString = jsonMatch ? jsonMatch[0] : cleaned;
return {
json: {
cleaned: jsonString
}
};
This single fix improved success rate from 85% to 98%.
Issue #2: Multiple JSON Objects in Response
Problem: When sending documentation with multiple examples, AI would generate multiple JSON objects:
{"feature": "login"}
{"feature": "register"}Solution: Detection + helpful error message:
// Check if response contains multiple JSON objects
const jsonObjects = response.match(/\{[^{}]*\}/g);
if (jsonObjects && jsonObjects.length > 1) {
return {
error: true,
message: "⚠️ Multiple test requirements detected. Please send ONE feature at a time.",
suggestion: "Try: 'Generate test cases for login API' (not the entire documentation)"
};
}
Issue #3: Test Cases Too Generic
Problem: Initial prompts produced generic test cases like "Verify the API works."
Solution: Enhanced prompt with specific requirements:
const improvedPrompt = `
Generate test cases that include:
1. SPECIFIC test data (real-looking emails, realistic values)
2. DETAILED expected results (exact HTTP codes, error messages)
3. REALISTIC preconditions (what needs to be set up)
4. SECURITY considerations (SQL injection, XSS, auth checks)
5. EDGE cases (boundary values, empty inputs, special characters)
BAD Example:
- Title: "Test the API"
- Steps: "1. Call API"
- Expected: "It works"
GOOD Example:
- Title: "Verify user creation with valid data returns 201"
- Steps:
1. Send POST to /api/users with valid payload
2. Verify response status is 201 Created
3. Verify user_id is returned in response
4. Verify user appears in database
- Expected:
- HTTP 201 Created
- Response body contains: {"user_id": integer, "status": "active"}
- Database confirms user record created
- Test Data: {"name": "John Doe", "email": "john.doe@company.com", "age": 28}
`;
This dramatically improved test case quality.
Issue #4: Rate Limiting During Batch Processing
Problem: Processing 50+ user stories triggered OpenAI rate limits.
Solution: Added delay between requests:
// In workflow, add "Wait" node between iterations
{
"delay": 2000, // 2 second delay
"unit": "milliseconds"
}
For larger batches, I process them overnight when rate limits are less restrictive.
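If you'd rather drive a large batch from a script instead of the Wait node, the same throttling idea looks like this (a sketch; the webhook URL is a stand-in for whatever triggers your workflow):

```javascript
// Throttled batch processing sketch; the webhook URL stands in for your n8n trigger.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function processBatch(stories, webhookUrl) {
  for (const story of stories) {
    await fetch(webhookUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ input: story }),
    });
    await sleep(2000); // 2-second gap keeps the run under OpenAI rate limits
  }
}
```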
Advanced Use Cases: Beyond Basic Generation
Use Case 1: Batch Processing Related Features
When you have a complete user flow to test:
Generate comprehensive test cases for checkout flow:
1. Cart Management
- Add/remove items
- Update quantities
- Apply coupon codes
- Calculate totals
2. Shipping
- Enter address
- Validate fields (zip, phone)
- Select shipping method
- Calculate shipping cost
3. Payment
- Enter card details
- Validate CVV, expiry
- Process payment
- Handle declined cards
4. Confirmation
- Display order summary
- Send confirmation email
- Update inventory
Cover: happy path, validation errors, security, edge cases for ALL steps.
Result: 45+ test cases covering the entire flow with proper interdependencies.
Use Case 2: Security-Focused Testing
When you need comprehensive security coverage:
Security test the authentication API:
Test for:
- SQL Injection: ' OR '1'='1
- XSS: <script>alert('xss')</script>
- Brute Force: Rate limiting (max 5 attempts)
- Session Fixation: New session after login
- CSRF: Token validation required
- Password Storage: Bcrypt hashed
- Timing Attacks: Consistent response times
Result: 12 security-specific test cases that most manual testers forget.
Use Case 3: Performance Testing
Performance test the search API:
Requirements:
- Response time: < 200ms (p95)
- Concurrent users: 1000
- Throughput: 10,000 requests/minute
- Error rate: < 0.1%
Test scenarios:
1. Baseline: Single user, simple query
2. Load: 100 concurrent users, sustained 5 min
3. Stress: 1000 concurrent users, sustained 10 min
4. Spike: 0 → 1000 users in 10 seconds
5. Soak: 500 users, sustained 1 hour
Result: Complete performance test plan with JMeter script suggestions.
Integration Options: Beyond Google Sheets
Jira Integration (Coming Soon)
Auto-create test cases in Jira:
// Pseudo-code for Jira integration
testCases.forEach(async (testCase) => {
await jira.createTestCase({
project: 'QA',
summary: testCase.title,
steps: testCase.test_steps,
expectedResult: testCase.expected_result,
priority: mapPriority(testCase.priority),
labels: [testCase.type, 'ai-generated']
});
});
Slack Notifications
Get notified when test cases are generated:
🤖 AI Test Generator
📊 Generated: 8 test cases
🎯 Feature: User Login API
🔗 Sheet: [View Test Cases]
⏱️ Time: 23 seconds
Breakdown:
- Happy path: 2 (P0)
- Negative: 3 (P1)
- Security: 2 (P0)
- Edge cases: 1 (P2)
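Posting that summary is just an HTTP call to a Slack incoming webhook - here's a rough sketch (the webhook URL and summary fields are placeholders for your own setup):

```javascript
// Post the generation summary to a Slack incoming webhook (URL is a placeholder).
async function notifySlack(summary) {
  await fetch('https://hooks.slack.com/services/XXX/YYY/ZZZ', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text:
        `🤖 AI Test Generator\n` +
        `📊 Generated: ${summary.count} test cases\n` +
        `🎯 Feature: ${summary.feature}\n` +
        `⏱️ Time: ${summary.seconds} seconds`,
    }),
  });
}
```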
TestRail Import
Direct import to TestRail:
# Python script for TestRail import
import requests
def map_priority(priority):
    # Map P0-P3 to TestRail priority IDs; adjust to your instance (default install: 1=Low ... 4=Critical)
    return {'P0': 4, 'P1': 3, 'P2': 2, 'P3': 1}.get(priority, 2)

def import_to_testrail(test_cases, section_id):
    # add_case is scoped to a section in the TestRail API (add_case/:section_id)
    for tc in test_cases:
        response = requests.post(
            f'https://yourcompany.testrail.io/index.php?/api/v2/add_case/{section_id}',
            json={
                'title': tc['title'],
                'custom_steps': tc['test_steps'],
                'custom_expected': tc['expected_result'],
                'priority_id': map_priority(tc['priority'])
            },
            auth=('user@email.com', 'api_key')
        )
        response.raise_for_status()
Tips for Maximum Effectiveness
1. Be Specific with Context
❌ Bad input:
Test the login
✅ Good input:
Test the login API for banking application.
Requirements:
- OAuth 2.0 authentication
- MFA required for new devices
- Session timeout: 15 minutes
- Support biometric on mobile
- Rate limit: 5 attempts per minute
- GDPR compliant (log all access)
More context = better test cases.
2. Use Follow-ups Strategically
Instead of cramming everything into one request:
Request 1: "Generate test cases for user registration"
[Review generated cases]
Request 2: "Add test cases for email verification flow"
[AI remembers context, adds related cases]
Request 3: "What about password complexity validation?"
[AI adds specific validation test cases]
This iterative approach produces better results than one giant request.
3. Review and Refine
AI generates 85-90% correct test cases. Always review for:
- Domain-specific terminology
- Compliance requirements
- Company-specific standards
- Integration points unique to your system
I spend about 5-10 minutes reviewing and tweaking per feature. Still way faster than writing from scratch!
4. Build a Template Library
Save prompts for common scenarios:
Saved Prompt: "API Testing - CRUD"
---
Test the {{entity}} CRUD API:
POST /api/{{entity}} - Create
GET /api/{{entity}}/:id - Read
PUT /api/{{entity}}/:id - Update
DELETE /api/{{entity}}/:id - Delete
Include: validation, auth, error handling, security
Then just fill in {{entity}} with "users", "products", "orders", etc.
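A tiny helper makes those saved prompts reusable (a sketch; the {{entity}} syntax just mirrors the template above):

```javascript
// Fill {{placeholders}} in a saved prompt template (mirrors the {{entity}} syntax above).
function fillTemplate(template, values) {
  return template.replace(/\{\{(\w+)\}\}/g, (_, key) => values[key] ?? `{{${key}}}`);
}

const crudTemplate =
  'Test the {{entity}} CRUD API:\n' +
  'POST /api/{{entity}} - Create\n' +
  'GET /api/{{entity}}/:id - Read\n' +
  'Include: validation, auth, error handling, security';

console.log(fillTemplate(crudTemplate, { entity: 'orders' }));
```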
5. Track Metrics
I created a simple dashboard tracking:
- Test cases generated per week
- Time saved vs. manual
- OpenAI costs
- Test execution pass rate
- Bugs found per test case
This helps justify the tool to management and identify areas for improvement.
Future Roadmap: What's Next
I'm working on these enhancements:
v2.1 (Q1 2025)
- Direct Jira Cloud integration
- Slack notifications
- PDF export with company branding
- Custom test case templates
v2.2 (Q2 2025)
- Multi-language support (Spanish, French, German)
- Test execution tracking
- Analytics dashboard
- Batch processing UI
v3.0 (Q3 2025)
- Support for Claude, Gemini (not just OpenAI)
- Visual test builder
- Auto-generate test automation code
- API marketplace for templates
5 Key Takeaways
1. AI excels at structured, repetitive tasks - Test case generation is the perfect use case (99% time savings)
2. Input format flexibility matters - Supporting 6+ formats means teams can use their existing tools without conversion overhead
3. Context is king - MongoDB memory enables intelligent follow-ups and iterative refinement
4. Human review is essential - AI gets you 85-90% there, humans handle domain-specific nuances
5. ROI is ridiculous - $2/month for 1,500 test cases vs. $46K manual cost (19,280x ROI)
Get Started: Your Next Steps
Ready to implement this in your workflow?
- Download the workflow from the repo
- Set up credentials (OpenAI, MongoDB, Google Sheets)
- Import to n8n and configure nodes
- Test with a simple feature ("Test user login")
- Iterate on prompts to match your specific needs
- Share with your team and gather feedback
- Track metrics to prove ROI to management
Estimated setup time: 15 minutes
Payback time: First use (you'll save 45 minutes immediately)
Final Thoughts: Why This Works
I've tried a dozen "AI testing tools" over the years. Most were overpriced, under-delivered, or required complete workflow changes.
This solution works because:
- It's free/cheap (~$2/month)
- It fits your existing workflow (works with your current tools)
- It's controllable (you own the workflow, can customize everything)
- It actually saves time (99% reduction, proven with real metrics)
The best part? Once set up, it just works. I haven't touched the workflow in weeks - it's been quietly generating test cases for our team every day.
Have you automated test case creation? What's been your experience with AI in QA? Drop a comment below - I'd love to hear what you're building! 👇
Questions? Reach out on LinkedIn or leave a comment.
Want more QA automation content? Check out my other posts on Selenium best practices and CI/CD testing strategies.
Happy testing! 🚀