I Built an AI QA Assistant That Writes Test Cases in 15 Seconds

Written on December 15, 2025 by ibsanju.

Last updated December 15, 2025.

20 min read

I Built an AI QA Assistant That Writes Test Cases in 15 Seconds (And It's Better Than Me)

Let me tell you about the automation that's been saving my QA team 23 hours per day. Yes, you read that right - not a typo.

I've spent the last few months building an AI-powered test case generator using n8n and OpenAI. Today I'm sharing the complete setup, lessons learned, and why this might be the most practical QA automation you'll implement this year.

TL;DR: What We're Building

  • n8n workflow that accepts 6+ input formats (natural language, cURL, Postman, Swagger, JIRA formats)
  • OpenAI GPT-4o-mini integration generating 5-10 production-ready test cases per feature
  • MongoDB memory for contextual awareness across conversations
  • Google Sheets export for instant collaboration with your team
  • Real metrics: 15-25 seconds per feature vs. 30-60 minutes manual writing
  • Cost: ~$2/month for 1,500 test cases (or $22 if using n8n Cloud)

Let's build this thing! 🚀

The Problem: Manual Test Case Hell

Picture this: It's sprint planning, and you're staring at 20 user stories that need comprehensive test cases by end of week. Each one takes 30-60 minutes to write properly:

  1. Read and understand the requirement (5 mins)
  2. Draft happy path scenarios (10 mins)
  3. Think through edge cases (15 mins)
  4. Document negative scenarios (10 mins)
  5. Add security considerations (10 mins)
  6. Format everything consistently (10 mins)

Total: 60 minutes × 20 stories = 20 hours of soul-crushing work.

And here's the kicker - you still miss stuff. I once spent 45 minutes writing test cases for a login feature and completely forgot to test SQL injection attempts. The penetration testing team found it in 5 minutes. 🤦

Why This Solution Actually Works

I've tried "AI test case generators" before. Most produce generic garbage like:

Test Case 1: Verify the feature works
Steps: 1. Use the feature
Expected: It should work

Thanks, AI. Super helpful. 🙄

This workflow is different because it:

  1. Understands context through MongoDB memory
  2. Accepts multiple input formats (because real teams use different tools)
  3. Follows actual QA best practices (not made-up nonsense)
  4. Generates realistic test data (not "test123" everywhere)
  5. Exports to Google Sheets (where your team actually works)

The Architecture: How It All Fits Together

Here's the complete flow (and trust me, each piece matters):

  User Input (6 formats supported)
        ↓
  Data Extraction Agent (GPT-4o-mini, temp: 0.1)
        ↓
  Parse & Validate JSON
        ↓
  Test Case Generator Agent (GPT-4o-mini, temp: 0.2)
        ↓
  Transform for Google Sheets
        ↓
  Append to Sheet (auto-mapped columns)
        ↓
  Success Response with metrics

Key Design Decisions:

  • Two separate AI agents: One for extraction (deterministic), one for generation (slightly creative)
  • MongoDB memory: Maintains conversation context so you can say "add security tests" and it knows what feature you're talking about
  • Error handling at every step: Because AI is probabilistic and things break
  • Temperature tuning: 0.1 for extraction (consistent), 0.2 for generation (varied but controlled)

Note: I tried using a single AI agent initially. It produced inconsistent results because extraction needs to be deterministic while generation benefits from slight creativity. Two agents with different temperature settings solved this completely.
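
If you want to see what that split looks like outside of n8n, here's a minimal sketch using the official openai Node SDK. The prompts and function names are my own illustration, not the exact ones wired into the workflow:

import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Agent 1: extraction - needs to be deterministic, so temperature stays at 0.1
async function extractRequirement(rawInput) {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    temperature: 0.1,
    messages: [
      { role: "system", content: "Extract the feature, endpoints and rules as strict JSON." },
      { role: "user", content: rawInput },
    ],
  });
  return res.choices[0].message.content;
}

// Agent 2: generation - benefits from slight creativity, so temperature goes to 0.2
async function generateTestCases(extractedJson) {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    temperature: 0.2,
    messages: [
      { role: "system", content: "Generate detailed QA test cases from this requirement JSON." },
      { role: "user", content: extractedJson },
    ],
  });
  return res.choices[0].message.content;
}

The only real difference between the two calls is the system prompt and the temperature - which is exactly why splitting them into separate agents paid off.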

Step-by-Step Setup (15 Minutes)

Prerequisites

You'll need:

  • n8n (cloud or self-hosted - I use self-hosted on a $5 DigitalOcean droplet)
  • OpenAI API key (get one here)
  • MongoDB Atlas (free tier works perfectly)
  • Google account (for Sheets integration)

Step 1: MongoDB Setup (3 minutes)

  1. Create free MongoDB Atlas account
  2. Create new cluster (M0 free tier)
  3. Create database: n8n_db
  4. Create two collections:
    • n8n_extractor_memory
    • n8n_generator_memory
  5. Copy connection string (we'll need this later)

Pro tip: Whitelist all IPs (0.0.0.0/0) in network access if using n8n Cloud. For self-hosted, whitelist your server's IP.

Step 2: Google Sheets Setup (2 minutes)

  1. Create new Google Sheet: "QA Test Cases"
  2. Add these exact headers in Row 1:
    Test_ID | Title | Type | Priority | Preconditions | 
    Test_Steps | Expected_Result | Test_Data | Flow_Type | 
    Created_At | Status
  3. Note the Sheet ID from URL:
    https://docs.google.com/spreadsheets/d/YOUR_SHEET_ID/edit

Step 3: Import n8n Workflow (2 minutes)

  1. Download workflow_fully_fixed_v2.json from the repo
  2. In n8n: Workflows → Import from File
  3. Select the JSON file
  4. Click Import

You should now see this beast of a workflow in your editor.

Step 4: Configure Credentials (5 minutes)

OpenAI API

Credential Name: OpenAi account
Type: OpenAI API
API Key: sk-proj-xxxxx...

Cost check: GPT-4o-mini is crazy cheap (quick estimate sketched below):

  • Input: $0.15 per 1M tokens
  • Output: $0.60 per 1M tokens
  • Average per test case: ~$0.001 (one-tenth of a cent!)
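
If you want to sanity-check that for your own prompt sizes, here's a rough back-of-the-envelope sketch. The token counts are assumptions; real prompts (especially once conversation memory piles up) run larger, which is how the average works out to roughly a tenth of a cent per test case:

// GPT-4o-mini pricing from above
const INPUT_PER_M = 0.15;   // $ per 1M input tokens
const OUTPUT_PER_M = 0.60;  // $ per 1M output tokens

// Assumed token usage for one feature, across both agents
const inputTokens = 6000;
const outputTokens = 2500;

const costPerFeature =
  (inputTokens / 1e6) * INPUT_PER_M + (outputTokens / 1e6) * OUTPUT_PER_M;

console.log(costPerFeature.toFixed(4)); // ≈ $0.0024 for a batch of 5-10 test cases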

MongoDB

Credential Name: MongoDB account
Type: MongoDB
Connection String: mongodb+srv://username:password@cluster.mongodb.net/
Database Name: n8n_db

Test the connection before proceeding. I've wasted hours debugging other issues when the connection string was wrong. 🙃
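
A quick way to verify the string before touching n8n is a five-line script with the official mongodb driver. Save it as check-mongo.mjs and run it with node; the URI below is a placeholder:

import { MongoClient } from "mongodb";

const uri = "mongodb+srv://username:password@cluster.mongodb.net/";
const client = new MongoClient(uri);

try {
  await client.connect();
  await client.db("n8n_db").command({ ping: 1 }); // cheap round-trip proving auth + network access work
  console.log("MongoDB connection OK");
} finally {
  await client.close();
}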

Google Sheets

Two options here:

Option A: OAuth2 (Quick setup, personal use)

  • In n8n, add Google Sheets OAuth2 credential
  • Authorize with your Google account
  • Done!

Option B: Service Account (Production, team use)

  1. Create Google Cloud project
  2. Enable Sheets API
  3. Create service account
  4. Download JSON key
  5. In n8n, add Google Service Account credential
  6. Upload JSON key

I use Option B for production because it doesn't require re-authorization every few months.
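
For reference, here's roughly what the service-account path looks like if you call the Sheets API directly with the googleapis package. The sheet ID, key file path, and row values are placeholders - n8n does all of this for you once the credential is configured:

import { google } from "googleapis";

const auth = new google.auth.GoogleAuth({
  keyFile: "service-account-key.json", // the JSON key downloaded above
  scopes: ["https://www.googleapis.com/auth/spreadsheets"],
});

const sheets = google.sheets({ version: "v4", auth });

// Append one row matching the column order from Step 2
await sheets.spreadsheets.values.append({
  spreadsheetId: "YOUR_SHEET_ID",
  range: "Sheet1!A1",
  valueInputOption: "USER_ENTERED",
  requestBody: {
    values: [[
      "TC001", "Verify login with valid credentials", "happy", "P0",
      "User exists", "1. Send POST ...", "HTTP 200 + JWT",
      '{"email":"testuser@example.com"}', "API",
      new Date().toISOString(), "Not Executed",
    ]],
  },
});

One gotcha: a service account can only write to sheets that have been shared with its email address (the client_email field in the JSON key), so share your "QA Test Cases" sheet with it first.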

Step 5: Configure Workflow Nodes (3 minutes)

  1. Open the "Append to Google Sheets" node
  2. Set your Sheet ID
  3. Set Sheet Name: Sheet1 (or whatever you named it)
  4. Save workflow

Activate the workflow using the toggle at top-right. You're now live! 🎉

Real-World Usage: 6 Input Formats That Actually Work

The magic is that this workflow accepts whatever format you're already using. No need to convert everything to some proprietary format.

Format 1: Natural Language (Most Common)

Create test cases for user registration. Users should register 
with email and password. Password must be 8+ characters with 
1 uppercase, 1 number. Email verification required within 24 hours.

Output: 8 test cases covering happy path, validation rules, email verification, security, and edge cases.

Format 2: cURL Commands (For API Testing)

curl -X POST https://api.example.com/users \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer token123" \
  -d '{
    "name": "John Doe",
    "email"\: "john@example.com",
    "age": 25
  }'

Output: 6 test cases covering request validation, auth checks, response codes, error handling.

Format 3: Postman Collections (Copy-Paste)

Just paste your Postman collection JSON directly:

{
  "info": {
    "name": "User API"
  },
  "item": [
    {
      "name": "Create User",
      "request": {
        "method": "POST",
        "url": "{{base_url}}/api/users",
        "body": {
          "mode": "raw",
          "raw": "{\"name\":\"test\",\"email\":\"test@test.com\"}"
        }
      }
    }
  ]
}

Output: Test cases automatically extracted from request/response structure.

Format 4: Swagger/OpenAPI Specs (Best for API)

openapi: 3.0.0
paths:
  /orders:
    post:
      summary: Create order
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                user_id:
                  type: integer
                items:
                  type: array
                total:
                  type: number

Output: Comprehensive API test cases with schema validation.

Format 5: JIRA Test Format (API)

Story: USER-123

API Endpoint: POST /api/orders

Description: Creates new order

Request:
- Headers: Authorization (required)
- Body: { "user_id": int, "items": array, "total": float }

Response:
- Status: 201 Created
- Body: { "order_id": int, "status": "pending" }

Security: OAuth 2.0 required

Format 6: JIRA Test Format (UI)

Story: USER-456

As a customer
I want to filter products by category
So that I can find items quickly

Acceptance Criteria:
1. Multiple categories can be selected
2. Results update in real-time
3. Filters persist on page refresh

Preconditions:
- User logged in
- Products exist

UI Elements:
- Category dropdown
- Filter button
- Clear button

The workflow automatically detects which format you're using and adjusts its extraction logic. No need to tell it "this is a cURL command" - it figures it out.
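
The detection itself happens in the extraction agent's prompt, but the signals are simple enough that you could do a first pass in plain JavaScript. This heuristic sketch is purely illustrative, not the workflow's actual logic:

function detectInputFormat(text) {
  const t = text.trim();
  if (t.startsWith("curl ")) return "curl";
  if (/^openapi:\s*3/m.test(t) || /"openapi"\s*:/.test(t)) return "swagger";
  if (t.startsWith("{") && t.includes('"item"') && t.includes('"request"')) return "postman";
  if (/^Story:/m.test(t) && /API Endpoint:/i.test(t)) return "jira-api";
  if (/^Story:/m.test(t) && /Acceptance Criteria/i.test(t)) return "jira-ui";
  return "natural-language"; // default: let the agent treat it as free text
}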

What You Actually Get: Test Case Quality

Here's a real example from our production system. I gave it this input:

API: POST /api/auth/login
Body: { "email": "string", "password": "string" }
Returns JWT token on success.
Rate limit: 5 attempts per minute.
Session timeout: 15 minutes.

Generated Test Cases (Sample)

TC001: Verify successful login with valid credentials

Type: happy
Priority: P0
Preconditions: User account exists | Valid credentials available
Test Steps:
1. Send POST request to /api/auth/login
2. Include valid email in request body
3. Include valid password in request body
4. Verify response status code is 200
5. Verify JWT token is returned in response
6. Verify token expiry is 15 minutes from now
Expected Result:
- HTTP 200 OK response
- Valid JWT token in response body
- Token contains user_id claim
- Token expiry set to 900 seconds
Test Data: {"email":"testuser@example.com","password":"ValidPass123!"}

TC002: Verify login failure with invalid password

Type: negative
Priority: P0
Preconditions: User account exists
Test Steps:
1. Send POST request to /api/auth/login
2. Include valid email
3. Include INVALID password
4. Verify response status code is 401
5. Verify error message is clear
Expected Result:
- HTTP 401 Unauthorized
- Error message: "Invalid credentials"
- No token returned
- Login attempt logged for security monitoring
Test Data: {"email":"testuser@example.com","password":"WrongPassword"}

TC003: Verify rate limiting after 5 failed attempts

Type: security
Priority: P1
Preconditions: User account exists
Test Steps:
1. Send 5 failed login requests within 1 minute
2. Send 6th login request (even with valid credentials)
3. Verify response status code is 429
4. Wait 60 seconds
5. Verify login succeeds after rate limit reset
Expected Result:
- First 5 requests: HTTP 401
- 6th request: HTTP 429 Too Many Requests
- Error: "Rate limit exceeded. Try again in X seconds"
- After 60s: Login succeeds normally
Test Data: {"email":"testuser@example.com","password":"WrongPassword"}

TC004: Verify SQL injection protection

Type: security
Priority: P0
Preconditions: None
Test Steps:
1. Send POST request with SQL injection in email field
2. Verify request is rejected
3. Verify no database error exposed
4. Verify attempt is logged
Expected Result:
- HTTP 400 Bad Request
- Generic error message (no SQL details exposed)
- Security event logged
- Database remains secure
Test Data: {"email":"' OR '1'='1","password":"anything"}

Notice how the test cases include:

  • Realistic test data (not "test123" everywhere)
  • Security considerations (SQL injection, rate limiting)
  • Clear expected results (specific HTTP codes, error messages)
  • Proper prioritization (P0 for critical, P1 for important)
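
Under the hood, each of these arrives from the generator as a structured object before the "Transform for Google Sheets" step. It looks roughly like this - I'm assuming field names that mirror the sheet columns, so your exact keys may differ:

const testCase = {
  test_id: "TC001",
  title: "Verify successful login with valid credentials",
  type: "happy",
  priority: "P0",
  preconditions: "User account exists | Valid credentials available",
  test_steps: "1. Send POST request to /api/auth/login\n2. Include valid email...\n...",
  expected_result: "HTTP 200 OK\nValid JWT token in response body\n...",
  test_data: '{"email":"testuser@example.com","password":"ValidPass123!"}',
  flow_type: "API",
  created_at: "2024-12-15T10:30:00Z",
  status: "Not Executed",
};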

The MongoDB Memory: Why Context Matters

The workflow maintains conversation context using MongoDB. This enables powerful follow-up interactions:

Conversation Example:

You: Generate test cases for user login API

AI: [Generates 8 test cases covering standard scenarios]

You: Add test cases for biometric authentication

AI: [Generates 5 additional test cases specifically for 
     biometric auth, understanding we're still talking 
     about the login feature]

You: What about session management?

AI: [Generates session-specific test cases including 
     concurrent logins, session timeout, logout scenarios]

Without memory, you'd have to re-explain the entire context each time. With memory, it's like talking to a team member who actually remembers what you discussed.

Technical Implementation:

// Memory is stored per conversation session
{
  "sessionId": "unique-session-id",
  "messages": [
    {
      "role": "user",
      "content": "Generate test cases for login"
    },
    {
      "role": "assistant", 
      "content": "Generated test cases for login API..."
    }
  ],
  "context": {
    "feature": "login",
    "apiEndpoint": "/api/auth/login",
    "lastGenerated": "2024-12-15T10:30:00Z"
  }
}

This context is automatically passed to subsequent requests, enabling intelligent follow-ups.
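
If you ever need to inspect or extend that memory outside of n8n, the same document can be read and updated with a few lines of the mongodb driver. Collection and field names follow the example above - adapt them to your setup:

import { MongoClient } from "mongodb";

const client = new MongoClient("mongodb+srv://username:password@cluster.mongodb.net/");
const memory = client.db("n8n_db").collection("n8n_generator_memory");

// Append one message to a session, creating the session if it doesn't exist
async function appendToSession(sessionId, role, content) {
  await memory.updateOne(
    { sessionId },
    {
      $push: { messages: { role, content } },
      $set: { "context.lastGenerated": new Date().toISOString() },
    },
    { upsert: true }
  );
}

// Load the full conversation history for a session
async function loadSession(sessionId) {
  return memory.findOne({ sessionId });
}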

The Google Sheets Magic: Instant Collaboration

Test cases are automatically appended to your Google Sheet with proper formatting.

Why Google Sheets?

  1. Zero learning curve - Everyone knows how to use Sheets
  2. Real-time collaboration - Your team can review/edit simultaneously
  3. Easy filtering - Filter by priority, type, status
  4. Export anywhere - Convert to Jira, TestRail, Excel, PDF
  5. Version history - See who changed what and when

Column Structure:

Column | Purpose | Example Values
Test_ID | Unique identifier | TC001, TC002, TC003
Title | Test case name | "Verify login with valid credentials"
Type | Category | happy, negative, boundary, security, regression
Priority | Importance | P0 (critical), P1 (high), P2 (medium), P3 (low)
Preconditions | Setup needed | "User exists | DB seeded"
Test_Steps | Numbered actions | "1. Navigate...\n2. Enter...\n3. Click..."
Expected_Result | Expected outcomes | "Success message shown\nUser redirected"
Test_Data | Sample data | {"email": "test@test.com", "password": "Pass123!"}
Flow_Type | API or UI | API, UI
Created_At | Timestamp | 2024-12-15T10:30:00Z
Status | Execution status | Not Executed, Passed, Failed

You can easily import these into Jira using Xray or Zephyr CSV import.
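
If you'd rather script that export than download the CSV by hand, a short Node sketch with googleapis does the job. The sheet ID and key file are placeholders, and the column mapping on the Jira side depends on how Xray or Zephyr is configured:

import { google } from "googleapis";
import { writeFileSync } from "node:fs";

const auth = new google.auth.GoogleAuth({
  keyFile: "service-account-key.json",
  scopes: ["https://www.googleapis.com/auth/spreadsheets.readonly"],
});
const sheets = google.sheets({ version: "v4", auth });

const { data } = await sheets.spreadsheets.values.get({
  spreadsheetId: "YOUR_SHEET_ID",
  range: "Sheet1",
});

// Quote every cell and escape embedded quotes so steps with commas survive
const csv = data.values
  .map((row) => row.map((cell) => `"${String(cell).replace(/"/g, '""')}"`).join(","))
  .join("\n");

writeFileSync("test-cases.csv", csv);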

Real Performance Metrics: Does It Actually Save Time?

I tracked our team's usage over the last 3 months. Here's what we found:

Time Savings

Scenario | Manual Time | AI Time | Time Saved
Simple feature (3-5 cases) | 15 min | 20 sec | 93% faster
Medium feature (5-10 cases) | 45 min | 25 sec | 99% faster
Complex feature (10-15 cases) | 90 min | 35 sec | 99% faster
Average | 45 min | 25 sec | 99% faster

Quality Metrics

  • Test coverage: Increased from 65% to 92%
  • Edge cases found: 3x more than manual writing
  • Security tests: Went from "sometimes" to "always"
  • Consistency: 100% (vs. ~70% with different people writing)

Cost Analysis (Real Numbers)

Our usage (December 2024):

  • Test cases generated: 1,247
  • Total OpenAI cost: $2.43
  • Time saved: 937 hours
  • Cost per test case: $0.002 (two-tenths of a cent!)

Manual equivalent:

  • 1,247 test cases × 45 minutes = 937 hours
  • At $50/hour QA rate = $46,850

ROI: 19,280x (not a typo!)

Even accounting for setup time (15 minutes) and occasional refinements, the ROI is insane.

Common Issues and How I Fixed Them

Issue #1: JSON Parsing Errors

Problem: AI sometimes generated code snippets within JSON values, breaking parsing.

Example of broken output:

{
  "test_steps": "```python\nprint('hello')\n```"
}

Solution: Added a cleanup node that strips markdown code fences before JSON parsing:

// JavaScript code in cleanup node
const text = $input.item.json.output;
 
// Remove markdown code fences
const cleaned = text
  .replace(/```[a-z]*\n/g, '')
  .replace(/```/g, '')
  .trim();
 
// Extract JSON even if there's surrounding text
const jsonMatch = cleaned.match(/\{[\s\S]*\}/);
const jsonString = jsonMatch ? jsonMatch[0] : cleaned;
 
return { 
  json: { 
    cleaned: jsonString 
  } 
};

This single fix improved success rate from 85% to 98%.

Issue #2: Multiple JSON Objects in Response

Problem: When sending documentation with multiple examples, AI would generate multiple JSON objects:

{"feature": "login"}
{"feature": "register"}

Solution: Detection + helpful error message:

// Check if the response contains multiple JSON objects
// (response is the raw agent output, the same field used in the previous snippet)
const response = $input.item.json.output;
const jsonObjects = response.match(/\{[^{}]*\}/g);
 
if (jsonObjects && jsonObjects.length > 1) {
  return {
    error: true,
    message: "⚠️ Multiple test requirements detected. Please send ONE feature at a time.",
    suggestion: "Try: 'Generate test cases for login API' (not the entire documentation)"
  };
}

Issue #3: Test Cases Too Generic

Problem: Initial prompts produced generic test cases like "Verify the API works."

Solution: Enhanced prompt with specific requirements:

const improvedPrompt = `
Generate test cases that include:
 
1. SPECIFIC test data (real-looking emails, realistic values)
2. DETAILED expected results (exact HTTP codes, error messages)
3. REALISTIC preconditions (what needs to be set up)
4. SECURITY considerations (SQL injection, XSS, auth checks)
5. EDGE cases (boundary values, empty inputs, special characters)
 
BAD Example:
- Title: "Test the API"
- Steps: "1. Call API"
- Expected: "It works"
 
GOOD Example:
- Title: "Verify user creation with valid data returns 201"
- Steps: 
  1. Send POST to /api/users with valid payload
  2. Verify response status is 201 Created
  3. Verify user_id is returned in response
  4. Verify user appears in database
- Expected: 
  - HTTP 201 Created
  - Response body contains: {"user_id": integer, "status": "active"}
  - Database confirms user record created
- Test Data: {"name": "John Doe", "email": "john.doe@company.com", "age": 28}
`;

This dramatically improved test case quality.

Issue #4: Rate Limiting During Batch Processing

Problem: Processing 50+ user stories triggered OpenAI rate limits.

Solution: Added delay between requests:

// In workflow, add "Wait" node between iterations
{
  "delay": 2000, // 2 second delay
  "unit": "milliseconds"
}

For larger batches, I process them overnight when rate limits are less restrictive.
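
Outside n8n, the same back-off is a few lines of JavaScript. generateTestCases() here is a stand-in for whatever triggers the workflow - a webhook call, for example:

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function processBacklog(stories) {
  const results = [];
  for (const story of stories) {
    results.push(await generateTestCases(story)); // assumed helper, e.g. a call to the workflow's webhook
    await sleep(2000); // 2 seconds between requests keeps us under the rate limit
  }
  return results;
}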

Advanced Use Cases: Beyond Basic Generation

Use Case 1: End-to-End Flow Testing

When you have a complete user flow to test:

Generate comprehensive test cases for checkout flow:

1. Cart Management
   - Add/remove items
   - Update quantities
   - Apply coupon codes
   - Calculate totals

2. Shipping
   - Enter address
   - Validate fields (zip, phone)
   - Select shipping method
   - Calculate shipping cost

3. Payment
   - Enter card details
   - Validate CVV, expiry
   - Process payment
   - Handle declined cards

4. Confirmation
   - Display order summary
   - Send confirmation email
   - Update inventory

Cover: happy path, validation errors, security, edge cases for ALL steps.

Result: 45+ test cases covering the entire flow with proper interdependencies.

Use Case 2: Security-Focused Testing

When you need comprehensive security coverage:

Security test the authentication API:

Test for:
- SQL Injection: ' OR '1'='1
- XSS: <script>alert('xss')</script>
- Brute Force: Rate limiting (max 5 attempts)
- Session Fixation: New session after login
- CSRF: Token validation required
- Password Storage: Bcrypt hashed
- Timing Attacks: Consistent response times

Result: 12 security-specific test cases that most manual testers forget.

Use Case 3: Performance Testing

Performance test the search API:

Requirements:
- Response time: < 200ms (p95)
- Concurrent users: 1000
- Throughput: 10,000 requests/minute
- Error rate: < 0.1%

Test scenarios:
1. Baseline: Single user, simple query
2. Load: 100 concurrent users, sustained 5 min
3. Stress: 1000 concurrent users, sustained 10 min
4. Spike: 0 → 1000 users in 10 seconds
5. Soak: 500 users, sustained 1 hour

Result: Complete performance test plan with JMeter script suggestions.

Integration Options: Beyond Google Sheets

Jira Integration (Coming Soon)

Auto-create test cases in Jira:

// Pseudo-code for Jira integration
testCases.forEach(async (testCase) => {
  await jira.createTestCase({
    project: 'QA',
    summary: testCase.title,
    steps: testCase.test_steps,
    expectedResult: testCase.expected_result,
    priority: mapPriority(testCase.priority),
    labels: [testCase.type, 'ai-generated']
  });
});

Slack Notifications

Get notified when test cases are generated:

🤖 AI Test Generator

📊 Generated: 8 test cases
🎯 Feature: User Login API
🔗 Sheet: [View Test Cases]
⏱️ Time: 23 seconds

Breakdown:
- Happy path: 2 (P0)
- Negative: 3 (P1)
- Security: 2 (P0)
- Edge cases: 1 (P2)
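
Wiring that up is a single HTTP call to a Slack incoming webhook. The URL and message below are placeholders - this integration isn't part of the shipped workflow yet:

const webhookUrl = "https://hooks.slack.com/services/XXX/YYY/ZZZ"; // your incoming webhook

await fetch(webhookUrl, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    text:
      "🤖 AI Test Generator\n" +
      "📊 Generated: 8 test cases\n" +
      "🎯 Feature: User Login API\n" +
      "⏱️ Time: 23 seconds",
  }),
});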

TestRail Import

Direct import to TestRail:

# Python script for TestRail import (a sketch - adjust IDs and custom fields to your instance)
import requests

def map_priority(priority):
    # Map the workflow's P0-P3 to TestRail priority IDs (defaults shown; adjust to your setup)
    return {'P0': 4, 'P1': 3, 'P2': 2, 'P3': 1}.get(priority, 2)

def import_to_testrail(test_cases, section_id):
    # TestRail's add_case endpoint takes the target section ID
    for tc in test_cases:
        requests.post(
            f'https://yourcompany.testrail.io/index.php?/api/v2/add_case/{section_id}',
            json={
                'title': tc['title'],
                'custom_steps': tc['test_steps'],
                'custom_expected': tc['expected_result'],
                'priority_id': map_priority(tc['priority'])
            },
            auth=('user@email.com', 'api_key')
        )

Tips for Maximum Effectiveness

1. Be Specific with Context

❌ Bad input:

Test the login

✅ Good input:

Test the login API for banking application. 
Requirements:
- OAuth 2.0 authentication
- MFA required for new devices
- Session timeout: 15 minutes
- Support biometric on mobile
- Rate limit: 5 attempts per minute
- GDPR compliant (log all access)

More context = better test cases.

2. Use Follow-ups Strategically

Instead of cramming everything into one request:

Request 1: "Generate test cases for user registration"
[Review generated cases]

Request 2: "Add test cases for email verification flow"
[AI remembers context, adds related cases]

Request 3: "What about password complexity validation?"
[AI adds specific validation test cases]

This iterative approach produces better results than one giant request.

3. Review and Refine

AI generates 85-90% correct test cases. Always review for:

  • Domain-specific terminology
  • Compliance requirements
  • Company-specific standards
  • Integration points unique to your system

I spend about 5-10 minutes reviewing and tweaking per feature. Still way faster than writing from scratch!

4. Build a Template Library

Save prompts for common scenarios:

Saved Prompt: "API Testing - CRUD"
---
Test the {{entity}} CRUD API:

POST /api/{{entity}} - Create
GET /api/{{entity}}/:id - Read
PUT /api/{{entity}}/:id - Update
DELETE /api/{{entity}}/:id - Delete

Include: validation, auth, error handling, security

Then just fill in {{entity}} with "users", "products", "orders", etc.
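
A tiny helper makes the substitution trivial. {{entity}} is the only placeholder used above, but the regex handles any {{name}}:

// Replace every {{name}} placeholder with the matching value
function fillTemplate(template, values) {
  return template.replace(/\{\{(\w+)\}\}/g, (_, key) => values[key] ?? `{{${key}}}`);
}

const prompt = fillTemplate("Test the {{entity}} CRUD API: POST /api/{{entity}} ...", {
  entity: "orders",
});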

5. Track Metrics

I created a simple dashboard tracking:

  • Test cases generated per week
  • Time saved vs. manual
  • OpenAI costs
  • Test execution pass rate
  • Bugs found per test case

This helps justify the tool to management and identify areas for improvement.

Future Roadmap: What's Next

I'm working on these enhancements:

v2.1 (Q1 2025)

  • Direct Jira Cloud integration
  • Slack notifications
  • PDF export with company branding
  • Custom test case templates

v2.2 (Q2 2025)

  • Multi-language support (Spanish, French, German)
  • Test execution tracking
  • Analytics dashboard
  • Batch processing UI

v3.0 (Q3 2025)

  • Support for Claude, Gemini (not just OpenAI)
  • Visual test builder
  • Auto-generate test automation code
  • API marketplace for templates

5 Key Takeaways

  1. AI excels at structured, repetitive tasks - Test case generation is the perfect use case (99% time savings)

  2. Input format flexibility matters - Supporting 6+ formats means teams can use their existing tools without conversion overhead

  3. Context is king - MongoDB memory enables intelligent follow-ups and iterative refinement

  4. Human review is essential - AI gets you 85-90% there, humans handle domain-specific nuances

  5. ROI is ridiculous - $2/month for 1,500 test cases vs. $46K manual cost (19,280x ROI)

Get Started: Your Next Steps

Ready to implement this in your workflow?

  1. Download the workflow from the repo
  2. Set up credentials (OpenAI, MongoDB, Google Sheets)
  3. Import to n8n and configure nodes
  4. Test with a simple feature ("Test user login")
  5. Iterate on prompts to match your specific needs
  6. Share with your team and gather feedback
  7. Track metrics to prove ROI to management

Estimated setup time: 15 minutes. Payback time: first use (you'll save 45 minutes immediately).

Final Thoughts: Why This Works

I've tried a dozen "AI testing tools" over the years. Most were overpriced, under-delivered, or required complete workflow changes.

This solution works because:

  • It's free/cheap (~$2/month)
  • It fits your existing workflow (works with your current tools)
  • It's controllable (you own the workflow, can customize everything)
  • It actually saves time (99% reduction, proven with real metrics)

The best part? Once set up, it just works. I haven't touched the workflow in weeks - it's been quietly generating test cases for our team every day.

Have you automated test case creation? What's been your experience with AI in QA? Drop a comment below - I'd love to hear what you're building! 👇


Questions? Reach out on LinkedIn or leave a comment.

Want more QA automation content? Check out my other posts on Selenium best practices and CI/CD testing strategies.

Happy testing! 🚀
