Teaching a Private SLM About Your Target Application Using Document RAG for QA Testing

Private Small Language Models (SLMs) hosted on-site or in a private cloud are becoming the default choice for enterprise QA teams because of privacy, compliance, and control. But the moment we try to use a private SLM for real QA work—generating test cases, understanding application flows, or validating business rules—we hit a hard truth: the model doesn’t know our target application under test. It doesn’t understand our requirements, our test plans, our architecture, or even the terminology specific to the domain (Finance, Telecom, Life Sciences). As a result, the SLM produces generic, assumption-driven answers that cannot be trusted in a testing environment. This challenge is exactly where RAG for QA Testing becomes valuable.

In this blog, I’ll show how we solved this problem by teaching the SLM about the target application using Document-based Retrieval-Augmented Generation (RAG), and how this approach transforms a private SLM from a generic text generator into a project-aware QA assistant.

1. Introduction

Private SLMs are widely used in QA teams because they are secure and work inside enterprise environments. But when we try to use a private SLM for real QA tasks—like understanding application flows or generating test cases—we face a common issue: the SLM does not know our target application. It has no idea about our requirements, test cases, or business rules, so it gives only generic answers.

In this blog, I show how we solve this problem by teaching the SLM using Document-based RAG (Retrieval-Augmented Generation). By connecting the SLM to real application-specific documents, the model starts answering based on actual application behaviour. Through real screenshots, I’ll show how Document RAG turns a private SLM into a useful and reliable QA assistant.

2. The Real Problem with Private SLMs in QA

When we use a private SLM in QA projects, we often expect it to behave like a smart team member who understands our application. But in reality, a private SLM only knows general software knowledge, not your application-specific details, because it ships with a fixed body of knowledge.

It does not know:

  • How our application works
  • What modules and flows exist
  • What validations the requirements define
  • How QA engineers write test cases for the target application

So when a QA engineer asks questions like:

  • “Explain the onboarding flow of our application.”
  • “Generate test cases for the Add Vendor feature.”
  • “What are the negative scenarios for the SKYBoard module?”

The private SLM gives generic answers based on assumptions, not based on the real application. These answers may look correct at first glance, but they often miss important business rules, edge cases, and validations that matter in testing.

In QA, generic answers are dangerous. They reduce trust in AI, force testers to double-check everything, and limit the real value of using SLMs in testing workflows.

This is the core problem:

Private SLMs are powerful, but they are completely unaware of your target application unless you teach them.

3. Why Document RAG Is Mandatory for QA Testing

To make a private SLM useful for QA, we must teach it about the target application: its concepts, terminology, workflows, and so on. Without this, the model will always give generic answers, no matter how advanced it is.

This is where Document-based Retrieval-Augmented Generation (RAG) becomes mandatory.

Instead of training or fine-tuning the SLM, Document RAG works by:

  • Storing target application documents outside the model
  • Searching those documents when a user asks a question
  • Providing only the relevant content to the SLM at runtime

This means the SLM answers questions based on the well-documented target application knowledge base, not assumptions.

For QA teams, this is especially important because:

  • Requirements change frequently
  • Test cases evolve every sprint
  • New features introduce new flows
  • Teams keep updating demo videos and documentation (Or not 😀).

Fine-tuning a model every time something changes is not practical. Document RAG solves this by keeping the knowledge dynamic and always up to date.

In simple terms:

Document RAG does not change the SLM — it teaches the SLM using your actual target application documents.

This approach allows the private SLM to understand:

  • Application flows
  • Business rules
  • Validation logic
  • Real test scenarios

In the next sections, I’ll show how this works in practice using screenshots from my RAG implementation.

4. What I Built – Document RAG System for QA

To solve the problem of private SLMs not understanding target applications, I built a Document RAG system specifically designed for QA software testing.

The idea was simple:
Instead of expecting the SLM to “know” the application, we connect it directly to the documents containing the target application knowledge base and let it learn from them at query time.

High-Level Architecture

The system has four main parts:

  1. Application Documents as Source of Truth
    The system stores all QA-related documents in a single place.
    • Requirement documents
    • Test cases and test plans
    • Architecture notes
    • JSON and structured QA data
    • Demo and walkthrough videos
  2. RAG Engine (Document Processing Layer)
    The RAG engine:
    • Reads documents from the workspace
    • Splits them into meaningful chunks
    • Converts them into vector embeddings
    • Stores them in a vector database
  3. Private SLM (Reasoning Layer)
    The system uses a private SLM only for reasoning.
    It does not store application knowledge permanently.
    It answers questions using the context provided by RAG.
  4. MCP Server (Integration Layer)
    The system exposes the RAG system as an MCP tool, so the SLM can:
    • Query documents
    • Perform deep analysis
    • Retrieve answers with traceable sources

This design keeps the system:

  • Modular
  • Secure
  • Easy to extend across multiple projects

How QA Engineers Use It

QA engineers interact with the system directly from VS Code using the Continue extension. They can ask real project questions, such as:

  • “Explain the Add Employee flow.”
  • “Generate test cases for this module.”
  • “What validations do the requirements define?”

The answers come only from indexed documents, making the output reliable and QA-friendly.

5. Implementation – Documents Indexed into RAG

The first and most important step in teaching a private SLM is feeding it the right knowledge. In my implementation, this knowledge comes directly from target application documents, not sample data or assumptions.

What the RAG System Indexes

The RAG system continuously scans a dedicated workspace folder that contains all QA-related artifacts, such as:

  • Requirement documents (.pdf, .docx, .txt)
  • Test cases and test plans
  • Architecture and functional notes
  • JSON and structured QA data
  • Demo and walkthrough videos

These documents represent the single source of truth for the application.

How Documents Are Prepared for RAG

When teams add or update documents:

  1. The RAG engine reads each file from the workspace (local file system, Google Drive, OneDrive, etc)
  2. The RAG engine cleans and normalizes the content (especially PDFs).
  3. The RAG engine splits large documents into meaningful chunks.
  4. The system converts each chunk into vector embeddings.
  5. The system stores the embeddings in a vector database.

This process ensures that:

  • The system does not lose any important knowledge.
  • Large documents remain searchable
  • Retrieval is fast and accurate
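
As a rough illustration of step 3, a chunker with overlap might look like the sketch below. This is a minimal, hypothetical version: the 500/100 sizes are illustrative defaults, not the actual settings of the system described here.

```python
def chunk_text(text, chunk_size=500, overlap=100):
    """Split a document into overlapping chunks so that knowledge
    sitting on a chunk boundary is not lost (sizes are illustrative)."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap preserves boundary context
    return chunks
```

Each chunk would then be embedded and stored in the vector database (steps 4 and 5).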

Why This Matters for QA

Because the RAG engine indexes the documents directly from the workspace:

  • The SLM always works with the latest information from documents
  • Updated test cases are immediately available
  • The system does not require retraining or fine-tuning.

From a QA perspective, this is critical.
The AI assistant answers questions only based on what exists in the target application documents, not on general industry assumptions.

Screenshot: RAG System Workspace Structure

This screenshot shows the actual workspace structure used by the Document RAG system:

  • target_docs/
    Contains real QA artifacts:
    • Requirement documents (PDF)
    • Test case design files
    • JSON configuration data
    • Excel-based test data
    • Demo images and videos
  • target_docs/videos/
    Stores walkthrough and demo videos that are indexed using:
    • Speech-to-text (video transcripts)
    • OCR on video frames (for UI text)
  • db_engine/
    This is the vector database generated by the RAG engine:
    • chroma.sqlite3 stores embeddings
    • Chunked document knowledge lives here

6. Ask QA Questions Using VS Code (Continue + MCP)

Once the documents are indexed, the next step is how QA engineers and testers actually use the system in their daily work. In my implementation, everything happens inside VS Code, using the Continue extension connected to the RAG system through an MCP server.

QA Workflow Inside VS Code

Instead of switching between tools, documents, and browsers, a QA engineer can simply ask questions directly in VS Code, such as:

  • “How do I add a new employee in the PIM module?”
  • “Explain the validation rules for this feature.”
  • “Generate test cases based on the requirement document.”

These are real QA questions that require application-specific knowledge, not generic AI answers.

What Happens Behind the Scenes

When a question is asked in Continue:

  1. The query is sent to the MCP server
  2. The MCP server invokes the RAG tool
  3. Relevant documents are retrieved from the vector database
  4. The retrieved content is passed to the private SLM
  5. The SLM generates an answer strictly based on those documents

At no point does the SLM guess or rely on public knowledge.
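
Steps 1–5 can be pictured as a single grounded-answer function. This is a hypothetical sketch: `retrieve` stands in for the vector-database lookup exposed over MCP, and `generate` stands in for the private SLM call.

```python
def answer_with_rag(question, retrieve, generate, top_k=4):
    """Grounded QA: fetch context first, then let the SLM reason
    only over that context (mirrors steps 1-5 above)."""
    chunks = retrieve(question, top_k)   # vector-DB lookup via the MCP tool
    if not chunks:
        # controlled fallback instead of guessing
        return "I don't know based on the documents."
    context = "\n\n".join(chunks)
    prompt = (
        "Answer strictly from the context below. If the answer is "
        "not in the context, say you don't know.\n\n"
        "Context:\n" + context + "\n\nQuestion: " + question
    )
    return generate(prompt)              # private SLM call
```

The key design choice is the early return: when retrieval finds nothing, the model is never asked, so it cannot improvise.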

Why MCP Matters Here

Using MCP provides a clean separation of responsibilities: the RAG engine handles retrieval, while the private SLM handles reasoning.

Diagram: Document RAG System

This makes the system:

  • Modular
  • Scalable
  • Easy to extend across projects

For QA teams, this means the AI assistant behaves like an application-aware testing expert, not a generic chatbot.

Screenshot: Registered MCP Tools

This screenshot demonstrates how Model Context Protocol (MCP) is used to connect a private SLM with the Document RAG system during a real QA query.

You can see the list of registered MCP tools, such as:

🔎 rag_query – Standard RAG Query Tool

This is the primary tool used for document-based question answering.

It allows QA engineers to ask questions about the client application.
If debug=True, it returns structured JSON that includes:

  • Original user question
  • Rewritten query (if applied)
  • Whether query rewriting was triggered
  • Retrieved document sources
  • Final generated answer

This tool ensures that responses are grounded in real client documents.
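
With debug enabled, the payload might look roughly like this. The field names mirror the list above (and the `rewrite_enabled` / `rewrite_applied` flags shown in a later screenshot); the values here are made up for illustration.

```python
# Hypothetical shape of a rag_query debug response; values are illustrative.
debug_response = {
    "question_original": "How does a supervisor approve or reject a timesheet?",
    "question_rewritten": "Supervisor timesheet approval",
    "rewrite_enabled": True,
    "rewrite_applied": True,
    "sources": [
        {"file": "user_guide.pdf", "page": 113},
        {"file": "user_guide.pdf", "page": 114},
    ],
    "answer": "A supervisor opens the pending timesheet and clicks Approve or Reject.",
}
```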

🎥 index_video – Index a Single Video

This tool indexes a single demo or walkthrough video into the RAG database.

It processes:

Speech-to-text transcription

Optional OCR on video frames

Once indexed, video knowledge becomes searchable like any other document.

📂 index_all_videos – Bulk Video Indexing

This tool scans the target_docs/videos directory and indexes all .mp4 files into the RAG database at once.

It is useful when:

  • New KT sessions are added
  • Demo recordings are uploaded
  • Large batches of videos need indexing

🧠 hybrid_deep_query – Advanced RAG + Full Document Context

This tool is designed for complex or high-precision queries.

It works by:

  1. Using RAG to identify the most relevant files
  2. Loading the complete content of those files (CAG – Context-Aware Generation)
  3. Generating a deep, fully context-grounded answer

This is ideal for detailed QA analysis or requirement validation.
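
The three-step hybrid flow can be sketched as follows. Names are hypothetical: `retrieve_files` stands in for RAG-based file ranking, `load_file` for reading a full document, and `llm` for the private SLM.

```python
def hybrid_deep_query(question, retrieve_files, load_file, llm):
    """Hybrid flow: RAG picks the most relevant files (step 1), the
    FULL content of those files is loaded (step 2), and the model
    answers over that complete context (step 3)."""
    files = retrieve_files(question, top_k=2)           # step 1: file ranking
    context = "\n\n".join(load_file(f) for f in files)  # step 2: full documents
    return llm("Context:\n" + context + "\n\nQuestion: " + question)  # step 3
```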

❤️ health_check – Connectivity Verification

A lightweight tool that verifies whether the MCP server is running and properly connected to the vector database.

This helps ensure:

  • Server availability
  • Database presence
  • Stable MCP communication

Screenshot: Asking a QA Question in VS Code


This screenshot demonstrates:

  • A real QA question typed inside VS Code: “Retrieve information related to how to add a new employee in the PIM Module using RAG …”
  • Continue invoking the rag_query MCP tool
  • A workflow that stays fully inside the IDE

On the right side, when a QA question is asked, Continue clearly shows that it is calling the RAG rag_query tool. This is a very important indicator.

This message confirms that:

  • The SLM is not answering from its own knowledge
  • The response is generated by calling the RAG MCP tool
  • Documents are actively retrieved and used to form the answer

In other words, the SLM is behaving like a tool user, not a guessing chatbot.

What This Means for QA Testing

For QA engineers, this brings confidence and transparency:

  • Answers are based on real application documentation
  • No hallucination or assumed workflows
  • Clear visibility into which tool was used
  • Easy to debug and validate AI responses

This is critical in QA, where incorrect assumptions can lead to missed defects and unreliable test coverage.

Key Takeaway from This Screenshot

MCP makes RAG visible, verifiable, and production-ready.

Instead of hiding retrieval logic inside prompts, MCP exposes RAG as a first-class QA tool that the private SLM explicitly uses. This is what turns AI from an experiment into a trusted QA assistant

7. Advanced RAG in Action – Query Rewriting & Source-Aware Retrieval

One of the biggest challenges in QA is that engineers ask questions in human language, while documents speak a more formal and sophisticated language.

QA engineers usually ask questions like:

  • “How does a supervisor approve or reject a timesheet?”
  • “What happens after submission?”

But documentation often uses:

  • Formal headings
  • Role-based terms
  • Structured language (Supervisor, Manager, Approval Workflow, etc.)

If we send the user’s raw question directly to vector search, retrieval can be incomplete or noisy.

To solve this, I implemented Query Rewriting as part of my RAG pipeline — a key feature that turns this into an advanced, enterprise-grade RAG system.

What Is Query Rewriting in RAG?

Query rewriting means:

  • Taking a conversational QA question
  • Understanding the intent
  • Converting it into a clean, focused retrieval query
  • Then, using that rewritten query to fetch documents

In simple words:

Users ask questions like humans.
Documents are written like manuals/SOPs/Workflows.
Query rewriting bridges that gap.

Screenshot: Query Rewriting

How Query Rewriting Works in My RAG System

Before document retrieval happens:

  1. The system looks at:
    • Current question
    • Recent conversation history
  2. It rewrites the question into a single, standalone search query
  3. That rewritten query is used for vector retrieval
  4. Only the most relevant document chunks are passed to the SLM

This step dramatically improves:

  • Retrieval precision
  • Answer accuracy
  • QA trust in AI outputs
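
A minimal sketch of the rewrite step is shown below. It is hypothetical: `llm` stands in for the private SLM, and the prompt wording is illustrative, not the one used in the actual pipeline.

```python
def rewrite_query(question, history, llm):
    """Convert a conversational question plus recent history into one
    standalone retrieval query (steps 1-3 above)."""
    prompt = (
        "Rewrite the user's question as a single short, standalone "
        "search query, using the formal terms a manual would use. "
        "Return only the query.\n\n"
        "History:\n" + "\n".join(history) + "\n\n"
        "Question: " + question
    )
    return llm(prompt).strip()   # the rewritten query drives vector retrieval
```
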

Screenshot: Query Rewriting and Source Traceability

This screenshot demonstrates an advanced RAG capability that goes beyond basic document retrieval — query rewriting combined with source-level traceability.

Query Rewriting in Action (Left Panel)

On the left side, the RAG system returns a structured debug response that clearly shows how the user’s question was processed before retrieval.

The original user question was:

“How does a supervisor approve or reject an employee’s timesheet?”

Before performing a document search, the system automatically rewrote the query into a more focused retrieval term:

  • question_rewritten: "Supervisor"
  • rewrite_enabled: true
  • rewrite_applied: true

This step is critical because QA engineers usually ask questions in natural language, while documentation is written using formal role-based terminology. Query rewriting bridges this gap and ensures that the retrieval engine searches using the language of the documentation, not the language of conversation.

Document-Backed Retrieval with Exact Page References

The same debug output also shows the retrieved sources:

  • Application document: OrangeHRM User Guide (PDF)
  • Exact page numbers: pages 113 and 114
  • Multiple retrieved chunks confirming consistency

On the right side, the generated answer is explicitly labeled as:

“As documented in the OrangeHRM User Guide – pages 113–114.”

This confirms that:

  • The response is not generated from model assumptions
  • Every step is grounded in real application documentation
  • QA engineers can instantly verify the source

Why This Matters for QA Software Testing

In QA, accuracy and traceability are more important than creativity.

This screenshot proves that:

  • The private SLM does not hallucinate
  • Answers come strictly from approved documents
  • Every response can be audited back to the source PDF

If the information is not found, the system safely responds with:

“I don’t know based on the documents.”

This controlled behaviour is intentional and essential for building trust in AI-assisted QA workflows.

Key Takeaway

Advanced RAG is not just about retrieving documents — it’s about retrieving the right content, for the right question, with full traceability.

Query rewriting ensures precise retrieval, and source-level evidence ensures QA-grade reliability. Together, they transform a private SLM into a trusted, project-aware QA assistant.

8. What Types of Files and Resources Does This RAG System Support?

In real projects, knowledge is never stored in a single format.
Requirements, designs, architecture notes, user guides, manuals, test cases, configurations, and data are scattered across multiple file types. A useful RAG system must be able to understand all of them, not just PDFs.

This RAG system is designed to index and reason over multiple relevant file formats, all from a single workspace.

Supported File Types in the RAG Workspace

As shown in the screenshot, the target_docs folder acts as the knowledge source for the RAG engine. It supports the following resource types:

📄 Text & Documentation Files

  • .txt – Test case descriptions, notes, and exploratory testing ideas
  • .pdf – Official requirement documents, user guides, specifications
  • .md – QA documentation and internal knowledge pages

These files are:

  • Cleaned
  • Chunked
  • Indexed into the vector database for semantic search

📊 Structured Test Data Files

  • .json – Configuration values, test inputs, environment data
  • .xlsx / .csv – Test data sheets, boundary values, scenarios

Structured files are especially important in QA because they represent real test inputs, not just documentation.

🖼 Images & Visual Assets

  • .png, .jpg (via OCR)
    • Screenshots
    • Error messages
    • UI states

Text inside images is extracted using OCR and indexed, allowing the SLM to answer questions based on visual evidence, not assumptions.
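
As a rough sketch of this image path, an indexing helper might look like the following. All names here are illustrative; `ocr` could wrap a real OCR call such as pytesseract’s image-to-string function, and `index_chunk` stands in for the vector-database insert.

```python
def index_image(path, ocr, index_chunk):
    """Run OCR on a screenshot and index the extracted text alongside
    the document chunks (hypothetical names)."""
    text = ocr(path)
    if not text.strip():
        return False                     # nothing readable in the image
    index_chunk(text, metadata={"source": path, "type": "image"})
    return True
```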

🎥 Videos (Optional but Supported)

  • Demo recordings
  • Product Walkthrough videos
  • KT session recordings

Videos are processed using:

  • Speech-to-text (audio transcription)
  • Optional OCR on video frames

This allows QA teams to query spoken explanations that never existed in written form.

Why This Matters for QA Teams

This multi-format support ensures that:

  • No QA knowledge is lost
  • Testers don’t need to rewrite documents for AI
  • The SLM learns from exactly what the team already uses

Instead of changing QA workflows, the RAG system adapts to existing QA artifacts.

Key Takeaway

A QA RAG system is only as good as the data it can understand (garbage in, garbage out).

By supporting documents, structured data, images, and videos, this RAG system becomes a true knowledge layer for QA, not just a document chatbot.

9. Why This Approach Scales Across QA Projects

One of the biggest mistakes teams make with AI in QA is building solutions that work for one project but collapse when reused for another. This RAG-based approach was intentionally designed to scale across multiple QA projects and different applications without rework.

No Application-Specific Hardcoding

The RAG system does not hardcode:

  • Application names
  • Module flows
  • Test scenarios
  • Business rules

Instead, each team teaches the SLM through its own documents.
When a new project starts, the only action required is:

  • Add the application’s QA artifacts to the target_docs folder
  • Rebuild the index

The same RAG engine and MCP tools continue to work without change.

Document-Driven Knowledge, Not Model Memory

Because all knowledge lives in documents:

  • No fine-tuning is required per application
  • No retraining cost
  • No risk of cross-application data leakage

Each application’s knowledge stays isolated at the document level, which is critical for:

  • Enterprise security
  • Compliance
  • Multi-application QA environments

MCP Makes the System Reusable Everywhere

Exposing RAG through MCP tools makes this system:

  • IDE-agnostic
  • SLM-agnostic
  • Workflow-independent

Whether QA teams use:

  • VS Code today
  • Another IDE tomorrow
  • Different private SLMs in the future

The same MCP contract remains valid.

This is what makes the solution future-proof.

Works for Different QA Maturity Levels

This approach scales naturally across teams:

  • Manual QA teams
    → Use it to understand requirements and flows
  • Automation QA teams
    → Generate scenarios, validations, and test logic
  • New joiners
    → Faster onboarding using project-specific answers
  • Senior QA / Leads
    → Analyse coverage, gaps, and test strategies

All without changing the system.

Minimal Maintenance, Maximum Reuse

When requirements change:

  • Update the document
  • Re-run indexing

That’s it.

There is no need to:

  • Rewrite prompts
  • Update AI logic
  • Touch model configurations

This makes the system low-maintenance and high-impact.

Key Takeaway

Scalable AI is not built by making the model smarter —
It’s built by making the knowledge portable.

By combining Document RAG, MCP, and private SLMs, this approach delivers an application-aware, domain-aware QA assistant that scales effortlessly across projects, teams, and organizations.

Conclusion

Using AI in QA is not about choosing the most powerful SLM (or LLM, for that matter). It’s about making the SLM understand the target application or target domain. A private SLM, by itself, does not know requirements, business flows, or test logic, which makes its answers generic and unsafe for real testing work.

This is where Document-based RAG becomes essential. By grounding the SLM in real application artifacts—BRDs, PRDs, SRS documents, designs, architecture notes, test cases, data files, and user guides—the AI is able to produce answers that are accurate, verifiable, and relevant to the project. Advanced capabilities like query rewriting and source traceability further ensure that every response is backed by documented evidence, greatly reducing hallucinations.

Exposing this intelligence through MCP tools makes the system transparent, reusable, and scalable across multiple projects and applications. The architecture stays the same; only the documents change. This keeps maintenance low while maximizing impact.

Final Thought

AI becomes truly useful in QA when it stops guessing and starts learning from real application knowledge.

By combining private SLMs with Document RAG and MCP, we can build AI-powered QA assistants that teams can trust, audit, and scale with confidence.


5 Must-Have DevOps Tools for Test Automation in CI/CD  

If you’re working in a real product team, you already know this uncomfortable truth: having automated tests is not the same as having a reliable release process. Many teams do everything “right” on paper—unit tests, API tests, even some end-to-end coverage—yet production releases still feel stressful. The pipeline goes green, but the deployment still breaks. Or the tests pass today and fail tomorrow for no clear reason. Over time, people stop trusting the automation, and the team quietly goes back to manual checking before every release.

I’ve seen this happen more times than I’d like to admit, and the pattern is usually the same. The problem is not that teams aren’t writing tests. The real problem is that the system around the tests is weak: inconsistent environments, unstable dependencies, slow pipelines, poor reporting, and shared QA setups where multiple deployments collide. When those foundations are missing, test automation becomes “best effort” instead of a true safety net. 

That’s why DevOps tools for test automation matter so much. In a good CI/CD setup, tools don’t just run builds and deployments—they create a repeatable process where every code change is validated the same way, in controlled environments, with clear evidence of what happened. This is what makes automation trustworthy. And once engineers trust the pipeline, quality starts scaling naturally because testing becomes part of the workflow, not an extra task. 

In this blog, I’m focusing on five DevOps tools for test automation that consistently show up in strong test automation pipelines—not because they’re trending, but because each one solves a practical automation problem teams face at scale: 

  • Git (GitHub/GitLab/Bitbucket) for triggering automation and enforcing merge quality gates 
  • Jenkins for orchestrating pipelines, parallel execution, and test reporting 
  • Docker for eliminating environment drift and making test runs consistent everywhere 
  • Kubernetes for isolated, disposable environments and scalable test execution 
  • Terraform (Infrastructure as Code) for reproducible infrastructure and automation-ready environments 

I’ll keep this guide practical and implementation-focused. You’ll see what each tool contributes to automation, why it matters, and how teams use them together in real CI/CD workflows.

Now, before we go tool-by-tool, let’s define what “good” test automation actually looks like in a CI/CD pipeline. 

What “Test Automation” Really Means in CI/CD 

Before we jump into DevOps tools, it helps to define what “good” looks like. 

A solid test automation system in CI/CD typically has these characteristics: 

  • Every code change triggers tests automatically
  • Tests run in consistent environments (same runtime, same dependencies, same configuration)
  • Feedback is fast enough to influence decisions (engineers shouldn’t wait forever)
  • Failures are actionable (clear reports, logs, and artifacts)
  • Environments are isolated (no conflicts between branches or teams)
  • The process is repeatable (you can rerun the same pipeline and get predictable behaviour)

Most teams struggle not because they can’t write tests, but because they can’t keep test execution stable at scale. The five DevOps tools for test automation in CI/CD below solve that problem from different angles.

DevOps tools for test automation in CI/CD

Tool 1: Git (GitHub/GitLab/Bitbucket) – The Control Centre for Automation 

Git is usually introduced as version control, but in CI/CD it becomes something much bigger: it becomes the system that governs automation. 

In a mature setup, Git is where automation is triggered, enforced, and audited. 

Why Git is essential for test automation 

  • Git turns changes into events (and events trigger automation) 
    A strong pipeline isn’t dependent on someone remembering to run tests. Git events automatically drive the workflow: Push to a feature branch triggers lint and unit tests, opening a pull request triggers deeper automated checks, merging to main triggers deployment to staging and post deploy tests, and tagging a release triggers production deployment and smoke tests.  That event-driven model is the heart of CI/CD test automation. 
  • Git enforces quality gates through branch protections 
    This is one of the most overlooked “automation” features because it doesn’t look like testing at first. When branch protection rules require specific checks to pass, test automation becomes non-negotiable: required CI checks (unit tests, build, API smoke), required reviews, and blocked merge when pipeline fails.
    Without those rules, automation becomes optional. Optional automation gets skipped under pressure. Skipped automation eventually becomes unused automation. 
  • Git version-controls everything that affects test reliability
    Stable automation means versioning more than application code: the automated tests themselves, pipeline definitions (Jenkinsfile), Dockerfiles and container configs, Kubernetes manifests / Helm charts, Terraform infrastructure code, and test data and seeding scripts (where applicable). When all of this lives in Git, you can reproduce outcomes. That reproducibility is one of the biggest drivers of trust in automation. 

Practical example: A pull request workflow that makes automation enforceable 

Here’s a pattern that works well in real teams: 

Branch structure: main – protected, always deployable; feature/* – developer work branches; optional: release/* – release candidates. 

Pull request checks: linting, unit tests, build (to ensure code compiles / packages), API tests (fast integration validation), and E2E smoke tests (small, targeted, high signal). 

Protection rules: PR cannot merge unless required checks pass, disallow direct pushes to main, and require at least one reviewer. This turns automation into a daily habit. It also forces early failure detection: bugs are caught at PR time, not after a merge. 

Practical example: Using Git to control test scope (a realistic performance win) 

Not every test should run on every change. Git can help you control test selection in a clean, auditable way. Common approaches: run full unit tests on every PR, run a small set of E2E smoke tests on every PR, and run full regression E2E nightly or on demand. A practical technique is to use PR labels or commit tags to control pipeline behavior: 

label: run-e2e-full triggers full E2E suite, default PR triggers only E2E smoke, and nightly pipeline triggers full regression. 

This keeps pipelines fast while still maintaining coverage. 
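
The label-driven scope selection can be pictured as a small helper. This is a hedged sketch with made-up suite names; in practice, this logic would live in the pipeline definition rather than application code.

```python
def select_suites(labels, nightly=False):
    """Map PR labels / schedule to test scope, as described above:
    unit tests plus E2E smoke on every PR, the full E2E suite only
    on the 'run-e2e-full' label, full regression nightly."""
    if nightly:
        return ["unit", "api", "e2e_full"]   # nightly regression run
    suites = ["unit", "e2e_smoke"]           # default PR scope
    if "run-e2e-full" in labels:
        suites.append("e2e_full")            # opt-in full E2E on a PR
    return suites
```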

Tool 2: Jenkins – The Orchestrator That Makes Tests Repeatable 

Once Git triggers automation, you need something to orchestrate the steps, manage dependencies, and publish results. Jenkins is still widely used for this because it’s flexible, integrates with almost everything, and supports “pipeline as code.” 
For test automation, Jenkins is important because it transforms a collection of scripts into a controlled, repeatable process. 

Why Jenkins is essential for test automation 

  • Jenkins makes test execution consistent and repeatable 
    A Jenkins pipeline defines what runs, in what order, with what environment variables, on what agents, and with what reports and artifacts. That consistency is the difference between “tests exist” and “tests protect releases.”
  • Jenkins supports staged testing (fast checks first, deeper checks later) 
    A well-designed CI/CD pipeline is layered:
    Stage 1: lint + unit tests (fast feedback), Stage 2: build artifact / image, Stage 3: integration/API tests, Stage 4: E2E smoke tests, and Stage 5: optional full regression (nightly or on-demand).
    Jenkins makes it easy to encode this strategy so it runs the same way every time.
  • Jenkins enables parallel execution
    As test suites grow, total runtime becomes the biggest pipeline bottleneck. Jenkins can parallelize lint and unit tests, API tests and UI tests, and sharded E2E jobs (multiple runners). Parallelization is a major reason DevOps tooling is critical for automation: without it, automation becomes too slow to be practical.
  • Jenkins publishes actionable test outputs 
    Good automation isn’t just “pass/fail.” Jenkins can publish JUnit reports, HTML reports (Allure / Playwright / Cypress), screenshots and videos from failed UI tests, logs and artifacts, and build metadata (commit SHA, image tag, environment). This visibility reduces debugging time and increases trust in the pipeline. 

Practical Jenkins example: A pipeline structure used in real CI/CD automation 

Below is a Jenkinsfile that demonstrates a practical structure: 

  • Fast checks first 
  • Build Docker image 
  • Deploy to Kubernetes namespace (ephemeral environment) 
  • Run API and E2E tests in parallel 
  • Archive reports 
  • Cleanup 

You can adapt the commands to your stack (Maven/Gradle, pytest, npm, etc.). 

pipeline { 
    agent any 
    environment { 
        APP_NAME        = "demo-app" 
        DOCKER_REGISTRY = "registry.example.com" 
        IMAGE_TAG       = "${env.BUILD_NUMBER}" 
        NAMESPACE       = "pr-${env.CHANGE_ID ?: 'local'}" 
    } 
    options { 
        timestamps() 
    } 
    stages { 
        stage("Checkout") { 
            steps { checkout scm } 
        } 
        stage("Install & Build") { 
            steps { 
                sh "npm ci" 
                sh "npm run build" 
            } 
        } 
        stage("Fast Feedback") { 
            parallel { 
                stage("Lint") { 
                    steps { sh "npm run lint" } 
                } 
                stage("Unit Tests") { 
                    steps { sh "npm test -- --ci --reporters=jest-junit" } 
                    post { always { junit "test-results/unit/*.xml" } } 
                } 
            } 
        } 
        stage("Build & Push Docker Image") { 
            steps { 
                sh """ 
      docker build -t ${DOCKER_REGISTRY}/${APP_NAME}:${IMAGE_TAG} . 
      docker push ${DOCKER_REGISTRY}/${APP_NAME}:${IMAGE_TAG} 
    """ 
            } 
        } 
        stage("Deploy to Kubernetes (Ephemeral)") { 
            steps { 
                sh """ 
      kubectl create namespace ${NAMESPACE} || true 
      kubectl -n ${NAMESPACE} apply -f k8s/ 
      kubectl -n ${NAMESPACE} set image deployment/${APP_NAME} ${APP_NAME}=${DOCKER_REGISTRY}/${APP_NAME}:${IMAGE_TAG} 
      kubectl -n ${NAMESPACE} rollout status deployment/${APP_NAME} --timeout=180s 
    """ 
            } 
        }
        stage("Automation Tests") { 
            parallel { 
                stage("API Tests") { 
                    steps { 
                        sh """ 
          export BASE_URL=http://${APP_NAME}.${NAMESPACE}.svc.cluster.local:8080 
          npm run test:api 
        """ 
                    } 
                    post { always { junit "test-results/api/*.xml" } } 
                } 
                stage("E2E Smoke") { 
                    steps { 
                        sh """ 
          export BASE_URL=https://${APP_NAME}.${NAMESPACE}.example.com 
          npm run test:e2e:smoke 
        """ 
                    } 
                    post { 
                        always { 
                            archiveArtifacts artifacts: "e2e-report/**", allowEmptyArchive: true 
                        } 
                    } 
                } 
            } 
        } 
    } 
    post { 
        always { 
            sh "kubectl delete namespace ${NAMESPACE} --ignore-not-found=true" 
        } 
    } 
} 

This pipeline basically handles everything that should happen when someone opens or updates a pull request. First, it pulls the latest code, installs the dependencies, and builds the application. Then it quickly runs lint checks and unit tests in parallel so small mistakes are caught early instead of later in the process. 

If those basic checks pass, the pipeline creates a Docker image of the app and pushes it to the registry. That same image is then deployed into a temporary Kubernetes namespace created just for that PR. This keeps every pull request isolated from others and avoids environment conflicts. 

Once the app is running in that temporary environment, the pipeline runs API tests and E2E smoke tests against it. The results, reports, and any failure artifacts are saved so the team can easily understand what went wrong. In the end, whether tests pass or fail, the temporary namespace is deleted to keep the cluster clean and disposable.

Why this Jenkins setup improves automation 

This pipeline is automation-friendly because it: 

  • fails fast on lint and unit issues 
  • builds a deployable artifact before running environment-dependent tests 
  • isolates test environments per PR (namespace isolation) 
  • runs API and UI tests in parallel (better pipeline time) 
  • stores test reports and artifacts for debugging 
  • cleans up environments automatically (important for cost and cluster hygiene) 

Tool 3: Docker – The Foundation for Consistent, Portable Test Environments 

If Jenkins is the orchestrator, Docker is the stabilizer. Docker solves a major cause of unreliable automation: environment differences. A large percentage of pipeline failures happen because of different runtime versions (Node/Java/Python), different OS packages, missing dependencies, browser/driver mismatches for UI automation, and inconsistent configuration between local and CI. 

Docker reduces that variability by packaging the environment with the app or tests. 

Why Docker is essential for automation 

  • Docker eliminates “works on my machine” failures 
    When tests run inside a container, they run with consistent runtime versions, pinned dependencies, and predictable OS environment. This makes results repeatable across laptops, CI agents, and cloud runners. 
  • Docker makes test runners portable
    Instead of preparing every Jenkins agent with test dependencies, you run a container that already contains them. This reduces setup time and avoids agent drift over months. 
  • Docker enables clean integration test stacks 
    Integration tests often need services: database (PostgreSQL/MySQL), cache (Redis), message broker (RabbitMQ/Kafka), and local dependencies or mock services. Docker Compose can spin these up consistently, making integration tests practical and reproducible. 
  • Docker supports parallel and isolated execution 
    Containers isolate processes. That isolation helps when running multiple test jobs simultaneously without cross-interference. 

Practical Docker example A: Running UI tests in a container (Playwright)

UI test reliability often depends on browser versions and system libraries. A container gives you control. 

Dockerfile for Playwright tests written in JS/TS 

FROM mcr.microsoft.com/playwright:v1.46.0-jammy 
 
WORKDIR /tests 
COPY package.json package-lock.json ./ 
RUN npm ci 
 
COPY . . 
CMD ["npm", "run", "test:e2e"] 

This Dockerfile is basically packaging our entire E2E test setup into a container. Instead of installing browsers and fixing environment issues every time, we simply start from Playwright’s official image, which already has everything preconfigured. 

We set a folder inside the container, install the project dependencies using npm ci (so it’s always a clean install), and then copy our test code into it. 

When the container runs, it directly starts the E2E tests. 

What this really means is that our tests don’t depend on someone’s local setup anymore. Whether they run on a laptop or in CI, the environment stays the same — and that removes a lot of random, environment-related failures. 

Run in CI

docker build -t e2e-tests:ci . 
docker run --rm -e BASE_URL="https://staging.example.com" e2e-tests:ci 

The first command builds a Docker image named e2e-tests:ci from the Dockerfile in the current directory. That image now contains the Playwright setup, the test code, and all required dependencies bundled together. 

The second command actually runs the tests inside that container. We pass the BASE_URL so the tests know which deployed environment they should hit; in this case, staging. The --rm flag simply cleans up the container after the run so nothing is left behind. 

Basically, we’re packaging our test setup once and then using it to test any environment we want, without reinstalling or reconfiguring things every time. 

In a real pipeline, you typically add an output folder mounted as a volume (to extract reports), retry logic only for known transient conditions, and trace/video capture on failure. 

Practical Docker example B: Integration tests with Docker Compose (app + database + tests) 

This is a pattern I’ve used often because it gives developers a “CI-like” environment locally. 

docker-compose.yml 

version: "3.8" 
services: 
  app: 
    build: . 
    ports: 
      - "8080:8080" 
    environment: 
      DB_HOST: db 
      DB_NAME: demo 
      DB_USER: postgres 
      DB_PASS: password 
    depends_on: 
      - db 
 
  db: 
    image: postgres:16 
    environment: 
      POSTGRES_PASSWORD: password 
      POSTGRES_DB: demo 
    ports: 
      - "5432:5432" 
 
  tests: 
    build: ./tests 
    depends_on: 
      - app 
    environment: 
      BASE_URL: http://app:8080 
    command: ["npm", "run", "test:integration"] 

This docker-compose file brings up three things together: the app, a PostgreSQL database, and the integration tests. Instead of relying on some shared QA environment, everything runs locally inside containers. 

The db service starts a Postgres container with a demo database. The app service builds your application and connects to that database using db as the hostname (Docker handles the networking automatically). 

Then the tests service builds the test container and runs the integration test command against http://app:8080. The depends_on settings ensure things start in the right order: database first, then app, then tests. 

What this really gives you is a repeatable setup. Every time you run it, the app and database start from scratch, the tests execute, and you’re not depending on some shared environment that might already be in a weird state. 

Run

docker compose up --build --exit-code-from tests 

Why this matters for automation: Every run starts from a clean stack, test dependencies are explicit and versioned, failures are reproducible both locally and in CI, and integration tests stop depending on shared environments. 

Practical Docker example C: Using multi-stage builds for cleaner deployment and more reliable tests 

A multi-stage Dockerfile helps keep runtime images minimal and ensures builds are reproducible. 

# Build stage 
FROM node:20-alpine AS builder 
WORKDIR /app 
COPY package*.json ./ 
RUN npm ci 
COPY . . 
RUN npm run build 
 
# Runtime stage 
FROM node:20-alpine 
WORKDIR /app 
COPY --from=builder /app/dist ./dist 
COPY package*.json ./ 
RUN npm ci --omit=dev 
CMD ["node", "dist/server.js"] 

This is a multi-stage Docker build, which basically means we use one container to build the app and another, smaller one to run it. 

In the first stage (builder), we install all dependencies and run the build command to generate the production-ready files. This stage includes development dependencies because they’re needed to compile the application. 

In the second stage, we start fresh with a clean Node image and copy only the built output (dist) from the first stage. Then we install only production dependencies using npm ci --omit=dev. Finally, the container starts the app with node dist/server.js. 

The main benefit of this approach is that the final image is smaller, cleaner, and more secure since it doesn’t include unnecessary build tools or dev dependencies. 

This reduces surprises in automation by keeping build and runtime steps consistent and predictable. 

Tool 4: Kubernetes – Isolated, Disposable Environments for Real Integration and E2E Testing 

Docker stabilizes execution. Kubernetes stabilizes environments at scale. 

Kubernetes becomes essential when multiple teams deploy frequently, you have microservices, integration environments are shared and constantly overwritten, you need preview environments per PR, and you want parallel E2E execution without resource conflicts.  For test automation, Kubernetes matters because it provides isolation and repeatability for environment-dependent tests. 

Why Kubernetes is important for automation 

  • Namespace isolation prevents test collisions 
    A common problem: one QA environment, multiple branches, constant overwrites.  With Kubernetes, each PR can get its own namespace: Deploy the app stack into pr-245, run tests against pr-245, and delete the namespace afterward.  This prevents one PR deployment from breaking another PR’s test run. 
  • Kubernetes enables realistic tests against real deployments 
    E2E tests are most valuable when they run against something that looks like production: deployed services, real networking, real service discovery, and real configuration and secrets injection. Kubernetes makes it practical to run those tests automatically without manually maintaining long-lived environments. 
  • Parallel test execution becomes infrastructure-driven 
    Instead of running all E2E tests on one runner, Kubernetes can run multiple test pods at once. This matters because: E2E tests are usually slower, pipelines must remain fast enough for engineers, and scaling test runs is often the only sustainable solution. 
  • Failures become easier to debug 
    When a test fails, you can: Collect logs from the specific namespace, inspect the deployed resources, re-run the pipeline with the same manifest versions, and avoid “someone changed the shared environment” confusion. 

Practical Kubernetes example A: Running E2E tests as a Kubernetes Job 

A clean pattern: 

  1. Deploy app 
  2. Run tests as a Job 
  3. Read logs and reports 
  4. Clean up namespace 

e2e-job.yaml 

apiVersion: batch/v1 
kind: Job 
metadata: 
  name: e2e-tests 
spec: 
  backoffLimit: 0 
  template: 
    spec: 
      restartPolicy: Never 
      containers: 
        - name: e2e 
          image: registry.example.com/e2e-tests:ci 
          env: 
            - name: BASE_URL 
              value: "https://demo-app.pr-245.example.com" 

This Kubernetes manifest defines a one-time Job that runs our E2E tests inside the cluster. Instead of running tests from outside, we execute them as a container directly in Kubernetes. 

The Job uses the e2e-tests:ci image that we previously built and pushed to the registry. It passes a BASE_URL so the tests know which deployed environment they should target — in this case, the PR-specific URL. 

restartPolicy: Never and backoffLimit: 0 mean that if the tests fail, Kubernetes won’t keep retrying them automatically. It runs once and reports the result. 

In simple terms, this lets us trigger automated tests inside the same environment where the application is deployed, making the test run closer to real production behaviour. 

CI commands 

kubectl -n pr-245 apply -f e2e-job.yaml 
kubectl -n pr-245 wait --for=condition=complete job/e2e-tests --timeout=15m 
kubectl -n pr-245 logs job/e2e-tests 

These commands are used to run and monitor the E2E test job inside a specific Kubernetes namespace (pr-245). 

The first command applies the e2e-job.yaml file, which creates the Job and starts the test container. The second command waits until the job finishes (or until 15 minutes pass), so the pipeline doesn’t move forward while tests are still running. 

The last command fetches the logs from the test job, so the test output and results appear directly in the CI logs. 

This pattern keeps test execution close to the environment where the app runs, which often improves reliability and debugging. 

Practical Kubernetes example B: Readiness checks that reduce false E2E failures 

A common cause of flaky E2E runs is that tests start before services are ready. Kubernetes readiness probes help. 

Example snippet in a Deployment: 

readinessProbe: 
  httpGet: 
    path: /health 
    port: 8080 
  initialDelaySeconds: 10 
  periodSeconds: 5 

This configuration adds a readiness probe to the application container in Kubernetes. It tells Kubernetes how to check whether the application is actually ready to receive traffic. 

Kubernetes will call the /health endpoint on port 8080. After waiting 10 seconds (initialDelaySeconds), it checks every 5 seconds (periodSeconds). If the health check passes, the pod is marked as “ready” and can start receiving requests. 

When your pipeline waits for rollout status, it becomes far less likely that E2E tests fail due to startup timing issues. 

Practical Kubernetes example C: Sharding E2E tests across multiple Jobs 

If you have 300 E2E tests, running them on one pod may take too long. Sharding splits the suite across multiple pods. 

Concept: 

  • Total shards: 6 
  • Each shard runs in its own Job with environment variables 

Example environment variables: 

  • SHARD_INDEX=1..6 
  • SHARD_TOTAL=6 

Each job runs only a subset of tests. Your test runner must support sharding (many do, directly or via custom logic), but Kubernetes provides the execution layer. 

This is one of the biggest performance wins for automation at scale. 
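A minimal sketch of the round-robin assignment behind sharding (the helper and spec names are illustrative; runners such as Playwright expose built-in sharding, but the underlying partitioning looks like this):

```shell
#!/usr/bin/env bash
# Assign specs to shards round-robin by position. Arguments:
# shard index (1-based), shard total, then the list of spec names.
shard_specs() {
  local index="$1" total="$2"
  shift 2
  local i=0 spec
  for spec in "$@"; do
    if (( i % total == index - 1 )); then
      echo "$spec"
    fi
    i=$((i+1))
  done
}

# Shard 1 of 3 over six suites gets the specs at positions 0 and 3.
shard_specs 1 3 login checkout search profile cart admin
```

Each Kubernetes Job would export its own SHARD_INDEX/SHARD_TOTAL pair and run only the subset this function prints.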

Tool 5: Terraform (Infrastructure as Code) – Reproducible Test Infrastructure Without Manual Work 

If Kubernetes is where the application lives during testing, Terraform is often what creates the infrastructure that testing depends on. 

Terraform matters because real automation needs reproducible infrastructure. Manual environments drift. Drift breaks tests. Terraform allows you to define and version infrastructure such as networking (VPCs, subnets, security groups), databases and caches, Kubernetes clusters, IAM roles and permissions, and load balancers and storage. 

Why Terraform is essential for automation 

  • Terraform makes environments reproducible 
    When infrastructure is code, your environment isn’t tribal knowledge. It’s documented, versioned, and repeatable. That repeatability improves test reliability, because your tests stop depending on “whatever state the environment is in today.” 
  • Terraform enables ephemeral environments (and reduces long-term drift) 
    Permanent shared environments slowly accumulate manual changes: ad-hoc configuration updates, quick fixes, outdated dependencies, and unknown drift over time. Ephemeral environments built via Terraform start clean, run tests, and get destroyed. That model dramatically reduces environment-related flakiness. 
  • Terraform makes environment parity achievable 
    A test environment that resembles production catches issues earlier. Terraform supports consistent provisioning across dev, staging, and prod—often using the same modules with different variables. 
  • Terraform integrates cleanly with pipelines 
    Terraform outputs can feed directly into automation: database endpoint, service URL, credentials location (not the secret itself, but the reference), and resource identifiers. 

Practical Terraform example A: Outputs feeding automated tests 

outputs.tf 

output "db_endpoint" { 
    value = aws_db_instance.demo.address 
} 
 
output "db_port" { 
    value = aws_db_instance.demo.port 
} 

These are Terraform output values. After Terraform creates the database, it exposes the database endpoint (address) and port as outputs. 

This makes it easy for the CI pipeline to read those values and pass them to the application or test scripts as environment variables. Instead of manually copying connection details, the pipeline can automatically fetch them using terraform output. 

CI usage

terraform init 
terraform apply -auto-approve 
 
DB_ENDPOINT=$(terraform output -raw db_endpoint) 
DB_PORT=$(terraform output -raw db_port) 
 
export DB_ENDPOINT DB_PORT 
npm run test:integration 

These commands show how infrastructure provisioning and test execution are connected in the pipeline. 

First, terraform init initializes Terraform, and terraform apply -auto-approve creates the required infrastructure (like the database) without waiting for manual approval. 

After the infrastructure is created, the script reads the database endpoint and port using terraform output -raw and stores them in environment variables. Those variables are then exported so the integration tests can use them to connect to the newly created database. 

This way, the tests automatically run against fresh infrastructure created during the same pipeline run. 
This bridges infrastructure provisioning and test execution in an automated, repeatable way. 

Practical Terraform example B: Using workspaces (or unique naming) for PR environments 

A common approach is: One workspace per PR (or unique naming per PR), apply infrastructure for that PR, and destroy when pipeline completes. 

Example commands: 

terraform workspace new pr-245 || terraform workspace select pr-245 
terraform apply -auto-approve 
# run tests 
terraform destroy -auto-approve 

These commands create an isolated Terraform workspace for a specific pull request (in this case, pr-245). If the workspace doesn’t exist, it’s created; if it already exists, it’s selected. 

Then terraform apply provisions the infrastructure just for that workspace — meaning this PR gets its own separate resources. After the tests are executed, terraform destroy removes everything that was created. 

This approach gives each PR its own temporary infrastructure and leaves nothing behind once testing is complete, which prevents resource collisions and makes automation more scalable. 

Practical Terraform example C: Cleanup as a first-class pipeline requirement 

One of the most important operational rules: cleanup must run even when tests fail. 

In Jenkins, cleanup usually belongs in post { always { … } }. The same principle applies to Terraform: do not destroy only on success, or you will accumulate environments, costs, and complexity. 
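The same "cleanup always runs" idea can be sketched in plain shell with an EXIT trap, which fires whether the test command succeeds or fails (the teardown here is a placeholder echo; a real script would call terraform destroy and kubectl delete namespace):

```shell
#!/usr/bin/env bash
# Run a test command inside a subshell whose EXIT trap always performs
# teardown, mirroring Jenkins post { always { ... } } semantics.
run_with_cleanup() {
  (
    trap 'echo "teardown: destroy infra"' EXIT  # placeholder teardown
    "$@"
  )
}

run_with_cleanup echo "tests passed"
run_with_cleanup false || echo "tests failed, but teardown still ran"
```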

Putting All 5 DevOps Tools for Test Automation Together: A Realistic “PR to Verified” Pipeline Flow 

DevOps Tools for Test Automation in CI/CD

When these DevOps tools for test automation work together, test automation becomes a system, not a set of scripts. 
Here’s a practical flow that I’ve used (with minor variations) across multiple projects. 

Reference repository structure (simple but scalable) 

├─ app/                     # application code 
├─ tests/ 
│  ├─ unit/ 
│  ├─ api/ 
│  └─ e2e/ 
├─ k8s/                     # manifests (or Helm charts) 
├─ infra/ 
│  └─ terraform/            # IaC 
└─ Jenkinsfile 

Pipeline flow (PR) 

  1. Developer opens a PR (Git event) 
  2. Jenkins triggers automatically 
  3. Jenkins runs fast checks: 
    • lint 
    • unit tests 
  4. Jenkins builds Docker images: 
    • app image 
    • e2e test runner image 
  5. Terraform provisions required infrastructure (if needed): 
    • database for the PR environment 
    • any required cloud dependencies 
  6. Kubernetes creates an isolated namespace for the PR 
  7. Jenkins deploys the app to that namespace 
  8. Jenkins runs automated tests against that environment: 
    • API tests 
    • E2E smoke tests 
    • optional: full E2E sharded (nightly or on-demand) 
  9. Jenkins publishes reports and artifacts 
  10. Jenkins cleans up: 
    • deletes namespace 
    • destroys Terraform resources 

Why this combination is so effective for automation 

Each DevOps tool for test automation contributes something specific to reliability: Git ensures automation is part of the workflow and enforceable via checks, Jenkins makes execution repeatable and visible with staged pipelines and reporting, Docker keeps test execution consistent everywhere, Kubernetes isolates environments and supports scaling and sharding, and Terraform makes infrastructure reproducible and disposable. 

This is exactly why DevOps tools are not “nice to have” for automation. They solve the problems that make automation fail in real life. 

Operational Practices That Make This Setup “Production Grade” 

DevOps Tools alone won’t give you great automation. The practices around them matter just as much.

1) Layer your tests to keep PR feedback fast 

A practical strategy: 

  • On every PR: 
    • lint 
    • unit tests 
    • API smoke tests 
    • E2E smoke tests (limited, high signal) 
  • Nightly: 
    • full E2E regression 
    • broader integration suite 
  • Before release: 
    • full regression 
    • performance checks (if applicable) 
    • security scans (if required by policy) 

This keeps day-to-day work fast while still maintaining strong coverage. 

2) Treat flaky tests as defects, not background noise 

Flaky tests destroy pipeline trust. 
Common fixes include stabilizing test data and teardown, waiting on readiness properly (not fixed sleeps), using stable selectors for UI tests, isolating environments (namespaces / disposable DBs), and limiting shared state across tests. A good pipeline is one engineers rely on; flaky pipelines get ignored. 
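For the "readiness, not sleeps" point, a small polling helper is often enough. This sketch retries a check command up to a fixed number of times (the helper name and retry budget are illustrative):

```shell
#!/usr/bin/env bash
# Poll a check command until it succeeds or the retry budget is exhausted,
# instead of relying on a single fixed sleep.
wait_until() {
  local retries="$1"
  shift
  local attempt=0
  until "$@"; do
    attempt=$((attempt+1))
    if [ "$attempt" -ge "$retries" ]; then
      return 1  # never became ready
    fi
    sleep 1
  done
}

# e.g. wait_until 60 curl -fsS "$BASE_URL/health" >/dev/null
wait_until 5 true && echo "service ready"
```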

3) Make test results actionable 

At minimum, your pipeline should show which test failed, logs from the failing step, screenshots/videos for UI failures, a link to a report artifact, and build metadata (commit, image tag, environment/namespace). The goal is to reduce “time to understand failure,” not just detect it. 

4) Keep secrets out of code and images 

Avoid hardcoding secrets in Jenkinsfile, Docker images, Git repositories, and Kubernetes manifests. 
Use a proper secret strategy (Kubernetes secrets, cloud secret manager, Vault). Inject secrets at runtime. 

5) Use consistent naming conventions across tools 

This sounds small, but it helps with debugging a lot. 
Example: Namespace: pr-245, Docker tag: build-9812, and Terraform workspace: pr-245. 
When names align, it’s easier to trace failures across Jenkins logs, Kubernetes resources, and cloud infrastructure. 
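A small sketch of deriving every name from one PR identifier (CHANGE_ID and BUILD_NUMBER mirror the Jenkins variables used earlier in this post; the helper itself is illustrative):

```shell
#!/usr/bin/env bash
# One canonical name per PR, reused for the namespace, the Terraform
# workspace, and the image tag suffix.
pr_name() { echo "pr-$1"; }

PR_ID="${CHANGE_ID:-245}"
NAMESPACE="$(pr_name "$PR_ID")"
TF_WORKSPACE="$(pr_name "$PR_ID")"
IMAGE_TAG="${NAMESPACE}-build-${BUILD_NUMBER:-1}"

echo "namespace=$NAMESPACE workspace=$TF_WORKSPACE tag=$IMAGE_TAG"
```

Because every tool derives its name from the same function, a failure in Jenkins can be traced straight to the matching namespace and workspace.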

Conclusion: The Five Tools That Make Test Automation Trustworthy 

Reliable test automation is not about having the largest test suite. It’s about having a system that runs tests consistently, quickly, and automatically—without manual intervention and without environment chaos. 

These five DevOps tools for test automation are essential because each one solves a practical automation problem: 

  • Git makes automation enforceable through triggers and quality gates 
  • Jenkins makes automation repeatable, staged, parallelizable, and reportable 
  • Docker makes test execution consistent across machines and environments 
  • Kubernetes enables isolated environments and scalable parallel test execution 
  • Terraform makes infrastructure reproducible, reviewable, and automatable 

When you combine them, you don’t just run tests—you operate a quality pipeline that protects every merge and every release. 

GitHub – https://github.com/spurqlabs/5-Must-Have-DevOps-Tools-for-Test-Automation/


Boosting Web Performance: Integrating Google Lighthouse with Automation Frameworks


The Silent Killer of User Experience

Picture this: Your development team just shipped a major feature update. The code passed all functional tests. QA signed off. Everything looks perfect in staging. You hit deploy with confidence.

Then the complaints start rolling in.

“The page takes forever to load.” “Images are broken on mobile.” “My browser is lagging.”

Sound familiar? According to Google, 53% of mobile users abandon sites that take longer than 3 seconds to load. Yet most teams only discover performance issues after they’ve reached production, when the damage to user experience and brand reputation is already done.

The real problem isn’t that teams don’t care about performance. It’s that performance testing is often manual, inconsistent, and disconnected from the development workflow. Performance degradation is gradual. It sneaks up on you. And by the time you notice, you’re playing catch-up instead of staying ahead.

The Gap Between Awareness and Action

Most engineering teams know they should monitor web performance. They’ve heard about Core Web Vitals, Time to Interactive, and First Contentful Paint. They understand that performance impacts SEO rankings, conversion rates, and user satisfaction.

But knowing and doing are two different things.

The challenge lies in making performance testing continuous, automated, and actionable. Manual audits are time-consuming and prone to human error. They create bottlenecks in the release pipeline. What teams need is a way to bake performance testing directly into their automation frameworks, treating performance as a first-class citizen alongside functional testing.

Integrating Google Lighthouse with Playwright

Enter Google Lighthouse.

What Is Google Lighthouse?

Google Lighthouse is an open-source, automated tool designed to improve the quality of web pages. Originally developed by Google’s Chrome team, Lighthouse has become the industry standard for web performance auditing.

But here’s what makes Lighthouse truly powerful: it doesn’t just measure performance; it provides actionable insights.

When you run a Lighthouse audit, you get comprehensive scores across five key categories:

  • Performance: Load times, rendering metrics, and resource optimization
  • Accessibility: ARIA attributes, color contrast, semantic HTML
  • Best Practices: Security, modern web standards, browser compatibility
  • SEO: Meta tags, mobile-friendliness, structured data
  • Progressive Web App: Service workers, offline functionality, installability

Each category receives a score from 0 to 100, with detailed breakdowns of what’s working and what needs improvement. The tool analyzes critical metrics like:

  • First Contentful Paint (FCP): When the first content renders
  • Largest Contentful Paint (LCP): When the main content is visible
  • Total Blocking Time (TBT): How long the page is unresponsive
  • Cumulative Layout Shift (CLS): Visual stability during load
  • Speed Index: How quickly content is visually populated

These metrics align directly with Google’s Core Web Vitals, the signals that impact search rankings and user experience.

Why Performance Can’t Be an Afterthought

Let’s talk numbers, because performance isn’t just a technical concern; it’s a business imperative.

Amazon found that every 100ms of latency cost them 1% in sales. Pinterest increased sign-ups by 15% after reducing perceived wait time by 40%. The BBC discovered they lost an additional 10% of users for every extra second their site took to load.

The data is clear: performance directly impacts your bottom line.

But beyond revenue, there’s the SEO factor. Since 2021, Google has used Core Web Vitals as ranking signals. Sites with poor performance scores get pushed down in search results. You could have the most comprehensive content in your niche, but if your LCP is above 4 seconds, you’re losing visibility.

The question isn’t whether performance matters. The question is: how do you ensure performance doesn’t degrade as your application evolves?

The Power of Integration: Lighthouse Meets Automation

This is where the magic happens: integrating Google Lighthouse into your automation frameworks.

By integrating Google Lighthouse with Playwright, Selenium, or Cypress, you transform performance from a periodic manual check into a continuous, automated quality gate.

Here’s what this integration delivers:

1. Consistency Across Environments

Automated Lighthouse tests run in controlled environments with consistent configurations, giving you reliable, comparable data across test runs.

2. Early Detection of Performance Regressions

Instead of discovering performance issues in production, you catch them during development. A developer adds a large unoptimized image? The Lighthouse test fails before the code merges.

3. Performance Budgets and Thresholds

You can set specific performance budgets, for example, “Performance score must be above 90.” If a change violates these budgets, the build fails, just like a failing functional test.
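Beyond score assertions, Lighthouse can also enforce budgets declaratively through a budgets.json file (supplied with the CLI’s --budget-path flag, or programmatically via the settings.budgets option). Here is a sketch of what such a file can look like; the path and the numbers are illustrative, not recommendations:

```json
[
  {
    "path": "/*",
    "timings": [
      { "metric": "largest-contentful-paint", "budget": 2500 },
      { "metric": "total-blocking-time", "budget": 300 }
    ],
    "resourceSizes": [
      { "resourceType": "script", "budget": 150 },
      { "resourceType": "image", "budget": 250 }
    ]
  }
]
```

Timing budgets are expressed in milliseconds and resource sizes in kilobytes; pages that exceed them are flagged in the report’s budgets section.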

4. Comprehensive Reporting

Lighthouse generates detailed HTML and JSON reports with visual breakdowns, diagnostic information, and specific recommendations. These reports become part of your test artifacts.


How Integration Works: A High-Level Flow

You don’t need to be a performance expert to integrate Lighthouse into your automation framework. The process is straightforward and fits naturally into existing testing workflows.

Step 1: Install Lighthouse Lighthouse is available as an npm package, making it easy to add to any Node.js-based automation project. It integrates seamlessly with popular frameworks.

Step 2: Configure Your Audits Define what you want to test which pages, which metrics, and what thresholds constitute a pass or fail. You can customize Lighthouse to focus on specific categories or run full audits across all five areas.

Step 3: Integrate with Your Test Suite Add Lighthouse audits to your existing test files. Your automation framework handles navigation and setup, then hands off to Lighthouse for the performance audit. The results come back as structured data you can assert against.

Step 4: Set Performance Budgets Define acceptable thresholds for key metrics. These become your quality gates if performance drops below the threshold, the test fails and the pipeline stops.

Step 5: Generate and Store Reports Configure Lighthouse to generate HTML and JSON reports. Store these as test artifacts in your CI/CD system, making them accessible for review and historical analysis.

Step 6: Integrate with CI/CD Run Lighthouse tests as part of your continuous integration pipeline. Every pull request, every deployment performance gets validated automatically.

The beauty of this approach is that it requires minimal changes to your existing workflow. You’re not replacing your automation framework; you’re enhancing it with performance capabilities.
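As one illustrative way to wire Step 6 up, a GitHub Actions job for a Node-based suite might look like the sketch below. The test:lighthouse script name is a hypothetical entry you would define in your package.json, and the lighthouse-reports folder matches wherever your runner saves its artifacts:

```yaml
name: lighthouse-audit
on: [pull_request]

jobs:
  lighthouse:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      # Hypothetical script that runs the Lighthouse-enabled test suite;
      # a failed assertion fails this step and blocks the pull request
      - run: npm run test:lighthouse
      # Keep the HTML/JSON reports as build artifacts for review
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: lighthouse-reports
          path: lighthouse-reports/
```

Because the audit runs on every pull request, a performance regression surfaces as a red check before the code merges.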

Practical Implementation: Code Examples

Let’s look at how this works in practice with a real Playwright automation framework. Here’s how you can create a reusable Lighthouse runner:

Creating the Lighthouse Runner Utility

async function runLighthouse(url, thresholds = { 
  performance: 50, 
  accessibility: 90, 
  seo: 40, 
  bestPractices: 45 
}) {
  const playwright = await import('playwright');
  const lighthouse = await import('lighthouse');
  const fs = await import('fs');
  const path = await import('path');
  const assert = (await import('assert')).default;

  // Launch browser with debugging port for Lighthouse
  const browser = await playwright.chromium.launch({
    headless: true,
    args: ['--remote-debugging-port=9222']
  });

  const context = await browser.newContext();
  const page = await context.newPage();
  await page.goto(url);

  // Configure Lighthouse options
  const options = {
    logLevel: 'info',
    output: 'html',
    onlyCategories: ['performance', 'accessibility', 'seo', 'best-practices'],
    port: 9222,
    preset: 'desktop'
  };

  try {
    // Run Lighthouse audit
    const runnerResult = await lighthouse.default(url, options);
    const report = runnerResult.report;
    
    // Save reports (use a single timestamp so the HTML and JSON files pair up)
    const reportFolder = path.resolve(__dirname, '../lighthouse-reports');
    if (!fs.existsSync(reportFolder)) fs.mkdirSync(reportFolder, { recursive: true });

    const timestamp = Date.now();
    const reportFilename = path.join(reportFolder, `lighthouse-report-${timestamp}.html`);
    const jsonReportFilename = path.join(reportFolder, `lighthouse-report-${timestamp}.json`);

    // Persist the full HTML report plus the machine-readable results (lhr)
    fs.writeFileSync(reportFilename, report);
    fs.writeFileSync(jsonReportFilename, JSON.stringify(runnerResult.lhr, null, 2));
    
    await browser.close();

    // Extract scores
    const lhr = runnerResult.lhr;
    const performanceScore = lhr.categories.performance.score * 100;
    const accessibilityScore = lhr.categories.accessibility.score * 100;
    const seoScore = lhr.categories.seo.score * 100;
    const bestPracticesScore = lhr.categories['best-practices'].score * 100;

    console.log(`Performance Score: ${performanceScore}`);
    console.log(`Accessibility Score: ${accessibilityScore}`);
    console.log(`SEO Score: ${seoScore}`);
    console.log(`Best Practices Score: ${bestPracticesScore}`);

    // Assert against thresholds
    assert(performanceScore >= thresholds.performance, 
      `Performance score is too low: ${performanceScore}`);
    assert(accessibilityScore >= thresholds.accessibility, 
      `Accessibility score is too low: ${accessibilityScore}`);
    assert(seoScore >= thresholds.seo, 
      `SEO score is too low: ${seoScore}`);
    assert(bestPracticesScore >= thresholds.bestPractices, 
      `Best Practices score is too low: ${bestPracticesScore}`);

    console.log("All assertions passed!");
    return lhr;
    
  } catch (error) {
    console.error(`Lighthouse audit failed: ${error.message}`);
    await browser.close();
    throw error;
  }
}

module.exports = { runLighthouse };

Integrating with Your Page Objects

const { runLighthouse } = require("../Utility/lighthouseRunner");

class LighthousePage {
  async visitWebPage() {
    await global.newPage.goto(process.env.WEBURL, { timeout: 30000 });
  }
  
  async initiateLighthouseAudit() {
    await runLighthouse(await global.newPage.url());
  }
}

module.exports = LighthousePage;

BDD Test Scenario with Cucumber

Feature: Integrating Google Lighthouse with the Test Automation Framework

  This feature leverages Google Lighthouse to evaluate the performance, 
  accessibility, SEO, and best practices of web pages.

  @test
  Scenario: Validate the Lighthouse Performance Score for the Playwright Official Page
    Given I navigate to the Playwright official website
    When I initiate the Lighthouse audit
    And I click on the "Get started" button
    And I wait for the Lighthouse report to be generated
    Then I generate the Lighthouse report

Decoding Lighthouse Reports: What the Data Tells You

Lighthouse reports are information-rich, but they’re designed to be actionable, not overwhelming. Let’s break down what you get:

The Performance Score

This is your headline number: a weighted average of key performance metrics. A score of 90-100 is excellent, 50-89 needs improvement, and below 50 requires immediate attention.

Metric Breakdown

Each performance metric gets its own score and timing. You’ll see exactly how long FCP, LCP, TBT, CLS, and Speed Index took, color-coded to show if they’re in the green, orange, or red zone.

Opportunities

This section is gold. Lighthouse identifies specific optimizations that would improve performance, ranked by potential impact. “Eliminate render-blocking resources” might save 2.5 seconds. “Properly size images” could save 1.8 seconds. Each opportunity includes technical details and implementation guidance.

Diagnostics

These are additional insights that don’t directly impact the performance score but highlight areas for improvement: things like excessive DOM size, unused JavaScript, or inefficient cache policies.

Passed Audits

Don’t ignore these! They show what you’re doing right, which is valuable for understanding your performance baseline and maintaining good practices.

Accessibility and SEO Insights

Beyond performance, you get actionable feedback on accessibility issues (missing alt text, poor color contrast) and SEO problems (missing meta descriptions, unreadable font sizes on mobile).

The JSON output is equally valuable for programmatic analysis. You can extract specific metrics, track them over time, and build custom dashboards or alerts based on performance trends.
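That programmatic analysis can be only a few lines of Node. In the sketch below, the extractMetrics helper is illustrative (not part of the runner above), and sampleLhr is hypothetical data standing in for a real lhr object you would load with JSON.parse(fs.readFileSync(...)); the audit IDs are the standard Lighthouse audit names:

```javascript
// Pull the key metrics out of a Lighthouse results object (lhr).
function extractMetrics(lhr) {
  const audit = (id) => lhr.audits[id].numericValue;
  return {
    performanceScore: Math.round(lhr.categories.performance.score * 100),
    lcpMs: audit('largest-contentful-paint'),
    tbtMs: audit('total-blocking-time'),
    cls: audit('cumulative-layout-shift'),
    speedIndexMs: audit('speed-index'),
  };
}

// Hypothetical sample standing in for a real saved report:
const sampleLhr = {
  categories: { performance: { score: 0.92 } },
  audits: {
    'largest-contentful-paint': { numericValue: 1830 },
    'total-blocking-time': { numericValue: 120 },
    'cumulative-layout-shift': { numericValue: 0.03 },
    'speed-index': { numericValue: 2100 },
  },
};

console.log(extractMetrics(sampleLhr));
```

Feeding each run’s extracted metrics into a spreadsheet or dashboard is what turns single audits into performance trends.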


Real-World Impact

Let’s look at practical scenarios where this integration delivers measurable value:

E-Commerce Platform

An online retailer integrated Lighthouse into their Playwright test suite, running audits on product pages and checkout flows. They set a performance budget requiring scores above 90. Within three months, they caught 14 performance regressions before production, including a third-party analytics script blocking rendering.

Result: Maintained consistent page load times, avoiding potential revenue loss.

SaaS Application

A B2B SaaS company added Lighthouse audits to their test suite, focusing on dashboard interfaces. They discovered their data visualization library was causing significant Total Blocking Time. The Lighthouse diagnostics pointed them to specific JavaScript bundles needing code-splitting.

Result: Reduced TBT by 60%, improving perceived responsiveness and reducing support tickets.

Content Publisher

A media company integrated Lighthouse into their deployment pipeline, auditing article pages with strict accessibility and SEO thresholds. This caught issues like missing alt text, poor heading hierarchy, and oversized media files.

Result: Improved SEO rankings, increased organic traffic by 23%, and ensured WCAG compliance.

The Competitive Advantage

Here’s what separates high-performing teams from the rest: they treat performance as a feature, not an afterthought.

By integrating Google Lighthouse with Playwright or any other automation framework, you’re building a culture of performance awareness. Developers get immediate feedback on the performance impact of their changes. Stakeholders get clear, visual reports demonstrating the business value of optimization work.

You shift from reactive firefighting to proactive prevention. Instead of scrambling to fix performance issues after users complain, you prevent them from ever reaching production.

Getting Started

You don’t need to overhaul your entire testing infrastructure. Start small:

  1. Pick one critical user journey, maybe your homepage or checkout flow
  2. Add a single Lighthouse audit to your existing test suite
  3. Set a baseline by running the audit and recording current scores
  4. Define one performance budget, perhaps a performance score above 80
  5. Integrate it into your CI/CD pipeline so it runs automatically

From there, you can expand: add more pages, tighten thresholds, incorporate additional metrics. The key is to start building that performance feedback loop.

Conclusion: Performance as a Continuous Practice

Web performance isn’t a one-time fix. It’s an ongoing commitment that requires visibility, consistency, and automation. Integrating Google Lighthouse with Playwright delivers both halves: Lighthouse provides the measurement and insights, and your automation framework provides the execution and integration. Together, they create a powerful system for maintaining and improving web performance at scale.

The teams that win in today’s digital landscape are those that make performance testing as routine as functional testing. They’re the ones catching regressions early, maintaining high standards, and delivering consistently fast experiences to their users.

The question is: will you be one of them?

Ready to boost your web performance? Start by integrating Google Lighthouse into your automation framework today. Your users and your bottom line will thank you.

Click here to read more blogs like this.

10 Prompting Secrets Every QA Should Know to Get Smarter, Faster, and Better Results


The Testing Skill Nobody Taught You

Here’s a scenario that plays out in QA teams everywhere:

A tester spends 45 minutes manually writing test cases for a new feature. Another tester, working on the same type of feature, finishes in 12 minutes with better coverage, clearer scenarios, and more edge cases identified.

What’s the difference? Experience isn’t the deciding factor, and tools alone don’t explain it either. The real advantage comes from how they communicate with intelligent systems using effective QA Prompting Tips.

The testing world is changing more rapidly than we realise. Today, every QA engineer interacts with AI-powered tools, whether generating test cases, validating user stories, analysing logs, or debugging complex issues. But here’s the uncomfortable truth: most testers miss out on 80% of the value simply because they don’t know how to ask the right questions—especially when applying the right QA Prompting Tips.

That’s where prompting comes in.

Prompting isn’t about typing fancy commands or memorising templates. It’s about asking the right questions, in the right context, at the right time. It’s a skill that multiplies your testing expertise rather than replacing it.

Think of it this way: You wouldn’t write a bug report that just says “Login broken.” You’d provide steps to reproduce, expected vs. actual results, environment details, and severity. The same principle applies to prompting—specificity and structure determine quality, particularly when creating tests with QA Prompting Tips.

In this article, we’ll break down 10 simple yet powerful prompting secrets that can transform your day-to-day testing from reactive to strategic, from time-consuming to efficient, and from good to exceptional.

1. Context Is Everything


If you ask something vague, you’ll get vague answers. It’s that simple.

Consider these two prompts:

❌ Bad Prompt: “Write test cases for login.”

✅ Good Prompt: “You are a QA engineer for a healthcare application that handles sensitive patient data and must comply with HIPAA regulations. Write 10 test cases for the login module, focusing on data privacy, security vulnerabilities, session management, and multi-factor authentication.”

The difference? Context transforms generic output into actionable testing artifacts.

The first prompt might give you basic username/password validation scenarios. The second gives you security-focused test cases that consider regulatory compliance, session timeout scenarios, MFA edge cases, and data encryption validation, exactly what a healthcare app needs.

Why Context Matters

When you provide real-world details, AI tools can:

  • Align responses with your specific domain (fintech, healthcare, e-commerce)
  • Consider relevant compliance requirements (GDPR, HIPAA, PCI-DSS)
  • Prioritise appropriate risk areas
  • Use industry-specific terminology

Key Takeaway: Always include the “where” and “why” before the “what.” Context makes your prompts intelligent, not just informative, and serves as the foundation for effective QA Prompting Tips.

2. Define the Role Before the Task


Before you ask for anything, define what the system should think like. This single technique can elevate responses from junior-level to expert-level instantly.

✅ Effective Role Definition: “You are a senior QA engineer with 8 years of experience in exploratory testing and API validation. Review this user story and identify potential edge cases, security vulnerabilities, and performance bottlenecks.”

By assigning a role, you’re setting the expertise level, perspective, and focus area. The response shifts from surface-level observations to nuanced, experience-driven insights.

Role Examples for Different Testing Needs

  • For test case generation: “You are a detail-oriented QA analyst specializing in boundary value analysis…”
  • For bug analysis: “You are a senior test engineer experienced in root cause analysis…”
  • For automation: “You are a test automation architect with expertise in framework design…”
  • For performance: “You are a performance testing specialist, an expert in load testing methodologies and tools.”

Key Takeaway: Assign a role first, then give the task. It fundamentally changes the quality and depth of what you receive.

3. Structure the Output


QA engineers thrive on structured tables, columns, and clear formats. So ask for it explicitly.

✅ Structured Prompt: “Generate 10 test cases for the password reset feature in a table format with columns for: Test Case ID, Test Scenario, Pre-conditions, Test Steps, Expected Result, Actual Result, and Priority (High/Medium/Low).”

This gives you something that’s immediately copy-ready for Jira, TestRail, Zephyr, SpurQuality, or any test management tool. No reformatting. No cleanup. Just actionable test documentation.

Structure Options

Depending on your need, you can request:

  • Tables for test cases and test data
  • Numbered lists for test execution steps
  • Bullet points for quick scenario summaries
  • JSON/XML for API test data
  • Markdown for documentation
  • Gherkin syntax for BDD scenarios

Key Takeaway: Structured prompts produce structured results. Define the format, and you’ll save hours of manual reformatting.

4. Add Clear Boundaries


Boundaries create focus and prevent scope creep in your results.

✅ Bounded Prompt: “Generate exactly 8 test cases for the search functionality: 3 positive scenarios, 3 negative scenarios, and 2 edge cases. Focus only on the basic search feature, excluding advanced filters.”

This approach ensures you get:

  • The exact quantity you need (no overwhelming lists)
  • Balanced coverage (positive, negative, edge cases)
  • Focused scope (no feature creep)

Types of Boundaries to Set

  • Quantity: “Generate exactly 5 scenarios”
  • Scope: “Focus only on the checkout process, not the entire cart.”
  • Test types: “Only functional tests, no performance scenarios”
  • Priority: “High and medium priority only”
  • Platforms: “Web application only, exclude mobile”

Key Takeaway: Constraints keep your output precise, relevant, and actionable. They prevent information overload and maintain focus.

5. Build Step by Step (Prompt Chaining)


Just as QA processes are iterative, effective prompting follows a similar pattern. Instead of asking for everything at once, break it into logical steps.

Example Prompt Chain

Step 1:

“Analyze this user story and summarize the key functional requirements in 3-4 bullet points.”

Step 2:

“Based on those requirements, create 5 high-level test scenarios covering happy path, error handling, and edge cases.”

Step 3:

“Expand the second scenario into detailed test steps with expected results.”

Step 4:

“Identify potential automation candidates from these scenarios and explain why they’re suitable for automation.”

This layered approach produces clear, logical, and well-thought-out results. Each step builds on the previous one, creating a coherent testing strategy rather than disconnected outputs.

Key Takeaway: Prompt chaining mirrors your testing mindset. It’s iterative, logical, and produces higher-quality results than single-shot prompts.

6. Use Prompts for Reviews, Not Just Creation


Don’t limit AI tools to creation tasks; leverage them as your review partner.

Review Prompt Examples

✅ Test Case Review: “Review these 10 test cases for the payment gateway. Identify any missing scenarios, redundant steps, or unclear expected results.”

✅ Bug Report Quality Check: “Analyze this bug report and suggest improvements to make it clearer for developers. Focus on reproducibility, clarity, and completeness.”

✅ Test Summary Comparison: “Compare these two test execution summary reports and highlight which one communicates results more effectively to stakeholders.”

✅ Documentation Review: “Review this test plan and identify sections that lack clarity or need more detail.”

This transforms your workflow from one-directional (you create, you review) to collaborative (AI assists in both creation and quality assurance).

Key Takeaway: Use AI as your review partner, not just your assistant. It catches what you might miss and improves overall quality.

7. Use Real Scenarios and Data


Generic prompts produce generic results. Feed real test data, actual API responses, or specific scenarios for practical insights.

✅ Real-Data Prompt: “Here’s the actual API response from our login endpoint: {"status": 200, "token": null, "message": "Success"}. Even though the status is 200 and the message is success, this is causing authentication failures. What could be the root cause, and what test scenarios should I add to catch this in the future?”

This gives you:

  • Specific debugging insights based on actual data
  • Relevant test scenarios tied to real issues
  • Actionable recommendations, not theoretical advice

When to Use Real Data

  • Debugging: Paste actual logs, error messages, or API responses
  • Test data generation: Provide sample data formats
  • Scenario validation: Share actual user workflows
  • Regression analysis: Include historical bug patterns

Key Takeaway: Realistic inputs produce realistic testing insights. The more specific your input, the more valuable your output.

Note: Be cautious about the data you send to an AI model; it might be used for training purposes. Always prefer a paid subscription with a clear data privacy policy.

8. Set the Quality Bar


If you want a particular tone, standard, or level of professionalism, specify it upfront.

✅ Quality-Defined Prompts:

“Write concise, ISTQB-style test scenarios for the mobile registration flow using standard testing terminology.”

“Generate a bug report following IEEE 829 standards with proper severity classification and detailed reproduction steps.”

“Create BDD scenarios in Gherkin syntax following best practices for Given-When-Then structure.”

This instantly elevates the tone, structure, and professionalism of the output. You’re not getting casual descriptions, you’re getting industry-standard documentation.

Quality Standards to Reference

  • ISTQB for test case terminology
  • IEEE 829 for test documentation
  • Gherkin/BDD for behaviour-driven scenarios
  • ISO 25010 for quality characteristics
  • OWASP for security testing

Key Takeaway: Define the tone and quality standard upfront. It ensures outputs align with professional testing practices.

9. Refine and Iterate

Just like debugging, your first prompt won’t be perfect. And that’s okay.

After getting an initial result, refine it with follow-up prompts:

Initial Prompt: “Generate test cases for user registration.”

Refinement Prompts:

  • ✅ “Add data validation scenarios for email format and password strength.”
  • ✅ “Rank these test cases by priority based on business impact.”
  • ✅ “Include estimated effort for each test case (Small/Medium/Large).”
  • ✅ “Add a column for automation feasibility.”

Each iteration moves you from good to great. You’re sculpting the output to match your exact needs.

Iteration Strategies

  • Add missing elements: “Include security test scenarios”
  • Adjust scope: “Remove low-priority cases and add more edge cases”
  • Change format: “Convert this to Gherkin syntax”
  • Enhance detail: “Expand test steps with more specific actions”

Key Takeaway: Refinement is where you move from good to exceptional. Don’t settle for the first output; iterate until it’s exactly what you need.

10. Ask for Prompt Feedback

Here’s a meta-technique: You can ask AI to improve your own prompts.

✅ Meta-Prompt Example: “Here’s the prompt I’m using to generate API test cases: [your prompt]. Analyze it and suggest how to make it more specific, QA-focused, and likely to produce better test scenarios.”

The system will reword, optimize, and enhance your prompt automatically. It’s like having a prompt coach.

What to Ask For

  • “How can I make this prompt more specific?”
  • “What context am I missing that would improve the output?”
  • “Rewrite this prompt to be more structured and clear.”
  • “What role definition would work best for this testing task?”

Key Takeaway: Always review and optimize your own prompts just like you’d review your test cases. Continuous improvement applies to prompting, too.

The QA Prompting Pyramid: A Framework for Mastery

Think of effective prompting as a pyramid. Each level builds on the previous one, creating a foundation for expert-level results.

Level | Principle | Focus | Impact
🧱 Base | Context | Relevance | Ensures outputs match your domain and needs
🎭 Level 2 | Role Definition | Perspective | Elevates the expertise level of responses
📋 Level 3 | Structure | Clarity | Makes outputs immediately usable
🎯 Level 4 | Constraints | Precision | Prevents scope creep and information overload
🪜 Level 5 | Iteration | Refinement | Transforms good outputs into exceptional ones
🧠 Apex | Self-Improvement | Mastery | Continuously optimizes your prompting skills

Start at the base and work your way up. Master each level before moving to the next. By the time you reach the apex, prompting becomes second nature, a natural extension of your testing expertise.

Real-World Impact: How Prompting Transforms QA Work

Let’s look at practical scenarios where these techniques deliver measurable results:

Test Case Generation

A QA team at a fintech company used structured prompting to generate test cases for a new payment feature. By providing context (PCI-DSS compliance), defining roles (security-focused QA), and setting boundaries (20 test cases covering security, functionality, and edge cases), they reduced test case creation time from 3 hours to 25 minutes while improving coverage by 40%. This type of improvement becomes even more powerful when teams apply effective QA Prompting Tips in their workflows.

Bug Analysis and Root Cause Investigation

A tester struggling with an intermittent bug used real API response data in their prompt, asking for potential root causes and additional test scenarios. Within minutes, they identified a race condition that would have taken hours to debug manually.

Test Automation Strategy

An automation engineer used prompt chaining to develop a framework strategy starting with requirements analysis, moving to tool selection, then architecture design, and finally implementation priorities. The structured approach created a comprehensive automation roadmap in one afternoon.

Documentation Review

A QA lead used review prompts to analyze test plans before stakeholder presentations. The AI identified unclear sections, missing risk assessments, and inconsistent terminology issues that would have surfaced during the actual presentation.

The Competitive Advantage: Why This Matters Now

Here’s the reality: AI won’t replace testers, but testers who know how to prompt will replace those who don’t.

This isn’t about job security; it’s about effectiveness. The QA engineers who master prompting will:

  • Deliver faster without sacrificing quality
  • Think more strategically by offloading routine tasks
  • Catch more issues through comprehensive scenario generation
  • Communicate better with clearer documentation and reports
  • Stay relevant as testing evolves

Prompting is becoming as fundamental to QA as writing test cases or understanding requirements. It’s not a nice-to-have skill; it’s a must-have multiplier.

Getting Started: Your First Steps

You don’t need to master all 10 techniques overnight. Start small and build momentum:

First Week: Foundation

  • Practice adding context to every prompt
  • Define roles before tasks
  • Track the difference in output quality

Second Week: Structure

  • Request structured outputs (tables, lists)
  • Set clear boundaries on scope and quantity
  • Compare structured vs. unstructured results

Third Week: Advanced

  • Try prompt chaining for complex tasks
  • Use prompts for review and feedback
  • Experiment with real data and scenarios

Fourth Week: Mastery

  • Set quality standards in your prompts
  • Iterate and refine outputs
  • Ask for feedback on your own prompts

The key is consistency. Use these techniques daily, even for small tasks. Over time, they become instinctive.

Conclusion: Prompting as a Core QA Skill

Smart prompting is quickly becoming a core competency for QA professionals. It doesn’t replace your testing expertise; it multiplies it, especially when you use the right QA Prompting Tips.

When you apply these 10 techniques, you’ll notice how your test cases become more comprehensive, your bug reports clearer, your scenario planning sharper, and your overall productivity significantly higher. These improvements happen faster when you incorporate effective QA Prompting Tips into your daily workflow.

Remember this simple truth:

“The best testers aren’t those who work harder; they’re those who work smarter by asking better questions.”

So start today. Pick one or two of these techniques and apply them to your next testing task. Notice the difference. Refine your approach. And watch as your testing workflow transforms from reactive to strategic with the help of QA Prompting Tips.

The future of QA isn’t about replacing human intelligence with artificial intelligence. It’s about augmenting human expertise with intelligent tools, and prompting is the bridge between the two.

Your Next Steps

If you found these techniques valuable:

  • Share this article with your QA team and start a conversation about prompting best practices
  • Bookmark this guide and reference it when crafting your next prompt
  • Try one technique today, pick the easiest one, and apply it to your current task
  • Drop a comment below. What’s your go-to prompt that saves you time? What challenges do you face with prompting?
  • Follow for more. We’ll be publishing guides on advanced prompt patterns, AI-driven test automation, and QA productivity hacks

Your prompting journey starts with a single, well-crafted question. Make it count.


Building a Complete API Automation Testing Framework with Java, Rest Assured, Cucumber, and Playwright 


API Automation Testing Framework – In today’s fast-paced digital ecosystem, almost every modern application relies on APIs (Application Programming Interfaces) to function seamlessly. Whether it’s a social media integration pulling live updates, a payment gateway processing transactions, or a data service exchanging real-time information, APIs act as the invisible backbone that connects various systems together.

Because APIs serve as the foundation of all interconnected software, ensuring that they are reliable, secure, and high performing is absolutely critical. Even a minor API failure can impact multiple dependent systems; consequently, it may cause application downtime, data mismatches, or even financial loss.

That’s where an API automation testing framework comes in. Unlike traditional UI testing, API testing validates the core business logic directly at the backend layer, which makes it faster, more stable, and capable of detecting issues early in the development cycle — even before the frontend is ready.

In this blog, we’ll walk through the process of building a complete API Automation Testing Framework using a combination of: 

  • Java – as the main programming language 
  • Maven – for project and dependency management 
  • Cucumber – to implement Behavior Driven Development (BDD) 
  • RestAssured – for simplifying RESTful API automation 
  • Playwright – to handle browser-based token generation 

The framework you’ll learn to build will follow a BDD (Behavior-Driven Development) approach, enabling test scenarios to be written in simple, human-readable language. This not only improves collaboration between developers, testers, and business analysts but also makes test cases easier to understand, maintain, and extend.

Additionally, the API automation testing framework will be CI/CD-friendly, meaning it can be seamlessly integrated into automated build pipelines for continuous testing and faster feedback. 

By the end of this guide, you’ll have a scalable, reusable, and maintainable API testing framework that brings together the best of automation, reporting, and real-time token management — a complete solution for modern QA teams. 

What is an API?

An API (Application Programming Interface) acts as a communication bridge between two software systems, allowing them to exchange information in a standardized way. In simpler terms, it defines how different software components should interact — through a set of rules, protocols, and endpoints.

Think of an API as a messenger that takes a request from one system, delivers it to another system, and then brings back the response. This interaction, therefore, allows applications to share data and functionality without exposing their internal logic or database structure.

Let’s take a simple example: 
When you open a weather application on your phone, it doesn’t store weather data itself. Instead, it sends a request to a weather server API, which processes the request and sends back a response — such as the current temperature, humidity, or forecast. 
This request-response cycle is what makes APIs so powerful and integral to almost every digital experience we use today. 

Most modern APIs follow the REST (Representational State Transfer) architectural style. REST APIs use the HTTP protocol and are designed around a set of standardized operations, including: 

HTTP Method | Description                   | Example Use
GET         | Retrieve data from the server | Fetch a list of users
POST        | Create new data on the server | Add a new product
PUT         | Update existing data          | Edit user details
DELETE      | Remove data                   | Delete a record

The responses returned by APIs are typically in JSON (JavaScript Object Notation) format – a lightweight, human-readable, and machine-friendly data format that’s easy to parse and validate.

In essence, APIs are the digital glue that holds modern applications together — enabling smooth communication, faster integrations, and a consistent flow of information across systems. 

What is API Testing?

API Testing is the process of verifying that an API functions correctly and performs as expected — ensuring that all its endpoints, parameters, and data exchanges behave according to defined business rules. 

In simple terms, it’s about checking whether the backend logic of an application works properly — without needing a graphical user interface (UI). Since APIs act as the communication layer between different software components, testing them helps ensure that the entire system remains reliable, secure, and efficient. 

API testing typically focuses on four main aspects: 

  1. Functionality – Does the API perform the intended operation and return the correct response for valid requests? 
  2. Reliability – Does it deliver consistent results every time, even under different inputs and conditions? 
  3. Security – Is the API protected from unauthorized access, data leaks, or token misuse? 
  4. Performance – Does it respond quickly and remain stable under heavy load or high traffic? 

Unlike traditional UI testing, which validates the visual and interactive parts of an application, API testing operates directly at the business logic layer. This makes it: 

  • Faster – Since it bypasses the UI, execution times are much shorter. 
  • More Stable – UI changes (like a button name or layout) don’t affect API tests. 
  • Proactive – Tests can be created and run even before the front-end is developed. 

In essence, API testing ensures the heart of your application is healthy. By validating responses, performance, and security at the API level, teams can detect defects early, reduce costs, and deliver more reliable software to users. 

Why is API Testing Important?

API Testing plays a vital role in modern software development because APIs form the backbone of most applications. A failure in an API can affect multiple systems and impact overall functionality. 

Here’s why API testing is important: 

  1. Ensures Functionality: Verifies that endpoints return correct responses and handle errors properly. 
  2. Enhances Security: Detects vulnerabilities like unauthorized access or token misuse. 
  3. Validates Data Integrity: Confirms that data remains consistent across APIs and databases. 
  4. Improves Performance: Checks response time, stability, and behavior under load. 
  5. Detects Defects Early: Allows early testing right after backend development, saving time and cost. 
  6. Supports Continuous Integration: Easily integrates with CI/CD pipelines for automated validation. 

In short, API testing ensures your system’s core logic is reliable, secure, and ready for real-world use. 

Tools for Manual API Testing

Before jumping into automation, it’s essential to explore and understand APIs manually. Manual testing helps you validate endpoints, check responses, and get familiar with request structures. 

Here are some popular tools used for manual API testing: 

  • Postman: The most widely used tool for sending API requests, validating responses, and organizing test collections (https://www.postman.com/). 
  • SoapUI: Best suited for testing both SOAP and REST APIs with advanced features like assertions and mock services. 
  • Insomnia: A lightweight and user-friendly alternative to Postman, ideal for quick API exploration. 
  • cURL: A command-line tool perfect for making fast API calls or testing from scripts. 
  • Fiddler: Excellent for capturing and debugging HTTP/HTTPS traffic between client and server. 

Using these tools helps testers understand API behavior, request/response formats, and possible edge cases — forming a strong foundation before moving to API automation. 

Tools for API Automation Testing 

After verifying APIs manually, the next step is to automate them using reliable tools and libraries. Automation helps improve test coverage, consistency, and execution speed. 

Here are some popular tools used for API automation testing: 

  • RestAssured: A powerful Java library designed specifically for testing and validating RESTful APIs. 
  • Cucumber: Enables writing test cases in Gherkin syntax (plain English), making them easy to read and maintain. 
  • Playwright: Automates browser interactions; in our framework, it will be used for token generation or authentication flows. 
  • Postman + Newman: Allows you to run Postman collections directly from the command line — ideal for CI/CD integration. 
  • JMeter: A robust tool for performance and load testing of APIs under different conditions. 

In this blog, our focus will be on building a framework using RestAssured, Cucumber, and Playwright — combining functional, BDD, and authentication automation into one cohesive setup. 

Framework Overview 

We’ll build a Behavior-Driven API Automation Testing Framework that combines multiple tools for a complete testing solution. Here’s how each component fits in: 

  • Cucumber – Manages the BDD layer, allowing test scenarios to be written in simple, readable feature files. 
  • RestAssured – Handles HTTP requests and responses for validating RESTful APIs. 
  • Playwright – Automates browser-based actions like token generation or authentication. 
  • Maven – Manages project dependencies, builds, and plugins efficiently. 
  • Cucumber HTML Reports – Automatically generates detailed execution reports after each run. 

The framework follows a modular structure, with separate packages for step definitions, utilities, configurations, and feature files — ensuring clean organization, easy maintenance, and scalability. 

Step 1: Prerequisites

Before starting, ensure you have: 

  • Java JDK 11 or later installed and configured 
  • Maven installed for build and dependency management 
  • An IDE such as IntelliJ IDEA or Eclipse 

Add the required dependencies to your pom.xml file: 

<?xml version="1.0" encoding="UTF-8"?> 
<project xmlns="http://maven.apache.org/POM/4.0.0" 
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> 
    <modelVersion>4.0.0</modelVersion> 
 
    <groupId>org.Spurqlabs</groupId> 
    <artifactId>SpurQLabs-Test-Automation</artifactId> 
    <version>1.0-SNAPSHOT</version> 
    <properties> 
        <maven.compiler.source>11</maven.compiler.source> 
        <maven.compiler.target>11</maven.compiler.target> 
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> 
    </properties> 
    <dependencies> 
        <!-- Playwright for UI automation --> 
        <dependency> 
            <groupId>com.microsoft.playwright</groupId> 
            <artifactId>playwright</artifactId> 
            <version>1.50.0</version> 
        </dependency> 
        <!-- Cucumber for BDD --> 
        <dependency> 
            <groupId>io.cucumber</groupId> 
            <artifactId>cucumber-java</artifactId> 
            <version>7.23.0</version> 
        </dependency> 
        <dependency> 
            <groupId>io.cucumber</groupId> 
            <artifactId>cucumber-testng</artifactId> 
            <version>7.23.0</version> 
        </dependency> 
        <!-- TestNG for test execution --> 
        <dependency> 
            <groupId>org.testng</groupId> 
            <artifactId>testng</artifactId> 
            <version>7.11.0</version> 
            <scope>test</scope> 
        </dependency> 
        <!-- Rest-Assured for API testing --> 
        <dependency> 
            <groupId>io.rest-assured</groupId> 
            <artifactId>rest-assured</artifactId> 
            <version>5.5.5</version> 
        </dependency> 
        <!-- Apache POI for Excel support --> 
        <dependency> 
            <groupId>org.apache.poi</groupId> 
            <artifactId>poi-ooxml</artifactId> 
            <version>5.4.1</version> 
        </dependency> 
        <!-- org.json for JSON parsing --> 
        <dependency> 
            <groupId>org.json</groupId> 
            <artifactId>json</artifactId> 
            <version>20250517</version> 
        </dependency> 
        <dependency> 
            <groupId>org.seleniumhq.selenium</groupId> 
            <artifactId>selenium-devtools-v130</artifactId> 
            <version>4.26.0</version> 
            <scope>test</scope> 
        </dependency> 
        <dependency> 
            <groupId>com.sun.mail</groupId> 
            <artifactId>jakarta.mail</artifactId> 
            <version>2.0.1</version> 
        </dependency> 
        <dependency> 
            <groupId>com.sun.activation</groupId> 
            <artifactId>jakarta.activation</artifactId> 
            <version>2.0.1</version> 
        </dependency> 
        <!-- Gson for JSON serialization, used by TokenManager --> 
        <dependency> 
            <groupId>com.google.code.gson</groupId> 
            <artifactId>gson</artifactId> 
            <version>2.10.1</version> 
        </dependency> 
    </dependencies> 
    <build> 
        <plugins> 
            <plugin> 
                <groupId>org.apache.maven.plugins</groupId> 
                <artifactId>maven-compiler-plugin</artifactId> 
                <version>3.14.0</version> 
                <configuration> 
                    <source>11</source> 
                    <target>11</target> 
                </configuration> 
            </plugin> 
        </plugins> 
    </build> 
</project> 

Step 2: Creating Project

Create a Maven project with the following folder structure:

loanbook-api-automation 

│ 
├── .idea 
│ 
├── src 
│   └── test 
│       └── java 
│           └── org 
│               └── Spurqlabs 
│                   ├── Core 
│                   │   ├── Hooks.java 
│                   │   ├── Main.java 
│                   │   ├── TestContext.java 
│                   │   └── TestRunner.java 
│                   │ 
│                   ├── Steps 
│                   │   └── CommonSteps.java 
│                   │ 
│                   └── Utils 
│                       ├── APIUtility.java 
│                       ├── FrameworkConfigReader.java 
│                       └── TokenManager.java 
│ 
├── resources 
│   ├── Features 
│   ├── headers 
│   ├── Query_Parameters 
│   ├── Request_Bodies 
│   ├── Schema 
│   └── cucumber.properties 
│ 
├── target 
│ 
├── test-output 
│ 
├── .gitignore 
├── bitbucket-pipelines.yml 
├── DealDetails.json 
├── FrameworkConfig.json 
├── pom.xml 
├── README.md 
└── token.json 

Step 3: Creating a Feature File

In this step, we will create a feature file for the API Automation Testing Framework. A feature file consists of steps written in Gherkin, a plain-English language that makes the flow of a test scenario easy for even non-technical readers to follow. In this framework we will automate the four basic API request methods: POST, PUT, GET, and DELETE. 
 
We can assign tags to the scenarios in the feature file to run particular test scenarios based on the requirement. A key point to note is that every feature file must end with the .feature extension. We will create four scenarios, one for each API method.  

Feature: All Notes API Validation 
 
  @api 
 
  Scenario Outline: Validate POST Create Notes API Response for "<scenarioName>" Scenario 
    When User sends "<method>" request to "<url>" with headers "<headers>" and query file "<queryFile>" and requestDataFile "<bodyFile>" 
    Then User verifies the response status code is <statusCode> 
    And User verifies the response body matches JSON schema "<schemaFile>" 
    Then User verifies fields in response: "<contentType>" with content type "<fields>" 
    Examples: 
      | scenarioName       | method | url                                                             | headers | queryFile | bodyFile             | statusCode | schemaFile | contentType | fields | 
      | Valid create Notes | POST   | /api/v1/loan-syndications/{dealId}/investors/{investorId}/notes | NA      | NA        | Create_Notes_Request | 200        | NA         | NA          | NA     | 
 
  Scenario Outline: Validate GET Notes API Response for "<scenarioName>" Scenario 
    When User sends "<method>" request to "<url>" with headers "<headers>" and query file "<queryFile>" and requestDataFile "<bodyFile>" 
    Then User verifies the response status code is <statusCode> 
    And User verifies the response body matches JSON schema "<schemaFile>" 
    Then User verifies fields in response: "<contentType>" with content type "<fields>" 
    Examples: 
      | scenarioName    | method | url                                                             | headers | queryFile | bodyFile | statusCode | schemaFile       | contentType | fields              | 
      | Valid Get Notes | GET    | /api/v1/loan-syndications/{dealId}/investors/{investorId}/notes | NA      | NA        | NA       | 200        | Notes_Schema_200 | json        | note=This is Note 1 | 
 
  Scenario Outline: Validate Update Notes API Response for "<scenarioName>" Scenario 
    When User sends "<method>" request to "<url>" with headers "<headers>" and query file "<queryFile>" and requestDataFile "<bodyFile>" 
    Then User verifies the response status code is <statusCode> 
    And User verifies the response body matches JSON schema "<schemaFile>" 
    Then User verifies fields in response: "<contentType>" with content type "<fields>" 
    Examples: 
      | scenarioName       | method | url                                                                                   | headers | queryFile | bodyFile             | statusCode | schemaFile | contentType | fields | 
      | Valid update Notes | PUT    | /api/v1/loan-syndications/{dealId}/investors/{investorId}/notes/{noteId}/update-notes | NA      | NA        | Update_Notes_Request | 200        | NA         | NA          | NA     | 
 
  Scenario Outline: Validate DELETE Notes API Response for "<scenarioName>" Scenario 
    When User sends "<method>" request to "<url>" with headers "<headers>" and query file "<queryFile>" and requestDataFile "<bodyFile>" 
    Then User verifies the response status code is <statusCode> 
    And User verifies the response body matches JSON schema "<schemaFile>" 
    Then User verifies fields in response: "<contentType>" with content type "<fields>" 
    Examples: 
      | scenarioName | method | url                                                                      | headers | queryFile | bodyFile | statusCode | schemaFile | contentType | fields | 
      | Valid delete | DELETE | /api/v1/loan-syndications/{dealId}/investors/{investorId}/notes/{noteId} | NA      | NA        | NA       | 200        | NA         | NA          | NA     | 

Step 4: Creating a Step Definition File

Unlike the UI automation framework we built in the previous blog, we will create a single step-definition file for all the feature files. In a BDD framework, step files map and implement the steps described in the feature files. Cucumber matches each step in a feature file against the annotated expressions in the step file, so the expressions here must mirror the wording used in the feature file exactly.  

package org.Spurqlabs.Steps; 
 
import io.cucumber.java.en.Then; 
import io.cucumber.java.en.When; 
import io.restassured.response.Response; 
import org.Spurqlabs.Core.TestContext; 
import org.Spurqlabs.Utils.*; 
import org.json.JSONArray; 
import org.json.JSONObject; 
 
import java.io.File; 
import java.io.IOException; 
import java.nio.charset.StandardCharsets; 
import java.nio.file.Files; 
import java.nio.file.Paths; 
import java.util.HashMap; 
import java.util.Map; 
 
import static io.restassured.module.jsv.JsonSchemaValidator.matchesJsonSchemaInClasspath; 
import static org.Spurqlabs.Utils.DealDetailsManager.replacePlaceholders; 
import static org.hamcrest.Matchers.equalTo; 
public class CommonSteps extends TestContext { 
    private Response response; 
 
    @When("User sends {string} request to {string} with headers {string} and query file {string} and requestDataFile {string}") 
    public void user_sends_request_to_with_query_file_and_requestDataFile (String method, String url, String headers, String queryFile, String bodyFile) throws IOException { 
        String jsonString = Files.readString(Paths.get(FrameworkConfigReader.getFrameworkConfig("DealDetails")), StandardCharsets.UTF_8); 
        JSONObject storedValues = new JSONObject(jsonString); 
 
        String fullUrl = FrameworkConfigReader.getFrameworkConfig("BaseUrl") + replacePlaceholders(url); 
 
        Map<String, String> header = new HashMap<>(); 
        if (!"NA".equalsIgnoreCase(headers)) { 
            header = JsonFileReader.getHeadersFromJson(FrameworkConfigReader.getFrameworkConfig("headers") + headers + ".json"); 
        } else { 
            header.put("cookie", TokenManager.getToken()); 
        } 
        Map<String, String> queryParams = new HashMap<>(); 
        if (!"NA".equalsIgnoreCase(queryFile)) { 
            queryParams = JsonFileReader.getQueryParamsFromJson(FrameworkConfigReader.getFrameworkConfig("Query_Parameters") + queryFile + ".json"); 
            for (String key : queryParams.keySet()) { 
                String value = queryParams.get(key); 
                for (String storedKey : storedValues.keySet()) { 
                    value = value.replace("{" + storedKey + "}", storedValues.getString(storedKey)); 
                } 
                queryParams.put(key, value); 
            } 
        } 
 
        Object requestBody = null; 
        if (!"NA".equalsIgnoreCase(bodyFile)) { 
            String bodyTemplate = JsonFileReader.getJsonAsString( 
                    FrameworkConfigReader.getFrameworkConfig("Request_Bodies") + bodyFile + ".json"); 
 
            for (String key : storedValues.keySet()) { 
                String placeholder = "{" + key + "}"; 
                if (bodyTemplate.contains(placeholder)) { 
                    bodyTemplate = bodyTemplate.replace(placeholder, storedValues.getString(key)); 
                } 
            } 
 
            requestBody = bodyTemplate; 
        } 

        response = APIUtility.sendRequest(method, fullUrl, header, queryParams, requestBody); 
        response.prettyPrint(); 
        TestContextLogger.scenarioLog("API", "Request sent: " + method + " " + fullUrl); 
 
        if (scenarioName.contains("GET Notes") && response.getStatusCode() == 200) { 
            DealDetailsManager.put("noteId", response.path("[0].id")); 
        } 
         
    } 
 
    @Then("User verifies the response status code is {int}") 
    public void userVerifiesTheResponseStatusCodeIsStatusCode(int statusCode) { 
        response.then().statusCode(statusCode); 
        TestContextLogger.scenarioLog("API", "Response status code: " + statusCode); 
    } 
 
    @Then("User verifies the response body matches JSON schema {string}") 
    public void userVerifiesTheResponseBodyMatchesJSONSchema(String schemaFile) { 
        if (!"NA".equalsIgnoreCase(schemaFile)) { 
            String schemaPath = "Schema/" + schemaFile + ".json"; 
            response.then().assertThat().body(matchesJsonSchemaInClasspath(schemaPath)); 
            TestContextLogger.scenarioLog("API", "Response body matches schema"); 
        } else { 
            TestContextLogger.scenarioLog("API", "Response body does not have schema to validate"); 
        } 
    } 
 
    @Then("User verifies field {string} has value {string}") 
    public void userVerifiesFieldHasValue(String jsonPath, String expectedValue) { 
        response.then().body(jsonPath, equalTo(expectedValue)); 
        TestContextLogger.scenarioLog("API", "Field " + jsonPath + " has value: " + expectedValue); 
    } 
 
    @Then("User verifies fields in response: {string} with content type {string}") 
    public void userVerifiesFieldsInResponseWithContentType(String contentType, String fields) throws IOException { 
        // If NA, skip verification 
        if ("NA".equalsIgnoreCase(contentType) || "NA".equalsIgnoreCase(fields)) { 
            return; 
        } 
        String responseStr = response.getBody().asString().trim(); 
 
        try { 
            if ("text".equalsIgnoreCase(contentType)) { 
                // For text, verify each expected value is present in response 
                for (String expected : fields.split(";")) { 
                    expected = replacePlaceholders(expected.trim()); 
                    if (!responseStr.contains(expected)) { 
                        throw new AssertionError("Expected text not found: " + expected); 
                    } 
                    TestContextLogger.scenarioLog("API", "Text found: " + expected); 
                } 
            } else if ("json".equalsIgnoreCase(contentType)) { 
                // For json, verify key=value pairs 
                JSONObject jsonResponse; 
                if (responseStr.startsWith("[")) { 
                    JSONArray arr = new JSONArray(responseStr); 
                    jsonResponse = !arr.isEmpty() ? arr.getJSONObject(0) : new JSONObject(); 
                } else { 
                    jsonResponse = new JSONObject(responseStr); 
                } 
                for (String pair : fields.split(";")) { 
                    if (pair.trim().isEmpty()) continue; 
                    String[] kv = pair.split("=", 2); 
                    if (kv.length < 2) continue; 
                    String keyPath = kv[0].trim(); 
                    String expected = replacePlaceholders(kv[1].trim()); 
                    Object actual = JsonFileReader.getJsonValueByPath(jsonResponse, keyPath); 
                    if (actual == null) { 
                        throw new AssertionError("Key not found in JSON: " + keyPath); 
                    } 
                    if (!String.valueOf(actual).equals(String.valueOf(expected))) { 
                        throw new AssertionError("Mismatch for " + keyPath + ": expected '" + expected + "', got '" + actual + "'"); 
                    } 
                    TestContextLogger.scenarioLog("API", "Validated: " + keyPath + " = " + expected); 
                } 
            } else { 
                throw new AssertionError("Unsupported content type: " + contentType); 
            } 
        } catch (AssertionError | Exception e) { 
            TestContextLogger.scenarioLog("API", "Validation failed: " + e.getMessage()); 
            throw e; 
        } 
    } 
} 
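The step definitions above call DealDetailsManager.replacePlaceholders, whose source is not shown in this blog. Conceptually, it swaps {key} placeholders in URLs and request bodies with values captured from earlier responses (such as the noteId stored after the GET call). The sketch below is a simplified, hypothetical stand-in for that idea, not the framework's actual class:

```java
import java.util.Map;

public class PlaceholderDemo {
    // Simplified stand-in for DealDetailsManager.replacePlaceholders: each
    // {key} in the input is replaced with the stored value for that key.
    static String replacePlaceholders(String input, Map<String, String> stored) {
        String result = input;
        for (Map.Entry<String, String> e : stored.entrySet()) {
            result = result.replace("{" + e.getKey() + "}", e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical IDs captured from earlier API responses
        Map<String, String> stored = Map.of("dealId", "101", "investorId", "55");
        String url = "/api/v1/loan-syndications/{dealId}/investors/{investorId}/notes";
        System.out.println(replacePlaceholders(url, stored));
        // → /api/v1/loan-syndications/101/investors/55/notes
    }
}
```

This chaining is what lets the DELETE scenario reuse the noteId created by the earlier POST and GET scenarios without hard-coding it in the feature file.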

Step 5: Creating an API Utility

So far we have created a feature file and a step file; in this step we will create a utility file. In Web automation we typically have page files that contain the locators and the actions to perform on web elements, but in this framework a single utility file plays that role, just like the single step file. The utility file contains the API methods and endpoints used to perform a specific action such as POST, PUT, GET, or DELETE, and it captures both the request body (payload) and the response body. Keeping these methods in a utility file lets us reuse them everywhere instead of recreating the same logic over and over.

package org.Spurqlabs.Utils; 
 
import io.restassured.RestAssured; 
import io.restassured.http.ContentType; 
import io.restassured.response.Response; 
import io.restassured.specification.RequestSpecification; 
 
import java.io.File; 
import java.util.Map; 
 
public class APIUtility { 
    public static Response sendRequest(String method, String url, Map<String, String> headers, Map<String, String> queryParams, Object body) { 
        RequestSpecification request = RestAssured.given(); 
        if (headers != null && !headers.isEmpty()) { 
            request.headers(headers); 
        } 
        if (queryParams != null && !queryParams.isEmpty()) { 
            request.queryParams(queryParams); 
        } 
        if (body != null && !method.equalsIgnoreCase("GET")) { 
            if (headers == null || !headers.containsKey("Content-Type")) { 
                request.header("Content-Type", "application/json"); 
            } 
            request.body(body); 
        } 
        switch (method.trim().toUpperCase()) { 
            case "GET": 
                return request.get(url); 
            case "POST": 
                return request.post(url); 
            case "PUT": 
                return request.put(url); 
            case "PATCH": 
                return request.patch(url); 
            case "DELETE": 
                return request.delete(url); 
            default: 
                throw new IllegalArgumentException("Unsupported HTTP method: " + method); 
        } 
    } 
} 
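Note that sendRequest trims and upper-cases the method string before dispatching, so minor formatting differences in the feature file (say, " post ") do not break matching. The standalone sketch below isolates just that normalization step; the class and method names are illustrative, not part of the framework:

```java
import java.util.Set;

public class HttpMethodDemo {
    // Mirrors the dispatch in APIUtility.sendRequest: method names are trimmed
    // and upper-cased before matching, and anything outside the supported set
    // fails fast with a clear error.
    static final Set<String> SUPPORTED = Set.of("GET", "POST", "PUT", "PATCH", "DELETE");

    static String normalize(String method) {
        String m = method.trim().toUpperCase();
        if (!SUPPORTED.contains(m)) {
            throw new IllegalArgumentException("Unsupported HTTP method: " + method);
        }
        return m;
    }

    public static void main(String[] args) {
        System.out.println(normalize(" post "));  // → POST
    }
}
```

Failing fast on an unknown method surfaces typos in the feature file immediately, instead of letting a request silently go out with the wrong verb.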

Step 6: Creating Token Generation Using Playwright

In this step, we automate the process of generating authentication tokens using Playwright. Many APIs require login-based tokens (like cookies or bearer tokens), and managing them manually can be difficult — especially when they expire frequently. 

The TokenManager class handles this by: 

  • Logging into the application automatically using Playwright. 
  • Extracting authentication cookies (OauthHMAC, OauthExpires, BearerToken). 
  • Storing the token in a local JSON file for reuse. 
  • Refreshing the token automatically when it expires. 

This ensures that your API tests always use a valid token without manual updates, making the framework fully automated and CI/CD ready. 

package org.Spurqlabs.Utils; 
 
import java.io.*; 
import java.nio.file.*; 
import java.time.Instant; 
import java.util.HashMap; 
import java.util.Map; 
import com.google.gson.Gson; 
import com.google.gson.reflect.TypeToken; 
import com.microsoft.playwright.*; 
import com.microsoft.playwright.options.Cookie; 
 
public class TokenManager { 
    private static final ThreadLocal<String> tokenThreadLocal = new ThreadLocal<>(); 
    private static final ThreadLocal<Long> expiryThreadLocal = new ThreadLocal<>(); 
    private static final String TOKEN_FILE = "token.json"; 
    private static final long TOKEN_VALIDITY_SECONDS = 30 * 60; // 30 minutes 
 
    public static String getToken() { 
        String token = tokenThreadLocal.get(); 
        Long expiry = expiryThreadLocal.get(); 
        if (token == null || expiry == null || Instant.now().getEpochSecond() >= expiry) { 
            // Try to read from a file (for multi-JVM/CI) 
            Map<String, Object> fileToken = readTokenFromFile(); 
            if (fileToken != null) { 
                token = (String) fileToken.get("token"); 
                expiry = ((Number) fileToken.get("expiry")).longValue(); 
            } 
            // If still null or expired, fetch new 
            if (token == null || expiry == null || Instant.now().getEpochSecond() >= expiry) { 
                Map<String, Object> newToken = generateAuthTokenViaBrowser(); 
                token = (String) newToken.get("token"); 
                expiry = (Long) newToken.get("expiry"); 
                writeTokenToFile(token, expiry); 
            } 
            tokenThreadLocal.set(token); 
            expiryThreadLocal.set(expiry); 
        } 
        return token; 
    } 
 
    private static Map<String, Object> generateAuthTokenViaBrowser() { 
        String bearerToken; 
        long expiry = Instant.now().getEpochSecond() + TOKEN_VALIDITY_SECONDS; 
        int maxRetries = 2; 
        int attempt = 0; 
        Exception lastException = null; 
        while (attempt < maxRetries) { 
            try (Playwright playwright = Playwright.create()) { 
                Browser browser = playwright.chromium().launch(new BrowserType.LaunchOptions().setHeadless(true)); 
                BrowserContext context = browser.newContext(); 
                Page page = context.newPage(); 
 
                // Robust wait for login page to load 
                page.navigate(FrameworkConfigReader.getFrameworkConfig("BaseUrl"), new Page.NavigateOptions().setTimeout(60000)); 
                page.waitForSelector("#email", new Page.WaitForSelectorOptions().setTimeout(20000)); 
                page.waitForSelector("#password", new Page.WaitForSelectorOptions().setTimeout(20000)); 
                page.waitForSelector("button[type='submit']", new Page.WaitForSelectorOptions().setTimeout(20000)); 
 
                // Fill a login form 
                page.fill("#email", FrameworkConfigReader.getFrameworkConfig("UserEmail")); 
                page.fill("#password", FrameworkConfigReader.getFrameworkConfig("UserPassword")); 
                page.waitForSelector("button[type='submit']:not([disabled])", new Page.WaitForSelectorOptions().setTimeout(10000)); 
                page.click("button[type='submit']"); 
 
                // Wait for either dashboard element or flexible URL match 
                boolean loggedIn; 
                try { 
                    page.waitForSelector(".dashboard, .main-content, .navbar, .sidebar", new Page.WaitForSelectorOptions().setTimeout(20000)); 
                    loggedIn = true; 
                } catch (Exception e) { 
                    // fallback to URL check 
                    try { 
                        page.waitForURL(url -> url.startsWith(FrameworkConfigReader.getFrameworkConfig("BaseUrl")), new Page.WaitForURLOptions().setTimeout(30000)); 
                        loggedIn = true; 
                    } catch (Exception ex) { 
                        // Both checks failed 
                        loggedIn = false; 
                    } 
                } 
                if (!loggedIn) { 
                    throw new RuntimeException("Login did not complete successfully: dashboard element or expected URL not found"); 
                } 
 
                // Extract cookies 
                String oauthHMAC = null; 
                String oauthExpires = null; 
                String token = null; 
                for (Cookie cookie : context.cookies()) { 
                    switch (cookie.name) { 
                        case "OauthHMAC": 
                            oauthHMAC = cookie.name + "=" + cookie.value; 
                            break; 
                        case "OauthExpires": 
                            oauthExpires = cookie.name + "=" + cookie.value; 
                            if (cookie.expires != null && cookie.expires > 0) { 
                                expiry = cookie.expires.longValue(); 
                            } 
                            break; 
                        case "BearerToken": 
                            token = cookie.name + "=" + cookie.value; 
                            break; 
                    } 
                } 
                if (oauthHMAC != null && oauthExpires != null && token != null) { 
                    bearerToken = oauthHMAC + ";" + oauthExpires + ";" + token + ";"; 
                } else { 
                    throw new RuntimeException("❗ One or more cookies are missing: OauthHMAC, OauthExpires, BearerToken"); 
                } 
                browser.close(); 
                Map<String, Object> map = new HashMap<>(); 
                map.put("token", bearerToken); 
                map.put("expiry", expiry); 
                return map; 
            } catch (Exception e) { 
                lastException = e; 
                System.err.println("[TokenManager] Login attempt " + (attempt + 1) + " failed: " + e.getMessage()); 
                attempt++; 
                try { Thread.sleep(2000); } catch (InterruptedException ignored) {} 
            } 
        } 
        throw new RuntimeException("Failed to generate auth token after " + maxRetries + " attempts", lastException); 
    } 
 
    private static void writeTokenToFile(String token, long expiry) { 
        try { 
            Map<String, Object> map = new HashMap<>(); 
            map.put("token", token); 
            map.put("expiry", expiry); 
            String json = new Gson().toJson(map); 
            Files.write(Paths.get(TOKEN_FILE), json.getBytes()); 
        } catch (IOException e) { 
            e.printStackTrace(); 
        } 
    } 
 
    private static Map<String, Object> readTokenFromFile() { 
        try { 
            Path path = Paths.get(TOKEN_FILE); 
            if (!Files.exists(path)) return null; 
            String json = new String(Files.readAllBytes(path)); 
            // Note: Gson deserializes JSON numbers into Double for a Map<String, Object> target, 
            // so callers should read the expiry back via ((Number) map.get("expiry")).longValue(). 
            return new Gson().fromJson(json, new TypeToken<Map<String, Object>>() {}.getType()); 
        } catch (Exception e) { 
            // Covers IOException as well as JsonSyntaxException from a corrupted token file 
            return null; 
        } 
    } 
} 
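With generateAuthToken(), writeTokenToFile(), and readTokenFromFile() in place, the surrounding flow is simple: reuse the stored token while it is still valid, and launch the expensive browser login only when it has expired. The sketch below illustrates that caching decision in isolation; the class name, the one-hour lifetime, and the in-memory stand-in for the token file are illustrative assumptions, not the framework's actual API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the caching layer around the Playwright login:
// reuse the stored token while it is still valid, regenerate otherwise.
public class TokenCacheSketch {
    static String cachedToken = null;        // stands in for token.json on disk
    static long cachedExpiry = 0;            // epoch seconds
    static final long SKEW_SECONDS = 60;     // refresh slightly before real expiry
    static AtomicInteger loginCalls = new AtomicInteger(); // counts expensive logins

    // Stand-in for the real generateAuthToken() browser login
    static String expensiveLogin(long nowSeconds) {
        loginCalls.incrementAndGet();
        cachedExpiry = nowSeconds + 3600;    // assume a one-hour token lifetime
        cachedToken = "OauthHMAC=abc;OauthExpires=def;BearerToken=xyz;";
        return cachedToken;
    }

    public static String getToken(long nowSeconds) {
        if (cachedToken != null && cachedExpiry > nowSeconds + SKEW_SECONDS) {
            return cachedToken;              // still valid: no browser launch needed
        }
        return expensiveLogin(nowSeconds);   // expired or missing: log in again
    }
}
```

Because the token file survives between runs, repeated test executions within the token's lifetime skip the browser login entirely, which is the main payoff of this pattern.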

Step 7: Create Framework Config File

A good tester knows the value of config files, and this framework uses one as well. For now it holds little more than the base URL, which the utility classes read over and over again. As you explore the framework and automate new endpoints, you will find more values that belong in the config file. 

Config files also make tests more maintainable and reusable. Because all configuration settings live in one separate file, the code stays modular and easier to understand, and a single change updates the configuration for every test at once. 

{ 
  "BaseUrl": "https://app.sample.com", 
  "UserEmail": "************.com", 
  "UserPassword": "#############", 
  "ExecutionBrowser": "chromium", 
  "Resources": "/src/test/resources/", 
  "Query_Parameters": "src/test/resources/Query_Parameters/", 
  "Request_Bodies": "src/test/resources/Request_Bodies/", 
  "Schema": "src/test/resources/Schema/", 
  "TestResultsDir": "test-output/", 
  "headers": "src/test/resources/headers/", 
  "DealDetails": "DealDetails.json", 
  "UploadDocUrl": "/api/v1/documents" 
} 
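The utility code above reads values from this file through FrameworkConfigReader.getFrameworkConfig(...). The framework itself most likely parses the JSON with Gson; as a dependency-free illustration of the same idea, a minimal reader for a flat, string-valued config like the one above could look like this (the file path and caching behaviour here are assumptions, not the framework's exact implementation):

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch of a config reader for a flat JSON file of string values.
// The real FrameworkConfigReader may differ; CONFIG_PATH is hypothetical.
public class FrameworkConfigReader {
    private static final String CONFIG_PATH = "src/test/resources/FrameworkConfig.json";
    private static Map<String, String> cache;

    // Extract "key": "value" pairs from a flat JSON object (no nesting or escapes).
    public static Map<String, String> parse(String json) {
        Map<String, String> map = new HashMap<>();
        Matcher m = Pattern.compile("\"([^\"]+)\"\\s*:\\s*\"([^\"]*)\"").matcher(json);
        while (m.find()) {
            map.put(m.group(1), m.group(2));
        }
        return map;
    }

    public static String getFrameworkConfig(String key) throws Exception {
        if (cache == null) {
            // Load once and cache, so every test reads the same settings
            cache = parse(new String(Files.readAllBytes(Paths.get(CONFIG_PATH))));
        }
        return cache.get(key);
    }
}
```

With this in place, a call such as FrameworkConfigReader.getFrameworkConfig("BaseUrl") returns "https://app.sample.com" from the file shown above.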

Step 8: Execute and Generate Cucumber Report

At this stage, we create the TestRunner class, which serves as the entry point to execute all Cucumber feature files. It uses TestNG as the test executor and integrates Cucumber for running BDD-style test scenarios. 

The @CucumberOptions annotation defines: 

  • features → Location of all .feature files. 
  • glue → Packages containing step definitions and hooks. 
  • plugin → Reporting options like JSON and HTML reports. 

After execution, Cucumber automatically generates: 

  • Cucumber.json → For CI/CD and detailed reporting. 
  • Cucumber.html → A user-friendly HTML report showing test results. 

This setup makes it easy to run all API tests and view clean, structured reports for quick analysis. 

package org.Spurqlabs.Core; 
import io.cucumber.testng.AbstractTestNGCucumberTests; 
import io.cucumber.testng.CucumberOptions; 
 
@CucumberOptions( 
        features = {"src/test/resources/Features"}, 
        glue = {"org.Spurqlabs.Steps", "org.Spurqlabs.Core"}, 
        plugin = {"pretty", "json:test-output/Cucumber.json", "html:test-output/Cucumber.html"} 
) 
public class TestRunner extends AbstractTestNGCucumberTests {} 

Running your test

Once the framework is set up, you can execute your API automation suite directly from the command line using Maven. Maven handles compiling, running tests, and generating reports automatically. 

Run All Tests – 

To run all Cucumber feature files: 

mvn clean test 
  • clean → Deletes old compiled files and previous reports for a fresh run. 
  • test → Executes all test scenarios defined in your project. 

After running this command, Maven will trigger the Cucumber TestRunner, execute all scenarios, and generate reports in the test-output folder. 

Run Tests by Tag – 

Tags allow you to selectively run specific test scenarios or features. 
You can add tags like @api1, @smoke, or @regression in your .feature files to categorize tests. 

Example: 

@api1 
Scenario: Verify POST API creates a record successfully 
  Given User sends "POST" request to "/api/v1/create" ... 
  Then User verifies the response status code is 201 

To execute only scenarios with a specific tag, use: 

mvn clean test -Dcucumber.filter.tags="@api1" 
  • The framework will run only those tests that have the tag @api1. 
  • You can combine tags for more flexibility: 
      • @api1 or @api2 → Runs tests with either tag. 
      • @smoke and not @wip → Runs smoke tests excluding work-in-progress scenarios. 

This is especially useful when running specific test groups in CI/CD pipelines. 

View Test Reports 

API Automation Testing Framework Report – After the execution, Cucumber automatically generates detailed reports in the test-output directory: 

  • Cucumber.html → User-friendly HTML report showing scenario results and logs. 
  • Cucumber.json → JSON format report for CI/CD integrations or analytics tools. 

You can open the report in your browser: 

project-root/test-output/Cucumber.html 
 

This section gives testers a clear understanding of how to: 

  • Run all or specific tests using tags, 
  • Filter executions during CI/CD, and 
  • Locate and view the generated reports. 
API Automation Testing Framework Report

Reference Framework GitHub Link – https://github.com/spurqlabs/APIAutomation_RestAssured_Cucumber_Playwright

Conclusion

An API automation testing framework ensures that backend services are functioning properly before the application reaches the end user. 
By integrating Cucumber, RestAssured, and Playwright, we have built a flexible and maintainable test framework that: 

  • Supports BDD style scenarios. 
  • Handles token-based authentication automatically. 
  • Provides reusable utilities for API calls. 
  • Generates rich HTML reports for easy analysis. 

This hybrid setup helps QA engineers achieve faster feedback, maintain cleaner code, and enhance the overall quality of the software.