Feb 18, 2025

Mastering AI-Driven Testing: Writing Effective Tests for Hercules

Overview of Hercules

Hercules is an end-to-end test automation platform combining multiple agents like a Planner Agent, Browser Agent, and API Agent (among others) to autonomously execute Gherkin BDD scenarios. Each Gherkin step serves as a prompt for these AI-driven helpers.

Because each Gherkin step is effectively a mini “prompt” to Hercules, well-crafted steps reduce misinterpretation, speed up test runs, and produce clearer pass/fail outcomes. Poorly written steps cause confusion, rework for both humans and agents, and avoidable test failures.

Core Principles for Effective Tests

  1. Abstract versus Specific Tests and Steps

    • Hercules is an agent, so it can autonomously execute both of the following kinds of tests:
      Example 1:


      Example 2:



      Note that the first example is more specific than the second: it asks the agent to explicitly “click” on an element, whereas the second is more abstract. Both work with Hercules, but in the second example the Planner Agent has to break the step down itself, consuming slightly more LLM tokens. This also introduces slightly more non-determinism into the test execution, since the Planner Agent creates the plan of execution based on the UI state rather than following a prescribed path.

    • So which format should we follow?
      It's entirely based on the use case at hand. In the first example, we are testing an e-commerce application (wrangler.in) that was developed completely in-house, so each step must be explicitly tested; that is why the example is more detailed. In the second example, Salesforce is a pre-packaged SaaS, so lead creation can be written more abstractly: for this use case, we are not testing the customizations on our Salesforce implementation for lead creation.
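
      To make the contrast concrete, here are hypothetical sketches of the two styles (the step wording, element names, and URLs are illustrative assumptions, not the original tests):

      Specific (explicit steps, in-house e-commerce):

        Scenario: Add a product to the cart
          Given I navigate to "https://www.wrangler.in"
          When I click on the "Jeans" category link
          And I click on the first product in the results
          And I click on the "Add to Cart" button
          Then the cart count should show "1"

      Abstract (outcome-focused, pre-packaged SaaS):

        Scenario: Create a lead
          Given I am logged in to Salesforce
          When I create a new lead with name="John Smith"
          Then the lead should be created successfully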


  2. Use of double back ticks or brackets
    It is always better to format inputs so that the agent can separate the instruction from the input values.

    • Use inputs like username="vale" or username=[value]

    • This format helps the Planner Agent parse steps with clarity.
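
    A step using this convention might look like the following (the field names and values are illustrative):

      When I enter username="vale" and password=[test_password] on the login form
      Then I should not see an "invalid credentials" error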


  3. Use AAA format

    • The Arrange-Act-Assert (AAA) pattern is a simple yet effective way to structure tests, ensuring clarity and maintainability. In Gherkin, this maps naturally to Given-When-Then.

    • Arrange (Given) sets up the test by defining preconditions, like navigating to a page or preparing test data. 

    • Act (When) performs the key actions, such as typing input or clicking a button.

    • Assert (Then) verifies the expected outcome, like checking for a success message. For example, a login test would start with Given I navigate to "https://example.com", followed by When I enter my credentials and click login, and ending with Then I should see "login success message". 

    • Keeping each step concise and behavior-focused makes tests readable, reusable, and easy to maintain.
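
    The login example above, written out as a full Gherkin scenario following AAA, could look like this (the URL and message text are placeholders):

      Scenario: Successful login
        Given I navigate to "https://example.com"
        When I enter my credentials and click login
        Then I should see "login success message"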


  5. Single Responsibility

    • Each test should focus on verifying a single behavior or functionality. This way, tests are easy to maintain and provide a very specific signal when they fail. Avoid testing multiple aspects of an application in one test case.

    • For example, in the test below, we are only looking for one outcome.


      If we were to extend it with additional, unrelated checks, the test would become confusing and its failure signal ambiguous, so that is not recommended.
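
      As a hypothetical illustration of the point (these steps are illustrative, not the original example), a focused test checks exactly one outcome:

        Scenario: Successful login
          Given I am on the login page
          When I log in with valid credentials
          Then I should see the dashboard

      Piling unrelated checks into the same scenario muddies the signal:

        Scenario: Login and everything else
          Given I am on the login page
          When I log in with valid credentials
          Then I should see the dashboard
          And the profile page should show my email
          And the logout button should end my session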




  5. Descriptive Naming

    • Names like “Submit” button and “First Name” input are helpful.

    • If there are repetitive or redundant elements on the screen, specify the section of the web element. For example, if you need to interact with the “Buy” input box under the “Delivery equity” section, you can specify:
      When the user enters 5000 in the “Buy” input box under “Delivery equity” section

    • Avoid generic references like “that button” or “the field.”


  6. Parallel-Friendly & Self-Contained

    • Write scenarios so they can run independently without referencing external states or partial steps from other scenarios.

    • Use Gherkin’s Background keyword to run pre-test fixtures.
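
    A minimal sketch of a Background used as a pre-test fixture (the user and state are illustrative assumptions):

      Background:
        Given I am logged in as a test user
        And the shopping cart is empty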


  7. Amalgamated Tests

    • Observe the example below, taken from a test in our open-source repository:



    • As we can see, this is an amalgamation of UI and API steps, so there can be some overlap between the two within a single flow.

    • The agent smartly navigates between these steps and invokes the right tools.

    • That said, we don't recommend mixing unrelated UI and API checks in the same scenario.
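
    As a hypothetical illustration of the kind of mixing to avoid (unrelated UI and API checks crammed into one scenario; the endpoint and element names are assumptions):

      Scenario: Unrelated UI and API checks
        Given I am on the "Login" page
        When I log in with valid credentials
        Then I should see the dashboard
        When I send a GET request to "/api/orders"
        Then the response status should be 200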

Gherkin Feature Organization

 Feature File Structure

A typical Gherkin feature file has the following structure:

  1. Feature Heading

    • Short description of the user story or functionality tested.

    • Example: Feature: User Account Registration


  2. Background (Optional)

    • Common preconditions that every scenario in the file requires.

    • Keep this minimal to avoid hidden dependencies.

    • Example:
      Background:

        Given I am on the home page

  3. Scenario or Scenario Outline

    • Scenario: For a single set of data.

    • Scenario Outline: For multiple data sets using examples or external data references.

    • Both of these terms work with Hercules.
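
    A minimal sketch of a Scenario Outline with an Examples table (the feature and data are illustrative):

      Scenario Outline: Search for a product
        Given I am on the home page
        When I search for "<term>"
        Then the results should contain "<term>"

        Examples:
          | term    |
          | jeans   |
          | jackets |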

Conclusion

Gherkin test scenarios for AI-driven execution work best when they blend clarity with enough flexibility to allow for slight abstraction. Having Given steps to define context—like being on a certain page or having certain data at hand—and When steps describing user or system actions lays out a clear sequence. The Then steps verify the expected outcome, ensuring each scenario retains sufficient detail for the AI to generate a robust plan while still allowing slightly vague steps (e.g., “When I create a new lead, then a new lead should be created”) that the AI can interpret and expand.

Even if a step sounds a bit abstract, there must still be enough context so the AI knows what fields to fill or what validations to perform. For instance, a scenario such as:

Scenario: Creating a new lead

  Given I am on the "Leads" page in the CRM

  When I create a new lead with name "John Smith" and email "smith@example.com"

  Then a new lead named "John Smith" with "smith@example.com" should be listed in the lead table

provides a clear setting and outcome, enabling the AI to break it down into atomic steps like “click New Lead,” “enter name,” “enter email,” and “verify lead creation.”

It’s also wise to avoid overly generic statements in either the When or Then steps. For example, “When I create a new lead, then a new lead should be created” can work—because it outlines an action and an expected result—but only if the context is defined (“Given I am on the Leads page” or “Given I have permission to create leads”). A scenario with too many unspoken assumptions (“Given I open the system, When I do something, Then I see success”) leaves the AI guessing. Striking the right balance between detail and abstract phrasing ensures the test scenario is both interpretable and flexible enough for dynamic or slightly vague steps.

You can find more examples of test cases at: https://github.com/test-zeus-ai/testzeus-hercules/tree/main/tests

Happy Testing!

Balance cost, quality, and deadlines with TestZeus' Agents.

Come, join us as we revolutionize software testing with the help of reliable AI.

© 2025 Built with ❤️ in 🇮🇳. All Rights Reserved.