Bangalore

4-5y

Full-time

AI Engineer

TestZeus is pioneering the next generation of AI-powered software testing. We’re the team behind Hercules, the world’s first open-source testing agent. By combining large language models, multi-agent orchestration, and state-of-the-art retrieval pipelines, we deliver autonomous, zero-maintenance testing for web and API workloads.

We are seeking an AI Engineer with 4–5 years of experience who can design, build, and scale production-grade LLM systems. This is a hands-on product role: you’ll work closely with backend, frontend, and product teams to ship quickly, test with real users, and iterate in production. If you enjoy building RAG systems, prompt workflows, and agentic evaluations that solve real business problems, this role is for you.

Key Responsibilities

1. LLM Workflow & Prompt Engineering

  • Design, build, and maintain LLM workflows that score freeform answers, generate contextual feedback, and assist users in real time.

  • Develop and refine prompt templates, chains, and tools to improve relevance, reduce token usage, and mitigate hallucinations.

  • Implement multi-step prompt workflows for tasks like mock interviews, code reviews, and automated test guidance.

2. Retrieval & RAG Pipelines

  • Build and optimise retrieval-augmented generation (RAG) pipelines using vector stores (e.g., Pinecone, Weaviate, Elasticsearch).

  • Implement embedding generation, similarity search, and dynamic context selection that reduce hallucinations and improve answer quality.

  • Ensure low-latency, high-accuracy retrieval combined with LLM generation for highly personalised user experiences.

3. LLM Evaluation & Analytics

  • Define and implement evaluation frameworks for LLM outputs: accuracy, consistency, bias, interpretability, and robustness.

  • Build automated evaluation pipelines that monitor performance over time, detect regressions, and flag failure modes.

  • Instrument systems with metrics and logging to understand model behaviour in production and drive data-informed decisions.

4. Agent-Based System Development

  • Build tool-augmented agents capable of evaluating coding, system design, or reasoning questions using frameworks like LangChain, AutoGen, LlamaIndex, or similar.

  • Design and experiment with agent orchestration patterns (multi-agent workflows, planners, evaluators) to improve multi-step reasoning and reliability.

  • Integrate agents with external tools and APIs (code execution, documentation search, test runners, etc.) to extend capabilities.

5. Cross-Functional Collaboration & Product Impact

  • Partner with backend engineers (Go, FastAPI) and frontend engineers (React) to ship features end-to-end.

  • Work closely with product managers and designers to shape user journeys, gather feedback, and iterate quickly in production.

  • Participate in agile ceremonies (standups, sprint planning, retrospectives) and provide clear, consistent status updates.

6. Research, Experimentation & Innovation

  • Stay current with state-of-the-art LLM and retrieval research, benchmarks, and open-source tools.

  • Rapidly prototype new ideas (e.g., advanced retrieval strategies, custom fine-tuning flows, new evaluation methods) and demonstrate feasibility.

  • Contribute to internal best practices, playbooks, and reusable components for LLM and agent development.

Qualifications & Skills

Education

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field, or equivalent professional experience.


Experience

  • 4–5 years of professional software engineering experience, with a strong focus on Python.

  • At least 2 years of hands-on experience building and deploying LLM-powered applications in production (beyond toy projects).


Core Skills

LLM & Prompt Engineering

  • Proven experience designing prompt workflows, tools, and chains for production use.

  • Familiarity with OpenAI models, Anthropic’s Claude, and/or open-weight LLMs (e.g., from the Hugging Face ecosystem).

  • Experience with LangChain, LlamaIndex, AutoGen, or equivalent frameworks for chains/agents.

Retrieval & RAG

  • Built at least one end-to-end RAG pipeline integrating vector search (Pinecone, Weaviate, Elasticsearch, or similar) with LLMs.

  • Understanding of embeddings, similarity search, reranking, and dynamic context selection for reducing hallucinations.

Evaluation Frameworks

  • Experience defining LLM output quality metrics (accuracy, consistency, bias, interpretability).

  • Implemented automated evaluation or testing pipelines to monitor LLM behaviour and aggregate performance statistics.


Python Engineering

  • Strong Python skills and experience building production services using frameworks like FastAPI, Flask, or similar.

  • Good understanding of data preprocessing, ETL flows, and integration patterns for AI/ML components.

  • Familiarity with unit testing, integration testing, and observability for AI systems.


Collaboration & Product Mindset

  • Demonstrated ability to work in cross-functional teams (backend, product, UX) within an Agile/Scrum environment.

  • Ability to explain research concepts and technical trade-offs clearly to non-technical stakeholders.

  • Product mindset: you care deeply about user experience, measurable impact, and iteration speed.


Preferred

  • Hands-on experience with vector databases & semantic search (Pinecone, Weaviate, pgvector, Elasticsearch, etc.).

  • Domain exposure to developer tooling, QA/testing platforms, edtech, or hiring/assessment products.

  • Experience with fine-tuning LLMs or building lightweight custom models.

  • Interest in growing into a Founding AI Lead role as TestZeus scales.


Why Join TestZeus?

  • Real Impact: Own and shape the AI systems that power our flagship testing platform, impacting quality for thousands of users.

  • Cutting-Edge Environment: Work with state-of-the-art LLMs, agents, and retrieval systems, with room to prototype bold ideas.

  • High Ownership: Small, talented team where your decisions and code directly shape the product and company direction.

  • Learning & Growth: Regular tech talks, deep dives into research, and support for conferences, workshops, and open-source work.

  • Bangalore – Work From Office: Collaborate closely with the team, iterate quickly, and build a strong product culture together.


Application Process

To apply, please share the following details with us:

  1. Your CV

  2. Your current and expected CTC

  3. Months of experience building LLM applications and/or AI agents

  4. Links to public work (e.g., GitHub, Medium, personal website, open-source contributions)

📬 Send everything to: hiring@testzeus.com

We’re excited to review your application!

balance cost, quality and deadlines with TestZeus' Agents.
