Required Skills

llm

Job Description

About Pilotcrew AI

Pilotcrew AI builds infrastructure for AI Agent Evaluation. We benchmark large language models, run automated agent evaluations, power human-in-the-loop assessments, and host AI arenas for competitive testing. Our mission is to make AI agents measurable, reliable, and production-ready through structured, scalable evaluation systems.

Role Overview

We are building a benchmark to evaluate the cybersecurity capabilities of frontier AI systems. We are looking for cybersecurity professionals who are comfortable working with LLMs to help design and validate challenging security tasks that current AI systems struggle to solve. If you have a solid security background and regularly use AI tools in your work, you're likely a good fit.

Key Responsibilities

- Create cybersecurity tasks and challenges that are difficult for frontier LLMs and AI agents to solve
- Research real-world security issues and translate them into well-defined AI evaluation tasks
- Use LLM agents to test your tasks and document where and why they fail
- Write clear, concise descriptions of security scenarios suitable as prompts for AI systems
- Validate that tasks have a correct, reproducible solution that can be verified
- Work with the research team to refine tasks and improve benchmark quality

Required Skills

Cybersecurity

- 3+ years of experience in any security domain: application security, network security, penetration testing, vulnerability research, CTF, or similar
- Ability to understand and explain how a security vulnerability or attack works
- Comfortable reading code and identifying security issues across common languages

Working with AI / LLMs

- Regular hands-on experience using LLMs (ChatGPT, Claude, Gemini, etc.) for technical work
- Ability to evaluate whether an AI system has solved a problem correctly or not
- Basic comfort with LLM APIs or AI-assisted workflows

Communication

- Can write clear, precise task descriptions that leave no ambiguity about what the correct answer is
- Self-directed: able to deliver quality work independently on a contract engagement

Nice to Have

- Experience with CTF competitions (web, pwn, reversing, crypto, or misc categories)
- Familiarity with AI agent frameworks such as OpenHands, SWE-agent, or Codex CLI
- Background in bug bounty hunting, red teaming, or security research
- Experience with automated testing or fuzzing tools
- Any prior work in AI red-teaming, adversarial evaluation, or AI safety research

What We Value

- Curiosity about security, about AI, and where the two meet
- Bias toward action: you figure things out and deliver, rather than waiting for perfect clarity
- Ownership mindset in a high-autonomy environment
- Clear communication when explaining complex problems
- Comfort working in a fast-paced startup with evolving requirements

Why Join Pilotcrew AI

- Work at the intersection of cybersecurity and frontier AI evaluation
- Flexible remote contract with high ownership and autonomy
- Direct exposure to state-of-the-art LLMs and agentic AI systems
- Strong performers will have the opportunity to continue beyond the initial month
- Contribute to research that shapes how AI agents are tested and improved

Pay: ₹352,138.32 - ₹1,630,854.57 per year

Work Location: In person

Job Overview

Job type: Full-time
Work mode: Remote
Location: Anywhere in India
Posted: 18h ago
Source: Indeed


CyberSecurity Benchmark Engineer
