Required Skills

AWS, Azure, CI/CD, GCP, Kubernetes, machine learning, Python

Job Description

We're hiring a Full Stack Software Engineer to build the infrastructure that powers our AI agents and ML systems end-to-end, from UX/UI design and fine-tuning foundation models to shipping production-grade agent harnesses. You'll work across the stack: creating UX designs and UIs in ReactJS/TS, building MLOps pipelines, customizing LLMs, and deploying scalable agent systems on Kubernetes. This role sits at the intersection of UX design, ML engineering, platform engineering, and applied AI.

Design UX and build UI for Agentic Ops
Design and build agent harnesses in Python: the runtime scaffolding that enables AI agents to perceive, reason, plan, and act reliably
Develop and maintain a robust MLOps framework using Kubeflow and complementary tooling (MLflow, Argo, Airflow, or similar) to orchestrate training, evaluation, and deployment workflows
Fine-tune foundation LLMs using techniques such as LoRA/QLoRA, SFT, and RLHF; manage datasets, training runs, and evaluation pipelines
Deploy and operate services on Kubernetes, including model serving, autoscaling, and observability
Build and integrate AI agents using modern agent frameworks (LangGraph, CrewAI, AutoGen, LlamaIndex, or similar)
Apply software engineering rigor (SOLID principles, secure coding, static analysis, code reviews, and CI/CD) across all deliverables

Bachelor’s or Master’s degree in Engineering, along with 8+ years of experience in Python development, including building and supporting production systems
Hands-on experience working with agent-based or agentic systems, using at least one framework such as LangGraph, CrewAI, AutoGen, LangChain, or LlamaIndex
Exposure to designing or contributing to MLOps pipelines, and familiarity with tools such as Kubeflow
Practical experience in fine-tuning large language models (for example, open-source models like Llama, Mistral, Qwen, or similar)
Experience deploying containerized applications on Kubernetes, including areas like Helm, operators, networking, and resource management
Familiarity with at least one major cloud platform (AWS, GCP, or Azure), including services related to compute, storage, identity and access management (IAM), and machine learning

Understanding of software engineering practices such as modular design (SOLID principles), design patterns, secure coding practices, static analysis tools (for example, mypy, ruff, Bandit, SonarQube), and testing approaches (unit and integration testing)

**Nice to Have:**

Exposure to distributed training approaches, using tools such as DeepSpeed, FSDP, or Accelerate
Familiarity with vector databases, retrieval-augmented generation (RAG) systems, and evaluation frameworks for language models
Experience working with model serving solutions such as vLLM, TGI, KServe, or Triton

Job Overview

Job type: Full-time
Work mode: On-site
Location: IN
Posted: 2d ago
Source: Indeed

Senior AI Engineer