Required Skills

android, go, llm, pytorch, rest

Job Description

**About Sarvam**

Sarvam is building the bedrock of Sovereign AI for India. The company is developing India’s full\-stack sovereign AI platform, building across research, models, infrastructure and applications with a singular focus on making AI genuinely work for India. Sarvam works with leading enterprises and public institutions and is backed by Lightspeed, Peak XV, and Khosla Ventures. Sarvam partners with India’s leading brands, including Tata Capital, SBI Life, CRED, IDFC, and LIC.

**About the Role**

Own the technical architecture of Sarvam’s on\-device products end\-to\-end \- from model export and chipset\-specific runtime selection up through the OS\-layer voice input integration on Windows, macOS, and Android. You set the standards every other engineer on the team works against, you are the technical interface to OEM partners (Qualcomm, Intel, NVIDIA, AMD, Apple), and you are accountable for hitting the published latency, footprint, and accuracy targets across every supported chipset.

This is a player\-coach role. We expect you to write code, debug at the kernel/driver layer when needed, and review every workbook before publication \- but your highest\-leverage work is in setting the architecture and unblocking the team.

**What You’ll Do**

End\-to\-end latency and footprint budgets across all targets \- Memory \+ runtime SLAs on NPUs / CPUs / GPUs.
Runtime selection strategy per chipset: when OpenVINO vs. ONNX Runtime\+EP, when TensorRT vs. CUDA\-direct, when QNN vs. LiteRT, when CoreML vs. CPU fallback. The decision matrix is your deliverable.
Model export pipeline: how models go from PyTorch to every target runtime, with shared infrastructure where possible and per\-runtime customization where necessary.
xPU selector and graceful\-degradation logic: probing host capabilities, driver\-version compatibility, fallback paths when the user picks an unavailable backend.
OEM technical relationships: you are the technical face to Qualcomm, Intel, NVIDIA, AMD, and Apple counterparts. You explain perf wins and losses, escalate driver issues, and influence their roadmaps where we have leverage.
Tech Lead duties for the optimization team: hiring bar, technical roadmap, weekly architecture reviews, mentorship of the three senior engineers.
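
To make the xPU selector and graceful-degradation item above concrete, here is a minimal sketch of the kind of logic involved. It is an illustration only, not Sarvam's implementation: the `Backend` type, the `select_backend` function, the backend names, and the version numbers are all hypothetical, and real probes would query the actual driver or runtime rather than return constants.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Version = Tuple[int, ...]


@dataclass(frozen=True)
class Backend:
    """Hypothetical description of one inference backend."""
    name: str
    min_driver: Version                      # oldest driver version we accept
    probe: Callable[[], Optional[Version]]   # installed driver version, or None if absent


def select_backend(preferred: str, chain: List[Backend]) -> Tuple[str, List[str]]:
    """Try the user's preferred backend first, then fall back along the chain.

    Returns the chosen backend name plus the reasons earlier candidates were skipped.
    """
    order = [b for b in chain if b.name == preferred] + \
            [b for b in chain if b.name != preferred]
    skipped: List[str] = []
    for backend in order:
        version = backend.probe()
        if version is None:
            skipped.append(f"{backend.name}: not available")
            continue
        if version < backend.min_driver:  # tuples compare element by element
            skipped.append(f"{backend.name}: driver {version} older than {backend.min_driver}")
            continue
        return backend.name, skipped
    raise RuntimeError("no usable backend: " + "; ".join(skipped))


# Stub probes standing in for real capability checks (assumed values).
chain = [
    Backend("npu", (2, 0), lambda: None),      # hardware not present
    Backend("gpu", (5, 3), lambda: (5, 1)),    # driver too old
    Backend("cpu", (0,), lambda: (0,)),        # always-available fallback
]

name, reasons = select_backend("npu", chain)
print(name, reasons)
```

Returning the skip reasons alongside the choice is one way to surface why the requested backend was not used, which matters when a user explicitly picks a backend that turns out to be unavailable.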

**What We're Looking For**

8\+ years on ML systems, with at least 3 years shipping on\-device inference in production. Resume should show models actually running on user devices, not just internal demos.
Genuine depth in at least three of: TensorRT/CUDA, OpenVINO, QNN/SNPE, CoreML/ANE, ONNX Runtime EP development, llama.cpp/MLC. Reading\-level fluency in the rest.
Production experience with streaming inference \- KV cache management, chunked attention, encoder\-decoder cache projection, partial output emission. ASR or LLM streaming both qualify.
Has shipped against hard latency budgets (sub\-second E2E) on heterogeneous hardware. Knows where the time actually goes \- capture, preprocessing, model, post\-processing, OS\-layer paste \- and how to budget across them.
Has built or owned an xPU/runtime selection system \- capability probing, driver\-version handling, graceful fallback. This is rare and we will weight it heavily.
Cross\-team technical leadership track record. You have set standards across multiple ICs without becoming a bottleneck.

**Bonus Points**

Direct prior collaboration with Qualcomm, Intel, NVIDIA, or Apple on inference performance issues.
OS\-layer integration experience: Windows IME, macOS Accessibility/IME APIs, Android InputMethodService.
ASR\-specific optimization experience (Whisper, Conformer, RNN\-T, or similar).
Indic\-language ML systems experience.

**Why Sarvam?**

Sarvam is a fast\-moving, high talent\-density team building full\-stack AI for India, working on problems that push the frontiers of AI with real population\-scale impact.

Work alongside researchers, engineers, builders, and business leaders who move fast and hold each other to a very high bar
High ownership and high impact, from day one
Everything we do is AI\-first, from the way we build and ship to the way we think about problems
You can work on problems that could change how an entire country learns, works, and communicates

If you want to work on problems at the frontier of AI in India, Sarvam is the place to be.

Job Overview

Job type: Full-time
Work mode: On-site
Location: Bengaluru
Posted: 1d ago
Source: Indeed

Architect, On-Device Inference
