### **Job Information**
- Full time
- Bengaluru
- 560041

### **Job Description**
- Job Title: SRE Manager
- Location: Bangalore
- Department: IT\-Dev
- Experience: 15\+ years of industry experience, 8\+ years of relevant experience
- Client: JIO

#### **Role Overview**

We are looking for an experienced SRE Manager to lead the reliability engineering function
responsible for operating and scaling mission\-critical infrastructure across cloud, networking,
and data centre environments.
You will lead a team responsible for building highly reliable, scalable, and automated
production platforms supporting enterprise and telecom\-scale workloads. This role requires
deep expertise in distributed systems, cloud infrastructure, networking, and observability
along with strong leadership capabilities.

#### **Key Responsibilities**

**Reliability Engineering**

- Build and lead the Site Reliability Engineering (SRE) function focused on reliability,
scalability, and operational excellence.
- Define and manage SLIs, SLOs, and SLAs for critical production systems.
- Drive improvements in system resilience, fault tolerance, and performance
optimization.

**Production Infrastructure \& Platform Operations**

- Ensure stability of large\-scale production environments across cloud and data centres.
- Manage reliability of distributed platforms, microservices environments, and
containerized systems.
- Support architecture teams in designing highly scalable infrastructure platforms.

**Incident Management \& Operational Excellence**

- Lead major incident response and outage management across infrastructure and
platform services.
- Establish incident management frameworks, escalation processes, and postmortem
practices.
- Drive root cause analysis and reliability improvements.

**Automation \& DevOps**

- Drive automation initiatives to reduce operational overhead.
- Implement Infrastructure as Code (IaC) and automated infrastructure provisioning.
- Improve CI/CD pipelines and operational workflows.

**Observability \& Monitoring**

- Design and maintain observability platforms including monitoring, logging, and tracing
systems.
- Establish real\-time operational visibility across infrastructure, applications, and
networks.
- Build dashboards and analytics to measure system performance and reliability.

**Team Leadership**

- Lead and mentor SRE engineers, platform engineers, and reliability teams.
- Build a culture of automation, engineering excellence, and a reliability\-first mindset.
- Collaborate with DevOps, network engineering, NOC, cloud, and architecture teams.

#### **Technical Expertise**

- Strong experience with cloud platforms such as AWS, Microsoft Azure, or Google Cloud.
- Deep understanding of distributed systems and microservices architecture.
- Experience with containers and orchestration platforms such as Docker and Kubernetes.
- Knowledge of core networking technologies including BGP, VXLAN, EVPN, and SD\-WAN.
- Experience with observability and monitoring platforms such as Prometheus, Grafana,
ELK, Datadog, or similar tools.
- Familiarity with infrastructure automation tools such as Terraform, Ansible, or similar
frameworks.
- Understanding of ISP/telecom infrastructure, network operations, and large\-scale
traffic environments.
- Experience with data centre infrastructure, virtualization, and private cloud
environments.