**Cloud Infrastructure \& Site Reliability Engineer (DevOps \+ SRE \+ CloudOps)**

**MishiPay**

**Bangalore**

**Salary plus 20% bonus**

**Unlimited Holidays**

**About MishiPay**

MishiPay is a global leader in **next\-generation checkout technology**, enabling seamless in\-store purchases via shoppers’ own smartphones or our minimalist, low\-capex kiosks.
We provide a complete checkout ecosystem \- spanning **self\-checkout, Mobile POS, Click \& Collect** and traditional tills \- allowing retailers to run their entire store checkout operation on MishiPay.
Our technology reduces operating costs, increases basket size and visit frequency, lowers abandonment, and unlocks rich, real\-time customer data, delivering both immediate commercial impact and deeper insight into the in\-store customer journey. We now have major international retail customers in the UK, Europe, Middle East and the USA, and we’re expanding!

**About The Role**

We are looking for a proactive and detail\-oriented **Cloud Infrastructure \& SRE Engineer** to drive the scalability, reliability, and evolution of our cloud platform. This role demands strong ownership across infrastructure operations, CI/CD pipelines, cloud migration, observability, performance tuning, and incident response. You will also work closely with our InfoSec team to proactively identify and eliminate infrastructure vulnerabilities, ensuring compliance and security best practices.

**It’s essential that you’ll have around 5 years of experience in both DevOps and SRE. You will have experience working on high\-scale platforms that serve millions of users or process large volumes of real\-time transactions with strict uptime and latency requirements. You will also have strong Azure and Kubernetes experience.**

**Please also note the additional requirements listed below as we cannot consider anyone who doesn’t have what we require. This is a role for someone who can hit the ground running and take ownership immediately.**

You’ll work closely with the Director of Engineering and other squad members, alongside the Product, Payment, Security and Delivery teams, to deliver the roadmap set against our top business priorities. You’ll eliminate tech debt, deploy best\-in\-class systems and architecture, and ensure that we can scale to thousands of stores at the push of a button while maintaining system performance above 99\.9%.
If you're a startup enthusiast with the required experience, who is passionate about solving complex problems and wants to learn something new every day, we'd absolutely love to speak to you!

**Key Responsibilities**

- Own and operate Azure infrastructure across Dev, Staging, Production, and DR environments with 99\.999% uptime SLAs
- Lead incident response; own triage, resolution, and RCA for all production incidents
- Define and maintain SLOs, error budgets, and alerting policies; build observability coverage across logs, metrics, and traces (Datadog, Azure Monitor, Sentry)
- Manage AKS internals \- pods, deployments, ingress, autoscaling (HPA) \- and drive migration of VM/VMSS workloads to Kubernetes
- Own database operational health and performance tuning across PostgreSQL, MySQL, and MongoDB; manage backups and DR drills
- Build and maintain CI/CD pipelines with versioned deployments and environment isolation
- Manage Azure networking components including Application Gateway, Traffic Manager, and Cloudflare (DNS, CDN, WAF)
- Ensure infrastructure scales reliably with business growth while continuously optimising cloud costs through right\-sizing, cleanup, and spend monitoring
- Ensure security, compliance, and governance; collaborate with InfoSec on vulnerability remediation
- Implement and manage caching (Redis) and search/observability indexing (Elasticsearch)
- Lead cloud\-to\-cloud migration from Azure to GCP
- Support engineering, QA, and support teams with access to cloud infrastructure and databases

**Required Experience**

- 5\+ years of hands\-on experience managing production cloud infrastructure
- Strong Azure experience across AKS, VMs, VMSS, Networking, Storage, Monitoring and Security services
- Deep understanding of Site Reliability Engineering (SRE), SLOs, Error Budgets and Incident Management
- Hands\-on experience with Datadog, Azure Monitor, Prometheus, Grafana or ELK
- Strong Kubernetes (AKS), Docker and Helm experience
- Experience with PostgreSQL, MySQL and MongoDB performance tuning and operations
- Experience building CI/CD pipelines using GitHub Actions or Azure DevOps
- Infrastructure as Code using Terraform
- Strong scripting skills in Python and Bash
- Experience with Cloudflare, Redis and Elasticsearch
- Understanding of networking, security, VPNs, firewalls and cloud governance

**Bonus Experience**

- Experience with GCP (GKE, Cloud SQL, VPC)
- Azure or Kubernetes certifications
- Experience in retail\-tech, fintech or high\-scale product companies
- Startup or scale\-up experience

You’ll work with an inspirational multi\-cultural team, based in our Dubai HQ, the US, London and Bangalore, who are redefining the retail industry globally. We offer a tight\-knit, collaborative and exciting work environment, coupled with the opportunity to see the apps we develop live in action within some of the world’s largest retailers, impacting the lives of millions of shoppers.

**We also offer:**

- Bonuses
- Unlimited holidays
- Hybrid Working (36 days WFH a year)
- Monthly Employee Awards
- A small monthly training budget
- Company events
- Free lunch in Bangalore
- MacBook
- Career progression and a chance to work with some of the brightest minds in tech
- Options depending on experience level