CLOUDSUFI, a Google Cloud Premier Partner, is a leading global provider of data\-driven digital transformation for cloud\-based enterprises. With a global presence and a focus on Software \& Platforms, Life Sciences and Healthcare, Retail, CPG, Financial Services, and Supply Chain, CLOUDSUFI is positioned to meet customers where they are in their data monetization journey.

**Our Values**

We are a passionate and empathetic team that prioritizes human values. Our purpose is to elevate the quality of life for our family, customers, partners, and the community.

**Equal Opportunity Statement**

CLOUDSUFI is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. All qualified candidates receive consideration for employment without regard to race, colour, religion, gender, gender identity or expression, sexual orientation, or national origin. We provide equal opportunities in employment, advancement, and all other areas of our workplace. Please explore more at https://www.cloudsufi.com/

**About the Role**

We are looking for a hands\-on **Senior Platform Engineer** to own the foundation that our product and ML teams build on: the cloud infrastructure, CI/CD and GitOps delivery, and the AI/ML serving platform that powers our products. You will sit at the intersection of platform engineering, MLOps, and full\-stack delivery: designing self\-service infrastructure, operating production LLM and inference workloads at scale, and partnering with application teams who ship frontend, backend, and data services daily. This is a senior, high\-autonomy role for someone who has done this before and can set technical direction for a platform team.

**What You’ll Do**

+ Design, build, and operate cloud\-native infrastructure on **GCP and AWS**: networking (VPC, DNS, load balancing, service mesh), compute, and multi\-environment provisioning.
+ Own **GitOps delivery pipelines** (ArgoCD/FluxCD) and CI/CD (GitHub Actions, CircleCI, Jenkins, Cloud Build) with blue\-green and canary strategies across 50\+ microservices.
+ Build and scale the **AI/ML serving platform**: GPU provisioning and orchestration, model serving and inference endpoints, and the infrastructure behind RAG, agentic, and tool\-calling workloads.
+ Author reusable **Infrastructure\-as\-Code** modules (Terraform, Terragrunt, Helm, Ansible/Puppet) and drive self\-service developer platforms that cut onboarding and provisioning time.
+ Establish **observability and reliability** practices — Prometheus, Grafana, cloud monitoring, SLOs, alerting, and incident response — to meet 99\.9%\+ uptime targets.
+ Embed **DevSecOps**: vulnerability scanning (SonarQube, Snyk), secrets management (HashiCorp Vault), IAM/RBAC, and SOC 2 / enterprise compliance into the pipeline.
+ Collaborate closely with full\-stack teams, contributing to **backend services and frontend tooling** and reviewing application architecture for scalability and operability.
+ Mentor engineers, set platform standards, and act as a technical point of escalation for production issues.
+ **Cloud:** Deep production experience with GCP and/or AWS, including networking, IAM, and cost management.
+ **Containers \& Orchestration:** Strong Kubernetes (GKE/EKS), Docker, Helm, Kustomize, Ingress, and service mesh (Istio).
+ **CI/CD \& GitOps:** Proven delivery automation with ArgoCD/FluxCD and at least one of GitHub Actions, CircleCI, Jenkins, or Cloud Build.
+ **IaC:** Expert\-level Terraform (\+ Terragrunt/Terraform Cloud) and configuration management (Ansible or Puppet).
+ **AI/ML platform:** Hands\-on experience deploying and operating LLM / ML workloads — model serving, inference APIs, GPU compute, and supporting RAG or agentic systems in production.
+ **Observability \& Security:** Prometheus/Grafana, SLO\-driven operations, and DevSecOps tooling (Snyk, SonarQube, Vault, IAM/RBAC).
+ **Backend:** Solid programming in **Python** (FastAPI a plus) and comfort building/operating REST APIs and microservices.
+ **Databases:** Strong fundamentals across relational (**PostgreSQL**, MySQL) and in\-memory/cache (Redis); schema design, tuning, and operations.
+ Experience with model\-serving stacks (vLLM, Triton, KServe, Ray) and MLOps tooling (Vertex AI, Kubeflow, MLflow).
+ Vector databases and LLM frameworks (LangChain, LlamaIndex).
+ Real\-time / streaming systems (WebRTC, WebSockets) and artifact management (Artifactory, GAR, ECR).
+ Experience building developer platforms / internal developer portals and leading platform teams.
+ Direct impact on architecture and the autonomy to choose the right tools.
+ Senior\-level scope with room to set platform standards and mentor the team.