Required Skills

Python, AWS, Azure, SQL, PostgreSQL, Machine Learning, Git

Job Description

**Job Title** – Lead Data Engineer (Databricks)
**Location** – Gurugram/Chennai, India (Onsite, 5 days/week)
**Employment Type** – Full-time

**Role Overview**

We are seeking a highly experienced and motivated **Principal / Lead Data Engineer** to join our data platform team in our **Gurugram or Chennai office**. The successful candidate will play a critical role in designing, building, and optimizing our next-generation data architecture and pipelines.

This role requires expert-level proficiency in PySpark and SQL, alongside a proven track record of architecting scalable, high-performance ETL/ELT processes. You will transform vast amounts of raw data into high-quality, actionable insights for analytics, reporting, and Machine Learning. Given the seniority of this role, we are looking for a seasoned leader who can join immediately and contribute significantly from day one.

**Key Responsibilities**

**Data Pipeline Development & Optimization**

- **Design and Build:** Architect, develop, and maintain robust, scalable, and fault-tolerant ETL/ELT pipelines for ingesting data from diverse sources (e.g., databases, APIs, streaming sources) into our data lake and data warehouse.
- **PySpark Expertise:** Write and optimize complex data transformation jobs using PySpark and the Spark DataFrame API to process petabytes of structured and unstructured data efficiently.
- **SQL Mastery:** Use advanced SQL for complex querying, data manipulation, stored procedures, performance tuning, and database schema design in relational and analytical databases.
- **Data Quality & Governance:** Implement data validation, cleansing, and monitoring routines to ensure high data quality, integrity, and adherence to security and governance standards.

**Architecture and Infrastructure**

- **Data Modeling:** Design and implement optimal data models (e.g., Dimensional Modeling, Data Vault, Snowflake Schema) for our data warehouse to support business intelligence and analytical needs.
- **Cloud Integration:** Build and deploy cloud-native data solutions, primarily on Azure Databricks, Azure Data Lake, and Synapse (or comparable platforms such as AWS S3/Redshift and Google BigQuery).
- **Automation:** Implement orchestration tools like Apache Airflow, Azure Data Factory, or AWS Step Functions to automate data workflows and manage pipeline dependencies.

**Collaboration and Operational Excellence**

- **Cross-Functional Leadership:** Collaborate closely with Data Scientists, Data Analysts, Product Managers, and Business Stakeholders to understand data requirements and translate them into high-level technical specifications.
- **Monitoring & Support:** Monitor, troubleshoot, and resolve critical issues in production data pipelines, ensuring maximum uptime and timely data delivery.
- **Best Practices & Mentorship:** Lead code reviews, enforce strict coding standards, mentor junior engineers, and contribute to the continuous improvement of development and deployment practices (CI/CD, Git).

**Required Technical Skills (Mandatory)**

- **Certification:** **Active Azure Databricks Certification (e.g., Databricks Certified Data Engineer Associate/Professional).**
- **PySpark:** Expert-level, hands-on experience in developing, tuning, and optimizing large-scale data processing applications using PySpark (Python for Apache Spark).
- **SQL:** Mastery of advanced SQL (including window functions, complex joins, stored procedures, and query performance tuning) across various database systems (e.g., Snowflake, Redshift, PostgreSQL).
- **Programming:** Strong proficiency in Python for scripting, automation, and general data manipulation libraries (e.g., Pandas).
- **Big Data Architecture:** Deep understanding of Big Data concepts, distributed systems architecture, data lakes, and modern data warehousing principles.
- **ETL/ELT:** Proven experience in designing and implementing enterprise-grade ETL/ELT pipelines.

**Preferred Qualifications (Good to Have)**

- Hands-on experience with wider Azure ecosystem components (Azure Data Factory, Azure Synapse, Key Vault).
- Familiarity with workflow orchestration tools like Apache Airflow.
- Experience with real-time/streaming data processing (e.g., Spark Structured Streaming, Kafka, or Event Hubs).
- Advanced knowledge of Data Governance, Data Cataloging, and Data Security best practices.

Pay: ₹1,200,093.57 - ₹1,956,668.30 per year

License/Certification

Databricks Certified Data Engineer Associate/Professional (Required)

Work Location: In person


Job Overview

Job type: Full-time
Work mode: On-site
Location: Ghaziabad
Posted: 1d ago
Source: Scraped
