**Job Title** – Lead Data Engineer (Databricks)
**Location** – Gurugram/Chennai, India (Onsite, 5 days/week)
**Employment Type** – Full-time
We are seeking a highly experienced and motivated **Principal / Lead Data Engineer** to join our data platform team in either our **Gurugram or Chennai office**. The successful candidate will play a critical role in designing, building, and optimizing our next-generation data architecture and pipelines.
This role requires expert-level proficiency in PySpark and SQL, alongside a proven track record of architecting scalable, high-performance ETL/ELT processes. You will transform vast amounts of raw data into high-quality, actionable insights for analytics, reporting, and machine learning. Given the seniority of this role, we are looking for a seasoned leader who can join immediately and contribute significantly from day one.
**Data Pipeline Development & Optimization**
- **Design and Build:** Architect, develop, and maintain robust, scalable, and fault-tolerant ETL/ELT pipelines for ingesting data from diverse sources (e.g., databases, APIs, streaming sources) into our data lake and data warehouse.
- **PySpark Expertise:** Write and optimize complex data transformation jobs using PySpark and the Spark DataFrame API to process petabytes of structured and unstructured data efficiently.
- **SQL Mastery:** Use advanced SQL for complex querying, data manipulation, stored procedures, performance tuning, and schema design optimization in relational and analytical databases.
- **Data Quality & Governance:** Implement data validation, cleansing, and monitoring routines to ensure high data quality, integrity, and adherence to security and governance standards.
**Architecture and Infrastructure**
- **Data Modeling:** Design and implement optimal data models (e.g., Dimensional Modeling, Data Vault, Snowflake Schema) for our data warehouse to support business intelligence and analytical needs.
- **Cloud Integration:** Build and deploy cloud-native data solutions, primarily on Azure Databricks, Azure Data Lake, and Synapse (or comparable platforms such as AWS S3/Redshift and Google BigQuery).
- **Automation:** Implement orchestration tools like Apache Airflow, Azure Data Factory, or AWS Step Functions to automate data workflows and manage pipeline dependencies.
**Collaboration and Operational Excellence**
- **Cross-Functional Leadership:** Collaborate closely with Data Scientists, Data Analysts, Product Managers, and Business Stakeholders to understand data requirements and translate them into high-level technical specifications.
- **Monitoring & Support:** Monitor, troubleshoot, and resolve critical issues in production data pipelines, ensuring maximum uptime and timely data delivery.
- **Best Practices & Mentorship:** Lead code reviews, enforce strict coding standards, mentor junior engineers, and contribute to the continuous improvement of development and deployment practices (CI/CD, Git).
**Required Technical Skills (Mandatory)**
- **Certification:** **Active Azure Databricks Certification (e.g., Databricks Certified Data Engineer Associate/Professional).**
- **PySpark:** Expert-level, hands-on experience in developing, tuning, and optimizing large-scale data processing applications using PySpark (Python for Apache Spark).
- **SQL:** Mastery of Advanced SQL (including window functions, complex joins, stored procedures, and query performance tuning) across various database systems (e.g., Snowflake, Redshift, PostgreSQL).
- **Programming:** Strong proficiency in Python for scripting, automation, and data manipulation, including common libraries (e.g., Pandas).
- **Big Data Architecture:** Deep understanding of Big Data concepts, distributed systems architecture, data lakes, and modern data warehousing principles.
- **ETL/ELT:** Proven experience in designing and implementing enterprise-grade ETL/ELT pipelines.
**Preferred Qualifications (Good to Have)**
- Hands-on experience with the wider Azure ecosystem (Azure Data Factory, Azure Synapse, Key Vault).
- Familiarity with workflow orchestration tools like Apache Airflow.
- Experience with real-time/streaming data processing (e.g., Spark Structured Streaming, Kafka, or Event Hubs).
- Advanced knowledge of Data Governance, Data Cataloging, and Data Security best practices.
Pay: ₹1,200,093.57 - ₹1,956,668.30 per year
License/Certification
- Databricks Certified Data Engineer Associate/Professional (Required)
Work Location: In person