We are seeking an experienced Data Engineer with strong expertise in Databricks and PySpark to design, develop, and optimize scalable data pipelines and cloud-based data platforms. The ideal candidate will have hands-on experience with big data technologies, cloud services, ETL/ELT processes, and modern data engineering practices.
Required Skills & Experience
6–8 years of experience in Data Engineering, Data Warehousing, and Big Data solutions.
Strong hands-on experience with **Databricks**, **PySpark**, and **Apache Spark**.
Expertise in building and maintaining large\-scale **ETL/ELT pipelines**.
Strong proficiency in **Python** and **SQL**.
Experience with **Delta Lake**, **Unity Catalog**, and **Databricks Workflows**.
Hands-on experience with cloud platforms such as **Azure**, **AWS**, or **Google Cloud Platform (GCP)**.
Experience with cloud storage solutions such as Azure Data Lake Storage (ADLS), Amazon S3, or Google Cloud Storage (GCS).
Knowledge of data ingestion tools and frameworks.
Experience with **Azure Data Factory (ADF)**, **AWS Glue**, or similar ETL orchestration tools.
Strong understanding of **Data Lake**, **Data Warehouse**, and **Lakehouse** architectures.
Key Responsibilities
Design, develop, and optimize scalable data pipelines using Databricks and PySpark.
Build and maintain data ingestion, transformation, and processing frameworks.
Develop batch and real-time data processing solutions.
Implement Delta Lake and Lakehouse architecture best practices.
Collaborate with data scientists, analysts, and business stakeholders to deliver data solutions.
Optimize Spark jobs for performance, scalability, and cost efficiency.
Create and maintain data models, data marts, and enterprise data warehouses.
Implement monitoring, logging, and troubleshooting processes for data platforms.
Ensure data quality, governance, security, and compliance standards are maintained.
Participate in code reviews, architecture discussions, and technical design sessions.
Preferred Qualifications
Experience with **Azure Databricks** is highly preferred.
Knowledge of **Snowflake**, **Redshift**, **BigQuery**, or **Synapse Analytics**.
Experience with Infrastructure as Code (IaC) using **Terraform**.
Databricks, Azure, AWS, or GCP certifications are a plus.
Experience in Agile/Scrum environments.
Mandatory Technologies
**Databricks, PySpark, Apache Spark, Python, SQL, Delta Lake, Data Lake, ETL/ELT, Cloud Platform (Azure/AWS/GCP), Airflow, Git, CI/CD, Kafka, Data Warehousing**