Proven experience designing and implementing ETL pipelines in Databricks / Spark and Delta Lake.
Experience: 8+ years
Strong knowledge of OMOP CDM and experience mapping datasets to OMOP; familiarity with CDISC SDTM is a plus.
Expertise in data modelling, partitioning, performance tuning, and best practices for large clinical/RWD datasets.
Experience with vocabulary services and terminology mapping (OHDSI/Athena, UMLS, or similar).
Experience integrating AI/NLP components into data pipelines (entity extraction, mapping suggestions) is desirable.
Familiarity with testing frameworks for data (Great Expectations, Deequ), CI/CD, infrastructure as code, and orchestration tools (Databricks Jobs, Airflow).
Good communication skills and experience working with domain experts to capture requirements.
Preferred
Prior experience in pharma or clinical research environments.
Knowledge of data governance, privacy regulations, and secure handling of patient data.
Experience with Unity Catalog, Databricks Delta Sharing, and cloud infrastructure (Azure/AWS).