Job Title: Data Engineer
- **Experience:** 4+ Years
- **Work Shift:** 2:00 PM – 10:00 PM
- **Domain Requirement:** Pharmaceutical / Healthcare / Life Sciences
About the Role
We are seeking an experienced Healthcare Data Engineer / Clinical Data Scientist to support the development of high-quality real-world evidence (RWE) datasets. The ideal candidate will work closely with clinical Subject Matter Experts (SMEs) to implement clinical rules, engineer patient-level datasets, and integrate structured and unstructured healthcare data sources. This role requires strong expertise in healthcare data, clinical event algorithms, and large-scale data processing.
Key Responsibilities
Clinical Rule Implementation
- Translate SME-designed clinical rules into scalable, reproducible data pipelines operating against centralized healthcare data lakes.
- Implement protocol-driven clinical event definitions and business logic.
Structured & Unstructured Data Integration
- Engineer patient-level features using medical claims, pharmacy claims, EMR/EHR data, laboratory results, and NLP-derived outputs.
- Integrate structured and unstructured healthcare data to improve clinical data completeness and accuracy.
Disease-Specific Dataset Development
- Build and maintain disease-focused datasets, including:
  - Cohort identification and construction
  - Index date determination
  - Treatment sequencing
  - Clinical event labeling
  - Outcome tracking
Line of Therapy (LOT) Algorithm Development
- Design and implement line-of-therapy algorithms that address real-world treatment complexities, including:
  - Combination regimens
  - Treatment gaps
  - Dose modifications
  - Switching patterns
  - Off-label therapy usage
NLP Signal Integration
- Incorporate NLP-derived clinical signals such as:
  - Diagnosis mentions
  - Disease staging
  - Biomarker results
  - Disease progression indicators
- Combine NLP outputs with structured claims and EMR data to enhance dataset quality.
Data Quality Assurance
- Develop and maintain data quality validation frameworks and reports.
- Conduct sample-level audits to verify clinical logic and dataset accuracy.
- Identify anomalies and recommend corrective actions.
Cross-Functional Collaboration
- Partner closely with clinicians, epidemiologists, data scientists, and SMEs.
- Participate in iterative reviews to refine clinical rules, identify logic gaps, and improve data outputs.
Required Qualifications
- 4+ years of experience in Data Science, Health Data Engineering, Biostatistics, Real-World Data (RWD), Health Economics & Outcomes Research (HEOR), or related life sciences domains.
- Strong proficiency in SQL and in either Python (preferred) or R.
- Hands-on experience working with:
  - Medical Claims Data
  - Pharmacy Claims Data
  - EMR/EHR Data
- Experience building patient-level healthcare datasets at scale.
- Familiarity with NLP-generated outputs and with integrating unstructured clinical information into structured datasets.
- Proven experience implementing clinical event algorithms from protocol-level specifications.
- Strong analytical, problem-solving, and data validation skills.
Preferred Qualifications
- Experience working with cloud-based data platforms such as Snowflake, Redshift, or Databricks.
- Familiarity with OMOP CDM or other healthcare data models.
- Experience with oncology datasets or specialty disease areas.
- Knowledge of Real-World Evidence (RWE) methodologies and healthcare analytics.
- Ability to work effectively within a pod-based, highly collaborative clinical environment.
Technical Skills
- SQL
- Python / R
- Healthcare Claims Data
- EMR/EHR Data
- NLP Integration
- Clinical Data Modeling
- Cohort Building
- Line of Therapy Algorithms
- Data Quality & Validation
- Snowflake / Databricks / Redshift (Preferred)
- OMOP CDM (Preferred)
Interested candidates can send their updated resume to **jagadeesh@kasmoprav.com**.
Pay: ₹70,000.00 – ₹90,000.00 per month
Work Location: Remote