
PROJECTS








Note: All case studies, company names, and data presented on this site are fictitious and created solely to showcase my skills and expertise.
Optimizing Data Ingestion at OmniLogistics Inc.
Demonstrates how Azure Data Factory enables incremental file processing, audit logging, and failure alerting for a logistics data pipeline. Archival strategies and CI/CD via GitHub Actions add traceability and operational resilience. The approach supports compliance and reduces compute costs by processing only new or changed files in a high-volume, multi-source logistics environment.
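To illustrate the incremental pattern the pipeline relies on, here is a minimal PySpark sketch of high-watermark ingestion with an audit record. It mirrors the ADF logic conceptually rather than reproducing the actual pipeline, and the paths, table names, and columns are hypothetical.

from datetime import datetime, timezone
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# 1. Look up the last successful watermark from a control/audit table.
last = (spark.table("control.ingest_audit")
             .agg(F.max("processed_until").alias("wm"))
             .first()["wm"] or datetime(1970, 1, 1))

# 2. Read only files modified after the watermark (Spark 3.1+ file-source option).
new_data = (spark.read
                 .option("header", True)
                 .option("modifiedAfter", last.strftime("%Y-%m-%dT%H:%M:%S"))
                 .csv("/mnt/landing/shipments/"))

# 3. Append the new records, then record the run for traceability.
new_data.write.mode("append").saveAsTable("bronze.shipments")
audit_row = [(datetime.now(timezone.utc), new_data.count(), "success")]
(spark.createDataFrame(audit_row,
        "processed_until timestamp, row_count long, status string")
      .write.mode("append").saveAsTable("control.ingest_audit"))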
Harmonizing Clinical Data at FusionPharma Analytics
Explores how ADF Mapping Data Flows handle diverse formats such as CSV, XML, and JSON to harmonize clinical trial and EHR data. The solution accommodates schema drift, enforces data quality validation, and supports secure hybrid ingestion from on-prem systems. It produces standardized, analytics-ready outputs essential for safety monitoring, regulatory submissions, and trial optimization in global pharmaceutical research.
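The project itself uses ADF Mapping Data Flows, but the harmonization step can be sketched in PySpark to show the idea: read each format, project it onto a shared clinical schema, and apply a basic quality gate. All paths, column names, and table names below are hypothetical.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Heterogeneous sources: site CSV extracts and EHR JSON exports.
# (XML feeds would be read the same way via the spark-xml package.)
csv_df  = spark.read.option("header", True).csv("/mnt/raw/trials_csv/")
json_df = spark.read.json("/mnt/raw/ehr_json/")

def standardize(df, mapping):
    """Project a source DataFrame onto the common clinical schema."""
    return df.select([F.col(src).alias(dst) for src, dst in mapping.items()])

harmonized = (
    standardize(csv_df,  {"subj_id": "subject_id",
                          "visit_dt": "visit_date",
                          "dx": "diagnosis_code"})
    .unionByName(standardize(json_df, {"subjectId": "subject_id",
                                       "visitDate": "visit_date",
                                       "diagnosis": "diagnosis_code"}))
    # Basic data-quality gate: drop records missing a subject identifier.
    .filter(F.col("subject_id").isNotNull())
)
harmonized.write.mode("overwrite").saveAsTable("silver.clinical_visits")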

Modernization for FinanceCo Global
Highlights a Databricks Lakehouse solution for financial analytics with SCD modeling, performance tuning, and structured development workflows. Unity Catalog enforces governance through fine-grained access controls and lineage tracking. The architecture replaces legacy systems with scalable, compliant, higher-performance reporting, reliably supporting critical financial operations such as regulatory reporting, reconciliation, and month-end close.
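As one way the SCD modeling could look, here is a minimal Type 2 pattern using the Delta Lake MERGE API: expire the current row when a tracked attribute changes, then append a fresh current version. The table and column names (dim_customer, risk_rating, and so on) are hypothetical.

from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
updates = spark.table("staging.customer_updates")
dim = DeltaTable.forName(spark, "gold.dim_customer")

# Step 1: close out current rows whose tracked attribute changed.
(dim.alias("d")
    .merge(updates.alias("u"),
           "d.customer_id = u.customer_id AND d.is_current = true")
    .whenMatchedUpdate(
        condition="d.risk_rating <> u.risk_rating",
        set={"is_current": "false", "valid_to": "current_date()"})
    .execute())

# Step 2: insert a new current row for new customers and the ones just expired.
current = spark.table("gold.dim_customer").filter("is_current = true")
new_versions = (updates.join(current, "customer_id", "left_anti")
                       .withColumn("valid_from", F.current_date())
                       .withColumn("valid_to", F.lit(None).cast("date"))
                       .withColumn("is_current", F.lit(True)))
new_versions.write.mode("append").saveAsTable("gold.dim_customer")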
Building a Data Lakehouse for AeroSense with Azure Databricks and Unity Catalog
Showcases a real-time data platform built on Databricks Structured Streaming and Auto Loader to process high-volume sensor data. The solution leverages schema inference, bronze-silver-gold architecture, autoscaling clusters, and Unity Catalog for governance. It supports predictive maintenance by enabling low-latency insights and cost-efficient, scalable processing of industrial IoT telemetry.
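A minimal sketch of the ingestion step, assuming JSON telemetry landing in cloud storage: Auto Loader reads new files incrementally with schema inference, stamps an ingestion time, and writes to a bronze table. The paths, checkpoint locations, and table names are hypothetical.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

bronze_stream = (
    spark.readStream.format("cloudFiles")                 # Auto Loader source
         .option("cloudFiles.format", "json")
         .option("cloudFiles.schemaLocation", "/mnt/chk/sensors/schema")  # schema inference + evolution
         .load("/mnt/landing/sensors/")
         .withColumn("ingested_at", F.current_timestamp())
)

(bronze_stream.writeStream
    .option("checkpointLocation", "/mnt/chk/sensors/bronze")
    .trigger(availableNow=True)        # cost-efficient batch-style trigger
    .toTable("bronze.sensor_readings"))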
Engineering a Data-Driven Future at CityHop
This case presents a SQL-based analytics solution addressing 11 key business questions around revenue, rider behavior, and data quality. Using structured queries over the trips, riders, and drivers tables, it delivers insights on trends, loyalty, anomalies, and usage patterns. The approach shows how pure SQL modeling within the lakehouse can support fast, decision-ready insights without relying on external BI tools or transformation frameworks.
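One representative query in the spirit of the case study, run as pure SQL in the lakehouse (here via spark.sql). The table and column names are hypothetical, and the real solution covers 11 such questions.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Monthly revenue trend and revenue per active rider.
monthly_revenue = spark.sql("""
    SELECT date_trunc('month', started_at)                        AS month,
           count(*)                                               AS trips,
           round(sum(fare_amount), 2)                             AS revenue,
           round(sum(fare_amount) / count(DISTINCT rider_id), 2)  AS revenue_per_rider
    FROM   trips
    GROUP  BY date_trunc('month', started_at)
    ORDER  BY month
""")
monthly_revenue.show()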
Engineering a 'Trending Now' Chart for a Podcast Platform
This case demonstrates a first-principles approach to engineering a data product by building a "Trending Now" chart for a fictional podcast platform using PySpark. The solution transforms raw listening events into sophisticated metrics for popularity (7-day rolling average) and momentum (week-over-week growth) using window functions.
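A condensed sketch of the two metrics with PySpark window functions; the event schema (podcast_id, listened_at) and the exact window bounds are assumptions for illustration.

from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()
events = spark.table("bronze.listening_events")   # assumed columns: podcast_id, listened_at

# Popularity: 7-day rolling average of daily plays per podcast
# (rowsBetween(-6, 0) assumes one row per podcast per day).
daily = (events.groupBy("podcast_id", F.to_date("listened_at").alias("day"))
               .agg(F.count("*").alias("plays")))
w7 = Window.partitionBy("podcast_id").orderBy("day").rowsBetween(-6, 0)
popularity = daily.withColumn("rolling_7d_avg", F.avg("plays").over(w7))

# Momentum: week-over-week growth in total plays per podcast.
weekly = (events.groupBy("podcast_id", F.weekofyear("listened_at").alias("week"))
                .agg(F.count("*").alias("weekly_plays")))
w_lag = Window.partitionBy("podcast_id").orderBy("week")
momentum = weekly.withColumn(
    "wow_growth",
    (F.col("weekly_plays") - F.lag("weekly_plays").over(w_lag))
        / F.lag("weekly_plays").over(w_lag))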
Unifying Retail Sales Data
To solve a fictional retailer's fragmented data problem, I developed a PySpark application that ingests, harmonizes, and unifies siloed online and in-store sales data. This creates a single source of truth, preventing the poor inventory and marketing decisions that fragmented data invites. The solution pivots the data for a direct, side-by-side KPI comparison, delivering actionable insights that identify high-performing "Channel Stars" and drive a cohesive, data-driven retail strategy.
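The unify-then-pivot step could look roughly like this in PySpark; the source tables, KPI columns, and the revenue threshold used to flag "Channel Stars" are hypothetical.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Tag each silo with its channel, then unify on a shared column set.
online  = spark.table("raw.online_sales").withColumn("channel", F.lit("online"))
instore = spark.table("raw.store_sales").withColumn("channel", F.lit("in_store"))
unified = (online.select("product_id", "channel", "quantity", "revenue")
                 .unionByName(instore.select("product_id", "channel", "quantity", "revenue")))

# Pivot channels into columns for a side-by-side KPI comparison per product.
kpi = (unified.groupBy("product_id")
              .pivot("channel", ["online", "in_store"])
              .agg(F.round(F.sum("revenue"), 2)))

# "Channel Stars": products performing strongly in both channels (threshold assumed).
channel_stars = kpi.filter((F.col("online") > 10000) & (F.col("in_store") > 10000))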
GDPR Compliance Engine in MS Fabric
This project is an enterprise-grade solution for GDPR compliance, built in Microsoft Fabric. It solves the complex challenge of data erasure on a live platform that ingests new data daily. Its unique two-pipeline architecture decouples the incremental data ingestion, powered by PySpark MERGE, from the on-demand erasure logic, which uses surgical SQL commands. This creates a scalable, resilient, and fully auditable system that automates a critical, high-risk business function, ensuring complete compliance with data privacy regulations.
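In rough outline, assuming Delta tables in a Fabric lakehouse, the two decoupled steps could be sketched as below. The lakehouse, table, and key names are hypothetical, and the "surgical" erasure logic is shown here as a targeted SQL DELETE.

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Pipeline 1: incremental daily ingestion via Delta MERGE (upsert new records).
incoming = spark.table("staging.customers_daily")
target = DeltaTable.forName(spark, "lakehouse.customers")
(target.alias("t")
       .merge(incoming.alias("s"), "t.customer_id = s.customer_id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())

# Pipeline 2: on-demand erasure for approved right-to-be-forgotten requests.
# (Requires a Delta version that supports subqueries in DELETE; otherwise
# collect the IDs first and pass them as a literal list.)
spark.sql("""
    DELETE FROM lakehouse.customers
    WHERE customer_id IN (SELECT customer_id
                          FROM   gdpr.erasure_requests
                          WHERE  status = 'approved')
""")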






