
AeroSense Technologies
Case Study
About
AeroSense Technologies needed to modernize its data platform to support real-time predictive maintenance. With thousands of machines generating telemetry data every second, the existing batch-based system struggled with delays, scale, and data quality. A new architecture was required—one that could process streaming data with low latency, ensure data integrity, and scale effortlessly as data volumes grew.
The Challenge
The legacy platform processed data in hourly batches using Python scripts and traditional databases. As AeroSense scaled, this setup faced critical issues:
- Delayed insights: Anomalies were detected hours late, reducing the value of predictive analytics.
- Scalability limits: VM-based infrastructure couldn’t keep up with increasing data velocity.
- Data quality issues: Poor schema enforcement led to corrupted analytics.
- High maintenance overhead: Frequent failures required manual intervention.
A platform redesign was needed—one that was fast, reliable, and future-ready.
Solution
A modern lakehouse architecture was implemented on Azure using Databricks, Delta Lake, and Unity Catalog, following the Medallion pattern:
- Bronze Layer - Raw JSON telemetry was streamed using Auto Loader, enabling efficient ingestion and schema evolution. All raw data was stored as-is in Delta format for full traceability.
- Silver Layer - Streaming jobs cleaned, cast, and validated data, enforcing schema and filtering out corrupt records. This created a trusted dataset for analytics.
- Gold Layer - Aggregated metrics (averages, max values, event counts) were generated and enriched with static reference data (e.g., location, device type). The result was a business-ready Delta table, optimized for Power BI and ML use.
Governance with Unity Catalog
Unity Catalog provided centralized governance across all layers:
- Role-based access control
- End-to-end data lineage
- Centralized data discovery
Impact
The platform upgrade enabled true real-time anomaly detection and helped AeroSense unlock the full potential of its IoT data—transforming predictive maintenance from a reactive tool into a strategic capability.

Catalog & Schema Creation
This script performs the essential one-time setup for the entire data platform. It establishes the foundational governance structure by creating a catalog named aerosense_catalog in Unity Catalog, along with a dedicated reference_data schema.
Within this structure, it creates and populates a devices table with static information about each IoT sensor, such as its physical location and installation date. This small but critical reference table is used later in the Gold layer to enrich the high-volume streaming telemetry data, adding vital business context (like which factory a sensor belongs to) to the raw sensor readings for meaningful analysis.
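The setup described above might look roughly like the following, run once from a Databricks notebook (the `spark` session is a notebook global there). The catalog, schema, and table names come from the text; the column list and sample rows are illustrative assumptions.

```python
# One-time governance setup (Databricks notebook; `spark` is assumed).
spark.sql("CREATE CATALOG IF NOT EXISTS aerosense_catalog")
spark.sql("CREATE SCHEMA IF NOT EXISTS aerosense_catalog.reference_data")

# Static reference table used later for Gold-layer enrichment.
# Column names are an assumption based on the description above.
spark.sql("""
    CREATE TABLE IF NOT EXISTS aerosense_catalog.reference_data.devices (
        device_id    STRING,
        location     STRING,   -- e.g. which factory the sensor belongs to
        device_type  STRING,
        install_date DATE
    )
""")

spark.sql("""
    INSERT INTO aerosense_catalog.reference_data.devices VALUES
        ('dev-001', 'Factory A', 'temperature', DATE'2023-01-15'),
        ('dev-002', 'Factory B', 'pressure',    DATE'2023-02-01')
""")
```

Because the table lives under Unity Catalog, access control and lineage for it are governed centrally from the start.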

Bronze Ingestion
This notebook is the entry point for all raw IoT data into the lakehouse. Using Databricks Auto Loader, it efficiently discovers and processes new JSON telemetry files from the raw storage container in near real-time. The script reads the data as a stream, adds essential metadata like the ingestion timestamp and source file name, and lands the raw, unaltered data into a Bronze Delta table. This creates a scalable and auditable historical record of all incoming sensor readings, serving as the foundation for all subsequent transformations and ensuring no data is ever lost.
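A minimal sketch of such an Auto Loader stream is shown below, assuming a Databricks runtime (`spark` notebook global) and hypothetical storage paths and table names; the metadata columns mirror the description above.

```python
# Bronze ingestion sketch (Databricks notebook assumed).
# Auto Loader ("cloudFiles") incrementally discovers new JSON files.
from pyspark.sql import functions as F

bronze_stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/aerosense/_schemas/bronze")  # hypothetical path
    .load("/mnt/aerosense/raw/telemetry")                                   # hypothetical path
    .withColumn("ingest_ts", F.current_timestamp())           # ingestion timestamp
    .withColumn("source_file", F.col("_metadata.file_path"))  # source file name
)

(bronze_stream.writeStream
    .option("checkpointLocation", "/mnt/aerosense/_checkpoints/bronze")  # hypothetical path
    .outputMode("append")
    .toTable("aerosense_catalog.bronze.telemetry_raw"))  # table name is an assumption
```

The checkpoint location is what makes the stream restartable without reprocessing or losing files, and storing the data unaltered keeps the Bronze table fully auditable.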

Silver Cleansing
The Silver notebook focuses on data quality and standardization. It reads the continuous stream of raw data from the Bronze table and applies a series of critical cleansing transformations. This includes casting columns to their proper data types (e.g., string to timestamp, string to double), enforcing a consistent schema, and filtering out low-quality or incomplete records, such as those missing a device ID or timestamp. The output is a reliable, validated stream of data written to a Silver Delta table, creating an enterprise-wide source of truth that is ready for business-level analysis.
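In the notebook these rules run as a PySpark streaming job; the validation logic itself can be illustrated in plain Python. The field names (device_id, event_time, temperature, pressure) are assumptions for the sketch.

```python
from datetime import datetime

def cleanse(record: dict):
    """Apply Silver-layer rules to one raw telemetry record.

    Returns a typed record, or None when the record is too incomplete
    or corrupt to trust. Field names are illustrative assumptions.
    """
    device_id = record.get("device_id")
    raw_ts = record.get("event_time")
    if not device_id or not raw_ts:
        return None  # filter out records missing a device ID or timestamp
    try:
        return {
            "device_id": str(device_id),
            "event_time": datetime.fromisoformat(raw_ts),   # string -> timestamp
            "temperature": float(record["temperature"]),    # string -> double
            "pressure": float(record["pressure"]),
        }
    except (KeyError, TypeError, ValueError):
        return None  # unparseable or missing values are treated as corrupt

good = cleanse({"device_id": "dev-001", "event_time": "2024-05-01T10:00:00",
                "temperature": "21.5", "pressure": "101.3"})
bad = cleanse({"event_time": "2024-05-01T10:00:00"})  # no device_id -> dropped
```

In Spark the same rules become cast() expressions and filter() conditions on the stream, but the contract is identical: everything written to Silver is typed, complete, and trustworthy.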

Gold Aggregation
This notebook creates the final, business-ready analytics assets. It consumes the clean data stream from the Silver table and performs time-windowed aggregations, calculating key metrics like hourly average temperature, maximum pressure, and event counts for each device. To add crucial business context, it enriches the aggregated data by joining it with the static device reference table to include information like sensor location. The resulting high-value, aggregated data is written to a Gold Delta table, optimized for high-performance querying by BI dashboards, data scientists, and other end-users.
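In the notebook this is a PySpark time-windowed streaming aggregation with a join to the devices table; the shape of the computation can be sketched in plain Python (hourly buckets, enrichment from a static lookup; all names and sample values are assumptions).

```python
from collections import defaultdict
from datetime import datetime

# Mirrors the static devices reference table (illustrative values).
DEVICES = {"dev-001": {"location": "Factory A", "device_type": "temperature"}}

def hourly_metrics(records):
    """Aggregate cleaned telemetry into per-device hourly metrics,
    enriched with static device attributes."""
    buckets = defaultdict(list)
    for r in records:
        hour = r["event_time"].replace(minute=0, second=0, microsecond=0)
        buckets[(r["device_id"], hour)].append(r)

    rows = []
    for (device_id, hour), group in buckets.items():
        temps = [g["temperature"] for g in group]
        ref = DEVICES.get(device_id, {})
        rows.append({
            "device_id": device_id,
            "window_start": hour,
            "avg_temperature": sum(temps) / len(temps),          # hourly average
            "max_pressure": max(g["pressure"] for g in group),   # hourly maximum
            "event_count": len(group),                           # events in window
            "location": ref.get("location"),                     # enrichment join
        })
    return rows

sample = [
    {"device_id": "dev-001", "event_time": datetime(2024, 5, 1, 10, 5),
     "temperature": 20.0, "pressure": 101.0},
    {"device_id": "dev-001", "event_time": datetime(2024, 5, 1, 10, 45),
     "temperature": 22.0, "pressure": 99.5},
]
rows = hourly_metrics(sample)
```

In PySpark the bucketing step corresponds to groupBy(window("event_time", "1 hour"), "device_id"), and the lookup becomes a broadcast join against the reference table.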

Data Lineage
Data lineage provides a complete, end-to-end map of your data's journey. It tracks the data from its origin (e.g., a raw source file), follows it through all the transformations and processing steps (like the Bronze, Silver, and Gold notebooks), and shows where it ends up (e.g., in a final analytics table or dashboard).
In this project, Unity Catalog automatically captures this lineage. This is crucial for trust and debugging, as it allows you to instantly see how a final metric was calculated and what source data it came from, ensuring full auditability and transparency.
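Captured lineage can also be queried programmatically. Assuming a Databricks workspace with system tables enabled, the upstream sources of a Gold table could be looked up roughly like this; the Gold table name is an assumption, and the exact columns of the lineage system table can vary by release.

```python
# Query captured lineage from a Databricks notebook (`spark` assumed).
# system.access.table_lineage is Databricks' lineage system table;
# its availability depends on workspace configuration.
upstream = spark.sql("""
    SELECT DISTINCT source_table_full_name
    FROM system.access.table_lineage
    WHERE target_table_full_name = 'aerosense_catalog.gold.device_hourly_metrics'
""")
upstream.show()
```

The same lineage graph is browsable interactively in Catalog Explorer, which is typically the faster route during debugging.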