top of page
Abstract Futuristic Background

AeroSense Technologies

Case Study

About

AeroSense Technologies needed to modernize its data platform to support real-time predictive maintenance. With thousands of machines generating telemetry data every second, the existing batch-based system struggled with delays, scale, and data quality. A new architecture was required—one that could process streaming data with low latency, ensure data integrity, and scale effortlessly as data volumes grew.

The Challenge

The legacy platform processed data in hourly batches using Python scripts and traditional databases. As AeroSense scaled, this setup faced critical issues:​

​

  • Delayed insights: Anomalies were detected hours late, reducing the value of predictive analytics.

  • Scalability limits: VM-based infrastructure couldn’t keep up with increasing data velocity.

  • Data quality issues: Poor schema enforcement led to corrupted analytics.

  • High maintenance overhead: Frequent failures required manual intervention.​

 

A platform redesign was needed—one that was fast, reliable, and future-ready.

Solution

A modern lakehouse architecture was implemented on Azure using Databricks, Delta Lake, and Unity Catalog, following the Medallion pattern:

  • Bronze Layer - Raw JSON telemetry was streamed using Auto Loader, enabling efficient ingestion and schema evolution. All raw data was stored as-is in Delta format for full traceability.

  • Silver Layer - Streaming jobs cleaned, cast, and validated data, enforcing schema and filtering out corrupt records. This created a trusted dataset for analytics.

  • Gold Layer - Aggregated metrics (averages, max values, event counts) were generated and enriched with static reference data (e.g., location, device type). The result was a business-ready Delta table, optimized for Power BI and ML use.

Governance with Unity Catalog

Unity Catalog provided centralized governance across all layers:

  • Role-based access control

  • End-to-end data lineage

  • Centralized data discovery

Impact

The platform upgrade enabled true real-time anomaly detection and helped AeroSense unlock the full potential of its IoT data—transforming predictive maintenance from a reactive tool into a strategic capability.

Screenshot 2025-07-16 at 10.47.37 AM.png

Catalog & Schema Creation

This script performs the essential one-time setup for the entire data platform. It establishes the foundational governance structure by creating a Unity Catalog named aerosense_catalog and a dedicated reference_data schema.

Within this structure, it creates and populates a devices table with static information about each IoT sensor, such as its physical location and installation date. This small but critical reference table is used later in the Gold layer to enrich the high-volume streaming telemetry data, adding vital business context (like which factory a sensor belongs to) to the raw sensor readings for meaningful analysis.

Bronze

Bronze Ingestion

This notebook is the entry point for all raw IoT data into the lakehouse. Using Databricks Auto Loader, it efficiently discovers and processes new JSON telemetry files from the raw storage container in near real-time. The script reads the data as a stream, adds essential metadata like the ingestion timestamp and source file name, and lands the raw, unaltered data into a Bronze Delta table. This creates a scalable and auditable historical record of all incoming sensor readings, serving as the foundation for all subsequent transformations and ensuring no data is ever lost.

Silver

Silver Cleansing

The Silver notebook focuses on data quality and standardization. It reads the continuous stream of raw data from the Bronze table and applies a series of critical cleansing transformations. This includes casting columns to their proper data types (e.g., string to timestamp, string to double), enforcing a consistent schema, and filtering out low-quality or incomplete records, such as those missing a device ID or timestamp. The output is a reliable, validated stream of data written to a Silver Delta table, creating an enterprise-wide source of truth that is ready for business-level analysis.

Gold

Gold Aggregation

This notebook creates the final, business-ready analytics assets. It consumes the clean data stream from the Silver table and performs time-windowed aggregations, calculating key metrics like hourly average temperature, maximum pressure, and event counts for each device. To add crucial business context, it enriches the aggregated data by joining it with the static device reference table to include information like sensor location. The resulting high-value, aggregated data is written to a Gold Delta table, optimized for high-performance querying by BI dashboards, data scientists, and other end-users.

Lineage

Data Lineage

Data lineage provides a complete, end-to-end map of your data's journey. It tracks the data from its origin (e.g., a raw source file), follows it through all the transformations and processing steps (like the Bronze, Silver, and Gold notebooks), and shows where it ends up (e.g., in a final analytics table or dashboard).

In this project, Unity Catalog automatically captures this lineage. This is crucial for trust and debugging, as it allows you to instantly see how a final metric was calculated and what source data it came from, ensuring full auditability and transparency.

bottom of page