Data Engineering on Unstructured Dataset Using AWS for a US-Based Home Automation Company
Overview
Our client is a US-based OEM producing HVAC equipment, water heaters, and boilers for residential and commercial buildings. They needed a secure and economical solution for analyzing large datasets.
Challenges
- Structuring the data sent by 120K+ live devices, which produce 40 GB per day, a volume that keeps growing
- Designing a scalable, secure, and cost-effective solution with a flexible architecture
Solution
- Secure, scalable, and flexible architecture built on serverless computing and storage
- Data collection, Extract-Transform-Load (ETL) and Data pipeline
- Data Catalog and Database management
- Project-based implementation with Infrastructure as Code (IaC)
- CloudFormation script for Infrastructure management
- CI/CD pipeline using GitHub Actions
- Data collection script to transfer MongoDB data to S3
- Developed ETL script to convert unstructured data to a structured format
- Dashboard development to display and monitor field devices’ data
- SSO authentication
- Analyze and visualize data
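As an illustration of the ETL step listed above (converting unstructured data to a structured format), the following is a minimal sketch and not the client's actual script. It assumes the device messages are nested JSON-like documents; the field names (`device_id`, `telemetry`, and so on) and the `flatten_record` helper are hypothetical.

```python
from typing import Any, Dict


def flatten_record(doc: Dict[str, Any], parent: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining key paths with `sep`.

    Non-dict values (including lists) are copied through unchanged.
    """
    row: Dict[str, Any] = {}
    for key, value in doc.items():
        column = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse so that {"a": {"b": 1}} becomes {"a_b": 1}.
            row.update(flatten_record(value, column, sep))
        else:
            row[column] = value
    return row


# Hypothetical device message; the field names are illustrative only.
sample = {
    "device_id": "dev-001",
    "ts": "2023-01-01T00:15:00Z",
    "telemetry": {"temp_c": 48.5, "status": {"mode": "heat", "fault": False}},
}
row = flatten_record(sample)
print(row)
```

Each flattened row has a fixed set of column names, which is the shape a data catalog and a columnar file format generally expect. How lists and missing fields should be handled depends on the real device payloads, which are not described here.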
Outcomes
- Enabled faster data transformation by splitting daily runs into hourly runs for time-series data
- Designed pipelines that processed one year of data and continuously delivered meaningful insights to the client
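The day-to-hour split mentioned in the first outcome can be sketched as follows. This is an assumption-laden illustration, not the delivered pipeline: the `hourly_windows` and `partition_prefix` helpers are hypothetical, and the `year=/month=/day=/hour=` key layout is only one common convention for partitioned time-series data in object storage.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def hourly_windows(day: datetime) -> List[Tuple[datetime, datetime]]:
    """Return 24 half-open [start, end) windows covering the given day."""
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    return [
        (start + timedelta(hours=h), start + timedelta(hours=h + 1))
        for h in range(24)
    ]


def partition_prefix(window_start: datetime) -> str:
    """Build a key prefix for one hourly window (assumed layout)."""
    return window_start.strftime("year=%Y/month=%m/day=%d/hour=%H/")


windows = hourly_windows(datetime(2023, 1, 1, tzinfo=timezone.utc))
prefixes = [partition_prefix(start) for start, _ in windows]
print(prefixes[0], prefixes[-1])
```

Each window can then be handed to a separate job run, so a single run handles one hour of records instead of a full day, and a failed hour can be retried without reprocessing the rest of the day.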