Overview

TransCloud built end-to-end data pipelines that feed a new data warehouse from multiple data sources.

Client:
Growing FinTech startup
Industry:
FinTech, SaaS
Services:
Data Warehouse Modernization, Automated Data Pipelines

Challenge

The company provides financial solutions for India’s young and growing blue-collar workforce. Its smartphone-linked credit-on-tap service for gig and contract workers is routed through their aggregators and employers, allowing users to access and spend funds within 3 minutes.

The FinTech SaaS product collects a wide range of information, including users' mobile data, application transaction data and app analytics data. A robust data platform lets the downstream ML and analytics functions run reliably, which in turn supports a better customer experience.

Goal

The existing data architecture on AWS was hard to extend and maintain. Collected data was stored, processed and consumed with AWS services such as S3, Athena and Glue. The team found it increasingly difficult to operate as the user base and data volume grew.

The team had a clear goal: use serverless and managed services such as Google BigQuery wherever appropriate, so that the pipelines and warehouse stay maintainable with less operational effort.

Highlights

  • 1 million+ events per day, growing at 20% month over month (MoM)
  • 100+ GB of data from 3 different data sources
  • Speeds up the implementation of new analytics solutions and ML models
  • Master data is made reliable through a streamlined, standardized process
  • With the adoption of serverless GCP services, the cost of the system tracks actual usage more closely
  • Ease of operations with centrally integrated logging, monitoring and error reporting
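
For capacity planning, the first highlight can be turned into a rough projection. The sketch below compounds 1 million events per day at 20% MoM. The function name is hypothetical, and the assumption that the growth rate stays constant is illustrative rather than a figure from the project.

```python
def project_daily_events(start_per_day: float, monthly_growth: float, months: int) -> float:
    """Daily event volume after `months` of compound monthly growth."""
    if months < 0:
        raise ValueError("months must be non-negative")
    return start_per_day * (1.0 + monthly_growth) ** months


if __name__ == "__main__":
    # Assumption: the 20% MoM rate from the highlights holds steady.
    for m in (0, 6, 12):
        print(m, round(project_daily_events(1_000_000, 0.20, m)))
```

Under that assumption, the daily volume roughly triples in six months and reaches about 8.9 million events per day after a year, which is the kind of growth that favours usage-based, serverless pricing.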

Technical Excellence

  • Assessed the data schemas, processing scripts and requirements
  • Analysed all components of the existing architecture to design a maintainable BigQuery data warehouse that supports the product and business needs
  • Architected the pipelines for all data sources
  • Leveraged serverless Cloud Workflows to orchestrate the pipelines
  • Implemented an ELT-based data warehouse in which raw data is stored first and then processed in BigQuery
  • Set up single-window monitoring of resources, latency, performance and service availability, with escalated notifications
  • Used Google Cloud services including BigQuery, Cloud Storage, Pub/Sub, Cloud Workflows, Cloud Scheduler, Cloud Functions, Cloud Firestore, Data Transfer Service, Cloud IAM, Logging, Monitoring and Error Reporting
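
The ELT approach listed above (land raw data first, then transform it with SQL inside the warehouse) can be illustrated with a minimal sketch. This is not the project's implementation: Python's built-in sqlite3 stands in for BigQuery so the example runs locally, and the table names, column names and sample rows are all hypothetical.

```python
import sqlite3

RAW_ROWS = [
    # (event_id, user_id, amount, event_ts) exactly as received: all strings
    ("e1", "u1", "250.00", "2023-01-05T10:00:00"),
    ("e2", "u2", "99.50", "2023-01-05T10:01:00"),
    ("e2", "u2", "99.50", "2023-01-05T10:01:00"),  # duplicate delivery
    ("e3", "u1", "not-a-number", "2023-01-05T10:02:00"),  # malformed amount
]


def load_raw(conn, rows):
    """Extract + Load: store rows untouched in a raw table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_events "
        "(event_id TEXT, user_id TEXT, amount TEXT, event_ts TEXT)"
    )
    conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?, ?)", rows)


def transform(conn):
    """Transform: build a typed, de-duplicated table with SQL in the warehouse."""
    conn.execute("DROP TABLE IF EXISTS events_clean")
    conn.execute(
        """
        CREATE TABLE events_clean AS
        SELECT DISTINCT event_id, user_id,
               CAST(amount AS REAL) AS amount, event_ts
        FROM raw_events
        -- keep only amounts made of digits and dots, starting with a digit
        WHERE amount GLOB '[0-9]*' AND amount NOT GLOB '*[^0-9.]*'
        """
    )


def run_pipeline(rows):
    """Load raw rows, transform them, and return the cleaned events."""
    conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
    load_raw(conn, rows)
    transform(conn)
    return conn.execute(
        "SELECT event_id, amount FROM events_clean ORDER BY event_id"
    ).fetchall()


if __name__ == "__main__":
    print(run_pipeline(RAW_ROWS))
```

Because the raw table is never modified, the transform step can be rerun or revised later without re-ingesting data from the sources, which is the main maintainability argument for ELT over ETL.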

Image Credits: https://techdaily.ca