Jun 15 · Kafka · 6 min read
User Notification Throttling System: Design and Implementation with Confluent Kafka
Designing an effective user notification throttling system is crucial for applications that must deliver notifications at large scale while preserving performance and user experience. This article explores the design and implementation of such a system using Confluent Kafka, a distributed streaming platform. By leveraging Kafka's capabilities, we propose…
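The article's full Kafka-based design is not reproduced in this teaser, but the core throttling idea can be sketched as a per-user token bucket that a consumer would consult before forwarding a notification event. This is an illustrative sketch only; the class and function names below are assumptions, not from the article:

```python
import time

class TokenBucket:
    """Per-user token bucket: allows at most `capacity` notifications
    in a burst, refilling at `rate` tokens per second."""

    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per user; a Kafka consumer would check the bucket
# before emitting the notification downstream, and drop or defer
# events for users whose bucket is empty.
buckets = {}

def should_notify(user_id: str) -> bool:
    bucket = buckets.setdefault(user_id, TokenBucket(capacity=3, rate=0.5))
    return bucket.allow()
```

In a Kafka deployment, the bucket state would typically live in the consumer (or a state store) keyed by the partition key, so all events for one user land on the same consumer instance.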
May 5 · ETL · 6 min read
End-to-End ETL: How to Streamline Data Transfer from Freshdesk to Snowflake with Mage and Build a Streamlit App
In this article, let us see how we can use Mage to perform ETL (extract, transform, load) and move data from Freshdesk, a customer support platform, to Snowflake, a cloud-based data warehousing platform. …
Apr 28 · DuckDB · 5 min read
Building a Streamlit App on Data Residing in a Data Lakehouse Using Delta Lake and DuckDB
In today's data-driven world, businesses are generating and collecting massive amounts of data from various sources. However, managing and analyzing this data is becoming increasingly challenging as it is spread across various systems and platforms. …
Feb 3 · Data · 4 min read
In this article, let's see how DuckDB is a game changer in the analytical world and how it helps data scientists and analysts do quick computations. What is DuckDB? DuckDB is an open-source relational database management system optimized for analytical workloads. It uses a columnar storage layout, vectorized query execution, and advanced query…
Dec 13, 2022 · Python · 2 min read
Generate Sample Data Using Faker
In this article, let's see how we can generate sample data in CSV or JSON format for quick testing of data pipeline jobs. Faker is a Python package that generates fake data for us. Whether you need to bootstrap your database, create good-looking XML documents…
Dec 13, 2022 · BigQuery · 3 min read
Moving Data from BigQuery to On-premise Hive
In this article, let's discuss the solution for a use case we received from our customer: "Can you help us move some 200+ tables from BigQuery back to on-prem Hive?" Initially, when we read about this use case in an email, we thought it was a typo from our…
Published in Dev Genius · Nov 7, 2022 · DLT · 4 min read
Real-time Streaming Analytics with Databricks Delta Live Tables
In this story, let us understand how useful streaming analytics is with a real-world example. Let us see how we can detect COVID cases in a certain area using tools and technologies such as Apache Kafka, Databricks Delta Live Tables, and streaming data analytics. This is a story on one of…
Nov 2, 2022 · BigQuery · 1 min read
How to Create BigQuery Tables on Top of External Data
In this article, let's see how we can create an external BigQuery table on top of files present in a Google Cloud Storage bucket using the Python client. Script location: https://github.com/abhr1994/Personal_Scripts/tree/main/BigqueryTables We need the google-cloud-bigquery library installed before running the script. It can be installed by running the command below: pip install…
Nov 2, 2022 · ETL · 4 min read
SFTP File Watcher
In this article, let's discuss a use case where the need is to trigger the ingestion pipeline for data files that land on the SFTP server at random times. As the file landing time is random, it is not possible to use conventional schedulers like cron, AutoSys, etc. I…
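When arrival times are random, one common alternative to a fixed-schedule cron job is a polling watcher that remembers which files it has already seen and fires a callback for each new arrival. Below is a minimal local-directory sketch of that idea (the function names are invented for illustration; against a real SFTP server, the article's setting, `os.listdir` would be replaced by an SFTP client's directory-listing call, e.g. via paramiko):

```python
import os
import time

def watch(path, on_new_file, seen, poll_seconds=5.0, max_polls=None):
    """Poll `path` and invoke `on_new_file(name)` exactly once per newly
    landed file. `seen` is the set of filenames already processed;
    `max_polls=None` polls forever."""
    polls = 0
    while max_polls is None or polls < max_polls:
        for name in sorted(os.listdir(path)):
            if name not in seen:
                seen.add(name)
                on_new_file(name)  # e.g. kick off the ingestion pipeline
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)
```

A production version would also guard against half-written files, for example by watching for a companion `.done` marker or a stable file size across two polls.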
Jul 9, 2021 · Databricks · 3 min read
Delta Lake Streaming Internals
In this article, we are going to see how to use Delta Lake as a streaming source and as a sink, along with the internals of Delta streaming. Introduction: Many organisations have started using the Lakehouse architecture in their data pipelines, with Delta as the default storage format. Delta Lake is…