When Small Parquet Files Become a Big Problem (and How I Ended Up Writing a Compactor in PyArrow)
It all began with a fairly normal data pipeline. Events came in through Kafka and landed in AWS S3 as Parquet files after some lightweight micro-batch processing. It looked clean at first glance. Efficient. Predictable. But one day I opened one of the hourly folders and saw the