<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Data-Processing on Asadbek Kurbonov</title><link>https://advenn.github.io/blog/tags/data-processing/</link><description>Recent content in Data-Processing on Asadbek Kurbonov</description><generator>Hugo -- 0.148.0</generator><language>en-us</language><lastBuildDate>Mon, 24 Nov 2025 22:00:00 +0100</lastBuildDate><atom:link href="https://advenn.github.io/blog/tags/data-processing/index.xml" rel="self" type="application/rss+xml"/><item><title>When You Can't Find the Bug: Architecting Around Production Issues</title><link>https://advenn.github.io/blog/posts/go-python-architecture/</link><pubDate>Mon, 24 Nov 2025 22:00:00 +0100</pubDate><guid>https://advenn.github.io/blog/posts/go-python-architecture/</guid><description>&lt;p>&lt;em>This is Part 2 of a series. Read &lt;a href="../pandas-vs-polars-in-production/">Part 1: Pandas vs Polars in Production - Performance Comparison&lt;/a> for the background on the Polars migration.&lt;/em>&lt;/p>
&lt;hr>
&lt;p>After migrating from Pandas to Polars, CPU performance improved dramatically—but a memory problem persisted. Despite extensive debugging, I couldn&amp;rsquo;t identify the root cause. So I made a pragmatic decision: architect around it.&lt;/p>
&lt;p>This is the story of splitting a monolithic Python application into a Go orchestration service with Python workers, not because I fully understood the problem, but because I needed production to be stable.&lt;/p></description></item><item><title>Pandas vs Polars in Production: Performance Comparison</title><link>https://advenn.github.io/blog/posts/pandas-vs-polars-in-production/</link><pubDate>Sun, 23 Nov 2025 23:02:39 +0100</pubDate><guid>https://advenn.github.io/blog/posts/pandas-vs-polars-in-production/</guid><description>&lt;p>When performance bottlenecks started affecting my production data pipeline, I decided to test whether Polars could deliver on its performance promises. This is what I learned from migrating a real production workload from Pandas to Polars.&lt;/p>
&lt;h2 id="the-workload">The Workload&lt;/h2>
&lt;p>The application was a data aggregation service running as a Kubernetes pod with the following constraints:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Resources&lt;/strong>: 2 CPUs, 3 GB RAM&lt;/li>
&lt;li>&lt;strong>Execution frequency&lt;/strong>: Every 2-2.5 minutes&lt;/li>
&lt;li>&lt;strong>Data volume&lt;/strong>: 5,000-7,000 rows × 100-150 columns per run&lt;/li>
&lt;li>&lt;strong>Operations&lt;/strong>: Multiple database calls, API requests, DataFrame merges, arithmetic operations (addition, multiplication), and group-by aggregations&lt;/li>
&lt;li>&lt;strong>Web server&lt;/strong>: FastAPI with Uvicorn handling production traffic&lt;/li>
&lt;/ul>
&lt;p>All operations were fully vectorized, with no row-by-row iteration. The pipeline combined data from various sources into a single DataFrame, transformed it, and output the results.&lt;/p></description></item></channel></rss>