When You Can't Find the Bug: Architecting Around Production Issues
This is Part 2 of a series. Read Part 1: Pandas vs Polars in Production - Performance Comparison for the background on the Polars migration. After migrating from Pandas to Polars, CPU performance improved dramatically—but a memory problem persisted. Despite extensive debugging, I couldn’t identify the root cause. So I made a pragmatic decision: architect around it. This is the story of splitting a monolithic Python application into a Go orchestration service with Python workers, not because I fully understood the problem, but because I needed production to be stable. ...