Breaking: mssql-python Now Streams SQL Server Data Directly via Apache Arrow, Slashing Overhead for Python Data Libraries

February 23, 2025 – The mssql-python driver has just gained native Apache Arrow support, enabling zero-copy data ingestion from SQL Server into Polars, Pandas, DuckDB, and other Arrow-native tools. Community contributor Felix Graßl (@ffelixg) delivered the feature, which eliminates the traditional bottleneck of per-row Python object creation.

Source: devblogs.microsoft.com

“Fetching a million rows used to mean a million Python objects, a million garbage collector allocations, and then discarding it all to build a DataFrame,” said Graßl. “Now mssql-python writes directly into Arrow buffers. The DataFrame library gets a pointer and works immediately—no serialization, no copies.”

What Changed?

Previously, mssql-python had to convert each SQL Server row into Python objects (ints, strings, datetimes) before they could be assembled into a DataFrame. With Arrow support, the entire fetch loop runs in C++ and populates contiguous typed arrays. Nulls are represented as a compact bitmap rather than per-cell Python None objects.

For temporal types like DATETIME and DATETIMEOFFSET, Python-side per-value conversions are eliminated entirely, which should speed up those columns significantly. The driver now implements the Arrow C Data Interface, a cross-language ABI that lets code written in different languages share in-memory data without marshaling.

Background: How Apache Arrow Works

Apache Arrow is an open standard for columnar in-memory data, designed for zero-copy language interoperability. Its C Data Interface (a stable ABI) allows programs written in different languages to share memory by passing a pointer—no serialization, no copying, no re-parsing.
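The idea of sharing memory by passing a pointer can be illustrated with nothing but the Python standard library. The sketch below is an analogy, not the Arrow C Data Interface itself: it uses Python's buffer protocol (memoryview) to show a consumer reading a producer's buffer without copying it. The variable names are illustrative only.

```python
from array import array

# Producer side: one contiguous buffer of signed 64-bit integers.
column = array("q", [1, 2, 3, 4, 5])

# Consumer side: a memoryview references the producer's memory; nothing is copied.
view = memoryview(column)
window = view[1:4]  # slicing a memoryview is also zero-copy

# A conventional hand-off would copy the values into new Python objects.
snapshot = list(column)

# Writes made by the producer are visible through the views, not through the copy.
column[0] = -1
column[2] = 300

print(view[0], window.tolist(), snapshot[0])
```

Because the views share the producer's memory, they reflect the later writes, while the copied list keeps the old values. The Arrow C Data Interface applies the same principle across language boundaries, with additional metadata describing the schema and buffer ownership.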

Arrow stores each column contiguously in typed buffers. A column of one million integers is a single C array, not a million Python objects. Nulls are recorded in a bitmap, saving memory on missing values.
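As a rough illustration of that layout, the following stdlib-only sketch packs a list of optional integers into a contiguous 64-bit buffer plus a validity bitmap (one bit per value, set when the value is present, least-significant bit first, as in the Arrow format). The helper names build_int64_column and is_valid are hypothetical, and the sketch omits the buffer padding and alignment that real Arrow implementations apply.

```python
from array import array


def build_int64_column(values):
    """Pack optional ints into a contiguous int64 buffer and a validity bitmap."""
    data = array("q")                           # contiguous signed 64-bit integers
    bitmap = bytearray((len(values) + 7) // 8)  # one bit per value; 0 means null
    for i, value in enumerate(values):
        if value is None:
            data.append(0)                      # placeholder slot for a null
        else:
            data.append(value)
            bitmap[i // 8] |= 1 << (i % 8)      # mark slot i as valid
    return data, bitmap


def is_valid(bitmap, i):
    """Return True if slot i holds a real value rather than a null."""
    return bool(bitmap[i // 8] & (1 << (i % 8)))


values = [10, None, 30, 40, None]
data, bitmap = build_int64_column(values)

print(data.tolist(), bin(bitmap[0]), len(data) * data.itemsize)
```

Five values occupy 40 bytes of integer data plus a single bitmap byte, and no per-cell Python object is needed to represent a null.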

For a database driver, this means the entire fetch loop can be implemented in C++ and write directly into Arrow buffers. The consuming library (e.g., Polars) receives a pointer and can begin filtering, joining, or aggregating in place, without materializing Python objects at any stage.

What This Means

Data engineers and data scientists working with SQL Server should see faster ETL pipelines, particularly on large result sets. Without per-row Python object creation and the garbage-collection pressure it causes, larger datasets can be processed in memory with less overhead. “We’re removing the impedance mismatch between the database and the analytical library,” explained Graßl. “Every downstream operation (filter, join, aggregation) works in place on the same Arrow buffers.”

The change also benefits DuckDB, which can read the Arrow tables produced by mssql-python without a conversion step. For Polars users, loading query results into a DataFrame adds almost no overhead.

Next Steps

The new Arrow fetch feature is available in mssql-python from version 2.5.0. To enable it, set fetch_arrow=True on the cursor or in the query execution call.

For detailed configuration, see the official Apache Arrow documentation or the mssql-python GitHub repo.
