When Hidden Dependencies Clash: The TCMalloc, Restartable Sequences, and Kernel Compatibility Saga

Introduction

The gap between how an API is documented and how it is actually used can become a minefield. The kernel community is navigating one now, thanks to a classic manifestation of Hyrum's Law: any observable behavior of a system will eventually become a dependency for someone. This article explores the recent tensions between Linux kernel updates, restartable sequences, and Google's TCMalloc memory allocator, highlighting the delicate balance between progress and stability.

Understanding Hyrum's Law

Hyrum's Law, often cited in software engineering, warns that even undocumented or unintentional behaviors can become relied upon by users. The kernel's recent experience with restartable sequences (rseq) is a textbook example. A performance fix in the 6.19 release preserved the documented rseq API, yet that proved insufficient because TCMalloc relied on behaviors that were never part of the official contract.

Restartable Sequences and Their Performance Benefits

Restartable sequences are a low-level mechanism that lets user-space code update per-CPU data without heavyweight atomic instructions or locks. A thread marks a short sequence of instructions as restartable: if the kernel preempts the thread, migrates it to another CPU, or delivers a signal while it is inside that sequence, execution is diverted to an abort handler supplied by the application, which typically retries the sequence from the beginning. This makes per-CPU data structures, such as allocator caches, considerably cheaper to maintain. TCMalloc uses rseq to accelerate memory allocation and deallocation, and glibc registers rseq for each thread as well, using it to speed up sched_getcpu().
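As a rough illustration of the restart idea described above, here is a toy model in Python. It is only a sketch: real restartable sequences are written in assembly and rely on the kernel to detect the interruption, whereas here a scripted list of booleans stands in for the kernel. The names `SimulatedThread` and `restartable_increment` are invented for this example.

```python
class SimulatedThread:
    """Toy stand-in for a thread whose critical section may be interrupted."""

    def __init__(self, interruptions):
        # Hypothetical script: one boolean per attempt, True meaning
        # "the kernel interrupted this attempt before it could commit".
        self.interruptions = list(interruptions)
        self.attempts = 0

    def interrupted(self):
        self.attempts += 1
        return self.interruptions.pop(0) if self.interruptions else False


def restartable_increment(counters, cpu, thread):
    """Increment counters[cpu], restarting whenever an attempt is interrupted."""
    while True:
        # Preparation: read state and compute the new value.
        # Nothing is visible to anyone else yet.
        new_value = counters[cpu] + 1

        # With real rseq, the kernel would divert execution to an abort
        # handler at this point if the thread had been interrupted.
        if thread.interrupted():
            continue  # abort path: throw the work away and retry

        # Commit: a single store publishes the result.
        counters[cpu] = new_value
        return thread.attempts


counters = [0, 0]
thread = SimulatedThread([True, True, False])  # two aborted attempts, then success
attempts = restartable_increment(counters, cpu=1, thread=thread)
```

The point of the model is that all work before the commit can be discarded safely, so no lock or atomic read-modify-write is needed as long as the final store is the only visible effect.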

The 6.19 Kernel Changes to Restartable Sequences

In the 6.19 release, kernel developers introduced optimizations to the rseq subsystem to address performance regressions that had been observed in certain workloads. The changes strictly adhered to the documented specification: the public API was unchanged, the rseq() system call and its existing flags remained valid, and the semantics were preserved. From the kernel's perspective, the update was backward‑compatible in every documented way.

TCMalloc's API Violation

However, Google's TCMalloc library had been relying on an undocumented side effect of the previous implementation. Specifically, TCMalloc assumed that a particular rseq control structure would remain in a certain state even when the application code did not explicitly set it—a behavior that was never guaranteed by the kernel API. With the 6.19 optimizations, this hidden assumption broke. Not only did TCMalloc itself fail, but its usage also prevented other software from using restartable sequences correctly, because the library's internal state clashed with the kernel's new behavior.

This situation is a direct illustration of Hyrum's Law: TCMalloc depended on an observable (but undocumented) behavior, and when that behavior changed, the whole system suffered.
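The pattern can be sketched with a deliberately artificial Python model. The field names (`cpu_id`, `scratch`) and all three functions are hypothetical and are not the actual fields or behavior TCMalloc relied on; the sketch only shows how two implementations can honor the same documented contract while differing in a side effect that a client has quietly come to depend on.

```python
def old_resume(area, cpu):
    """Documented: cpu_id is current after resume.
    Undocumented side effect: scratch is always reset."""
    area["cpu_id"] = cpu
    area["scratch"] = 0  # nobody promised this, but it always happened


def new_resume(area, cpu):
    """Same documented contract, but skips writes when nothing changed."""
    if area["cpu_id"] != cpu:
        area["cpu_id"] = cpu
    # scratch is left alone: still fully within the documented contract


def client_step(area, resume, cpu):
    """Hypothetical client that leaves a marker in scratch and
    expects resume to clear it (the hidden dependency)."""
    area["scratch"] = 1
    resume(area, cpu)
    return area["scratch"] == 0  # True only if the undocumented reset occurred


works_on_old = client_step({"cpu_id": 0, "scratch": 0}, old_resume, 0)
works_on_new = client_step({"cpu_id": 0, "scratch": 0}, new_resume, 0)
```

Both `old_resume` and `new_resume` leave `cpu_id` correct, so a test suite written against the documentation would pass for either one. Only the client that leaned on the side effect notices the difference.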

The No‑Regressions Rule and Its Consequences

The Linux kernel project operates under a strict no‑regressions policy: any change that breaks user‑space is generally reverted unless a fix can be found quickly. This rule ensures that existing applications continue to work, but it also forces kernel developers to accommodate poorly‑behaved code. In this case, the TCMalloc violation meant that the 6.19 optimizations—which improved performance for many users—could not simply be rolled out as planned.

The kernel community had to find a way to support TCMalloc's existing (though non‑standard) usage while still moving forward. This required careful analysis of the rseq subsystem to identify exactly what TCMalloc relied on, and then designing a compatibility layer that would allow the new optimizations without breaking the existing binary behavior.

Finding a Way Forward

After extensive discussion, the kernel developers proposed a multi‑pronged solution:

Document the previously undocumented fields that TCMalloc depended on, making them a stable part of the ABI.
Add a new compatibility flag that allows applications to opt into the legacy behavior if they need it.
Work with Google to fix TCMalloc to use the official API properly, so that future kernel changes do not break it again.
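The opt-in idea in the second item can be modeled in a few lines of Python. This is a sketch under stated assumptions, not kernel code: `RegFlags.LEGACY_COMPAT` is an invented name rather than a real rseq flag, and the `rewrites` counter exists only to make the difference between the two paths observable.

```python
from enum import Flag, auto


class RegFlags(Flag):
    NONE = 0
    LEGACY_COMPAT = auto()  # hypothetical name, not a real kernel flag


def make_resume_hook(flags):
    """Pick the resume behavior based on the flags given at registration."""

    def legacy(area, cpu):
        # Legacy path: rewrite the field unconditionally on every resume.
        area["cpu_id"] = cpu
        area["rewrites"] += 1

    def optimized(area, cpu):
        # Optimized path: touch the field only when the value changed.
        if area["cpu_id"] != cpu:
            area["cpu_id"] = cpu
            area["rewrites"] += 1

    return legacy if RegFlags.LEGACY_COMPAT in flags else optimized


legacy_area = {"cpu_id": 3, "rewrites": 0}
hook = make_resume_hook(RegFlags.LEGACY_COMPAT)
hook(legacy_area, 3)
hook(legacy_area, 3)  # rewritten both times

fast_area = {"cpu_id": 3, "rewrites": 0}
hook = make_resume_hook(RegFlags.NONE)
hook(fast_area, 3)
hook(fast_area, 3)  # no writes: the CPU never changed
```

Making the legacy path opt-in keeps the faster behavior as the default for well-behaved callers, while code that still needs the old behavior pays for it explicitly.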

This approach honors the no‑regressions rule while also encouraging better adherence to documented interfaces. It also serves as a cautionary tale: even the most well‑intentioned optimizations can be derailed by hidden dependencies.

Lessons for Developers

The TCMalloc incident underscores several important lessons:

Document your assumptions. If you rely on any observable behavior—even one not in the official API—that behavior becomes a de facto contract.
Test against bleeding‑edge kernels to catch regressions early, especially when using low‑level facilities like rseq.
The no‑regressions rule is a double‑edged sword: it protects users, but it can also force temporary workarounds for non‑standard usage.

Conclusion

The interplay between restartable sequences, TCMalloc, and Hyrum's Law highlights the complexity of maintaining operating system kernels in a world of diverse user‑space libraries. By understanding these dynamics, both kernel developers and application authors can work toward more robust and performant systems. The final resolution—a compatibility layer paired with long‑term fixes—shows that compromise and careful engineering can keep the Linux ecosystem stable while still moving forward.

For more details, see the original LWN article (subscription required).