DRM Xe and the Computational Nightmare

This text begins as an investigation note. I am still learning about DRM Xe, so I prefer to publish the process as it happens rather than waiting for a definitive write-up.

I have always been interested in understanding the internals of software, but that kind of curiosity takes time as well as willingness. Last year, I bought an Asus Zenbook S14 for work, equipped with an Intel Core Ultra 7 258V and 32 GB of on-package RAM. Since the machine came with Windows 11, I quickly set up dual boot with Fedora to closely follow kernel and driver updates.

The peace did not last long: under heavy CPU usage, the laptop would completely freeze, requiring forced reboots. It was that inconvenience that pushed me toward the low level. What began as a search for the cause of the freezes quickly became something more interesting: understanding how recent hardware and software meet in the kernel and, perhaps, how to contribute to the maturation of that platform.

Along the way, I found DRM Xe, Intel’s newest Direct Rendering Manager (DRM) kernel driver. It is the successor to i915 on recent architectures and is expected to centralise support for Intel’s next GPU generations on Linux. Although development started in 2021, it already comes across as a mature driver, built by Intel engineers who actively contribute to the open source ecosystem.

Diving into Xe made me realise that managing a modern integrated GPU (iGPU) poses enormous challenges, especially in memory management and power states (D-states). Unlike i915, which carried years of “legacy,” Xe offers a clean codebase, but it still depends on complex firmware components such as the GuC (Graphics Microcontroller) and the HuC.

The GuC, in particular, is its own universe I did not even know existed. It is a microcontroller integrated into the SoC, responsible for orchestrating command submission and GPU scheduling. To do that, it relies on the GGTT (Global Graphics Translation Table), the page table behind the GPU’s global address space, which lets the microcontroller map and access system RAM. The biggest surprise, however, was reading that some GuC units reportedly use ARM or RISC-V cores. If that holds, it makes perfect sense: these are far more energy-efficient architectures for that function than x86 would be.

This discovery of a “computer inside the computer” managing resources raised several questions:

1. If user processes use the GPU, who controls the memory (VRAM) they use?

A: The driver itself. It maintains internal structures and virtual address spaces that ensure isolation between contexts. The driver knows exactly who “owns” each allocation, so it can guarantee that one process cannot access another’s data, but it is agnostic to the content: to the driver, it matters little whether those bits are a texture or a mathematical computation.
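To make the ownership idea concrete, here is a minimal sketch in Python. It is a toy model, not Xe code: `ToyClient`, `create_bo` and `lookup` are hypothetical names I made up. The only thing it borrows from the real world is the general DRM convention that buffer handles are small integers private to each open file on the device, so the same number means different things to different processes.

```python
class ToyClient:
    """Toy stand-in for one process's open file on the GPU device.

    Hypothetical model for illustration only; none of these names exist in Xe.
    """

    def __init__(self, name):
        self.name = name
        self._handles = {}   # handle -> buffer, private to this client
        self._next_handle = 1

    def create_bo(self, size):
        """Allocate a buffer and return a handle valid only for this client."""
        handle = self._next_handle
        self._next_handle += 1
        # The "driver" stores opaque bytes; it never interprets them.
        self._handles[handle] = bytearray(size)
        return handle

    def lookup(self, handle):
        """Resolve a handle, refusing anything this client does not own."""
        try:
            return self._handles[handle]
        except KeyError:
            raise LookupError(f"{self.name}: unknown handle {handle}") from None


browser = ToyClient("browser")
game = ToyClient("game")

tex = browser.create_bo(4096)   # handle 1 in the browser's table
buf = game.create_bo(8192)      # also handle 1, but in the game's table

# Same number, different buffers: the handle only has meaning per client.
print(tex == buf, len(browser.lookup(tex)), len(game.lookup(buf)))
```

The point of the design is that a process never holds anything it could use to name another process's memory; a handle it did not create simply fails to resolve.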

2. And how does the driver use DRAM in a shared way to create VRAM?

A: Integrated GPUs use UMA (Unified Memory Architecture): there is no dedicated VRAM chip, only system RAM shared with the CPU. A complex mapping system lets the GPU see parts of the system RAM as if they were its own local memory. This dynamic mapping is a dense topic I intend to detail in future posts.
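As a first mental model of that mapping, here is a toy single-level page table in Python. Everything in it is an assumption for illustration: the 4 KiB page size, the `ToyPageTable` class and the addresses are mine, and real GPU page tables are multi-level hardware structures. What it does show is the core trick: the GPU sees one contiguous address range while the backing pages sit wherever the kernel found free RAM.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class ToyPageTable:
    """Hypothetical single-level table: GPU virtual page -> physical page."""

    def __init__(self):
        self._entries = {}

    def map(self, gpu_va, phys_addr, size):
        """Map `size` bytes of physical memory at a GPU virtual address."""
        if gpu_va % PAGE_SIZE or phys_addr % PAGE_SIZE or size % PAGE_SIZE:
            raise ValueError("addresses and size must be page-aligned")
        first_vpn = gpu_va // PAGE_SIZE
        first_pfn = phys_addr // PAGE_SIZE
        for i in range(size // PAGE_SIZE):
            self._entries[first_vpn + i] = first_pfn + i

    def translate(self, gpu_va):
        """Return the physical address behind a GPU virtual address."""
        vpn, offset = divmod(gpu_va, PAGE_SIZE)
        if vpn not in self._entries:
            raise LookupError(f"GPU page fault at {gpu_va:#x}")
        return self._entries[vpn] * PAGE_SIZE + offset


pt = ToyPageTable()
# Two physical pages that are far apart in system RAM...
pt.map(gpu_va=0x10000, phys_addr=0x7F000, size=PAGE_SIZE)
pt.map(gpu_va=0x11000, phys_addr=0x23000, size=PAGE_SIZE)
# ...look like one contiguous 8 KiB buffer from the GPU's side.
print(hex(pt.translate(0x10010)), hex(pt.translate(0x11004)))
```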

3. How is a program’s interface transformed into pixels on screen?

A: This is a point I am still studying in depth, but it involves resources inherited and shared with i915. The driver is responsible for managing BOs (Buffer Objects), chunks of memory containing raw data, and orchestrating their display through KMS (Kernel Mode Setting), controlling which monitor to update and at what frequency.
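To get a feel for what one of those buffers costs, the arithmetic for a single scanout buffer can be sketched in Python. The numbers are assumptions for the example: a 2880x1800 panel, 32 bits per pixel, and a 64-byte row alignment that I picked arbitrarily (real hardware has its own stride and tiling rules). `framebuffer_layout` is a hypothetical helper, not a kernel or libdrm function.

```python
def framebuffer_layout(width, height, bits_per_pixel=32, pitch_align=64):
    """Return (pitch, size) in bytes for a linear framebuffer.

    pitch is the length of one row in memory, rounded up to `pitch_align`
    (an assumed alignment; actual hardware requirements differ).
    """
    bytes_per_row = width * bits_per_pixel // 8
    pitch = -(-bytes_per_row // pitch_align) * pitch_align  # round up
    return pitch, pitch * height


pitch, size = framebuffer_layout(2880, 1800)
print(pitch, size)   # 11520 20736000

# Rough memory traffic just to scan that buffer out 60 times per second,
# ignoring blanking intervals and any compression.
print(size * 60)     # 1244160000
```

About 20 MB per frame and over 1 GB/s of reads just to keep the panel refreshed, before anything is rendered, which helps explain why display and memory management are so tightly coupled on a shared-memory iGPU.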

Answering those questions led me to another conclusion: if there are mappings for processes, there are execution contexts. And what controls the switching of those contexts at the software level is, again, the driver.

To better visualise where DRM Xe fits into this flow, I drew this macro pipeline:

Basic rendering pipeline, showing the location of DRM Xe between User Space and Hardware.

Interestingly, although the system sees the iGPU as a PCI device, in Lunar Lake this communication bypasses physical PCIe wires and runs through an internal on-die fabric. That explains why a bug here brings the system down so quickly: the GPU shares the same main data highway as the CPU.
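You can see the "PCI device" view from user space without touching the driver at all. The sketch below, in Python, walks the standard Linux sysfs layout under `/sys/class/drm`, where each `cardN/device` directory points at the underlying device, with a `vendor` file and a `driver` symlink. The function name `list_drm_cards` is mine, and the `sysfs_root` parameter exists only so the code can be pointed at a fake tree.

```python
import os
import re


def list_drm_cards(sysfs_root="/sys/class/drm"):
    """Map each DRM card to its bound kernel driver and PCI vendor ID."""
    cards = {}
    if not os.path.isdir(sysfs_root):
        return cards  # not Linux, or no DRM devices exposed
    for entry in sorted(os.listdir(sysfs_root)):
        # Skip connector entries such as card0-eDP-1 and render nodes.
        if not re.fullmatch(r"card\d+", entry):
            continue
        device = os.path.join(sysfs_root, entry, "device")
        driver_link = os.path.join(device, "driver")
        driver = None
        if os.path.exists(driver_link):
            # The symlink target's last path component is the driver name.
            driver = os.path.basename(os.path.realpath(driver_link))
        vendor = None
        try:
            with open(os.path.join(device, "vendor")) as f:
                vendor = f.read().strip()
        except OSError:
            pass  # non-PCI devices may not expose a vendor file
        cards[entry] = {"driver": driver, "vendor": vendor}
    return cards
```

Calling `list_drm_cards()` on a machine running Xe should report `xe` as the driver and `0x8086` (Intel's PCI vendor ID) as the vendor, though I would treat the exact output as something to check on your own hardware.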

It is important to remember that this “resource manager” role of the driver is vital not just for video, but also for general-purpose parallel computing via GPU kernels, an essential feature for Machine Learning models, for example.

The path to contribution

But how do you get into this game? I found that development for new contributors happens in a tree focused on integration: drm-tip. Although there are internal processes that keep the mainline kernel in sync with this tree, it is in drm-tip that new features are validated.

Intel’s own engineers review patches and suggest improvements. To ensure stability, there is a massive test suite (like IGT GPU Tools) and a repository for running automated live tests on real hardware, validating each change before it is even merged into the main code.

I know this is only the first post on this topic and that much remains to be explained. My biggest challenge right now is exactly this vastness of details, but the goal is clear: document everything until I land my first patch in DRM.

Follow along on this journey.
