Using EGL and the dma_buf kernel framework to associate two textures with the contents of the same buffer without copy taking place

I’ve spent the last few weeks experimenting with EGL/GLESv2 as part of my work for the WebKit (Browsers) team of Igalia. One thing I wanted to familiarize myself with was using shared DMA buffers to avoid copying textures in graphics programs.

I’ve been experimenting with the dma_buf API, which is a generic Linux kernel framework for sharing buffers for hardware access across multiple device drivers and subsystems, using EGL and GLESv2.

Buffer sharing using the dma-buf mechanism

Let’s first see how the dma_buf Linux kernel framework could be used for content sharing, a generic case:

When a driver A (importer) wants to use buffers created by a driver B (exporter):

▶ Driver B (exporter) must implement the dma_buf API operations, allocate and share the buffer, decide on the actual backing storage when the allocation happens, and take care of any migrations of the scatterlist for all shared users (importers) of the buffer.

▶ Driver A (importer) doesn’t need to worry about how the buffer is allocated or where, but needs a mechanism to get access to the scatterlist that makes up this buffer in memory mapped into its own address space, so it can access the same area of memory.

Note that the importer and the exporter can be the same driver.

Some libraries that could be used to make sharing easier

EGL provides its own mechanisms to export and import the DMA buffers using a file descriptor through some extensions. I’ve used them in my experiments where I created multiple GLESv2 contexts and textures and made them share their data using dma_buffers instead of copy operations.

A note: From the description right above it might sound like shared context would be a more generic approach for these experiments. But according to EGL reference there’s a quite important EGL restriction to consider here:
“[…] all rendering contexts that share data must themselves exist in the same address space. Two rendering contexts share an address space if both are owned by a single process […]”
As we were looking for a more flexible mechanism that would fit multiple processes too, using DMA buffers was a much preferable and generic solution (proposed by my colleague Žan Doberšek).

There is another useful library that can be used to allocate and share DMA buffers: libgbm. GBM (Generic Buffer Manager) is a memory allocator for device buffers. It provides an API for other buffer operations too (e.g. memory mapping). A libgbm implementation can be found in Mesa.

I might write more about this library in some follow up post. For the moment let’s focus on my very first experiment with DMA buffers that was a quite simple (and single-process) case of buffer sharing with EGL.

EGL extensions to import and export dma buffers

There are some EGL extensions that can be used to share dma buffers:

EGL_MESA_image_dma_buf_export allows creating a Linux dma_buf file descriptor (or multiple in the case of a multi-plane YUV image) from an EGLImage, which can then be used to create another EGLImage using EGL_EXT_image_dma_buf_import.

EGL_EXT_image_dma_buf_import_modifiers can be used to import an image’s modifiers used for tiling, compression, and additional non-linear modes. It also adds support for a fourth auxiliary plane, and queries for the implementation-supported types.

For these extensions to support imports of GL_TEXTURE_EXTERNAL_OES a GLESv2 version that supports GL_OES_EGL_image_external is also required to be present.

In the example that follows I’ve used them to export some texture data from a context and import them into another without copying them.
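
Before relying on any of these extensions it is worth checking that they are actually advertised. The following is a minimal sketch of such a check, assuming the extension list is the usual space-separated string (for example the one returned by eglQueryString(dpy, EGL_EXTENSIONS)). The helper name has_extension is hypothetical and is not part of the example program.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears as a whole, space-separated token in
 * `list`. A plain strstr() is not enough: it would report
 * EGL_EXT_image_dma_buf_import as present when only
 * EGL_EXT_image_dma_buf_import_modifiers is listed. */
static bool has_extension(const char *list, const char *name)
{
    assert(name != NULL);
    size_t len = strlen(name);
    if (!list || len == 0)
        return false;

    for (const char *p = list; (p = strstr(p, name)) != NULL; p += len) {
        bool starts_token = (p == list) || (p[-1] == ' ');
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return true;
    }
    return false;
}
```

The whole-token comparison matters here because one of the extension names used in this post is a prefix of another.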

A simple use case of `EGL/dma_buf` extensions

I’ve performed a few experiments with EGL and DMA buffers. In my first and simplest one, I used the native EGL/GLESv2 driver to allocate and exchange the dma_buf buffers between two different contexts.

I’ve written a program where two contexts (let’s call them A and B) allocate two GLESv2 textures (texA and texB).

Context A creates an EGLImage imgA from texA and exports the file descriptor of the corresponding dma_buf buffer.
Context B imports the dma_buf to its own EGLImage imgB using its file descriptor.
Context B renders to texB / imgB a XOR pattern.
Context A displays texA in an X11 window.

The expected result is to see the texB contents appear on the X11 window surface where I’ve mapped texA: as the same dma_buf serves as backing storage for both textures, rendering a pattern on texture texB should fill the dma_buf, and mapping texA on the X11 window’s surface should make that pattern visible on screen, since both textures now share the same buffer contents.

No copy of texB data was required to fill texA.

In this example, I’ve used a single process only to keep things simple. To extend the program to use multiple processes (one per context for example), we would need some sort of inter-process communication (e.g.: unix sockets) to exchange the dma_buf file descriptor. But the rest of the code would have been the same.

[UPDATE:] After writing this post, I found a very short example that does exactly this: exchanges a dma_buf FD across different processes. There’s a link to it at the end of this post. As you can see, the rest of the code is very similar to the example I’m going to describe right away.
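
For reference, exchanging a file descriptor over a unix socket is usually done with SCM_RIGHTS ancillary data. What follows is only a sketch of that mechanism under POSIX/Linux assumptions. The helpers send_fd and recv_fd are hypothetical names and are not taken from the example program or from the linked post.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Sends descriptor `fd` over the connected unix socket `sock` as
 * SCM_RIGHTS ancillary data. Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    assert(sock >= 0 && fd >= 0);
    char dummy = 'F';   /* one byte of ordinary data carries the message */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        struct cmsghdr align;   /* keeps the control buffer aligned */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    memset(&ctrl, 0, sizeof ctrl);

    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receives a descriptor sent with send_fd(). Returns a descriptor that
 * is valid in the receiving process, or -1 on error. */
static int recv_fd(int sock)
{
    assert(sock >= 0);
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    memset(&ctrl, 0, sizeof ctrl);

    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In a multi-process variant of this experiment, the exporting process would presumably call send_fd with the dma_buf file descriptor, and send the fourcc, stride and offset values as ordinary data alongside it.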

Example

The full source code of this example can be found in this repository branch: https://github.com/hikiko/shctx/tree/wip/egl-to-egl-dma-exchange-working. (The branch contains some extra files I’ve used in my previous experiments and is in draft state. The most relevant code is in src/main.cc.)

Let’s start with an overview of main:

int main(int argc, char **argv)
{
    if (!init()) {
        fprintf(stderr, "Failed to initialize contexts.\n");
        return 1;
    }

    if (!gl_init())
        return 1;

    for (;;) {
        XEvent xev;
        XNextEvent(xdpy, &xev);
        if (!handle_xevent(&xev))
            break;
        if (redraw_pending) {
            redraw_pending = false;
            display();
        }
    }

    cleanup();
    return 0;
}

As you can see, main calls very few functions: init, which initializes EGL and creates the contexts and the X11 windows (we use 2 windows, one per context, but the one for the context that is not rendered on screen is hidden); gl_init, which initializes the GL structs; handle_xevent, which performs the event handling; and display, which is the GL draw function.

I’ve used the EGL extensions I’ve mentioned above in gl_init in file: src/main.cc:

First I’ve generated my pixels. There are many ways to do that but I preferred to simply fill a pixel array (using software code) with my favorite XOR pattern to keep things simple:

    // xor image
    unsigned char *pptr = pixels;
    for (int i = 0; i < 256; i++) {
        for (int j = 0; j < 256; j++) {
            int r = (i ^ j);
            int g = (i ^ j) << 1;
            int b = (i ^ j) << 2;

            *pptr++ = r;
            *pptr++ = g;
            *pptr++ = b;
            *pptr++ = 255;
        }
    }
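
One detail that is easy to miss in the loop above: the green and blue values are shifted left and can exceed 255, so they wrap around when stored into an unsigned char. The sketch below makes that wrap explicit. The function name fill_xor_pattern and its width/height parameters are hypothetical (the original code works directly on a 256x256 array).

```c
#include <assert.h>
#include <stddef.h>

/* Fills a tightly packed RGBA8 buffer of w*h pixels with the XOR pattern.
 * The "& 0xff" masks spell out the truncation that the original loop
 * gets implicitly from the int to unsigned char conversion. */
static void fill_xor_pattern(unsigned char *pixels, int w, int h)
{
    assert(pixels != NULL && w > 0 && h > 0);
    unsigned char *pptr = pixels;
    for (int i = 0; i < h; i++) {
        for (int j = 0; j < w; j++) {
            unsigned int v = (unsigned int)(i ^ j);
            *pptr++ = (unsigned char)(v & 0xff);         /* red */
            *pptr++ = (unsigned char)((v << 1) & 0xff);  /* green, wraps past 255 */
            *pptr++ = (unsigned char)((v << 2) & 0xff);  /* blue, wraps past 255 */
            *pptr++ = 255;                               /* alpha, opaque */
        }
    }
}
```

For a 256x256 buffer this should produce the same bytes as the loop above.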

Then, I made current the context that will draw this pattern in a visible X11 window (ctxA), and:

created the vertex buffers and the shader program to render a quad on screen
created an empty texture texA I’d use to display the XOR pattern of the dma_buf

    // Context A that draws
    eglMakeCurrent(ctxA.dpy, ctxA.surf, ctxA.surf, ctxA.ctx);
    static const float vertices[] = {
        1.0, 1.0,
        1.0, 0.0,
        0.0, 1.0,
        0.0, 0.0
    };

    glGenBuffers(1, &gl_vbo);
    glBindBuffer(GL_ARRAY_BUFFER, gl_vbo);
    glBufferData(GL_ARRAY_BUFFER, sizeof vertices, vertices, GL_STATIC_DRAW);

    gl_prog = create_program_load("data/texmap.vert", "data/texmap.frag");
    glClearColor(1.0, 1.0, 0.0, 1.0);

    glGenTextures(1, &texA);
    glBindTexture(GL_TEXTURE_2D, texA);

    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 256, 256, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, 0);

Note that I didn’t set any pixels in glTexImage2D above (I’ve passed a null pointer to data because I wanted that texture to be empty as I plan to fill the backing storage dma_buf from the other context).

I’ve then created an EGLImage from this texture and exported the file descriptor and some other information related to the backing dma_buf of texA:

    EGLImage imgA = eglCreateImage(ctxA.dpy, ctxA.ctx, EGL_GL_TEXTURE_2D, (EGLClientBuffer)(uint64_t)texA, 0);
    assert(imgA != EGL_NO_IMAGE);

    PFNEGLEXPORTDMABUFIMAGEQUERYMESAPROC eglExportDMABUFImageQueryMESA =
        (PFNEGLEXPORTDMABUFIMAGEQUERYMESAPROC)eglGetProcAddress("eglExportDMABUFImageQueryMESA");
    PFNEGLEXPORTDMABUFIMAGEMESAPROC eglExportDMABUFImageMESA =
        (PFNEGLEXPORTDMABUFIMAGEMESAPROC)eglGetProcAddress("eglExportDMABUFImageMESA");

    EGLBoolean ret;
    ret = eglExportDMABUFImageQueryMESA(ctxA.dpy,
                                        imgA,
                                        &gl_dma_info.fourcc,
                                        &gl_dma_info.num_planes,
                                        &gl_dma_info.modifiers);
    if (!ret) {
        fprintf(stderr, "eglExportDMABUFImageQueryMESA failed.\n");
        return false;
    }
    ret = eglExportDMABUFImageMESA(ctxA.dpy,
                                   imgA,
                                   &dmabuf_fd,
                                   &gl_dma_info.stride,
                                   &gl_dma_info.offset);
    if (!ret) {
        fprintf(stderr, "eglExportDMABUFImageMESA failed.\n");
        return false;
    }

The following dma_buf storage related information is useful to import the dma_buf from the other context (ctxB):

static int dmabuf_fd;

struct tex_storage_info {
    EGLint fourcc;
    EGLint num_planes;
    EGLuint64KHR modifiers;
    EGLint offset;
    EGLint stride;
};
static struct tex_storage_info gl_dma_info;

where dmabuf_fd is the shared dma_buf file descriptor and the tex_storage_info gl_dma_info struct is storing the information about the dma_buf storage.

eglExportDMABUFImageQueryMESA above is used to retrieve the pixel format of the buffer (as specified by drm_fourcc.h), the number of planes in the image, and the Linux DRM modifiers.



In the DRM subsystem, framebuffer pixel formats are described using the fourcc codes defined in include/uapi/drm/drm_fourcc.h. In addition to the fourcc code, a Format Modifier may optionally be provided, in order to further describe the buffer's format - for example tiling or compression.

File: include/uapi/drm/drm_fourcc.h contains a big DOC comment with more information about modifiers.

Note that <fourcc>, <num_planes> and <modifiers> may be NULL, in which case no value is retrieved.

eglExportDMABUFImageMESA retrieves the dma_buf file descriptors, strides and offsets for the image. The caller should pass arrays sized according to the num_planes values retrieved previously. Passing arrays of the wrong size will have undefined results. If the number of fds is less than the number of planes, then subsequent fd slots should contain -1.
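
The example program passes the addresses of single integers, which is enough for a single-plane image. For a multi-plane image the arrays would have to be sized from num_planes first. A possible sketch of that bookkeeping follows. The struct plane_arrays and its helpers are hypothetical, and plain int is used on the assumption that EGLint is a 32-bit int on the target platform.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for the per-plane export results. */
struct plane_arrays {
    int num_planes;
    int *fds;       /* one slot per plane, -1 while the slot is unused */
    int *strides;
    int *offsets;
};

/* Allocates arrays sized for `num_planes` planes and marks every fd slot
 * as unused. Returns 0 on success, -1 on allocation failure. */
static int plane_arrays_init(struct plane_arrays *pa, int num_planes)
{
    assert(pa != NULL && num_planes > 0);
    pa->num_planes = num_planes;
    pa->fds = malloc((size_t)num_planes * sizeof *pa->fds);
    pa->strides = calloc((size_t)num_planes, sizeof *pa->strides);
    pa->offsets = calloc((size_t)num_planes, sizeof *pa->offsets);
    if (!pa->fds || !pa->strides || !pa->offsets) {
        free(pa->fds);
        free(pa->strides);
        free(pa->offsets);
        pa->fds = pa->strides = pa->offsets = NULL;
        pa->num_planes = 0;
        return -1;
    }
    for (int i = 0; i < num_planes; i++)
        pa->fds[i] = -1;
    return 0;
}

/* Releases the arrays. It does not close the descriptors stored in them. */
static void plane_arrays_free(struct plane_arrays *pa)
{
    assert(pa != NULL);
    free(pa->fds);
    free(pa->strides);
    free(pa->offsets);
    pa->fds = pa->strides = pa->offsets = NULL;
    pa->num_planes = 0;
}
```

With this in place, the arrays pa.fds, pa.strides and pa.offsets could be passed to eglExportDMABUFImageMESA in place of the three single-integer addresses, after initializing them with the num_planes value from the query call.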

So, at this point in ctxA we have a texture texA that is backed by a dma_buf which we accessed through EGL image imgA.

Let’s configure context ctxB. While still in gl_init, I’ve imported the dma_buf file descriptor in the other context (ctxB) like this:

    // Context B that fills the texture texB with a XOR pattern
    eglMakeCurrent(ctxB.dpy, ctxB.surf, ctxB.surf, ctxB.ctx);
    EGLAttrib atts[] = {
        // W, H used in TexImage2D above!
        EGL_WIDTH, 256,
        EGL_HEIGHT, 256,
        EGL_LINUX_DRM_FOURCC_EXT, gl_dma_info.fourcc,
        EGL_DMA_BUF_PLANE0_FD_EXT, dmabuf_fd,
        EGL_DMA_BUF_PLANE0_OFFSET_EXT, gl_dma_info.offset,
        EGL_DMA_BUF_PLANE0_PITCH_EXT, gl_dma_info.stride,
        EGL_NONE,
    };
    EGLImageKHR imgB = eglCreateImage(ctxB.dpy, EGL_NO_CONTEXT, EGL_LINUX_DMA_BUF_EXT, (EGLClientBuffer)(uint64_t)0, atts);
    assert(imgB != EGL_NO_IMAGE);

    PFNGLEGLIMAGETARGETTEXTURE2DOESPROC glEGLImageTargetTexture2DOES =
        (PFNGLEGLIMAGETARGETTEXTURE2DOESPROC)eglGetProcAddress("glEGLImageTargetTexture2DOES");
    assert(glEGLImageTargetTexture2DOES);

    glGenTextures(1, &texB);
    glBindTexture(GL_TEXTURE_2D, texB);
    glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, imgB);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 256, 256, GL_RGBA, GL_UNSIGNED_BYTE, pixels);

In the snippet above, I’ve used the fourcc, offset and stride information I had retrieved from imgA/ctxA before, as well as the texA width and height, in an EGL attribute list and created an imgB similar to imgA.

EGL_LINUX_DMA_BUF_EXT is used to denote that we are going to import an external dma_buf buffer.

Then I’ve called glEGLImageTargetTexture2DOES (an entry point of the OES_EGL_image extension, on which OES_EGL_image_external builds) to create a texture from imgB that uses the same dma_buf as imgA.

Finally, I’ve filled texB with the XOR pattern I’ve described above.

Then in the display loop (function display of src/main.cc) I’ve made context ctxA current and displayed texA:

    // context A draws using texA, we should see the pattern we
    // wrote in tex B
    eglMakeCurrent(ctxA.dpy, ctxA.surf, ctxA.surf, ctxA.ctx);
    bind_program(gl_prog);
    glBindTexture(GL_TEXTURE_2D, texA);
    glBindBuffer(GL_ARRAY_BUFFER, gl_vbo);
    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, 0);
    glEnableVertexAttribArray(0);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
    eglSwapBuffers(ctxA.dpy, ctxA.surf);

I’ve used the shader program and vertices that I’ve created when ctxA was current to draw a quad where I had mapped texA.

Results

When the X11 window that corresponds to the EGL surface of context ctxA is displayed we see it contains the XOR pattern we used to fill texB from ctxB. This is expected because texB and texA share the same backing storage (the dma_buf we exported using imgA and imported using imgB). By filling the pixels of texB, we fill the dma_buf backing storage of texA as well.

We filled texB with a XOR pattern and then displayed texA. XOR pattern appeared on the screen.

Source Code

The snippets above came from: src/main.cc, and the following functions were not mentioned above:

init: calls egl_init, creates 2 windows (one per context), one of which is visible (and corresponds to context ctxA that draws the shared dma_buf contents on screen) and the other is hidden (corresponds to ctxB that fills the shared DMA buffer).
egl_init: called from init, initializes EGL, creates the contexts and the surfaces.
x_create_window: creates an X11 window with a visual matching the EGL configuration.
handle_xevent: event handling using X11 (mostly keyboard handling).

The full source code can be found in the repository linked in [1] below.

Links

[1] Example source code (draft):
https://github.com/hikiko/shctx/tree/wip/egl-to-egl-dma-exchange-working

[2] Buffer Sharing and Synchronization (The Linux Kernel)
[3] EGL Reference pages
[4] GLESv2 Reference pages
[5] EGL_MESA_image_dma_buf_export specification
[7] EGL_EXT_image_dma_buf_import specification
[8] EGL_EXT_image_dma_buf_import_modifiers specification
[9] Generic Buffer Manager in Wikipedia
[10] OES_EGL_image_external specification
[11] linux/master/include/uapi/drm/drm_fourcc.h

[12] [UPDATED]: I was about to publish this post when I found a nice example describing a simple case of inter-process communication to exchange the dma_buf fd. As it’s what I was planning to do next, it deserves a place in my reference links, although I haven’t talked about IPC anywhere. You can read about it in Blaztinn’s blog!

Well, this post was quite long! But you’ve finally reached THE END of it!! 🙂
See you next time!