An efficient image format for SDL

2022-09-28

Superfluous Returnz is a 2D video game with a very classic “cartoon” style: setting aside the level descriptions and the sound, all the content to be loaded consists of 2D images (animations being simply sequences of images).

To avoid excessively long loading times between levels and excessive disk usage, choosing the right image format is critical.

Note: the summary of the benchmark is given at the end of the article.

Obvious options

“Basic” SDL can only read BMP, the “historic” raster image format, released by Microsoft and IBM in 1990. This is probably one of the simplest image formats: it is simply a description of each pixel successively by a certain number of bytes (in general, one per channel, i.e. 4 if an RGB format + the alpha transparency channel is used).

The big advantage of BMP is that it is extremely easy to “decode” for SDL: basically, to send the image to the GPU, you just have to copy the memory block occupied by the file, since the format is already unpacked.

On the other hand, BMP is heavy, very heavy on disk, since it uses no compression at all. Just to store ONE non-scrolling level background at HD resolution (1920x1080 pixels, my game's base resolution), the BMP format requires close to 6 MB! This means the game would quickly end up weighing several gigabytes. That can be justified for an AAA game with huge 3D universes, not for an indie 2D game like mine.
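The size estimate above is plain arithmetic: width times height times the number of bytes per pixel. A minimal sketch, where the function name `raw_image_bytes` is made up for illustration and the small BMP file header and any row padding are ignored:

```c
#include <stdint.h>

/* Hypothetical helper: bytes needed to store an uncompressed image,
   ignoring the file header and any row padding. */
uint64_t raw_image_bytes (uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
  return (uint64_t)width * height * bytes_per_pixel;
}
```

For a 1920x1080 image, this gives 6,220,800 bytes with 3 bytes per pixel (RGB) and 8,294,400 bytes with 4 bytes per pixel (RGB + alpha).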

Benchmark for game images in BMP format:

Total memory space: 1 GiB
Loading time: 0.59 s

Anyway. Thanks to the SDL Image module, we can fortunately use other formats that take less disk space. The most obvious format for cartoons with large areas of flat color is PNG¹. The size is drastically reduced compared to BMP (almost 30 times smaller!), but running my benchmark shows that the loading time is considerably longer...

Benchmark for in-game images in PNG format:

Total memory space: 36 MiB
Loading time: 3.54 s

At first glance, this seems counter-intuitive: we know that disk accesses are slow, so the simple fact that PNG is almost 30 times lighter than BMP should make it faster to read, right?

With the SDL functions, we can separate the “reading from disk” step from the “creating the SDL image from the loaded data” step. And then we understand what's going on:

Benchmark for game images in BMP format:

Total memory space: 1 GiB
Time to read file from disk: 0.22 s
Time to create image: 0.36 s
(Total time: 0.59 s)

Benchmark for in-game images in PNG format:

Total memory space: 36 MiB
Time to read file from disk: 0.01 s
Time to create image: 3.53 s
(Total time: 3.54 s)

There it is: the huge gain in terms of reading the file from disk is completely crushed by the fact that creating the image now takes ages! And for a good reason: PNG compression is meant to be disk space efficient, but not necessarily time efficient. Decompressing a PNG image takes time, although we usually don't realize it!
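To illustrate how the two steps can be timed separately, here is a minimal sketch in plain C11. It is not the code of my benchmark: `now_seconds`, `read_whole_file`, `time_load` and the `decode_fn` callback are hypothetical names. With SDL, the decoding callback could for example wrap the buffer with `SDL_RWFromConstMem` and hand it to `IMG_Load_RW`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Wall-clock time in seconds (timespec_get requires C11). */
double now_seconds (void)
{
  struct timespec ts;
  timespec_get (&ts, TIME_UTC);
  return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper: reads a whole file into a malloc'ed buffer.
   Returns NULL on failure; on success, stores the file size in *size. */
unsigned char* read_whole_file (const char* path, size_t* size)
{
  FILE* f = fopen (path, "rb");
  if (!f)
    return NULL;

  long length = -1;
  if (fseek (f, 0, SEEK_END) == 0)
    length = ftell (f);
  if (length < 0 || fseek (f, 0, SEEK_SET) != 0)
  {
    fclose (f);
    return NULL;
  }

  /* Allocate at least one byte so that an empty file is not an error */
  unsigned char* buffer = malloc (length > 0 ? (size_t)length : 1);
  if (buffer && fread (buffer, 1, (size_t)length, f) != (size_t)length)
  {
    free (buffer);
    buffer = NULL;
  }
  fclose (f);

  if (buffer)
    *size = (size_t)length;
  return buffer;
}

/* Stand-in for whatever turns the raw bytes into an image. */
typedef int (*decode_fn) (const unsigned char* data, size_t size);

/* Measures the disk read and the decoding separately.
   Returns -1 if the file cannot be read, else the status of `decode`. */
int time_load (const char* path, decode_fn decode,
               double* read_seconds, double* decode_seconds)
{
  size_t size = 0;
  double t0 = now_seconds ();
  unsigned char* data = read_whole_file (path, &size);
  double t1 = now_seconds ();
  if (!data)
    return -1;

  int status = decode (data, size);
  double t2 = now_seconds ();
  free (data);

  *read_seconds = t1 - t0;
  *decode_seconds = t2 - t1;
  return status;
}
```

Summing the two durations over all the images of the game gives the kind of split shown in the benchmarks above.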

In everyday use, you typically load one image at a time and look at it: it doesn't matter then that the loading time is a few milliseconds instead of a few microseconds, you don't see the difference. For a game where you have to load lots of HD images, sometimes animations with many frames per second, it's a different story: the loading time for a level can quickly reach several seconds.

You may think a few seconds isn't so bad, but then again, for a small 2D game like mine, that seems overkill to me: you'd expect levels to load almost instantly.

So let's try to do better.

First, let's note that while PNG is basically a non-destructive compression format (each pixel of the image is exactly the same as that of the same uncompressed image in BMP format), it is nevertheless possible to apply algorithms that slightly modify the image to make it more efficiently compressible by PNG. This is for example what pngquant does, by calculating an optimal reduced color “palette” for your image, even if it means slightly changing some color pixels.

The results are impressive in terms of disk space: the images take up 4.5 times less space than classic PNGs, or 125 times less space than BMPs!

The problem, unfortunately, is again the loading time... While it is faster than classic PNG, it is still twice as slow as BMP.

Benchmark for in-game PNG images powered by pngquant:

Total memory space: 8 MiB
Time to read file from disk: 0.01 s
Time to create image: 1.22 s
(Total time: 1.23 s)

Let's also add that these results are obtained by applying a strong quantization with pngquant, at the cost of an alteration of the image colors that becomes quite visible. Not great.
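To make the idea of a reduced palette concrete, here is a minimal sketch of the simplest possible mapping: each pixel is replaced by the closest color of a given palette. This is only an illustration of why colors get altered, not pngquant's actual algorithm, which is more sophisticated (in particular in how it chooses the palette). The names `rgb_color` and `nearest_palette_index` are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb_color;

/* Returns the index of the palette entry closest to `c`, using the
   squared Euclidean distance in RGB. `count` must be at least 1. */
size_t nearest_palette_index (rgb_color c, const rgb_color* palette, size_t count)
{
  size_t best = 0;
  long best_distance = -1;

  for (size_t i = 0; i < count; ++i)
  {
    long dr = (long)c.r - palette[i].r;
    long dg = (long)c.g - palette[i].g;
    long db = (long)c.b - palette[i].b;
    long distance = dr * dr + dg * dg + db * db;

    if (best_distance < 0 || distance < best_distance)
    {
      best_distance = distance;
      best = i;
    }
  }

  return best;
}
```

The smaller the palette, the further a pixel can end up from its original color, which is where the visible alteration comes from.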

Well, it looks like we can forget PNG if we want fast load times. However, using BMP is still out of the question because of the crazy weight of the images. Other formats supported by SDL did not give better results. So what can we do?

LZ4: the fast compressor

If BMP allows for the rapid construction of images but at the cost of huge disk space, then why not compress a BMP to get the best of both worlds? In practice, we might just run into the same problem as with PNG: decompression times so high that they cancel out the gain in disk space and disk read time.

Enter LZ4, a compression algorithm that, as Wikipedia tells us, is “focused on compression and decompression speed”. Of course, this comes at the cost of compression ratio: the files are not as small as with 7zip, for example, but it is worth a try.

Rather than compressing BMPs, I thought it would be even more efficient to directly compress the memory contents of an SDL image: thus, the creation of the image will come down to a pure copy of a memory block: we can hardly do it faster.

SDL Surfaces VS Textures

An important technical detail: what we compress is the SDL_Surface structure, which contains the decompressed image in a format similar to BMP. To display it, we convert it to an SDL_Texture, which is a format used directly by the graphics card. This last format depends on the GPU and the driver you are using, and it is neither documented nor publicly accessible, so you cannot directly compress and store an SDL_Texture.

But we can optimize our SDL_Surface so that the construction of an associated SDL_Texture is fast: indeed, if our SDL_Surface uses a storage format (the order of the different channels and the number of bytes) which is not natively supported by our graphics driver, then we will pay for a conversion.

SDL allows us to know the natively supported formats thanks to the information contained in SDL_RendererInfo. So I experimented with the few platforms available to me:

Platform	Linux Mint	Windows 10 VM	Android	Mac OS
Name	opengl	direct3d	opengles2	opengl
`SDL_PIXELFORMAT_ARGB8888`	X	X	X	X
`SDL_PIXELFORMAT_ABGR8888`	X		X	X
`SDL_PIXELFORMAT_RGB888`	X		X	X
`SDL_PIXELFORMAT_BGR888`	X		X	X
`SDL_PIXELFORMAT_YV12`	X	X	X	X
`SDL_PIXELFORMAT_IYUV`	X	X	X	X
`SDL_PIXELFORMAT_NV12`	X		X	X
`SDL_PIXELFORMAT_NV21`	X		X	X
`SDL_PIXELFORMAT_UNKNOWN`			X

So it seems that SDL_PIXELFORMAT_ARGB8888 is the most universally supported RGB+transparency format (an alpha transparency channel + the RGB channels, in that order, each taking 8 bits, so one byte). It is therefore this format that we will compress with LZ4.
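As an illustration of this channel layout, here is a minimal sketch of how four 8-bit channels fit into one 32-bit ARGB8888 pixel value. The function names are made up for this example; SDL provides its own facilities for this.

```c
#include <stdint.h>

/* Packs four 8-bit channels into one 32-bit ARGB8888 pixel value:
   alpha in the most significant byte, then red, green and blue. */
uint32_t pack_argb8888 (uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
  return ((uint32_t)a << 24) | ((uint32_t)r << 16)
       | ((uint32_t)g << 8) | (uint32_t)b;
}

/* Extracts the alpha channel from a packed ARGB8888 pixel value. */
uint8_t alpha_of_argb8888 (uint32_t pixel)
{
  return (uint8_t)(pixel >> 24);
}
```

Note that the order applies to the 32-bit value: on a little-endian machine, the bytes of each pixel therefore appear in memory as B, G, R, A.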

SDL+LZ4: an optimal image format

The benchmark results are clear: images compressed in LZ4 format give the best loading time, 1.4 times faster than BMP (and 8 times faster than PNG), while keeping an acceptable size, only 20% larger than PNG (and about 24 times smaller than BMP).

Benchmark for game images in LZ4 format:

Total memory space: 43 MiB
Time to read file from disk: 0.01 s
Time to create image: 0.41 s
(Total time: 0.42 s)
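The ratios quoted above can be recomputed from the benchmark figures. A minimal sketch, with made-up names (`benchmark`, `speedup`, `size_ratio`):

```c
/* One line of the benchmark: total loading time and size on disk. */
typedef struct
{
  const char* name;
  double total_seconds;
  double size_mib;
} benchmark;

/* How many times faster `fast` loads compared to `slow`. */
double speedup (benchmark slow, benchmark fast)
{
  return slow.total_seconds / fast.total_seconds;
}

/* How many times larger `big` is compared to `small`. */
double size_ratio (benchmark big, benchmark small)
{
  return big.size_mib / small.size_mib;
}
```

For example, with `{"BMP", 0.59, 1046.0}` and `{"LZ4", 0.42, 43.0}` (the values from the benchmark summary at the end of the article), `speedup` gives roughly 1.4 and `size_ratio` roughly 24.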

Great, then! And to further improve the results, we can use an HC (high compression) variant of the LZ4 compression algorithm: this variant is very slow to compress, but it offers even smaller file sizes while maintaining a fast decompression time. As the (slower) compression will only be done once (by me) and the size reduction and decompression will benefit everyone playing, it's worth it!

Benchmark for game images in LZ4HC format:

Total memory space: 33 MiB
Time to read file from disk: 0.01 s
Time to create image: 0.38 s
(Total time: 0.39 s)

The gain is not huge compared to classic LZ4, but there is no reason not to take advantage of it. Note that the data is then even lighter than regular PNG!

Implementation

Compression is very simple: we read our image in the input format (BMP, PNG, whatever), create our SDL_Surface and convert it, if necessary, to SDL_PIXELFORMAT_ARGB8888, then compress it.

In practice, we will first write the dimensions of our image (as well as the code of the pixel format, to be able to possibly support formats other than SDL_PIXELFORMAT_ARGB8888). I wrote a function using the same syntax as the functions in SDL Image:


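#include <stdlib.h>
#include <SDL.h>
#include <lz4.h>
#include <lz4hc.h>

/* Layout written to dst: width (Uint16), height (Uint16), pixel format
   (Uint32), compressed size (int), then the LZ4-compressed pixels.
   Assumes the rows of the surface are contiguous in memory, i.e.
   pitch == w * BytesPerPixel. Returns 0 on success, -1 on error. */
int IMG_SaveLZ4_RW (SDL_Surface* surface, SDL_RWops* dst, int freedst, int hc)
{
  Uint16 width = (Uint16)(surface->w);
  Uint16 height = (Uint16)(surface->h);
  Uint32 surface_format = surface->format->format;
  Uint8 bpp = surface->format->BytesPerPixel;
  Uint32 uncompressed_size = (Uint32)width * height * bpp;

  const char* uncompressed_buffer = (const char*)(surface->pixels);
  int max_lz4_size = LZ4_compressBound ((int)uncompressed_size);
  char* compressed_buffer = malloc (max_lz4_size);
  int true_size = 0;
  int result = -1;

  /* Both LZ4 functions return the compressed size, or 0 on failure */
  if (compressed_buffer && hc)
    true_size = LZ4_compress_HC (uncompressed_buffer, compressed_buffer,
                                 (int)uncompressed_size, max_lz4_size,
                                 LZ4HC_CLEVEL_MAX);
  else if (compressed_buffer)
    true_size = LZ4_compress_default (uncompressed_buffer, compressed_buffer,
                                      (int)uncompressed_size, max_lz4_size);

  /* Write the header, then the compressed pixels */
  if (true_size > 0)
  {
    SDL_RWwrite (dst, &width, sizeof(width), 1);
    SDL_RWwrite (dst, &height, sizeof(height), 1);
    SDL_RWwrite (dst, &surface_format, sizeof(surface_format), 1);
    SDL_RWwrite (dst, &true_size, sizeof(true_size), 1);
    SDL_RWwrite (dst, compressed_buffer, 1, true_size);
    result = 0;
  }

  free (compressed_buffer);

  if (freedst)
    SDL_RWclose (dst);

  return result;
}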

We also give the function that saves directly to a file (and not to an SDL RW structure):


int IMG_SaveLZ4 (SDL_Surface* surface, const char* file, int hc)
{
  SDL_RWops* dst = SDL_RWFromFile (file, "wb");
  return (dst ? IMG_SaveLZ4_RW (surface, dst, 1, hc) : -1);
}
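One caveat: IMG_SaveLZ4_RW compresses width * height * BytesPerPixel bytes starting at surface->pixels, which assumes that the rows are stored back to back. SDL exposes the actual distance between two rows as surface->pitch, and for some formats it can be larger than the useful row size. With 32-bit formats such as SDL_PIXELFORMAT_ARGB8888, rows should already be tightly packed, so nothing is needed. If that were not the case, the pixels could be repacked first, as in this minimal sketch (`pack_rows` is a hypothetical helper, not part of SDL):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `height` rows of `row_bytes` useful bytes
   each, from a buffer whose rows start every `pitch` bytes, into a tightly
   packed destination of height * row_bytes bytes.
   Requires pitch >= row_bytes. */
void pack_rows (const unsigned char* src, size_t pitch,
                size_t row_bytes, size_t height, unsigned char* dst)
{
  for (size_t y = 0; y < height; ++y)
    memcpy (dst + y * row_bytes, src + y * pitch, row_bytes);
}
```

The packed buffer can then be passed to the LZ4 compression function instead of surface->pixels.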

Reading and decompressing is again very simple: we read the dimensions of the image and the pixel format, which allows us to allocate the desired SDL_Surface, then we decompress the memory block directly into the memory space allocated by SDL.


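/* Reads an image written by IMG_SaveLZ4_RW. Returns NULL on error. */
SDL_Surface* IMG_LoadLZ4_RW (SDL_RWops* src, int freesrc)
{
  Uint16 width = 0;
  Uint16 height = 0;
  Uint32 surface_format = 0;
  int compressed_size = 0;
  SDL_Surface* out = NULL;
  char* compressed_buffer = NULL;
  int ok = 0;

  /* Header: dimensions, pixel format and size of the compressed data.
     SDL_RWread returns the number of objects read (1 expected here). */
  if (SDL_RWread (src, &width, sizeof(width), 1) == 1
      && SDL_RWread (src, &height, sizeof(height), 1) == 1
      && SDL_RWread (src, &surface_format, sizeof(surface_format), 1) == 1
      && SDL_RWread (src, &compressed_size, sizeof(compressed_size), 1) == 1
      && compressed_size > 0)
  {
    out = SDL_CreateRGBSurfaceWithFormat (0, width, height, 32, surface_format);
    compressed_buffer = malloc (compressed_size);
  }

  if (out && compressed_buffer
      && SDL_RWread (src, compressed_buffer, 1, compressed_size)
         == (size_t)compressed_size)
  {
    /* Decompress directly into the pixel memory allocated by SDL */
    int uncompressed_size = (int)width * height * out->format->BytesPerPixel;
    char* uncompressed_buffer = (char*)(out->pixels);
    ok = (LZ4_decompress_safe (compressed_buffer, uncompressed_buffer,
                               compressed_size, uncompressed_size)
          == uncompressed_size);
  }

  /* On any failure, release the surface (SDL_FreeSurface accepts NULL) */
  if (!ok)
  {
    SDL_FreeSurface (out);
    out = NULL;
  }

  free (compressed_buffer);

  if (freesrc)
    SDL_RWclose (src);

  return out;
}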

Likewise, an additional function loads directly from a file:


SDL_Surface* IMG_LoadLZ4 (const char* file)
{
  SDL_RWops* src = SDL_RWFromFile (file, "rb");
  return (src ? IMG_LoadLZ4_RW (src, 1) : NULL);
}
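For reference, the header that precedes the compressed pixels can be decoded on its own, for example to inspect a file without SDL. Here is a minimal sketch, assuming a 4-byte int and a file written on a little-endian machine (the functions above simply use the native byte order of the machine). The names `lz4_image_header` and `parse_lz4_image_header` are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* The 12 bytes written by IMG_SaveLZ4_RW before the compressed pixels. */
typedef struct
{
  uint16_t width;
  uint16_t height;
  uint32_t pixel_format;    /* SDL pixel format code */
  int32_t compressed_size;  /* size of the LZ4 data that follows */
} lz4_image_header;

/* Reads `count` bytes (at most 4) as a little-endian unsigned integer. */
static uint32_t read_le (const unsigned char* bytes, size_t count)
{
  uint32_t value = 0;
  for (size_t i = 0; i < count; ++i)
    value |= (uint32_t)bytes[i] << (8 * i);
  return value;
}

/* Hypothetical helper: decodes the header from the first bytes of a file.
   Returns 0 on success, -1 if fewer than 12 bytes are available. */
int parse_lz4_image_header (const unsigned char* data, size_t size,
                            lz4_image_header* out)
{
  if (size < 12)
    return -1;

  out->width = (uint16_t)read_le (data, 2);
  out->height = (uint16_t)read_le (data + 2, 2);
  out->pixel_format = read_le (data + 4, 4);
  out->compressed_size = (int32_t)read_le (data + 8, 4);
  return 0;
}
```

Reading the fields byte by byte like this also avoids depending on the byte order of the machine that reads the file.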

Benchmark summary

FORMAT	READING TIME (s)	CREATING SURFACE (s)	CREATING TEXTURE (s)	TOTAL (s)	SIZE (MiB)
BMP	0.22	0.22	0.14	0.59	1046
PNG	0.01	3.15	0.38	3.54	36
PNG (quant)	0.01	1.07	0.15	1.23	8
LZ4	0.01	0.25	0.16	0.42	43
LZ4 (HC)	0.01	0.22	0.16	0.39	33

The LZ4 format compressed with the HC method therefore offers the shortest loading time. In terms of data size, it is beaten only by a quantized version of PNG which involves destructive compression and visible color alteration of images.

To conclude, in my use case, storing my images as SDL_Surface objects compressed with the LZ4HC algorithm is an optimal choice both in terms of memory and in terms of loading time.

Source code

The encoding/decoding functions to this SDL LZ4 format as well as the files necessary for the benchmark (without the images) are available on this repo.

¹ The JPG format is totally ignored for 3 reasons. First, it was primarily designed for photography and is counterproductive on cartoon-like designs; second, it is destructive and creates highly visible artifacts on cartoon-like drawings, especially around black strokes; third, it doesn't handle transparency, which is needed in my case.