The basic idea of this ongoing project is to create the fastest possible per-pixel alpha-blending routine while keeping it as simple as possible to use: pass two buffers, one containing the background image and the other the source image, and the routine overlays the source on top of the background, taking the alpha channel into account.
Possible uses include the various digital television applications, which use per-pixel alpha blending heavily, complex graphics libraries, games, etc. I tried to keep the source code as clear and simple as possible to ease integration with your own projects.
So far there are three implementations of the basic functionality:
- Base implementation written in C (see the AlphaBlt function)
- MMX implementation written in assembler (see the AlphaBltMMX function)
- SSE implementation written in assembler (see the AlphaBltSSE function)
All three functions take the same parameters and are designed to be as simple as possible to call. The parameters are the destination image, the source image, and the width and height of the images (the source and destination images must be the same size). Both images are 32-bit with RGBA pixel format (alpha channel in the most significant byte, red channel in the least significant byte).
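The layout can be made concrete with a few helper functions. These are a hypothetical sketch, not part of the article's code: `pack_pixel` and the `pixel_*` accessors are illustrative names, and placing blue in bits 16-23 and green in bits 8-15 is an assumption based on the RGBA name (the text only fixes the positions of alpha and red).

```c
#include <stdint.h>

/* Hypothetical helpers for the 32-bit pixel layout described above.
 * Alpha occupies bits 24-31 and red bits 0-7, as stated in the text;
 * blue in bits 16-23 and green in bits 8-15 is an assumption. */
static uint32_t pack_pixel(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)g << 8) | (uint32_t)r;
}

/* Extract each 8-bit channel from a packed pixel. */
static uint8_t pixel_red(uint32_t p)   { return (uint8_t)(p & 0xFFu); }
static uint8_t pixel_green(uint32_t p) { return (uint8_t)((p >> 8) & 0xFFu); }
static uint8_t pixel_blue(uint32_t p)  { return (uint8_t)((p >> 16) & 0xFFu); }
static uint8_t pixel_alpha(uint32_t p) { return (uint8_t)(p >> 24); }
```

For example, `pack_pixel(0x11, 0x22, 0x33, 0x80)` yields `0x80332211`. On a little-endian x86 machine this value is stored in memory as the bytes R, G, B, A, matching the RGBA name.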
The following is a description of the alpha-blending routines, which, again, take exactly the same parameters:
void AlphaBlt(unsigned char *dst, unsigned char *src, int w, int h)
void AlphaBltMMX(unsigned char *dst, unsigned char *src, int w, int h)
void AlphaBltSSE(unsigned char *dst, unsigned char *src, int w, int h)
Inputs:
- dst
- Destination image buffer.
- src
- Source image buffer.
- w
- Width of the images, in pixels.
- h
- Height of the images, in pixels.
Output:
- dst
- The resulting alpha-blended image.
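To show what the parameters mean in practice, here is a minimal scalar sketch with the same parameter list as AlphaBlt. It is not the article's implementation: the name `alpha_blt_ref` is hypothetical, and the per-channel formula `(s*a + d*(255 - a)) / 255` (the conventional "source over" blend) is an assumption, as is blending the alpha byte with the same formula. It also assumes a little-endian machine, so the alpha byte of each pixel is byte 3 in memory.

```c
#include <stdint.h>

/* Blend one 8-bit channel: (s*a + d*(255-a)) / 255, rounded to nearest.
 * The intermediate sum is at most 255*255 = 65025, so it fits easily. */
static uint8_t blend_channel(unsigned s, unsigned d, unsigned a)
{
    unsigned x = s * a + d * (255u - a);
    return (uint8_t)((x + 127u) / 255u);
}

/* Hypothetical scalar sketch with the same parameters as AlphaBlt:
 * blends w*h RGBA pixels of src over dst, writing the result into dst.
 * Each source pixel's alpha (byte 3 on a little-endian machine)
 * weights all four channels, including alpha itself. */
void alpha_blt_ref(unsigned char *dst, unsigned char *src, int w, int h)
{
    long n = (long)w * h;
    for (long i = 0; i < n; ++i) {
        unsigned char *d = dst + 4 * i;
        const unsigned char *s = src + 4 * i;
        unsigned a = s[3];
        for (int c = 0; c < 4; ++c)
            d[c] = blend_channel(s[c], d[c], a);
    }
}
```

Called as `alpha_blt_ref(dst, src, w, h)`, it overwrites `dst` with the blended image, as the Output section describes for the article's routines.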
Remarks:
The MMX and SSE versions require the image width to be even (divisible by two).
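A 64-bit MMX register holds two 32-bit pixels, which is a plausible reason for the even-width requirement. If you need odd widths, one option is to run the fast routine on the even-length part of each row and blend the last pixel separately. The sketch below illustrates this in portable C; `blend_pairs` is a hypothetical stand-in for the SIMD routine, `blend_any_width` is likewise a hypothetical name, and the wrapper assumes rows are tightly packed (4*w bytes each). In real code the per-row call could become something like `AlphaBltMMX(drow, srow, w - 1, 1)`, assuming the routine accepts a height of 1.

```c
/* Blend one 8-bit channel: (s*a + d*(255-a)) / 255, rounded. */
static unsigned char blend_channel(unsigned s, unsigned d, unsigned a)
{
    return (unsigned char)((s * a + d * (255u - a) + 127u) / 255u);
}

/* Blend a single RGBA pixel (4 bytes, alpha in byte 3). */
static void blend_pixel(unsigned char *d, const unsigned char *s)
{
    unsigned a = s[3];
    for (int c = 0; c < 4; ++c)
        d[c] = blend_channel(s[c], d[c], a);
}

/* Stand-in for a SIMD routine: like a 64-bit register holding two
 * 32-bit pixels, it consumes pixels strictly in pairs (n must be even). */
static void blend_pairs(unsigned char *dst, const unsigned char *src, long n)
{
    for (long i = 0; i < n; i += 2) {
        blend_pixel(dst + 4 * i, src + 4 * i);
        blend_pixel(dst + 4 * (i + 1), src + 4 * (i + 1));
    }
}

/* Hypothetical wrapper accepting any width. For even w the whole image
 * goes through the pair routine; for odd w, each row's even-length
 * prefix does, and the row's last pixel is blended on its own. */
void blend_any_width(unsigned char *dst, const unsigned char *src, int w, int h)
{
    if (w % 2 == 0) {
        blend_pairs(dst, src, (long)w * h);
        return;
    }
    for (int y = 0; y < h; ++y) {
        unsigned char *drow = dst + 4L * w * y;
        const unsigned char *srow = src + 4L * w * y;
        blend_pairs(drow, srow, w - 1);
        blend_pixel(drow + 4L * (w - 1), srow + 4L * (w - 1));
    }
}
```

The scalar tail costs one extra pixel per row, which is negligible for typical image widths.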
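Note that the MMX and SSE routines do not mix well with floating-point code: the MMX registers are aliased onto the x87 floating-point registers, so x87 instructions executed after the routines can produce invalid results. To avoid this, execute an EMMS instruction after using the MMX and/or SSE routines.
The instruction was deliberately left out to allow more precise benchmarks.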
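In C code, the EMMS instruction is usually issued through the `_mm_empty()` intrinsic from `<mmintrin.h>`. The sketch below wraps any of the three routines (they share one signature, captured here in a hypothetical `blit_fn` typedef) and clears the MMX state afterwards; `blit_then_emms` is likewise a hypothetical name. It assumes, as the remark above implies, that the SSE routine also uses MMX registers. On targets without MMX intrinsics (including 64-bit MSVC, which does not support them) the macro compiles to nothing.

```c
/* _mm_empty() emits EMMS. It is available with GCC/Clang on x86
 * (__MMX__ defined) and with 32-bit MSVC (_M_IX86). */
#if defined(__MMX__) || defined(_M_IX86)
#include <mmintrin.h>
#define CLEAR_MMX_STATE() _mm_empty()
#else
#define CLEAR_MMX_STATE() ((void)0)
#endif

/* Common signature of AlphaBlt, AlphaBltMMX and AlphaBltSSE. */
typedef void (*blit_fn)(unsigned char *dst, unsigned char *src, int w, int h);

/* Run a blit routine, then execute EMMS so that subsequent x87
 * floating-point code starts from a clean register state. */
static void blit_then_emms(blit_fn blit, unsigned char *dst,
                           unsigned char *src, int w, int h)
{
    blit(dst, src, w, h);
    CLEAR_MMX_STATE();
}
```

Issuing EMMS once after a whole batch of blits, rather than after every call, keeps its cost out of tight loops.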
Final considerations:
The MMX version gives an almost 200% speed improvement compared to the
basic, unoptimized C implementation, while the SSE version adds a further
speed boost, for an almost 240% improvement over the basic version.
The downloads include a demo program with some benchmarks. The application blits a bitmap containing an alpha channel over another bitmap and measures the frame rate. The program handles the following keys:
- '1' - Use the SSE alpha-blend routine
- '2' - Use the MMX alpha-blend routine
- '3' - Use the base alpha-blend routine
- '4' - SSE alpha-blend benchmark
- '5' - MMX alpha-blend benchmark
- '6' - Base alpha-blend benchmark
The benchmarks measure the time it takes to make 10,000 blits using the selected implementation.
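The demo's timing code is not described here, so the following is only a hypothetical sketch of such a benchmark: `time_blits` runs a routine a given number of times and returns the elapsed processor time using the standard `clock()` function, and `improvement_percent` shows one common way to turn two timings into a percentage. Both names are illustrative, and the article does not say how its own percentages were computed.

```c
#include <time.h>

/* Common signature of AlphaBlt, AlphaBltMMX and AlphaBltSSE. */
typedef void (*blit_fn)(unsigned char *dst, unsigned char *src, int w, int h);

/* Return the processor time, in seconds, used by `iterations` calls
 * of blit. clock() measures CPU time, which is close to wall time
 * for a single-threaded, CPU-bound loop like this one. */
static double time_blits(blit_fn blit, unsigned char *dst, unsigned char *src,
                         int w, int h, int iterations)
{
    clock_t start = clock();
    for (int i = 0; i < iterations; ++i)
        blit(dst, src, w, h);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Speed improvement of a fast routine over a base one, in percent:
 * with this definition, 100% means twice as fast, 200% three times. */
static double improvement_percent(double base_seconds, double fast_seconds)
{
    return (base_seconds / fast_seconds - 1.0) * 100.0;
}
```

For the benchmark described above, `time_blits(AlphaBltSSE, dst, src, w, h, 10000)` would time 10,000 SSE blits.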
Coming soon: an SSE2 implementation.
Version: | 1.0 |
License: | GPL |
OS: | Windows |
Development Tools: | Microsoft Visual Studio 6 SP5 |
Last Update: | November 25th, 2005 |
Download: | Sources, Executable |