
Stéphane Hockenhull's website

Bits and Can-do Dad

Optimizing GLES2.0 Shaders : The case of scale2x on PowerVR SGX 530

Posted on 2012-05-05 by rv6502

From >200ms down to a usable 30ms.

OpenPandora forum user sebt3 started testing scaling filter shaders on the SGX 530 for use on the OpenPandora console. The first working version ended up way too slow to be of any use.
Another forum user, FSO, improved the speed a bit (to between 160ms and 180ms per frame).

I asked sebt3 if he’d let me have a go at it.

This was the second iteration, with the improvements from forum user FSO (180ms per frame):

precision mediump float;
varying vec2 v_texCoord[5];
varying vec2 pos;
uniform sampler2D s_texture0;
uniform vec4 u_param;
void main()
{
	vec4 E = texture2D(s_texture0, v_texCoord[0]);
	vec4 D = texture2D(s_texture0, v_texCoord[1]);
	vec4 F = texture2D(s_texture0, v_texCoord[2]);
	vec4 H = texture2D(s_texture0, v_texCoord[3]);
	vec4 B = texture2D(s_texture0, v_texCoord[4]);
	vec2 p = fract(pos);
	vec4 tmp1 = p.x < 0.5 ? D : F;
	vec4 tmp2 = p.y < 0.5 ? H : B;
	vec4 tmp3 = D == F || H == B ? E : tmp1;
	gl_FragColor = tmp1 == tmp2 ? tmp3 : E;
}

For the next iteration sebt3 changed the vec4 variables to vec3, which reduced the time to roughly 130ms per frame.

Next I changed all the vectors I could to low precision (lowp), which brought it down to 95ms per frame. A good improvement, but still not nearly enough.

precision mediump float;
varying vec2 v_texCoord[5];
varying vec2 pos;
uniform lowp sampler2D s_texture0;
uniform vec4 u_param;

void main()
{
	lowp vec3 E = texture2D(s_texture0, v_texCoord[0]).xyz;
	lowp vec3 D = texture2D(s_texture0, v_texCoord[1]).xyz;
	lowp vec3 F = texture2D(s_texture0, v_texCoord[2]).xyz;
	lowp vec3 H = texture2D(s_texture0, v_texCoord[3]).xyz;
	lowp vec3 B = texture2D(s_texture0, v_texCoord[4]).xyz;
	lowp vec2 p = fract(pos);

	lowp vec3 tmp1 = p.x < 0.5 ? D : F;
	lowp vec3 tmp2 = p.y < 0.5 ? H : B;
	lowp vec3 tmp3 = D == F || H == B ? E : tmp1;
	gl_FragColor.xyz = tmp1 == tmp2 ? tmp3 : E;
}
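Not every variable can drop to lowp, which is why the texture coordinates above stay at the mediump default. As a rough sketch of the reasoning: GLSL ES 1.00 only guarantees lowp floats a range of about -2 to 2 in steps of 1/256. That is plenty for colour channels in 0..1 but too coarse to address individual texels. The 512-texel width below is a made-up figure for illustration, not the texture size used in these tests.

```glsl
// Sketch only: which precision qualifier is safe where, assuming a
// hypothetical 512-texel-wide source texture.
precision mediump float;            // default for floats with no qualifier

// Left at mediump: one texel is 1/512 wide, which is finer than the
// 1/256 step that lowp is guaranteed to resolve.
varying vec2 v_texCoord;

// 8-bit colour channels fit comfortably in lowp.
uniform lowp sampler2D s_texture0;

void main()
{
	// Colours live in [0, 1], well inside lowp's guaranteed [-2, 2] range.
	lowp vec3 colour = texture2D(s_texture0, v_texCoord).xyz;
	gl_FragColor = vec4(colour, 1.0);
}
```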

Then I went after the boolean conditional operators, replacing them with vector math operations that give a similar truth table. Changing the last line to

	gl_FragColor.xyz = ((tmp1 - tmp2) != vec3(0.0)) || ((D - F) * (H - B) == vec3(0.0)) ? E : tmp1;

further reduced the time to 80ms per frame, at the expense of not behaving exactly the same in some rare cases (while remaining visually pleasing).
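To make those rare cases concrete, here is a small sketch comparing the two tests. The helper names flatExact and flatProduct are mine, not from the forum thread, and the texel values are made up. On vectors, == yields a single bool that is true only when every component matches. The product test therefore passes whenever each channel is equal in at least one of the two pairs, even if neither pair matches as a whole.

```glsl
precision mediump float;

// Original test: one whole pair of neighbours must be identical.
bool flatExact(lowp vec3 D, lowp vec3 F, lowp vec3 H, lowp vec3 B)
{
	return D == F || H == B;
}

// Vector-math replacement: every channel must match in at least one pair.
bool flatProduct(lowp vec3 D, lowp vec3 F, lowp vec3 H, lowp vec3 B)
{
	return (D - F) * (H - B) == vec3(0.0);
}

void main()
{
	// Made-up texels: D and F differ only in green, H and B only in red.
	lowp vec3 D = vec3(1.0, 0.0, 0.0);
	lowp vec3 F = vec3(1.0, 1.0, 0.0);
	lowp vec3 H = vec3(0.0, 0.0, 0.0);
	lowp vec3 B = vec3(1.0, 0.0, 0.0);

	// D - F = (0, -1, 0) and H - B = (-1, 0, 0), so the product is zero in
	// every channel: flatProduct is true while flatExact is false.
	bool differs = flatProduct(D, F, H, B) != flatExact(D, F, H, B);

	// Red where the two tests disagree, green where they agree.
	gl_FragColor = differs ? vec4(1.0, 0.0, 0.0, 1.0) : vec4(0.0, 1.0, 0.0, 1.0);
}
```

With these inputs the two tests disagree, so the output should come out red. There may be a second, hardware-dependent source of mismatches: if the product really is evaluated at lowp, two small non-zero differences can multiply to less than the smallest lowp step and compare equal to zero.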

Then came some reordering of the code to speed up the most common case, plus removing the last boolean OR operator.

precision mediump float;
varying vec2 v_texCoord[5];
varying vec2 pos;
uniform lowp sampler2D s_texture0;
uniform vec4 u_param;

void main()
{
        lowp vec3 E = texture2D(s_texture0, v_texCoord[0]).xyz;
        lowp vec3 D = texture2D(s_texture0, v_texCoord[1]).xyz;
        lowp vec3 F = texture2D(s_texture0, v_texCoord[2]).xyz;
        lowp vec3 H = texture2D(s_texture0, v_texCoord[3]).xyz;
        lowp vec3 B = texture2D(s_texture0, v_texCoord[4]).xyz;

        if ((D - F) * (H - B) == vec3(0.0)) {
                gl_FragColor.xyz = E;
        } else {
                lowp vec2 p = fract(pos);
                lowp vec3 tmp1 = p.x < 0.5 ? D : F;
                lowp vec3 tmp2 = p.y < 0.5 ? H : B;
                gl_FragColor.xyz = ((tmp1 - tmp2) != vec3(0.0)) ? E : tmp1;
        }
}

Tadaaaaaaaah! From 180ms down to a very playable 30ms per frame.

This demonstrates the need to avoid boolean operators such as || and && in shaders as much as possible.
They have a particularity that makes them very bad in shader code: they evaluate the right-side expression only when the left side has not already decided the result (short-circuit evaluation).
This is an old C language feature meant for programs run on old CPUs, but it causes code branching, which is atrocious on GPUs.
Using vector math to evaluate everything ends up being faster, because code branches are very costly on modern processors and GPUs.
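As an illustration of the "evaluate everything" idea, the selection logic of the final shader can also be written with no if and no ?: at all, using step() and mix(). This is an untested sketch, not something I measured on the SGX 530, so whether it beats the 30ms version would have to be benchmarked. It keeps the variable names from the listings above; notFlat and agree are names I made up.

```glsl
precision mediump float;
varying vec2 v_texCoord[5];
varying vec2 pos;
uniform lowp sampler2D s_texture0;

void main()
{
	lowp vec3 E = texture2D(s_texture0, v_texCoord[0]).xyz;
	lowp vec3 D = texture2D(s_texture0, v_texCoord[1]).xyz;
	lowp vec3 F = texture2D(s_texture0, v_texCoord[2]).xyz;
	lowp vec3 H = texture2D(s_texture0, v_texCoord[3]).xyz;
	lowp vec3 B = texture2D(s_texture0, v_texCoord[4]).xyz;
	lowp vec2 p = fract(pos);

	// step(0.5, x) is 0.0 when x < 0.5 and 1.0 otherwise, so these pick
	// D or F (and H or B) the same way the ?: versions do.
	lowp vec3 tmp1 = mix(D, F, step(0.5, p.x));
	lowp vec3 tmp2 = mix(H, B, step(0.5, p.y));

	// Each flag is 1.0 when its condition holds and 0.0 otherwise.
	lowp float notFlat = float((D - F) * (H - B) != vec3(0.0));
	lowp float agree = float(tmp1 == tmp2);

	// Weight 0.0 keeps the centre texel E, weight 1.0 takes the neighbour.
	// Alpha is set explicitly here; the listings above leave it unwritten.
	gl_FragColor = vec4(mix(E, tmp1, notFlat * agree), 1.0);
}
```

Every texel read and every comparison now runs for every fragment, so the early-out for flat areas is lost. That is the trade-off to measure.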

