Tuesday, February 8, 2011

Mega-Structures; Geometry Shaders #2

Two weeks ago I showed a stupid “electro bolt” example that can be done with geometry shaders. Uhm... Quake 1 already had lightning-bolt polygons, so do we really need geometry shaders for that? The answer is 'no' of course; it was just to show what those programs look like. Yet it took me some thinking to come up with a really good reason to implement Geometry Shaders, even though they have existed for a couple of years now. So, here are some more advanced applications. Bon appétit!

--- Tessellation tricks ---
The most interesting trick you can do with a GS is generating new geometry. We’re still toying around with bumpMaps and wacky parallax tricks to improve the illusion of relief in our surfaces. But what if we could use actual triangles instead of normalMapping & co? You know, instead of drawing bricks, you actually model them. Obviously, the problem is that this would cost millions and millions of triangles for a somewhat complex scene. Maybe billions for a typical GTA scene.

You don’t need all that detail for distant stuff though. Who cares if there is a small hole in that brick wall 100 yards away? Level of Detail (LOD) is a common trick to reduce the polycount for terrain or objects further away from the camera: either replace a high-detail mesh with a lower one, or dynamically reduce the quad division for heightMap/grid geometry such as terrains. A GS could take brilliant advantage of that. Imagine a brick wall again, originally modeled as just a flat quad. That’s good enough normally, but when the camera comes closer, you could let a GS subdivide that quad into 1, 4, 16, 64, ... pieces. For each newly generated “sub” vertex, interpolate the texture coordinate, fetch the height, then recalculate the vertex 3D position (and normal). Since there is real relief in the wall now, you don’t need normalMaps for shading.
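To make the per-vertex work concrete, here is a minimal CPU-side sketch of what the GS would do for each generated sub-vertex: bilinearly interpolate the quad's corner attributes, fetch a height, and displace along the normal. All names (`subdivide_quad`, `height_at`, the corner ordering) are illustrative, not from any real engine.

```python
def lerp(a, b, t):
    """Linear interpolation between two equal-length tuples."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def subdivide_quad(corners, texcoords, normal, height_at, n, scale=0.1):
    """corners/texcoords: 4 values ordered (bottom-left, bottom-right,
    top-left, top-right). Returns (n+1)*(n+1) displaced sub-vertices
    as (position, texcoord) pairs."""
    verts = []
    for j in range(n + 1):
        v = j / n
        for i in range(n + 1):
            u = i / n
            # Bilinear interpolation of position and texture coordinate
            pos = lerp(lerp(corners[0], corners[1], u),
                       lerp(corners[2], corners[3], u), v)
            uv  = lerp(lerp(texcoords[0], texcoords[1], u),
                       lerp(texcoords[2], texcoords[3], u), v)
            # "heightMap fetch" (0..1), then displace along the normal
            h = height_at(uv)
            pos = tuple(p + nrm * h * scale for p, nrm in zip(pos, normal))
            verts.append((pos, uv))
    return verts
```

A flat quad in the XY plane with a constant height of 1 and `scale=0.1` yields a grid of vertices all pushed 0.1 units along +Z, exactly the "real relief" the GS version would emit.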

When using a 512x512 heightMap on a surface, you could effectively subdivide it into 512x512 tiny quads. That is, ouch... 524,288 triangles. Hmmm, still too much for modern hardware, even if you only use it for geometry within a few meters from the camera. But hey, who knows. Over the last 15 years the maximum polycount has grown by a factor of 500, if not more.
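A quick sanity check on that number: an NxN subdivision of a single quad produces N*N sub-quads, and each quad is two triangles.

```python
def triangles_for_subdivision(n: int) -> int:
    """Triangle count for one quad subdivided into an n x n grid."""
    return 2 * n * n

# One flat quad driven by a 512x512 heightMap, fully subdivided:
print(triangles_for_subdivision(512))  # 524288 triangles, for a single wall
```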

--- Rendering CubeMaps ---
Aight, that still leaves us with nothing. But wait, there is another extremely handy trick that can be done with Geometry Shaders, making them worth implementing: drawing to cubeMaps or 3D textures in a SINGLE pass. And yes, there are certainly reasons to want that these days. When you have to draw data in the background (shadowMap, deferred engine, etc.), you can render onto simple 2D target textures. But in some cases you may also need to render onto cubeMaps, 2D array textures, or a 3D (volume) texture:
-.... cubeMaps : Reflections, (light?) probes, point shadowMaps
-.... 3D textures : Injecting points (ambient render techniques)
-.... 2D Array tex : Cascaded shadowMaps (sun)
In essence, all these "special" texture types are just collections of 2D textures. A 3D texture with 32 layers is... 32 2D textures stacked on top of each other. And a cubeMap is made of 6 textures. So when you want to render the environment into a cubeMap for reflections, you’ll need to render 6 times:
- Switch to cubeMap face –X → render everything on the left
- Switch to cubeMap face +X → render everything on the right
- Etcetera.

Now a Geometry Shader can not only generate extra output, it can also tell on which “layer” to render a primitive. So instead of rendering the environment 6 times for a cubeMap, we can render it just once and let the geometry shader decide on which side(s) to put each primitive. The easiest (and probably fastest) way is to simply duplicate each incoming primitive 6 times. Use the 6 matrices you'd normally use to set up the camera before rendering the -X, +X, -Y, +Y, -Z or +Z side to transform each copy into the proper perspective for its layer. In practice most copies will fall outside their face's view frustum, so no pixels will get harmed.
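The six per-face camera setups can be sketched as follows. The face orientations below follow the usual OpenGL cubeMap conventions (the +X face looks down +X with up = -Y, and so on); the matrix is reduced to the 3x3 "rotate world into face space" part, leaving out projection.

```python
def cross(a, b):
    """Cross product of two 3D vectors (tuples)."""
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def view_matrix(forward, up):
    """Row-major 3x3 view rotation: rows are right, up, -forward."""
    s = cross(forward, up)   # right vector
    u = cross(s, forward)    # re-orthogonalized up
    f = forward
    return (s, u, (-f[0], -f[1], -f[2]))

# (forward, up) per cubeMap face, in order: +X, -X, +Y, -Y, +Z, -Z
CUBE_FACES = [
    (( 1, 0, 0), (0, -1, 0)),
    ((-1, 0, 0), (0, -1, 0)),
    (( 0, 1, 0), (0,  0, 1)),
    (( 0,-1, 0), (0,  0,-1)),
    (( 0, 0, 1), (0, -1, 0)),
    (( 0, 0,-1), (0, -1, 0)),
]

face_matrices = [view_matrix(f, u) for f, u in CUBE_FACES]

def transform(m, p):
    """Apply the 3x3 view rotation to point p."""
    return tuple(sum(row[k] * p[k] for k in range(3)) for row in m)
```

A point at (10, 0, 0) ends up in front of the +X camera (negative view-space Z) and behind the -X camera, which is exactly why the duplicated copies get clipped away on all faces except the right one(s).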

Wait a minute... you are still rendering everything 6 times in the end, right? Yes, in essence both tricks do the same. If you can do 100% perfect culling the traditional way, the same number of triangles gets pushed. However, the GS way still has some advantages:
- The CPU only has to submit once; no swapping the same textures/parameters multiple times.
- Culling is simpler (100% perfect culling has its own cost).
- No render-target switches.

The speed gain may not be gigantic, but there is more. What if you have a large number of "probes" collecting... ambient light or something? You could render into a 3D texture with many layers and duplicate those triangles even more often. As long as all probes work with the same set of input data (the geometry pushed by the CPU), you can do it in a single call.

I'd love to show a working cubeMap Geometry Shader now. But sadly, my parrot died, so I couldn’t try it yet. You’ll have to make do with this pseudo code:

TRIANGLE void main( AttribArray<float3> position  : POSITION,  // Coordinates
                    AttribArray<float2> texCoord0 : TEXCOORD0, // Custom stuff, just pass it through
                    uniform float4x4 cameraMatrix[6] )         // Camera matrix for each face direction
{
    for ( int i = 0; i < 6; i++ )
    {
        // Set cubeMap target face 0..5
        flatAttrib( i : LAYER );

        // Pass the geometry data, transformed
        // into the current camera direction
        for ( int j = 0; j < 3; j++ )
        {
            float4 transformedPos = mul( float4( position[j], 1.0f ), cameraMatrix[i] );
            emitVertex( transformedPos : POSITION,
                        texCoord0[j]   : TEXCOORD0 );
        }
        restartStrip();
    }
}

Not sure about the layer semantic, and the matrix calculation is incomplete (which matrices exactly — view and projection?), but this is basically it.

--- Rendering points into a 3D texture ---
Instead of playing with cubes, I was experimenting with a new realtime G.I. method. Just like Crytek's LPV method, it requires a massive amount of points to be inserted into a volume texture (several volume textures, actually). The position inside the 3D texture depends on the world coordinate of the point, where Y (height) tells on which layer to render. Easier said than done.

Just like with cubeMaps, you first have to tell on which layer you are going to put the upcoming stream of geometrical crap. That would be fine if the point array were ordered on Y coordinates, but it's a completely random mess of points. So either we sort that array first, or we render the whole damn thing again and again for each layer we activate:

Fbo.setRenderTarget( some3Dtexture );
For i:=0 to 31 do
....Fbo.setOutputLayer( i );
....bigDotArray.draw( again, sigh );

Eventually the CPU or shader has to test whether a given point belongs on the current layer. In other words, @#$@#%. Then there is a third option: "bliting"... or is it "blitting"? Anyway, OpenGL has the "glBlitFramebufferEXT" command, some sort of advanced version of glCopyTexImage2D. Basically you copy a rectangle from one buffer to another. So what you can do is render everything onto a simple flat 2D texture first, then copy it into the 3D one:

-... Unfold the 3D texture into a simple 2D one: 32x32x32 → 1024x32.
-... Render everything in one pass onto the 2D target, spread out over the width. Let the vertex shader calculate an offset based on the Y or Z coordinate (whichever you like): everything that belongs on the bottom layer gets an offset of 0, layer[4] gets an offset of 4 x (1/32), etcetera.
-... Make a loop that “blits” a sub-rectangle from the 2D target onto each 3D texture layer:

Fbo.setRenderTarget( some3Dtexture );
For i:=0 to 31 do
....Fbo.setOutputLayer( i );
....sourceRect := rect( i*32, 0, {width}32, {height}32 );
....targetRect := rect( 0, 0, {width}32, {height}32 );
....glBlitFramebufferEXT( sourceRect, targetRect, GL_COLOR_BUFFER_BIT, GL_NEAREST );
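The unfolded addressing boils down to a simple mapping, sketched here for the 32x32x32 → 1024x32 layout above (layer i occupies the column range [i*32, i*32+32) of the 2D strip; function names are made up for illustration):

```python
SIZE = 32  # volume is SIZE x SIZE x SIZE, strip is (SIZE*SIZE) x SIZE

def voxel_to_atlas(x, y, z):
    """Map a voxel coordinate to a pixel in the unfolded 1024x32 target.
    z picks the 32-pixel-wide column block, x/y address within it."""
    return (z * SIZE + x, y)

def atlas_to_voxel(px, py):
    """Inverse mapping: which voxel does a strip pixel correspond to?"""
    return (px % SIZE, py, px // SIZE)
```

This is exactly the offset the vertex shader would compute (in normalized coordinates, the x offset is layer * (1/32)), and the blit loop then copies block i back onto layer i.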

You still have to loop through that whole 3D texture, but at least you only have to draw the point array once, without worries. Still, it doesn't win the Nobel Peace Prize. With a Geometry Shader, however, it becomes child's play. Just act like a fool that doesn’t know anything, bind the 3D texture as a renderTarget, and render that stupid array already. Your shaders remain the same as well, except for one thing:

POINT void main( AttribArray<float4> position : POSITION )
{
    // The given positions are 3D texture coordinates!
    // xy between 0 and 1, z is the normalized layer index.
    // So a point placed on layer 4 of 16 total layers has z = 4/16 = 0.25
    // (eventually add half a layer, (1/16)*0.5, to hit the layer center)
    float4 p = position[0];

    // Select the 3D texture slice based on the Z coordinate
    // (LAYER wants an integer index; 16 layers assumed here)
    flatAttrib( (int)( p.z * 16 ) : LAYER );
    p.z = 0;
    emitVertex( p : POSITION );
}
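As a CPU-side sanity check of that layer addressing (assuming 16 layers, as in the comments), the normalized-z-to-layer round trip looks like this; both helper names are made up:

```python
def z_for_layer(layer, num_layers, center=True):
    """Normalized z coordinate for a given layer index.
    center=True adds the half-layer offset mentioned above."""
    z = layer / num_layers
    if center:
        z += 0.5 / num_layers
    return z

def layer_for_z(z, num_layers):
    """Integer layer index the GS would select for a normalized z."""
    return min(int(z * num_layers), num_layers - 1)
```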

Done. Or wait, one last thing: you need to attach the target as a LAYERED texture, otherwise all your points still get dumped on layer 0 only. There is a slight difference between setting up an FBO where you render to one specific layer, and a layered one where you can access all layers via the geometry shader. Too bad I don't have the code here right now, too bad it's 01:15 already, and too bad I'm too lazy to look for it now :p But just keep that in mind. I also had to disable the depth-buffer attachment (or use a "layered depthbuffer"?), otherwise it didn't work.

Any screenies then? You'll have to make do with this bucket.


  1. I'd be careful with geometry shader amplification... it can severely degrade your GPU performance. The IHVs have always recommended that a geometry shader emit no more than 4 verts per execution. In my experience it's always been quicker to just switch render targets (for cube map rendering), or use a texture atlas (for cascaded shadow maps).

  2. Hey MJP (the famous Gamedev MJP ? :))

    Didn't test cubeMaps yet. Opinions differ when browsing the web about this topic, but in theory it could be a booster of course. Don't know the exact (hardware) quirks in practice though :) But for filling that 3D texture, the GS is a lifesaver.