Saturday, March 30, 2013

T22 Book Club

What is this?! Blogger suddenly changed the whole layout?! Jesus would turn over in his cave. And walk out.

Maybe this new look isn't so bad, but some parts are broken, and I hate coding HTML. I've been programming pretty much everything for the past century, but I just never got into HTML, JavaScript, or any other webpage-building technique. Don't know why. Either I like something, or it doesn't interest me a single binary bit.

Well, the Compute Shader tutorial is finished, and let's not complain again about how difficult it is to motivate a team, make rapid progress, and show you eye candy every few weeks. No. Let's talk about something different... right? ... Hmmmm ... everything ok at work ... Nice weather ... Or actually not, my balls are still freezing off here in Holland ... yeah ... cold ... seen any TV yesterday? ... Pfff, really cold outside ... I’m allergic to woolen trousers ...

Nope, not much special to report from T22 either, so I just recycle this gun again.

Wait, I know something. Books!
I’m not much of a reader. Except that I read fairy tales and kids' stories (“Jip en Janneke”) every day to our little girl. Which is completely awesome, so if you need a good excuse to read kids' books, just make a kid somewhere. But other than that, we don’t have filled bookcases around here. Not that I don’t like reading, and sometimes I do get a nice book from friends, but I usually just don’t have time. Or don’t make time for it, whatever.

However, last Christmas Santa gave me two books (using my own wallet to pay for them). First there was Eugene Sledge's “With the Old Breed”, which is truly brilliant. You may remember the name from the HBO series “The Pacific”. Well, the character “Sledge” really existed, he fought in the Pacific (WW2 really happened too!), and wrote a book about it. Not just a summation like: 4 April 1942, Hitler shaved his moustache; 5 April 1942, frozen beef for dinner again; 6 April 1942, some Japanese made stinking sushi in the foxhole next to us. No, the man actually had the talent for writing things down in a grim, graphic, but also neutral way. No waving American flags and heroes, just the war as dirty as it is. A biography that shows humanity in its worst possible way. I can really recommend it, and I’ll sure get back to it here some day.

"The Why's and How's of Level Design"
But after five paragraphs I still didn’t reach the actual topic (bad habit), that other book, “The why's and how's of Level design”, by Hourences, or Sjoerd de Jong. That name sounds Dutch btw, or Belgian maybe. You don’t have to be a mastermind to figure out what the book is about. As said before, I never really read about “making games” either. Pretty much all my programming knowledge comes from internet webpages, example programs, and just looking at the neighbors. Also, the design of a level is not exactly the terrain of a programmer. Design involves:
• Picking themes
• Making realistic plans. What can be done, and what can’t be done with the time & tools
• Drawing floorplans
• Making the map suitable for the given type of gameplay (shoot, puzzle, race, platformer, …)
• Making wise use of eye catchers
• Texture & color palette
• Lighting
• Modeling it
• … And so on …

This book covers all these topics, but on a more general level. It doesn’t explain which buttons to click in Maya to create a donut. It focuses on techniques that generally work for level design, and of course, he also shows the counterparts: things that should be avoided. That may not sound very helpful, but it’s just true that many, many amateurs and hobbyists make the same errors when designing a new level pack or game MOD. Really, Level Design is a job on its own, and you can’t just teach it to yourself by reading a book or two. Not with this book either (but the author doesn't claim that).

Like developing any creative talent, becoming a good Level Designer requires practice. Lots of it. But Rick, why would you want to become a Level Designer? Get back in your programmer hole! Maybe I should, but unfortunately, this project doesn’t have a level designer yet. Of course artists have nice ideas and knowledge about how to set up an interesting scene. But it’s too fragmented to create a consistent world that exactly fits the Tower22 needs. As the book explains, all the different elements need to become one. A sports car may look great, but it doesn’t belong in the world of T22. Of course the artists know what the word harmony means, but then the level design still needs to meet the story and gameplay requirements. And my twisted mind about what a horror game should look like, which isn’t the same as Resident Evil or Silent Hill, to name a few.

That’s why projects have one or a few lead designers. They understand all these requirements, and coordinate the artists. Obviously, I play a part in that, as I created the ideas. And most others don’t have time to learn the game plot thoroughly, nor time to coordinate and monitor other artists. So, that makes me pretty much the Level Designer.

No Emmy award for this drawing, but at least I know how to use MS Paint a little bit to make my point clear.

Yet I lack skills when it comes to architecture, making outstanding artwork, advanced 3D geometry, or drawing textures. Not that I have to model everything myself, but it would be nice to get some better understanding, to improve the communication. One can only transfer his ideas to another if he knows how to explain, sketch, and divide the work into concrete tasks. Usually programmers (Beta’s) and artists (Alfa’s) approach things completely differently, so as a programmer, I needed to dive into their world a bit to get on the same page. That’s basically why I bought this book.

Back to the book. The author covers most of the design aspects you would expect. How to make geometry look interesting, how to break up boring repeating geometry (a very real problem for the T22 corridors), which lights can be used where, and the importance of respecting core gameplay features rather than just mixing random “cool” ideas. But it also talks about planning, and keeping ideas realistic and feasible. A mistake often made by beginners is trying to do “everything”, but soon finding out the plans are overambitious, leading to nothing, or to half-finished, inconsistent results. The book uses a lot of color pictures, comparing good & bad situations to show you why certain techniques work, or don’t work. Hence the “Why's and How's” in the book title. The book finishes with some Unreal Tournament levels he did, and, also nice, interviews with artists from the games industry.

My 50 cents
Cool and The Gang. But, the key question, did we learn from it? Hmmm. First, as said before and by the author as well, you don’t just learn level design. You have to try it yourself. Then this book can be used as a guideline to reach good results earlier, and to avoid pitfalls. And although I disagree with a few statements, the book makes logical sense. The advice is sound, and he manages to explain it without floating away in vagueness, though a beginning artist may miss some deeper explanations here and there. Many examples are a little bit too Captain Obvious.

Then again, maybe I’m not a beginner when it comes to Level Design. Ever since Super Nintendo and Doom2, I’ve been drawing floorplans, fantasizing about game worlds, and carefully looking at other games. When I play Crysis, I don’t just get “wowed”. I try to find graphical weaknesses that reveal how the world was made. Which techniques, shaders and elements were used? When playing Halflife2, I look beyond the battle and notice the backgrounds and styles that are used to make a believable, immersive world. When thinking about puzzles, I remember how smart and complex the Zelda worlds were made. I know the contrast between nineties games that focused on simple but addictive gameplay, and the more realistic 21st century “next gen” engines (that don’t always succeed in delivering a fun game). So, that doesn’t make the advice from this book less valuable, it’s just that I wasn’t surprised by most of it.

Ok, a little bit news then; bullet holes.

Second, the showcases are obviously aimed at the Action & Shooter genre, using UDK and Unreal Tournament (Deathmatch) levels in particular. That’s fine of course, since shooters are still a popular genre and tend to push graphical limits more than any other genre. But gameplay wise, I couldn’t map it onto Tower22, which has a very different, almost unique, style. For example, although I agree with the author that clichés and exaggeration work in games to compensate for the lacking (hardware) capabilities to pull you into the game, I try to break with some of them. T22 should not look as if it has been done before.

Another difference. Unreal-like games are split into levels. It’s all about rapid, addictive gameplay. World one doesn’t have to do anything with world 2, just as long as they are well designed when it comes to shooting one another. World A can be a science fiction space station while world B has a “Capture the ketchup in McDonalds” theme. But most single player, story driven games can’t permit this variation. Especially not a game like Tower22, where the horror atmosphere is the dominant factor (maybe even more than gameplay). A single mistake can ruin the immersion. Rooms that should be scary but feel safe, cheesy music, a laughable monster, overused predictable clichés, wrong pacing and timing of scary events… all will reduce the horror experience to a joke. A shooter game can fall back on its core action elements if the environment makes a mistake, but T22 can’t. Of course the book explains how to make floorplans and climaxes, but not in detail. Making a complicated but satisfying puzzle, or a truly scary environment, needs some more explanation.

The verdict
Well, those were my two complaints. It’s not really a mistake of the author, as he just chose to use the action genre as a demonstration. You can’t write a book about everything, for both a beginning & experienced audience. So if you want to make action game levels / MODs but don’t have a whole lot of experience yet, I’m sure the book will give you valuable advice. As for me, I need to find a Level Designer that has experience with both horror and complex interconnected puzzle worlds (such as Zelda or Metroid). But where to find those? Abduct George Trevor maybe? (fictional architect of the Resident Evil mansion)

Saturday, March 16, 2013

Charlie & The Compute-Shader factory #3: Tiled Deferred Lighting

Finally. Another post. Sorry, I’ve been busy, mainly at work. And with playing the Sims with our daughter… Maybe I shouldn’t have said that.

The previous Compute Shader post ended with babble about semaphores, mutexes and other ways to synchronize and avoid conflicts between threads. Ok, but why would one worker have to bother another worker? You would be pissed too if the neighbor showed his head above the fence every day to interfere with whatever business you’re doing. Respect a man’s privacy! Yet, in Compute Town, there are scenarios that require cooperation between the elements being executed within a Warp/Wavefront. So here it is, the last compute shader manuscript, for now. A practical example, on Tiled Deferred Lighting, made by the Pope, in Taiwan, brought to you by McDonald's.

Deferred Lighting, anno 1725
This post is aimed at the more advanced users. So, if you never wrote a Deferred Renderer or the likes, try that first. But anyway, here's a short mind refresher on the traditional Deferred Lighting pipeline:
1- Fill (full screen) G-Buffers
With pixel attributes that represent the scenery your camera sees. Attributes such as the 3D position, diffuse/specular color and normal for each pixel.

2- Draw diffuse/specular lighting into another texture buffer(s)
- For each lamp, render a rectangle, cone, sphere or other shape that covers the area that is *potentially* affected by that lamp.
- For all pixels being overlapped by that shape, calculate if the lamp really affects the pixel, and if so, compute the color results. To do so, use the G-Buffers from step 1.
- Use additive blending to sum up the result of each light, in case a pixel gets affected by multiple lights.

3- Render the scene again, multiply it with the lighting buffers from step 2.

Systems up? Good. Although this approach is easier and faster than traditional forward rendering, there are still two major issues that slow down the process:
- If a pixel is overlapped by 10 lamps, some steps such as reading the G-Buffers and doing some vector calculations, have to be redone 10 times.
- Additive blending, although not slow, is not super fast either.

These two issues are the price you pay for handling each light in a separate pass. It would be nice if we could combine all the lamps into a single pass, so we only have to do the computations once, and do the additive blending internally. Like this:

Thanks to uniform buffers and such, making an array of lights isn’t too hard. But… there is one stinky catch. How do you know which lamps from the array apply to a particular pixel? If you have 100 active lights scattered over your screen, it doesn’t mean each pixel should loop through all 100 lamps. Well, you can … but it’s stupid.
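
To make the "stupid" variant concrete, here is a minimal sketch in plain C, run on the CPU purely for illustration. The PointLight struct, its field names and the squared falloff are assumptions made up for this sketch, not the Tower22 light format. Every pixel walks the whole array and sums what it finds with a plain +, which is exactly the work that culling should reduce.

```c
/* Hypothetical, stripped-down point light; field names are made up for this sketch. */
typedef struct {
    float x, y, z;     /* world position */
    float range;       /* beyond this distance the light contributes nothing */
    float intensity;   /* single channel to keep things short */
} PointLight;

/* Naive single pass: one pixel loops through ALL lights and sums the results.
   The falloff (1 - dist^2 / range^2) is an assumption chosen for brevity. */
float shade_pixel_all_lights(float px, float py, float pz,
                             const PointLight *lights, int count)
{
    float sum = 0.0f;
    for (int i = 0; i < count; ++i) {
        float dx = lights[i].x - px;
        float dy = lights[i].y - py;
        float dz = lights[i].z - pz;
        float dist2  = dx * dx + dy * dy + dz * dz;
        float range2 = lights[i].range * lights[i].range;
        if (dist2 < range2)   /* most lights fail this test for most pixels */
            sum += lights[i].intensity * (1.0f - dist2 / range2);
    }
    return sum;   /* the + above takes over the job of additive blending */
}
```

With 100 lights and 786,432 pixels that inner test runs over 78 million times per frame, most of them for nothing.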

Deferred Lighting, anno now --> Tiled Deferred Lighting
Did you see the Batman signal projected at the clouds? That means a Compute Shader is needed. We can do all the testing and lighting in a single program. The idea is pretty much the same as illustrated in the Rick++ code above, except that we also test which lights should be involved, and which can be skipped for a small region of pixels. After all, a local light in the top left corner of the screen shouldn’t lay his dirty hands on pixels in the opposite screen corner. Since this testing step is quite expensive, we don’t cull the lights for each pixel, but per “tile” (hence the name).

Tower22 is progressing very well

Technically speaking, all pixels within a Warp/Workgroup can form a square tile together (32x32 pixels for example). Instead of testing for each individual pixel which lights affect it, we do it per tile. And since we have 32x32 (or more) pixels within a tile, we can nicely divide the work. For example, if each pixel just tests a single light, we can perform 1024 (32x32) checks simultaneously. Oh yeah, parallel working, remember? All pixels within the tile are executed simultaneously, so instead of 1 pixel doing all the work, kick their lazy asses off the couch and divide the work.

That sounds logical, but if you are like me, you are probably already trying to figure out how you would code that in Cg, HLSL, GLSL or whatever language… coming to the conclusion you don’t have a clue how to let pixels cooperate. Well, that’s one of the major differences between common shaders and Compute Shader languages such as CUDA or OpenCL. Let me explain the “Tiled Deferred Lighting” (OpenCL) compute shader step by step. First, an overview of all steps performed within this (single!) shader:
1- Setup tasks to run
2- Attach in- and output buffers to CS (parameter setup)
3- In the CS, let each task read the pixel position (and maybe normal) from the G-Buffers
4- Make a bounding box for each tile
5- Test by which lamps a tile is affected (thus test per tile, not per pixel!)
6- Apply the lamps from #5 on the tile pixels. Sum up the results
7- Write the results to the two output light textures

******* 1. Setup
Before we can drive, we first need to start the car of course. Same for launching Compute Shaders. This is a bit different than you may be used to with OpenGL for example, where you activate a shader (change the state-machine) so all upcoming drawing calls make use of it. Remember, Compute shaders have nothing to do with GL/DX, so neither do they have to be executed within a GL/DX context.

Well, as for Deferred Rendering, we typically render the results into one or two full-screen textures. Let’s say your screen resolution was a whopping 1024 x 768. That means we have 786,432 pixels to calculate. In other words, the Compute Shader has to run 786,432 tasks, where each task calculates the lighting and writes the output into those two textures.

We give these tasks to the GPU, and to take real advantage of the hardware, we make Warps/Wavefronts (called “Workgroups” in OpenCL) of 16x16, or 32x32 tasks (or whatever you prefer). Remember, tasks within a Warp can run simultaneously. Each group would draw one tile on the screen. Btw, one note, keep in mind that in OpenCL, the total number of tasks must be divisible by the workgroup size. 1024 / 32 = 32 = ok. 768 / 32 = 24 = ok. If the outcome isn’t a whole number, you may need to adjust either the workgroup size, or the total amount of tasks.
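
One common way to "adjust the total amount of tasks" is rounding it up to the next multiple of the workgroup size. Below is a small sketch in plain C; the function name is just something I picked for the example. Keep in mind that the extra tasks fall outside the screen, so the kernel should bail out early for them.

```c
#include <stddef.h>

/* Round 'total' up to the next multiple of 'group' (group must be > 0).
   Example: a 1080 pixel high screen with 32-high tiles needs 1088 tasks. */
size_t round_up_to_multiple(size_t total, size_t group)
{
    return ((total + group - 1) / group) * group;
}
```

The rounded values would then go into the global work size you pass when launching the kernel, with the real screen size passed along as a parameter for the bounds check.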

******* 2. Attach in- and output buffers
I said Compute Shaders have nothing to do with your graphics API (let’s assume OpenGL), but that is not entirely true of course. Our CS needs to read G-Buffers that were produced earlier via common ways, and also the output must be inside a texture that GL understands. Luckily, this is possible via Interop Buffers. You can share GL vertex, uniform and texture buffers with a CS so you can directly read or write in them. Phew.

Besides textures, we also need some sort of buffer that tells about all the (active) lights in the scene, so the CS can loop through them. I would make arrays of structs for pointlights, spotlights, and so on. Those structs then contain the light colors, matrices, shadowMap coordinates, et cetera. I store all shadowMaps within one bigger texture btw. To illustrate what you may need, here's the kernel declaration of the CS:
__kernel void tiledDeferredLighting( const float camX,  const float camY,  const float camZ,
  __global struct shUBO_Lights* lights,
  __read_only image2d_t gBuf_Specular,
  __read_only image2d_t gBuf_Normal,
  __read_only image2d_t gBuf_WorldPos,
  __read_only image2d_t iTexShadowMapsSpot, // all spot shadowMaps
  __write_only image2d_t oTexDiffuse, // Diffuse output
  __write_only image2d_t oTexSpecular // Specular output
   …Magic Johnson…

******* 3. Read G-Buffers
Diving into the G-Spot, eh, CS code now. OpenCL can read 1D, 2D and 3D textures, using normalized (0..1) or integer coordinates, and possibly with mipmapping. The code is less handy compared to your common shaders, but it works. One slight difference is that you have to make texture-coordinates yourself now. This can be done by looking at the local or global IDs that are given to each task. If you run tasks as a 2D array spread over the screen, the IDs will correspond with (integer) pixel coordinates:
  int2 globalID = (int2)( get_global_id(0), get_global_id(1) );
  int2 localID  = (int2)( get_local_id(0),  get_local_id(1) );
  int2 texcoord = globalID;
  // Get G-Buffer data
  const sampler_t samplerG = CLK_NORMALIZED_COORDS_FALSE | // <- use integer coords instead of 0..1
                             CLK_ADDRESS_CLAMP_TO_EDGE   | // <- REPEAT is only valid with normalized coords
                             CLK_FILTER_NEAREST;           // <- filtering method

  float4 gWorldPos = read_imagef( gBuf_WorldPos, samplerG, texcoord );

The global ID is the absolute number of a task. A local ID is the same, but within a Warp/Wavefront/Tile/Workgroup or whatever the hell you like to call them.
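
If the relation between the two IDs feels fuzzy, this tiny sketch in plain C mirrors it for one dimension. It assumes no global offset was given at launch, and the helper names are made up for the example (in a kernel you would simply call get_global_id, get_group_id and get_local_id).

```c
/* For one dimension, without a global offset:
   global id = group id * local size + local id */
int global_from_local(int group_id, int local_size, int local_id)
{
    return group_id * local_size + local_id;
}

/* Which tile (workgroup) a pixel coordinate belongs to. */
int group_of(int global_id, int local_size)
{
    return global_id / local_size;
}

/* Position of that pixel inside its own tile. */
int local_of(int global_id, int local_size)
{
    return global_id % local_size;
}
```

So with 32x32 tiles, screen pixel x = 101 lives in tile column 3, at local position 5.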

******* 4. Make a bounding box for each tile
In the setup from above, we made 32x32 tiles. Instead of letting each pixel test which lights affect it, we do it per tile. Do the math: either test 1024 x 768 = 786,432 times, or (1024 x 768) / (32 x 32) = 768 times.

E.Honda wins. If we know a bounding box, we can do a simple test to see if a light intersects the contents of a tile (and thus affects 1 or more of the pixels within). As an extra test, I also compute the average normal for each tile. If the pixel normals vary a lot within the tile, it is of no use. But often you'll be looking at a relatively flat piece where all pixels face more or less the same direction. So if the average normal is useful, we can also exclude lights that shine from the wrong direction.
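
What such a "simple test" could look like for a point light: treat the light as a sphere and check it against an axis-aligned box. This is a sketch in plain C under my own assumptions (the TileBox struct and the axis-aligned box are made up for the example; your tile bounds may be shaped differently, a frustum slice for instance).

```c
/* Hypothetical axis-aligned bounding box around everything visible in one tile. */
typedef struct {
    float min[3];
    float max[3];
} TileBox;

/* Returns 1 if a sphere (a point light with a radius) touches the box, else 0.
   Clamp the sphere center to the box to find the nearest point, then compare
   the squared distance against the squared radius. */
int sphere_hits_box(const float center[3], float radius, const TileBox *box)
{
    float dist2 = 0.0f;
    for (int i = 0; i < 3; ++i) {
        float c = center[i];
        float nearest = c < box->min[i] ? box->min[i]
                      : (c > box->max[i] ? box->max[i] : c);
        float d = c - nearest;
        dist2 += d * d;
    }
    return dist2 <= radius * radius;
}
```

Spotlights need a bit more work (a cone test), but wrapping the cone in a sphere gives a conservative first check.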

Now, how to find the furthest or closest pixel within a tile? Let each pixel read a whole rectangle from a Z buffer? No, no, no. Damn no. This is where cooperation between tasks becomes useful. Let each task just read a single pixel, as usual. But use shared variables and a “min” & “max” function. Each task would overwrite the highest value in case it found a further pixel. However… Remember all the thread drama from the previous post? Since the tasks run simultaneously, you can’t just do “furthestPixel = max( furthestPixel, myValue );”. Use an “atomic” operation instead. This ensures only 1 task will update the variable at a time:
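  __local uint minZ; // “__local” tells the variable is shared with all tasks within the tile
  __local uint maxZ;

  // __local variables can't be initialized where they are declared, so let one task do it
  if ( get_local_id(0) == 0 && get_local_id(1) == 0 ) { minZ = 0xFFFFFFFF; maxZ = 0; }
  barrier( CLK_LOCAL_MEM_FENCE );

  ... read depth buffer

  // atomic_min / atomic_max only exist for (unsigned) integers and take a pointer.
  // Assuming pixel.z is a non-negative depth, its raw bits sort the same way as the value.
  atomic_min( &minZ, as_uint( pixel.z ) );
  atomic_max( &maxZ, as_uint( pixel.z ) );

  // Notice that these atomic operations may slow down the progress as a whole, as other tasks
  // within the tile need to wait (shortly). Minimize the atomic operations, or if it's really causing
  // problems, consider doing the testing on a lower resolution buffer (= less tasks).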
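
One thing to watch out for: OpenCL's built-in atomic_min and atomic_max are defined for integer types only, so float depths need a detour. A well-known trick is to compare the raw bit patterns instead, because for non-negative IEEE 754 floats the bits sort in the same order as the values. The sketch below shows that in plain C on the CPU; the helper names are mine, and in a kernel as_uint() and as_float() would do the reinterpreting.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as an unsigned integer. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* And back again. */
float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Find the nearest depth using integer compares only.
   Only valid when all depths are >= 0 (negative floats sort the other way around). */
float min_depth_via_bits(const float *depths, int count)
{
    uint32_t best = 0xFFFFFFFFu;
    for (int i = 0; i < count; ++i) {
        uint32_t b = float_bits(depths[i]);
        if (b < best)
            best = b;
    }
    return bits_to_float(best);
}
```

On the GPU the loop disappears: every task does one atomic_min on the shared integer, and after a barrier the result is converted back to a float.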

The same tricks can be applied to find out whether all the normals within a tile are more or less the same. If so, you can skip lights that shine from the wrong direction. You could for example sum up all normals, and then calculate the average normal and see if it’s not too different from the min/max normals.

Before averaging, you may want to ensure all tasks are done summing up. And also, if you don’t want this whole normal-check, you still have to wait till all tasks are done before you can proceed with the next step. To do so, use a barrier:
  barrier( CLK_LOCAL_MEM_FENCE );

******* 5. Test which lamps affect a tile
In the previous step we found some values to make a bounding box, and possibly an average normal. Now let's see which lights intersect the tile and can potentially light it (notice we don't test shadows yet). Although the test is relatively cheap, again we have to cooperate instead of letting 1 task loop through all lights while the other pixels are jerking themselves off. For example, give each pixel 1 light to test, using its local index (see #3). So if there are 50 lamps, pixel0..49 will test... and the remaining ones will jerk off.
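
And when there are more lamps than pixels in a tile? A usual pattern is to let each task start at its own local index and then hop forward by the tile size until the list runs out. A sketch in plain C with made-up helper names; it only counts how many lights a task would get, the actual light-versus-tile test would sit inside the loop.

```c
/* Flatten a 2D local ID into a single index within the tile. */
int flat_local_index(int local_x, int local_y, int tile_width)
{
    return local_y * tile_width + local_x;
}

/* Task k tests lights k, k + group_size, k + 2 * group_size, ...
   Returns how many lights end up on this task's plate. */
int lights_for_task(int local_index, int group_size, int num_lights)
{
    int tested = 0;
    for (int i = local_index; i < num_lights; i += group_size)
        ++tested;   /* in a kernel: test light i against the tile here */
    return tested;
}
```

With 50 lamps and 1024 tasks, tasks 0..49 each get one light and the rest get none; with 600 lamps and 16x16 tiles, nobody tests more than three.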

In practice, it's a bit more complicated as we have several types of lights. Mainly spotlights, pointlights, and huge sources such as the sun. So, use your creativity. The point is, spread the work! If a light passes the test, it has to be added to a list. If you know threading, you also know that dealing with lists can be tricky. Consider this:

In higher programming languages, we can usually lock a list, add or delete an element, then unlock it again. But we're working in the abyss here. Luckily, it's fairly easy to achieve the same. Just use
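__local int arrayIndexCounter;     // __local variables can't have an initializer: let one task
                                   // set it to 0, followed by a barrier, before anyone adds
__local struct Light lights[MAX];  // "Light" is a placeholder name for your own light struct
if ( lightPassed ) {
    int n = atomic_add( &arrayIndexCounter, 1 ); // returns the old value = our own free slot
    lights[n] = myTestedLamp;
}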

The atomic operation ensures n will be filled with a proper value. Btw, there are also faster hardware counters for this purpose I believe, but I haven't tried them yet.
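
The same append-with-a-counter pattern can be tried on the CPU with C11 atomics, which is handy for getting a feel for it without a GPU. This is a sketch under my own assumptions: the TileLightList struct, the capacity of 64 and the choice to store light indices rather than whole structs are made up for the example.

```c
#include <stdatomic.h>

#define MAX_TILE_LIGHTS 64   /* hypothetical capacity per tile */

typedef struct {
    atomic_int count;                     /* number of append attempts so far */
    int        indices[MAX_TILE_LIGHTS];  /* indices into the global light array */
} TileLightList;

/* Reserve a slot atomically, then write into it. Returns the slot, or -1 if the
   list is full. Note that 'count' keeps growing past the capacity when the list
   overflows, so readers should clamp it to MAX_TILE_LIGHTS. */
int tile_list_append(TileLightList *list, int light_index)
{
    int slot = atomic_fetch_add(&list->count, 1);  /* returns the old value */
    if (slot >= MAX_TILE_LIGHTS)
        return -1;
    list->indices[slot] = light_index;
    return slot;
}
```

Because every caller gets a different old value back, no two tasks ever write to the same slot. The order of the entries is not predictable though, which doesn't matter when you just sum up lights.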

******* 6 & 7 Lighting
Showtime. You found all the lights and placed them in array(s). Now let each pixel loop through all these lights and apply them to itself. Where traditional Deferred Lighting would need additive blends, we can just sum lights with the good old + operator.

This step does pretty much the same as a normal lighting pixel shader, except that it does all the lights at once in a loop. The bad news is that you may have to (re)write quite some code in OpenCL for all the different types of lights. So, do it smart and write it as functions you could reuse in some possible future CL program.

The results simply get written back into a texture, in a similar fashion as we read pixels in step #3.

Steps 4 and 5, where multiple pixels share and contribute to the same data, are something that wouldn't be possible with ordinary shaders. And although you may not be in need of such tricks soon, there are certainly scenarios that can benefit from this, or wouldn't even be possible without Compute Shaders (making octrees on the GPU for example). For that reason, it's good to step inside the world of Compute Shaders when you have a chance. Setting up a CS and launching it is child's play, as the OpenCL API is very small compared to OpenGL. Finding good reasons to use a CS on the other hand is another story. It requires creativity, and a good understanding of when & what can benefit from CS features that aren't possible with common shaders. To be honest, I haven't implemented any CS into Tower22 yet. Either I could do without, or my older laptop card didn't support some of the (atomic) operations that make a technique like Tiled Deferred Lighting interesting.

Well, just download a demo and see for yourself. As usual, the best practice comes from trying yourself and looking at the big boys. Once you successfully filled some buffers, you will also learn that a CS can be used beyond 3D graphics and games. Maybe it will become useful one day!

I couldn't make a post without showing at least 1 interesting image. Or at least... I'm a bit proud of it as this is the first time I kinda successfully used the Wacom tablet, not drawing like a toddler. Other than that, it's nothing more than a conceptual monster that probably won't make it to the final rounds ;)