Saturday, February 18, 2017

Party Particles

For the fourth time in the T22 "lifecycle", I have to (re)implement Particles. Yippee. And as usual, it raises the question "How to do it (this time)?". In OpenGL, in a decent 2017 manner.

Iteration #1: CPU driven
The first iteration was basic, CPU driven particles. The old fashioned "pre-shader-era" way. Or let's just say the stupid way. CPU calculates position/size/colour/animation(UV) for a bunch of "sprites", and draws them one by one. Advantage is that you can easily access the previous state of particles and calculate the next step any fancy way you desire.
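
To make that concrete, here is a minimal Python sketch of the old-fashioned loop. The `Particle` fields, the `GRAVITY` constant and the `draw_sprite` callback are made up for illustration, they are not the actual T22 code.

```python
from dataclasses import dataclass

GRAVITY = (0.0, -9.81, 0.0)  # assumed constant, purely illustrative


@dataclass
class Particle:
    position: list   # [x, y, z]
    velocity: list   # [vx, vy, vz]
    life: float      # seconds left to live


def update_cpu(particles, dt):
    """Advance every live particle one step, one by one, on the CPU."""
    for p in particles:
        if p.life <= 0.0:
            continue  # dead particles are skipped
        for axis in range(3):
            p.velocity[axis] += GRAVITY[axis] * dt
            p.position[axis] += p.velocity[axis] * dt
        p.life -= dt


def draw_cpu(particles, draw_sprite):
    """One draw call per live particle; returns how many calls were issued."""
    calls = 0
    for p in particles:
        if p.life > 0.0:
            draw_sprite(p.position)
            calls += 1
    return calls
```

The previous state is right there in `p`, which is the nice part. The call count returned by `draw_cpu` grows with every particle, which is the not so nice part.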

Disadvantage is, well, it's just slow. In general, the CPU should shut the hell up. Rather than looping through dozens of particles and interrupting the GPU the whole time like a kid with a thousand questions, batch stuff, and let the GPU do the hard work with a single CPU call. And in the meantime, the CPU can do something different. Play with its toes, bother mom, smoke pot, whatever.
The idea of particles and how to finally render them hasn't changed much in the last twenty years. But what did change are the methods to get a LOT of them.

Iteration #2: Transform Feedback
It has been a while since I used this. And though it's much better than step 1, it's also a bit harder to grasp... And I don't like hard stuff. Let's see if I remember...

So you have not one, but two "Vertex Buffers" (VBO’s) stored on the GPU, each sized for the maximum number of particles you allow, say a million. Instead of sprite quads, it is sufficient to store just points, thus 1 vertex per particle. When drawing this “point array”, a Geometry Shader can be used to generate a camera-facing quad (=sprite) out of each point. But the more interesting and challenging part with particles is the “Motion-Update” step. Since we moved away from the CPU, the GPU is now responsible for moving, sizing, rotating and animating the particles. But how?
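
The point-to-quad expansion boils down to a bit of vector math. Below is a rough Python sketch of what such a Geometry Shader would compute per point. The function name and the `cam_right` / `cam_up` parameters (the camera's right and up axes in world space) are my own naming for illustration.

```python
def billboard_corners(center, size, cam_right, cam_up):
    """Expand one point into the 4 corners of a camera-facing quad.

    Corners come out in triangle-strip order:
    bottom-left, bottom-right, top-left, top-right.
    """
    half = size * 0.5
    corners = []
    for sx, sy in ((-1, -1), (1, -1), (-1, 1), (1, 1)):
        corners.append(tuple(
            center[i] + sx * half * cam_right[i] + sy * half * cam_up[i]
            for i in range(3)
        ))
    return corners
```

Because the quad is spanned by the camera axes rather than the world axes, it faces the viewer no matter where the particle sits.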

In this case, with *Vertex Shaders*. But instead of calculating screen-coordinates for rendering stuff, you will be overwriting vertex values in that buffer, using "Transform Feedback", hence the name. So you’re actually not drawing anything, you’re modifying the contents of a VBO.

And the reason you need 2 buffers, is so you can ping-pong between them. Read BufferA to fetch the current state of each particle, evaluate its position/colour/size/velocity/…/, then write the result in BufferB. In coding terms:

                FOR EACH Vertex IN BufferA
                               IF alive THEN
                                               Update life-cycle             (kill particle after X time or Events)
                                               Update velocity               (gravity, wind, launchForce, collided, …)
                                               Update position               ("vertexA.position + vertexA.velocity")
                                               Update size                   (growing, shrinking?)
                                               Update animation state        (cycle through sprite frames)
                               ELSE
                                               Put the vertex somewhere far away, or put size on ZERO
                                               Check if we should RESPAWN
                               STORE modified Vertex in BufferB
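
For those who prefer something runnable: a small Python sketch of the same ping-pong idea, with plain lists standing in for the two VBO's. The constants, the dict layout and the "respawn immediately at the origin" rule are assumptions for illustration; on the GPU, `update_vertex` would be the vertex shader and `step` the Transform Feedback pass.

```python
GRAVITY_Y = -9.81   # assumed gravity, illustrative only
MAX_LIFE = 2.0      # assumed lifetime of a freshly spawned particle


def update_vertex(v, dt):
    """Pure function of the OLD state, just like a vertex shader invocation."""
    life = v["life"] - dt
    if life <= 0.0:
        # Dead: respawn at the origin (a real emitter would randomise this).
        return {"pos": (0.0, 0.0, 0.0), "vel": (0.0, 1.0, 0.0), "life": MAX_LIFE}
    vx, vy, vz = v["vel"]
    vy += GRAVITY_Y * dt                      # update velocity first
    x, y, z = v["pos"]
    return {"pos": (x + vx * dt, y + vy * dt, z + vz * dt),
            "vel": (vx, vy, vz),
            "life": life}


def step(read_buf, write_buf, dt):
    """Read BufferA, write BufferB, then hand them back with swapped roles."""
    for i, v in enumerate(read_buf):
        write_buf[i] = update_vertex(v, dt)
    return write_buf, read_buf   # (new read buffer, new write buffer)
```

Each frame you call `read_buf, write_buf = step(read_buf, write_buf, dt)`, so the buffer that was just written becomes the one to read (and draw) next.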

So basically we use the Vertex (composed out of a couple of vectors, 64 bytes in my case) to store the position and a few other properties -The State- of a particle. Updating these "States" works fast, because the CPU is quiet now, and Vertex Shaders are quick as hell, just as they are when you draw other shapes with many thousands of vertices.

But now the problem. A shader doesn't have access to RAM or local/global variables like CPU code does. It has no sense of its environment to test for collisions, or to pick up local influences, like the wind. As with other shaders, the only way to pump such variables into your shader is either with uniform parameters / UBO's or through textures. You'll be doing that, but being limited in what you can do with them, the data lacks awesomeness. It requires some tricks and smart thinking to work around that, with a minimal amount of variable inputs.

Such variables could be:
  • LifeCycle - Usually particles live fast and die young
  • Velocity - the actual velocity, randomized spawn velocity / directions
  • Physics. Gravity, Air friction, how far they can bounce off surfaces, ...
  • Rotation 
  • Scaling - actual size, growing, shrinking, minimum / maximum random size
  • Colour / Alpha. Again, the colour may change over time, particles may fade in and out
  • Spawning volume. The box/sphere/shape where new particles are born
  • And some good Random spicing... Shaders do NOT have a "random" function, you'll need some help
The Vertex needs to carry some of those actual states, plus you'll need some more inputs / or hard-coded values in the shader. Typically when you Spawn a new particle, you want to randomize these values, using some (artist) input.
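
About that "Random spicing": one common trick is to hash something you already have, such as the particle index and a frame counter, into a pseudo-random number. Here is a Python sketch of that idea. The hash constants follow a PCG-style integer hash as I remember it, so treat them as an assumption; `rand01` and `spawn_value` are hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned shader arithmetic


def hash_u32(x):
    """Scramble a 32-bit integer (PCG-style hash; constants are an assumption)."""
    state = (x * 747796405 + 2891336453) & MASK32
    word = (((state >> ((state >> 28) + 4)) ^ state) * 277803737) & MASK32
    return ((word >> 22) ^ word) & MASK32


def rand01(particle_id, frame, channel=0):
    """Deterministic pseudo-random float in [0, 1) per particle/frame/channel."""
    seed = hash_u32(particle_id ^ hash_u32(frame ^ hash_u32(channel)))
    return seed / 2**32


def spawn_value(lo, hi, r):
    """Map a random r in [0, 1) onto an artist-defined min/max range."""
    return lo + (hi - lo) * r
```

When spawning, something like `spawn_value(min_size, max_size, rand01(particle_id, frame, channel=1))` would give each particle its own size within the artist's range, with a different `channel` per property so size and speed don't end up correlated.
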
The biggest issue I had with Transform Feedback is the variety of "Movement Patterns". A raindrop is pretty simple, doesn't require many variables, and the vertex shader would basically just move the position down, given a gravity factor, and maybe some wind vectors. But how about fire & smoke? Flies randomly circling a cadaver? Tornadoes? Magical Twinky Twonky Fairy Thingies? A piece of paper or a leaf gently whirling around? Blood splatter? Fountains?

With some variable vectors you can achieve quite a lot, but I felt my "SuperShader" for particles was quickly getting a lot of overhead, and still wasn't able to deal with more advanced, custom situations. Got to note this problem is not tied to “Transform Feedback” itself; all shaders have these issues. Just saying it was time for Iteration #3.


Iteration #3: OpenCL Compute Shaders, composed of custom code-chunks
By far the best solution I had, so far. Not sure if it was faster than the "Transform Feedback" variant, as I used OpenCL (not GL!) at the time, which isn't really integrated in an OpenGL pipeline by nature. Meaning data has to be swapped between the two.

But then again... performance wasn't that much of an issue anyway. I mean, Tower22 doesn't need millions of particles to begin with. Second, fill-rate is a much bigger problem really. Particles never felt slow, unless drawing larger clouds, having many layers of relatively large particles. In fact, doing a one-million particle fountain is child’s play. Draw 30 layers of smoke, and your supercomputer will crumble. Just saying.
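
A quick back-of-the-envelope calculation shows why. The numbers are assumptions picked for illustration: fountain droplets covering about 2x2 pixels each, and smoke sprites that each fill a 1920x1080 screen.

```python
def fragments(count, width_px, height_px):
    """Number of pixels the GPU has to shade and blend for `count` sprites."""
    return count * width_px * height_px


fountain = fragments(1_000_000, 2, 2)    # a million tiny droplets
smoke = fragments(30, 1920, 1080)        # 30 screen-filling smoke layers

print(fountain)           # 4000000
print(smoke)              # 62208000
print(smoke / fountain)   # 15.552
```

So under these assumptions, thirty big sprites cost over fifteen times more pixel work than a million small ones, and that is before counting the texture reads and blending per pixel.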

The nice thing about Compute Shaders is that you have easier access to buffers. For instance, you can also fetch data from other neighbour particles. You just work with arrays, so you can grab any structure from it. All in all, it feels more natural, and gave a bit more freedom. Nor did I need to ping-pong between two buffers; you can read & write at the same time.

But the biggest change was that I made an actual editor this time. Instead of just writing out a whole (Compute)Shader, you would be selecting small chunks. One chunk that controls the position, another one that controls the size, and so on. And while you could still write custom code, you could also pick standard pre-coded pieces, sufficient for most standard cases. And as you modified code, you could directly preview your results.
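
The chunk idea can be sketched in a few lines of Python. Everything here is hypothetical: the `STANDARD_CHUNKS` library, the C-like snippet strings and the `update_particle` wrapper are stand-ins, not the real editor's chunks.

```python
# Hypothetical chunk library: category -> option name -> code snippet.
STANDARD_CHUNKS = {
    "position": {
        "gravity": "p->vel.y -= gravity * dt; p->pos += p->vel * dt;",
        "static": "",
    },
    "size": {
        "grow": "p->size += growRate * dt;",
        "constant": "",
    },
}


def compose_kernel(selection, custom=None):
    """Build kernel source from one chunk per category.

    selection: e.g. {"position": "gravity", "size": "grow"}
    custom:    optional {category: hand-written code} overriding a standard chunk
    """
    custom = custom or {}
    body = []
    for category in sorted(STANDARD_CHUNKS):
        if category in custom:
            code = custom[category]
        else:
            code = STANDARD_CHUNKS[category][selection[category]]
        if code:
            body.append(f"    // {category}\n    {code}")
    return ("void update_particle(Particle* p, float dt) {\n"
            + "\n".join(body) + "\n}")
```

An editor built this way only needs to recompose and recompile when a chunk selection changes, which is also what makes a live preview practical.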

The bad news, besides OpenCL being sort of an alien in my OpenGL system, was that you would end up with hundreds, if not thousands of different shaders. Basically each particle-generator could have slightly different code. So I tried to make some sort of batching system. If 2 shaders used the same code-blocks (but with different variable inputs maybe), they would refer to the same shader instance. In hindsight though, I'm not so sure if it's that important. I mean, how many different particle-generators do you have loaded & active at the same time?
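
Such a batching system is basically a cache keyed on the chosen code-blocks. A minimal Python sketch, assuming a hypothetical `compile_fn` that turns a set of blocks into a compiled shader; the variable inputs are deliberately not part of the key.

```python
class ShaderCache:
    """Share one compiled shader among all generators using the same code-blocks."""

    def __init__(self, compile_fn):
        self._compile = compile_fn   # hypothetical: key -> compiled shader object
        self._instances = {}

    def get(self, blocks):
        """blocks: e.g. {"position": "gravity", "size": "grow"}."""
        # Sort the items so the order of selection doesn't create duplicates.
        key = tuple(sorted(blocks.items()))
        if key not in self._instances:
            self._instances[key] = self._compile(key)
        return self._instances[key]
```

Two generators with the same blocks but a different gravity or colour range get the same instance back, and only feed it different uniforms.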

Instead of writing (the same kind of) code for each possible scenario, you could enable/disable options and compose an (OpenCL) ComputeShader.

Iteration #3 and a half: Adding dynamic lighting
A bit off road, but relevant for particles nevertheless: how to light those little bastards? Well, pretty simple actually. In a pre-stage, render all your positions into a texture. Each particle would get 1 pixel, so you can do quite a lot with a 1024x1024 texture for example.

Then treat that position-buffer as you would do normal Deferred Lighting. Make a second “Particle Light Texture”, bind it as a target, loop through your lights, and do your thing:
                // Step 1: Render positions into texture
                setRenderTarget(  particlePositionTexture  )
                FOR EACH particle IN vertexBuffer
                               render particle World position into targetTexture
                               Use particle ID(index) to calculate the X/Y target coordinates

                // Step 2: Bring light to me
                setRenderTarget(  targetLightTexture  )
                enable AdditiveBlend
                FOR EACH light
                                render fullscreen quad
                                               FOR EACH pixel(=particle)
                                                               get position from "particlePositionTexture"
                                                               calculate normal (position to camera)
                                                               apply light/Shadowmapping as you would do on a normal solid surface
In the final render-stage, each particle would grab its light value again from that texture, and multiply it with its texture or whatever colour data you have.
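
The two steps can be mimicked on the CPU in Python. This is a simplified sketch: `particle_texel` maps a particle index to its pixel, and `accumulate_light` stands in for the additive fullscreen passes. The linear distance falloff and the light dict layout are assumptions, and normals and shadow mapping are left out entirely.

```python
import math


def particle_texel(index, width=1024):
    """Particle ID -> (x, y) pixel in the position/light texture."""
    return index % width, index // width


def accumulate_light(positions, lights):
    """Additively sum the contribution of every light for every particle.

    positions: list of (x, y, z) world positions, one per particle
    lights:    list of {"pos": (x, y, z), "color": (r, g, b), "radius": float}
    """
    light_texture = [[0.0, 0.0, 0.0] for _ in positions]
    for light in lights:                      # one fullscreen pass per light
        for i, p in enumerate(positions):     # one pixel per particle
            d = math.dist(p, light["pos"])
            atten = max(0.0, 1.0 - d / light["radius"])  # assumed linear falloff
            for c in range(3):
                light_texture[i][c] += light["color"][c] * atten
    return light_texture
```

With a 1024x1024 texture, `particle_texel` has room for 1,048,576 particles, so a million-particle buffer fits in one texture.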

Iteration #4?
Same thing as 3 really, but with some small differences. Like swapping OpenCL for GLSL compute shaders. I’ll explain in deeper detail next time, including a few notes about particle collision detection.

This test-setup was absolutely not inspired by an older Dutch lady who made the tabloids with a leaked pipi video recently. No.