For the fourth time in the T22 "lifecycle", I have to (re)implement Particles. Yippee. And as usual, it raises the question "How to do it (this time)?". In OpenGL, in a decent 2017 manner.
Iteration #1: CPU driven
The first iteration was basic, CPU-driven particles. The old-fashioned "pre-shader-era" way. Or let's just say the stupid way. The CPU calculates position/size/colour/animation(UV) for a bunch of "sprites", and draws them one by one. The advantage is that you can easily access the previous state of each particle and calculate the next step any fancy way you desire.
The disadvantage is, well, it's just slow. In general, the CPU should shut the hell up. Rather than looping through dozens of particles and interrupting the GPU the whole time like a kid with a thousand questions, batch stuff and let the GPU do the hard work with a single CPU call. In the meanwhile, the CPU can do something different. Play with its toes, bother mom, smoke pot, whatever.
The idea of particles and how to finally render them hasn't changed much in the last twenty years. But what did change are the methods to get a LOT of them.
Iteration #2: Transform Feedback
It has been a while since I used this. And though it's much better than iteration #1, it's also a bit harder to grasp... And I don't like hard stuff. Let's see if I remember...
So you have not one, but two "Vertex Buffers" (VBOs) stored on the GPU, filled with the maximum number of particles you allow, say a million. Instead of sprite quads, it is sufficient to store just points, thus 1 vertex per particle. When drawing this "point array", a Geometry Shader can be used to generate a camera-facing quad (= sprite) out of each point when actually drawing the particles, as sketched below.
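To give a rough idea of that expansion, here is a minimal point-to-quad geometry shader sketch; the uniform and varying names are just assumptions, and size/animation handling is stripped down:

#version 330 core
layout(points) in;
layout(triangle_strip, max_vertices = 4) out;

uniform mat4 viewMatrix;        // assumed names
uniform mat4 projectionMatrix;

in  float particleSize[];       // passed along by the vertex shader
out vec2  texCoord;

void main()
{
    // Move the particle centre to view-space; quads built there always face the camera
    vec4 centre = viewMatrix * gl_in[0].gl_Position;
    float halfSize = particleSize[0] * 0.5;

    // Emit the 4 corners of a screen-aligned quad as a triangle strip
    vec2 corners[4] = vec2[4]( vec2(-1.0, -1.0), vec2(1.0, -1.0), vec2(-1.0, 1.0), vec2(1.0, 1.0) );
    for (int i = 0; i < 4; ++i)
    {
        texCoord    = corners[i] * 0.5 + 0.5;
        gl_Position = projectionMatrix * (centre + vec4(corners[i] * halfSize, 0.0, 0.0));
        EmitVertex();
    }
    EndPrimitive();
}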
But the more interesting and challenging part of particles is the "Motion-Update" step. Since we moved away from the CPU, the GPU is now responsible for moving, sizing, rotating and animating the particles. But how?
In this case, with *Vertex Shaders*. But instead of calculating screen coordinates for rendering stuff, you will be overwriting vertex values in that buffer, using "Transform Feedback", hence the name. So you're actually not drawing anything; you're modifying the contents of a VBO.
And the reason you need 2 buffers is so you can ping-pong between them: read BufferA to fetch the current state of each particle, evaluate its position/colour/size/velocity/..., then write the result into BufferB. In coding terms:
FOR EACH Vertex IN BufferA
    IF alive THEN
        Update life-cycle       (kill particle after X time or events)
        Update velocity         (gravity, wind, launchForce, collided, ...)
        Update position         ("vertexA.position + vertexA.velocity")
        Update size             (growing, shrinking?)
        Update animation state  (cycle through sprite frames)
    ELSE
        Put the vertex somewhere far away, or put its size on ZERO
        Check if we should RESPAWN
    STORE modified Vertex in BufferB
So basically we use the Vertex (composed of a couple of vectors, 64 bytes in my case) to store the position and a few other properties -the State- of a particle. Updating these "States" works fast, because the CPU is quiet now, and Vertex Shaders are quick as hell; they happily chew through many thousands of vertices, just like when drawing other shapes.
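For illustration, a stripped-down transform-feedback vertex shader for that update step might look like this. It's a minimal sketch with assumed attribute and uniform names; the out-variables would be captured into BufferB via glTransformFeedbackVaryings, with rasterization discarded since nothing gets drawn here:

#version 330 core

// Current particle state, read from BufferA
in vec3  inPosition;
in vec3  inVelocity;
in float inLife;
in float inSize;

uniform float deltaTime;
uniform vec3  gravity;   // e.g. (0.0, -9.81, 0.0)

// New particle state, captured into BufferB by transform feedback
out vec3  outPosition;
out vec3  outVelocity;
out float outLife;
out float outSize;

void main()
{
    float life = inLife - deltaTime;
    if (life > 0.0)
    {
        // Alive: integrate a very simple motion model
        outVelocity = inVelocity + gravity * deltaTime;
        outPosition = inPosition + outVelocity * deltaTime;
        outSize     = inSize;
    }
    else
    {
        // Dead: park the vertex far away with zero size (respawn logic would go here)
        outVelocity = vec3(0.0);
        outPosition = vec3(0.0, -100000.0, 0.0);
        outSize     = 0.0;
    }
    outLife = life;
}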
But now the problem: a GPU doesn't have access to RAM or local/global variables like the CPU does. It has no sense of its environment to test for collisions, or to catch local influences like wind. As with other shaders, the only way to pump such variables into your shader is either with uniform parameters / UBOs, or through textures. You'll be doing that, but being limited in what you can do with them, the data lacks awesomeness. It requires some tricks and smart thinking to work around that with a minimal amount of variable inputs.
Such variables could be:
- LifeCycle - Usually particles live fast and die young
- Velocity - the actual velocity, randomized spawn velocity / directions
- Physics. Gravity, Air friction, how far they can bounce off surfaces, ...
- Rotation
- Scaling - actual size, growing, shrinking, minimum / maximum random size
- Colour / Alpha. Again, the colour may change over time, particles may fade in and out
- Spawning volume. The box/sphere/shape where new particles are born
- And some good Random spicing... Shaders do NOT have a "random" function, so you'll need some help (see the sketch below)
The Vertex needs to carry some of those actual states, plus you'll need some more inputs or hard-coded values in the shader. Typically, when you spawn a new particle, you want to randomize these values, using some (artist) input.
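As for that "random" help: a common trick is a little hash function, fed with something that varies per particle (its ID, position, or the current time). A sketch, using the well-known fract/sin/dot hash and made-up parameter names:

// Pseudo-random value in [0, 1), the classic fract/sin/dot hash
float hash(vec2 seed)
{
    return fract(sin(dot(seed, vec2(12.9898, 78.233))) * 43758.5453);
}

// Example: a randomized spawn velocity, roughly upwards with some spread
vec3 randomSpawnVelocity(float particleId, float time)
{
    float angle  = hash(vec2(particleId, time)) * 6.2831853;   // 0 .. 2*PI
    float spread = hash(vec2(time, particleId)) * 0.25;
    float speed  = 2.0 + hash(vec2(particleId, -time));
    return normalize(vec3(cos(angle) * spread, 1.0, sin(angle) * spread)) * speed;
}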
The biggest issue I had with Transform Feedback is the variety of "Movement Patterns". A raindrop is pretty simple; it doesn't require many variables, and the vertex shader would basically just move the position down, given a gravity factor and maybe some wind vectors. But how about fire & smoke? Flies randomly circling a cadaver? Tornadoes? Magical Twinky Twonky Fairy Thingies? A piece of paper or a leaf gently whirling around? Blood splatter? Fountains?
With some variable vectors you can achieve quite a lot, but I felt my "SuperShader" for particles was quickly getting a lot of overhead, and still wasn't able to deal with more advanced, custom situations. Got to note this problem isn't tied to "Transform Feedback" itself; all shaders have these issues. Just saying it was time for Iteration #3.
Iteration #3: OpenCL Compute Shaders, composed of custom code-chunks
By far the best solution I had so far. Not sure if it was faster than the "Transform Feedback" variant, as I used OpenCL (not GL!) at the time, which isn't really integrated into an OpenGL pipeline by nature, meaning data has to be swapped between the two.
But then again... performance wasn't that much of an issue anyway. I mean, Tower22 doesn't need millions of particles to begin with. Second, fill-rate is a much bigger problem really. Particles never felt slow, unless drawing larger clouds with many layers of relatively large particles. In fact, doing a one-million particle fountain is child's play. Draw 30 layers of smoke, and your supercomputer will crumble. Just saying.
The nice thing about Compute Shaders is that you have easier access to buffers. For instance, you can also fetch data from neighbouring particles. You just work with arrays, so you can grab any structure from them. All in all, it feels more natural and gives a bit more freedom. Neither did I need to ping-pong between two buffers; you can read & write the same buffer.
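I used OpenCL kernels back then, but the same idea written as a GLSL compute shader (which is what iteration #4 switches to anyway) would roughly look like this; the Particle struct, bindings and uniforms are just assumptions:

#version 430
layout(local_size_x = 256) in;

// One struct per particle, stored in a Shader Storage Buffer Object (SSBO)
struct Particle
{
    vec4 positionSize;   // xyz = position, w = size
    vec4 velocityLife;   // xyz = velocity, w = remaining life
};

layout(std430, binding = 0) buffer ParticleBuffer
{
    Particle particles[];
};

uniform float deltaTime;
uniform vec3  gravity;

void main()
{
    uint i = gl_GlobalInvocationID.x;
    if (i >= uint(particles.length()))
        return;

    Particle p = particles[i];
    p.velocityLife.xyz += gravity * deltaTime;
    p.positionSize.xyz += p.velocityLife.xyz * deltaTime;
    p.velocityLife.w   -= deltaTime;

    // Read & write the same buffer: no ping-pong needed,
    // since each invocation only touches its own element
    particles[i] = p;
}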
But the biggest change was that I made an actual editor this time. Instead of just writing out a whole (Compute)Shader, you would be selecting small chunks: one chunk that controls the position, another one that controls the size, and so on. And while you could still write custom code, you could also pick standard pre-coded pieces, sufficient for most standard cases. And as you modified the code, you could directly preview the results.
The bad news, besides OpenCL being sort of an alien in my OpenGL system, was that you would end up with hundreds, if not thousands, of different shaders. Basically each particle-generator could have slightly different code. So I tried to make some sort of batching system: if 2 shaders used the same code-blocks (but with different variable inputs maybe), they would refer to the same shader instance. In hindsight though, I'm not so sure if that's so important. I mean, how many different particle-generators do you have loaded & active at the same time? Instead of writing (the same kind of) code for each possible scenario, you could enable/disable options and compose an (OpenCL) ComputeShader.
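To make that composing a bit more concrete: the editor basically glues the selected chunks into a fixed skeleton. A rough sketch of such a generated shader, shown as a GLSL compute shader rather than the original OpenCL, with invented chunk and buffer names:

#version 430
layout(local_size_x = 256) in;

struct Particle { vec4 positionSize; vec4 velocityLife; };
layout(std430, binding = 0) buffer ParticleBuffer { Particle particles[]; };

uniform float deltaTime;

// ---- Chunk "Position: simple velocity integration" (standard pre-coded piece) ----
vec3 updatePosition(vec3 position, vec3 velocity)
{
    return position + velocity * deltaTime;
}

// ---- Chunk "Size: shrink as life runs out" (could just as well be custom code) ----
float updateSize(float size, float life)
{
    return size * clamp(life, 0.0, 1.0);
}

// ---- Fixed skeleton generated by the editor ----
void main()
{
    uint i = gl_GlobalInvocationID.x;
    if (i >= uint(particles.length())) return;

    Particle p = particles[i];
    p.positionSize.xyz = updatePosition(p.positionSize.xyz, p.velocityLife.xyz);
    p.positionSize.w   = updateSize(p.positionSize.w, p.velocityLife.w);
    particles[i] = p;
}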
Iteration #3 and a half: Adding dynamic lighting
A bit off-road, but relevant for particles nevertheless: how to light those little bastards? Well, pretty simple actually. In a pre-stage, render all your particle positions into a texture. Each particle gets 1 pixel, so you can do quite a lot with a 1024x1024 texture, for example.
Then treat that position-buffer as you would with normal Deferred Lighting. Make a second "Particle Light Texture", bind it as a target, loop through your lights, and do your thing:
// Step 1: Render positions into a texture
setRenderTarget( particlePositionTexture )
FOR EACH particle IN vertexBuffer
    render particle World position into targetTexture
    use particle ID (index) to calculate the X/Y target coordinates

// Step 2: Bring light to me
setRenderTarget( targetLightTexture )
enable AdditiveBlend
FOR EACH light
    render fullscreen quad
    FOR EACH pixel (= particle)
        get position from "particlePositionTexture"
        calculate normal (position to camera)
        apply light / Shadowmapping as you would do on a normal solid surface
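A rough GLSL sketch of that step 2 fragment shader, for a single point light; the names, the simple camera-facing normal and the attenuation are assumptions, and shadow mapping is left out:

#version 330 core

uniform sampler2D particlePositionTexture;  // world position per particle, 1 pixel each
uniform vec3  lightPosition;
uniform vec3  lightColour;
uniform float lightRadius;
uniform vec3  cameraPosition;

in  vec2 texCoord;    // from the fullscreen quad
out vec4 outLight;    // accumulated additively into targetLightTexture

void main()
{
    vec3 particlePos = texture(particlePositionTexture, texCoord).xyz;

    // Fake normal: particles are camera-facing billboards, so just aim it at the camera
    vec3 normal  = normalize(cameraPosition - particlePos);
    vec3 toLight = lightPosition - particlePos;
    float dist   = length(toLight);

    float attenuation = clamp(1.0 - dist / lightRadius, 0.0, 1.0);
    float diffuse     = max(dot(normal, toLight / dist), 0.0);

    outLight = vec4(lightColour * diffuse * attenuation, 1.0);
}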
In the final render-stage, each particle grabs its light value again from that texture, and multiplies it with its own texture or whatever colour data you have.
Iteration #4?
Same thing as #3 really, but with some small differences, like swapping OpenCL for GLSL compute shaders. I'll explain in deeper detail next time, including a few notes about particle collision detection.
This test-setup was absolutely not inspired by an older Dutch lady who recently made the tabloids with a leaked pipi video. No.