Saturday, March 4, 2017

Party Particles #2 - The Implementation

Iteration #4?
Having refreshed my memory about implementing particles in the previous post, how about version 4? I don't have spectacular plans, so I'll just try to blend the best parts together. Option #1 (particles evaluated by the CPU) is obviously a no-go. Transform Feedback was a bit confusing to set up, as far as my Korsakoff brain can remember. And OpenCL is not in Engine22 (and probably never will be). But GLSL ComputeShaders are...

Since Engine22 is all OpenGL and stuff, using GLSL ComputeShaders is much easier than implementing OpenCL. Though I found them wonky. They worked pretty well on my dev laptop, but terribly (and I mean really terribly) slow on other, admittedly somewhat older, machines. For instance, I rolled Tiled Deferred Rendering back to old-fashioned Deferred Lighting, and gained speed :&

But as said, the major problem in previous iterations wasn't really speed. It was flexibility. If I want my particles to dance the Tango, then I should be able to program that behaviour somehow. And preferably easily. Even though it sounds cheap, I think having full code access might be the easiest option after all. That simply boils down to writing a bunch of ComputeShaders completely by hand, and adding a few variables to tweak minor behaviour such as speeds, colours or directions. Then you pick whatever shader suits your needs best, or code a new one if none does.

For artists, this kinda sucks though. Assuming they don't know or don't want to code GLSL, they are stuck with whatever I bring them. There's no fancy editor like I had in Iteration #3 (or node-based stuff like Unreal and Unity have). Sure, I can still add such a feature anytime, but to start quickly, I just give it a good ol' handjob. Erh, right.
This is how all materials are edited in Engine22: you choose an "UberShader", then toggle options on/off. Certainly not as flexible (and less fun) as node-based systems. Then again, it works quick & easy for the majority of materials I've met so far.

VBO Initialisation

Into the details gentlemen. So a particle in Engine22 is essentially this:

eGX_Particle      = record
       pos        : eVec4; // Pos        Size
       state      : eVec4; // X: Life    Y: State    Z: Random 0..1   W: ID
       velocity   : eVec4; // Velocity   Rotation
       color      : eVec4; // Color      Opacity
end; // eGX_Particle      64 bytes
Yes, that's Delphi code. I need it to initialize things, but the GPU basically takes over from there, so I have a similar structure in GLSL. Particles usually jump on you in big groups, so I actually initialize a mega array of 1,048,576 particles (= 64 MB). Why that number? Because it matches a 1024x1024 texture. But more about that later.
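To double-check those numbers, here's a small C++ sketch that mirrors the particle layout on the CPU side. It assumes eVec4 is four tightly packed 32-bit floats with no padding, which is also what the GLSL side expects. The row-major index-to-texel mapping is only my guess at how the 1024x1024 texture could line up with the buffer. It is not necessarily how Engine22 does it.

```cpp
#include <cassert>
#include <cstddef>

// CPU-side mirror of eGX_Particle / the GLSL Particle struct.
// Assumption: eVec4 = four 32-bit floats, no padding.
struct Vec4 { float x, y, z, w; };

struct Particle {
    Vec4 pos;       // xyz = position, w = size
    Vec4 state;     // x = life, y = state, z = random 0..1, w = ID
    Vec4 velocity;  // xyz = velocity, w = rotation
    Vec4 color;     // rgb = colour, a = opacity
};

static_assert(sizeof(Vec4) == 16, "a vec4 should be 16 bytes");
static_assert(sizeof(Particle) == 64, "4 x vec4 = 64 bytes per particle");

// One particle per texel of a 1024x1024 texture.
constexpr std::size_t kTextureSide  = 1024;
constexpr std::size_t kMaxParticles = kTextureSide * kTextureSide;   // 1,048,576
constexpr std::size_t kBufferBytes  = kMaxParticles * sizeof(Particle);

// Hypothetical helper: row-major mapping from particle index to texel (x, y).
inline void particleToTexel(std::size_t index, std::size_t& x, std::size_t& y) {
    assert(index < kMaxParticles);
    x = index % kTextureSide;
    y = index / kTextureSide;
}
```

The "64 MB" is 64 MiB to be precise: 1,048,576 x 64 = 67,108,864 bytes.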

There is one thing the CPU has to do when making this 1 million particle VBO: Give each particle an ID (0,1,2,3,...), and also give it a random value (state.z in this case). It will come in handy later on. Anyhow, making this VBO could look something like this:

procedure eGX_ParticleManager.initializeVBO();
var    particles    : array of eGX_Particle;
       i            : eInt;
begin
       // Make one big fat motherfucker of a particle array
       // (note it can handle multiple emitters, not just 1)
       setLength( particles, EGX_MAX_PARTICLES );

       for i:=0 to EGX_MAX_PARTICLES-1 do begin
             particles[i].pos.w    := 0;                 // size: invisible
             particles[i].state.x  := 0;                 // life
             particles[i].state.y  := 0;                 // state: dead
             particles[i].state.z  := mRandomF( 0, 1 );  // random seed
             particles[i].state.w  := i;                 // ID
       end;

       self.vbo := eglBuffers.getVBO( EGL_VERTEXLAYOUT_PXNW );
       self.vbo.setDataPoints( @particles[0].pos.x,
                           EGX_MAX_PARTICLES, 4, true );

       // Got our stuff on the GPU now, remove the RAM copy
       setLength( particles, 0 );
end; // initializeVBO
procedure eGL_VBO.setDataPoints(
             vertexArrayPtr      : pointer; // points to an array of particle records
       const vertCount     : eInt;    // 1 million something
       const attribCount   : eInt;    // 4, cause 4 vectors per point (particle)
       const dynamicContent: eBool ); // yeah, pretty much
begin
       self.makeVBO( vertexArrayPtr, vertCount, attribCount,
                    GL_POINTS, dynamicContent );
end; // VBO setDataPoints

procedure   eGL_VBO.makeVBO(      vertices               : pointer;
                    const vertCount, attribCount : eInt;
                    const glMode                 : TGLEnum;
                    const dynamicContent         : eBool );
begin
       self.attribCount:= attribCount;
       self.vertCount  := vertCount;
       self.stride     := self.attribCount * 16; // 16 bytes per vec4 attribute
       self.glMode     := glMode;
       self.useIndices := false;

       glGenBuffers( 1, @self.glName );
       glBindBuffer( GL_ARRAY_BUFFER, self.glName );
       // Usage hint: frequently rewritten content is DYNAMIC, otherwise STATIC
       if ( dynamicContent ) then
             glBufferData( GL_ARRAY_BUFFER, self.vertCount * self.stride,
                           vertices, GL_DYNAMIC_DRAW )
       else
             glBufferData( GL_ARRAY_BUFFER, self.vertCount * self.stride,
                           vertices, GL_STATIC_DRAW );

       glBindBuffer( GL_ARRAY_BUFFER, 0 );
end; // makeVBO

So, we ended up with 1 big VBO, made of a million vertices/points/particles, whatever you like to call them. Each particle consists of 4 vectors storing the properties that need to be evaluated over its short life. Now that we've reserved this memory on the GPU, ComputeShaders can mess around in it.

Fixed one million particles?!

Now you may wonder: what if I only need a few particles? The number of particles we need changes all the time, right? True, but we'll be rendering a million particles every time anyway. The big array is divided into several smaller “subranges”, which can be reserved by particle emitters (clouds, fire, sparks, plasma piss, …).

Particle Generators

A particle generator typically consists of a position & direction (matrix), a bunch of settings like launch force or colours, and of course the shaders it uses for rendering and evaluating its particles. Each generator gets a VBO subrange (if still available), sized by how many particles it may need at its peak. Each generator also uses a ComputeShader, which is executed over that subrange. These ComputeShaders evaluate the "particle structs" by reading from and writing back into the VBO. Typically they move the positions, grow or shrink sizes, adjust velocities based on physics/collisions, and so on.
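The subrange bookkeeping can live entirely on the CPU. Below is a minimal first-fit allocator sketch in C++. It assumes a generator simply asks for its peak particle count and gets back an offset into the big VBO, or nothing when the pool is full. The class and method names are hypothetical, and Engine22's own manager may work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

// Hypothetical first-fit allocator handing out [offset, offset+count) slices
// of the big particle VBO. Free blocks are kept sorted by offset, so
// neighbouring blocks can be merged again on release.
class SubrangeAllocator {
public:
    explicit SubrangeAllocator(std::uint32_t total) { free_[0] = total; }

    // Reserve `count` particles. Returns the start offset, or nothing if no
    // free block is large enough (the generator then gets no particles).
    std::optional<std::uint32_t> reserve(std::uint32_t count) {
        assert(count > 0);
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->second >= count) {
                const std::uint32_t offset = it->first;
                const std::uint32_t rest   = it->second - count;
                free_.erase(it);
                if (rest > 0) free_[offset + count] = rest;
                return offset;
            }
        }
        return std::nullopt;
    }

    // Give a slice back and merge it with adjacent free blocks.
    void release(std::uint32_t offset, std::uint32_t count) {
        auto [it, inserted] = free_.emplace(offset, count);
        assert(inserted);  // releasing the same slice twice is a bug
        (void)inserted;
        // Merge with the following block if it starts right where we end.
        auto next = std::next(it);
        if (next != free_.end() && it->first + it->second == next->first) {
            it->second += next->second;
            free_.erase(next);
        }
        // Merge with the preceding block if it ends right where we start.
        if (it != free_.begin()) {
            auto prev = std::prev(it);
            if (prev->first + prev->second == it->first) {
                prev->second += it->second;
                free_.erase(it);
            }
        }
    }

    std::size_t freeBlockCount() const { return free_.size(); }

private:
    std::map<std::uint32_t, std::uint32_t> free_;  // offset -> length
};
```

The returned offset is what the ComputeShader adds to gl_GlobalInvocationID.x (offsetIndex in the shaders of this post). Merging adjacent free blocks on release keeps the pool from fragmenting into slivers too small for the next big explosion.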

Particles can be unassigned, or temporarily invisible because their generator doesn't want to render them. Such particles are marked through their "State" property (see the structure above). Inactive particles exit their code early, and are rendered zero-sized and/or somewhere far out of sight.

Note that more complex effects may require multiple generators if the sprite textures and/or motion ComputeShaders differ significantly. Think of that flaming spicy Thai-food fart above (the mushroom cloud, I mean). One generator would do the ring, another the explosion itself, and yet another the darker debris clouds at the bottom. You could also fade out generators at different distances: the big stuff gets rendered from larger distances, while the smaller details won't be rendered until the camera is close. And of course, don't forget to release your VBO portion when the generator is done.
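Fading a whole generator by camera distance could be as simple as a linear ramp between two per-generator distances. In the C++ sketch below, fadeStart and fadeEnd are hypothetical settings. Big effects like the mushroom ring would get large values, and the small debris bits would get small ones. When the result hits 0, the generator can skip both its dispatch and its draw.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical distance fade for a whole generator.
// Returns 1 = fully visible, 0 = fully faded out.
inline float generatorFade(float distToCamera, float fadeStart, float fadeEnd) {
    assert(fadeEnd > fadeStart);
    const float t = (distToCamera - fadeStart) / (fadeEnd - fadeStart);
    return 1.0f - std::clamp(t, 0.0f, 1.0f);
}
```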


ParticleLife: Born to get Killed

Broadly speaking, the (GLSL) ComputeShader does three things: it spawns, evaluates, and kills. The ComputeShader is executed for every particle, whether it's active or not. Particles that are already active continue updating their positions and other attributes, using (crude) physics or some other weird movement pattern you programmed. The shader also ages each particle through its "life" value. Once that life runs out, the particle enters a "dead" or "inactive" state, from which it can be reincarnated again. And yeah, usually the colour fades out prior to that. Some particles may also get killed by collisions, by the way. Rain, for example, can be recycled as soon as it drops below the floor.

While dead, we'll check if we should re-enable it. This is a bit tricky, as generators may increase or decrease their "max particle number" dynamically. Think about a fountain that stops squirting... ok. One way is to simply check if your particle.ID (see structure above) is higher than the max-allowed particles of the generator at that time. If so, do not recycle, keep dead. If not, you're free to go again.

Yet keeping a continuous flow can still be tricky. If all particles are fired at once, nothing will be spawned in the meantime. What you need are some (random) time offsets before (re)activating. If particles can live up to 5 seconds, the initial delay should probably be a random number between 0 and 5 seconds.
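Here's a CPU-side C++ model of that spawn/age/kill cycle. It follows the conventions of the ComputeShader at the end of this post: life counts up in seconds, and a dead particle stores its pending respawn delay as a negative life value. The rand01 parameter stands in for the shader's random values, and all names are hypothetical. One deliberate choice: while dead, particles above the generator's current maximum are neither aged nor respawned, so their random delay survives until the fountain starts squirting again. Alive particles above the limit keep aging, so they die out naturally.

```cpp
#include <cassert>
#include <cstdint>

// Minimal model of one particle's life cycle.
struct LifeState {
    float life  = 0.0f;   // seconds alive, or a negative pending respawn delay
    bool  alive = false;
};

// Returns true if the particle (re)spawned during this step.
inline bool stepLife(LifeState& p, std::uint32_t localIndex, std::uint32_t maxParticles,
                     float lifeTime, float spawnDelay, float rand01, float dt) {
    assert(rand01 >= 0.0f && rand01 <= 1.0f);
    const bool allowed = localIndex < maxParticles;  // the "ID vs max" gate
    if (p.alive || allowed) p.life += dt;             // parked particles stay frozen
    if (p.alive) {
        if (p.life > lifeTime) {
            p.alive = false;
            p.life  = -rand01 * spawnDelay;           // staggered respawn
        }
        return false;
    }
    if (allowed && p.life >= 0.0f) {
        p.alive = true;
        p.life  = 0.0f;
        return true;
    }
    return false;
}
```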
Local weather... very local.

When a particle spawns, you probably want to set some of its data, like its initial position, colour, size, and velocity/direction. The parent Particle Generator should give us a “breeding volume”, for example a box or sphere that moves and rotates along with the generator matrix. Particles get a random position somewhere within that volume. Depending on what motion will follow, you may also want to define a “velocity cone”: the range within which launch vectors are randomized.
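Spawning inside a breeding volume boils down to mapping three 0..1 randoms onto a local box, then transforming the result by the generator matrix. The C++ sketch below mirrors the final ComputeShader. The launch velocity is picked per axis between two vectors, which is really a box-shaped range rather than a true cone. It is rotated by the matrix without translation. The matrix is assumed to be column-major like GLSL's mat4, and the type and function names are hypothetical.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<float, 3>;
// Column-major 4x4, like GLSL's mat4: m[column][row].
using Mat4 = std::array<std::array<float, 4>, 4>;

inline float mixf(float a, float b, float t) { return a + (b - a) * t; }

// Random point in the generator's local box (half-extents `volume`), to world space.
inline Vec3 spawnPosition(const Mat4& m, const Vec3& volume, const Vec3& r01) {
    Vec3 local;
    for (int i = 0; i < 3; ++i) {
        assert(r01[i] >= 0.0f && r01[i] <= 1.0f);
        local[i] = (-1.0f + 2.0f * r01[i]) * volume[i];
    }
    Vec3 world;
    for (int row = 0; row < 3; ++row)  // w = 1, so the translation column applies
        world[row] = m[0][row] * local[0] + m[1][row] * local[1] +
                     m[2][row] * local[2] + m[3][row];
    return world;
}

// Launch velocity between two corner vectors, rotated (not translated) by the matrix.
inline Vec3 launchVelocity(const Mat4& m, const Vec3& v1, const Vec3& v2, const Vec3& r01) {
    Vec3 local;
    for (int i = 0; i < 3; ++i) local[i] = mixf(v1[i], v2[i], r01[i]);
    Vec3 world;
    for (int row = 0; row < 3; ++row)  // like mat3(pMatrix) * v in GLSL
        world[row] = m[0][row] * local[0] + m[1][row] * local[1] + m[2][row] * local[2];
    return world;
}
```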

All in all, randomness plays an important role here, but shaders don't have a random function. You can make one, though, given some continuously varying input. That's why we gave each particle a random seed number (state.z). Together with the particle ID, and maybe the generator position or an overall timer value, you should be able to randomize decently.

       float random( vec2 coordinate, float seed ) {
              return fract( sin( dot( coordinate*seed, vec2(12.9898, 78.233) ) ) * 43758.5453 );
       }

       // Generate a bunch of random numbers (0..1), based on particle ID,
       // particle Random input, and elapsed time
       float  pRand1 = random( vec2(part.id,   part.rand), pElapsed  );
       float  pRand2 = random( vec2(pElapsed,  part.id),   part.rand );
       float  pRand3 = random( vec2(part.rand, pElapsed),  part.id   );
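For testing or debugging on the CPU, that hash ports to C++ almost literally. Keep in mind this is the well-known fract(sin(dot())) trick, not a real random generator. GPUs implement sin() with varying precision, so CPU and GPU results won't match bit for bit, and quality drops for large inputs. The name hashRandom is my own choice, to avoid confusion with POSIX random().

```cpp
#include <cassert>
#include <cmath>

// fract() as in GLSL: x - floor(x).
inline float fractf(float x) { return x - std::floor(x); }

// CPU port of: fract(sin(dot(coordinate*seed, vec2(12.9898, 78.233))) * 43758.5453)
inline float hashRandom(float cx, float cy, float seed) {
    const float d = cx * seed * 12.9898f + cy * seed * 78.233f;
    const float r = fractf(std::sin(d) * 43758.5453f);
    assert(r >= 0.0f && r <= 1.0f);
    return r;
}
```

Feeding it the particle ID, the per-particle seed and the elapsed time in rotating slots, as the shader snippet does, gives several loosely independent values per particle per frame.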


Once spawned, the same ComputeShader has to update positions and such. It can be as simple as this:
       uint  particleIndex = gl_GlobalInvocationID.x + offsetIndex;
       // Read particle from VBO
       Particle particle   = VBO[particleIndex];
       // Drop down
       particle.pos.y     -= gravity * deltaTime;
       // Write particle back into VBO
       VBO[particleIndex]  = particle;
But that's a bit boring, right? As mentioned before, it really depends on what kind of effect we're trying to achieve here. This would be slightly more interesting:
       particle.life += deltaTime;
       // Eliminate velocity after 1.5 seconds, and drop down
       // (blend towards a plain downward drift)
       particle.velocity.xyz = mix( particle.velocity.xyz, vec3( 0, -gravity, 0 ),
                                    min( particle.life, 1.5f ) / 1.5f );
       particle.pos.xyz += particle.velocity.xyz * deltaTime;
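Here is a CPU-side C++ sketch of that update. It assumes the intent of the comment: bleed off the launch velocity over the first 1.5 seconds until only a straight downward drift at `gravity` units per second remains. V3, mix and stepMotion are hypothetical helpers mirroring their GLSL counterparts.

```cpp
#include <algorithm>
#include <cassert>

struct V3 { float x, y, z; };

// GLSL-style mix(): a + (b - a) * t
inline V3 mix(const V3& a, const V3& b, float t) {
    return { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t };
}

// One update step. After `fadeSecs` seconds only the fall velocity is left.
inline void stepMotion(V3& pos, V3& vel, float& life, float gravity, float fadeSecs, float dt) {
    assert(fadeSecs > 0.0f);
    life += dt;
    const float t = std::min(life, fadeSecs) / fadeSecs;
    vel = mix(vel, V3{0.0f, -gravity, 0.0f}, t);
    pos = { pos.x + vel.x * dt, pos.y + vel.y * dt, pos.z + vel.z * dt };
}
```

The blend is applied to the already-blended velocity every frame. The falloff is therefore faster than linear and depends somewhat on the frame rate. For a strict linear ramp, store the launch velocity and blend from that instead.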

Well, you'll have to play with this yourself. The same goes for evaluating colour, size and rotation, typically based on the lifetime. And don't forget you could use lookup textures to colourize and such. Last but not least, write some standard functions so you can quickly compose another shader. Or even better, compose them with a visual editor like the Big Boys do.
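Lifetime-driven curves usually start from a normalised age, with 0 at birth and 1 at death. The C++ helpers below are a hypothetical set of such "standard functions": a fade-out that starts late in life, and a linear size change. A lookup texture would replace these with an artist-drawn curve sampled at the same normalised age.

```cpp
#include <algorithm>
#include <cassert>

// 0 at birth, 1 at (or past) the end of life.
inline float normalizedAge(float life, float lifeTime) {
    assert(lifeTime > 0.0f);
    return std::clamp(life / lifeTime, 0.0f, 1.0f);
}

// Fully opaque until `fadeFrom`, then a linear fade to 0 at the end of life.
inline float fadeOutAlpha(float age, float fadeFrom) {
    assert(fadeFrom >= 0.0f && fadeFrom < 1.0f);
    if (age <= fadeFrom) return 1.0f;
    return 1.0f - (age - fadeFrom) / (1.0f - fadeFrom);
}

// Grow (or shrink) linearly from startSize to endSize over the lifetime.
inline float sizeOverLife(float age, float startSize, float endSize) {
    return startSize + (endSize - startSize) * age;
}
```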
No, snow doesn't drop straight down.


So your particles can move. Next step would be to keep them from falling through walls and such. Would be nice if rain splatters on the ground, right? Now I have little experience in this area, so forgive me if I forget smart tricks. But the major problem is that your GPU can't access your fancy (RAM) BSP trees, Octrees, Newton physics or whatever it is you use to store geometry and determine collisions.

But there are a few workarounds, imperfect as they may be. First, in Engine22 each sector (room, hallway, ...). One of them is to use your depth buffer. Most likely you already have one lying around somewhere for other techniques, such as SSAO.

If a particle's depth becomes larger than what your screen depth texture indicates, it *may* have intersected the floor, a wall, or any other object. I say *may*, because it could also simply be behind it. You can include your normal buffer (if you have a gBuffer approach), or compare positions to estimate the plausibility of a collision. Besides depth, I also store world positions in my gBuffer. So I can calculate the distance between a particle and the pixel occluding it in the foreground. Small distance? Likely a collision. Bigger distance? The particle is likely behind it.

// Check if we collided with the environment based on depth & normal gBuffer
// Convert particle (world) pos to screen UV, if it's on the screen at all!
       vec4 screenUV       = (viewProjMatrix * vec4( partPos, 1.f ));
             screenUV.xy  = (screenUV.xy + screenUV.ww) * 0.5f;
             screenUV.xy /= screenUV.ww;

       // w > 0: skip points behind the camera, they would flip into view
       if ( screenUV.w > 0.0f &&
            screenUV.x > -0.0001f   &&  screenUV.x < 1.0001f  &&
            screenUV.y > -0.0001f   &&  screenUV.y < 1.0001f ) {
             vec4 gPos    = texture( gBufferDepth, screenUV.xy );

             // Compare depth. Pixel in foreground?
             if ( partDistToCam + partSize*0.5 >= gPos.w ) {
                    // Possible intersection. Check dist between part and surf
                    float distToSurf = length( gPos.xyz - partPos );
                    if ( distToSurf < 0.3 ) {
                           // Impact. Now do ... something
                    #ifdef _BOUNCE
                           #macro _PARTICLES_BOUNCE
                    #endif
                    #ifdef _SPLAT
                           #macro _PARTICLES_Splat
                    #endif
                    }
             }
       } // OnScreen
Sure, it's inaccurate, but it may just do the trick. Remember, particles in most games you have seen are likely pretty inaccurate too. Ever noticed it? No? Fine then.
Now, the next step: how the fuck do we keep that darn snow outside? Other than just shaping the generators properly, I haven't figured that out yet. Adjusting the generator shapes works fine in a small game like Tower22, but in a huge mixed indoor/outdoor world (think about a city), you want a more dynamic solution.
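Here is the same test pulled out of the shader as a C++ sketch, so the three possible outcomes are easier to see. The particle can be in front of the stored surface, close enough to count as a collision, or hidden behind it. gBufferPos and gBufferDepth stand for whatever the gBuffer holds under the particle (world position and camera distance, as described above). The 0.3 threshold comes from the shader. clipToUV repeats the UV conversion and adds a check for points behind the camera (w <= 0), which could otherwise land inside the 0..1 range after the divide. All names here are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<float, 3>;

enum class DepthHit { InFront, Collision, Behind };

inline DepthHit classifyAgainstGBuffer(const Vec3& partPos, float partDistToCam, float partSize,
                                       const Vec3& gBufferPos, float gBufferDepth,
                                       float maxSurfaceDist = 0.3f) {
    assert(partSize >= 0.0f);
    // Particle (including half its size) still in front of the stored surface?
    if (partDistToCam + partSize * 0.5f < gBufferDepth) return DepthHit::InFront;
    // Occluded: close to the occluding surface = collision, far away = behind it.
    const float dx = gBufferPos[0] - partPos[0];
    const float dy = gBufferPos[1] - partPos[1];
    const float dz = gBufferPos[2] - partPos[2];
    const float distToSurf = std::sqrt(dx * dx + dy * dy + dz * dz);
    return distToSurf < maxSurfaceDist ? DepthHit::Collision : DepthHit::Behind;
}

// Clip-space xy/w to 0..1 UV, as in the shader: uv = (xy + w) * 0.5 / w.
inline bool clipToUV(float cx, float cy, float cw, float& u, float& v) {
    if (cw <= 0.0f) return false;  // behind the camera
    u = (cx + cw) * 0.5f / cw;
    v = (cy + cw) * 0.5f / cw;
    return u > -0.0001f && u < 1.0001f && v > -0.0001f && v < 1.0001f;
}
```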

Depth Sorting & Fill-rate issues

Well, as usual this post got much longer than I intended, so let me finish with a more complete ComputeShader as an example. But before leaving, keep in mind you can also use another ComputeShader to sort your VBO (per subrange) by comparing distances and swapping array elements. So far I never really needed proper sorting though (when using additive or pre-multiplied alpha blending), but just keep it in mind.
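The post doesn't say which sorting algorithm to use. A common GPU-friendly choice is bitonic sort, because it is a fixed network of compare-and-swap steps. Every (k, j) pass could be one ComputeShader dispatch with one thread per element. Below is a C++ sketch of that network, sorting particle indices back-to-front by camera distance. Bitonic sort is my own pick here, not Engine22 code. It needs a power-of-two count, so a subrange would have to be padded, for example with dead particles at distance 0, which then end up at the end of the list.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Bitonic sort of particle indices by distance, farthest first (back-to-front).
// The innermost loop is what one GPU thread per element would do in a pass.
inline void bitonicSortBackToFront(std::vector<std::size_t>& idx, const std::vector<float>& dist) {
    const std::size_t n = idx.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // power-of-two count required
    for (std::size_t k = 2; k <= n; k <<= 1) {
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t l = i ^ j;   // compare partner
                if (l <= i) continue;          // handle each pair once
                const bool descending = (i & k) == 0;
                const bool doSwap = descending ? dist[idx[i]] < dist[idx[l]]
                                               : dist[idx[i]] > dist[idx[l]];
                if (doSwap) std::swap(idx[i], idx[l]);
            }
        }
    }
}
```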

I managed to update hundreds of thousands of particles without breaking a sweat. Fill rate remains the bigger problem. Many tiny particles, like those snowflakes, won't be a problem. But to form thicker clouds, you usually need larger particles to make a consistent “whole”, and drawing many overlapping layers can still ruin your party. One idea might be to accumulate them in a (downsized) buffer, using only very simple fragment shaders. Then, when upscaling, apply the more advanced tricks like lighting & texturing. It's a somewhat specific trick that won't work for every type of particle though.
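A back-of-the-envelope estimate shows why the downsized buffer helps. The number of shaded fragments is roughly the particle count times each particle's on-screen footprint. A buffer scaled down by some factor per axis divides that number by the factor squared. The C++ helper below is a hypothetical illustration, not a profiler.

```cpp
#include <cassert>
#include <cstdint>

// Rough fragment count: particles x footprint, divided by downscale^2
// when rendering into a buffer that is `downscale` times smaller per axis.
inline std::uint64_t estimateFragments(std::uint64_t particleCount,
                                       std::uint64_t footprintPixels,
                                       std::uint32_t downscale) {
    assert(downscale >= 1);
    const std::uint64_t d2 = std::uint64_t(downscale) * downscale;
    return particleCount * footprintPixels / d2;
}
```

For example, 2,000 cloud particles that each cover 200x200 pixels shade 80 million fragments per frame. That is about 38.6 full 1920x1080 screens of overdraw. At half resolution per axis, it drops to 20 million.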

Here's that more complete ComputeShader:

#version 430

struct Particle {
       vec4   pos;         // Pos        Size
       vec4   state;       // X: Life    Y: State    Z: Random 0..1   W: ID
       vec4   velocity;    // Velocity   Rotation
       vec4   color;       // Color      Opacity
}; // Particle

layout( std140, binding=0 ) buffer VBO {
       Particle particles[];
} outBuffer;

layout( local_size_x = 4, local_size_y = 1, local_size_z = 1 ) in;

// Generator Variables
uniform vec4 pTiming;        // X: spawn delay  Y: used as random input  Z: elapsed  W: delta secs
uniform vec4 pGenerator;     // X: lifetime     Y: gravity   Z: VBO offset   W: max particles
uniform vec4 pSpawnVolume;   // XYZ: half-size of the spawn box
uniform vec4 pLaunchVector1; // XYZ: launch velocity range, corner 1
uniform vec4 pLaunchVector2; // XYZ: launch velocity range, corner 2   W: air friction
uniform mat4 pMatrix;        // Generator local-to-world matrix
// Hypothetical: the original source of the min/max particle size got lost
uniform vec2 pSizeRange;

float random( vec2 coordinate, float seed ) {
       return fract( sin( dot( coordinate*seed, vec2(12.9898, 78.233) ) ) * 43758.5453 );
}

const int STATE_DEAD  = 0;
const int STATE_ALIVE = 1000;

void main() {
       uint  offsetIndex   = uint( pGenerator.z );
       uint  maxParticles  = uint( pGenerator.w );
       uint  globID        = gl_GlobalInvocationID.x + offsetIndex;
       // Fetch particle from VBO, decode its data
       Particle part       = outBuffer.particles[ globID ];
       vec3  partPos       = part.pos.xyz;
       float partSize      = part.pos.w;
       float partLife      = part.state.x;
       int   partState     = int( part.state.y );
       float partRand      = part.state.z;
       vec3  partVeloc     = part.velocity.xyz;
       float partRotat     = part.velocity.w;
       vec3  partColor     = part.color.rgb;
       float partAlpha     = part.color.a;

       float  pSpawnDelay  = pTiming.x;
       float  pElapsed     = pTiming.z;
       float  deltaSecs    = pTiming.w;
       float  pLifeTime    = pGenerator.x;
       float  pGravity     = pGenerator.y;
       vec2   pSize        = pSizeRange;
       float  pAirFriction = pLaunchVector2.w;
       // Generate a bunch of randoms (pRand4..6 derived from the first three)
       float  pRand1       = random( vec2(pTiming.y, partRand), pElapsed  );
       float  pRand2       = random( vec2(pElapsed , pTiming.y), partRand  );
       float  pRand3       = random( vec2(partRand , pElapsed), pTiming.y );
       float  pRand4       = random( vec2(pRand1, partRand), pElapsed );
       float  pRand5       = random( vec2(pRand2, partRand), pElapsed );
       float  pRand6       = random( vec2(pRand3, partRand), pElapsed );

       bool   alive        = partState >= STATE_ALIVE-1;
       bool   allowed      = gl_GlobalInvocationID.x < maxParticles;

       // Age. Dead particles above the max keep their respawn delay frozen
       if ( alive || allowed )
             partLife += deltaSecs;

       if ( alive ) {
             // Apply Gravity & Launch Velocity
       #ifdef _GRAVITY
             partVeloc.y  += pGravity * deltaSecs;
             partVeloc.xz *= vec2( pAirFriction ); // reduce velocity over time
       #endif
             partPos      += partVeloc * deltaSecs;
             // Check if we aged.
             // If so, kill particle, gen a random respawn delay time
             if ( partLife > pLifeTime ) {
                    partState    = STATE_DEAD;
                    partLife     = -pRand3 * pSpawnDelay;
             }
       } else {
             // Check if we should (re)spawn this DEAD particle
             if ( partLife >= 0  &&  allowed ) {
                    // Spawn: generate (random) data
                    partLife     = 0;
                    partState    = STATE_ALIVE;
                    partSize     = mix( pSize.x, pSize.y, pRand1 );
                    // Generate a (local) random position & launch direction
                    partPos.x    = (-1 + pRand1 * 2) * pSpawnVolume.x;
                    partPos.y    = (-1 + pRand2 * 2) * pSpawnVolume.y;
                    partPos.z    = (-1 + pRand3 * 2) * pSpawnVolume.z;

                    partVeloc.x  = mix( pLaunchVector1.x, pLaunchVector2.x, pRand4 );
                    partVeloc.y  = mix( pLaunchVector1.y, pLaunchVector2.y, pRand5 );
                    partVeloc.z  = mix( pLaunchVector1.z, pLaunchVector2.z, pRand6 );
                    // Local to World coordinates
                    partPos      = (pMatrix * vec4(partPos,1.f)).xyz;
                    partVeloc    = mat3(pMatrix) * partVeloc;
             } else {
                    // For now, Hide it somewhere far away, in another galaxy
                    partPos      = vec3(-9999,-9999,-9999);
             }
       }
       // Write results back into VBO
       part.pos      = vec4( partPos, partSize );
       part.state    = vec4( partLife, float( partState ), partRand, float( globID ) );
       part.velocity = vec4( partVeloc, partRotat );
       part.color    = vec4( partColor, partAlpha );

       outBuffer.particles[ globID ] = part;
} // end

