Tower22: Party Particles #2- The Implementation

Iteration #4?

Having refreshed my mind about implementing particles in the previous post, how about version 4? I don't have spectacular plans, so I'll just try to blend the best parts together. Option #1 (particles evaluated by the CPU) is a no-go obviously, Transform Feedback was a bit confusing to set up as far as my Korsakoff brain can remember, and OpenCL is currently not in Engine22 (and probably won't be). But GLSL ComputeShaders are...

Being all OpenGL and stuff, usage of GLSL ComputeShaders is much easier than having to implement OpenCL. Though I found them wonky. They worked pretty well on my dev-laptop, but were terribly, and I mean really terribly, slow on other (I admit, somewhat older) stations. For instance, I rolled Tiled Deferred Rendering back to old fashioned Deferred Lighting, and gained speed :&

But as said, the major problem in previous iterations wasn't speed really. It was flexibility. If I want my particles to dance the Tango, then I should be able to program that behaviour, somehow. Somehow easily. And even though it sounds cheap, I think just having full-code access might be the easiest option after all. Which simply boils down to writing a bunch of ComputeShaders, completely by hand, and adding a few variables to tweak minor behaviour adjustments, such as speeds, colours or directions. Then pick whatever shader suits your needs best, or eventually code a new one if none does.

For artists, this kinda sucks though. Assuming they don't know / don't want to code GLSL, they are tied to whatever I bring them. No fancy editor like I had in Iteration #3 (or node-based stuff like Unreal or Unity has). Now sure, I can still add such a feature anytime, but to start quickly, I just give it a good ol' handjob. Erh, right.

This is how all materials are edited in Engine22: you choose an "UberShader", then toggle options on/off. Certainly not as flexible (and less fun) as node-based systems. Then again, it works quick & easy for the majority of materials I've met so far.

VBO Initialisation

Into the details gentlemen. So a particle in Engine22 is essentially this:

eGX_Particle = record

pos : eVec4; // Pos Size

state : eVec4; // X: Life Y: State Z: Random 0..1 W: ID

velocity : eVec4; // Velocity Rotation

color : eVec4; // Color Opacity

end; // eGX_Particle 64 bytes

Yes, that's Delphi code. I need it to initialize things, but basically the GPU takes over from there, so I have a similar structure for GLSL. And since particles usually jump on you in big groups, I actually initialize a mega array of 1.048.576 particles (= 64 MB). Why that number? Because it matches a 1024x1024 texture. But more about that later.

There is one thing the CPU has to do when making this 1 million particle VBO: Give each particle an ID (0,1,2,3,...), and also give it a random value (state.z in this case). It will come in handy later on. Anyhow, making this VBO could look something like this:

procedure eGX_ParticleManager.initializeVBO();

var particles : array of eGX_Particle;

i : eInt;

begin

// Make one big fat motherfucker of a particle array

// (note it can handle multiple emitters, not just 1)

setLength( particles, EGX_MAX_PARTICLES );

for i:=0 to EGX_MAX_PARTICLES-1 do begin

particles[i].pos.w := 0;

particles[i].state.x := 0;

particles[i].state.y := 0;

particles[i].state.z := mRandomF( 0, 1 );

particles[i].state.w := i;

end;

self.vbo := eglBuffers.getVBO( EGL_VERTEXLAYOUT_PXNW );

self.vbo.setDataPoints( @particles[0].pos.x,

EGX_MAX_PARTICLES, 4, true );

// Got our stuff on the GPU now, remove the RAM copy

setLength( particles, 0 );

end; // initializeVBO

...

procedure eGL_VBO.setDataPoints(

vertexArrayPtr : pointer; // points to an array of particle records

const vertCount : eInt; // 1 million something

const attribCount : eInt; // 4, cause 4 vectors per point (particle)

const dynamicContent: eBool // yeah, pretty much

);

begin

self.makeVBO( vertexArrayPtr, vertCount, attribCount,

GL_POINTS, dynamicContent );

end; // VBO setDataPoints

procedure eGL_VBO.makeVBO( vertices : pointer;

const vertCount, attribCount : eInt;

const glMode : TGLEnum;

const dynamicContent : eBool );

begin

self.attribCount:= attribCount;

self.vertCount := vertCount;

self.stride := self.attribCount * 16;

self.glMode := glMode;

self.useIndices := false;

glGenBuffers( 1, @self.glName );

glBindBuffer( GL_ARRAY_BUFFER, self.glName );

if ( dynamicContent ) then

glBufferData( GL_ARRAY_BUFFER, self.vertCount * self.stride,

vertices, GL_DYNAMIC_DRAW ) // contents get rewritten all the time

else

glBufferData( GL_ARRAY_BUFFER, self.vertCount * self.stride,

vertices, GL_STATIC_DRAW );

glBindBuffer( GL_ARRAY_BUFFER, 0 );

end; // makeVBO

So, we ended up having 1 big VBO, made of a million vertices/points/particles, whatever you like to call them. Each particle made out of 4 vectors storing particles properties that need to be evaluated over its short life. Now that we reserved this memory (on the GPU), ComputeShaders can mess around in it.

Fixed one million particles ?!

Now you may wonder, what if I only need a few particles? The number of particles we need changes all the time, right? True, but we'll be rendering a million particles every time anyway. The big array is divided into several smaller “subranges”, which can be reserved by particle-emitters (clouds, fire, sparks, plasma piss, …).

Particle Generators

A particle generator typically is made of a position & direction (matrix), a bunch of settings like launch-force or colours, and of course what shaders it uses for rendering and evaluating its particles. Each generator will get a VBO subrange (if still available), depending on how many particles it may need at its peak. Each generator will also use a ComputeShader, which will be executed over that subrange. These ComputeShaders will evaluate the "particle structs", by reading and writing back into the VBO. Typically they will move the positions, grow or shrink sizes, adjust velocities based on physics/collisions, and so on.

Particles can be unassigned, or temporarily invisible in case their generator doesn't want to render them; notice the "State" property in the code above. Inactive particles will exit their code early, and are rendered zero-sized and/or somewhere far out of sight.

Note that more complex effects such as that flaming spicy Thai-food fart above (the mushroom cloud I mean), may require multiple generators, in case the sprite textures and/or motion ComputeShaders have significant differences. For example, one generator would do the ring, another generator the explosion itself, and yet another generator the darker debris clouds on the bottom. Eventually you could fade out generators at different distances. The big stuff gets rendered from larger distances, the smaller details won’t be rendered until the camera is close. And of course, don’t forget to release your VBO portion when the generator is done.

ParticleLife: Born to get Killed

The (GLSL) ComputeShader does three things globally: it spawns, evaluates, and kills. The ComputeShader is executed for every particle, whether it’s active or not. Particles that are already active will continue updating their positions and other attributes, using (crude) physics or some other weird movement pattern you programmed. It will also count down its "life" value. When it reaches zero, the particle enters a "dead" or "inactive" state, from where it can be reincarnated again. And yeah, usually the colour fades out prior to that. Some particles may also get killed based on collisions by the way. Rain for example can be recycled as soon as it drops below the floor.

While dead, we'll check if we should re-enable it. This is a bit tricky, as generators may increase or decrease their "max particle number" dynamically. Think about a fountain that stops squirting... ok. One way is to simply check if your particle.ID (see structure above) is higher than the max-allowed particles of the generator at that time. If so, do not recycle, keep dead. If not, you're free to go again.

Yet, keeping a continuous flow can still be tricky; if all particles are fired at once, nothing will be spawned in the meantime. What you need is some (random) time-offsets before (re)activating. If particles can live up to 5 seconds, the initial delay should probably be a random number between 0 and 5 seconds.
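To make those two rules a bit more concrete, here is a rough sketch of how the respawn check could look in GLSL. The function names (`initialDelay`, `mayRespawn`) and their parameters are made up for illustration, they are not Engine22 code. The sketch assumes that a dead particle carries a negative "life" that counts up towards zero, which then acts as its delay.

```glsl
// Hypothetical helpers, not taken from Engine22.

// Staggered start: a negative life acts as a delay before (re)spawning.
// rand01 is the per-particle random (0..1), maxLifeTime is in seconds.
float initialDelay( float rand01, float maxLifeTime )
{
    return -rand01 * maxLifeTime;   // somewhere between -maxLifeTime and 0
}

// A dead particle may respawn when its delay has run out AND its index
// within the generator's subrange is below the current particle budget.
bool mayRespawn( uint localIndex, uint maxParticles, float life )
{
    return ( life >= 0.0 ) && ( localIndex < maxParticles );
}
```

With a 5 second lifetime, a particle that got 0.5 as its random value would start with a life of -2.5, so it waits two and a half seconds before its first launch. Lowering `maxParticles` at runtime simply stops the higher indices from coming back.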

Local weather... very local.

When spawned, you probably want to set some particle data, like its initial position, colour, size, and velocity/direction. The parent Particle Generator should give us a “breeding volume”, for example a box or sphere that moves and rotates along with the generator matrix. Particles will get a random position somewhere within that volume. Depending on what motion will follow, you may also want to define a “velocity cone”, the range within which launch vectors can be randomized.
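As a minimal sketch of both ideas, assuming a box-shaped breeding volume and a real cone around a launch axis, something like the following could work. The names `spawnInBox` and `directionInCone` are hypothetical, and the cone sampling is a standard "pick a point on a spherical cap" trick, not necessarily what Engine22 does (the full shader at the end of this post just mixes between two launch vectors).

```glsl
// Hypothetical helpers; all 'r' inputs are random numbers in the 0..1 range.

// Random point inside a box with the given half-size, moved into world
// space with the generator matrix.
vec3 spawnInBox( vec3 halfSize, mat4 generatorMatrix, vec3 r )
{
    vec3 local = ( r * 2.0 - 1.0 ) * halfSize;      // -halfSize .. +halfSize
    return ( generatorMatrix * vec4( local, 1.0 ) ).xyz;
}

// Random unit direction inside a cone around 'axis' (must not be zero).
// halfAngle is in radians: 0 = always along the axis.
vec3 directionInCone( vec3 axis, float halfAngle, float r1, float r2 )
{
    // Pick the angle to the axis so directions spread evenly over the cap
    float cosTheta = mix( cos( halfAngle ), 1.0, r1 );
    float sinTheta = sqrt( max( 0.0, 1.0 - cosTheta * cosTheta ) );
    float phi      = 6.2831853 * r2;                // angle around the axis

    // Build two vectors perpendicular to the axis
    vec3 w      = normalize( axis );
    vec3 helper = ( abs( w.y ) < 0.99 ) ? vec3( 0.0, 1.0, 0.0 )
                                        : vec3( 1.0, 0.0, 0.0 );
    vec3 u      = normalize( cross( helper, w ) );
    vec3 v      = cross( w, u );

    return ( u * cos( phi ) + v * sin( phi ) ) * sinTheta + w * cosTheta;
}
```

Multiply the returned direction by a (randomized) launch speed to get the initial velocity.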

Random

All in all, there is an important portion of "random" here, but shaders can't do such a thing; they don’t have a random function (although you can make one, given some continuously varying input). That's why we gave each particle a random seed number (state.z). Together with the particle ID and maybe generator position or an overall timer value, you should be able to randomize decently.

float random(vec2 coordinate, float seed)

{

return fract(sin(dot(coordinate*seed, vec2(12.9898, 78.233)))*43758.5453);

}

// Generate a bunch of random numbers (0..1), based on particle ID,

// particle Random input, and elapsed time

float pRand1 = random( vec2(part.id, part.rand), pElapsed );

float pRand2 = random( vec2(pElapsed , part.id), part.rand );

float pRand3 = random( vec2(part.rand , pElapsed), part.id );
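The sin-based one-liner is fine for most effects, but its quality depends on how precise sin() is for big inputs, and that may differ per GPU. If you ever see patterns or repeats, an integer hash is a possible alternative. Below is a sketch using the commonly published "PCG hash" constants; the function names and the way the seed is combined are my own assumptions, not Engine22 code. It needs a GLSL version with unsigned integer support (3.30 or higher).

```glsl
// Alternative random source (a swapped-in technique, not from Engine22).

// Scrambles a 32-bit integer into another 32-bit integer (PCG-style hash).
uint pcgHash( uint v )
{
    uint state = v * 747796405u + 2891336453u;
    uint word  = ( ( state >> ( ( state >> 28u ) + 4u ) ) ^ state ) * 277803737u;
    return ( word >> 22u ) ^ word;
}

// Turns the upper 24 bits of a hash into a float in the range 0..1
// (1 itself is never reached).
float hashToFloat( uint h )
{
    return float( h >> 8u ) * ( 1.0 / 16777216.0 );
}

// Example: one random number per particle per frame.
// 'frameCounter' is a hypothetical uniform that increases every frame.
float randomPerFrame( uint particleId, uint frameCounter )
{
    return hashToFloat( pcgHash( particleId ^ pcgHash( frameCounter ) ) );
}
```

Need more than one number per particle? Hash the previous result again, or add a different constant to the seed for each value.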

ParticleMotion

Once spawned, the same ComputeShader has to update positions and such. It can be as simple as this:

uint particleIndex = gl_GlobalInvocationID.x + offsetIndex;

// Read particle from VBO

particle = VBO[particleIndex];

// Drop down

particle.pos.y -= gravity * deltaTime;

// Write particle back into VBO

VBO[particleIndex] = particle;

But that's a bit boring, right? As mentioned before, it really depends on what kind of effect we're trying to achieve here. This would be slightly more interesting:

particle.life += deltaTime;

// Blend from launch velocity to a plain fall during the first 1.5 seconds

// (the mix factor has to stay within 0..1)

particle.velocity.xyz = mix( particle.velocity.xyz,

vec3(0,-9.8,0),

clamp( particle.life / 1.5f, 0.f, 1.f ) );

particle.pos.xyz += particle.velocity.xyz * deltaTime;

Well, you have to play with this yourself. Same stuff to evaluate colour, size, rotation, typically based on the lifetime. And don't forget you could include lookup textures to colorize and such. Last but not least, make sure you make some standard functions so you can quickly compose another shader -or even better, compose them with a visual editor like the Big Boys do.
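For the "based on the lifetime" part, here is one possible standard function, as a sketch. The name `sizeAndAlpha` and the fade percentages are just picked for the example; tune them or replace them with a lookup texture.

```glsl
// Hypothetical helper: derive size and opacity from the particle's age.
// life and lifeTime are in seconds (lifeTime > 0).
// Returns vec2( size, alpha ).
vec2 sizeAndAlpha( float life, float lifeTime, float startSize, float endSize )
{
    // 0 = just born, 1 = about to die
    float t = clamp( life / lifeTime, 0.0, 1.0 );

    // Grow (or shrink) linearly over the whole life
    float size = mix( startSize, endSize, t );

    // Fade in during the first 10% of the life, fade out during the last 30%
    float alpha = smoothstep( 0.0, 0.1, t ) * ( 1.0 - smoothstep( 0.7, 1.0, t ) );

    return vec2( size, alpha );
}
```

Halfway through its life (t = 0.5) a particle is fully opaque and exactly in between its start and end size.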

No, snow doesn't drop straight down.

ParticleCollision:

So your particles can move. Next step would be to keep them from falling through walls and such. Would be nice if rain splatters on the ground, right? Now I have little experience in this area, so forgive me if I forget smart tricks. But the major problem is that your GPU can't access your fancy (RAM) BSP trees, Octrees, Newton physics or whatever it is you use to store geometry and determine collisions.

There are a few workarounds though. Imperfect, but still. First, in Engine22 each sector (room, hallway, ...). One of them is to use your depth buffer. Most likely you have one lying around somewhere for other techniques, such as SSAO or something.

If particle depth becomes larger than what your screen depth-texture indicates, it *may* have intersected the floor, walls, or any other object. I say *may*, because it could also be behind it. You can include your normal-buffer (if you have a gBuffer approach), or compare positions to calculate the plausibility of a collision. Besides depth, I also store the world-positions in my gBuffer. So I can calculate the distance between a particle and the pixel that is occluding it in the foreground. Small distance? Likely a collision. Bigger distance? Particle is likely behind.

// Check if we collided with the environment based on depth & normal gBuffer

// Convert particle (world) pos to screen UV, if it's on the screen at all!

vec4 screenUV = (viewProjMatrix * vec4( partPos.xyz, 1.f ));

screenUV.xy = (screenUV.xy + screenUV.ww) * 0.5f;

screenUV.xy /= screenUV.ww;

if ( screenUV.x > -0.0001f && screenUV.x < 1.0001f &&

screenUV.y > -0.0001f && screenUV.y < 1.0001f )

{

vec4 gPos = texture( gBufferDepth, screenUV.xy );

// Compare depth. Pixel in foreground?

if ( partDistToCam + partSize*0.5 >= gPos.w ) {

// Possible intersection. Check dist between part and surf

float distToSurf = length( gPos.xyz - partPos.xyz );

if ( distToSurf < 0.3 ) {

// Impact. Now do ... something

#ifdef _BOUNCE

#macro _PARTICLES_BOUNCE

#endif

#ifdef _SPLAT

#macro _PARTICLES_Splat

#endif

} // Impact

} // Possible intersection

} // OnScreen

Sure, it's inaccurate, but it may just do the trick. Remember particles in most games you have seen are likely pretty inaccurate. Ever noticed it? No? Fine then.

Now next step, how the fuck do we keep that darn snow outside? Other than just shaping the generators properly, I haven't figured that out yet. Adjusting the generator shapes works fine in a small game like Tower22, but in a huge mixed in/outdoor world (think about a city), you want a more dynamic solution.
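What the "do ... something" could be is up to you. As a sketch for the bounce case, assuming you fetched a world-space surface normal from your gBuffer at the same screen UV, the built-in reflect() does most of the work. The function name and the `restitution` parameter are hypothetical, not Engine22 code.

```glsl
// Hypothetical bounce response.
// velocity      : particle velocity before the impact
// surfaceNormal : world-space normal of the surface that was hit
// restitution   : 0 = velocity is killed, 1 = perfect bounce
vec3 bounce( vec3 velocity, vec3 surfaceNormal, float restitution )
{
    vec3 n = normalize( surfaceNormal );

    // Already moving away from the surface? Then leave it alone,
    // otherwise the particle would get pushed back into the wall.
    if ( dot( velocity, n ) >= 0.0 )
        return velocity;

    // Mirror the velocity over the surface and lose some energy
    return reflect( velocity, n ) * restitution;
}
```

A raindrop would rather use a restitution of zero and switch to a "splat" state, a spark could use something like 0.5 so it hops a few times before dying.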

Depth Sorting & Fill-rate issues

Well, as usual this post got much longer than I intended again, so let me finish with a more complete ComputeShader, as an example. But before leaving, keep in mind you can also use another ComputeShader to sort your VBO (per subrange) by comparing distances and swapping array elements. I never really needed proper sorting so far though (when using Additive or Pre-Multiplied alpha-blending), but just keep it in mind.

I managed to update hundreds of thousands of particles without sweat. A bigger problem remains the fill-rate. Many tiny particles, like those snowflakes, won’t be a problem. But to form thicker clouds, you usually need larger particles to make a consistent “whole”. Drawing many overlapping layers can still ruin your party. One idea might be to accumulate them in a (downsized) buffer, using very simple fragment shaders only. Then when upscaling, apply the more advanced tricks like lighting & texturing. It’s a somewhat specific trick that won’t work for every type of particle though, but here, read it:
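In case you do need sorting, here is a minimal sketch of one pass of an "odd-even transposition sort", which is about the simplest sort that fits a ComputeShader. I haven't used this in Engine22; the uniform names (`offsetIndex`, `count`, `parity`, `cameraPos`) are assumptions for the example. Each invocation compares one pair of neighbours and swaps them if the nearest one comes first, so the result ends up back-to-front.

```glsl
#version 430

layout( local_size_x = 64, local_size_y = 1, local_size_z = 1 ) in;

struct Particle {
    vec4 pos;       // Pos     Size
    vec4 state;     // X: Life Y: State Z: Random 0..1 W: ID
    vec4 velocity;  // Velocity Rotation
    vec4 color;     // Color   Opacity
};

layout( std140, binding = 0 ) buffer VBO {
    Particle particles[];
};

// Hypothetical uniforms for this sketch
uniform uint offsetIndex;   // first particle of the generator's subrange
uniform uint count;         // number of particles in that subrange
uniform uint parity;        // 0 = even pass, 1 = odd pass
uniform vec3 cameraPos;     // world-space camera position

void main()
{
    // Every invocation owns one pair (i, i+1); pairs never overlap
    // within a pass, so invocations can't get in each other's way.
    uint i = gl_GlobalInvocationID.x * 2u + parity;
    if ( i + 1u >= count )
        return;

    uint a = offsetIndex + i;
    uint b = a + 1u;

    Particle pa = particles[a];
    Particle pb = particles[b];

    float distA = distance( pa.pos.xyz, cameraPos );
    float distB = distance( pb.pos.xyz, cameraPos );

    // Furthest particle should come first (drawn first)
    if ( distA < distB ) {
        particles[a] = pb;
        particles[b] = pa;
    }
}
```

Dispatch it with enough work groups to cover half of the subrange, alternate `parity` between 0 and 1, and put a `glMemoryBarrier( GL_SHADER_STORAGE_BARRIER_BIT )` between the passes. A guaranteed full sort takes as many passes as there are particles in the subrange, but since the order barely changes from frame to frame, a handful of passes per frame may be good enough. Note that the whole struct gets swapped, including the ID in state.w.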

http://http.developer.nvidia.com/GPUGems3/gpugems3_ch23.html

struct Particle {

vec4 pos; // Pos Size

vec4 state; // X: Life Y: State Z: Random 0..1 W: ID

vec4 velocity; // Velocity Rotation

vec4 color; // Color Opacity

}; // Particle

layout( std140, binding=0 ) buffer VBO {

Particle particles[];

} outBuffer;

layout( local_size_x = 4, local_size_y = 1, local_size_z = 1 ) in;

// Generator Variables

uniform vec4 pTiming;

uniform vec4 pGenerator;

uniform vec4 pSpawnVolume;

uniform vec4 pDimensions; // ZW: min / max particle size

uniform vec4 pLaunchVector1;

uniform vec4 pLaunchVector2;

uniform mat4 pMatrix;

float random(vec2 coordinate, float seed)

{

return fract(sin(dot(coordinate*seed, vec2(12.9898, 78.233)))*43758.5453);

}

const int STATE_DEAD = 0;

const int STATE_ALIVE = 1000;

void main() {

//-----------------------------------------------------------------------

uint offsetIndex = uint(pGenerator.z);

uint maxParticles = uint(pGenerator.w);

uint globID = gl_GlobalInvocationID.x + offsetIndex;

// Fetch particle from VBO, decode its data

Particle part = outBuffer.particles[ globID ];

vec3 partPos = part.pos.xyz;

float partSize = part.pos.w;

float partLife = part.state.x;

int partState = int( part.state.y );

float partRand = part.state.z;

vec3 partVeloc = part.velocity.xyz;

float partRotat = part.velocity.w;

vec3 partColor = part.color.xyz;

float partAlpha = part.color.w;

float pSpawnDelay = pTiming.x;

float pElapsed = pTiming.z;

float pLifeTime = pGenerator.x;

float pGravity = pGenerator.y;

vec2 pSize = pDimensions.zw;

float pAirFriction = pLaunchVector2.w;

// Generate a bunch of randoms

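float pRand1 = random( vec2(pTiming.y, partRand), pElapsed );

float pRand2 = random( vec2(pElapsed ,pTiming.y), partRand );

float pRand3 = random( vec2(partRand , pElapsed), pTiming.y );

// Three more for the launch vector (one possible way to derive them)

float pRand4 = random( vec2(pRand1, partRand), pElapsed );

float pRand5 = random( vec2(pRand2, partRand), pElapsed );

float pRand6 = random( vec2(pRand3, partRand), pElapsed );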

float deltaSecs = pTiming.w;

//-----------------------------------------------------------------------

// Age

if ( gl_GlobalInvocationID.x < maxParticles )

partLife += deltaSecs;

if ( partState >= STATE_ALIVE-1 ) {

// Apply Gravity & Launch Velocity

#ifdef _GRAVITY

partVeloc.y += pGravity * deltaSecs;

#endif

partVeloc.xz *= vec2( pAirFriction ); // reduce velocity over time

partPos += partVeloc * deltaSecs;

// Check if we aged.

// If so, kill particle, gen a random respawn delay time

if ( partLife > pLifeTime ) {

partState = STATE_DEAD;

partLife = -pRand3 * pSpawnDelay;

}

} else {

// Check if we should (re)spawn this DEAD particle

if ( partLife >= 0 && gl_GlobalInvocationID.x < maxParticles ) {

// Spawn

// Generate (random) data

partLife = 0;

partState = STATE_ALIVE;

partSize = mix( pSize.x, pSize.y, pRand1 );

// Generate a (local) random position & launch direction

partPos.x = (-1 + pRand1 * 2) * pSpawnVolume.x;

partPos.y = (-1 + pRand2 * 2) * pSpawnVolume.y;

partPos.z = (-1 + pRand3 * 2) * pSpawnVolume.z;

partVeloc.x = mix( pLaunchVector1.x, pLaunchVector2.x, pRand4 );

partVeloc.y = mix( pLaunchVector1.y, pLaunchVector2.y, pRand5 );

partVeloc.z = mix( pLaunchVector1.z, pLaunchVector2.z, pRand6 );

// Local to World coordinates

partPos.xyz = (pMatrix * vec4(partPos,1.f)).xyz;

partVeloc.xyz = mat3(pMatrix) * partVeloc;

} else {

// For now, Hide it somewhere far away, in another galaxy

partPos = vec3(-9999,-9999,-9999);

}

} // Alive or dead

//-----------------------------------------------------------------------

// Write results back into VBO

part.pos = vec4( partPos, partSize );

part.state = vec4( partLife, float( partState ), partRand, globID );

part.velocity = vec4( partVeloc, partRotat );

part.color = vec4( partColor, partAlpha );

outBuffer.particles[ globID ] = part;

} // end

Saturday, March 4, 2017
