Iteration #4:?
Having refreshed my mind about implementing particles in the previous post, how about version 4? I don't have spectacular plans, so I'll just try to blend the best parts together. Option #1 (particles evaluated by the CPU) is obviously a no-go. Transform Feedback was a bit confusing to set up, as far as my Korsakoff brain can remember, and OpenCL is currently not in Engine22 (and probably won't be). But GLSL ComputeShaders are...
Being all OpenGL and stuff, GLSL ComputeShaders are much easier to use than OpenCL, which I'd have to implement first.
Though I did find them wonky. They worked pretty well on my dev laptop, but terribly, and I mean really terribly, slow on other (admittedly somewhat older) machines. For instance, I rolled Tiled Deferred Rendering back to old-fashioned Deferred Lighting, and gained speed :&
But as said, the major problem in previous iterations wasn't really speed. It was flexibility. If I want my particles to dance the Tango, then I should be able to program that behaviour, somehow. Somehow easily. And even though it sounds cheap, I think having full code access might be the easiest option after all. That simply boils down to writing a bunch of ComputeShaders, completely by hand, and adding a few variables to tweak minor behaviour, such as speeds, colours or directions. Then pick whatever shader suits your needs best, or code a new one if none does.
For artists, this kinda sucks though. Assuming they don't know (or don't want) to code GLSL, they are tied to whatever I bring them. No fancy editor like I had in Iteration #3 (or node-based stuff like Unreal or Unity have). Now sure, I can still add such a feature anytime, but to start quickly, I'll just give it a good ol' handjob. Erh, right.
This is how all materials are edited in Engine22: you choose an "UberShader", then toggle options on/off. Certainly not as flexible (and less fun) as node-based systems. Then again, it works quick & easy for the majority of materials I've met so far.
VBO Initialisation
Into the details, gentlemen. So a particle in Engine22 is essentially this:
eGX_Particle = record
  pos      : eVec4; // XYZ: Pos,      W: Size
  state    : eVec4; // X: Life, Y: State, Z: Random 0..1, W: ID
  velocity : eVec4; // XYZ: Velocity, W: Rotation
  color    : eVec4; // RGB: Color,    A: Opacity
end; // eGX_Particle, 64 bytes
Yes, that's Delphi code. I need it to initialize things, but basically the GPU takes over from there, so I have a similar structure in GLSL. And since particles usually jump on you in big groups, I actually initialize a mega array of 1,048,576 particles (= 64 MB). Why that number? Because it matches a 1024x1024 texture. But more about that later.
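To double-check that arithmetic, here's a CPU-side C++ mirror of the record. The names `Vec4`, `Particle`, `kTextureSide` and friends are mine for illustration, not Engine22's. Four tightly packed 16-byte vectors give 64 bytes per particle, which is also the array stride a std140 or std430 shader storage block uses for a struct of four vec4s, so the Delphi, C++ and GLSL views of the buffer should line up. A minimal sketch, assuming 32-bit floats:

```cpp
#include <cassert>
#include <cstddef>

// CPU-side mirror of eGX_Particle: four vec4s of 16 bytes each.
struct Vec4 { float x, y, z, w; };

struct Particle {
    Vec4 pos;       // xyz: position, w: size
    Vec4 state;     // x: life, y: state, z: random 0..1, w: ID
    Vec4 velocity;  // xyz: velocity, w: rotation
    Vec4 color;     // rgb: colour, a: opacity
};

static_assert(sizeof(Particle) == 64, "one particle must stay 64 bytes");
static_assert(offsetof(Particle, velocity) == 32, "vec4 members are tightly packed");

constexpr std::size_t kTextureSide  = 1024;                          // 1024x1024 texture
constexpr std::size_t kMaxParticles = kTextureSide * kTextureSide;   // 1,048,576
constexpr std::size_t kPoolBytes    = kMaxParticles * sizeof(Particle);
static_assert(kPoolBytes == 64u * 1024u * 1024u, "the whole pool is 64 MB");
```

Add an attribute and the stride, the 64 MB figure and the GLSL struct all have to change together; the `static_assert`s make that hard to forget.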
There is one thing the CPU has to do when making this 1-million-particle VBO: give each particle an ID (0,1,2,3,...), and also a random value (state.z in this case). It will come in handy later on. Anyhow, making this VBO could look something like this:
procedure eGX_ParticleManager.initializeVBO();
var particles : array of eGX_Particle;
    i         : eInt;
begin
  // Make one big fat motherfucker of a particle array
  // (note it can handle multiple emitters, not just 1)
  setLength( particles, EGX_MAX_PARTICLES );
  for i:=0 to EGX_MAX_PARTICLES-1 do
  begin
    particles[i].pos.w   := 0;
    particles[i].state.x := 0;
    particles[i].state.y := 0;
    particles[i].state.z := mRandomF( 0, 1 );
    particles[i].state.w := i;
  end;
  self.vbo := eglBuffers.getVBO( EGL_VERTEXLAYOUT_PXNW );
  self.vbo.setDataPoints( @particles[0].pos.x, EGX_MAX_PARTICLES, 4, true );
  // Got our stuff on the GPU now, remove the RAM copy
  setLength( particles, 0 );
end; // initializeVBO
...
procedure eGL_VBO.setDataPoints(
        vertexArrayPtr : pointer; // points to an array of particle records
  const vertCount      : eInt;    // 1 million something
  const attribCount    : eInt;    // 4, because 4 vectors per point (particle)
  const dynamicContent : eBool    // yeah, pretty much
);
begin
  self.makeVBO( vertexArrayPtr, vertCount, attribCount, GL_POINTS, dynamicContent );
end; // VBO setDataPoints
procedure eGL_VBO.makeVBO( vertices : pointer;
                           const vertCount, attribCount : eInt;
                           const glMode : TGLEnum;
                           const dynamicContent : eBool );
begin
  self.attribCount := attribCount;
  self.vertCount   := vertCount;
  self.stride      := self.attribCount * 16; // 16 bytes per vec4 attribute
  self.glMode      := glMode;
  self.useIndices  := false;
  glGenBuffers( 1, @self.glName );
  glBindBuffer( GL_ARRAY_BUFFER, self.glName );
  // Content that gets rewritten often (by ComputeShaders) gets the DYNAMIC hint
  if ( dynamicContent ) then
    glBufferData( GL_ARRAY_BUFFER, self.vertCount * self.stride, vertices, GL_DYNAMIC_DRAW )
  else
    glBufferData( GL_ARRAY_BUFFER, self.vertCount * self.stride, vertices, GL_STATIC_DRAW );
  glBindBuffer( GL_ARRAY_BUFFER, 0 );
end; // makeVBO
So, we ended up with 1 big VBO made of a million vertices/points/particles, whatever you like to call them. Each particle is made of 4 vectors storing the properties that need to be evaluated over its short life. Now that we've reserved this memory (on the GPU), ComputeShaders can mess around in it.
Fixed one million particles?!
Now you may wonder, what if I only need a few particles? The number of particles we need changes all the time, right? True, but we'll be rendering a million particles every time anyway. The big array is split into several smaller "subranges", which can be reserved by particle emitters (clouds, fire, sparks, plasma piss, …).
Particle Generators
A particle generator typically consists of a position & direction (matrix), a bunch of settings like launch force or colours, and of course the shaders it uses for rendering and evaluating its particles. Each generator gets a VBO subrange (if still available), sized for how many particles it may need at its peak. Each generator also uses a ComputeShader, which is executed over that subrange. These ComputeShaders evaluate the "particle structs" by reading from and writing back into the VBO. Typically they move positions, grow or shrink sizes, adjust velocities based on physics/collisions, and so on.
Particles can be unassigned, or temporarily invisible in case their generator doesn't want to render them; notice the "State" property in the code above. Inactive particles exit their code early, and are rendered zero-sized and/or somewhere far out of sight.
Note that more complex effects, such as that flaming spicy Thai-food fart above (the mushroom cloud, I mean), may require multiple generators if the sprite textures and/or motion ComputeShaders differ significantly. For example, one generator would do the ring, another the explosion itself, and yet another the darker debris clouds at the bottom. Eventually you could fade out generators at different distances: the big stuff gets rendered from larger distances, while the smaller details won't be rendered until the camera is close. And of course, don't forget to release your VBO portion when the generator is done.
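The post doesn't show how those subranges are handed out. One simple option is a first-fit allocator over the fixed pool: keep the reserved ranges sorted by offset and give a new generator the lowest gap that fits its peak particle count. A minimal C++ sketch; `Subrange` and `SubrangeAllocator` are hypothetical names, not Engine22 classes:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical first-fit allocator handing out pieces of the big particle VBO.
struct Subrange { std::uint32_t offset, count; };

class SubrangeAllocator {
public:
    explicit SubrangeAllocator(std::uint32_t poolSize) : poolSize_(poolSize) {}

    // Reserve `count` consecutive particles in the lowest gap that fits.
    std::optional<Subrange> allocate(std::uint32_t count) {
        if (count == 0) return std::nullopt;
        std::uint32_t cursor = 0;  // end of the previous reserved range
        for (std::size_t i = 0; i <= used_.size(); ++i) {
            const std::uint32_t gapEnd = (i < used_.size()) ? used_[i].offset : poolSize_;
            if (gapEnd - cursor >= count) {
                const Subrange r{cursor, count};
                used_.insert(used_.begin() + static_cast<std::ptrdiff_t>(i), r);  // stay sorted
                return r;
            }
            if (i < used_.size()) cursor = used_[i].offset + used_[i].count;
        }
        return std::nullopt;  // pool full, or too fragmented
    }

    // Give a generator's portion back once it is done.
    bool release(std::uint32_t offset) {
        for (auto it = used_.begin(); it != used_.end(); ++it) {
            if (it->offset == offset) { used_.erase(it); return true; }
        }
        return false;
    }

private:
    std::uint32_t poolSize_;
    std::vector<Subrange> used_;  // sorted by offset, non-overlapping
};
```

First-fit is easy to reason about, but it can fragment the pool when generators of very different sizes come and go; a failed `allocate()` is where you'd decide that a generator simply doesn't get particles this time.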
ParticleLife: Born to get Killed
The (GLSL) ComputeShader globally does three things: it spawns, evaluates, and kills. The ComputeShader is executed for every particle, whether it's active or not. Particles that are already active continue updating their positions and other attributes, using (crude) physics or some other weird movement pattern you programmed. It also keeps track of its "life" value. When that runs out, the particle enters a "dead" or "inactive" state, from where it can be reincarnated. And yeah, usually the colour fades out prior to that. Some particles may also get killed by collisions, by the way. Rain, for example, can be recycled as soon as it drops below the floor.
While a particle is dead, we check if we should re-enable it. This is a bit tricky, as generators may increase or decrease their "max particle number" dynamically. Think about a fountain that stops squirting... ok. One way is to simply check if your particle.ID (see structure above) is higher than the generator's max-allowed particle count at that time. If so, do not recycle; keep it dead. If not, you're free to go again.
Yet keeping a continuous flow can still be tricky: if all particles are fired at once, simultaneously, nothing will be spawned in the meantime. What you need is some (random) time offset before (re)activating. If particles can live up to 5 seconds, the initial delay should probably be a random number between 0 and 5 seconds.
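Putting the life cycle together: the full shader at the end of this post counts life *up* rather than down, uses a negative life as the random respawn delay, and gates respawning on the particle's index within the generator's subrange. Below is a CPU-side C++ sketch of that per-particle state machine, assuming one tick per frame; `ParticleLife`, `lifeStep` and `countAlive` are hypothetical names. One deliberate detail: living particles keep aging even when their index is above the current maximum, so they still die off when the fountain stops squirting instead of hanging around forever.

```cpp
#include <cassert>

enum State { kDead = 0, kAlive = 1 };

struct ParticleLife {
    float life;   // seconds alive; negative while waiting to respawn
    int   state;  // kDead or kAlive
    float rand;   // per-particle random 0..1, written once at init
};

// One tick for one particle. `localIndex` is its index inside the
// generator's subrange; `maxParticles` may shrink or grow at runtime.
void lifeStep(ParticleLife& p, unsigned localIndex, unsigned maxParticles,
              float lifeTime, float spawnDelay, float dt) {
    const bool enabled = localIndex < maxParticles;
    // Living particles always age; dead ones only count down while enabled.
    if (p.state == kAlive || enabled) p.life += dt;

    if (p.state == kAlive) {
        if (p.life > lifeTime) {
            p.state = kDead;
            p.life  = -p.rand * spawnDelay;  // random delay before the next spawn
        }
    } else if (enabled && p.life >= 0.0f) {
        p.state = kAlive;  // respawn; position/velocity would be reset here
        p.life  = 0.0f;
    }
}

// Count live particles, e.g. to watch how the staggering spreads the spawns.
unsigned countAlive(const ParticleLife* ps, unsigned n) {
    unsigned alive = 0;
    for (unsigned i = 0; i < n; ++i) alive += (ps[i].state == kAlive) ? 1u : 0u;
    return alive;
}
```

Seeding the initial lives with values spread between -lifeTime and 0 is what keeps the flow continuous: only a few particles come alive per tick instead of the whole batch at once.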
Local weather... very local.
When spawned, you probably want to set some particle data, like its initial position, colour, size, and velocity/direction. The parent Particle Generator should give us a "breeding volume", for example a box or sphere that moves and rotates along with the generator matrix. Particles get a random position somewhere within that volume. Depending on what motion follows, you may also want to define a "velocity cone": the range within which launch vectors are randomized.
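A sketch of that spawn step in C++, assuming a box-shaped breeding volume given as half-extents and a launch range given as two vectors, the same inputs the full shader at the end of this post uses. Picking each component between two launch vectors gives a box of launch directions rather than a true cone. `spawnParticle`, `transform` and `mixf` are hypothetical helpers:

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };
struct Mat4 { float m[16]; };  // column-major, like GLSL's mat4

// GLSL-style mix(): linear blend from a to b.
float mixf(float a, float b, float t) { return a + (b - a) * t; }

// Transform a point (w = 1) or a direction (w = 0) by a column-major matrix.
Vec3 transform(const Mat4& M, const Vec3& v, float w) {
    const float* m = M.m;
    return { m[0] * v.x + m[4] * v.y + m[8]  * v.z + m[12] * w,
             m[1] * v.x + m[5] * v.y + m[9]  * v.z + m[13] * w,
             m[2] * v.x + m[6] * v.y + m[10] * v.z + m[14] * w };
}

// r[0..5] are random numbers in 0..1, e.g. from the shader-style hash.
void spawnParticle(const Mat4& generator, const Vec3& halfExtents,
                   const Vec3& launchMin, const Vec3& launchMax,
                   const float r[6], Vec3& pos, Vec3& vel) {
    // Random point inside a box centred on the generator.
    const Vec3 local{ (-1 + 2 * r[0]) * halfExtents.x,
                      (-1 + 2 * r[1]) * halfExtents.y,
                      (-1 + 2 * r[2]) * halfExtents.z };
    // Launch vector picked per component between the two limits.
    const Vec3 launch{ mixf(launchMin.x, launchMax.x, r[3]),
                       mixf(launchMin.y, launchMax.y, r[4]),
                       mixf(launchMin.z, launchMax.z, r[5]) };
    pos = transform(generator, local, 1.0f);   // points pick up the translation
    vel = transform(generator, launch, 0.0f);  // directions only rotate/scale
}
```

Positions use w = 1 so they move along with the generator; velocities use w = 0 so they only rotate (and scale) with it, which is what `mat3(pMatrix) * partVeloc` does in the shader.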
Random
All in all, there is an important portion of "random" here, but shaders can't do such a thing: they don't have a random function (although you can make one, given some continuously varying input). That's why we gave each particle a random seed number (state.z). Together with the particle ID, and maybe the generator position or an overall timer value, you should be able to randomize decently.
float random(vec2 coordinate, float seed)
{
    return fract(sin(dot(coordinate*seed, vec2(12.9898, 78.233)))*43758.5453);
}
// Generate a bunch of random numbers (0..1), based on particle ID,
// particle Random input, and elapsed time
float pRand1 = random( vec2(part.id, part.rand), pElapsed );
float pRand2 = random( vec2(pElapsed, part.id), part.rand );
float pRand3 = random( vec2(part.rand, pElapsed), part.id );
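For experimenting on the CPU, the same hash ports directly to C++ (`fract` and `shaderRandom` are my names). Treat it as a sketch: GLSL doesn't guarantee the precision of `sin()`, so don't expect the GPU and CPU versions to agree bit for bit.

```cpp
#include <cassert>
#include <cmath>

// GLSL-style fract(): x - floor(x).
float fract(float x) { return x - std::floor(x); }

// Port of random(coordinate, seed):
// fract(sin(dot(coordinate * seed, vec2(12.9898, 78.233))) * 43758.5453)
float shaderRandom(float cx, float cy, float seed) {
    const float d = (cx * seed) * 12.9898f + (cy * seed) * 78.233f;
    return fract(std::sin(d) * 43758.5453f);
}
```

One gotcha this makes easy to see: whenever the seed (or the whole coordinate) is zero, `sin(0)` gives 0 and so does the hash. So if `pElapsed` starts at 0, `pRand1` above is 0 for every particle on the very first frame.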
ParticleMotion
Once spawned, the same ComputeShader has to update positions and such. It can be as simple as this:
uint particleIndex = gl_GlobalInvocationID.x + offsetIndex;
// Read particle from VBO
particle = VBO[particleIndex];
// Drop down
particle.pos.y -= gravity * deltaTime;
// Write particle back into VBO
VBO[particleIndex] = particle;
But that's a bit boring, right? As mentioned before, it really depends on what kind of effect we're trying to achieve. This would be slightly more interesting:
particle.life += deltaTime;
// Eliminate launch velocity over 1.5 seconds, and drop down
// (the mix weight must stay within 0..1)
particle.velocity.xyz = mix( particle.velocity.xyz,
                             vec3(0,-9.8,0),
                             min( particle.life / 1.5f, 1.0f ) );
particle.pos.xyz += particle.velocity.xyz * deltaTime;
Well, you have to play with this yourself. Same stuff to evaluate colour, size and rotation, typically based on the lifetime. And don't forget you could include lookup textures to colourize and such. Last but not least, make sure you write some standard functions so you can quickly compose another shader, or even better, compose them with a visual editor like the Big Boys do.
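The "launch velocity fades into a plain fall over 1.5 seconds" idea only works out if the `mix()` weight runs from 0 to 1, so the weight here is life / 1.5 clamped to 1. A CPU-side C++ sketch of that motion step; `MovingParticle` and `motionStep` are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>

struct Vec3 { float x, y, z; };

// GLSL-style mix() for vectors.
Vec3 mix(const Vec3& a, const Vec3& b, float t) {
    return { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t };
}

struct MovingParticle { Vec3 pos, vel; float life; };

// Blend the velocity toward a plain downward fall; the weight life / blendTime
// reaches 1 after blendTime seconds, from then on the particle just falls.
void motionStep(MovingParticle& p, float dt, float blendTime = 1.5f) {
    const Vec3 fall{0.0f, -9.8f, 0.0f};
    p.life += dt;
    const float t = std::min(p.life / blendTime, 1.0f);
    p.vel = mix(p.vel, fall, t);
    p.pos = { p.pos.x + p.vel.x * dt, p.pos.y + p.vel.y * dt, p.pos.z + p.vel.z * dt };
}
```

Because the blend is applied every frame, the curve in between depends on the frame rate; only the end state (a pure fall once 1.5 seconds have passed) doesn't.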
No, snow doesn't drop straight down.
ParticleCollision
So your particles can move. The next step is to keep them from falling through walls and such. It would be nice if rain splattered on the ground, right? Now, I have little experience in this area, so forgive me if I forget smart tricks. But the major problem is that your GPU can't access your fancy (RAM) BSP trees, Octrees, Newton physics or whatever it is you use to store geometry and determine collisions.
But there are a few workarounds. Imperfect, but still. First, in Engine22 each sector (room, hallway, ...). One of them is to use your depth buffer. Most likely you have one lying around somewhere already, for other techniques such as SSAO.
If a particle's depth becomes larger than what your screen depth texture indicates, it *may* have intersected the floor, walls, or any other object. I say *may*, because it could also be behind it. You can include your normal buffer (if you have a gBuffer approach), or compare positions to estimate the plausibility of a collision. Besides depth, I also store the world positions in my gBuffer, so I can calculate the distance between a particle and the pixel occluding it in the foreground. Small distance? Likely a collision. Bigger distance? The particle is likely behind it.
// Check if we collided with the environment based on depth & normal gBuffer
// Convert particle (world) pos to screen UV, if it's on the screen at all!
vec4 screenUV = viewProjMatrix * vec4( partPos.xyz, 1.f );
screenUV.xy  = (screenUV.xy + screenUV.ww) * 0.5f;
screenUV.xy /= screenUV.ww;
if ( screenUV.w > 0.0f &&                          // not behind the camera
     screenUV.x > -0.0001f && screenUV.x < 1.0001f &&
     screenUV.y > -0.0001f && screenUV.y < 1.0001f )
{
    // Compute shaders have no derivatives, so sample an explicit LOD
    vec4 gPos = textureLod( gBufferDepth, screenUV.xy, 0.0 );
    // Compare depth. Pixel in foreground?
    if ( partDistToCam + partSize*0.5 >= gPos.w ) {
        // Possible intersection. Check dist between particle and surface
        float distToSurf = length( gPos.xyz - partPos.xyz );
        if ( distToSurf < 0.3 ) {
            // Impact. Now do ... something
            #ifdef _BOUNCE
                #macro _PARTICLES_BOUNCE
            #endif
            #ifdef _SPLAT
                #macro _PARTICLES_Splat
            #endif
        }
    }
} // OnScreen
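The same projection and plausibility test as a CPU-side C++ sketch, handy for checking the math; the names are hypothetical and matrices are column-major like GLSL's. Mapping clip space to UV as `(xy/w)*0.5 + 0.5` is the same thing the shader computes with `(xy + ww)*0.5 / w`. It also rejects points with w <= 0: those lie behind the camera, and after the divide they can land inside 0..1 anyway, mirrored.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[16]; };  // column-major

Vec4 mul(const Mat4& M, const Vec4& v) {
    const float* m = M.m;
    return { m[0] * v.x + m[4] * v.y + m[8]  * v.z + m[12] * v.w,
             m[1] * v.x + m[5] * v.y + m[9]  * v.z + m[13] * v.w,
             m[2] * v.x + m[6] * v.y + m[10] * v.z + m[14] * v.w,
             m[3] * v.x + m[7] * v.y + m[11] * v.z + m[15] * v.w };
}

// World position -> screen UV in 0..1; false if behind the camera or off-screen.
bool worldToScreenUV(const Mat4& viewProj, const Vec3& p, float& u, float& v) {
    const Vec4 clip = mul(viewProj, {p.x, p.y, p.z, 1.0f});
    if (clip.w <= 0.0f) return false;       // behind the camera
    u = (clip.x / clip.w) * 0.5f + 0.5f;    // NDC -1..1 -> UV 0..1
    v = (clip.y / clip.w) * 0.5f + 0.5f;
    return u >= 0.0f && u <= 1.0f && v >= 0.0f && v <= 1.0f;
}

// Behind the stored depth only counts as a hit when the particle is also close
// to the world position stored in that gBuffer pixel.
bool likelyCollision(float partDistToCam, float partSize, const Vec3& partPos,
                     const Vec3& gPos, float gDist, float maxDist = 0.3f) {
    if (partDistToCam + partSize * 0.5f < gDist) return false;  // in front of the surface
    const float dx = gPos.x - partPos.x, dy = gPos.y - partPos.y, dz = gPos.z - partPos.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz) < maxDist;
}
```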
Sure, it's inaccurate, but it may just do the trick. Remember, particles in most games you have seen are likely pretty inaccurate. Ever noticed it? No? Fine then.
Now, next step: how the fuck do we keep that darn snow outside? Other than just shaping the generators properly, I haven't figured that out yet. Adjusting the generator shapes works fine in a small game like Tower22, but in a huge mixed indoor/outdoor world (think of a city), you want a more dynamic solution.
Depth Sorting & Fill-rate issues
Well, as usual this post got much longer than I intended, so let me finish with a more complete ComputeShader as an example. But before leaving, keep in mind you can also use another ComputeShader to sort your VBO (per subrange) by comparing distances and swapping array elements. I never really needed proper sorting so far, though (when using additive or pre-multiplied alpha blending), but just keep it in mind.
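The post doesn't say which sort to use. A common GPU-friendly choice is a bitonic sort: every pass compares fixed pairs of elements, so each pass could be one ComputeShader dispatch with one invocation per element, and no two invocations touch the same pair. Below is a CPU reference in C++ that sorts far-to-near (the usual order for alpha blending). `SortKey` and `bitonicSortFarToNear` are hypothetical names, and the element count must be a power of two, so a subrange would need padding:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct SortKey { float distToCam; unsigned particleIndex; };

// Bitonic sort, far-to-near. Every (k, j) pass only compares pairs (i, i ^ j),
// so on the GPU each pass could be a separate dispatch. Needs a power-of-two size.
void bitonicSortFarToNear(std::vector<SortKey>& keys) {
    const std::size_t n = keys.size();
    assert((n & (n - 1)) == 0 && "size must be a power of two");
    for (std::size_t k = 2; k <= n; k <<= 1) {
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner <= i) continue;            // handle each pair once
                const bool descending = (i & k) == 0;  // direction of this block
                const bool outOfOrder = descending
                    ? keys[i].distToCam < keys[partner].distToCam
                    : keys[i].distToCam > keys[partner].distToCam;
                if (outOfOrder) std::swap(keys[i], keys[partner]);
            }
        }
    }
}
```

Sorting small (distance, index) keys instead of the full 64-byte particles keeps the swaps cheap; rendering can then fetch particles through the sorted indices.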
I managed to update hundreds of thousands of particles without breaking a sweat. A bigger problem remains the fill rate. Many tiny particles, like those snowflakes, won't be a problem. But to form thicker clouds, you usually need larger particles to make a consistent "whole", and drawing many overlapping layers can still ruin your party. One idea might be to accumulate them in a (downsized) buffer, using very simple fragment shaders only. Then, when upscaling, apply the more advanced tricks like lighting & texturing. It's a somewhat specific trick that won't work for every type of particle though, but here, read it:
struct Particle {
    vec4 pos;      // XYZ: Pos,      W: Size
    vec4 state;    // X: Life, Y: State, Z: Random 0..1, W: ID
    vec4 velocity; // XYZ: Velocity, W: Rotation
    vec4 color;    // RGB: Color,    A: Opacity
}; // Particle

layout( std140, binding=0 ) buffer VBO {
    Particle particles[];
} outBuffer;

layout( local_size_x = 4, local_size_y = 1, local_size_z = 1 ) in;

// Generator Variables
uniform vec4 pTiming;        // X: spawn delay, Y: random input, Z: elapsed, W: delta secs
uniform vec4 pGenerator;     // X: lifetime, Y: gravity, Z: subrange offset, W: max particles
uniform vec4 pDimensions;    // ZW: min/max particle size
uniform vec4 pSpawnVolume;   // XYZ: half-extents of the spawn box
uniform vec4 pLaunchVector1; // XYZ: launch velocity range, one end
uniform vec4 pLaunchVector2; // XYZ: launch velocity range, other end; W: air friction
uniform mat4 pMatrix;        // generator local-to-world matrix

float random(vec2 coordinate, float seed)
{
    return fract(sin(dot(coordinate*seed, vec2(12.9898, 78.233)))*43758.5453);
}

const int STATE_DEAD  = 0;
const int STATE_ALIVE = 1000;

void main() {
    //-----------------------------------------------------------------------
    uint offsetIndex  = uint(pGenerator.z);
    uint maxParticles = uint(pGenerator.w);
    uint globID       = gl_GlobalInvocationID.x + offsetIndex;

    // Fetch particle from VBO, decode its data
    Particle part = outBuffer.particles[ globID ];
    vec3  partPos   = part.pos.xyz;
    float partSize  = part.pos.w;
    float partLife  = part.state.x;
    int   partState = int( part.state.y );
    float partRand  = part.state.z;
    vec3  partVeloc = part.velocity.xyz;
    float partRotat = part.velocity.w;
    vec3  partColor = part.color.xyz;
    float partAlpha = part.color.w;

    float pSpawnDelay  = pTiming.x;
    float pElapsed     = pTiming.z;
    float deltaSecs    = pTiming.w;
    float pLifeTime    = pGenerator.x;
    float pGravity     = pGenerator.y;
    vec2  pSize        = pDimensions.zw;
    float pAirFriction = pLaunchVector2.w;

    // Generate a bunch of randoms
    float pRand1 = random( vec2(pTiming.y, partRand), pElapsed );
    float pRand2 = random( vec2(pElapsed, pTiming.y), partRand );
    float pRand3 = random( vec2(partRand, pElapsed), pTiming.y );
    // Three more for the launch vector
    float pRand4 = random( vec2(pRand1, partRand), pTiming.y );
    float pRand5 = random( vec2(pRand2, partRand), pTiming.y );
    float pRand6 = random( vec2(pRand3, partRand), pTiming.y );

    //-----------------------------------------------------------------------
    // Age. Living particles always age, so they still die after the generator
    // lowers maxParticles; dead ones only count down while enabled.
    if ( partState >= STATE_ALIVE-1 || gl_GlobalInvocationID.x < maxParticles )
        partLife += deltaSecs;

    if ( partState >= STATE_ALIVE-1 ) {
        // Apply Gravity & Launch Velocity
        #ifdef _GRAVITY
            partVeloc.y += pGravity * deltaSecs;
        #endif
        partVeloc.xz *= vec2( pAirFriction ); // reduce velocity over time
        partPos += partVeloc * deltaSecs;

        // Check if we aged.
        // If so, kill particle, gen a random respawn delay time
        if ( partLife > pLifeTime ) {
            partState = STATE_DEAD;
            partLife  = -pRand3 * pSpawnDelay;
        }
    } else {
        // Check if we should (re)spawn this DEAD particle
        if ( partLife >= 0 && gl_GlobalInvocationID.x < maxParticles ) {
            // Spawn
            // Generate (random) data
            partLife  = 0;
            partState = STATE_ALIVE;
            partSize  = mix( pSize.x, pSize.y, pRand1 );
            // Generate a (local) random position & launch direction
            partPos.x   = (-1 + pRand1 * 2) * pSpawnVolume.x;
            partPos.y   = (-1 + pRand2 * 2) * pSpawnVolume.y;
            partPos.z   = (-1 + pRand3 * 2) * pSpawnVolume.z;
            partVeloc.x = mix( pLaunchVector1.x, pLaunchVector2.x, pRand4 );
            partVeloc.y = mix( pLaunchVector1.y, pLaunchVector2.y, pRand5 );
            partVeloc.z = mix( pLaunchVector1.z, pLaunchVector2.z, pRand6 );
            // Local to World coordinates
            partPos.xyz   = (pMatrix * vec4(partPos,1.f)).xyz;
            partVeloc.xyz = mat3(pMatrix) * partVeloc;
        } else {
            // For now, Hide it somewhere far away, in another galaxy
            partPos = vec3(-9999,-9999,-9999);
        }
    }
    //-----------------------------------------------------------------------
    // Write results back into VBO
    part.pos      = vec4( partPos, partSize );
    part.state    = vec4( partLife, float( partState ), partRand, float( globID ) );
    part.velocity = vec4( partVeloc, partRotat );
    part.color    = vec4( partColor, partAlpha );
    outBuffer.particles[ globID ] = part;
} // end