Thursday, May 31, 2012

Making of Radar demo #8: Morphing Animations

As promised, a Blog update about monster-animations without the usual delay. Ready for the European Football tournament btw? I'm ready, at least for drinking beer in the pub while the match is playing on a screen somewhere behind me. Let's hope Portugal, Germany or Denmark don't abruptly end my good excuse for visiting the pub in the middle of the week.


Right. The "RadarBlob" monster wasn't supposed to be animated at first. Simple, lack of time. I still need to upgrade the entire skeleton-animation system. Support for other files (now it's only Milkshape...), additive blending, making good use of modern GPU techniques, ragdolls, et cetera. Another reason for skipping animation was the lack of a good animator. Which is also the reason I still haven't implemented a renewed system. First I want a human player or monster with good animations. Sorry, but I can't code blind or on "dummies", need real test-subjects!

But... while looking at the static, "frozen", monster, I wondered how the heck we could make that demo movie end at least a little bit spectacularly. The model, textures and shading had been improved, but other than that it was about as interesting as a vase. It would look even more ridiculous if the sound was playing dangerous music, angry monster digesting sounds and steam blowers while nothing really happened visually. No, we needed movement, even if it was something simple. But how to do that fast & easy? The answer: Morphing Animations.


Teenage Morphing hero Blobs
------------------------------
Morphing. It sounds like a technique the Power Rangers would use, but in 3D terminology it means as much as changing shape-A into shape-B. It's pretty simple, and an ancient technique as well; Quake1 already used morphing animations for its monsters and soldiers. How does it work? Imagine a ball made of 100 vertices. Now, 3 seconds later, imagine the same sphere, but squeezed. The 100 vertices moved to other places in order to give a "squeezed" appearance to the same ball. The initial pose and the squeezed pose 3 seconds later can be called 2 "Keyframes". If we store those keyframes in computer memory (thus storing all 100 vertex positions per keyframe), we can interpolate the vertex positions between those 2 keyframes over the timeline. Useful, so we don't have to store hundreds or thousands of frames.
for each vertex
.....currentVertexPosition = lerp( frame1VertexPos, frame2VertexPos, frameDelta );
* frameDelta = a value between 0 and 1. 0.25 means we're at 25% towards frame2
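In plain C++, the old-fashioned CPU-side version of that loop would look something like this. Just a minimal sketch with my own names, not code from the engine:

struct Vec3 { float x, y, z; };

// Classic CPU morphing: blend every vertex between two stored keyframes.
// frameDelta runs from 0 to 1 between keyframe A and keyframe B.
void morphVertices( const Vec3* frameA, const Vec3* frameB, Vec3* result,
                    int vertexCount, float frameDelta )
{
    for ( int i = 0; i < vertexCount; ++i )
    {
        result[i].x = frameA[i].x + (frameB[i].x - frameA[i].x) * frameDelta;
        result[i].y = frameA[i].y + (frameB[i].y - frameA[i].y) * frameDelta;
        result[i].z = frameA[i].z + (frameB[i].z - frameA[i].z) * frameDelta;
    }
}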


The math is pretty simple, and the file formats are straightforward as well. Just store the vertex positions (and normals) at certain keyframes. Another important note: Morphing animations are very flexible, which makes them suitable for organic, abstract shapes like our RadarBlob. Unlike Skeleton animations, where the vertices are bound to a bone, we can place the vertices anywhere we like. You can change a humanoid into the Hulk, or into a cube. Just as long as the vertex count and the polygon relations stay the same.

Yet Morphing animations aren't that common anymore. They have some serious issues. First, in the old Quake times, models were much simpler: a relatively low vertex count (a few hundred or so), and just a few relatively simple animations. These days our monsters have much higher polycounts, plus more and longer animations. The CPU would have to loop through much bigger vertex lists to interpolate all positions, and the memory would take a hit from storing it all too. For example, our RadarBlob has ~8.000 vertices. In optimized form with indices, it has ~3.100. That would mean the CPU has to update 3.100 vertices, for each monster, each frame. And storing a single keyframe would cost at least 3.100 x 12 bytes = ~36 kB. In practice that doubles, as you may also want to store the normals.


Why Skeletor is more powerful
------------------------------
It's not that a modern CPU wouldn't be able to deal with these numbers. Hey, don't forget the hardware also grew since the Quake era. Yet it feels wrong to do it this "brute force" way. And it is wrong; I'll show you how the GPU can help down below. Another good reason why Skeleton animations took over are the static restrictions of Morphing. You can calculate the vertex positions on the fly, but more likely you'll read them from an animation file (like the good old MD2 files). The animations there are "fixed"; you can't just alter specific body parts during the animation. For example, having the upper body or head/eyes follow a dynamic target gets difficult. Ragdoll animations, which are based on fully dynamic behavior calculated from collision volumes falling on the ground, are nearly impossible in combination with Morphing animations. You can use Verlet or Cloth physics to alter the vertex positions, but it will make the character fall like a combination of pudding & a deflated sexdoll.

Skeleton animations do not store vertex positions at all. Instead, they store a "skeleton": a bunch of joints and their relations ("bones"). Vertices in turn are assigned to one or more bones. Your left hand for example would be assigned to the "Left-wrist" bone. Fingers on the same left hand are assigned to sub-bones. If the wrist rotates or moves, all sub-bones rotate and move along with it, and so do the assigned vertices. Yes, in essence this still means we have to recalculate all vertex positions individually by multiplying them with their host-bone matrices. But skeletons have three major strengths over Morphing:
1- You only have to store the joint matrices (or quaternions) per keyframe. An average humanoid game-skeleton only has 30 to 50 joints or so. That saves a lot of RAM, and makes the files smaller.
2- You can dynamically alter a single bone, or multiple bones, and all child-bones + their vertices will nicely follow. Very useful for aiming, looking at targets, ragdoll physics, IK, or other dynamic behavior that can't be stored in pre-calculated animation files.
3- You can easily combine animations. The legs run while the upper body shoots bazookas, while the face talks shit.
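For contrast, the heart of (linear blend) skinning boils down to something like this. A rough sketch only, not engine code; I'm assuming the glm math library, up to 4 bones per vertex, and boneMatrices being the usual skinning matrices (current joint pose combined with the inverse bind pose):

#include <glm/glm.hpp>

// Each vertex follows up to 4 bones, weighted by its bone weights.
glm::vec3 skinVertex( const glm::vec3& bindPos,
                      const glm::mat4* boneMatrices,
                      const glm::ivec4& boneIds, const glm::vec4& weights )
{
    glm::vec3 result( 0.0f );
    for ( int i = 0; i < 4; ++i )
        result += weights[i] *
                  glm::vec3( boneMatrices[ boneIds[i] ] * glm::vec4( bindPos, 1.0f ) );
    return result;
}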


Morphing, the Revival
------------------------------
Ok, we now know why we shouldn't use Morphing, and yet we did. Look, if you make use of modern techniques, Morphing can still be a faster solution than skeletons, it's easier to implement, and it still suits organic shapes better. I mean, how the hell would you make a suitable skeleton for this abomination?
I tried, but no…

We didn't need bullet-time Trinity animations, just a disgusting sack of hydraulic blubber breathing a bit. So, Morphing would be a fine choice, sir, the waiter said. But how to make it a bit fast? First we would need to get rid of the CPU work. I'm not a fan of moving *everything* to the GPU just to say "Got a 100% GPU solution here!"; those Quad-cores need to move their lazy asses as well. But it's just a fact that GPU's are much faster when it comes to processing big arrays that require vector math. Updating vertices on the CPU would be a disaster anyway, as it would prevent you from using Vertex-Buffer-Objects, unless you stream the updated vertices back each cycle. No go.

The VBO just contains the monster in its original pose. When we render it, the Vertex-Shader will do the position-interpolation math instead of the CPU. The math is simple: just "lerp" the vertex between the current-frame and next-frame vertex positions, then proceed as usual.
 // Get the vertex positions for the current and next frame 
 float3 frame1Pos= tex2D( positionTex, frame1TX ).xyz;
 float3 frame2Pos= tex2D( positionTex, frame2TX ).xyz;
 // Interpolate
 float3 vPos = lerp( frame1Pos, frame2Pos, frameDelta );
 // Output
 out.vPos = mul( modelViewProjMatrix, float4( vPos, 1.f ) );
But... how does the Vertex-Shader know what the current and next frame positions are? Easy does it: we use a texture. This (16 bit floating point) texture contains ALL vertex positions for ALL keyframes. That sounds like a whole lot, but don't forget a whole lot of pixels fit in a 2D image. A single RGB pixel can hold an XYZ position (in local space), so do the math:
* RadarBlob: 3100 vertices
* 256 x 256 2D texture = 65.536 pixels
* 65.536 / 3100 ≈ 21 keyframes

In other words, a 256 x 256 image would be able to store 21 keyframes for this particular model. When using a 512x512 texture the number quadruples, and don't forget you could eventually use a different image for each animation. Anyway, the Vertex-Shader has to fetch 2 of those pixels for each vertex. You can fill the image any way you want, but I just filled it the optimal way. Each vertex gets a unique ID, a number between 0 and 3100 in this RadarBlob case. This ID is stored along with the vertex in the VBO; in my case I stored it in texture-coordinate.z. The texture-lookup index of a vertex can then be calculated as follows:
uniform int   frameNumber; // Current keyframe index (0..x)
uniform float frameDelta;  // Current position between current and next frame (0..1)
 
// index = frame offset  +  vertex offset within that frame
// index = 3100 * frameNumber + vertexID
int vertexID    = (int)in.vertexTexcoord.z;
int frame1Index = modelVertexCount *  frameNumber    + vertexID;
int frame2Index = modelVertexCount * (frameNumber+1) + vertexID;

// Change the 1D lookup index into a 2D texture coordinate (normalized), for a 256 x 256 pixel image
float2 frame1TX = float2( frame1Index % 256, frame1Index / 256 ) / 256.f;
float2 frame2TX = float2( frame2Index % 256, frame2Index / 256 ) / 256.f;
 
// Add half a texel to access the center of a pixel in the texture.
// You also may want to turn off linear filtering for the 256x256 texture btw
const float2 HALFTEX  = float2( 0.5f / 256.f, 0.5f / 256.f );
 frame1TX += HALFTEX;
 frame2TX += HALFTEX;
 
// Get the vertex positions for the current and next frame 
float3 frame1Pos= tex2D( positionTex, frame1TX ).xyz;
float3 frame2Pos= tex2D( positionTex, frame2TX ).xyz;

// Interpolate
float3 vPos = lerp( frame1Pos, frame2Pos, frameDelta );
// Eventually you can also lerp between the original pose if you like dynamic control
// on the animation "influence"
 vPos = lerp( in.originalVpos.xyz, vPos, animationInfluence );
 
// Output
 out.vPos = mul( modelViewProjMatrix, float4( vPos, 1.f ) );

That's pretty much it. Feed the Vertex-Shader a texture that contains all animated positions, and you're good to go. Uhmmmm... how to get those textures? I quickly made a little program that imports a sequence of OBJ files. It just loops through all vertices and stores them in an array that is suitable for building an OpenGL texture later on:
for each OBJfile
.....for each vertex in OBJfile
..........array[index++] = vertex.xyz
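Fleshed out a bit, the OpenGL side of that little tool could look roughly like this. Just a sketch with made-up helper types (the keyframes are assumed to be loaded into plain Vec3 lists already), assuming float-texture support; this is not the actual program:

#include <GL/glew.h>
#include <vector>

struct Vec3 { float x, y, z; };

// Pack the positions of every keyframe, one after another, into a 256 x 256
// RGB 16-bit float texture.
GLuint buildPositionTexture( const std::vector< std::vector<Vec3> >& keyframes )
{
    const int TEX_SIZE = 256;
    std::vector<float> texels( TEX_SIZE * TEX_SIZE * 3, 0.0f );

    int index = 0;
    for ( size_t f = 0; f < keyframes.size(); ++f )          // one OBJ file per keyframe
        for ( size_t v = 0; v < keyframes[f].size(); ++v )   // same count & order in every file!
        {
            texels[ index * 3 + 0 ] = keyframes[f][v].x;
            texels[ index * 3 + 1 ] = keyframes[f][v].y;
            texels[ index * 3 + 2 ] = keyframes[f][v].z;
            ++index;
        }

    GLuint tex;
    glGenTextures( 1, &tex );
    glBindTexture( GL_TEXTURE_2D, tex );
    glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST ); // no filtering!
    glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST );
    glTexImage2D( GL_TEXTURE_2D, 0, GL_RGB16F, TEX_SIZE, TEX_SIZE, 0,
                  GL_RGB, GL_FLOAT, &texels[0] );
    return tex;
}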

Sounds easy, and it is pretty easy, yet I'll have to WARN about a few things:
* Make sure all OBJ files have the exact same vertex count and order. If one file stores its vertices in a different order, your animation will turn into a polygon massacre.
* In case you want to smooth / share vertices to make use of indices, do it before storing them in this texture. The numbering and order must match the model VBO in your (game)app later on.
* Center the OBJ files the same way you would in your program, or you'll get an offset on all coordinates.


Whaaa, FLAT shading?!
------------------------------
If you try the code above, it seems to work nicely at first glance, but take your magnifier and flashlight, Sherlock. See that? The lighting on the model seems.... weird. The RadarBlob breathes, but the lighting doesn't change along with the movement. No shit Sherlock, that's because you didn't alter the normals yet (unless you already got suspicious and added some more code ;))!

If you rotate a polygon, the normal has to rotate with it in order to keep the lighting correct. The only problem is that you can't do this in a vertex-shader, unless you know all neighbor vertex positions as well. That's possible, but it requires a lot more sampling and duplicate calculations just to get the normal correct. Good thing we have Geometry Shaders these days. Geometry Shaders are actually aware of the entire polygon, as they take primitives for breakfast. In other words, you'll get the three (morphed) vertex positions, so you can relatively easily recalculate a normal and eventually the (bi)Tangents as well.
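For the record, the per-triangle math a Geometry Shader would do here is nothing more than a cross product of two edges. Shown in C++/glm form purely to illustrate; note it produces a single normal for the whole triangle:

#include <glm/glm.hpp>

// Face normal of one triangle: cross product of two edges, normalized.
glm::vec3 faceNormal( const glm::vec3& v0, const glm::vec3& v1, const glm::vec3& v2 )
{
    return glm::normalize( glm::cross( v1 - v0, v2 - v0 ) );
}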


Problem solved? If you love FLAT shading, then yes. Otherwise, prepare to get shocked. The lighting will be correct, but the smoothing seems to be entirely gone. What happened?! Congratz, you just screwed up the smoothing and found out how flat shading works. Smooth shading basically involves bending/averaging the normals of polygons that share the same vertices.
Your Geometry Shader however just calculated the (correct!) normal for each single triangle. What it should do is smooth the normals with those of neighbor triangles, but... again, that is not possible unless you store & pass additional data for each vertex. By default, a GS has no access to neighbor primitives.

The good old CPU morphing methods didn't just store the altered vertex positions for each keyframe; they also stored the (bent) normals, and interpolated between them. So, why not just take the easy route and do this as well? Make a second texture that contains the normals, in the same fashion as we did with the vertex positions. Oh, and don't forget to smooth the model BEFORE you insert the normals into this texture! Then in the vertex-shader, also sample the 2 (or 3) normals and interpolate them.
float3 frame1Nrm = tex2D( normalTex, frame1TX ).xyz;
float3 frame2Nrm = tex2D( normalTex, frame2TX ).xyz;
float3 vNrm = lerp( frame1Nrm, frame2Nrm, frameDelta );
       vNrm = normalize( vNrm ); // don't forget. You naughty boy.

Big chance you're using normalMapping as well, so you will also need the tangents and maybe biTangents. You could make some more textures, but if you're concerned about having that many textures, you can also give the Geometry-Shader a second chance. Now that the GS receives smoothed normals, it can calculate smoothed (bi)Tangents as well:
TRIANGLE
TRIANGLE_OUT
void main( AttribArray<float3> iPos      : POSITION,
           AttribArray<float3> iTexcoord : TEXCOORD0,
           AttribArray<float3> iNormal   : TEXCOORD1 // Smoothed!
         )
{
 // Just some remapping, lazy code
 float3 vert[3];
  vert[0] = iPos[0];
  vert[1] = iPos[1];
  vert[2] = iPos[2];
 float3 nrm[3];
  nrm[0] = iNormal[0];
  nrm[1] = iNormal[1];
  nrm[2] = iNormal[2];
 float2 tx[3];
  tx[0] = iTexcoord[0].xy;
  tx[1] = iTexcoord[1].xy;
  tx[2] = iTexcoord[2].xy;
 float3  tangent[3];
 float3  biTang[3];
  
 for ( int i=0; i<3; i++)
 {
  /* SORT */
  if ( tx[0].y < tx[1].y )
  {
   float3  tmpV = vert[0];
    vert[0] = vert[1];
    vert[1] = tmpV;
   float2 tmpTX = tx[0];
    tx[0] = tx[1];
    tx[1] = tmpTX;
  }
  if ( tx[0].y < tx[2].y )
  {
   float3  tmpV = vert[0];
    vert[0] = vert[2];
    vert[2] = tmpV;
   float2 tmpTX = tx[0];
    tx[0] = tx[2];
    tx[2] = tmpTX;
  }
  if ( tx[1].y < tx[2].y )
  {
   float3  tmpV = vert[1];
    vert[1] = vert[2];
    vert[2] = tmpV;
   float2 tmpTX = tx[1];
    tx[1] = tx[2];
    tx[2] = tmpTX;
  }  
  
  /* CALCULATE TANGENT */
  float interp;
  if ( abs(tx[2].y - tx[0].y) < 0.0001f )
   interp = 1.f;
  else
   interp = (tx[1].y - tx[0].y) / (tx[2].y - tx[0].y);
   
  float3 vt  = lerp( vert[0], vert[2], interp );
   interp = tx[0].x + (tx[2].x - tx[0].x) * interp;
   vt     -= vert[1];
   
  if (tx[1].x < interp) vt *= -1.f;
  float dt = dot( vt, nrm[i] );
   vt     -= nrm[i] * dt;
   tangent[i] = normalize(vt);
     
   
  /* SORT */  
  if ( tx[0].x < tx[1].x )
  {
   float3  tmpV = vert[0];
    vert[0] = vert[1];
    vert[1] = tmpV;
   float2 tmpTX = tx[0];
    tx[0] = tx[1];
    tx[1] = tmpTX;
  }
  if ( tx[0].x < tx[2].x )
  {
   float3  tmpV = vert[0];
    vert[0] = vert[2];
    vert[2] = tmpV;
   float2 tmpTX = tx[0];
    tx[0] = tx[2];
    tx[2] = tmpTX;
  }
  if ( tx[1].x < tx[2].x )
  {
   float3  tmpV = vert[1];
    vert[1] = vert[2];
    vert[2] = tmpV;
   float2 tmpTX = tx[1];
    tx[1] = tx[2];
    tx[2] = tmpTX;
  }
    
  
  /* CALCULATE BI-TANGENT */
  if ( abs(tx[2].x - tx[0].x) < 0.0001f )
   interp = 1.f;
  else
   interp = (tx[1].x - tx[0].x) / (tx[2].x - tx[0].x);
   
   vt  = lerp( vert[0], vert[2], interp );
   interp = tx[0].y + (tx[2].y - tx[0].y) * interp;
   vt     -= vert[1];
   
  if (tx[1].y < interp) vt *= -1.f;
   dt = dot( vt, nrm[i] );
   vt     -= nrm[i] * dt;
   biTang[i] = normalize(vt);  
 } // for
 
 // Output triangle
 emitVertex( iPos[0] : POSITION, iTexcoord[0] : TEXCOORD0, iNormal[0] : TEXCOORD1,
             tangent[0] : TEXCOORD2, biTang[0] : TEXCOORD3 );
 emitVertex( iPos[1] : POSITION, iTexcoord[1] : TEXCOORD0, iNormal[1] : TEXCOORD1,
             tangent[1] : TEXCOORD2, biTang[1] : TEXCOORD3 );
 emitVertex( iPos[2] : POSITION, iTexcoord[2] : TEXCOORD0, iNormal[2] : TEXCOORD1,
             tangent[2] : TEXCOORD2, biTang[2] : TEXCOORD3 );
} // GP_AnimMorphUpdate

Hard to notice, but another little animation was the oil streaming down. Just a timed fade-in of an oil texture. To make it "stream", the fade-mask moved from top to bottom.

Final tricks
------------------------------
We just made a Morphing solution that uses modern techniques to optimize performance, such as VBO's (allowing us to keep all data stored on the GPU instead of transferring vertex data each time), without bothering the CPU with the interpolation math.

Two more tricks I'd like to explain are having an "influence factor", and updating data inside a VBO. Morphing animations just aren't flexible when it comes to dynamic control. But there is at least one simple trick you can apply: "influence". In the demo movie, you'll see the RadarBlob breathing much faster and more intensely in the last seconds. We didn't make multiple animations though. We just sped up the animation-timer, and increased this mysterious "influence factor". Well, if you took a good look at the code, you already saw how it works: you just do a second interpolation between the original vertex pose and the animated pose.
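For completeness, the CPU-side bookkeeping that feeds the shader its frameNumber, frameDelta and animationInfluence uniforms could look like the sketch below. All names are hypothetical helpers, none of this comes from the actual engine:

// Stand-ins for whatever your shader wrapper provides (hypothetical helpers):
void setUniformInt  ( const char* name, int value );
void setUniformFloat( const char* name, float value );

static float animationTime      = 0.f;
static float animationSpeed     = 1.f;  // raise this to make the blob breathe faster
static float animationInfluence = 1.f;  // 0 = frozen original pose, 1 = full morph
static const int   KEYFRAME_COUNT       = 21;
static const float KEYFRAMES_PER_SECOND = 10.f;

// Advance the animation timer each frame and derive the three uniforms.
void updateMorphUniforms( float deltaSeconds )
{
    animationTime += deltaSeconds * animationSpeed;

    float framePos    = animationTime * KEYFRAMES_PER_SECOND;
    int   frameNumber = (int)framePos % KEYFRAME_COUNT;  // current keyframe index
    float frameDelta  = framePos - (int)framePos;        // 0..1 towards the next keyframe

    setUniformInt  ( "frameNumber",        frameNumber );
    setUniformFloat( "frameDelta",         frameDelta );
    setUniformFloat( "animationInfluence", animationInfluence );
}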

Last but not least, don't forget you can actually store the updated positions/normals in a (second) VBO. In the case of Tower22, this monster gets rendered many times per cycle. Three shadow-casting lamps are on its head, so it will appear in their depthMaps. The water reflection and glossy wall reflections also need this monster. All in all, this guy may get rendered up to 12 times in a single frame. Now the interpolation math isn't that hard, but redoing the tangent recalculation and all the texture fetches for every pass concerned me a bit. So instead of repeating all those steps, I update the monster VBO first, using Vertex-Streaming / Transform-Feedback: first store the morphed vertex positions/normals/tangents for the current time in a secondary VBO, then for all passes just use that 2nd VBO so we don't have to calculate anything anymore. See the link below for some details about this technique:
http://tower22.blogspot.com/2011/08/golden-particle-shower.html
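For those who want to try it themselves, a rough OpenGL-style sketch of such a capture pass is shown below. I'm using the GLSL-flavored transform-feedback API and made-up varying/function names here (with Cg the setup differs a bit), so treat it as an illustration rather than the actual Tower22 code:

#include <GL/glew.h>

// Setup (once): tell the morph program which outputs to capture.
void setupFeedback( GLuint morphProgram, GLuint* feedbackVbo, int triangleCount )
{
    const char* varyings[] = { "outPosition", "outNormal", "outTangent", "outBiTangent" };
    glTransformFeedbackVaryings( morphProgram, 4, varyings, GL_INTERLEAVED_ATTRIBS );
    glLinkProgram( morphProgram ); // must (re)link after setting the varyings

    // The GS emits a de-indexed triangle stream: 3 output vertices per triangle,
    // each carrying 4 float3 attributes.
    glGenBuffers( 1, feedbackVbo );
    glBindBuffer( GL_ARRAY_BUFFER, *feedbackVbo );
    glBufferData( GL_ARRAY_BUFFER, triangleCount * 3 * 4 * 3 * sizeof(float),
                  NULL, GL_DYNAMIC_COPY );
}

// Per frame: run the morph pass once, capture the results, skip rasterization.
void captureMorphedMesh( GLuint feedbackVbo, int indexCount )
{
    glEnable( GL_RASTERIZER_DISCARD );
    glBindBufferBase( GL_TRANSFORM_FEEDBACK_BUFFER, 0, feedbackVbo );
    glBeginTransformFeedback( GL_TRIANGLES );
    glDrawElements( GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0 ); // original morph VBO bound
    glEndTransformFeedback();
    glDisable( GL_RASTERIZER_DISCARD );
    // Shadow maps, reflections, etc. now simply draw from feedbackVbo.
}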

Case closed.
