Over the past year and a half, the SM64 hacking community has made significant progress on getting ROM hacks to work on the original N64 console. macN64 has even created a guide on fixing the textures in some older hacks so that they work on console, which is fantastic. You can see this thread for the current progress on console compatibility.
A major problem we are having now is the unplayable amount of lag that larger levels cause on console. There are many approaches to try and reduce lag, and the one I'm going to introduce in this post is a way to optimize Fast3D texture processing. I will make a simple level with 2 (32x32) textures as an example.
I first draw the grass floor, then the orange rock wall, and finally the top part with the grass texture. Once the level is imported, the Fast3D commands for loading the textures look something like the diagram below.
(TMEM = Texture memory cache)
As you can see, each texture is loaded and bound to the triangles that are drawn after it. However, looking at the diagram above, the game has to switch from Tex1 to Tex2 and then back to Tex1. Why does it do this when we know that all of the grass triangles are going to be drawn anyway? Also, why are we not taking full advantage of TMEM? TMEM holds 4KB, while a 32x32 RGBA16 texture takes only 2KB (32 x 32 texels x 2 bytes), so half of it sits unused. Every time we want to switch textures, we just load the new one from RAM. This is only a small example level, but imagine a huge level from Last Impact or Star Road: the game would have to load textures from RAM into TMEM tens to hundreds of times per frame!
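To put a number on the redundant switching, here is a minimal C sketch that counts how many texture loads a display list needs if it reloads TMEM whenever the texture changes from one triangle to the next. The texture ids, triangle counts and the `count_texture_loads` helper are hypothetical illustrations, not part of any SM64 tool.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical texture ids for the example level. */
enum { TEX_GRASS = 1, TEX_ROCK = 2 };

/*
 * Count the texture loads (one 0xF3 / G_LOADBLOCK each) a naive display list
 * performs: a new load whenever a triangle uses a different texture than the
 * triangle drawn just before it.
 */
static size_t count_texture_loads(const int *tex_id, size_t n_tris)
{
    size_t loads = 0;
    int current = -1; /* -1 means "nothing in TMEM yet" */

    for (size_t i = 0; i < n_tris; i++) {
        assert(tex_id[i] >= 0); /* ids must not collide with the -1 sentinel */
        if (tex_id[i] != current) {
            loads++;
            current = tex_id[i];
        }
    }
    return loads;
}
```

For the example level (grass floor, then rock walls, then the grass top), this draw order yields three loads even though only two textures are used.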
I believe that a large number of 0xF3 (G_LOADBLOCK) commands can cause a large level to lag, because of a delay between loading from RDRAM to TMEM on console. If we reduce the number of times the game has to load textures, then performance should improve to a semi-playable level. At the very least we can reduce the number of Fast3D commands the game has to process, which will be a definite benefit.
Does this really matter?
On console, I believe the answer is yes.
See the two paragraphs above.
On emulators, the answer is no.
The RAM in your modern machine is monumentally faster than anything in a game console from 1996, so moving data from emulated N64 RAM to an emulated N64 TMEM is basically just copying memory around inside your computer.
Group by texture
The order in which a level is rendered depends on how you draw it in SketchUp. The first triangles you draw usually get rendered first (assuming your textures are fully opaque). In the simple level above I drew the floor first, so that is where the game starts rendering. Then it switches to the wall texture, because I drew the walls after the floor. The game then has to load the first texture again, because the top part also uses the grass texture. That is why the game has to load the grass texture twice.
If we want to minimize the number of texture switches, then we need to group all the triangles that use the same texture together. This way each texture only has to be loaded once, and then all the triangles that use it can be drawn. I found a free SketchUp plugin that does this easily. It's called GroupByTexture and was created by Rick Wilson. You can download it here: http://www.smustard.com/script/GroupByTexture
This plugin will explode all the groups in the model, and then regroup all the faces according to their texture. This will cause the exported .obj file to be organized properly according to the textures, and not draw order. I would only recommend using this plugin once your level is finalized, so you don't have to keep ungrouping all the faces you want to change.
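The effect of grouping can be sketched as a stable sort of the triangle list by texture: triangles that share a texture become contiguous, and within each texture the original drawing order is kept. This is a hypothetical C illustration of the idea, not how the GroupByTexture plugin is implemented, and it assumes fully opaque textures, since reordering translucent faces can change how they blend.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical triangle record: only the texture matters for this sketch. */
typedef struct {
    int tex_id;  /* which texture this triangle samples */
    int face_id; /* original draw position, kept to show the sort is stable */
} Triangle;

/*
 * Stable insertion sort by texture id: triangles with the same texture end up
 * next to each other and keep their relative order.
 */
static void group_by_texture(Triangle *tris, size_t n)
{
    assert(tris != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        Triangle t = tris[i];
        size_t j = i;
        /* Only shift strictly greater ids, so equal ids never swap (stable). */
        while (j > 0 && tris[j - 1].tex_id > t.tex_id) {
            tris[j] = tris[j - 1];
            j--;
        }
        tris[j] = t;
    }
}

/* One texture load each time the texture changes between consecutive triangles. */
static size_t count_texture_loads(const Triangle *tris, size_t n)
{
    size_t loads = 0;
    for (size_t i = 0; i < n; i++) {
        if (i == 0 || tris[i].tex_id != tris[i - 1].tex_id) {
            loads++;
        }
    }
    return loads;
}
```

Applied to a draw order of grass, grass, rock, rock, grass, grass, the load count drops from 3 to 2.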
Multi-texture loading
Remember the unused 2KB of TMEM? Since there is enough room, why not use all of it? All we have to do is change the number of texels (texture pixels) loaded by the 0xF3 command so that it covers both textures. Think of it like loading a single 32x64 texture, of which we only see half at a time. This only works if the two textures sit next to each other in RAM, because one 0xF3 load copies a single contiguous block. If we need to switch to the other texture, we can issue a 0xF5 (G_SETTILE) command to change the TMEM offset to read from.
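Here is a hedged C sketch of what this could look like at the level of raw Fast3D command words, assuming two 32x32 RGBA16 textures stored back to back in RAM. The bit layouts follow the `gsDPSetTextureImage`, `gsDPSetTile` and `gsDPLoadBlock` macros from the N64 SDK's gbi.h as I understand them, so double-check them against a real display list; the helper names and the segmented address 0x0E000000 used in the check are hypothetical. Sync commands (0xE6, 0xE7) and 0xF2 (set tile size) are left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* One Fast3D command: two 32-bit words, as stored in a display list. */
typedef struct {
    uint32_t w0, w1;
} Gfx64;

/* Opcodes and constants, using the values from the SDK's gbi.h. */
#define G_SETTIMG       0xFDu
#define G_SETTILE       0xF5u
#define G_LOADBLOCK     0xF3u
#define G_IM_FMT_RGBA   0u
#define G_IM_SIZ_16b    2u
#define G_TX_RENDERTILE 0u
#define G_TX_LOADTILE   7u

/* Hypothetical example: two 32x32 RGBA16 textures stored back to back. */
#define TEX_W          32u
#define TEX_H          32u
#define TEX_BYTES      (TEX_W * TEX_H * 2u) /* 2048 bytes each */
#define TEX_TMEM_WORDS (TEX_BYTES / 8u)     /* TMEM is addressed in 64-bit words */

/* 0xFD: tell the RDP where the texture data lives in RAM. */
static Gfx64 set_timg(uint32_t fmt, uint32_t siz, uint32_t width, uint32_t addr)
{
    assert(width >= 1u && width <= 4096u);
    Gfx64 g = { (G_SETTIMG << 24) | (fmt << 21) | (siz << 19) | (width - 1u), addr };
    return g;
}

/* 0xF5: describe a tile. line and tmem are in 64-bit words; wrap mode, no shift. */
static Gfx64 set_tile(uint32_t fmt, uint32_t siz, uint32_t line, uint32_t tmem,
                      uint32_t tile, uint32_t maskt, uint32_t masks)
{
    assert(line < 512u && tmem < 512u && tile < 8u);
    Gfx64 g = { (G_SETTILE << 24) | (fmt << 21) | (siz << 19) | (line << 9) | tmem,
                (tile << 24) | (maskt << 14) | (masks << 4) };
    return g;
}

/* 0xF3: copy texels 0..lrs into TMEM, starting at the load tile's TMEM address. */
static Gfx64 load_block(uint32_t tile, uint32_t lrs, uint32_t dxt)
{
    assert(tile < 8u && lrs < 4096u && dxt < 4096u);
    Gfx64 g = { G_LOADBLOCK << 24, (tile << 24) | (lrs << 12) | dxt };
    return g;
}

/* Same formula as CALC_DXT in gbi.h: 2048 / (64-bit words per row), rounded up. */
static uint32_t calc_dxt(uint32_t width, uint32_t bytes_per_texel)
{
    uint32_t words = width * bytes_per_texel / 8u;
    if (words == 0u) words = 1u;
    return (2048u + words - 1u) / words;
}

/* One load for both textures: 2 * 32 * 32 = 2048 texels, so lrs = 2047. */
static Gfx64 load_both_textures(void)
{
    return load_block(G_TX_LOADTILE, 2u * TEX_W * TEX_H - 1u, calc_dxt(TEX_W, 2u));
}

/* Switching textures is just a new render tile pointing at another TMEM offset. */
static Gfx64 render_tile_for(uint32_t which)
{
    /* line = 32 texels * 2 bytes / 8 = 8 words; mask 5 means 2^5 = 32 texels. */
    return set_tile(G_IM_FMT_RGBA, G_IM_SIZ_16b, TEX_W * 2u / 8u,
                    which * TEX_TMEM_WORDS, G_TX_RENDERTILE, 5u, 5u);
}
```

In a display list the order would be: `set_timg` with the packed block's address, a load-tile `set_tile` at TMEM word 0, `load_both_textures()`, then `render_tile_for(0)` before the grass triangles and `render_tile_for(1)` before the rock triangles. Only the last step repeats when switching textures.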
Using both the group-by-texture and multi-texture loading techniques, we can reduce the work it takes to set up textures. Compare the diagram below to the previous one above: we went from 3 load-block commands down to just 1, and the number of Fast3D commands dropped from 21 to just 8.
So we can only load two (32x32) textures at a time? Big deal.
It's true that RGBA textures are expensive in terms of data size, but think about the 4-bit & 8-bit texture formats. With the 8-bit texture formats, I8 and IA8, you can load up 4 (32x32) textures at a time. If you can somehow get away with using 4-bit textures like I4 and IA4, then you can load up 8 (32x32) textures at a time. CI textures are a little different as half of the TMEM is reserved for the color palettes, so only 2 (32x32) CI8 textures and only 4 (32x32) CI4 textures can fit inside the TMEM.
The actual number of textures you can load varies with the resolution and bit depth of each image. As long as the data doesn't exceed 4096 bytes (2048 bytes for CI textures, since the palettes take the other half), you can load it into TMEM.
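The capacity arithmetic above can be captured in a small C helper. `textures_that_fit` is a hypothetical name, and the sketch ignores row padding by requiring each texture row to fill whole 64-bit TMEM words (true for the 32-texel-wide textures used in this post).

```c
#include <assert.h>
#include <stdbool.h>

#define TMEM_BYTES 4096u

/*
 * How many textures of one size fit in TMEM at once. CI textures keep their
 * palettes in the upper half of TMEM, leaving 2048 bytes for texel data.
 */
static unsigned textures_that_fit(unsigned width, unsigned height,
                                  unsigned bits_per_texel, bool is_ci)
{
    assert(width > 0u && height > 0u);
    assert(bits_per_texel == 4u || bits_per_texel == 8u || bits_per_texel == 16u);
    assert((width * bits_per_texel) % 64u == 0u); /* rows fill whole 64-bit words */

    unsigned bytes = width * height * bits_per_texel / 8u;
    unsigned budget = is_ci ? TMEM_BYTES / 2u : TMEM_BYTES;
    return budget / bytes;
}
```

For 32x32 textures this gives 2 for RGBA16, 4 for I8/IA8, 8 for I4/IA4, 2 for CI8 and 4 for CI4, matching the numbers above.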
Testing on console hardware. (Does this actually work?)
Yes, but more testing is needed.
I did a quick test on my flash cartridge before writing this post, and the optimized Fast3D code above does seem to work. I cannot tell you how this affects performance yet, so take everything I say with a grain of salt. I'll try to do more testing once my finals are over in less than two weeks.
If you have any questions or updates on the post, then please leave a reply below and I'll try to respond as soon as I can.