Heretic II: Progress Report


06-January-2018: Progress update by Hans-Joerg Frieden



Work in Progress

Work is still ongoing in four main areas: the hardware renderer, the save-game code, memory consumption, and the sound system, which is undergoing a slight overhaul to make it compatible with sound cards.


Hardware Renderer (Hans-Joerg Frieden)

We are currently implementing a subset of OpenGL as a separate library. Most of this work is already finished. What remains is the automatic texture coordinate generation required for the reflection shrines, a few more reductions in memory requirements, and a few optimizations in the core pipeline.

We considered alternatives to a custom implementation, such as the Mesa library, but decided to go this way for several reasons. The main reason is speed: we don't need all the features of OpenGL, and supporting them would only slow down the result. A custom OpenGL library also allowed us to build in a few shortcuts and extra functions that gain further speed. For example, the Heretic II renderer uses a glOrtho call to set up a one-to-one pixel mapping for drawing things like the console characters. For such cases we added a few new primitives that do not go through the usual OpenGL pipeline, which lets us skip the complete rendering pipeline, including clipping.
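To make the "one-to-one pixel mapping" concrete, here is a minimal sketch of the arithmetic behind an orthographic projection followed by the viewport transform. It assumes the projection is set up as glOrtho(0, width, height, 0, ...) with a viewport of the same size, which is a common 2D setup but is not confirmed by this report; the helper name ortho_to_window is hypothetical.

```c
#include <assert.h>

/* Map one coordinate through an orthographic projection and the viewport
 * transform. 'lo' is the value mapped to NDC -1 (left or bottom), 'hi' the
 * value mapped to NDC +1 (right or top), 'extent' the viewport size in
 * pixels along that axis. Hypothetical helper, for illustration only. */
double ortho_to_window(double v, double lo, double hi, double extent)
{
    assert(hi != lo);
    /* Orthographic projection row: scale and translate into [-1, +1]. */
    double ndc = 2.0 * v / (hi - lo) - (hi + lo) / (hi - lo);
    /* Viewport transform with origin 0: [-1, +1] -> [0, extent]. */
    return (ndc + 1.0) * extent / 2.0;
}
```

With left = 0 and right = 640 on a 640-pixel-wide viewport, x = 100 lands on window column 100. With bottom = 480 and top = 0, y = 50 lands 50 pixels below the top edge (window y = 430 counted from the bottom), so 2D coordinates can be given directly in pixels.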

Additionally, the 3D graphics chips currently available on the Amiga do not support all the required blending modes. Most notably, the OpenGL blending mode (GL_ONE, GL_ONE) is sorely missing. We had to build a workaround, which was not easy, since this blending mode does not require an alpha channel but any workaround does. Most textures that are additively blended don't have one, so I had to add a stage where texture alpha is added on the fly when needed. The result, however, looks reasonably good, and virtually no speed is lost.
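The report does not say how the alpha channel is generated or which blend mode replaces (GL_ONE, GL_ONE). As a rough illustration only, the sketch below assumes the hardware offers the standard (GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA) mode, derives alpha from the brightest color channel, and rescales the color so that color times alpha reproduces the original texel. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b; } RGB;      /* texel without alpha */
typedef struct { uint8_t r, g, b, a; } RGBA;  /* texel with synthesized alpha */

static uint8_t max3(uint8_t x, uint8_t y, uint8_t z)
{
    uint8_t m = x > y ? x : y;
    return m > z ? m : z;
}

/* ASSUMPTION: alpha = brightest channel; color is rescaled so that
 * color * alpha / 255 gives back (approximately) the original texel. */
RGBA add_alpha_for_additive(RGB in)
{
    RGBA out = {0, 0, 0, 0};
    uint8_t a = max3(in.r, in.g, in.b);
    if (a == 0)
        return out;  /* black adds nothing, so make it fully transparent */
    out.r = (uint8_t)((in.r * 255u) / a);
    out.g = (uint8_t)((in.g * 255u) / a);
    out.b = (uint8_t)((in.b * 255u) / a);
    out.a = a;
    return out;
}

/* Reference for one channel of a (SRC_ALPHA, ONE_MINUS_SRC_ALPHA) blend. */
uint8_t blend_channel(uint8_t src, uint8_t alpha, uint8_t dst)
{
    assert(alpha <= 255);
    return (uint8_t)((src * alpha + dst * (255u - alpha)) / 255u);
}
```

Under these assumptions the blend yields src + dst * (1 - alpha) instead of src + dst. That matches true additive blending over a black background and comes out somewhat darker over bright backgrounds, so it is an approximation rather than an exact replacement.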

Finally, complete control over our MiniGL implementation allows us to add CPU-specific optimizations in the future. These include special routines for the AltiVec unit of the G4 processor that is supposed to be on the new Phase 5 boards (WarpUP support for these would be a prerequisite, though). Anything that the G3 boards by Metabox and Escena might bring with them can also be supported easily. The use of Warp3D as the underlying hardware rasterizer allows us to support additional features like S3TC or geometry setup when these become available.

Work is still being done on the hardware renderer, but we expect to have it fully working real soon now. The results are already looking excellent (see screenshots).

We expect the final game to run at a steady 25-30 FPS on a PPC 200 with a Permedia 2 graphics card at a resolution of 640x480. This limit is imposed by the fill rate of the Permedia 2 chip: at 320x240 the frame rate goes up to about 45 FPS. It may rise to around 60 FPS when the screen is resized to the minimum, which is a strong indication that the fill rate of the chip is actually the limit.

On the CyberVision64/3D, there is an additional problem caused by the limited accuracy of the Z buffer, which is only 16 bits wide. The dual-pass rendering introduces artifacts that are caused by the difference in resolution between the level textures and the light maps, causing the pixels of the wall textures to "bleed" through the light maps. This problem is solved by adding a small offset to the Z coordinate of the light maps that compensates for the inaccuracy of the Z buffer. It is, however, expected that the ViRGE chip on the CyberVision64/3D will be too slow to run Heretic II comfortably, even at the lowest resolution of 320x240. Work is being done to improve the performance, but the outcome is still doubtful.


Saving Code

The save-game code proved to be a very complicated matter due to the way the original Windows version handled this. On Windows, a program is started in a virtual address space that always starts at address zero. As a result, each time a program is loaded, the code always ends up at the same (virtual) address.

On the Amiga, this is done differently. An Amiga executable stores additional relocation information that allows the system's run-time loader to relocate the program to any address. There is no virtual addressing involved. One reason for this is that the Amiga's messaging system is entirely based on shared memory, which makes interprocess communication very efficient and is one of the reasons for the Amiga's responsiveness.

On the downside, this also means that functions will almost never end up at the same address. The direct consequence is that it is impossible to simply store the function pointers in the save game file: they would be totally different the next time the program is loaded.

The only possibility is to identify each function pointer that is used and write a unique identification number into the file instead. Upon loading, this number is converted back into the function's current address.

The main work has been to identify those function pointers that are actually used during saving and to assign each of them a unique number. There were also a few other issues that had to be addressed (scripts, for example). Saving now works, and while it has not been extensively tested, it should pose no further problems.
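The pointer-to-number scheme described above can be sketched roughly as follows. This is a minimal illustration, not the game's actual code: the callback type, the two example functions, and the table layout are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*think_fn)(void *self);  /* hypothetical callback type */

/* Two stand-in functions whose addresses differ from run to run. */
static void monster_think(void *self) { (void)self; }
static void door_think(void *self)    { (void)self; }

/* The index in this table is the number written to the save file.
 * Entries must keep their position, or old save files stop working. */
static const think_fn fn_table[] = { NULL, monster_think, door_think };
#define FN_COUNT (sizeof fn_table / sizeof fn_table[0])

/* Saving: translate a pointer into its stable id, or -1 if unknown. */
int fn_to_id(think_fn fn)
{
    for (size_t i = 0; i < FN_COUNT; i++)
        if (fn_table[i] == fn)
            return (int)i;
    return -1;
}

/* Loading: translate the id back into the function's current address. */
think_fn id_to_fn(int id)
{
    assert(id >= 0 && (size_t)id < FN_COUNT);
    return fn_table[id];
}
```

When saving, each function pointer field is passed through fn_to_id and the resulting integer is written out. When loading, id_to_fn restores a pointer that is valid wherever the loader happened to place the code this time. A return value of -1 is a useful signal that a function pointer was missed during the identification work.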


Memory consumption (Peter Annuss)

Virtual memory on the Amiga is only available on the 68k side. The dual-CPU architecture makes virtual memory difficult to handle on both CPUs: the memory management units would need extensive synchronisation, because both CPUs also communicate via shared memory buffers. This communication is further complicated by the fact that, due to the heterogeneous nature of the system, a cache coherency protocol is next to impossible, so a cache flush is required every time program control is handed over to the other CPU.

Our initial intent was to make the game run in 32 Megabytes of memory. It turned out that this is impossible to achieve, due to the complexity of the levels and model data (for example, one instance of the Corvus model requires approximately 3.9 megabytes of memory).

Extensive work has been done on reducing the memory requirements. This included rewriting some of the memory management code, which previously assumed virtual memory, to use a more efficient means of allocating and handling memory. The new system allocates smaller hunks of memory and keeps track of them in arrays that are automatically extended if more memory is required.
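A minimal sketch of that idea, assuming plain malloc/realloc underneath and a tracking array that doubles when full. The HunkList type, the function names, and the initial capacity of 4 are illustrative choices, not taken from the game's source.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    void  **hunks;     /* tracking array: one pointer per allocated hunk */
    size_t  count;     /* number of hunks currently tracked */
    size_t  capacity;  /* number of slots in the tracking array */
} HunkList;

/* Allocate one small hunk and record it. Returns NULL on failure. */
void *hunk_alloc(HunkList *list, size_t size)
{
    assert(list != NULL);
    if (list->count == list->capacity) {
        /* Tracking array is full: extend it automatically. */
        size_t new_cap = list->capacity ? list->capacity * 2 : 4;
        void **grown = realloc(list->hunks, new_cap * sizeof *grown);
        if (grown == NULL)
            return NULL;
        list->hunks = grown;
        list->capacity = new_cap;
    }
    void *p = malloc(size);
    if (p == NULL)
        return NULL;
    list->hunks[list->count++] = p;
    return p;
}

/* Release every tracked hunk, for example when a level is unloaded. */
void hunk_free_all(HunkList *list)
{
    assert(list != NULL);
    for (size_t i = 0; i < list->count; i++)
        free(list->hunks[i]);
    free(list->hunks);
    list->hunks = NULL;
    list->count = 0;
    list->capacity = 0;
}
```

Compared with reserving one large block up front, which is cheap when virtual memory backs it, this approach only ever commits physical memory that is actually used, at the cost of a small amount of bookkeeping.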

Furthermore, some of the static arrays have been replaced by dynamically allocated versions that pre-calculate the required size, and textures and sounds may now be forced out of memory and reloaded later.

In total, we have managed to get Heretic II to run comfortably in 64 Megabytes of physical memory. This may still require shutting down Workbench, though, if large backdrop images or many additional background programs and commodities are used.


Sound (Christian Sauer)

The sound system currently uses the audio device for sound output. This works well on systems with no sound card. An AHI version of the soundcode is being finalised. All in all, the sound system is more or less complete.


Additional improvements (Peter Annuss)

One of the obvious requirements for an action-packed game like Heretic II is high frame rates. This is especially true for network games. Therefore, a few speed improvements were made to the software renderer.

One of the major weaknesses of current Amiga technology is the rather slow graphics bus. The standard Zorro III bus imposes a limit of around 10-12 Megabytes per second. The CyberVisionPPC raises this limit to around 18 MB/s, but this is still a good deal slower than modern PC hardware.

Usually, the software renderer draws into a memory buffer, which is then copied into graphics RAM. This is done because the transparency effects require the framebuffer pixel to be read back and blended with the incoming pixel (this is commonly called "alpha blending" because the pixel's red, green and blue values are usually weighted with a fourth value called the alpha value). Rendering directly over the graphics bus would slow things down, since these reads would also have to go through the slower bus.

However, the copy loop from system to graphics memory also takes a certain amount of time. Tests have shown that direct writes to graphics memory are almost always faster than writing to a back buffer and copying, but only if overdraw and transparency in a frame are limited.

Therefore, the software renderer measures the amount of overdraw/pixel read-backs during scene traversal, and before it starts the real rendering it decides what method would be faster based on the information gathered during the building of the polygon list. It then draws directly into video ram if it thinks this would be faster.
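The decision can be pictured as a simple cost comparison. The sketch below is only an illustration of the idea: the statistics structure, the function name, and especially the relative cost constants are hypothetical and would have to be measured on real hardware.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t screen_pixels;    /* width * height of the visible frame */
    size_t pixels_to_write;  /* pixels all polygons will write (incl. overdraw) */
    size_t pixels_to_read;   /* pixels needing a framebuffer read-back */
} FrameStats;

typedef enum { RENDER_BACKBUFFER, RENDER_DIRECT } RenderPath;

/* ASSUMPTION: made-up relative costs per pixel access. */
enum {
    COST_SYSTEM_RAM = 1,  /* read or write in system memory */
    COST_BUS_WRITE  = 4,  /* write across the graphics bus */
    COST_BUS_READ   = 8   /* read across the graphics bus */
};

RenderPath choose_render_path(const FrameStats *s)
{
    assert(s != NULL);
    /* Back buffer: all drawing in system RAM, then one full copy over the bus. */
    size_t back = (s->pixels_to_write + s->pixels_to_read) * COST_SYSTEM_RAM
                + s->screen_pixels * COST_BUS_WRITE;
    /* Direct: every write and every read-back crosses the bus. */
    size_t direct = s->pixels_to_write * COST_BUS_WRITE
                  + s->pixels_to_read * COST_BUS_READ;
    return direct < back ? RENDER_DIRECT : RENDER_BACKBUFFER;
}
```

With these example constants, a 320x240 frame with almost no overdraw and no transparency is drawn directly, while a frame with threefold overdraw and some blended pixels falls back to the back buffer.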

Time-Refresh tests have shown that this indeed gives speed gains, sometimes as dramatic as 10 FPS or more.


68K version (Christian Sauer, Hans-Joerg Frieden)

A 68060 version is still not totally ruled out, although it looks like the 68k line of CPUs will be too weak to cope with the computationally expensive Heretic II engine. A heavily (assembly) optimised version of the software renderer proved unplayable even on an overclocked 060/66.

Frame rate tests with the same overclocked 060 using the Permedia 2 board were quite impressive though, with peak values of 20 FPS (regardless of the resolution). However, the frame rate can drop to as low as 5 FPS, which is no longer really playable. Optimizing the MiniGL library might be a solution, and we already have some ideas about how this can be done (in fact, there are a few shortcuts in the pipeline that have been disabled during the debugging phase). It is, however, unclear whether this will have a serious impact on the minimum frame rate.

Additional improvements (Michael Siegel)

The Smacker video codec port has been done by Michael Siegel. The port is finished, though Michael is still working on finalizing a few points. Smacker will work on 68k and PowerPC.


