In the words of Dr. Nick, “Hello everybody!”…
I suppose I’d better start by introducing myself. My name is Karl Churchill, whom you may know as Karlos from one or more of the various Amiga forums. I’ve been an Amiga enthusiast since I first had a go on a friend’s A500 back in 1988. I didn’t actually own one until 1992, but from then on I’ve never been without one. I’ve been asked to write a few words about my involvement with OS4.1 for the Classic. Well, there’s the short version and the long version. The short version is that I’ve been working as a contributor to add Warp3D support for the Permedia2. Work began in October last year as an evening project.
The version of the driver in the repository had not been touched in the best part of a decade and, whilst it compiled, that was all it did. Any attempt to launch a 3D application would simply freeze the machine. The first challenge was to get it to a point where it would at least allow an application to start, even if it didn’t render anything. Working on low-level code is not without its complications. Thankfully, the kernel provides a debug printing service, but I don’t have a serial line debugger (the serial port on that machine is not entirely reliable), so every time there was a crash, I’d have to reboot and run DumpDebugBuffer to see what had blown up. Progress was slow at first, but eventually I was rewarded with a blank screen and not a DSI. Over the following months I got all the basic drawing routines up and running, starting with the V1-V3 API calls. Along the way, I found some amusing undocumented bugs in the Permedia2 that caused a lot of head scratching, and finally I reached the point where I could implement the V4 API. This is where everything changes, and it’s a good time to start the “long” version of the story… as if this one wasn’t long enough already!
Back in 2001 or so, I was developing my own applications that used Warp3D as an “advanced rasterizer” for 2D. If you’ve ever used RTG as a developer on OS3.x, you’ll doubtless have been as dismayed as I was by the extent of its hardware acceleration. The functions provided by Warp3D were much more capable and efficient for 2D graphics work. Then Warp3D 4 was released, introducing its vertex array functions. For what I was doing at the time, these were perfect. I refactored my code to use them, only to discover that various chunks of the advertised API simply weren’t implemented by the driver, especially line and point rendering.
I badgered Hyperion about this at the time. They were busy with other things, but responded by giving me access to the source. It’s fair to say that after a few months, I had the only driver that implemented every drawing operation the API specified. The original V4 functions used templates (i.e. C macros) to generate a drawing function for each possible combination of primitive and supported vertex format. Conceptually, this is the most efficient way of doing it, but in reality it produces a lot of code, most of which is redundant and not cache friendly. After adding the extra primitives, the driver had reached an unprecedented size (well over 1MB), which was clearly too big. The only thing that actually varied from one format to the next was the way in which the vertex data was fetched. So, I scrapped the existing code and started a new version that had a drawing routine for each primitive but used a function pointer to invoke a fetcher specific to the format being used. The code was well within an acceptable size again and ran considerably faster, despite having to make up to 3 indirect function calls per vertex (one for geometry, one for colour and one for texture/fog).
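To make the idea concrete, here is a minimal C sketch of a per-primitive drawing routine driven by per-format fetchers. It assumes a simplified vertex layout, and every name in it (Vertex, Fetcher, AttribSource, VertexFormat, draw_triangles, draw_lines and the fetch_* helpers) is hypothetical rather than taken from the actual driver or the Warp3D API. Instead of writing to the hardware FIFO, the routines copy fetched vertices into an output array so the behaviour can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical unified vertex that every fetcher fills in. */
typedef struct {
    float x, y, z;      /* geometry */
    float r, g, b, a;   /* colour */
    float u, v;         /* texture coordinates */
} Vertex;

/* A fetcher reads element `index` of one attribute array into `out`. */
typedef void (*Fetcher)(const void *array, size_t stride, size_t index, Vertex *out);

typedef struct {
    const void *array;  /* application's attribute array */
    size_t stride;      /* bytes between consecutive elements */
    Fetcher fetch;      /* format-specific reader for this attribute */
} AttribSource;

/* The "vertex format" is just one fetcher per attribute. */
typedef struct {
    AttribSource geometry, colour, texcoord;
} VertexFormat;

/* Example fetchers, one per supported element format. */
static void fetch_xyz_float(const void *array, size_t stride, size_t index, Vertex *out)
{
    const float *p = (const float *)((const char *)array + index * stride);
    out->x = p[0]; out->y = p[1]; out->z = p[2];
}

static void fetch_rgba_ubyte(const void *array, size_t stride, size_t index, Vertex *out)
{
    const unsigned char *p = (const unsigned char *)array + index * stride;
    out->r = p[0] / 255.0f; out->g = p[1] / 255.0f;
    out->b = p[2] / 255.0f; out->a = p[3] / 255.0f;
}

static void fetch_uv_float(const void *array, size_t stride, size_t index, Vertex *out)
{
    const float *p = (const float *)((const char *)array + index * stride);
    out->u = p[0]; out->v = p[1];
}

/* Up to three indirect calls per vertex: geometry, colour, texture. */
static void fetch_vertex(const VertexFormat *fmt, size_t i, Vertex *v)
{
    fmt->geometry.fetch(fmt->geometry.array, fmt->geometry.stride, i, v);
    fmt->colour.fetch(fmt->colour.array, fmt->colour.stride, i, v);
    fmt->texcoord.fetch(fmt->texcoord.array, fmt->texcoord.stride, i, v);
}

/* One routine per primitive, shared by every vertex format.
   Returns the number of primitives produced. */
static size_t draw_triangles(const VertexFormat *fmt, size_t first, size_t count, Vertex *out)
{
    assert(count % 3 == 0);
    for (size_t i = 0; i < count; i++)
        fetch_vertex(fmt, first + i, &out[i]);  /* a real driver would feed the FIFO here */
    return count / 3;
}

static size_t draw_lines(const VertexFormat *fmt, size_t first, size_t count, Vertex *out)
{
    assert(count % 2 == 0);
    for (size_t i = 0; i < count; i++)
        fetch_vertex(fmt, first + i, &out[i]);
    return count / 2;
}
```

Supporting a new vertex format then means writing a few small fetchers rather than instantiating another copy of every drawing routine, which is where the code size saving comes from.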
I went on to fine-tune this implementation by adding versions of the fetchers that were specific not only to the vertex format but also to the currently selected states. This moves a lot of state-dependent conditional logic out of the fetch routines, and as they represent the innermost level of the code, that’s always a good thing. Eventually, some of the V4 routines were over 2x faster than the original version, depending on the vertex format and state. Unfortunately, this driver never found its way into the wild, mostly because I never considered it finished due to the constant need to tweak and improve it; there was always some new experiment to try. DMA FIFO transport is still “the one that got away”, but I’ve promised to behave and actually release versions this time. The vertex fetch idea caught on, though, and became the standard method used for all later drivers, so at least the work wasn’t entirely wasted.

I was later asked if I’d be interested in making an OS4.0 version of it. Seeing as this meant I’d get to play with something new on my A1200 (the then upcoming OS4.0 Classic), I naturally agreed and, having got the beta version onto my machine, started work on it. Classic 4.0 was very much in beta at that time and I ran into a lot of issues just running it; developing with it was even trickier.

Then life took a series of increasingly difficult turns, which ultimately left me out of the scene for many years. OS4.0 for the Classic came and went in the meantime. I had thought things had hit rock bottom when a close uncle passed away suddenly, but they reached a new low in 2007 when my mother was diagnosed with a life-limiting illness. She passed away just two short years later, in the spring of 2009. She was not to be the last. By the end of that year, I felt I needed to get back into my hobbies; doing something, anything, would be therapeutic.
So I bought OS4.1 for the A1 I had inherited, which had been sitting unused under my desk for about four years, to see how things had moved on since my early experiences with the 4.0 beta for the Classic. Quite a bit, as it happens. As I got back into it, I got hold of the SDK and started playing about with some code. And that’s where the long and the even longer versions of the story recombine. This time I have a nice stable machine to do the actual compilation on, which has helped considerably. The present driver for 4.1 incorporates all the fundamental changes of the unreleased V4 driver and then some; even the old V1-V3 drawing routines now use a set of vertex fetchers, all designed around W3D_Vertex but optimised for each combination of states that affects their operation. The FIFO code has been completely rewritten, as has the state handling. Finally, there is as much support for the V5 API as is feasible. The unique challenges this time around are appeasing some old applications that did naughty things (like poking the context directly instead of making the proper API calls) without sacrificing too much performance, and tracking down some really persistent bugs. The latter has meant working through the main library and even the RTG driver. It’s a frustrating task a lot of the time; you think you’ve nailed a bug, only to discover that the “it works for me” phenomenon is genuinely real and the same bug is still affecting other users. So, back to the drawing board. Still, seeing Quake3 run on my 20-year-old A1200 was fun enough to make it all worthwhile.
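The state-specialised fetchers can be illustrated with another minimal C sketch. It assumes just two on/off states (Gouraud shading and texture mapping) and a simplified vertex type; SimpleVertex, HwVertex, select_fetcher, submit and the STATE_* flags are all hypothetical and are not W3D_Vertex or any real Warp3D state. The fetcher is picked once from a table indexed by the state bits, so the per-vertex loop contains no state tests at all.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for an application vertex. */
typedef struct { float x, y, z; float r, g, b, a; float u, v; } SimpleVertex;

/* Hypothetical layout of what would be sent to the hardware. */
typedef struct { float x, y, z; float r, g, b, a; float u, v; } HwVertex;

typedef void (*StateFetcher)(const SimpleVertex *in, HwVertex *out);

/* Hypothetical state bits that change what a fetcher must copy. */
enum { STATE_GOURAUD = 1u << 0, STATE_TEXMAP = 1u << 1 };

static void copy_pos(const SimpleVertex *in, HwVertex *out) { out->x = in->x; out->y = in->y; out->z = in->z; }
static void copy_col(const SimpleVertex *in, HwVertex *out) { out->r = in->r; out->g = in->g; out->b = in->b; out->a = in->a; }
static void copy_uv(const SimpleVertex *in, HwVertex *out)  { out->u = in->u; out->v = in->v; }

/* One fetcher per state combination: each copies only what that state needs. */
static void fetch_flat(const SimpleVertex *in, HwVertex *out)         { copy_pos(in, out); }
static void fetch_gouraud(const SimpleVertex *in, HwVertex *out)      { copy_pos(in, out); copy_col(in, out); }
static void fetch_flat_tex(const SimpleVertex *in, HwVertex *out)     { copy_pos(in, out); copy_uv(in, out); }
static void fetch_gouraud_tex(const SimpleVertex *in, HwVertex *out)  { copy_pos(in, out); copy_col(in, out); copy_uv(in, out); }

/* Table indexed directly by the two state bits. */
static const StateFetcher fetchers[4] = {
    fetch_flat,       /* neither bit set */
    fetch_gouraud,    /* STATE_GOURAUD */
    fetch_flat_tex,   /* STATE_TEXMAP */
    fetch_gouraud_tex /* STATE_GOURAUD | STATE_TEXMAP */
};

static StateFetcher select_fetcher(unsigned state)
{
    return fetchers[state & (STATE_GOURAUD | STATE_TEXMAP)];
}

/* Chooses the fetcher once, outside the loop; returns vertices written. */
static size_t submit(const SimpleVertex *in, size_t n, unsigned state, HwVertex *out)
{
    StateFetcher fetch = select_fetcher(state);
    for (size_t i = 0; i < n; i++)
        fetch(&in[i], &out[i]);  /* no per-vertex state checks */
    return n;
}
```

In a real driver the selection would more likely happen when the state changes rather than on every call, but either way the branching moves out of the innermost loop.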
So, the Permedia2 driver is still in development and won’t be considered final until the various bugs are ironed out. In a case of history repeating itself, I also noticed that the R200 driver doesn’t draw everything advertised in the API either. I guess I’ll have to look into that one next… 😉