At the request of Bod of Stax, here is a translation of an old article in which I tried to describe the rotozoomer found in the Fantasia demo, from Dune and Sector One. The original was in French and can be found here.
Once upon a time, a coder whose scene name was Oxbab was still a member of Diamond Design. The crew did some joint ventures with Oxygene before joining Oxygene to release Amiga AGA and then PC demos.
Before doing anything with Oxygene, Diamond Design released the Brace. The demo was distributed as a bunch of effects run from the AUTO folder: quite simple and efficient. It features a nice 1 bitplane rotozoomer.
While coding the Fantasia, I watched that demo and got curious about the rotozoomer. Oh, surprise, Oxbab had left the debugging symbols in! So I based the Fantasia rotozoomer upon Oxbab's code. You may call me a ripper, but please read his kind comment on the original article. Much of the code has changed, as it is now a 4 bitplane rotozoomer, but some parts and ideas of it remain. Simply look at the uppercase instructions in the supplied source code.
This article will not be about the rotozoomer itself, which in the end is a "simple" offset effect. For more information about rotozoomers, click here. Instead we'll focus on the chunky to planar (c2p) routine used in the effect, written of course in pure 68000. The article will also try to go into more detail than the French one.
First, the 4x4 c2p routine uses a 512 KB table in order to be optimal. The c2p is based around this macro:
rotate  macro
        moveq   #0,d0
        move.b  "Du"(a0),d0
        or.b    "ne"(a1),d0
        add.w   d0,d0           ; align on word
        move.w  (a6,d0.w),d0    ; use a table to shift 8 bits, we win 4 cycles upon lsl
        move.b  "Du"(a0),d0
        or.b    "ne"(a1),d0
        lsl.l   d2,d0           ; #3, align for 8 bytes
        move.l  a2,a3
        add.l   d0,a3
        move.l  (a3)+,(a4)+
        move.l  (a3),(a4)+
        endm
Note: the original source code has French comments; I just translated them here.
As you may have guessed, the "Du" and "ne" offsets are patched by self-modifying code: they represent the displacements in the texture. This macro is repeated inside a small loop (see label _rotate_me_the_face).
So, before going further into this macro, let's start with the table it uses. Open the source file and jump to the init_zoomer label.
This part generates the 512 KB table. The goal here is to generate the planar representation of every combination for a group of 4 chunky pixels in 16 colors. That makes 4 bits per pixel * 4 pixels, so a group fits in 1 word. This means 65536 different combinations, each one stored as a pair of longwords, that is 8 bytes per entry (65536 * 8 bytes = 512 KB).
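To make the table layout concrete, here is a minimal Python sketch of what such an init routine could compute. It is a model, not a translation of init_zoomer: the function names are hypothetical, and it assumes the leftmost pixel sits in the most significant nibble of the group and that each entry holds the four bitplane words in ST interleaved order (plane 0 first). Each chunky pixel is 4 screen pixels wide, so 4 chunky pixels fill the 16 bits of every plane word.

```python
def planar_entry(group):
    """Return the 8-byte planar form of 4 chunky pixels packed in a 16-bit word."""
    planes = [0, 0, 0, 0]
    for i in range(4):                         # i = 0: leftmost pixel (assumed order)
        shift = 12 - 4 * i
        colour = (group >> shift) & 0xF        # 4-bit colour index of this pixel
        for plane in range(4):
            if colour & (1 << plane):
                planes[plane] |= 0xF << shift  # pixel is 4 screen pixels wide
    # Four big-endian words: plane 0, 1, 2, 3 (assumed interleaved layout)
    return b"".join(p.to_bytes(2, "big") for p in planes)


def build_table():
    """Concatenate the entries of all 65536 possible groups."""
    return b"".join(planar_entry(g) for g in range(65536))
```

With this layout, the entry for group g starts at byte offset g * 8, which is why the macro later shifts its index left by 3 bits.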
Let's take a further look at the c2p rout. Let's start with the first instructions :
        moveq   #0,d0
        move.b  "Du"(a0),d0
        or.b    "ne"(a1),d0
First, we clear d0, because the high word might contain junk from a previous iteration. That's 4 cycles.
Then the first 4-bit chunky pixel is fetched from the chunky picture, pointed to by a0, and put in d0. Next, the second 4-bit chunky pixel is ORed in; this one comes already shifted 4 bits to the left, from a separate structure pointed to by a1. Those instructions take 12 cycles each.
Then we use a table in order to shift the resulting byte 8 bits to the left:
        add.w   d0,d0           ; align on word
        move.w  (a6,d0.w),d0    ; use a table to shift 8 bits, we win 4 cycles upon lsl
The pair takes 4 + 14 = 18 cycles, which is 4 cycles faster than the 22 cycles of lsl.w #8,d0. This makes me think that the 68000 lacks a swap.w instruction that would swap the high and low bytes of a word.
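As an illustration, the lookup can be modelled in a few lines of Python. This is only a sketch under assumptions: the table pointed to by a6 is taken to be 256 big-endian words where entry b holds b << 8, and the names SHIFT8 and shift8 are hypothetical.

```python
# Assumed layout of the table pointed to by a6: entry b holds b << 8.
SHIFT8 = b"".join((b << 8).to_bytes(2, "big") for b in range(256))


def shift8(byte):
    """Model of add.w d0,d0 followed by move.w (a6,d0.w),d0."""
    offset = byte + byte                       # add.w d0,d0: index -> byte offset
    return int.from_bytes(SHIFT8[offset:offset + 2], "big")  # word read from table
```

The result has a zero low byte, so the next move.b / or.b pair can drop two more pixels into it without disturbing the first two.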
Then, same thing again, we fetch the chunky pixels in 24 cycles :
        move.b  "Du"(a0),d0
        or.b    "ne"(a1),d0
Then we have the final 5 instructions :
        lsl.l   d2,d0           ; #3, align for 8 bytes
        move.l  a2,a3
        add.l   d0,a3
        move.l  (a3)+,(a4)+
        move.l  (a3),(a4)+
The first lsl shifts d0 left by 3 bits (d2 holds 3) in order to align our displacement on an 8-byte boundary. Remember the init routine that generates a pair of longwords per entry? This gives the displacement in the c2p table. Then, as the 68000 has no scaled index mode such as (ax,dx.l*8), the pointer to the planar entry is obtained by hand: the table base in a2 is copied to a3 and the offset is added to it. Finally we simply copy the planar data to the screen, pointed to by a4.
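Putting the steps together, the index computation of one macro expansion can be sketched in Python as follows. This is a hedged model, not the original code: table_offset and its parameters are hypothetical names, tex stands for the chunky picture behind a0 (one 4-bit pixel per byte), tex_shl4 for the pre-shifted copy behind a1, and offs for the four patched "Du"/"ne" displacements.

```python
def table_offset(tex, tex_shl4, offs):
    """Return the byte offset into the c2p table for one group of 4 pixels."""
    du0, ne0, du1, ne1 = offs                  # hypothetical patched displacements
    d0 = 0                                     # moveq  #0,d0
    d0 = tex[du0] | tex_shl4[ne0]              # move.b Du(a0),d0 / or.b ne(a1),d0
    d0 = (d0 << 8) & 0xFFFF                    # add.w d0,d0 / move.w (a6,d0.w),d0
    d0 |= tex[du1] | tex_shl4[ne1]             # second move.b / or.b, low byte only
    return d0 << 3                             # lsl.l d2,d0 with d2 = 3


# Tiny made-up texture: pixel values 1, 2, 3, 4 and their pre-shifted copies
tex = bytes([1, 2, 3, 4])
tex_shl4 = bytes(v << 4 for v in tex)
offset = table_offset(tex, tex_shl4, (0, 1, 2, 3))   # 0x2143 * 8 = 0x10A18
```

The 8 bytes found at that offset in the c2p table are what the two move.l instructions write to the screen. The largest possible offset is 0xFFFF * 8 = 0x7FFF8, the start of the last entry of the 512 KB table.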
The macro is repeated 12 times per line, and every line is also copied 3 times with movem, as the Atari ST has neither a blitter nor the STe's dynamic video address registers. This gives us a 202 x 200 1 VBL rotozoomer. On the STe, the effect can easily be done in 320x200 by updating the video pointer every line in order to display the same line 4 times, as seen in the UFO. Leonard even made it fullscreen in the amazing We Were @. I don't know what his rotozoomer routine looks like, by the way.
Regarding the zoomer itself, the code is self-modified in the calc_zoomer routine, which comes from Oxbab's original code. Two tables are used: DELTA and TAB_TAIL. DELTA is the rotation table; I precalculated it in GFA. I still have the source code somewhere, but I don't have access to it. TAB_TAIL is the zooming factor table. It is also precalculated in GFA.
Hope you find this useful.
Greetings fly to:
- Cyclone for leading me to the c2p idea and the various discussions about optimising
- Mic for bringing his usual touch to beautify this effect
- Chuck and his magic fingers
- ST Ghost for his precious advice and his init routine (made with Zerkman also)
- and Oxbab for having forgotten to remove the debug symbols ;)