The Big E-UAE JIT blog: April 2013

Sunday, April 28, 2013

Locations, locations, locations

In the last month I was trying to hang on my sanity while we were on house-hunting in Auckland (nuff' said). Wasn't easy and apparently it is not even close to be finished. :/

Anyway, I managed to do some work on the E-UAE JIT in the stolen moments.

In the update for this month you will find these little eggs:

Implementation of ADD.x Dy,mem, ADD.x mem,Dy, ADD.x #imm,mem, ADDA.x mem,Ay, ADDQ.x #imm,mem,
AND.x mem,Dy, AND.x reg,mem, ANDI.x #imm,mem,
BCLR.B #imm,mem, BCHG.B #imm,mem, BCHG.L #imm,reg, BSET.B #imm,mem,
NEG.x Dy,
OR.x Dy,mem, OR.x mem,Dy, ORI.x #imm,mem,
UNLK.x Ay instructions.

Fixed unintended modification of the source register for some register to memory operations.
Memory read helper tweaked to use R3 register as the result register, no need to copy the data back-and-forth. (More optimal compiled code.)
Memory reader and writer helper function cleaned up to be more independent from caller data.

It might seem a bit random how I choose which instructions are implemented, but there is always a recurring theme. Right now this theme was the memory access. As you can see most of these instructions are manipulating the memory, which was a little bit scary earlier but I came around creating some functions which can be reused for (almost) all memory accessing instructions.
The tricky part was accessing the memory while the allocated temporary registers remain accessible somehow. With a minor workaround for saving and occasionally reloading the temporary registers after the memory access this is solved now.

I am not too happy about how the whole register mapping works, unfortunately there are some limitations of the C language which makes it complicated to come up with a more robust solution. So, right now the whole thing is just a bit hacky and wacky. Maybe in the future it would need an overhaul.

I get the question most of the times: how many instructions are left to implement. There is an easy way to find out the progress: check the table68k_comp descriptor file.
Each (to be) supported instructions for the JIT compiling is already listed there, next to the name of the instruction there is a number: 0 or 1. The 1 means it is already done, 0 remains to be implemented.
The instructions which will not be supported by the JIT compiling (so the interpretive will handle these) are not listed in this file.

So, all we need to do is counting the instructions which are already supported and what remains to be done. The current state without the FPU instructions is: 181 is done out of 388 (~46% is done).
As you can see there is more work to do, but it is really hard to tell how long does it take. What I can see is that the time I have to spend with each instruction is shorter and shorter, due to the infrastructure which had to be built first but now it is mostly done. Also some instructions are very similar, I can simply reuse parts of an already finished instruction.

We are not there yet, but the donkey is not that stubborn anymore. Giddy-up buddy!

Wednesday, April 24, 2013

Mac and cheese... err... Linux

Big thanks to Mike Blackburn for some fixes for the Macintosh support and for implementing the Linux PPC support! Well done, Mike.

After this (and this) update Mac OSX 10.4 is supported too: the instruction cache flush needed a different implementation.

Also Linux PPC users can benefit from the PowerPC JIT.

The more the merrier.

Monday, April 1, 2013

After another bump on the bumpy road

I have spent some time fixing bugs and improving the performance of the compiled code and the JIT emulation overall. As a result here is the recent update:

Implementation of TST.x mem, NOT.x Dy, EXT.x Dy
Reverted the change for stopping the compiled block when the special flag is set. This change is not needed for the Kickstart and it doesn't seem to have any effect, but slows down the compiled code.
Removed hack for MOVE.z Ax,-(Ax) and MOVE.z Ax,(Ax)+ instruction implementations.
Fixed pre-decrement and post-increment addressing modes for destination. Re-enabled MOVE.x mem,mem and EOR.x reg,mem instructions, which were disabled previously to let the Kickstart running without the addressing mode fixes.
Fixed register dependency in some memory addressing modes.
Temporarily removed checking for the tiny blocks, ignoring these blocks is just not the right solution for avoiding the overhead of block calling.
Added ignoring blocks of pure unsupported instructions: these will be executed by the interpretive emulation.
Fixed blocking of small blocks, the block was not raised in the cache list.

Lots of small changes and fixes as you can see.

Probably, the most important fix was the one for the pre-decrement and post-increment addressing modes, this was blocking the Kickstart for a while from booting and this is why I had to remove the support of those two instructions I mentioned in the changes list.
As it turned out the root of this bug was a limitation of the implementation. Each addressing mode has two compiler function: one is called before the instruction compiling, one is called after that. But the situation is not always that simple, for example in this case:

MOVE.L (A7),-(A7)

This is a common instruction for copying the memory to a lower address (like moving the content of an array one step toward the beginning, in this case the content of the stack). Seems so innocent, isn't it? :)
Why this one was an issue: the handler for the destination address is called before the instruction and since it is a destination addressing mode it decremented the address in the emulated A7 register. But then the instruction was compiled, which used the address from the A7 register as a source. So, what actually happened was something like this:

MOVE.L -(A7),(A7)

Now you can see: this operation is (mostly) pointless, it copies the data from the one address to the same address. (Although, sometimes it might make sense in the communication with the hardware, but this wasn't the case right now, obviously.)

What was the fix? Pretty simple: I moved the destination address modification to the address handler which is executed after the instruction was compiled. This was a solution for this specific case, but I also had to make sure that all the other combinations are working which might be possible with the indirect addressing. One of the trickiest was:

MOVE.L (Ax)+,(Ax)

You can try to guess why. :)

Anyway, finally this bug is out of the way and I can go on with implementing the missing instructions. Some of them are done, but yet lots to go.

There was one more important change in this update: I removed the limit for the consecutive block length when a special condition was triggered by some instruction. I found it completely pointless, everything seems to be working without this condition. There was a bad side-effect of this limit: after an instruction triggered a special condition for the emulation all the following instructions are compiled one-by-one into separate blocks. The overhead for calling these tiny blocks was huge, this is why I introduced the rule of ignoring any block which was smaller than 3 instructions. But in this case lots of the code was not JIT compiled at all. (As some of you guys mentioned: the JIT LED was mostly dark - lots of blocks were not compiled.) This is fixed now, although I am a bit afraid of the side-effect of delaying the handling of the special conditions. We will see how it goes.

I also spent some time on updating my old tool: DiskDaisy. The recent updates for AmigaOS4.1 triggered a bug in that app. Sometimes it is nice doing something completely different for a while, you know. :)