Sunday, May 19, 2013

One small step for mankind, a giant leap for the project

I have no idea how did I manage to achieve this much in this update, but it is certainly a confident step forward. For this time the list is long and diversified:
  • Implementation of Bcc.x addr, BCHG.B Dx,mem, BCHG.L Dx,Dy, BCLR.B Dx,mem, BCLR.L Dx,Dy, BRA.x abs, BSET.B Dx,mem, BSET.L Dx,Dy, BSR.x abs, BTST.B #imm,mem, BTST.B Dx,#imm, BTST.B Dx,mem, BTST.L Dx,Dy, CMP.x #imm,mem, CMP.x mem,Dy, CMP.x reg,Dy, CMPA.L reg,Ax, CMPA.W reg,Ax, CMPA.x mem,Ay, DBF.W Dx,addr, EOR.x #imm,mem, JMP.L abs, JMP.L mem, JSR.L abs, JSR.L mem, NEG.x mem, NOT.x mem, RTS, TAS.B Dx, TAS.B mem instructions.
  • Cache invalidation fix for OSX 10.3.9 and below. (Thanks to Mike Blackburn again.)
  • Fixed mask handling in BCHG.B Dx,mem instruction.
  • Fixed missing register mapping in ASL.x #imm,Dy implementation.
  • Fixed input dependency overwriting in certain memory-related allocation functions.
  • Fixed dependency for destination memory pointer register in special memory reading.
  • Fixed post address handler for condition code addressing modes, previously it might crash or call some random handler from the other addressing modes.
  • Fixed instructions where temporary registers are allocated but not free'ed.
  • Optimized masking for register to register bit instruction.
  • Optimized the temporary register usage in helper_test_bit_register_register function.
  • Optimized flag extraction in several shifting operation.
  • Branch scheduling is more flexible: adding multiple interleaved branches is possible.
  • Comment on missing implementation for an exception on loading odd address into PC.

A few highlights

First of all, let me brag around a little bit about the number of freshly implemented instructions. Right now 237 instructions are implemented out of 388, a solid 61% is done. (Previously the ratio was ~46%.)

More MacOSX versions are supported now, Mike fixed up the cache flushing a little bit and added the pre-10.4 versions too. Please read the included README file regarding the compiling instructions.

While I was working on the instructions I discovered a few bugs and glitches, which are now fixed in this release thus improving the overall stability.

I have also managed to optimize the compiled code for some instructions. Together with the implementation of some yet missing instructions the results for the Mandelbrot test (mandel_though_hw.kick.gz among the test kick files) improved a bit compared to the previous results:

Interpretive: 108 seconds (no change there...);
JIT compiled without optimization: 44 seconds (previously it was 52 seconds);
JIT compiled with optimization: 27 seconds (previously it was 32 seconds).

That was the time for the self-polishing and now back to work...

Sunday, April 28, 2013

Locations, locations, locations

In the last month I was trying to hang on my sanity while we were on house-hunting in Auckland (nuff' said). Wasn't easy and apparently it is not even close to be finished. :/

Anyway, I managed to do some work on the E-UAE JIT in the stolen moments.

In the update for this month you will find these little eggs:
  • Implementation of ADD.x Dy,mem, ADD.x mem,Dy, ADD.x #imm,mem, ADDA.x mem,Ay, ADDQ.x #imm,mem,
    AND.x mem,Dy, AND.x reg,mem, ANDI.x #imm,mem,
    BCLR.B #imm,mem, BCHG.B #imm,mem, BCHG.L #imm,reg, BSET.B #imm,mem,
    NEG.x Dy,
    OR.x Dy,mem, OR.x mem,Dy, ORI.x #imm,mem,
    UNLK.x Ay instructions.
  • Fixed unintended modification of the source register for some register to memory operations.
     
  • Memory read helper tweaked to use R3 register as the result register, no need to copy the data back-and-forth. (More optimal compiled code.)
     
  • Memory reader and writer helper function cleaned up to be more independent from caller data.
It might seem a bit random how I choose which instructions are implemented, but there is always a recurring theme. Right now this theme was the memory access. As you can see most of these instructions are manipulating the memory, which was a little bit scary earlier but I came around creating some functions which can be reused for (almost) all memory accessing instructions.
The tricky part was accessing the memory while the allocated temporary registers remain accessible somehow. With a minor workaround for saving and occasionally reloading the temporary registers after the memory access this is solved now.

I am not too happy about how the whole register mapping works, unfortunately there are some limitations of the C language which makes it complicated to come up with a more robust solution. So, right now the whole thing is just a bit hacky and wacky. Maybe in the future it would need an overhaul.

I get the question most of the times: how many instructions are left to implement. There is an easy way to find out the progress: check the table68k_comp descriptor file.
Each (to be) supported instructions for the JIT compiling is already listed there, next to the name of the instruction there is a number: 0 or 1. The 1 means it is already done, 0 remains to be implemented.
The instructions which will not be supported by the JIT compiling (so the interpretive will handle these) are not listed in this file.

So, all we need to do is counting the instructions which are already supported and what remains to be done. The current state without the FPU instructions is: 181 is done out of 388 (~46% is done).
As you can see there is more work to do, but it is really hard to tell how long does it take. What I can see is that the time I have to spend with each instruction is shorter and shorter, due to the infrastructure which had to be built first but now it is mostly done. Also some instructions are very similar, I can simply reuse parts of an already finished instruction.

We are not there yet, but the donkey is not that stubborn anymore. Giddy-up buddy!

Wednesday, April 24, 2013

Mac and cheese... err... Linux

Big thanks to Mike Blackburn for some fixes for the Macintosh support and for implementing the Linux PPC support! Well done, Mike.

After this (and this) update Mac OSX 10.4 is supported too: the instruction cache flush needed a different implementation.

Also Linux PPC users can benefit from the PowerPC JIT.

The more the merrier.

Monday, April 1, 2013

After another bump on the bumpy road

I have spent some time fixing bugs and improving the performance of the compiled code and the JIT emulation overall. As a result here is the recent update:
  • Implementation of TST.x mem, NOT.x Dy, EXT.x Dy
  • Reverted the change for stopping the compiled block when the special flag is set. This change is not needed for the Kickstart and it doesn't seem to have any effect, but slows down the compiled code.
  • Removed hack for MOVE.z Ax,-(Ax) and MOVE.z Ax,(Ax)+ instruction implementations.
  • Fixed pre-decrement and post-increment addressing modes for destination. Re-enabled MOVE.x mem,mem and EOR.x reg,mem instructions, which were disabled previously to let the Kickstart running without the addressing mode fixes.
  • Fixed register dependency in some memory addressing modes.
  • Temporarily removed checking for the tiny blocks, ignoring these blocks is just not the right solution for avoiding the overhead of block calling.
  • Added ignoring blocks of pure unsupported instructions: these will be executed by the interpretive emulation.
  • Fixed blocking of small blocks, the block was not raised in the cache list.
Lots of small changes and fixes as you can see.

Probably, the most important fix was the one for the pre-decrement and post-increment addressing modes, this was blocking the Kickstart for a while from booting and this is why I had to remove the support of those two instructions I mentioned in the changes list.
As it turned out the root of this bug was a limitation of the implementation. Each addressing mode has two compiler function: one is called before the instruction compiling, one is called after that. But the situation is not always that simple, for example in this case:

MOVE.L (A7),-(A7)

This is a common instruction for copying the memory to a lower address (like moving the content of an array one step toward the beginning, in this case the content of the stack). Seems so innocent, isn't it? :)
Why this one was an issue: the handler for the destination address is called before the instruction and since it is a destination addressing mode it decremented the address in the emulated A7 register. But then the instruction was compiled, which used the address from the A7 register as a source. So, what actually happened was something like this:

MOVE.L -(A7),(A7)

Now you can see: this operation is (mostly) pointless, it copies the data from the one address to the same address. (Although, sometimes it might make sense in the communication with the hardware, but this wasn't the case right now, obviously.)

What was the fix? Pretty simple: I moved the destination address modification to the address handler which is executed after the instruction was compiled. This was a solution for this specific case, but I also had to make sure that all the other combinations are working which might be possible with the indirect addressing. One of the trickiest was:

MOVE.L (Ax)+,(Ax)

You can try to guess why. :)

Anyway, finally this bug is out of the way and I can go on with implementing the missing instructions. Some of them are done, but yet lots to go.

There was one more important change in this update: I removed the limit for the consecutive block length when a special condition was triggered by some instruction. I found it completely pointless, everything seems to be working without this condition. There was a bad side-effect of this limit: after an instruction triggered a special condition for the emulation all the following instructions are compiled one-by-one into separate blocks. The overhead for calling these tiny blocks was huge, this is why I introduced the rule of ignoring any block which was smaller than 3 instructions. But in this case lots of the code was not JIT compiled at all. (As some of you guys mentioned: the JIT LED was mostly dark - lots of blocks were not compiled.) This is fixed now, although I am a bit afraid of the side-effect of delaying the handling of the special conditions. We will see how it goes.

I also spent some time on updating my old tool: DiskDaisy. The recent updates for AmigaOS4.1 triggered a bug in that app. Sometimes it is nice doing something completely different for a while, you know. :)

Thursday, February 21, 2013

After the second year

Well, here we are again. Another year had passed - namely the second - since I started the project.

Obviously, it is not done yet, otherwise you would see Dancing Bananas around the blog. But we are slowly getting there. There are outstanding bugs and yet a fair share of work is to be done.
Since we passed the first year I gave up giving any estimate on how long will this take. As it seems I am pretty bad in estimates. Yet I hope that third is the charm! :)

Until then: Keep calm and Amiga On!

Sunday, February 17, 2013

Watch for the LED

Since there were too many complaints (too many > 2) about that it is hard to tell whether the JIT compiled code is active or not, I decided to implement a small on-screen indicator for it in this tiny update.

How does it work?

I extended the already available on-screen status line with one more "LED" which says: JIT.
If you turn on the OSD status line by adding the following lines into the configuration:

show_leds=true

Then you will find one more block at the end of the status line. This "LED" will lit up in bright greenish color as soon as the emulator executes JIT compiled code (instead of interpreted code).

Now, since the emulation is (and always will be) a mixture of JIT compiled and interpreted execution, it is not a simple task to find out how much of the executed code was JIT compiled and how much is interpreted.
To overcome of this complication the block will show you the ratio between the compiled and the interpreted code by changing the background color:
  • If more compiled code was executed then it is more vivid light green.
  • If the interpreted code dominated the execution then the color dims toward black
  • When the JIT is inactive (turned off by the configuration) or for any reason the compiled code is not used (cache turned off, blocks are too short, etc.) then the background color will be completely black.

Here you can see some example screenshots:

The indicator lits up light green: JIT compiled code is running mostly.

The indicator is still green, but more dark green:
a mixture of JIT compiled code and interpreted code is executed.
The darker means less JIT code.

The indicator is black:
JIT is turned off or the compiled code is not executed for some reason.
So, now you can tell by simply looking at the screen if the JIT is active and how much of the executed code makes use of the JIT compiling.

Tiny-tiny catch

I didn't want to add extensive statistical data collection into the compiled code, it would make it run a lot slower.
This implementation slows down the emulation a tiny bit, but not that much. Probably later on I will either remove it completely or add some configuration around it, so it could be turned off.

How does it really work?

To tell you the truth: the indicator doesn't show you exactly how much of the actual instructions are executed by the JIT or the interpretive, but it collects the data from executed code block types. This is cheating beacuse:
  1. The length of the blocks vary between 1 instruction and 20-30 (or sometimes even more) instructions. Still one block counts exactly once in the calculation, regardless of the size.
    This could be improved, but I didn't want to put too much effort into this implementation.
  2. Even inside the compiled blocks not all of the instructions are JIT compiled (as I described this in earlier posts). Thus it is possible that none of the instructions in the "compiled" block consist of actual JIT code, but simply calls to the interpretive implementation. It would be more fair to calculate the ration based on the executed instruction types instead of the block types.
    Now, I would rather not change this for sake of performance. As long as most of the instructions are implemented, this won't affect the results too much. (And that is not true just yet, but will be improved overtime.)
So, bottom line: take this indicator results with a grain of salt.

Still cool, eh? :)

Saturday, February 16, 2013

When things go bogo

I came around fixing the reported issue with the enabled bogomem (fake Fast Ram) setting, you can find the simple fix in the recent update:
  • Fixed MOVEA.W reg,Ax - source data was not sign-extended
Now the Kickstart starts with both bogomem enabled or disabled configuration.

Some details on the fix

Debugging an emulator needs a completely different approach than debugging any other application type, simply because even if one or two instructions were misbehaving it doesn't mean that the emulated program is not working at all. It just does weird things, but not on the good way.

In this specific case when the error was: if the bogomem configuration was enabled then the Kickstart went into the dreaded reboot-loop, which is basically the result of an internal crash, usually because of a wrong access of memory somewhere or an exception while another exception is executed.

I had a closer look on what is going on and I have found that an instruction is trying to write into a custom register for the disk controller which is read-only. Never a good sign, especially if the Kickstart is trying to do that which is always playing by the (hardware) rules.
It was an even more interesting fact that the wrong-doing instruction was only for reading from memory.
My reaction was a confused face with a hint of suspicious look. I am still a bit puzzled by this, even after the fix - this must never happen ever.

But at least we have a crash!
It is always easier with a crash, it gives a starting point (or so I thought at least).

First I tried to log the full execution and analyze it for a while, but the only thing I had found was that some hardware handling loop runs too long, probably this is why the Kickstart hits the custom register by accident. This is not helping at all, usually it means that some initialization or leaving condition for the loop went wrong, so I had to look further before the loop itself.

Luckily, the Kickstart with this configuration set was working when none of the instructions were compiled. There is a simple method for finding out which instruction(s) causing trouble: turn off compiling of all instructions and add them back one by one while start the emulation with the same settings over and over again.
This sounds tedious but actually it is much easier than scrolling through megabytes of debug logs and looking for something, because it is procedural. Unfortunately, this method does not work for every possible issue, especially when the combinations of the wrong instructions are causing the problem.

At the end I had found that the MOVEA.W AX/DX,AY instruction was the one to blame and a quick look on the compiled code confirmed that the emulation was wrong: for every operation where the target is an address register the involved data must be longword sized.
In this case this simply meant that the word sized source data must be sign-extended while it gets copied into the target address register. I had done this for every other similar instruction, but I missed one case.

Now, you can probably see why there is no way I could find this bug by looking on the execution logs.

Thanks to kas1e, Thunder and MickJT for reporting bugs!