The Big E-UAE JIT blog: 2013

Wednesday, December 25, 2013

Happy Holidays!

I wish Happy Holidays to all the Amigans around the globe!

Are you still hoping for a late Santa? Wink, wink... ;)

Monday, November 4, 2013

On a collision course

So, where are we at with the recent update? 349 out of 387: no, it is not 100% yet, but a nice, fat 90.2%!

Quickly, what was done this time and let's jump to conclusions right after that:

Implementation of BFINS Dx,Da{y,z}, BFFFO Dx{y,z},Da, BFEXTS Dx{y,z},Da, BFEXTU Dx{y,z},Da, BFCLR Dx{y,z}, BFTST Dx{y,z}, BFSET Dx{y,z}, BFCHG Dx{y,z}, ROXR.x Dy,Dz, ROXR.W mem, ROXR.x #imm,Dy, ROXL.x Dy,Dz, ROXL.W mem, ROXL.x #imm,Dy, ASR.W mem, ASL.W mem, LSR.W mem, LSL.W mem, ROR.W mem, ROL.W mem instructions.
Opcode compiler functions for different operation sizes are unified, unnecessary helper functions were removed.
Macroblock protos are generated from the opcode table file rather than taken from the manually prepared file.
Fixed wrong function name for the CNTLZW PowerPC instruction emitter.
Fixed missing input register in C and X flag extraction macroblock which is used in some shift instructions.
Fixed register dependency in ASR.x Dy,Dz and LSR.x Dy,Dz instructions.

(Slightly unrelated: I have found a bug in the original E-UAE code for BFINS and flags; if I have enough patience I will fix that. And another one in Petunia, that will be fixed too...)

Now, we are getting dangerously close to the beta stage. It is really hard not to give you guys any promises that I cannot keep later on... (Summer is coming, you know... ;)

Anyway, my plans for the near future (read: when it is done) are:

After I have finished with the implementation of all instructions which were anticipated earlier I am going to stabilize the emulator a bit and clean up the code if needed.
I will try to release a compiled beta version and put together some documentation for using/testing it.
There are some outstanding issues; one of the most critical ones is fixing the problems with the macroblock optimizer.

So much for the crystal ball and now...

Something completely different

I have to admit that my posts were not that interesting to read recently. Earlier I invested more effort into the posts and it was probably more fun to read, especially to the developers.
I would like to bring back that tradition, so for this update I came up with a few interesting thoughts on:

How to avoid the decisions

I know that many developers love to procrastinate anything, including (but not limited to) decisions. But what I am about to write is not how lazy my fellow developers are, but rather how to get around a situation when it is not ideal to do a comparison and branch according to the result.

Why would that be important at all?
There are a number of reasons why it is better to avoid branching; this article lists many examples for that and also explains the reasons. It basically boils down to the following:

branching can cause cache misses;
the branching instruction is a useless overhead and it can be even slow on certain architectures;
conditional branching disrupts the pipelining of the execution (although there are pretty sophisticated techniques available to deal with that).

In our case none of these reasons apply, but unfortunately branching is not really possible due to the static flow analysis on the macroblock list.

To put it simply: each macroblock depends on the results of the previously executed macroblocks. If we skip ahead, it is impossible to tell whether those required macroblocks were executed before the dependent macroblocks.

Why does this cause any trouble? Because sometimes it is pretty hard to avoid conditions inside the instruction implementations.

I already faced this issue earlier, when I started to work on the conditional branching and setting instructions (Bcc and Scc). There was no possible way to avoid the conditional branching for these instructions due to their very nature.
I ended up creating a very specific macroblock which embeds the condition checking and branching, so the "inter-macroblock" flow analysis remains intact.

In the recent update you can find many bit field instructions which were complicated enough already and then the very same issue came up again:

For bit field instructions a bit field width of 0 (zero) actually means 32 bits. This doesn't sound that bad, however sometimes the bit field width comes from an emulated data register instead of a statically encoded constant. In this case the decision must be made in the emulated code at run time instead of at compile time.

Not in every case, but quite often it is possible to calculate the result instead of making a comparison and branching.

For the bit field instructions the solution was (written in C code, just because it is easier to understand):

temp = width - 1;
width = width | (temp & 32);

(There is one condition, though: the width must be between 0 (zero) and 31 before this operation starts.)

Now, think about it a little bit and figure out on your own why it works. I am not going to explain it. :)
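(If you would rather check the trick than ponder it, here is a minimal sketch. The function name is made up for this post and it is not the code from the emulator; I also assume an unsigned 32 bit variable here, so the subtraction wraps around in a well-defined way.)

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: maps a raw bit field width (0..31) to the
 * effective width (1..32) without any comparison or branch. */
static uint32_t bitfield_width(uint32_t width)
{
    assert(width <= 31);          /* the precondition mentioned above */

    uint32_t temp = width - 1;    /* wraps to 0xFFFFFFFF only if width is 0 */
    return width | (temp & 32);   /* bit 5 of temp is set only in that case */
}
```

For any width between 1 and 31 the temporary value stays between 0 and 30, so bit 5 is clear and the width passes through unchanged; only a width of zero turns into 32.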

See you soon.

Wednesday, October 2, 2013

Back on track

Finally, I am back on track with the development after the house move. There were two updates since the last post; one was a follow-up to the previous code refactoring; the other was a massive list of improvements:

Implementation of NEGX.x mem, NEGX.x Dy, SUBX.x -(Ay),-(Az), ADDX.x -(Ay),-(Az), MOVE CCR,mem, MOVE CCR,Dx, EOR #imm,CCR, OR #imm,CCR, AND #imm,CCR, RTR, MOVE mem,CCR, MOVE Dx,CCR, MOVE #imm,CCR, ASR.x Dy,Dz, RTD, MULU.W mem,Dx, MULU.W #imm,Dx, MULS.W mem,Dx, MULS.W #imm,Dx, SUBX.x Dy,Dz, ADDX.x Dy,Dz, MOVEM.x regs,mem, MOVEM.x mem,regs, MOVEM.x (Ay)+,regs, CMPM.x (Ax)+,(Ay)+ instructions.
Added dependency tracking for non-volatile PowerPC registers.
Fixed X flag handling in register-based shifting instructions, previously the X flag was cleared together with the C flag if the shift steps were zero.
Removed RTM from the list of the potentially supported opcodes.
Added RTR back to the list of the potentially supported opcodes.
Optimized temporary register usage in MULU.W Dx,Dy and MULS.W Dx,Dy instructions.
Introduced tracking of the extension words after the instructions, it is needed for adjusting the PC before certain addressing modes are processed.
Fixed register dependency and order of register storage for MOVEM.x regs,-(Ay) when direct memory access is enabled.
Implemented stack-like concept for register saving into the context.
Code cleanup: removed unused reference, fixed some warnings regarding misformatted and unused code lines.

So, things are getting together slowly. The number of the implemented instructions went up to 321 out of the planned 387! That is roughly 83%... Getting closer and closer... :)

Recently I faced an interesting problem that I am not quite sure how to solve: division by zero. My beloved math teacher already told me: whoever tries to divide by zero is an idiot. (Well, that is not quite right, as we know it.) Yet, some programs might try it.
Why is that a problem? Because it triggers an exception inside a compiled block. It also needs branching (skip the exception triggering if the divisor is not zero, for a change) which contradicts the macroblock register flow tracking. Well, here is the challenge, but I am pretty sure I will solve it somehow.

'Til then the usual: watch this space.

Sunday, August 18, 2013

Moving stuff around

Long time no see! No wonder: we just moved to a new house, which usually means lots of things to deal with and boxes wherever I look. (I hate this soo much! I badly need my daily routine, but how can I find my stuff?!) Finally, we are getting settled again and I was able to dig up the good ol' Amiga again under the pile of boxes. And this new house is just awesome, so it was worth all the pain we went through.

Interesting coincidence, but right before we started the whole move house craziness (seems like ages ago, BTW) I decided to refactor some code in E-UAE JIT. Namely the handling of the temporary registers.
It was such a bad design, or rather no design at all: I used the number of the allocated temporary registers to index a couple of arrays with the relevant data in them. Can you believe this? In 2013? This was unacceptable even in the '70s. There are so many reasons why you shouldn't do that.
Also what is the point of using a strongly typed language, if we don't use distinct types? In this case all the temporary registers along with the actually mapped PowerPC processor registers are passed to the functions as integers. Very easy to misuse. Bad, bad, bad.

In the recent changes I have managed to introduce the concept of a temporary register-specific type structure, which is also able to carry around the mapped PowerPC processor register number and the register dependency map for the macroblocks.
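The idea can be sketched roughly like this. Mind you, this is not the actual code from the repository: the type name, the field names and both helper functions are made up for the illustration, and I assume one dependency bit per PowerPC register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical temporary register descriptor (names are assumptions). */
typedef struct {
    uint8_t  ppc_reg;   /* number of the mapped PowerPC register */
    uint32_t dep_mask;  /* dependency bit used by the macroblock flow tracking */
} temp_reg_t;

/* Wrap a PowerPC register number into the distinct type. */
static temp_reg_t temp_reg_make(uint8_t ppc_reg)
{
    assert(ppc_reg < 32);   /* PowerPC has 32 general purpose registers */
    temp_reg_t reg = { ppc_reg, UINT32_C(1) << ppc_reg };
    return reg;
}

/* Input dependency of a macroblock that reads two temporary registers.
 * Passing a plain integer here is rejected by the compiler. */
static uint32_t macroblock_input_deps(temp_reg_t a, temp_reg_t b)
{
    return a.dep_mask | b.dep_mask;
}
```

With a distinct type, calling macroblock_input_deps(3, 5) by accident does not compile anymore, while the integer-based version would have happily accepted it.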

This new structure can be used when the macroblocks are collected (so mostly in the helper functions), but the code emitters for the macroblocks are dealing with the mapped PowerPC registers only. Now, I still need to solve that one: at the moment these are still passed to the functions as integers.

And what is the visible output for you guys? Nothing, if we are lucky. ;)
Yea, I know. It is always hard to justify to the business owners the time that was spent on paying off technical debt. But this was a long outstanding one and I always had the itch to fix it.

So, enjoy your Summer while it lasts and bear with me.

Sunday, June 30, 2013

The Hot and the Cold

I just realized the last update was more than a month ago. Time flies, especially when we have fun, right? :)
Well, we had some fun at least: a short holiday in the (unusually) hot Hungarian summer. 38 degrees Celsius, it was almost unbearable for us after we got used to the cooler kiwi climate.
Then we came back to the (unusually) cold New Zealand Winter. 8 degrees Celsius, almost unbearable for us now. Hard to please, you might say. ;)

Anyway, back to business: since I was barely here this month the update is less impressive than the last time. Yet, I managed to implement some more missing instructions:

Implementation of ASL.x Dy,Dz, LSR.x Dy,Dz, LSL.x Dy,Dz, ROR.x Dy,Dz, ROL.x Dy,Dz, SUBA.x mem,Ay, SUB.x mem,Dy, SUBQ.x #imm,mem, SUB.x #imm,mem, SUB.x Dy,mem, SUBQ.x #imm,Ay, SUBQ.x #imm,Dy, SUB.x #imm,Dy instructions.
Removed unneeded TODO from ROL/ROR.x #imm,Dy instruction.

So, nothing much to see, move along.

Sunday, May 19, 2013

One small step for mankind, a giant leap for the project

I have no idea how I managed to achieve this much in this update, but it is certainly a confident step forward. This time the list is long and diversified:

Implementation of Bcc.x addr, BCHG.B Dx,mem, BCHG.L Dx,Dy, BCLR.B Dx,mem, BCLR.L Dx,Dy, BRA.x abs, BSET.B Dx,mem, BSET.L Dx,Dy, BSR.x abs, BTST.B #imm,mem, BTST.B Dx,#imm, BTST.B Dx,mem, BTST.L Dx,Dy, CMP.x #imm,mem, CMP.x mem,Dy, CMP.x reg,Dy, CMPA.L reg,Ax, CMPA.W reg,Ax, CMPA.x mem,Ay, DBF.W Dx,addr, EOR.x #imm,mem, JMP.L abs, JMP.L mem, JSR.L abs, JSR.L mem, NEG.x mem, NOT.x mem, RTS, TAS.B Dx, TAS.B mem instructions.
Cache invalidation fix for OSX 10.3.9 and below. (Thanks to Mike Blackburn again.)
Fixed mask handling in BCHG.B Dx,mem instruction.
Fixed missing register mapping in ASL.x #imm,Dy implementation.
Fixed input dependency overwriting in certain memory-related allocation functions.
Fixed dependency for destination memory pointer register in special memory reading.
Fixed post address handler for condition code addressing modes, previously it might crash or call some random handler from the other addressing modes.
Fixed instructions where temporary registers are allocated but not freed.
Optimized masking for register to register bit instruction.
Optimized the temporary register usage in helper_test_bit_register_register function.
Optimized flag extraction in several shifting operations.
Branch scheduling is more flexible: adding multiple interleaved branches is possible.
Comment on missing implementation for an exception on loading odd address into PC.

A few highlights

First of all, let me brag a little bit about the number of freshly implemented instructions. Right now 237 instructions are implemented out of 388, a solid 61% is done. (Previously the ratio was ~46%.)

More MacOSX versions are supported now, Mike fixed up the cache flushing a little bit and added the pre-10.4 versions too. Please read the included README file regarding the compiling instructions.

While I was working on the instructions I discovered a few bugs and glitches, which are now fixed in this release thus improving the overall stability.

I have also managed to optimize the compiled code for some instructions. Together with the implementation of some yet missing instructions the results for the Mandelbrot test (mandel_though_hw.kick.gz among the test kick files) improved a bit compared to the previous results:

Interpretive: 108 seconds (no change there...);
JIT compiled without optimization: 44 seconds (previously it was 52 seconds);
JIT compiled with optimization: 27 seconds (previously it was 32 seconds).

That was the time for the self-polishing and now back to work...

Sunday, April 28, 2013

Locations, locations, locations

In the last month I was trying to hang on to my sanity while we were house-hunting in Auckland ('nuff said). Wasn't easy and apparently it is not even close to being finished. :/

Anyway, I managed to do some work on the E-UAE JIT in the stolen moments.

In the update for this month you will find these little eggs:

Implementation of ADD.x Dy,mem, ADD.x mem,Dy, ADD.x #imm,mem, ADDA.x mem,Ay, ADDQ.x #imm,mem,
AND.x mem,Dy, AND.x reg,mem, ANDI.x #imm,mem,
BCLR.B #imm,mem, BCHG.B #imm,mem, BCHG.L #imm,reg, BSET.B #imm,mem,
NEG.x Dy,
OR.x Dy,mem, OR.x mem,Dy, ORI.x #imm,mem,
UNLK.x Ay instructions.

Fixed unintended modification of the source register for some register to memory operations.
Memory read helper tweaked to use R3 register as the result register, no need to copy the data back-and-forth. (More optimal compiled code.)
Memory reader and writer helper function cleaned up to be more independent from caller data.

It might seem a bit random how I choose which instructions are implemented, but there is always a recurring theme. Right now this theme was the memory access. As you can see most of these instructions are manipulating the memory, which was a little bit scary earlier but I got around to creating some functions which can be reused for (almost) all memory accessing instructions.
The tricky part was accessing the memory while the allocated temporary registers remain accessible somehow. With a minor workaround for saving and occasionally reloading the temporary registers after the memory access this is solved now.

I am not too happy about how the whole register mapping works; unfortunately there are some limitations of the C language which make it complicated to come up with a more robust solution. So, right now the whole thing is just a bit hacky and wacky. Maybe in the future it will need an overhaul.

The question I get most often is: how many instructions are left to implement? There is an easy way to find out the progress: check the table68k_comp descriptor file.
Each (to be) supported instruction for the JIT compiling is already listed there; next to the name of the instruction there is a number: 0 or 1. The 1 means it is already done, 0 means it remains to be implemented.
The instructions which will not be supported by the JIT compiling (so the interpretive will handle these) are not listed in this file.

So, all we need to do is count the instructions which are already supported and those which remain to be done. The current state without the FPU instructions is: 181 are done out of 388 (~46% is done).
As you can see there is more work to do, but it is really hard to tell how long it will take. What I can see is that the time I have to spend with each instruction is getting shorter and shorter, thanks to the infrastructure which had to be built first but is mostly done now. Also some instructions are very similar, so I can simply reuse parts of an already finished instruction.

We are not there yet, but the donkey is not that stubborn anymore. Giddy-up buddy!

Wednesday, April 24, 2013

Mac and cheese... err... Linux

Big thanks to Mike Blackburn for some fixes for the Macintosh support and for implementing the Linux PPC support! Well done, Mike.

After this (and this) update Mac OSX 10.4 is supported too: the instruction cache flush needed a different implementation.

Also Linux PPC users can benefit from the PowerPC JIT.

The more the merrier.

Monday, April 1, 2013

After another bump on the bumpy road

I have spent some time fixing bugs and improving the performance of the compiled code and the JIT emulation overall. As a result here is the recent update:

Implementation of TST.x mem, NOT.x Dy, EXT.x Dy
Reverted the change for stopping the compiled block when the special flag is set. This change is not needed for the Kickstart and it doesn't seem to have any effect, but slows down the compiled code.
Removed hack for MOVE.z Ax,-(Ax) and MOVE.z Ax,(Ax)+ instruction implementations.
Fixed pre-decrement and post-increment addressing modes for destination. Re-enabled MOVE.x mem,mem and EOR.x reg,mem instructions, which were disabled previously to let the Kickstart running without the addressing mode fixes.
Fixed register dependency in some memory addressing modes.
Temporarily removed checking for the tiny blocks, ignoring these blocks is just not the right solution for avoiding the overhead of block calling.
Added ignoring blocks of pure unsupported instructions: these will be executed by the interpretive emulation.
Fixed blocking of small blocks, the block was not raised in the cache list.

Lots of small changes and fixes as you can see.

Probably the most important fix was the one for the pre-decrement and post-increment addressing modes. This bug was blocking the Kickstart from booting for a while, and this is why I had to remove the support of those two instructions I mentioned in the changes list.
As it turned out the root of this bug was a limitation of the implementation. Each addressing mode has two compiler functions: one is called before the instruction compiling, one is called after that. But the situation is not always that simple, for example in this case:

MOVE.L (A7),-(A7)

This is a common instruction for copying the memory to a lower address (like moving the content of an array one step toward the beginning, in this case the content of the stack). Seems so innocent, doesn't it? :)
Why was this one an issue? The handler for the destination address is called before the instruction and since it is a destination addressing mode it decremented the address in the emulated A7 register. But then the instruction was compiled, which used the address from the A7 register as a source. So, what actually happened was something like this:

MOVE.L -(A7),(A7)

Now you can see: this operation is (mostly) pointless, it copies the data from one address to the very same address. (Although, sometimes it might make sense in the communication with the hardware, but this wasn't the case right now, obviously.)

What was the fix? Pretty simple: I moved the destination address modification to the address handler which is executed after the instruction was compiled. This was a solution for this specific case, but I also had to make sure that all the other combinations are working which might be possible with the indirect addressing. One of the trickiest was:

MOVE.L (Ax)+,(Ax)

You can try to guess why. :)
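For the MOVE.L (A7),-(A7) case the difference between the two orders can be reproduced with a toy model. This is only a sketch of the observable effect and not the compiler code: the structure, the function names and the tiny four longword "memory" are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: four longwords of memory and A7 as a byte offset into it. */
typedef struct {
    uint32_t mem[4];
    uint32_t a7;
} toy_cpu_t;

/* Buggy order: the destination handler pre-decrements A7 before the
 * instruction reads its source through the same register. */
static void move_l_buggy(toy_cpu_t *cpu)
{
    assert(cpu->a7 >= 4 && cpu->a7 / 4 < 4);
    cpu->a7 -= 4;                            /* destination handler ran first */
    uint32_t data = cpu->mem[cpu->a7 / 4];   /* source already sees the new A7 */
    cpu->mem[cpu->a7 / 4] = data;            /* written back to the same place */
}

/* Fixed order: the source is read first, A7 is adjusted afterwards. */
static void move_l_fixed(toy_cpu_t *cpu)
{
    assert(cpu->a7 >= 4 && cpu->a7 / 4 < 4);
    uint32_t data = cpu->mem[cpu->a7 / 4];   /* read from the original (A7) */
    cpu->a7 -= 4;                            /* now pre-decrement for the target */
    cpu->mem[cpu->a7 / 4] = data;
}

/* Runs one variant on a fresh CPU and returns the longword below the
 * original stack top, which is the target of the copy. */
static uint32_t demo_move_l(int fixed)
{
    toy_cpu_t cpu = { { 0, 0x11111111u, 0xAABBCCDDu, 0 }, 8 };
    if (fixed)
        move_l_fixed(&cpu);
    else
        move_l_buggy(&cpu);
    return cpu.mem[1];
}
```

In this model demo_move_l(1) returns the copied 0xAABBCCDD, while demo_move_l(0) returns the untouched 0x11111111: the buggy order copies the longword onto itself.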

Anyway, finally this bug is out of the way and I can go on with implementing the missing instructions. Some of them are done, but yet lots to go.

There was one more important change in this update: I removed the limit for the consecutive block length when a special condition was triggered by some instruction. I found it completely pointless, everything seems to be working without this condition. There was a bad side-effect of this limit: after an instruction triggered a special condition for the emulation all the following instructions were compiled one-by-one into separate blocks. The overhead for calling these tiny blocks was huge, this is why I introduced the rule of ignoring any block which was smaller than 3 instructions. But in this case lots of the code was not JIT compiled at all. (As some of you guys mentioned: the JIT LED was mostly dark - lots of blocks were not compiled.) This is fixed now, although I am a bit afraid of the side-effect of delaying the handling of the special conditions. We will see how it goes.

I also spent some time on updating my old tool: DiskDaisy. The recent updates for AmigaOS4.1 triggered a bug in that app. Sometimes it is nice doing something completely different for a while, you know. :)

Thursday, February 21, 2013

After the second year

Well, here we are again. Another year has passed - namely the second - since I started the project.

Obviously, it is not done yet, otherwise you would see Dancing Bananas around the blog. But we are slowly getting there. There are outstanding bugs and yet a fair share of work is to be done.
Since we passed the first year I gave up giving any estimate on how long this will take. As it seems I am pretty bad at estimates. Yet I hope that the third time is the charm! :)

Until then: Keep calm and Amiga On!

Sunday, February 17, 2013

Watch for the LED

Since there were too many complaints (too many > 2) that it is hard to tell whether the JIT compiled code is active or not, I decided to implement a small on-screen indicator for it in this tiny update.

How does it work?

I extended the already available on-screen status line with one more "LED" which says: JIT.
If you turn on the OSD status line by adding the following line into the configuration:

show_leds=true

Then you will find one more block at the end of the status line. This "LED" will light up in a bright greenish color as soon as the emulator executes JIT compiled code (instead of interpreted code).

Now, since the emulation is (and always will be) a mixture of JIT compiled and interpreted execution, it is not a simple task to find out how much of the executed code was JIT compiled and how much is interpreted.
To overcome this complication the block will show you the ratio between the compiled and the interpreted code by changing the background color:

If more compiled code was executed then it is more vivid light green.

If the interpreted code dominated the execution then the color dims toward black.

When the JIT is inactive (turned off by the configuration) or for any reason the compiled code is not used (cache turned off, blocks are too short, etc.) then the background color will be completely black.
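Just to illustrate the principle, the color could be derived from two counters like this. It is a sketch only: the function name, the counters and the linear 0 to 255 scale are assumptions of mine for this example, the real implementation may calculate the color differently.

```c
#include <stdint.h>

/* Hypothetical helper: green intensity (0..255) of the JIT "LED",
 * calculated from the number of executed blocks of each type. */
static uint8_t jit_led_green(uint32_t compiled_blocks, uint32_t interpreted_blocks)
{
    uint64_t total = (uint64_t)compiled_blocks + interpreted_blocks;

    if (total == 0)
        return 0;   /* nothing was counted: the indicator stays black */

    /* Linear scale: only compiled blocks gives 255, only interpreted gives 0. */
    return (uint8_t)((255u * (uint64_t)compiled_blocks) / total);
}
```

With these assumptions a period with only compiled blocks gives the full 255, an even mix gives 127, and a period without any compiled block gives 0, which is the black background.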

Here you can see some example screenshots:

The indicator lights up light green: JIT compiled code is running mostly.

The indicator is still green, but more dark green:
a mixture of JIT compiled code and interpreted code is executed.
The darker means less JIT code.

The indicator is black:
JIT is turned off or the compiled code is not executed for some reason.

So, now you can tell by simply looking at the screen if the JIT is active and how much of the executed code makes use of the JIT compiling.

Tiny-tiny catch

I didn't want to add extensive statistical data collection into the compiled code, it would make it run a lot slower.
This implementation slows down the emulation a tiny bit, but not that much. Probably later on I will either remove it completely or add some configuration around it, so it could be turned off.

How does it really work?

To tell you the truth: the indicator doesn't show you exactly how much of the actual instructions are executed by the JIT or the interpretive, but it collects the data from executed code block types. This is cheating because:

The length of the blocks vary between 1 instruction and 20-30 (or sometimes even more) instructions. Still one block counts exactly once in the calculation, regardless of the size.
This could be improved, but I didn't want to put too much effort into this implementation.
Even inside the compiled blocks not all of the instructions are JIT compiled (as I described this in earlier posts). Thus it is possible that none of the instructions in the "compiled" block consist of actual JIT code, but simply calls to the interpretive implementation. It would be more fair to calculate the ratio based on the executed instruction types instead of the block types.
Now, I would rather not change this for the sake of performance. As long as most of the instructions are implemented, this won't affect the results too much. (And that is not true just yet, but will be improved over time.)

So, bottom line: take the results of this indicator with a grain of salt.

Still cool, eh? :)

Saturday, February 16, 2013

When things go bogo

I got around to fixing the reported issue with the enabled bogomem (fake Fast Ram) setting, you can find the simple fix in the recent update:

Fixed MOVEA.W reg,Ax - source data was not sign-extended

Now the Kickstart starts with bogomem either enabled or disabled in the configuration.

Some details on the fix

Debugging an emulator needs a completely different approach than debugging any other application type, simply because even if one or two instructions are misbehaving it doesn't mean that the emulated program is not working at all. It just does weird things, and not in a good way.

In this specific case the error was: if the bogomem configuration was enabled then the Kickstart went into the dreaded reboot-loop, which is basically the result of an internal crash, usually because of a wrong access of memory somewhere or an exception while another exception is executed.

I had a closer look at what was going on and I found that an instruction was trying to write into a custom register for the disk controller which is read-only. Never a good sign, especially if it is the Kickstart trying to do that, which always plays by the (hardware) rules.
It was an even more interesting fact that the wrong-doing instruction was only for reading from memory.
My reaction was a confused face with a hint of suspicious look. I am still a bit puzzled by this, even after the fix - this must never happen ever.

But at least we have a crash!
It is always easier with a crash, it gives a starting point (or so I thought at least).

First I tried to log the full execution and analyze it for a while, but the only thing I had found was that some hardware handling loop runs too long, probably this is why the Kickstart hits the custom register by accident. This is not helping at all, usually it means that some initialization or leaving condition for the loop went wrong, so I had to look further before the loop itself.

Luckily, the Kickstart with this configuration set was working when none of the instructions were compiled. There is a simple method for finding out which instruction(s) are causing trouble: turn off compiling of all instructions and add them back one by one while restarting the emulation with the same settings over and over again.
This sounds tedious but actually it is much easier than scrolling through megabytes of debug logs and looking for something, because it is procedural. Unfortunately, this method does not work for every possible issue, especially when the combinations of the wrong instructions are causing the problem.

In the end I found that the MOVEA.W AX/DX,AY instruction was the one to blame and a quick look at the compiled code confirmed that the emulation was wrong: for every operation where the target is an address register the involved data must be longword sized.
In this case this simply meant that the word sized source data must be sign-extended while it gets copied into the target address register. I had done this for every other similar instruction, but I missed one case.
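In C terms the difference looks like this. This is just an illustration with made-up function names; the compiled code does this with PowerPC instructions, not with C.

```c
#include <stdint.h>

/* What the missed case effectively did: the upper 16 bits stay zero. */
static uint32_t movea_w_buggy(uint16_t source)
{
    return (uint32_t)source;
}

/* What MOVEA.W must do: sign-extend the word to a longword.
 * Flipping the sign bit and subtracting it again copies bit 15 into
 * bits 16..31, and the unsigned arithmetic is well-defined in C. */
static uint32_t movea_w_fixed(uint16_t source)
{
    return ((uint32_t)source ^ 0x8000u) - 0x8000u;
}
```

For a source word of 0xFFFC (that is -4) the buggy version puts 0x0000FFFC into the address register, while the fixed one puts 0xFFFFFFFC there. Positive words like 0x7FFF come out the same from both, which is why the bug could hide for so long.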

Now, you can probably see why there is no way I could find this bug by looking at the execution logs.

Thanks to kas1e, Thunder and MickJT for reporting bugs!

Wednesday, February 6, 2013

Can I haz time machine?

Another busy month passed, here comes the new update with lots of bug fixes and some freshly implemented instructions:

Implementation of

AND.x reg,reg;
EOR.x reg,reg;
EOR.x reg,mem;
SUBA.x reg,reg;
SUBA.x #imm,reg;
BSET.L #imm,reg;
BCLR.L #imm,reg;
EXG.L reg,reg and
NOP :) instruction.

Reorganized MOVE.x mem,mem instruction: memory reading and writing is separated out into independently callable functions to support the implementation of other similar instructions.
Implementation of OR.x reg,reg instruction was adapted to a more generic form.
Fixed flag checking for AND(I).(W|B) #imm,reg instructions.
Removed confusing OR immediate macroblock and replaced by the already available OR low immediate.
Falling back from addressing mode d16(ax) to (ax) if the offset is zero.
Fixed dependency flag handling in result check helper function.
Cleaned up opcode description table. Removed instructions from the table and from source code which won't be supported by the JIT compiler. Added missing RTM instruction.
Temporary register storage slots were moved from the stack frame to the Regs structure (context).
Removed useless debug log that flooded the output.
Temporarily disabled MOVE.x mem,mem and EOR.x reg,mem instructions to let the Kickstart run.

I have good news and not-so-good news

Which one should we start with? Okay, let me choose for you: not-so-good news first.
At the moment (beside the yet unimplemented instructions) there are two bugs that prevent the OS from running properly:

For some unknown reason when an instruction reads and then writes the memory then the Kickstart goes back to the well-known reboot loop (actually it is a crash, sometimes you can even see the Guru message).
At the moment I don't have the slightest idea why this happens. I traced it back to the normal MOVE instruction, when it copies data from one address to another. There is nothing wrong with the instruction implementation itself - as far as I can tell. So, this is some weird sh*t again. Lovely.
When the optimize flag is turned on (comp_optimize = true in the config) then the Kickstart crashes very early. To tell you the truth, I am not surprised. When I tried to figure out the register dependency for each macroblock for each instruction I sometimes mixed things up, as I found out later on. Due to the wrong register dependency settings for some macroblocks, these were accidentally removed, so some code is "optimized away". Most likely there are more mix-ups and missing dependencies in the code; finding them is just a matter of time.
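How a single missing dependency can optimize away perfectly good code is easy to show on a simplified model. The sketch below is not the real optimizer: the structure, the names and the plain backward pass over register bit masks are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macroblock record: bit n of a mask stands for register n. */
typedef struct {
    uint32_t input_regs;    /* registers the macroblock reads */
    uint32_t output_regs;   /* registers the macroblock writes */
    int      removed;       /* set by the optimizer pass */
} macroblock_t;

/* Backward pass: drops every macroblock whose outputs nobody needs.
 * live_out holds the registers which must survive the compiled block.
 * Returns the number of macroblocks that were kept. */
static size_t optimize_macroblocks(macroblock_t *mb, size_t count, uint32_t live_out)
{
    assert(mb != NULL || count == 0);

    uint32_t live = live_out;
    size_t kept = 0;

    for (size_t i = count; i-- > 0; ) {
        if ((mb[i].output_regs & live) == 0) {
            mb[i].removed = 1;   /* the result is never read: optimized away */
            continue;
        }
        mb[i].removed = 0;
        /* The outputs are satisfied here, the inputs become required. */
        live = (live & ~mb[i].output_regs) | mb[i].input_regs;
        kept++;
    }
    return kept;
}

/* Two macroblocks: the first writes register 1, the second reads it
 * (if the dependency is declared properly) and writes register 2. */
static size_t demo_kept(uint32_t second_block_inputs)
{
    macroblock_t list[2] = {
        { 0,                   UINT32_C(1) << 1, 0 },
        { second_block_inputs, UINT32_C(1) << 2, 0 },
    };
    return optimize_macroblocks(list, 2, UINT32_C(1) << 2);
}
```

When the second macroblock declares register 1 as its input, demo_kept() reports that both macroblocks are kept. When that input is forgotten (a zero mask), only one survives: the first macroblock is thrown away although its result is still needed.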

Good news:
As you can see in the last item for the update: I temporarily disabled the two already implemented instructions which try to read and write the memory. The missing instructions are substituted by using the interpretive instructions.
Now, if you disable the optimize option in the configuration (comp_optimize=false) then the Kickstart seems working with some actual JIT compiled instructions! YAY! \o/ (I guess.)

Some more interesting problems

I have got some feedback about issues with running Kickstart 1.3 and some old Amiga500 games when the JIT is enabled. This is interesting indeed, because even if the JIT was turned on it did not compile this old code: the cache is never turned on (there was no cache in the Motorola 68000 processors when these programs were written).
As you can read in the FAQ: the JIT compiling depends heavily on the cache emulation. So, this is one more question mark. I haven't had much time to investigate it yet.

But most importantly: the Summer Sun is shining, let's go surfing! (Oh, wait. I don't do surfing. Last time when I jumped onto a bodyboard on the beach I bruised my ribs. Embarrassing and totally geeky. Let's just surf the net, shall we?)

Sunday, January 6, 2013

:dancing banana: (sort of)

After more than four months of chasing my own tail on this problem, I have managed to fix up the JIT compiling to let the Kickstart boot using compiled code. (See disclaimer below...)

*Phew*, there were times when I thought I would never get to write this down in the blog. I was this >> << close to giving up on some days. Since the world hasn't ended in 2012, I realized I have to go on, there is no escape.

I am proud to announce: the project stepped into Alpha stage on SourceForge with this update. Details of the changes are:

For the Kickstart boot these fixes were needed:

Added a compiling stop (jump) flag to instructions which might trigger an interrupt for supervisor mode: OR.W #imm,SR, AND.W #imm,SR, EOR.W #imm,SR
Temporary registers are flushed at the end of the compiling cycle, but before the code generation.
Stop the block processing when the special flags were set in the block (an interrupt might be triggered).
Reload the emulated program counter register when the block finishes with a supported instruction.
Fixed wrong function epilog implementation: the return address was read from the wrong position in the stack.
Old executed instruction pointer and emulated PC register are synchronized on PC reload.

Other fixes for the bugs I have discovered while I extensively debugged the emulation:

Fixed missing releasing of the compiling buffer on quitting the emulator (memory leak).
Prevent compiling of tiny blocks (fewer than 4 instructions in a row): the overhead of the block calling is too much.
Fixed compiling buffer overflow checking and misleading help text for the compiling buffer unit size.
Removed supported status for not-yet-implemented EOR.x reg,mem instruction, which was added accidentally before.
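
The tiny-block rule from the list above can be pictured as a simple length check before a block is handed to the compiler. This is only a sketch: the threshold of 4 comes from the changelog entry, while the names `MIN_COMPILED_BLOCK_LENGTH` and `should_compile_block` are made up for illustration and are not taken from the sources.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constant: blocks shorter than this stay interpreted. */
#define MIN_COMPILED_BLOCK_LENGTH 4u

/*
 * Decide whether a run of consecutive instructions is worth compiling.
 * For very short runs the cost of calling the compiled block is
 * assumed to outweigh what the compiled code would save.
 */
bool should_compile_block(size_t instruction_count)
{
    return instruction_count >= MIN_COMPILED_BLOCK_LENGTH;
}
```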

(Disclaimer) Before all of you rejoice in dancing banana overload on the user portal of your choice, there is a catch: the Kickstart is not yet able to start up with JIT compiled instructions; it boots only if the original interpretive instructions are called one-by-one from the compiled code.
So, there is no practical use of the sources yet, but this was the big question: is the compiled code handling able to deal with something as complex as the Kickstart? And for a long time the answer was: no.

Some über-geeky details about the fixes (you like it when I'm talking dirty, right?):

As you can see from the changes, there were numerous problems all around the code, and every one of these changes was needed for the final result. There were the usual ridiculous issues, like a missing negative sign in the function Epilog (line 2308): it tried to read back the return address from the stack, but from the wrong offset.
It essentially means that the execution returned from every compiled block to the parent function instead of the block call loop. The funny familiar feeling that every coder experiences sooner or later: how on EARTH did this thing ever work? ;)

The trickiest part was finding the bug in the interrupt handling: I waded through a few hundred gigabytes of debug dump following the execution. Unfortunately, I was not able to compare the different execution sessions, as I mentioned earlier in the comments for the previous post.

At last, I found out that the OS is switching between User and Supervisor mode in the Exec.lib/Supervisor() function using the special OR immediate instruction, by flipping the S flag in the Status Register. This step triggers an exception which is captured by the OS, and the position of the triggering instruction in the ROM is identified exactly.
The bug was: I never considered this OR instruction to be similar to a TRAP or an ILLEGAL instruction, both of which change the Program Counter by raising an exception - which is essentially a jump. Thus the compiling hadn't stopped to give back the execution to the interpretive emulation.
As a result, the compiled block also contained the instructions following the OR, and the exception was only triggered later, separately: train-wrecking the boot completely. The only way out of there for the OS was to reboot; this is how the reboot-loop happened.
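
The idea behind the fix can be sketched like this: while collecting instructions for a block, the compiler has to treat anything that may raise an exception as the end of the block, exactly like a jump. The instruction kinds and the function names below are hypothetical and heavily simplified (the real JIT works from an opcode table with per-instruction flags), so take it as an illustration of the principle only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced set of instruction kinds. */
typedef enum {
    INSN_MOVE,
    INSN_ADD,
    INSN_OR_IMM_SR,   /* OR.W #imm,SR: can flip the S flag */
    INSN_AND_IMM_SR,  /* AND.W #imm,SR */
    INSN_EOR_IMM_SR,  /* EOR.W #imm,SR */
    INSN_TRAP,
    INSN_ILLEGAL
} insn_kind;

/*
 * True if compiling must stop after this instruction, because it may
 * raise an exception and so change the program counter like a jump.
 */
bool ends_compiled_block(insn_kind kind)
{
    switch (kind) {
    case INSN_OR_IMM_SR:
    case INSN_AND_IMM_SR:
    case INSN_EOR_IMM_SR:
    case INSN_TRAP:
    case INSN_ILLEGAL:
        return true;
    default:
        return false;
    }
}

/*
 * Number of instructions from the start of code[] that may go into one
 * compiled block: everything up to and including the first
 * block-ending instruction, or all of them if there is none.
 */
size_t compiled_block_length(const insn_kind *code, size_t count)
{
    assert(code != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (ends_compiled_block(code[i]))
            return i + 1;
    }
    return count;
}
```

Before the fix, the SR-modifying instructions were missing from the block-ending set, so in a sequence like MOVE, OR.W #imm,SR, ADD the ADD ended up in the same compiled block as the OR.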

Promises?
Now, it gets a lot easier to fix up the different instruction implementations and implement the rest of the missing instructions. As soon as the former is done the OS will be usable and the latter can be done gradually.

I would like to thank Toni Wilen for his hints regarding the possible ways of tracking down bugs inside UAE. His suggestions gave me ideas which eventually led to the required fixes.

There is still a lot more work to be done, but at least I can see the light at the end of the tunnel. (What a cliché, man. Pull yourself together!)

And finally, here is a picture of me, made by my wife to capture the moment when the freakin’ thing started up for the very first time:

OMG, who is this ork-face?

See you all soon(ish) – my holiday is over tomorrow (#sadface).