This project is dedicated to the memory of William Morris (aka Frags), who was the main contributor to the bounty but was unable to see the final result.


Wednesday, January 1, 2014

PPCJITBETA01 (Happy New Year 2014)

First of all:


I wish all of you guys an awesome happy new year for 2014!

We made it this far and there has been no nuclear holocaust yet, which is an amazing achievement for the human race considering our lovely nature. Well done.

And now something completely different, but almost equally exciting (probably for far fewer human beings; can't speak for aliens)...

I woke up in the morning, had a look at the clock, and it showed me:


BETA TIME!

Yes, it is unbelievable, I know. After this long, long wait, here it finally comes.
And it even comes with bugs! Lots of them!

Now seriously, if you are interested in testing the PowerPC JIT then go and get your binary release from the SourceForge page:


Tiny catch: at the moment only the AmigaOS4 version is available. See below.

You might also need a previous distribution which includes all the tools, like transrom, mousehack, make-hdf, etc. I tried to compile these, but somehow the cross-compiling failed and I ran out of patience.

What is included in beta01?

Before you fire up your favorite Amiga software in the emulator, please DO READ the README file! It will save you (and me) lots of wasted time, I guarantee.

The very first thing I have to mention is that although I did my best, I was not able to finish implementing all the planned instructions. We are very close: according to the statistics, 95.09% of the instructions (368 out of 387) are done.
The reason is quite simple: the recently implemented instructions were the most complex ones. Well, you know, this is exactly why you must not put off the hardest part of the job just because even thinking about it makes you nervous...
I have spent days on implementing BFINS and the exception handling for division by zero, and I am still not convinced that it was worth the effort. Anyway...
The following instructions are still emulated by the interpreter:
  • long versions of the division and multiplication instructions (DIVU.L/DIVS.L, MULU.L/MULS.L);
  • compare against bounds instructions (CMP2);
  • all the decimal data handling instructions (ABCD, SBCD, NBCD, PACK, UNPK).
The comparison and decimal data handling instructions are not that important, but the long division and multiplication instructions are used quite often.
So that is still a sore point, yet I have decided to release the beta without these instructions to get some feedback.
The missing instructions will be implemented soon, probably in the next beta release.

 

What is new in the sources?

 

Fixes and more

Since my last update I have managed to complete some more instructions and make lots of bug fixes (thanks to Philippe Ferrucci and Davide Palombo, who insisted on demonstrating the JIT compiling at the Alchimie and Pianeta Amiga shows, so I had to fix the most obvious bugs).
I don't want to bore you with the details regarding the current changes, have a look at this update if you are interested.

 

Source alignment

Another important change: I have merged the final sources for E-UAE 0.8.29 over my changes.
Many thanks to Michael Trebilcock (MickJT) for drawing my attention to the fact that I was using an outdated source version (0.8.29-WIP4) instead of the latest from the CVS (dated 20/08/2008).
For the changes please have a look at these two updates: original source, fix for audio.

 

The LED (round #2)

Philippe Ferrucci pointed out that sometimes it is hard to tell whether the JIT is available and working or not, in spite of the already available JIT LED.
He suggested that the LED might also indicate other states of the JIT compiling, and I found that a really useful idea.
Now, you can identify three distinct states of the JIT compiling from the LED colors:
1. Blinking green with "JIT" text on it: JIT compiling is active and the compiled code is executed.
The level of green shows you how much code is executed as JIT-compiled code compared to interpreted code. (Same as before.)

2. Solid red with "JIT" text on it: the JIT compiling was set up, but the processor cache was turned off by the software currently running in the emulator; no JIT compiling is done until the cache is turned back on.

3. Solid black without "JIT" text on it: due to the emulator configuration the JIT compiling is not available.
Either no code cache was set up or the specified processor type does not support the processor cache.
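To make the three states concrete, here is a minimal C sketch of the decision behind the LED. The enum and function names are hypothetical, not taken from the E-UAE sources, and the brightness of the green LED (the activity level) is not modeled; the sketch only picks which of the three states applies.

```c
#include <stdbool.h>

/* Hypothetical LED states, mirroring the three cases described above. */
typedef enum {
    JIT_LED_OFF,        /* solid black, no "JIT" text: JIT not available      */
    JIT_LED_SUSPENDED,  /* solid red, "JIT" text: guest turned the cache off   */
    JIT_LED_ACTIVE      /* blinking green, "JIT" text: compiled code executes  */
} jit_led_state;

/* Hypothetical helper: choose the LED state from the emulator configuration
   and from the cache state set by the software running in the emulator. */
jit_led_state jit_led_state_for(bool code_cache_configured,
                                bool cpu_has_cache,
                                bool guest_cache_enabled)
{
    /* Configuration does not allow JIT compiling at all. */
    if (!code_cache_configured || !cpu_has_cache)
        return JIT_LED_OFF;

    /* JIT is set up, but compiling waits until the cache is turned on. */
    if (!guest_cache_enabled)
        return JIT_LED_SUSPENDED;

    return JIT_LED_ACTIVE;
}
```

The configuration checks come first because, if either fails, the guest cache setting is irrelevant: the JIT can never run.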



Some help needed

Unfortunately, I have no idea how to compile the sources for platforms other than AmigaOS4. If you know enough about how to compile these sources, please get in touch with me. I am especially looking for MorphOS, MacOSX PPC and Linux PPC versions, but any other supported platforms are welcome. (Thunder? Tobias? Mike? :)

If you think you have found a bug and you badly want to report it, then please DO READ the "How to report a bug" section of the README before you jump on your mail client. Thanks.

Final words

It is a good start for a year, isn't it? I hope I can keep it up so that you can finally enjoy the benefits of the JIT compiling on your PowerPC machine.

Monday, November 4, 2013

On a collision course

So, where are we with the recent update? 349 out of 387: no, it is not 100% yet, but a nice, fat 90.2%!

Quickly, here is what was done this time, and let's jump to conclusions right after that:
  • Implementation of BFINS Dx,Da{y,z}, BFFFO Dx{y,z},Da, BFEXTS Dx{y,z},Da, BFEXTU Dx{y,z},Da, BFCLR Dx{y,z}, BFTST Dx{y,z}, BFSET Dx{y,z}, BFCHG Dx{y,z}, ROXR.x Dy,Dz, ROXR.W mem, ROXR.x #imm,Dy, ROXL.x Dy,Dz, ROXL.W mem, ROXL.x #imm,Dy, ASR.W mem, ASL.W mem, LSR.W mem, LSL.W mem, ROR.W mem, ROL.W mem instructions.
  • Opcode compiler functions for different operation sizes are unified; unnecessary helper functions were removed.
  • Macroblock protos are generated from the opcode table file rather than taken from the manually prepared file.
  • Fixed wrong function name for the CNTLZW PowerPC instruction emitter.
  • Fixed missing input register in C and X flag extraction macroblock which is used in some shift instructions.
  • Fixed register dependency in ASR.x Dy,Dz and LSR.x Dy,Dz instructions.
(Slightly unrelated: I have found a bug in the original E-UAE code for BFINS and flags; if I have enough patience I will fix that. And another one in Petunia, that will be fixed too...)

Now, we are getting dangerously close to the beta stage. It is really hard not to make you guys promises that I cannot keep later on... (Summer is coming, you know... ;)

Anyway, my plans for the near future (read: when it is done) are:
  1. After I have finished implementing all the instructions that were planned earlier, I am going to stabilize the emulator a bit and clean up the code if needed.
  2. I will try to release a compiled beta version and put together some documentation for using/testing it.
  3. There are some outstanding issues; one of the most critical ones is fixing the problems with the macroblock optimizer.
So much for the crystal ball and now...

Something completely different

I have to admit that my recent posts were not that interesting to read. Earlier I invested more effort into the posts and they were probably more fun to read, especially for developers.
I would like to bring back that tradition, so for this update I came up with a few interesting thoughts on:

How to avoid the decisions

I know that many developers love to procrastinate on anything, including (but not limited to) decisions. But what I am about to write is not about how lazy my fellow developers are, but rather about how to get around situations where it is not ideal to do a comparison and branch according to the result.

Why would that be important at all?
There are a number of reasons why it is better to avoid branching; this article lists many examples of that and also explains the reasons. It basically boils down to the following reasons:
In our case none of these reasons apply, but unfortunately branching is not really possible due to the static flow analysis on the macroblock list.

To put it simply: each macroblock depends on the results of the previously executed macroblocks; if we skip ahead, it is impossible to tell whether the required macroblocks were executed before the dependent ones.
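A minimal C sketch of why straight-line order matters, assuming a simplified macroblock that only records which registers it reads and writes (the `macroblock` type and `mark_needed` function are hypothetical, not the real E-UAE structures). A backward pass keeps a macroblock only if a later one reads its output. That conclusion is only valid if every macroblock in the list actually runs, in order: if a branch could skip a block whose output was counted as consumed, the dependent block would read a value that was never computed. Blocks with side effects (memory writes, for example) would have to be marked as always needed; the sketch ignores that.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified macroblock: register bit masks only. */
typedef struct {
    uint32_t inputs;   /* registers read by this macroblock    */
    uint32_t outputs;  /* registers written by this macroblock */
    bool     needed;   /* filled in by mark_needed()           */
} macroblock;

/* Backward liveness pass over a straight-line macroblock list.
   live_out: registers whose values must be valid after the last block.
   Returns the number of macroblocks that must be kept. */
size_t mark_needed(macroblock *blocks, size_t count, uint32_t live_out)
{
    uint32_t live = live_out;
    size_t kept = 0;

    for (size_t i = count; i-- > 0; ) {
        blocks[i].needed = (blocks[i].outputs & live) != 0;
        if (blocks[i].needed) {
            live &= ~blocks[i].outputs;  /* these values are produced here   */
            live |= blocks[i].inputs;    /* ...so their inputs become live   */
            kept++;
        }
    }
    return kept;
}
```

For example, with three blocks where block 0 writes r1, block 1 reads r1 and writes r2, and block 2 writes r3, and only r2 live at the end, blocks 0 and 1 are kept and block 2 is dropped.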

Why does this cause any trouble? Because sometimes it is pretty hard to avoid conditions inside the instruction implementations.

I already faced this issue earlier, when I started to work on the conditional branching and setting instructions (Bcc and Scc). There was no way to avoid the conditional branching for these instructions due to their very nature.
I ended up creating a very specific macroblock which embeds the condition checking and branching, so the "inter-macroblock" flow analysis remains intact.

In the recent update you can find many bit field instructions, which were complicated enough already, and then the very same issue came up again:

For bit field instructions, a bit field width of 0 (zero) actually means 32 bits. This doesn't sound that bad; however, sometimes the bit field width comes from an emulated data register instead of a statically encoded constant. In this case the decision must be made by the compiled code at run time instead of at compile time.

Not in every case, but quite often it is possible to calculate the result instead of making a comparison and branching.

For the bit field instructions the solution was (written in C code, just because it is easier to understand):

temp = width - 1;
width = width | (temp & 32);

(There is one condition, though: the width must be between 0 (zero) and 31 before this operation starts.)

Now, think about it a little bit and figure out on your own why it works. I am not going to explain it. :)
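For readers who would rather test the claim than take it on trust, a small C harness can compare the trick with the obvious comparison-based version for every allowed input (0 to 31). The function names are made up for this sketch; unsigned arithmetic is used to keep the subtraction well defined.

```c
#include <stdint.h>

/* The branchless formula from the post; valid for width in 0..31. */
uint32_t bf_width_branchless(uint32_t width)
{
    uint32_t temp = width - 1;
    return width | (temp & 32);
}

/* The straightforward version with a comparison, for reference. */
uint32_t bf_width_branching(uint32_t width)
{
    return width == 0 ? 32 : width;
}

/* Compare both versions for every allowed input.
   Returns the number of inputs where they disagree. */
int bf_width_check_all(void)
{
    int mismatches = 0;
    for (uint32_t w = 0; w < 32; w++)
        if (bf_width_branchless(w) != bf_width_branching(w))
            mismatches++;
    return mismatches;
}
```

`bf_width_check_all()` returns 0, so the two versions agree on the whole 0 to 31 range.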

See you soon.

Wednesday, October 2, 2013

Back on track

Finally, I am back on track with the development after the house move. There have been two updates since the last post: one was a follow-up to the previous code refactoring, the other was a massive list of improvements:
  • Implementation of NEGX.x mem, NEGX.x Dy, SUBX.x -(Ay),-(Az), ADDX.x -(Ay),-(Az), MOVE CCR,mem, MOVE CCR,Dx, EOR #imm,CCR, OR #imm,CCR, AND #imm,CCR, RTR, MOVE mem,CCR, MOVE Dx,CCR, MOVE #imm,CCR, ASR.x Dy,Dz, RTD, MULU.W mem,Dx, MULU.W #imm,Dx, MULS.W mem,Dx, MULS.W #imm,Dx, SUBX.x Dy,Dz, ADDX.x Dy,Dz, MOVEM.x regs,mem, MOVEM.x mem,regs, MOVEM.x (Ay)+,regs, CMPM.x (Ax)+,(Ay)+ instructions.
  • Added dependency tracking for non-volatile PowerPC registers.
  • Fixed X flag handling in register-based shifting instructions: previously the X flag was cleared together with the C flag if the shift count was zero.
  • Removed RTM from the list of the potentially supported opcodes.
  • Added RTR back to the list of the potentially supported opcodes.
  • Optimized temporary register usage in MULU.W Dx,Dy and MULS.W Dx,Dy instructions.
  • Introduced tracking of the extension words after the instructions; this is needed for adjusting the PC before certain addressing modes are processed.
  • Fixed register dependency and order of register storage for MOVEM.x regs,-(Ay) when direct memory access is enabled.
  • Implemented stack-like concept for register saving into the context.
  • Code cleanup: removed unused reference, fixed some warnings regarding misformatted and unused code lines.
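To illustrate the X flag fix listed above, here is a minimal C sketch of the reference semantics of LSR.L Dy,Dz (not the PowerPC code the JIT emits). It follows the Motorola 68000 programmer's reference: a shift count taken from a register is modulo 64, and for a count of zero C is cleared while X is left unaffected. The `ccr_flags` type and `lsr_l` function are hypothetical names for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical condition code container, not E-UAE's representation. */
typedef struct { bool x, n, z, v, c; } ccr_flags;

/* Reference semantics of LSR.L Dy,Dz: shifts 'value' right by the count
   held in 'count_reg' and updates the flags. */
uint32_t lsr_l(uint32_t value, uint32_t count_reg, ccr_flags *ccr)
{
    uint32_t count = count_reg & 63;   /* register counts are modulo 64 */
    uint32_t result;

    if (count == 0) {
        result = value;
        ccr->c = false;                /* C is cleared...               */
                                       /* ...but X is left untouched    */
    } else {
        /* Last bit shifted out; beyond 32 steps only zeros come out. */
        bool last_out = count <= 32 ? ((value >> (count - 1)) & 1) != 0 : false;
        result = count >= 32 ? 0 : value >> count;
        ccr->c = last_out;
        ccr->x = last_out;
    }
    ccr->n = (result >> 31) != 0;
    ccr->z = result == 0;
    ccr->v = false;
    return result;
}
```

The zero-count branch is exactly where the old code went wrong: it cleared X together with C.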
So, things are coming together slowly. The number of implemented instructions went up to 321 out of the planned 387! That is roughly 83%... Getting closer and closer... :)

Recently I faced an interesting problem that I am not quite sure how to solve: division by zero. My beloved math teacher once told me: anyone who tries to divide by zero is an idiot. (Well, that is not quite right, as we know.) Yet, some programs might try it.
Why is that a problem? Because it triggers an exception inside a compiled block. It also needs branching (skip triggering the exception if the divisor is not zero, for a change), which contradicts the macroblock register flow tracking. Well, here is the challenge, but I am pretty sure I will solve it somehow.

'Til then the usual: watch this space.

Sunday, August 18, 2013

Moving stuff around

Long time no see! No wonder: we have just moved to a new house, which usually means lots of things to deal with and boxes wherever I look. (I hate this soo much! I badly need my daily routine, but how can I find my stuff?!) Finally, we are getting settled again and I was able to dig the good ol' Amiga out from under the pile of boxes. And this new house is just awesome, so it was worth all the pain we went through.

Interesting coincidence, but right before we started the whole house-moving craziness (seems like ages ago, BTW) I decided to refactor some code in the E-UAE JIT, namely the handling of the temporary registers.
It was such a bad design, or rather no design at all: I used the number of allocated temporary registers to index a couple of arrays holding the relevant data. Can you believe this? In 2013? This was unacceptable even in the '70s. There are so many reasons why you shouldn't do that.
Also what is the point of using a strongly typed language, if we don't use distinct types? In this case all the temporary registers along with the actually mapped PowerPC processor registers are passed to the functions as integers. Very easy to misuse. Bad, bad, bad.

In the recent changes I have managed to introduce the concept of a temporary register-specific type structure, which is also able to carry around the mapped PowerPC processor register number and the register dependency map for the macroblocks.

This new structure can be used when the macroblocks are collected (so mostly in the helper functions), but the code emitters for the macroblocks are dealing with the mapped PowerPC registers only. Now, I still need to solve that one: at the moment these are still passed to the functions as integers.
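One possible way to close that remaining gap, sketched in C under assumptions: wrap the PowerPC register number in a one-field struct, so an emitter declared to take that type rejects a plain `int` at compile time at no runtime cost. The `ppc_reg`, `temp_reg` and `encode_addi` names are hypothetical and not from the E-UAE sources; only the `addi` encoding (primary opcode 14, rD, rA, 16-bit signed immediate) is the standard PowerPC one.

```c
#include <stdint.h>

/* Hypothetical wrapper: a PowerPC register number that the compiler
   will not silently mix up with a plain int. */
typedef struct { uint8_t num; } ppc_reg;

/* Hypothetical temporary register descriptor, loosely following the post:
   it carries the mapped PowerPC register and a dependency map. */
typedef struct {
    ppc_reg  mapped;      /* PowerPC register backing this temporary     */
    uint64_t dependency;  /* bit mask used by the macroblock flow analysis */
} temp_reg;

/* Emit "addi rD,rA,SIMM". Only ppc_reg values are accepted as registers. */
uint32_t encode_addi(ppc_reg rd, ppc_reg ra, int16_t simm)
{
    return (14u << 26)                  /* primary opcode 14           */
         | ((uint32_t)rd.num << 21)     /* destination register        */
         | ((uint32_t)ra.num << 16)     /* source register             */
         | (uint16_t)simm;              /* 16-bit immediate, two's complement */
}
```

For example, `encode_addi((ppc_reg){3}, (ppc_reg){1}, 8)` yields 0x38610008 (`addi r3,r1,8`), while `encode_addi(3, 1, 8)` fails to compile because an `int` is not a `ppc_reg`.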

And what is the visible output for you guys? Nothing, if we are lucky. ;)
Yea, I know. It is always hard to justify to the business owners the time spent on paying off technical debt. But this was a long-outstanding one and I always had the itch to fix it.

So, enjoy your Summer while it lasts and bear with me.