Saturday, March 15, 2014

PPCJITBETA02 (The Beginning of a Beautiful Friendship)

We have just arrived to another exciting milestone on the long road: all the important instructions for the initial release are implemented under the JIT compiling.*

Lots of bugs were fixed, the emulator is much more stable now than the initial beta release.

Some new features are added too: I have merged the SAM440EP/Flex support (thanks to Soft3) and the CGX overlay for MorphOS (thanks to Thunder and Fab). See configuration documentation regarding how to set the overlay up.

As I already mentioned in the previous post: this beta release was delayed for a couple weeks due to a bug that slipped into the code base long time ago. It was discovered on Mac first, but I was able to reproduce it on MorphOS too. Took me a while to figure out what was going on, but it is fixed now.
This was a very tricky bug, it could be blamed for random crashes and endless loops also, not only on Mac, but on all supported platforms. It was triggered randomly based on the distance between the main application code and the code buffer in memory. (Thanks to Mike for discovering this right before I released the beta.)

I have spent a significant amount of time on figuring out how can I do the build for all supported platforms (AmigaOS4, MorphOS and MacOSX-PPC) using my environments. It wasn't easy, but finally I managed to do most of the release on my own.
As it seems MorphOS SDK does not support G5 yet, so I was not able to do the compiling by myself, but thanks to Fab the G5 executable is also available in the release package.
So, as of now users on all major supported platforms can grab the package and start using the right version.
(Sorry, Linux and BSD folks, you are still on your own.)

 

And the World trembled...

 ...or at least that tiny part which I am involved in when I am wearing my crazy latex suit with a huge letter "A" on my chest for my secret identity: the Amiga Software Developer.

After the first beta release forum posts, emails, news sites, blogs had risen in an enormous unmanageable thunderstorm, struck on me with insane amount of communication. (While the rest of the World barely noticed what have just happened.)
Finally I crawled through messages from every possible (and impossible) source and answered the questions to my best knowledge, accepted the good advices, kindly rejected some nonsense.

 

Aftermath

Since I received tons of feedback (good and bad), I inclined to draw some conclusions from the reaction to the very first beta release. Here is the summary for your benefit:

Some people don't understand how the JIT works and what is the exact purpose of it. All I can say is: please read the documentation... Some other (knowledgeable) folks stood up on the forums and educated the others, well done! I hope this helps, because I really don't have time to deal with it.

Many of the users have irrational expectations for how much the JIT compiling will speed up the emulator. (According to somebody: it supposed to be "ten times faster than the interpretive"... Err... Not likely. How did they come up with any number anyway?)

Well, the implementation is not finished yet, some of you guys don't really understand the concept of "beta release". Okay, I admit I was cheating a bit: technically the JIT compiler wasn't feature-complete when the first beta was done. Yet the remaining pieces were related to not too often used instructions anyway.
For the second beta the instructions are done*, yet there is clearly room for improvement regarding some bugs. Probably as soon as I will be able to fix up the optimization of the register- and flag-flow there will be a significant bump for the speed. (No, not "ten times" fold.)

It is hard to measure how much faster the programs are running and some lovely people baffled on this too. Since there is usually no obvious visual clue for the speedup and a 30%-50% increase in the processor speed is probably hard to notice while your favorite jump-and-run game is jumping and running.
Yet, you can feel that the whole emulation is more snappy than before probably even when you simply run Workbench. Except when it crashes. But even then: it crashes 30%-50% faster! :)

Too bad that some good souls are obsessed with their favourite game/program and keep saying that the JIT is worthless because it doesn't make any difference for that particular piece of software. As it seems this JIT compiling is not for you then.

There was one more interesting thing what I have noticed too late unfortunately: G5 support for MorphOS. Since I don't have a G5 machine I never considered that there is a need for that. But there were some murmur about the speed of the MorphOS version on G5 on some forums. No wonder: it needs a special version, which can be compiled from the sources for some time now. (Thanks to Tobias Netzel and to Fab for the special build.)
Probably the same applies to the PA Semi processor and the Amiga X1000, but I don't have that one either. (Donations? :)
Also the mysterious support for SAM440EP/Flex, what I have never heard of before. No wonder it was missed.

Fun fact from the Outer World: I tried to explain to my colleagues how I spent my Summer vacation. However, I am significantly older than almost any of them, so they were looking at me with confusion in their eyes mixed with a little pity. "Yea, my father loves fooling around with those old machines too!" - was one of the comments. Well put, Sir, well put.

 

Anyway...

To make you (some other geeks around the World) happy: here is the new beta...


In case you stop reading here (or you already skipped the first cheesy part):
as always, please read the README for your comfort and safety. Thanks.

Since I bought an iBook for 50 NZD, now I can produce the MorphOS and the MacOSX versions too which were also included in this release together with the AmigaOS4 version. (And by buying a Mac I broke one of my principles: no Apple product crosses the door of my house. I hope you guys are content what you were doing...)

You can find the changes since the last beta in the README file, or in the changesets at the SourceForge repository from R67 down to R53.

 

Fragmentation

I must admit I have learned a lot in the past month about the sorry state of the E-UAE project. I didn't know what is the current situation of the various binary releases until I received some references to modified AmigaOS4 and MorphOS binary versions.
I guess this is the destiny of any abandoned open source project: lots of good people is trying to improve it, but nobody is standing up and takes over the maintenance of the project.

Well, I am of the same kind, as it seems. It was never my goal to take the ownership of the E-UAE project or fork it into a new iteration.
However, as soon as I released the first beta of the JIT compiled version the watching eye of the public turned to my little scared pet project and I received lot of questions about whether this-or-that particular fix from various developers were included or not. (Mostly not.)

To satisfy at least some part of the user base I tried to gather the various fixes from every corner of the Internet and applied them on the source code. This means no way new base source repository for the E-UAE project, but at least it will help whoever wants to grab the torch and probably it will be useful for you, dear user in the meanwhile in the form of the beta releases.

 

Progress indicator

As of now I switch from batch release strategy to immediate update. This means: I will commit each change one by one to the SourceForge repository as soon as the change is ready instead of buffering up lots of changes locally and commit them in a big changeset.
So, if you look for the repository changesets and the tickets then you can watch the progress of the project closely.

I also make use of the tickets in the completion of the various fixes and tasks:

https://sourceforge.net/p/euaeppcjit/tickets

I added milestones to the tickets, so you can get a feeling of the upcoming beta and the included changes, fixes.
Open tickets are defining the majority of the outstanding work. I am currently working on the accepted ticket, while pending tickets are already committed to the repository, but not released in binary form yet. Released tickets are the closed ones.

For PPCJITBET03 you can find the planned changes here:

https://sourceforge.net/p/euaeppcjit/tickets/milestone/PPCJITBETA03

There is also a milestone named "PARKED" which is a holding box for the various bugs and problems that are not considered for this project (yet).

 

Thanks

Finally, big thanks goes to: Thunder, kas1e, MickJT, Fab, Tobias Netzel and Mike Blackburn for helping me with lots of things regarding bug finding, fixing, platform support and constantly watching out for the updates on the repository.

I am still waiting for any (detailed) bug reports, just have a good read of the README file before you jump to your email client.

Footnote
*There is a fine print here: I was struggling with CMP2 instruction and finally I gave up after a couple days. The binary code for the instruction is bundled with CHK2 and I couldn't figure out how solve the exception handling for that. So, this instruction remains unimplemented for now, not a big deal luckily.

Sunday, February 23, 2014

Stepping into the fourth year

And here we are again: the development of the PowerPC JIT compiling for E-UAE
passed the third year, stepping into the fourth.

Original image by OpenClipart


But what a year! Yes, finally we are getting very close to the Real Thing(tm). Even a beta version is available from the project, although probably it is not particularly useful to anybody who is not the adventurous type. There are many bugs to chase and also there is room to improve.

I know everybody is eagerly waiting for the second beta. It was prepared more than two weeks ago, the sources are released to SourceForge, the post is ready to launch...

But then suddenly Mike Blackburn came back to me (big thanks for that) and mentioned a show-stopper bug on Mac. In the meanwhile I confirmed that the same bug does exist on MorphOS too, so probably it is just a coincidence that the emulation works on AmigaOS4.

Right now I am trying to figure out what is going on, but for that I needed access to a Mac. So, it took me a while to set up everything and getting ready to debug this. After spending a couple days on chasing my own tail already I am no closer to the solution yet, but I promise I put all my free time into this.

Please relax and enjoy the beautiful Summer at the Southern Hemisphere and try not to freeze to death in the Winter at the North.

Saturday, January 4, 2014

Wednesday, January 1, 2014

PPCJITBETA01 (Happy New Year 2014)

First of all:


I wish all of you guys an awesome happy new year for 2014!

We made it this far, there was no nuclear holocaust yet, which is an amazing achievement for the human race considering our lovely nature. Well done.

And now something completely different, but almost equally exciting (probably for much less human beings, can't speak for aliens)...

I woke up in the morning and had a look at the clock and it showed me:


BETA TIME!

Yes, it is unbelievable, I know. After this long-long waiting finally here it comes.
And it even comes with bugs! Lots of it!

Now seriously, if you are interested in testing the PowerPC JIT then go and get your binary release from the SourceForge page:


Tiny catch: at the moment only the AmigaOS4 version is available. See below.

You might also need a previous distribution which includes all the tools, like transrom, mousehack, make-hdf, etc. I tried to compile these, but somehow the cross-compiling failed and I ran out of patience.

What is included in beta01?

Before you fire up your favorite Amiga software in the emulator, please DO READ the README file! It will save you (and me) lots of wasted time, I guarantee.

The very first thing I have to mention is that although I had done my best I was not able to finish implementing all the planned instructions. We are very close, according to the statistics 95.09% of the instructions (368 out of 387) are done.
The reason is quite simple: the recently implemented instructions were the most complex ones. Well, you know this is exactly why you must not procrastinate the hardest part of the job just because it makes you nervous even simply thinking about it...
I have spent days on implementing BFINS and the exception handling for division by zero and I am still not convinced that it worth the effort. Anyway...
The following instructions are still emulated by the interpretive:
  • long versions of the division and multiplication instructions (DIVU.L/DIVS.L, MULU.L/MULS.L);
  • compare against bounds instructions (CMP2);
  • all the decimal data handling instructions (ABCD, SBCD, NBCD, PACK, UNPK).
The comparison and the decimal data handling is not that important, but the long division-multiplication are used quite often.
So that is sill a sore point, yet I have decided that I will release the beta without these instructions to get some feedback.
The missing instructions will be implemented soon, probably in the next beta release.

 

What is new in the sources?

 

Fixes and more

Since my last update I have managed to complete some more instructions and lots of bug fixes (thanks to Philippe Ferrucci and Davide Palombo, who insisted to demonstrate the JIT compiling at Alchimie and Pianeta Amiga shows, so I had to fix the most obvious bugs).
I don't want to bore you with the details regarding the current changes, have a look at this update if you are interested.

 

Source alignment

Another important change was: I have merged the final sources for E-UAE 0.8.29 over my changes.
Many thanks to Michael Trebilcock (MickJT) for driving my attention to the fact that I was using an outdated source version (0.8.29-WIP4) instead of the latest from the CVS (dated to 20/08/2008).
For the changes please have a look at these two updates: original source, fix for audio.

 

The LED (round#2)

Philippe Ferrucci pointed out that sometimes it is hard to tell whether the JIT is available and working or not, in spite of the already available JIT LED.
He suggested that the LED might also indicate other states of the JIT compiling, and I had found that a really useful idea.
Now, you can identify three distinct states of the JIT compiling from the LED colors:
1. Blinking green with "JIT" text on it: JIT compiling is active and the compiled code is executed.
The level of green shows you how active is the JIT compiled code compared to the interpretive-executed. (Same as before.)

2. Solid red with "JIT" text on it: the JIT compiling was set up, but the processor cache is turned off by the currently running software in the emulator, JIT compiling is not done while the cache is not turned on.

3. Solid black without "JIT" text on it: due to the emulator configuration the JIT compiling is not available.
Either no code cache was set up or specified processor type does not support the processor cache.



Some help needed

Unfortunately, I have no idea how to compile the sources for other platforms than AmigaOS4. If you feel like you know enough about how to compile these sources please get in touch with me. I am looking for MorphOS, MacOSX PPC, Linux PPC versions especially, but any other supported platforms are welcome. (Thunder? Tobias? Mike? :)

If you feel like there is a bug and you want to report it badly then please DO READ the How to report a bug section from the README before you jump on your mail client. Thanks.

Final words

It is a good start for a year, isn't it? I hope I can keep up and finally you can enjoy the benefits of the JIT compiling on your PowerPC machine.

Wednesday, December 25, 2013

Happy Holidays!

I wish Happy Holidays to all the Amigans around the globe!


Are you still hoping for a late Santa? Wink, wink... ;)

Monday, November 4, 2013

On a collision course

So, where are we at with the recent update? 349 out of 387: no, it is not 100% yet, but a nice, fat 90.1%!

Quickly, what was done this time and let's jump to conclusions right after that:
  • Implementation of BFINS Dx,Da{y,z}, BFFFO Dx{y,z},Da, BFEXTS Dx{y,z},Da, BFEXTU Dx{y,z},Da, BFCLR Dx{y,z}, BFTST Dx{y,z}, BFSET Dx{y,z}, BFCHG Dx{y,z}, ROXR.x Dy,Dz, ROXR.W mem, ROXR.x #imm,Dy, ROXL.x Dy,Dz, ROXL.W mem, ROXL.x #imm,Dy, ASR.W mem, ASL.W mem, LSR.W mem, LSL.W mem, ROR.W mem, ROL.W mem instructions.
  • Opcode compiler fucntions for different operation sizes are unified, unnecessary helper functions were removed.
  • Macroblock protos are generated from the opcode table file rather than used the manually prepared file.
  • Fixed wrong function name for the CNTLZW PowerPC instruction emitter.
  • Fixed missing input register in C and X flag extraction macroblock which is used in some shift instructions.
  • Fixed register dependency in ASR.x Dy,Dz and LSR.x Dy,Dz instructions.
(Slightly unrelated: I have found a bug in the original E-UAE code for BFINS and flags, if I will have enough patience I will fix that. And another one in Petunia, that will be fixed too...)

Now, we are getting dangerously close to the beta stage. It is really hard not to give you guys any promises what I cannot keep later on... (Summer is coming, you know... ;)

Anyway, my plans for the near future (read: when it is done) are:
  1. After I have finished with the implementation of all instructions which were anticipated earlier I am going to stabilize the emulator a bit and clean up the code if needed.
  2. I will try to release a compiled beta version and put together some documentation for using/testing it.
  3. There are some outstanding issues, one of the most critical one is fixing the problems with the macroblock optimizer.
So much for the crystal ball and now...

Something completely different

I have to admit that my posts were not that interesting to read recently. Earlier I invested more effort into the posts and it was probably more fun to read, especially to the developers.
I would like to bring back that tradition, so for this update I came up with a few interesting thoughts on:

How to avoid the decisions

I know that many developers love to procrastinate anything, including (but not limited to) decisions. But what I am about to write is not how lazy my fellow developers are, but rather how to get around a situation when it is not ideal to do a comparison and branch according to the result.

Why would that be important at all?
There are a number of reasons why it is better to avoid branching, this article lists many examples for that and also explains the reasons. It basically boils down to the following reasons:
In our case none of these reasons apply, but unfortunately branching is not really possible due to the static flow analysis on the macroblock list.

To put it simply: each macroblock is depending on the results of the previously executed macroblocks, if we skip ahead then it is impossible to tell whether those required macroblocks were executed or not before the dependent macroblocks.

Why does this cause any trouble? Because sometimes it is pretty hard to avoid conditions inside the instruction implementations.

I already faced this issue earlier, when I started to work on the conditional branching and setting instructions (Bcc and Scc). There was no possible way to avoid the conditional branching for these instructions due to its very nature of the instructions.
I ended up creating a very specific macroblock which embeds the condition checking and branching, so the "inter-macroblock" flow analysis remains intact.

In the recent update you can find many bit field instructions which were complicated enough already and then the very same issue came up again:

For bit field instructions 0 (zero) bit field width means 32 bit actually. This doesn't sound that bad, however sometimes the bit field width is coming from an emulated data register instead of a statically encoded constant. In this case the decision must be done in the emulated code instead of in compiling time.

Not in every case, but quite often it is possible to calculate the result instead of making a comparison and branching.

For the bit field instructions the solution was (written in C code, just because it is easier to understand):

temp = width - 1;
width = width | (temp & 32);

(There is one condition, though: the width must be between 0 (zero) and 31 before this operation starts.)

Now, think about it a little bit and figure out on your own why does it work. I am not going to explain it. :)

See you soon.

Wednesday, October 2, 2013

Back on track

Finally, I am back on track with the development after the house move. There were two updates since the last post; one was a follow-up to the previous code refactoring; the other was a massive list of improvements:
  • Implementation of NEGX.x mem, NEGX.x Dy, SUBX.x -(Ay),-(Az), ADDX.x -(Ay),-(Az),  MOVE CCR,mem, MOVE CCR,Dx, EOR #imm,CCR, OR #imm,CCR, AND #imm,CCR, RTR, MOVE mem,CCR, MOVE Dx,CCR, MOVE #imm,CCR, ASR.x Dy,Dz, RTD, MULU.W mem,Dx, MULU.W #imm,Dx, MULS.W mem,Dx, MULS.W #imm,Dx, SUBX.x Dy,Dz, ADDX.x Dy,Dz, MOVEM.x regs,mem, MOVEM.x mem,regs, MOVEM.x (Ay)+,regs, CMPM.x (Ax)+,(Ay)+ instructions.
  • Added dependency tracking for non-volatile PowerPC registers.
  • Fixed X flag handling in register-based shifting instructions, previously the X flag was cleared together with the C flag if the shift steps were zero.
  • Removed RTM from the list of the potentially supported opcodes.
  • Added RTR back to the list of the potentially supported opcodes.
  • Optimized temporary register usage in MULU.W Dx,Dy and MULS.W Dx,Dy instructions.
  • Introduced tracking of the extension words after the instructions, it is needed for adjusting the PC before certain addressing modes are processed.
  • Fixed register dependency and order of register storage for MOVEM.x regs,-(Ay) when direct memory access is enabled.
  • Implemented stack-like concept for register saving into the context.
  • Code cleanup: removed unused reference, fixed some warnings regarding misformatted and unused code lines.
So, things are getting together slowly. The number of the implemented instructions went up to 321 out of the planned 387! That is roughly 82%... Getting closer and closer... :)

Recently I faced an interesting problem, I am not quite sure how can I solve: division by zero. My beloved math teacher already told me: who is trying to divide by zero is an idiot. (Well, that is not quite right, as we know it.) Yet, some programs might try it.
Why is that a problem? Because it triggers an exception inside a compiled block. It also needs branching (skip the exception triggering if the divisor is not null, for a change) which contradicts the macroblock register flow tracking. Well, here is the challenge, but I am pretty sure I will solve it somehow.

'Til then the usual: watch this space.