This project is dedicated to the memory of William Morris (aka Frags), who was the main contributor to the bounty but was unable to see the final result.

Showing posts with label bugs. Show all posts

Monday, November 3, 2014

PPCJITBETA05 (Final Countdown)

Yes, yes, my dear friends: we are very close now! This is the final countdown indeed: the last beta before the final release of 1.0.

Are you excited? I bet you are. In the meantime, download and enjoy the new beta:


To sum up what you will get with the new beta: mainly bugfixes. All the features are locked down for the release, and the ticket list for release 1.0 is practically empty. I am waiting for any bugs you find before the release.

So, please do report the bugs you find. But again: it is very important to try to verify the issue before you decide to report it. Please follow the steps which are documented in the README file.

Your efforts are greatly appreciated.

JIT compatibility diagnostics

I have rejected a few bug reports because the programs in question are not compatible with the current JIT compiler implementation. The reason is very simple: if a program modifies itself without flushing the instruction cache properly, then the modified code won't be recompiled and the program will misbehave. One example is Where Time Stood Still.

These programs might work on a real processor and still fail with the JIT compiler, because the cache handling is not emulated exactly the way it behaves on a real processor. (Namely: the number of cache lines is much larger than on a real processor, so more code is "cached".)

This is not ideal, but for now only a handful of programs depend on the cache size, so it won't cause too much trouble.

How can you tell whether a program is compatible with the JIT compiler?

That is a very valid question. And here is the answer: there is an option for that! (At least there is now.)
I have wasted so much time chasing errors coming from this issue that I finally decided to implement a diagnostic configuration option for it. It is called:
comp_test_consistency

Usual disclaimer: read the documentation, and if you don't understand what it does, then don't turn it on.

Actually, it is pretty simple: in addition to every compiled block of instructions, the compiler also compiles a check which compares the original content of the memory that the block was compiled from with the current content. If they don't match, the emulation stops.
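
To make the idea a bit more tangible, here is a minimal sketch in plain C. It is not the emulator's code: the real check is emitted as part of the compiled block, while this model only keeps a copy of the source bytes and compares it later. The names compiled_block, block_compile, block_is_consistent and the SNAPSHOT_MAX limit are made up for the illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bookkeeping for one compiled block (illustration only). */
#define SNAPSHOT_MAX 64

typedef struct {
    const uint8_t *source;          /* emulated memory the block was compiled from */
    size_t length;                  /* number of source bytes covered by the block */
    uint8_t snapshot[SNAPSHOT_MAX]; /* copy of those bytes taken at compile time */
} compiled_block;

/* Remember what the source looked like when the block was compiled.
   Returns 0 on success, -1 if the block is too long for this sketch. */
int block_compile(compiled_block *blk, const uint8_t *source, size_t length)
{
    if (length > SNAPSHOT_MAX)
        return -1;
    blk->source = source;
    blk->length = length;
    memcpy(blk->snapshot, source, length);
    return 0;
}

/* The consistency check: returns 1 if the source bytes are unchanged,
   0 if they were modified after the block was compiled. */
int block_is_consistent(const compiled_block *blk)
{
    return memcmp(blk->snapshot, blk->source, blk->length) == 0;
}
```

In this model a program that patches its own code without a cache flush makes block_is_consistent() return 0 on the next run of the block, which is the point where the emulation would stop. The extra copy and comparison for every block also hints at why the option costs speed.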

Basically, it can be used to verify whether a program misbehaves because it does not flush the instruction cache properly or for some other reason.

It is safe to keep it turned on, but it slows down the emulation (sometimes considerably), so use it only when it is needed.

X1000

As I already announced in the previous post (a couple of months ago): the Amiga X1000 optimized version is available in the package. Please use only that version on the Amiga X1000; the generic build won't work properly.

060

Due to "popular demand" (more than one request ;) I have fixed the cache handling of the emulated 68060 processor type.
Please note: the documentation of the original E-UAE states that 060 support is not implemented, and all I did was fix the cache bits to let the emulator turn on the caches, so the JIT can be activated. However, it is highly likely that there will be problems with some programs running on the 060, since other important aspects of the 060 are not implemented (like a proper stack frame).
So, watch your step while using it. (And please don't report bugs for it. kthxbai)

RuninUAE

ChrisH kindly implemented the support for the betas in RuninUAE. You can turn on the JIT emulation from the menu now. (You guys are too spoiled!) Thanks Chris!

WinUAE PPC

In case you haven't heard: WinUAE is capable of emulating PowerPC hardware through QEMU on Intel-compatible processors and running PPC apps and even AmigaOS4.

The obvious question: how much better would it be running PowerPC programs on actual PowerPC processor under hardware emulation? :)
Probably it would be possible to make use of the native PowerPC processor and it would be fast. Very fast. 
Not to mention it would open the door for AmigaOS4 running on Macintosh iBooks for example.

Do I have plans to implement it? No, sorry.
Thanks for asking.

Thanks

Finally, some thank-yous to the lovely people who helped me with this beta.

Big thanks go to: MickJT, Mike Blackburn, Chris Handley, Luigi Burdo, Samir Hawamdeh, Raziel and Cass.

Some guys just can't stay away as it seems. :)

See you soon.

Sunday, July 20, 2014

PPCJITBETA04 (FourFOurfOurfoUR!)


Four is a nice, round number. Power of 2, not too many, not too few. You know, four refers to many good things, like: the Fantastic Four, 4th of July, AmigaOS4, The Magnificent Four... Err.. Maybe not that, scratch that last one.

Therefore, without further ado, here is Beta #4:


I had an irresistible urge to RickRoll you guys with the link, but that video has been blocked recently on YouTube in many countries, so maybe next time.

Tickets please

I had my sweet time with a very weird bug related to Quake, which is not resolved yet and is probably related to the failure to implement the soft cache flushing. These two tickets are pushed back to Beta #5 for now. Previously I had no intention of doing one more beta release before the first Release Candidate, but it seems I need one more round of testing.

I had to shuffle around some tickets while I was rethinking the upcoming Release Candidate. You know, changing priorities, agile development, whatnot. What is listed in the milestones now is the plan, although it is not set in (mile)stone... Heheh... (Huh, that was a really lame pun. You should do better than that!)

The idea is: I am going to fix every issue which is known and give you guys some time to test before the final release (candidate).

SAM440

After a few rounds of pushing and pulling some SAM440/Flex-related code, hopefully we have sorted out all the various problems related to those machines. If you are still experiencing issues, please let me know.

G5 again

Thanks to Tobias Netzel, the flag extraction on G5 is fixed, which will resolve a number of problems with various programs.

Compiling for G5 is still not resolved for MorphOS, so no G5-optimized binary again, sorry guys. If you can tell me how I can (easily) compile the files for G5 on my iBook, then I will give it a go, I promise.

What's next

As you can see: Beta #5 is coming, and there are already a handful of things lined up for it.

Please do test Beta #4 and please do report the bugs you find. It is important to sort out as many problems as I can. It is equally important to help me reproduce the bug. So, please read the instructions. Thanks a bunch!

Best Boys

I would like to give a big thanks to Luigi Burdo. He helped me with a great deal of things, reported lots of bugs and he is so enthusiastic that he inspired me to keep walking on the road. Thanks a lot, Luigi! Keep up the good spirit!

And, of course, my dearest sidekicks, who just couldn't stay away from the project:
MickJT and Tobias Netzel. Cheers!

I would also like to thank the following people for their help, bug reports and overall support:
Samir Hawamdeh, Kicko, Chris Handley, Allan Ullmann.

Friday, May 30, 2014

Here is your Captain speaking

Ladies and Gentlemen, I would like to have a word with you about bug reporting...

Please do not report E-UAE JIT compiling related bugs to:
  • a forum at your favorite portal (because they are going to give you advice, unless it is a cooking portal, but they won't fix it anyway);
  • your "friends" on Facebook as a status post (because your hot ex-classmate doesn't care, not to mention that she put you on the acquaintances list a long time ago and I am pretty sure she won't fix it anyway);
  • the RuninUAE author (because although Chris is a good guy, almost certainly he won't fix it anyway - hey Chris!);
  • your fellow Amiga-enthusiasts at the Club (because they might listen to your theory on what is the root cause of the bug, but they won't fix it anyway);
  • your neighbor's cat (because the poor thing doesn't want to hear anything about "fixing", I guarantee).
Why? Quite simple: if you report bugs anywhere other than my mailbox, there is zero guarantee that your bug report ever lands on my computer.

You know, there is some chance that I wake up one day and realize:

Ah, some Amiga-fan while playing Superfrog on level 6 encountered a graphical glitch which is caused by a mis-used flag dependency in the JIT compiler optimization, so I must fix it. I need coffee!

Yes, there is some chance, a very-very-very low chance. (As opposed to the high chance that I wake up one day and realize: Hrrgrhh... I need coffee!)

Golden Rule

If you want to get the bug fixed then report it to me, preferably by e-mail.

Or you can create a ticket at the project's SourceForge page.

(If you really-really must, then you can use the contact box on this blog to the right of the post.)

One more tiny thing

Please do your due diligence before reporting, if I might ask you to.

What I would like to ask of you is summarized in the README file (yea, like anybody would read a README file). Here is a link to the current version in the SourceForge code repository.

You can also find my e-mail address at the end of that file.

Thank you for your attention! Now go back to sipping your cocktail, the dinner will be served at 6pm. Wearing a Boing Ball pin is mandatory.

Wednesday, May 21, 2014

PPCJITBETA03 (Switch to Ludicrous Speed)

Welcome back, long time no see, my friend. Please be seated.

I can tell you I have some good news for you again: here comes Beta #3!

Get it here if you must:


I held this release back for a while just to fix the emulated cache checksum feature, but I have been chasing a bug for two weeks without any success. So, that fix is postponed to the next beta; in the meantime you can enjoy some significant speed enhancement and increased stability.

Without going into the details regarding the changes (see the included README for all changes) I would like to mention the most important change:

Vroom-vroom

The major feature of this beta release is the register and flag optimization fix. You can turn it on in the configuration, just set comp_optimize to true.

If you are interested in the details: I already explained how the optimization works in an earlier post, but in case you are too lazy to read through that post, here is my old diagram (just because it is beautiful, you know):

Code translation flow diagram

Let me summarize it for you: the JIT compiler is collecting information about data-flow dependencies between the various macroblocks and tries to remove the ones which won't have any effect on the outcome of a certain block of macroblocks.
This is not a new feature in the JIT implementation, but previously a few (tons of) bugs prevented it from working on any code more complex than my Mandelbrot test.
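
The general technique behind this kind of cleanup can be sketched as a backward walk over the macroblock list, tracking which registers and flags are still needed. The following is only a simplified model under assumed names (macroblock, optimize_macroblocks and the REG_/FLAG_ bit masks are hypothetical), not the actual optimizer from the sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit masks for a few emulated registers and flags. */
enum { REG_D0 = 1u << 0, REG_D1 = 1u << 1, FLAG_Z = 1u << 8 };

/* Hypothetical macroblock: what it reads, what it writes. */
typedef struct {
    uint32_t inputs;      /* registers and flags consumed */
    uint32_t outputs;     /* registers and flags produced */
    int has_side_effect;  /* e.g. a memory write: must never be removed */
    int removed;          /* set by the optimizer */
} macroblock;

/* Walk the list backwards and drop every macroblock whose outputs
   nobody needs. live_out holds what must be valid after the last one.
   Returns the number of macroblocks marked as removed. */
size_t optimize_macroblocks(macroblock *mb, size_t count, uint32_t live_out)
{
    uint32_t live = live_out;
    size_t removed = 0;

    for (size_t i = count; i-- > 0; ) {
        if (!mb[i].has_side_effect && (mb[i].outputs & live) == 0) {
            mb[i].removed = 1;      /* result is never used: dead */
            removed++;
            continue;
        }
        mb[i].removed = 0;
        live &= ~mb[i].outputs;     /* these are (re)defined here... */
        live |= mb[i].inputs;       /* ...and these are needed before */
    }
    return removed;
}
```

A typical candidate in this model is a flag update that is overwritten by a later macroblock before anything reads it: the earlier update is marked as removed, and everything else stays.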

In this release I have fixed every issue I have found so far with the optimization and it seems to be working quite nicely. You can boot AmigaOS and it runs just fine, and games and demos will benefit from this feature too.
I was planning to do a comparison video where the speedup is clearly shown, but I haven't had too much time yet, so this is your job now, dear EUAEPPCJIT fans! Just post the links to the videos into the comments here. :)

PPC970 aka G5

Not everything is sunshine and happiness, though. Supporting the G5 processor architecture as a target turned out to be much more complicated than I thought, especially because I don't have any hardware to test on.

In the previous release the MacOSX G5 binary was not working properly on G5 (nor on any other PowerPC, as a matter of fact). Thanks to Luigi Burdo for the report and Tobias Netzel (again) for the help with the compiler. This is fixed in this release, hopefully. (Fingers crossed, I still don't have hardware to test on.)

The situation with the MorphOS G5 version is not that hunky-dory, though: it seems there is no official compiler with G5 support in the MorphOS SDK yet, and it is rather complicated and unreliable to compile any source for that processor. Until this situation improves, the G5 version for MorphOS won't be available among the beta binaries.

However, nothing stops you from compiling your own version from the sources, as these are always available at SourceForge.

Upcoming

As I mentioned: I postponed the fix for the block checksum to the next release and also picked up some things to do. You can find the planned list here:


I also had a look at what is planned for the first stable release and moved some items around the various milestones. If you are curious, just click on the milestones on the SourceForge page.

Saturday, March 15, 2014

PPCJITBETA02 (The Beginning of a Beautiful Friendship)

We have just arrived at another exciting milestone on the long road: all the important instructions for the initial release are implemented under the JIT compiling.*

Lots of bugs were fixed, and the emulator is much more stable now than in the initial beta release.

Some new features are added too: I have merged the SAM440EP/Flex support (thanks to Soft3) and the CGX overlay for MorphOS (thanks to Thunder and Fab). See configuration documentation regarding how to set the overlay up.

As I already mentioned in the previous post: this beta release was delayed for a couple of weeks due to a bug that slipped into the code base a long time ago. It was discovered on Mac first, but I was able to reproduce it on MorphOS too. It took me a while to figure out what was going on, but it is fixed now.
This was a very tricky bug, it could be blamed for random crashes and endless loops also, not only on Mac, but on all supported platforms. It was triggered randomly based on the distance between the main application code and the code buffer in memory. (Thanks to Mike for discovering this right before I released the beta.)

I have spent a significant amount of time on figuring out how I can do the build for all supported platforms (AmigaOS4, MorphOS and MacOSX-PPC) using my environments. It wasn't easy, but finally I managed to do most of the release on my own.
As it seems MorphOS SDK does not support G5 yet, so I was not able to do the compiling by myself, but thanks to Fab the G5 executable is also available in the release package.
So, as of now users on all major supported platforms can grab the package and start using the right version.
(Sorry, Linux and BSD folks, you are still on your own.)

 

And the World trembled...

 ...or at least that tiny part which I am involved in when I am wearing my crazy latex suit with a huge letter "A" on my chest for my secret identity: the Amiga Software Developer.

After the first beta release, forum posts, emails, news sites and blogs rose into an enormous, unmanageable thunderstorm and struck me with an insane amount of communication. (While the rest of the World barely noticed what had just happened.)
Finally I crawled through the messages from every possible (and impossible) source, answered the questions to the best of my knowledge, accepted the good advice and kindly rejected some nonsense.

 

Aftermath

Since I received tons of feedback (good and bad), I am inclined to draw some conclusions from the reaction to the very first beta release. Here is the summary for your benefit:

Some people don't understand how the JIT works and what its exact purpose is. All I can say is: please read the documentation... Some other (knowledgeable) folks stood up on the forums and educated the others, well done! I hope this helps, because I really don't have time to deal with it.

Many of the users have irrational expectations of how much the JIT compiling will speed up the emulator. (According to somebody: it is supposed to be "ten times faster than the interpretive"... Err... Not likely. How did they come up with any number anyway?)

Well, the implementation is not finished yet, and some of you guys don't really understand the concept of a "beta release". Okay, I admit I was cheating a bit: technically the JIT compiler wasn't feature-complete when the first beta was done. Yet the remaining pieces were related to rarely used instructions anyway.
For the second beta the instructions are done*, yet there is clearly room for improvement regarding some bugs. Probably as soon as I am able to fix the optimization of the register- and flag-flow there will be a significant bump in the speed. (No, not "ten times".)

It is hard to measure how much faster the programs are running, and some lovely people were baffled by this too. There is usually no obvious visual clue for the speedup, and a 30%-50% increase in the processor speed is probably hard to notice while your favorite jump-and-run game is jumping and running.
Yet, you can feel that the whole emulation is more snappy than before probably even when you simply run Workbench. Except when it crashes. But even then: it crashes 30%-50% faster! :)

Too bad that some good souls are obsessed with their favourite game/program and keep saying that the JIT is worthless because it doesn't make any difference for that particular piece of software. As it seems this JIT compiling is not for you then.

There was one more interesting thing that I unfortunately noticed too late: G5 support for MorphOS. Since I don't have a G5 machine I never considered that there is a need for that. But there was some murmuring on some forums about the speed of the MorphOS version on G5. No wonder: it needs a special version, which can be compiled from the sources for some time now. (Thanks to Tobias Netzel and to Fab for the special build.)
Probably the same applies to the PA Semi processor and the Amiga X1000, but I don't have that one either. (Donations? :)
Also the mysterious support for SAM440EP/Flex, which I had never heard of before. No wonder it was missed.

Fun fact from the Outer World: I tried to explain to my colleagues how I spent my Summer vacation. However, I am significantly older than almost any of them, so they were looking at me with confusion in their eyes mixed with a little pity. "Yea, my father loves fooling around with those old machines too!" - was one of the comments. Well put, Sir, well put.

 

Anyway...

To make you (some other geeks around the World) happy: here is the new beta...


In case you stop reading here (or you already skipped the first cheesy part):
as always, please read the README for your comfort and safety. Thanks.

Since I bought an iBook for 50 NZD, now I can produce the MorphOS and the MacOSX versions too, which are also included in this release together with the AmigaOS4 version. (And by buying a Mac I broke one of my principles: no Apple product crosses the door of my house. I hope you guys are happy with what you have done...)

You can find the changes since the last beta in the README file, or in the changesets at the SourceForge repository from R67 down to R53.

 

Fragmentation

I must admit I have learned a lot in the past month about the sorry state of the E-UAE project. I didn't know what the current situation of the various binary releases was until I received some references to modified AmigaOS4 and MorphOS binary versions.
I guess this is the destiny of any abandoned open source project: lots of good people are trying to improve it, but nobody stands up and takes over the maintenance of the project.

Well, I am of the same kind, as it seems. It was never my goal to take the ownership of the E-UAE project or fork it into a new iteration.
However, as soon as I released the first beta of the JIT compiled version the watching eye of the public turned to my little scared pet project and I received a lot of questions about whether this-or-that particular fix from various developers was included or not. (Mostly not.)

To satisfy at least some part of the user base I tried to gather the various fixes from every corner of the Internet and applied them to the source code. This is by no means a new base source repository for the E-UAE project, but at least it will help whoever wants to grab the torch, and in the meantime it will probably be useful for you, dear user, in the form of the beta releases.

 

Progress indicator

As of now I am switching from a batch release strategy to immediate updates. This means: I will commit each change one by one to the SourceForge repository as soon as the change is ready, instead of buffering up lots of changes locally and committing them in one big changeset.
So, if you look for the repository changesets and the tickets then you can watch the progress of the project closely.

I also make use of the tickets in the completion of the various fixes and tasks:

https://sourceforge.net/p/euaeppcjit/tickets

I added milestones to the tickets, so you can get a feeling of the upcoming beta and the included changes, fixes.
Open tickets are defining the majority of the outstanding work. I am currently working on the accepted ticket, while pending tickets are already committed to the repository, but not released in binary form yet. Released tickets are the closed ones.

For PPCJITBETA03 you can find the planned changes here:

https://sourceforge.net/p/euaeppcjit/tickets/milestone/PPCJITBETA03

There is also a milestone named "PARKED" which is a holding box for the various bugs and problems that are not considered for this project (yet).

 

Thanks

Finally, big thanks go to: Thunder, kas1e, MickJT, Fab, Tobias Netzel and Mike Blackburn for helping me with lots of things regarding bug finding, fixing, platform support and constantly watching out for the updates on the repository.

I am still waiting for any (detailed) bug reports, just have a good read of the README file before you jump to your email client.

Footnote
*There is some fine print here: I was struggling with the CMP2 instruction and finally I gave up after a couple of days. The binary code for the instruction is bundled with CHK2 and I couldn't figure out how to solve the exception handling for that. So, this instruction remains unimplemented for now, not a big deal luckily.

Sunday, February 23, 2014

Stepping into the fourth year

And here we are again: the development of the PowerPC JIT compiling for E-UAE
passed the third year, stepping into the fourth.

Original image by OpenClipart


But what a year! Yes, finally we are getting very close to the Real Thing(tm). Even a beta version is available from the project, although probably it is not particularly useful to anybody who is not the adventurous type. There are many bugs to chase and also there is room to improve.

I know everybody is eagerly waiting for the second beta. It was prepared more than two weeks ago, the sources are released to SourceForge, the post is ready to launch...

But then suddenly Mike Blackburn came back to me (big thanks for that) and mentioned a show-stopper bug on Mac. In the meanwhile I confirmed that the same bug does exist on MorphOS too, so probably it is just a coincidence that the emulation works on AmigaOS4.

Right now I am trying to figure out what is going on, but for that I needed access to a Mac. So, it took me a while to set up everything and get ready to debug this. After spending a couple of days chasing my own tail I am no closer to the solution yet, but I promise I will put all my free time into this.

Please relax and enjoy the beautiful Summer in the Southern Hemisphere and try not to freeze to death in the Winter in the North.

Wednesday, January 1, 2014

PPCJITBETA01 (Happy New Year 2014)

First of all:


I wish all of you guys an awesome happy new year for 2014!

We made it this far, there was no nuclear holocaust yet, which is an amazing achievement for the human race considering our lovely nature. Well done.

And now something completely different, but almost equally exciting (probably for far fewer human beings, can't speak for aliens)...

I woke up in the morning and had a look at the clock and it showed me:


BETA TIME!

Yes, it is unbelievable, I know. After this long-long waiting finally here it comes.
And it even comes with bugs! Lots of them!

Now seriously, if you are interested in testing the PowerPC JIT then go and get your binary release from the SourceForge page:


Tiny catch: at the moment only the AmigaOS4 version is available. See below.

You might also need a previous distribution which includes all the tools, like transrom, mousehack, make-hdf, etc. I tried to compile these, but somehow the cross-compiling failed and I ran out of patience.

What is included in beta01?

Before you fire up your favorite Amiga software in the emulator, please DO READ the README file! It will save you (and me) lots of wasted time, I guarantee.

The very first thing I have to mention is that although I did my best, I was not able to finish implementing all the planned instructions. We are very close: according to the statistics, 95.09% of the instructions (368 out of 387) are done.
The reason is quite simple: the recently implemented instructions were the most complex ones. Well, you know, this is exactly why you must not put off the hardest part of the job just because even thinking about it makes you nervous...
I have spent days on implementing BFINS and the exception handling for division by zero, and I am still not convinced that it was worth the effort. Anyway...
The following instructions are still emulated by the interpreter:
  • long versions of the division and multiplication instructions (DIVU.L/DIVS.L, MULU.L/MULS.L);
  • compare against bounds instructions (CMP2);
  • all the decimal data handling instructions (ABCD, SBCD, NBCD, PACK, UNPK).
The comparison and the decimal data handling are not that important, but the long division and multiplication are used quite often.
So that is still a sore point, yet I have decided that I will release the beta without these instructions to get some feedback.
The missing instructions will be implemented soon, probably in the next beta release.

 

What is new in the sources?

 

Fixes and more

Since my last update I have managed to complete some more instructions and lots of bug fixes (thanks to Philippe Ferrucci and Davide Palombo, who insisted on demonstrating the JIT compiling at the Alchimie and Pianeta Amiga shows, so I had to fix the most obvious bugs).
I don't want to bore you with the details regarding the current changes, have a look at this update if you are interested.

 

Source alignment

Another important change was: I have merged the final sources for E-UAE 0.8.29 over my changes.
Many thanks to Michael Trebilcock (MickJT) for drawing my attention to the fact that I was using an outdated source version (0.8.29-WIP4) instead of the latest from the CVS (dated 20/08/2008).
For the changes please have a look at these two updates: original source, fix for audio.

 

The LED (round#2)

Philippe Ferrucci pointed out that sometimes it is hard to tell whether the JIT is available and working or not, in spite of the already available JIT LED.
He suggested that the LED might also indicate other states of the JIT compiling, and I found that a really useful idea.
Now, you can identify three distinct states of the JIT compiling from the LED colors:
1. Blinking green with "JIT" text on it: JIT compiling is active and the compiled code is executed.
The level of green shows you how active the JIT compiled code is compared to the interpreted code. (Same as before.)

2. Solid red with "JIT" text on it: the JIT compiling was set up, but the processor cache is turned off by the software currently running in the emulator. No JIT compiling is done while the cache is turned off.

3. Solid black without "JIT" text on it: due to the emulator configuration the JIT compiling is not available.
Either no code cache was set up or the specified processor type does not support the processor cache.
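
The three states above boil down to a small decision function. The sketch below is only an illustration of that logic: the names jit_led_state, jit_led_pick and jit_led_green_level, as well as the 0..255 brightness scale, are assumptions and are not taken from the sources.

```c
/* Hypothetical names for the three LED states described above. */
typedef enum {
    JIT_LED_OFF,     /* solid black, no "JIT" text: JIT is not available */
    JIT_LED_IDLE,    /* solid red: JIT is set up, but the emulated cache is off */
    JIT_LED_ACTIVE   /* blinking green: compiled code is being executed */
} jit_led_state;

/* Pick the LED state from the configuration and the emulated cache state. */
jit_led_state jit_led_pick(int code_cache_size, int cpu_has_cache, int cache_enabled)
{
    if (code_cache_size <= 0 || !cpu_has_cache)
        return JIT_LED_OFF;
    if (!cache_enabled)
        return JIT_LED_IDLE;
    return JIT_LED_ACTIVE;
}

/* Green level on an assumed 0..255 scale: the share of compiled code
   among all executed code since the last LED update. */
unsigned jit_led_green_level(unsigned long compiled, unsigned long interpreted)
{
    unsigned long total = compiled + interpreted;

    if (total == 0)
        return 0;
    return (unsigned)((compiled * 255UL) / total);
}
```

With this model, a configuration without code cache always ends up in JIT_LED_OFF, no matter what the running program does with the cache bits.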



Some help needed

Unfortunately, I have no idea how to compile the sources for platforms other than AmigaOS4. If you feel like you know enough about how to compile these sources, please get in touch with me. I am looking for MorphOS, MacOSX PPC and Linux PPC versions especially, but any other supported platforms are welcome. (Thunder? Tobias? Mike? :)

If you feel like there is a bug and you badly want to report it, then please DO READ the How to report a bug section from the README before you jump on your mail client. Thanks.

Final words

It is a good start to the year, isn't it? I hope I can keep it up and finally you can enjoy the benefits of the JIT compiling on your PowerPC machine.

Monday, November 4, 2013

On a collision course

So, where are we at with the recent update? 349 out of 387: no, it is not 100% yet, but a nice, fat 90.1%!

Quickly, what was done this time and let's jump to conclusions right after that:
  • Implementation of BFINS Dx,Da{y,z}, BFFFO Dx{y,z},Da, BFEXTS Dx{y,z},Da, BFEXTU Dx{y,z},Da, BFCLR Dx{y,z}, BFTST Dx{y,z}, BFSET Dx{y,z}, BFCHG Dx{y,z}, ROXR.x Dy,Dz, ROXR.W mem, ROXR.x #imm,Dy, ROXL.x Dy,Dz, ROXL.W mem, ROXL.x #imm,Dy, ASR.W mem, ASL.W mem, LSR.W mem, LSL.W mem, ROR.W mem, ROL.W mem instructions.
  • Opcode compiler functions for different operation sizes are unified, unnecessary helper functions were removed.
  • Macroblock protos are generated from the opcode table file rather than taken from the manually prepared file.
  • Fixed wrong function name for the CNTLZW PowerPC instruction emitter.
  • Fixed missing input register in C and X flag extraction macroblock which is used in some shift instructions.
  • Fixed register dependency in ASR.x Dy,Dz and LSR.x Dy,Dz instructions.
(Slightly unrelated: I have found a bug in the original E-UAE code for BFINS and flags, if I have enough patience I will fix that. And another one in Petunia, that will be fixed too...)

Now, we are getting dangerously close to the beta stage. It is really hard not to give you guys any promises that I cannot keep later on... (Summer is coming, you know... ;)

Anyway, my plans for the near future (read: when it is done) are:
  1. After I have finished with the implementation of all instructions which were anticipated earlier I am going to stabilize the emulator a bit and clean up the code if needed.
  2. I will try to release a compiled beta version and put together some documentation for using/testing it.
  3. There are some outstanding issues, one of the most critical ones is fixing the problems with the macroblock optimizer.
So much for the crystal ball and now...

Something completely different

I have to admit that my posts were not that interesting to read recently. Earlier I invested more effort into the posts and it was probably more fun to read, especially to the developers.
I would like to bring back that tradition, so for this update I came up with a few interesting thoughts on:

How to avoid the decisions

I know that many developers love to procrastinate anything, including (but not limited to) decisions. But what I am about to write is not how lazy my fellow developers are, but rather how to get around a situation when it is not ideal to do a comparison and branch according to the result.

Why would that be important at all?
There are a number of reasons why it is better to avoid branching, this article lists many examples for that and also explains the reasons. It basically boils down to the following reasons:
In our case none of these reasons apply, but unfortunately branching is not really possible due to the static flow analysis on the macroblock list.

To put it simply: each macroblock depends on the results of the previously executed macroblocks. If we skip ahead, it is impossible to tell whether those required macroblocks were executed before the dependent macroblocks or not.

Why does this cause any trouble? Because sometimes it is pretty hard to avoid conditions inside the instruction implementations.

I already faced this issue earlier, when I started to work on the conditional branching and setting instructions (Bcc and Scc). There was no possible way to avoid the conditional branching for these instructions due to the very nature of the instructions.
I ended up creating a very specific macroblock which embeds the condition checking and branching, so the "inter-macroblock" flow analysis remains intact.

In the recent update you can find many bit field instructions which were complicated enough already and then the very same issue came up again:

For bit field instructions a bit field width of 0 (zero) actually means 32 bits. This doesn't sound that bad, however sometimes the bit field width is coming from an emulated data register instead of a statically encoded constant. In this case the decision must be made in the emulated code at run time instead of at compile time.

Not in every case, but quite often it is possible to calculate the result instead of making a comparison and branching.

For the bit field instructions the solution was (written in C code, just because it is easier to understand):

temp = width - 1;
width = width | (temp & 32);

(There is one condition, though: the width must be between 0 (zero) and 31 before this operation starts.)

Now, think about it a little bit and figure out on your own why it works. I am not going to explain it. :)
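Spoiler for the impatient: the trick can be checked with a few lines of standalone C. This is only a sketch for experimenting, not the code from the emulator; the function name bitfield_real_width is made up for illustration and unsigned 32-bit arithmetic is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Maps an encoded bit field width (0..31) to the real width (1..32)
 * without any comparison or branch. Hypothetical helper name. */
static uint32_t bitfield_real_width(uint32_t width)
{
    assert(width <= 31);          /* the precondition stated in the post */

    uint32_t temp = width - 1;    /* only width == 0 wraps around to 0xFFFFFFFF */
    return width | (temp & 32);   /* bit 5 of temp is set only after that wrap */
}
```

Calling it with 0 yields 32, while every width from 1 to 31 comes back unchanged, because for those values temp stays below 32 and the mask leaves nothing to OR in.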

See you soon.

Sunday, May 19, 2013

One small step for mankind, a giant leap for the project

I have no idea how I managed to achieve this much in this update, but it is certainly a confident step forward. This time the list is long and diversified:
  • Implementation of Bcc.x addr, BCHG.B Dx,mem, BCHG.L Dx,Dy, BCLR.B Dx,mem, BCLR.L Dx,Dy, BRA.x abs, BSET.B Dx,mem, BSET.L Dx,Dy, BSR.x abs, BTST.B #imm,mem, BTST.B Dx,#imm, BTST.B Dx,mem, BTST.L Dx,Dy, CMP.x #imm,mem, CMP.x mem,Dy, CMP.x reg,Dy, CMPA.L reg,Ax, CMPA.W reg,Ax, CMPA.x mem,Ay, DBF.W Dx,addr, EOR.x #imm,mem, JMP.L abs, JMP.L mem, JSR.L abs, JSR.L mem, NEG.x mem, NOT.x mem, RTS, TAS.B Dx, TAS.B mem instructions.
  • Cache invalidation fix for OSX 10.3.9 and below. (Thanks to Mike Blackburn again.)
  • Fixed mask handling in BCHG.B Dx,mem instruction.
  • Fixed missing register mapping in ASL.x #imm,Dy implementation.
  • Fixed input dependency overwriting in certain memory-related allocation functions.
  • Fixed dependency for destination memory pointer register in special memory reading.
  • Fixed post address handler for condition code addressing modes, previously it might crash or call some random handler from the other addressing modes.
  • Fixed instructions where temporary registers are allocated but not free'ed.
  • Optimized masking for register to register bit instruction.
  • Optimized the temporary register usage in helper_test_bit_register_register function.
  • Optimized flag extraction in several shifting operation.
  • Branch scheduling is more flexible: adding multiple interleaved branches is possible.
  • Comment on missing implementation for an exception on loading odd address into PC.

A few highlights

First of all, let me brag a little bit about the number of freshly implemented instructions. Right now 237 instructions are implemented out of 388, a solid 61% is done. (Previously the ratio was ~46%.)

More MacOSX versions are supported now, Mike fixed up the cache flushing a little bit and added the pre-10.4 versions too. Please read the included README file regarding the compiling instructions.

While I was working on the instructions I discovered a few bugs and glitches, which are now fixed in this release thus improving the overall stability.

I have also managed to optimize the compiled code for some instructions. Together with the implementation of some yet missing instructions the results for the Mandelbrot test (mandel_though_hw.kick.gz among the test kick files) improved a bit compared to the previous results:

Interpretive: 108 seconds (no change there...);
JIT compiled without optimization: 44 seconds (previously it was 52 seconds);
JIT compiled with optimization: 27 seconds (previously it was 32 seconds).

That was the time for the self-polishing and now back to work...

Sunday, April 28, 2013

Locations, locations, locations

In the last month I was trying to hang on my sanity while we were on house-hunting in Auckland (nuff' said). Wasn't easy and apparently it is not even close to be finished. :/

Anyway, I managed to do some work on the E-UAE JIT in the stolen moments.

In the update for this month you will find these little eggs:
  • Implementation of ADD.x Dy,mem, ADD.x mem,Dy, ADD.x #imm,mem, ADDA.x mem,Ay, ADDQ.x #imm,mem,
    AND.x mem,Dy, AND.x reg,mem, ANDI.x #imm,mem,
    BCLR.B #imm,mem, BCHG.B #imm,mem, BCHG.L #imm,reg, BSET.B #imm,mem,
    NEG.x Dy,
    OR.x Dy,mem, OR.x mem,Dy, ORI.x #imm,mem,
    UNLK.x Ay instructions.
  • Fixed unintended modification of the source register for some register to memory operations.
  • Memory read helper tweaked to use R3 register as the result register, no need to copy the data back-and-forth. (More optimal compiled code.)
  • Memory reader and writer helper function cleaned up to be more independent from caller data.
It might seem a bit random how I choose which instructions are implemented, but there is always a recurring theme. Right now this theme was the memory access. As you can see most of these instructions are manipulating the memory, which was a little bit scary earlier but I came around creating some functions which can be reused for (almost) all memory accessing instructions.
The tricky part was accessing the memory while the allocated temporary registers remain accessible somehow. With a minor workaround for saving and occasionally reloading the temporary registers after the memory access this is solved now.

I am not too happy about how the whole register mapping works, unfortunately there are some limitations of the C language which makes it complicated to come up with a more robust solution. So, right now the whole thing is just a bit hacky and wacky. Maybe in the future it would need an overhaul.

The question I get most often is: how many instructions are left to implement? There is an easy way to find out the progress: check the table68k_comp descriptor file.
Each (to be) supported instruction for the JIT compiling is already listed there; next to the name of the instruction there is a number: 0 or 1. The 1 means it is already done, 0 remains to be implemented.
The instructions which will not be supported by the JIT compiling (so the interpretive will handle these) are not listed in this file.

So, all we need to do is count the instructions which are already supported and what remains to be done. The current state without the FPU instructions is: 181 is done out of 388 (~46% is done).
As you can see there is more work to do, but it is really hard to tell how long it will take. What I can see is that the time I have to spend with each instruction is getting shorter and shorter, thanks to the infrastructure which had to be built first but now is mostly done. Also some instructions are very similar, so I can simply reuse parts of an already finished instruction.

We are not there yet, but the donkey is not that stubborn anymore. Giddy-up buddy!

Monday, April 1, 2013

After another bump on the bumpy road

I have spent some time fixing bugs and improving the performance of the compiled code and the JIT emulation overall. As a result here is the recent update:
  • Implementation of TST.x mem, NOT.x Dy, EXT.x Dy
  • Reverted the change for stopping the compiled block when the special flag is set. This change is not needed for the Kickstart and it doesn't seem to have any effect, but slows down the compiled code.
  • Removed hack for MOVE.z Ax,-(Ax) and MOVE.z Ax,(Ax)+ instruction implementations.
  • Fixed pre-decrement and post-increment addressing modes for destination. Re-enabled MOVE.x mem,mem and EOR.x reg,mem instructions, which were disabled previously to let the Kickstart running without the addressing mode fixes.
  • Fixed register dependency in some memory addressing modes.
  • Temporarily removed checking for the tiny blocks, ignoring these blocks is just not the right solution for avoiding the overhead of block calling.
  • Added ignoring blocks of pure unsupported instructions: these will be executed by the interpretive emulation.
  • Fixed blocking of small blocks, the block was not raised in the cache list.
Lots of small changes and fixes as you can see.

Probably, the most important fix was the one for the pre-decrement and post-increment addressing modes, this was blocking the Kickstart for a while from booting and this is why I had to remove the support of those two instructions I mentioned in the changes list.
As it turned out the root of this bug was a limitation of the implementation. Each addressing mode has two compiler functions: one is called before the instruction compiling, one is called after that. But the situation is not always that simple, for example in this case:

MOVE.L (A7),-(A7)

This is a common instruction for copying the memory to a lower address (like moving the content of an array one step toward the beginning, in this case the content of the stack). Seems so innocent, doesn't it? :)
Why this one was an issue: the handler for the destination address is called before the instruction and since it is a destination addressing mode it decremented the address in the emulated A7 register. But then the instruction was compiled, which used the address from the A7 register as a source. So, what actually happened was something like this:

MOVE.L -(A7),(A7)

Now you can see: this operation is (mostly) pointless, it copies the data from the one address to the same address. (Although, sometimes it might make sense in the communication with the hardware, but this wasn't the case right now, obviously.)
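To make the ordering problem tangible, here is a toy model in C, assuming a single address register and a tiny longword memory. It is not the compiler code from the emulator; ToyCpu, move_l_buggy and move_l_fixed are hypothetical names, and the functions only model the visible effect of MOVE.L (A7),-(A7) in the two orderings.

```c
#include <stdint.h>

/* Hypothetical toy machine: one address register and 8 longwords of memory. */
typedef struct {
    uint32_t a7;       /* byte address, always a multiple of 4 here */
    uint32_t mem[8];   /* indexed by address / 4 */
} ToyCpu;

/* Buggy ordering: the destination pre-decrement is applied first,
 * so the source is read through the already decremented A7. */
static void move_l_buggy(ToyCpu *cpu)
{
    cpu->a7 -= 4;
    uint32_t value = cpu->mem[cpu->a7 / 4];   /* reads the wrong slot */
    cpu->mem[cpu->a7 / 4] = value;            /* copies a slot onto itself */
}

/* Intended behaviour: read the source through the original A7,
 * then pre-decrement and write to the new address. */
static void move_l_fixed(ToyCpu *cpu)
{
    uint32_t value = cpu->mem[cpu->a7 / 4];
    cpu->a7 -= 4;
    cpu->mem[cpu->a7 / 4] = value;
}
```

Both variants leave A7 with the same value, which is part of why this kind of bug is hard to spot: only the copied data differs.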

What was the fix? Pretty simple: I moved the destination address modification to the address handler which is executed after the instruction was compiled. This was a solution for this specific case, but I also had to make sure that all the other combinations are working which might be possible with the indirect addressing. One of the trickiest was:

MOVE.L (Ax)+,(Ax)

You can try to guess why. :)

Anyway, finally this bug is out of the way and I can go on with implementing the missing instructions. Some of them are done, but yet lots to go.

There was one more important change in this update: I removed the limit for the consecutive block length when a special condition was triggered by some instruction. I found it completely pointless, everything seems to be working without this condition. There was a bad side-effect of this limit: after an instruction triggered a special condition for the emulation all the following instructions are compiled one-by-one into separate blocks. The overhead for calling these tiny blocks was huge, this is why I introduced the rule of ignoring any block which was smaller than 3 instructions. But in this case lots of the code was not JIT compiled at all. (As some of you guys mentioned: the JIT LED was mostly dark - lots of blocks were not compiled.) This is fixed now, although I am a bit afraid of the side-effect of delaying the handling of the special conditions. We will see how it goes.

I also spent some time on updating my old tool: DiskDaisy. The recent updates for AmigaOS4.1 triggered a bug in that app. Sometimes it is nice doing something completely different for a while, you know. :)

Thursday, February 21, 2013

After the second year

Well, here we are again. Another year has passed - namely the second - since I started the project.

Obviously, it is not done yet, otherwise you would see Dancing Bananas around the blog. But we are slowly getting there. There are outstanding bugs and yet a fair share of work is to be done.
Since we passed the first year I gave up giving any estimate on how long this will take. As it seems I am pretty bad at estimates. Yet I hope that the third time is the charm! :)

Until then: Keep calm and Amiga On!

Saturday, February 16, 2013

When things go bogo

I came around fixing the reported issue with the enabled bogomem (fake Fast Ram) setting, you can find the simple fix in the recent update:
  • Fixed MOVEA.W reg,Ax - source data was not sign-extended
Now the Kickstart starts with bogomem either enabled or disabled in the configuration.

Some details on the fix

Debugging an emulator needs a completely different approach than debugging any other application type, simply because even if one or two instructions are misbehaving it doesn't mean that the emulated program is not working at all. It just does weird things, and not in a good way.

In this specific case the error was: if the bogomem configuration was enabled then the Kickstart went into the dreaded reboot-loop, which is basically the result of an internal crash, usually because of a wrong access of memory somewhere or an exception while another exception is executed.

I had a closer look at what was going on and I found that an instruction was trying to write into a custom register for the disk controller which is read-only. Never a good sign, especially if it is the Kickstart trying to do that, which is always playing by the (hardware) rules.
It was an even more interesting fact that the wrong-doing instruction was only for reading from memory.
My reaction was a confused face with a hint of suspicious look. I am still a bit puzzled by this, even after the fix - this must never happen ever.

But at least we have a crash!
It is always easier with a crash, it gives a starting point (or so I thought at least).

First I tried to log the full execution and analyze it for a while, but the only thing I had found was that some hardware handling loop runs too long, probably this is why the Kickstart hits the custom register by accident. This is not helping at all, usually it means that some initialization or leaving condition for the loop went wrong, so I had to look further before the loop itself.

Luckily, the Kickstart with this configuration set was working when none of the instructions were compiled. There is a simple method for finding out which instruction(s) are causing trouble: turn off compiling of all instructions and add them back one by one while restarting the emulation with the same settings over and over again.
This sounds tedious but actually it is much easier than scrolling through megabytes of debug logs and looking for something, because it is procedural. Unfortunately, this method does not work for every possible issue, especially when the combinations of the wrong instructions are causing the problem.

In the end I found that the MOVEA.W AX/DX,AY instruction was the one to blame and a quick look at the compiled code confirmed that the emulation was wrong: for every operation where the target is an address register the involved data must be longword sized.
In this case this simply meant that the word sized source data must be sign-extended while it gets copied into the target address register. I had done this for every other similar instruction, but I missed one case.
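The difference is easy to see in a short C sketch. It is not the generated PowerPC code, just an illustration of what MOVEA.W has to do with the 16-bit source; both function names are hypothetical, and zero-extension is shown only as one plausible way of getting it wrong.

```c
#include <stdint.h>

/* Correct: the word is sign-extended to a longword before it lands
 * in the address register. Flipping the sign bit and subtracting it
 * again does this with well-defined unsigned arithmetic. */
static uint32_t movea_w_sign_extend(uint16_t source_word)
{
    return ((uint32_t)source_word ^ 0x8000u) - 0x8000u;
}

/* Wrong (for illustration): the upper word is simply left at zero. */
static uint32_t movea_w_zero_extend(uint16_t source_word)
{
    return (uint32_t)source_word;
}
```

For a source word of 0xFFFC (that is, -4) the first function gives 0xFFFFFFFC and the second gives 0x0000FFFC, so any address calculation with a negative word ends up far away from where it should be.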

Now, you can probably see why there is no way I could have found this bug by looking at the execution logs.

Thanks to kas1e, Thunder and MickJT for reporting bugs!

Wednesday, February 6, 2013

Can I haz time machine?

Another busy month passed, here comes the new update with lots of bug fixes and some freshly implemented instructions:
  • Implementation of
    • AND.x reg,reg;
    • EOR.x reg,reg;
    • EOR.x reg,mem;
    • SUBA.x reg,reg;
    • SUBA.x #imm,reg;
    • BSET.L #imm,reg;
    • BCLR.L #imm,reg;
    • EXG.L reg,reg and
    • NOP :) instruction.
  • Reorganized MOVE.x mem,mem instruction: memory reading and writing is separated out into independently callable functions to support the implementation of other similar instructions.
  • Implementation of OR.x reg,reg instruction was adapted to a more generic form.
  • Fixed flag checking for AND(I).(W|B) #imm,reg instructions.
  • Removed confusing OR immediate macroblock and replaced by the already available OR low immediate.
  • Falling back from addressing mode d16(ax) to (ax) if the offset is zero.
  • Fixed dependency flag handling in result check helper function.
  • Cleaned up opcode description table. Removed instructions from the table and from source code which won't be supported by the JIT compiler. Added missing RTM instruction.
  • Temporary register storage slots were moved from the stack frame to the Regs structure (context).
  • Removed useless debug log that flooded the output.
  • Temporarily disabled MOVE.x mem,mem and EOR.x reg,mem instructions to let the Kickstart run.

I have good news and not-so-good news

Which one should we start with? Okay, let me choose for you: not-so-good news first.
At the moment (beside the yet unimplemented instructions) there are two bugs that prevent the OS from running properly:
  1. For some unknown reason when an instruction reads and then writes the memory then the Kickstart goes back to the well-known reboot loop (actually it is a crash, sometimes you can even see the Guru message).
    At the moment I don't have the slightest idea why this happens. I traced it back to the normal MOVE instruction, when it copies data from one address to another. There is nothing wrong with the instruction implementation itself - as far as I can tell. So, this is some weird sh*t again. Lovely.
  2. When the optimize flag is turned on (comp_optimize = true in the config) then the Kickstart crashes very early. To tell you the truth, I am not surprised. When I tried to figure out the register dependency for each macroblock for each instruction I sometimes mixed things up, as I found out later on. Due to the wrong register dependency settings for some macroblocks, these were accidentally removed, so some code is "optimized away". Most likely there are more mix-ups and missing dependencies in the code; finding them is just a matter of time.
Good news:
As you can see in the last item for the update: I temporarily disabled the two already implemented instructions which try to read and write the memory. The missing instructions are substituted by using the interpretive instructions.
Now, if you disable the optimize option in the configuration (comp_optimize=false) then the Kickstart seems working with some actual JIT compiled instructions! YAY! \o/ (I guess.)

Some more interesting problems

I have got some feedback about issues with running Kickstart 1.3 and some old Amiga500 games when the JIT is enabled. This is interesting indeed because even if the JIT was turned on it did not emulate this old code because the cache is never turned on (there was no cache in the Motorola 68000 processors when these programs were written).
As you can read it in the FAQ: the JIT compiling is depending on the cache emulation heavily. So, this is one more question mark. I haven't had much time to investigate it yet.

But most importantly: the Summer Sun is shining, let's go surfing! (Oh, wait. I don't do surfing. Last time when I jumped onto a bodyboard on the beach I bruised my ribs. Embarrassing and totally geeky. Let's just surf the net, shall we?)

Sunday, January 6, 2013

:dancing banana: (sort of)

After more than four months of chasing my own tail on this problem, I have managed to fix up the JIT compiling to let the Kickstart boot using compiled code. (See disclaimer below...)

*Phew*, there were times when I thought I am not going to write this down ever in the blog. I was this >> << close to give it up on some days. Since the world hasn't ended in 2012, I realized I have to go on, there is no escape.

I am proud to announce: the project stepped into Alpha stage on SourceForge with this update. Details of the changes are:

For the Kickstart boot these fixes were needed:
  • Added compiling stop (jump) flag to instructions which might trigger interrupt for supervisor mode: OR.W #imm,SR, AND.W #imm,SR, EOR.W #imm,SR
  • Temporary registers are flushed at the end of the compiling cycle, but before the code generation.
  • Stop the block processing when the special flags were set in the block (an interrupt might be triggered).
  • Reload the emulated program counter register when the block finishes with a supported instruction.
  • Fixed wrong function epilog implementation: the return address was read from the wrong position in the stack.
  • Old executed instruction pointer and emulated PC register is synchronized on PC reload.

Other fixes for the bugs I have discovered while I extensively debugged the emulation:
  • Fixed missing releasing of the compiling buffer on quitting the emulator (memory leak).
  • Prevent compiling of tiny blocks (less than 4 instructions in a row): the overhead of the block calling is too much.
  • Fixed compiling buffer overflow checking and misleading help text for the compiling buffer unit size.
  • Removed supported status for not-yet-implemented EOR.x reg,mem instruction, which was added accidentally before.
(Disclaimer) Before all of you rejoice in dancing banana overload on the user portal of your choice, there is a catch: the Kickstart is not able to start up with JIT compiled instructions, only if the original interpretive instructions are called one-by-one from the compiled code.
So, there is no practical use of the sources yet, but this was the big question: is the compiled code handling able to deal with something as complex as the Kickstart? And for a long time the answer was: no.

Some über-geeky details about the fixes (you like when I'm talking dirty, right?):

As you can see from the changes there were numerous problems around the code, all of these changes were needed for the final result. There were the usual ridiculous issues like a missing negative sign in the function Epilog (line 2308) when it tried to read back the return address from the stack - from the wrong offset.
It essentially means that the execution returned from every compiled block to the parent function instead of the block call loop. The funny familiar feeling that every coder experiences sooner or later: how on EARTH this thing ever worked? ;)

The trickiest part was finding the bug about the interrupt handling: I waded through a few hundred gigabytes of debug dump following the execution. Unfortunately, I was not able to compare the different execution sessions as I mentioned earlier in the comments for the previous post.

At last, I have found out that the OS is switching between User and Supervisor mode in the Exec.lib/Supervisor() function using the special OR immediate instruction by flipping the S flag in the Status Register. This step triggers an exception which is captured by the OS and the position of the triggering instruction identified in the ROM exactly.
The bug was: I never considered this OR instruction to be similar to a TRAP or an ILLEGAL instruction, which instructions change the Program Counter by raising an exception - which is essentially a jump. Thus the compiling hadn't stopped to give back the execution to the interpretive emulation.
As a result the compiled block contained the next instruction after the OR consecutively and later the exception was triggered separately: train-wrecking the boot completely. The only way to get out of there for the OS was to reboot, this is how the reboot-loop happened.

Promises?
Now, it gets a lot easier to fix up the different instruction implementations and implement the rest of the missing instructions. As soon as the former is done the OS will be usable and the latter can be done gradually.

I would like to thank Toni Wilen for his hints regarding the possible ways of tracking down bugs inside UAE. His suggestions gave me ideas which eventually led to the required fixes.

Yet lot more work should be done, but at least I can see the light at the end of the tunnel. (What a cliché, man. Put yourself together!)

And finally, here is a picture of me, made by my wife to capture the moment when the freakin’ thing started up for the very first time:

OMG, who is this ork-face?
See you all soon(ish) – my holiday is over tomorrow (#sadface).

Wednesday, November 7, 2012

Flag yeah

I am slowly working my way through the unimplemented instructions. Another bunch is done, here are the details of the recent update:
  • Implementation of flag condition "addressing" modes.
  • Implementation of ST.B Dx, SF.B Dx, Scc.B Dx, ST.B mem, SF.B mem, Scc.B mem, PEA.x, MOVE.x #imm,mem, CLR.x Dy, CLR.x mem and LEA instructions.
  • Fixed BTST instruction: testing bits higher than 15 was wrong.
  • Merged multiple condition code-related lines in the 68k instruction descriptor table.
  • Removed unnecessary parameter load for cache_miss function from the translated code PC verification code.
  • Added cache miss check for normal execute handler.
  • Code cleanup: removed the TODO label from immediate addressing modes used as destination and added meaningful error message instead.
  • Removed ignored parameter from the unsupported opcode macroblock push function.
The highlight of this change set is the implementation of the flag checking "addressing" modes. These are not real addressing modes; more like a simple way of implementing the numerous conditional instructions which are checking the arithmetic flags.
This important change opens the gate for the conditional branching instructions (Bcc and DBcc), which are essential for any average loops and iterations. For now only the conditional set instructions are implemented, because it was much easier to test these.
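For readers less familiar with the 68k, the sketch below shows in C what a conditional set instruction has to evaluate. It follows the standard 68k condition code table and CCR bit layout, but it is only an interpretive illustration with made-up function names (m68k_condition, scc_result), not the macroblock code from the JIT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low bits of the 68k condition code register. */
enum { CCR_C = 1 << 0, CCR_V = 1 << 1, CCR_Z = 1 << 2, CCR_N = 1 << 3 };

/* Evaluates one of the 16 68k conditions against the given flags. */
static bool m68k_condition(unsigned cc, uint8_t ccr)
{
    bool c = (ccr & CCR_C) != 0, v = (ccr & CCR_V) != 0;
    bool z = (ccr & CCR_Z) != 0, n = (ccr & CCR_N) != 0;

    switch (cc & 15) {
    case 0:  return true;           /* T  */
    case 1:  return false;          /* F  */
    case 2:  return !c && !z;       /* HI */
    case 3:  return c || z;         /* LS */
    case 4:  return !c;             /* CC */
    case 5:  return c;              /* CS */
    case 6:  return !z;             /* NE */
    case 7:  return z;              /* EQ */
    case 8:  return !v;             /* VC */
    case 9:  return v;              /* VS */
    case 10: return !n;             /* PL */
    case 11: return n;              /* MI */
    case 12: return n == v;         /* GE */
    case 13: return n != v;         /* LT */
    case 14: return !z && n == v;   /* GT */
    default: return z || n != v;    /* LE */
    }
}

/* Scc.B stores 0xFF if the condition holds and 0x00 otherwise. */
static uint8_t scc_result(unsigned cc, uint8_t ccr)
{
    return m68k_condition(cc, ccr) ? 0xFF : 0x00;
}
```

The same evaluation feeds Bcc and DBcc; only what is done with the true/false answer differs.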

I also took some weekday nights for dealing with some simple instructions, like LEA, PEA and CLR. On the working days I am too tired, this is all I can afford.

There is still no fix for the OS booting issue. I have tried to trace it again, at least now found out it is not about the simulated cache manipulation, because there is no cache-incoherency detected. I am still puzzled by this whole problem.

Thursday, September 6, 2012

Apple from the Tree

I try to keep this post short. New update is available:
  • MacOSX Darwin PowerPC support is implemented.
  • Fixed address distance calculation for the PowerPC native relative branch instructions.
  • Refactored the boolean values to use TRUE/FALSE preprocessor defines.
Big thanks goes to Tobias Netzel, who implemented the MacOSX support for the JIT compiling and helped me chasing down one more sneaky bug.

Some details on the bug: previously the negative relative branch calculation was completely wrong, which caused jumping to invalid addresses in certain situations and made the application crash.
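As an illustration of what has to be right here, below is a minimal C sketch of encoding the PowerPC unconditional relative branch (the "b" instruction, opcode 18 with the AA and LK bits cleared). It is not the code from the repository and the function name ppc_encode_branch is hypothetical; the point is that a backward distance must wrap into the 24-bit displacement field instead of spilling into the opcode bits.

```c
#include <assert.h>
#include <stdint.h>

/* Encodes "b target" placed at from_addr. The distance is kept as an
 * unsigned wrap-around difference so that backward jumps stay well defined. */
static uint32_t ppc_encode_branch(uint32_t from_addr, uint32_t to_addr)
{
    uint32_t distance = to_addr - from_addr;

    /* Instructions are word aligned. */
    assert((distance & 3u) == 0);
    /* The distance must fit into a signed 26-bit value. */
    assert(distance < 0x02000000u || distance >= 0xFE000000u);

    /* Opcode 18 in the top six bits, displacement masked into bits 2..25. */
    return 0x48000000u | (distance & 0x03FFFFFCu);
}
```

A jump of 8 bytes forward encodes as 0x48000008 and a jump of 8 bytes backward as 0x4BFFFFF8; without the mask the sign bits of a negative distance would overwrite the opcode.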

END-OF-TRANSMISSION

Sunday, September 2, 2012

Bug! *Splat*

Every developer knows the feeling when finally he/she finds a bug and slaps his/her forehead while mumbling: "How on earth has this thing ever been working?..."

Well, it just happened to me, I have fixed a bug that stopped the ROM from booting. It was a rather stupid mistake (as usual); for the details check out the update.
In this other minor update I have fixed one more nuance with the wrong addresses in the dumped PowerPC code log.

Right now the emulation advances even further in the booting process than before, when it stops with this cryptic message:

Compiling error: instruction or addressing mode is not implemented, but marked as implemented: 0x323b

Unfortunately, this is true: this is a move instruction with a complex addressing mode (68020), which is not implemented yet.
Since the move instruction itself is marked as supported and all the addressing modes are listed in the descriptor it busts me big time and calls me a liar. Fair enough.

I promise that I implement all the missing addressing modes soon. Honest.

I was so excited that I tried to run the ROM without compiling the instructions but in this case I got back to the previous problem: the reboot loop. :(

At least one bug was squashed again.

In the meanwhile Tobias managed to port the JIT to PowerPC MacOSX. For the speed check out his comment. I hope he sends me the changes soon and I can add it into the main source repository.

Thursday, August 30, 2012

Optimize It

I have been back from the holidays for some time now, but I got swamped instantly by the work in my daytime job. I had very little energy for the project in the nights, which was spent on two things: chasing that #@!% bug which is killing me and implementing the data-flow optimization.

Needless to say that I failed to track down the bug again, just like every time I spent hours on looking at couple megabytes of log dumps. Next time I will try a new approach, suggested by Stephen Fellner: reducing the code complexity while the bug can be reproduced.
We will see how it goes. The processor cache emulation certainly complicates things, that can be eliminated at least.

While I acknowledged the failure again, I gathered all my previous thoughts and implemented the data-flow based optimization which does a really nice job.

The update is small, but highly important again:
  • Implementation of code optimization.
  • New configuration was introduced: comp_optimize to turn it on/off.
  • Bugs in the register input/output flag specifications are fixed.
  • Fixed too small string buffer in the 68k disassembler, previously the debugger crashed every time when memory was disassembled to the screen.
  • Implementation of MOVEM.x regs,-(Ay)

What, how and why?

As I tried to explain earlier in this post some of the compiled code is completely useless. The emulated instruction consists of the following parts usually:
  • Initialization of certain temporary registers;
  • Executing the actual operation;
  • Alter the arithmetic flags according to the result;
  • Save the result somewhere (into memory)
Some of these operations are not common or can be done for a series of instructions ahead, like loading the previous data for the emulated registers into temporary registers, but there are parts that cannot be avoided if we examine the emulated instruction out of its context.

For example the arithmetic flags are overwritten quite often by the following instructions, which means: if the next instruction(s) are not depending directly on the flag results then we can remove the code that generates these.

Typically, if we (as mighty humans) look at the instructions in a block we can easily identify the context, we can tell mostly which instructions produce usable results and what is not important. In most cases this is simple for a human, but it is a really complex job for the code compiler.

As I previously described: the emulated code is broken up into macroblocks by the precompiling. These blocks represent an "atomic" operation, like load data into temporary register, compare two registers or calculate certain arithmetic flags from the previous result.
When I started working on this concept I figured out that if I were able to identify exactly for each macroblock the previous result(s) that it is depending on and the result(s) that it produces then I could evaluate the dependencies between the subsequent macroblocks.

Going with the flow

For Petunia I already implemented a similar concept, but that was limited to the flag usage and it is not able to split up the emulation of the arithmetic flags into individual flag registers: it is either emulated completely or removed completely. (Which means that even if we needed only the C register later the rarely used X register will be calculated too, usually after the C register was already done. If you don't understand the reason behind this: don't worry - you need to know more about 68k assembly.)

For E-UAE the implementation is radically different. For any macroblock there is the possibility of specifying each and every emulated register, flag or temporary register as dependency (input) and/or as result (output).

In the recent changes I implemented a rather simple solution for calculating the data flow of each register after all the macroblocks are collected for a block of instructions.

It is capable of finding out for each macroblock whether the produced results are relevant for the following instructions in the block or not. If not, then that macroblock can be eliminated from the compiled code because no instruction depends on its results.
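The idea can be sketched as a single backward pass over the collected macroblocks. This is a simplified illustration, not the actual E-UAE code: the names `MacroBlock`, `eliminate_dead` and `side_effect` are hypothetical, and I assume that everything emulated is still needed at the end of the block, because we cannot see what comes after it.

```python
from typing import NamedTuple


class MacroBlock(NamedTuple):
    name: str
    inputs: frozenset          # resources the macroblock reads
    outputs: frozenset         # resources the macroblock writes
    side_effect: bool = False  # e.g. a memory write: must never be removed


def eliminate_dead(blocks, live_at_end):
    """Walk the macroblocks backwards and keep only those whose
    results are still needed by something after them."""
    live = set(live_at_end)
    kept = []
    for mb in reversed(blocks):
        if mb.side_effect or (mb.outputs & live):
            live -= mb.outputs  # produced here, so not needed from earlier
            live |= mb.inputs   # but the inputs are needed from earlier
            kept.append(mb)
    kept.reverse()
    return kept


def fs(*names):
    return frozenset(names)


# Two additions in a row: the flags of the first one are overwritten
# by the second one before anybody reads them.
blocks = [
    MacroBlock("add D1,D0", fs("D0", "D1"), fs("D0")),
    MacroBlock("C of add #1", fs("D0"), fs("C")),
    MacroBlock("X of add #1", fs("D0"), fs("X")),
    MacroBlock("add D2,D0", fs("D0", "D2"), fs("D0")),
    MacroBlock("C of add #2", fs("D0"), fs("C")),
    MacroBlock("X of add #2", fs("D0"), fs("X")),
]

kept = eliminate_dead(blocks, live_at_end={"D0", "D1", "D2", "C", "X"})
print([mb.name for mb in kept])
```

In this example both flag macroblocks of the first addition are dropped, while the two additions and the flags of the second one stay. Since every flag is tracked on its own, a later macroblock that reads only X would keep "X of add #1" alive without keeping "C of add #1".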

 

How to use it

In theory the optimization is completely safe (it won't remove any code that is essential), but while the emulation is not stable enough there might be some bugs. So, I introduced a new configuration option for turning it on/off, called: comp_optimize.
It replaces the comp_nf configuration option from the x86 JIT implementation, because it is not just about the flags (nf = no flags).

If it is set to true then the data-flow calculation is done and some macroblocks will be removed. If it is set to false then the emulation compiles all the instructions fully into the buffer.

 

The results

And finally some speed tests... Compared to the previously published Mandelbrot test results the current numbers are:

Interpretive: 108 seconds;
JIT compiled without optimization: 52 seconds;
JIT compiled with optimization: 32 seconds.

That is a roughly 40% shorter run time (about 1.6 times faster) in the case of this (heavily arithmetic) test.

The test system was a Micro AmigaOne (G3/800 MHz). Let's compare it to WinUAE running on my laptop (Intel Core i3 M350/2.27 GHz):

JIT compiled with all the possible optimizations turned on: 9 seconds.

It would be a tough job to compare these two computers, but I am pretty sure that my laptop is more than 3.5x faster than that poor old G3 machine.
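For those who like to check the arithmetic, the ratios can be recomputed from the measured times above. This is just a small helper sketch; the function and key names are mine and not part of any tool.

```python
# Measured Mandelbrot run times in seconds, as listed above.
timings = {
    "interpretive": 108,
    "jit": 52,
    "jit_optimized": 32,
    "winuae_jit": 9,
}


def speedup(before, after):
    """How many times faster 'after' is compared to 'before'."""
    return before / after


def time_saved(before, after):
    """Fraction of the run time that was saved."""
    return (before - after) / before


t = timings
print(f"optimization: {speedup(t['jit'], t['jit_optimized']):.2f}x faster, "
      f"{time_saved(t['jit'], t['jit_optimized']):.0%} less time")
print(f"JIT vs. interpretive: {speedup(t['interpretive'], t['jit_optimized']):.2f}x")
print(f"WinUAE vs. E-UAE JIT: {speedup(t['jit_optimized'], t['winuae_jit']):.2f}x")
```

The last ratio is where the "3.5x" figure comes from: 32 seconds on the G3 against 9 seconds on the laptop.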

I am really content with these results for now.

Sunday, July 8, 2012

JIT Goes Blue

Although it is just a small update, it is a highly important one:
  • Thanks to Anonymous #1 Thore and Anonymous #2 itix (from the comments section for the previous post) MorphOS support for the JIT compiling is now implemented. (I had no possibility to test it, but fingers crossed...)
  • A bug is fixed in the memory read/write handling. It caused illegal memory access when the 3.x Kickstart was running: the stackframe was trashed due to a wrong offset calculation for the register saving.
    Unfortunately, this is not yet the fix that is needed to let AmigaOS boot, but at least it is one more baby step in that direction.
Enjoy!

P.S.: Anonymous MorphOS devs, don't you want to reveal yourselves? :)