This project is dedicated to the memory of William Morris (aka Frags), who was the main contributor to the bounty but was unable to see the final result.

Wednesday, May 21, 2014

PPCJITBETA03 (Switch to Ludicrous Speed)

Welcome back, long time no see, my friend. Please be seated.

I can tell you I have some good news for you again: here comes Beta #3!

Get it here if you must:


I held this release back for a while just to fix the emulated cache checksum feature, but I have been chasing a bug for two weeks without any success. So that fix is postponed to the next beta; in the meantime, you can enjoy some significant speed enhancements and increased stability.

Without going into the details of the changes (see the included README for the full list), I would like to mention the most important one:

Vroom-vroom

The major feature of this beta release is the register and flag optimization fix. You can turn it on in the configuration: just set comp_optimize to true.

If you are interested in the details, I already explained how the optimization works in an earlier post, but in case you are too lazy to read through that post, here is my old diagram (just because it is beautiful, you know):

Code translation flow diagram

Let me summarize it for you: the JIT compiler collects information about the data-flow dependencies (which registers and flags each macroblock reads and writes) between the various macroblocks and tries to remove the ones which won't have any effect on the outcome of a given block of macroblocks.
This is not a new feature in the JIT implementation, but previously a few (tons of) bugs prevented it from working on code more complex than my Mandelbrot test.
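To make the idea more concrete, here is a minimal toy sketch in Python of this kind of optimization, written as a backward liveness pass (classic dead-code elimination). It assumes each macroblock declares which 68k registers and flags it reads and writes, and whether it has side effects such as memory stores. The `MacroBlock` class, its fields and the macroblock names are hypothetical; they are not EUAEPPCJIT's actual data structures.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MacroBlock:
    # Hypothetical model of one macroblock: the emulated registers/flags it
    # consumes and produces, and whether it has effects (e.g. memory stores)
    # that must never be optimized away.
    name: str
    reads: frozenset[str]
    writes: frozenset[str]
    side_effects: bool = False


def remove_dead_macroblocks(blocks: list[MacroBlock], live_out: set[str]) -> list[MacroBlock]:
    """Drop macroblocks whose results are overwritten before anything reads them.

    Walks the block backwards, tracking which registers/flags are still
    needed ("live"). live_out is what must be valid when the block exits;
    a real compiler has to choose it conservatively.
    """
    live = set(live_out)
    kept: list[MacroBlock] = []
    for mb in reversed(blocks):
        if not mb.side_effects and not (mb.writes & live):
            continue  # nothing it produces is used later: remove it
        live -= mb.writes  # these values are (re)defined here...
        live |= mb.reads   # ...from these inputs, which are therefore needed
        kept.append(mb)
    kept.reverse()
    return kept


def make_mb(name: str, reads: str, writes: str) -> MacroBlock:
    """Shorthand: registers/flags given as space-separated names."""
    return MacroBlock(name, frozenset(reads.split()), frozenset(writes.split()))


# Toy translation of "add.l d0,d1" followed by "move.l d1,d2":
# the move recomputes N, Z, V and C, so the add's NZVC calculation is dead,
# while X (left untouched by move) must still be computed.
block = [
    make_mb("add_reg", "d0 d1", "d1"),
    make_mb("add_nzvc", "d1", "N Z V C"),
    make_mb("add_x", "d1", "X"),
    make_mb("move_reg", "d1", "d2"),
    make_mb("move_nzvc", "d2", "N Z V C"),
]
everything = {"d0", "d1", "d2", "N", "Z", "V", "C", "X"}
print([m.name for m in remove_dead_macroblocks(block, everything)])
# prints ['add_reg', 'add_x', 'move_reg', 'move_nzvc']
```

Because the walk goes backwards, a single pass is enough for a straight-line block; the conservative live_out set is what keeps the removal safe at block boundaries.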

In this release I have fixed every issue I have found so far with the optimization, and it seems to work quite nicely. You can boot AmigaOS and it runs just fine; games and demos will benefit from this feature too.
I was planning to do a comparison video that clearly shows the speedup, but I haven't had much time yet, so this is your job now, dear EUAEPPCJIT fans! Just post the links to the videos in the comments here. :)

PPC970 aka G5

Not everything is sunshine and happiness, though. Supporting the G5 processor architecture as a target turned out to be much more complicated than I thought, especially because I don't have any hardware to test on.

In the previous release the MacOSX G5 binary was not working properly on the G5 (nor on any other PowerPC, as a matter of fact). Thanks to Luigi Burdo for the report and to Tobias Netzel (again) for the help with the compiler. This is fixed in this release, hopefully. (Fingers crossed, I still don't have hardware to test on.)

The situation with the MorphOS G5 version is not that hunky-dory, though: it seems there is no official compiler with G5 support in the MorphOS SDK yet, and compiling any source for that processor is rather complicated and unreliable. Until this situation improves, the G5 version for MorphOS won't be available among the beta binaries.

However, nothing stops you from compiling your own version from the sources, as these are always available at SourceForge.

Upcoming

As I mentioned, I postponed the fix for the block checksum to the next release and also picked up some other things to do. You can find the planned list here:


I also had a look at what is planned for the first stable release and moved some items around between the various milestones. If you are curious, just click on the milestones on the SourceForge page.

24 comments:

  1. Hi Almos, thank you for mentioning me in your post :)
    About this latest beta: before it was 9 MIPS, now SysInfo gives me 32 to 48 MIPS.
    The overall Workbench speed is really fast.
    The AmigaOS 4 version of beta 2 (52-58 MIPS) is still the fastest one I have tested. I'm waiting for Michael's SDL version to compare it with this latest release of yours.
    Just reporting:
    in Workbench on an RTG screen, EUAE JIT uses 60% of CPU0 (on a quad G5) and is much faster than my A4000 with a 060@66MHz and a 604e@366 on a CyberStorm PPC.
    Using AGA screens, CPU0 usage is only 10% and everything becomes really slow.
    I will send more reports as far as I can, if they help your great work.

    1. Hi Luigi,

      Thanks for the report. Could you please send me the config files you were using for these tests?
      You can find my email address near the end of the README file.

      It is very strange that the AGA screen emulation uses only 10% of the CPU time. Have you added all the configuration items that were suggested in the README file?

  2. Hi Almos,
    if I remember well, comp_optimize=true is set. I'm not at home now; I will check it later, report back, and send you the configurations.
    If it helps, I can suggest getting in touch with the TenFourFox developers, because they can help with multithreading on the G5: they made TenFourFox optimized for the 970MP, using all 2/4 CPUs from the same process.
    I think this would improve the speed of EUAE JIT on these CPUs a lot.
    The Yabause (Sega Saturn emulator) developers did the same on LinuxPPC.

    1. I was referring to all the various configuration items I listed in the README file:

      http://sourceforge.net/p/euaeppcjit/code-0/HEAD/tree/trunk/README#l191

      Optimizing the emulator for multithreading would require significant changes to the structure of the emulation and possibly a complete rewrite of some modules. I have no plans to change any part other than implementing the JIT compiling. Maybe somebody will take this over someday.

  3. Hi Almos, yes, sorry, I made a wrong copy and paste. Yes, I use the link you posted as my usual configuration on the G5 and on the Pegasos 2... but anyway I will check it :) and report back.
    I took the HiToro default.uaerc and modified it myself for the JIT, so it is possible I left something out.

    1. Almos, I cleaned out all the extra lines added by HiToro and raised the cache to 16192 and... 98 MIPS with 98% usage of CPU0; as a test I also turned on immediate blits = 96 MIPS... AGA screens are working OK too: 98% CPU usage, i.e. 25% (1 core) of the system :) Problem solved, and now it is the fastest EUAE JIT... AmigaOS 3.9 loads in 2 seconds max :) If you need it, I can send you my config.nc to include in the OS X G5 EUAE :P ... Another step is done :P

    2. Hi Luigi,
      the key to speed on OS X is activating the SDL OpenGL backend.
      Add the following line to your default.uaerc to do this:
      sdl.use_gl=true

      Tobias

    3. Hi Luigi,
      I am glad that you have resolved the configuration problems. As Tobias suggested: if you haven't turned on OpenGL yet, give it a go.

      Thanks Tobias!

      Maybe we need a config FAQ, but I don't know much about the various supported platforms and how the performance can be fine-tuned on each.

    4. Hi Tobias, I would suggest that other Mac OS X users install HiToro for the first steps of the installation; once it has created the default.uaerc, edit it with TextEdit and add the JIT lines.
      @Tobias
      Thanks for the tip; anyway, HiToro already turns that line on by default :)

    5. OK, let's do a bug report!

      ToolsDaemon (v22, patched for OS3.9) cannot create new menus when JIT is off (regardless of comp_optimize setting).

      AmiDock can't render backdrop images when comp_optimize=true (SDL build, using a P96/uaegfx screen). It works OK when not on a P96 screen.

      On the plus side, it is faster than beta 2, and this is the first time I've noticed a small speed-up in a couple of games I've tried.

    6. So, do you mean that ToolsDaemon works when the JIT is on, but doesn't work when it is off?

      I will check what is going on with AmiDock.

      I have bad news for you: the games you are testing probably won't benefit much from JIT compiling, then, because this was the biggest speed bump we could achieve in the foreseeable future. There will probably be some minor optimizations later on, but this is almost done.

    7. I know I already responded to this below, but I just wanted to direct anyone reading this to that comment. Also, I didn't realise until now that I had made a typo: I meant to say "on", not "off".

  4. Almos, I just tested Michael's latest SDL build for AmigaOS on my G4 1266 MHz Pegasos 2 and I get 66.50 MIPS :)
    This means the G5 version will need better optimization :P
    Note: the G5 version has a problem displaying the Workbench clock; the G4 and AmigaOS versions don't.
    I still have problems with ADoom graphics on both machines... All the textures are glitched in JIT mode, not in non-JIT mode.
    I tested the Click Boom Quake 68k, but it has a problem loading pak0 in JIT mode, not in non-JIT mode.

    1. Please note that the MIPS values reported by any tool running inside the emulator are pretty much meaningless. The ratio of the emulated processor clock to real time is skewed, so the time measurement inside the emulator is most likely completely inaccurate.

      If you want to test the speed, try to find a program which produces something, like a 3D render or some picture processing, and measure the processing time using a stopwatch. Don't trust measured FPS either, since that also depends on the time inside the emulator.

      I am unable to fix any G5-related issue simply because I don't have the hardware to test on. Maybe Tobias could do something about it... ;)

      Please do not report bugs that are not related to the JIT compiling; I am not going to fix any of them.
      Regarding how to report bugs please refer to the README file which is included in the beta package.
      Here is a link to the relevant part of the README file on SourceForge:

      https://sourceforge.net/p/euaeppcjit/code-0/HEAD/tree/trunk/README#l232
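Why in-emulator MIPS figures mislead can be shown with a bit of arithmetic. The Python sketch below is a simplified model that assumes a benchmark divides the number of instructions it executed by the time it reads from the emulated clock; `clock_ratio` (emulated seconds per real second) is a hypothetical parameter, and the numbers in the usage lines are made up for illustration. Timing the same job with an external stopwatch avoids the problem, and comparing two such timings gives a meaningful speedup.

```python
def mips(instructions: int, seconds: float) -> float:
    """Millions of instructions per second over an interval."""
    return instructions / seconds / 1_000_000


def reported_mips(instructions: int, wall_seconds: float, clock_ratio: float) -> float:
    """MIPS as a benchmark running inside the emulator would compute it.

    clock_ratio (hypothetical): emulated seconds elapsing per real second.
    If the emulated timer runs slow (ratio < 1), the benchmark divides by
    too short an interval and over-reports its speed; if it runs fast,
    it under-reports.
    """
    emulated_seconds = wall_seconds * clock_ratio
    return mips(instructions, emulated_seconds)


def speedup(baseline_seconds: float, test_seconds: float) -> float:
    """How many times faster the test run is, from two stopwatch timings of the same job."""
    return baseline_seconds / test_seconds


# Hypothetical: 100 million instructions take 4 real seconds,
# but the emulated clock advances only half as fast as real time.
print(mips(100_000_000, 4.0))                # real speed: 25.0
print(reported_mips(100_000_000, 4.0, 0.5))  # what the benchmark shows: 50.0
print(speedup(12.0, 6.0))                    # 2.0
```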

    2. You should also know that all the G5 version does is avoid generating code that would lead to very slow emulation. That way, on a 2.1 GHz G5 it runs only a little faster than on a 1.5 GHz G4.
      In order to gain more speed, the JIT code generation would have to take the G5's pipeline length into account and would have to generate branch instructions in a way that the G5 can predict correctly.
      Maybe Cameron Kaiser from the TenFourFox project would like to do something in that direction?!
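As an illustration of one ingredient of "branch instructions the G5 can predict", here is a hedged Python sketch that encodes a PowerPC conditional branch (`bc`) with a static prediction hint. It relies on our reading of the Power ISA 2.x specification, where the two low bits of the BO field (the "at" bits) mean no hint (00), likely not taken (10) or likely taken (11); older 32-bit cores interpret those bits differently, so treat this as an assumption to verify against the documentation. The function and constant names are hypothetical and not taken from the EUAEPPCJIT sources, and the sketch covers only branch hints, not pipeline-aware scheduling.

```python
# Conditional-branch encoding with static prediction hints, following our
# reading of the Power ISA 2.x "at" bits (low two bits of BO).
# All names here are hypothetical.
HINT_NONE = 0b00
HINT_NOT_TAKEN = 0b10
HINT_TAKEN = 0b11

BO_IF_FALSE = 0b00100  # branch if condition-register bit BI is clear
BO_IF_TRUE = 0b01100   # branch if condition-register bit BI is set

CR0_EQ = 2  # BI value selecting the EQ bit of CR0


def encode_bc(bo_base: int, bi: int, displacement: int, hint: int = HINT_NONE) -> int:
    """Return the 32-bit word for a relative 'bc' (AA=0, LK=0) with an optional hint."""
    if hint not in (HINT_NONE, HINT_NOT_TAKEN, HINT_TAKEN):
        raise ValueError("unsupported hint")  # 0b01 is reserved
    if not 0 <= bi <= 31:
        raise ValueError("BI must select one of the 32 CR bits")
    if displacement % 4 or not -0x8000 <= displacement <= 0x7FFC:
        raise ValueError("displacement must be a word-aligned signed 16-bit offset")
    bo = bo_base | hint
    # Primary opcode 16, then BO, BI and the word-aligned displacement.
    return (16 << 26) | (bo << 21) | (bi << 16) | (displacement & 0xFFFC)


# beq cr0,+8 without a hint, then marked as likely taken / likely not taken.
for hint in (HINT_NONE, HINT_TAKEN, HINT_NOT_TAKEN):
    print(f"{encode_bc(BO_IF_TRUE, CR0_EQ, 8, hint):08X}")
# prints 41820008, 41E20008, 41C20008 (one per line)
```

The unhinted value 41820008 matches the familiar encoding of beq cr0,+8, which is a handy sanity check for the bit layout.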

    3. Tobias, Cameron Kaiser says that you know how to contact him, and from his message I understood who you are... the man who gave me the opportunity to browse the internet with all the plugins enabled on my G5 :) :) Thank you :)

  5. Ah, okay, I misunderstood your bug reports.

    I have created a ticket for ADoom:
    https://sourceforge.net/p/euaeppcjit/tickets/39/

    I will check Quake too.

  6. I couldn't find any demo download for the Click Boom version of Quake; is there a version available?

    1. http://aminet.net/game/demo/QuakePlayer096.lha

    2. Thanks, I will give it a go. I have tried Quake68k, but that one stops with an error.
      Unfortunately, a different version probably won't suffer from the same issues, but we will see.

  7. I didn't realise before that I had accidentally replied to another post instead of making a new one.

    ToolsDaemon is fine when JIT is turned off, but extra menus are not shown if JIT is turned on (i.e. "Tools" is the last menu you see when you right-click in WB).

    As for games, I haven't tested many, and I'm not expecting unrealistic things from you, so ;-)

    Perhaps VistaPro would be a good program to test JIT speed, instead of SysInfo.

  8. Hi Almos,
    I don't know if there is a Click Boom demo somewhere; I have the original from 199x. On Aminet, probably?

    Anyway, I can test the other 68k builds on the net and report the results.
    Anyway, I'm fairly certain that if you fix the ADoom gfx problems, Quake will probably work great too.

    About benchmarking:
    I have my own personal measurement,
    and it is really simple: how much time a real Amiga 4000 with a CyberVision 64 needs to load a 1024x768x32 JPEG as the Workbench background with the "scale well" option active.

    Amiga 4000 040@25: about 12 seconds
    Amiga 4000 060@50: about 6 seconds
    Amiga 4000 060@50 and 604e@233 (with the PPC datatype): immediate
    EUAE JIT: immediate
    EUAE without JIT: about 7 seconds

    This means SysInfo can tell the truth about MIPS measurements :P

  9. I could only make a quick test, but it seems more stable. When I find some time I will do more tests :) Thanks for the update, I appreciate it! *bow to the master*
