This project is dedicated to the memory of William Morris (aka Frags), who was the main contributor to the bounty but was unable to see the final result.

Sunday, May 19, 2013

One small step for mankind, a giant leap for the project

I have no idea how I managed to achieve this much in this update, but it is certainly a confident step forward. This time the list is long and diverse:
  • Implementation of Bcc.x addr, BCHG.B Dx,mem, BCHG.L Dx,Dy, BCLR.B Dx,mem, BCLR.L Dx,Dy, BRA.x abs, BSET.B Dx,mem, BSET.L Dx,Dy, BSR.x abs, BTST.B #imm,mem, BTST.B Dx,#imm, BTST.B Dx,mem, BTST.L Dx,Dy, CMP.x #imm,mem, CMP.x mem,Dy, CMP.x reg,Dy, CMPA.L reg,Ax, CMPA.W reg,Ax, CMPA.x mem,Ay, DBF.W Dx,addr, EOR.x #imm,mem, JMP.L abs, JMP.L mem, JSR.L abs, JSR.L mem, NEG.x mem, NOT.x mem, RTS, TAS.B Dx, TAS.B mem instructions.
  • Cache invalidation fix for OSX 10.3.9 and below. (Thanks to Mike Blackburn again.)
  • Fixed mask handling in BCHG.B Dx,mem instruction.
  • Fixed missing register mapping in ASL.x #imm,Dy implementation.
  • Fixed input dependency overwriting in certain memory-related allocation functions.
  • Fixed dependency for destination memory pointer register in special memory reading.
  • Fixed post address handler for condition code addressing modes; previously it might crash or call a random handler from the other addressing modes.
  • Fixed instructions where temporary registers were allocated but not freed.
  • Optimized masking for register-to-register bit instructions.
  • Optimized the temporary register usage in the helper_test_bit_register_register function.
  • Optimized flag extraction in several shift operations.
  • Branch scheduling is more flexible: adding multiple interleaved branches is possible.
  • Added a comment about the missing implementation of the exception that should be raised when an odd address is loaded into the PC.
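
For readers less familiar with the 68k bit instructions, the "masking" in the list above refers to the bit number: for a memory operand the instruction works on a byte and uses the bit number modulo 8, while for a data register it works on a long word and uses the bit number modulo 32. The sketch below models only this architectural behaviour for BCHG. It is not the JIT's code, and the function name `bchg` and its parameters are made up for illustration.

```python
def bchg(value: int, bit_number: int, is_memory: bool) -> tuple[int, bool]:
    """Model BCHG: test a bit, then invert it.

    Returns (new_value, z_flag). Z reflects the state of the bit
    *before* it is inverted.
    """
    # Memory operands are bytes, data register operands are long words.
    width = 8 if is_memory else 32
    # The bit number wraps around the operand width.
    mask = 1 << (bit_number % width)
    z_flag = (value & mask) == 0
    new_value = (value ^ mask) & ((1 << width) - 1)
    return new_value, z_flag


# Bit 9 on a memory byte wraps to bit 1.
print(bchg(0x00, 9, is_memory=True))
# Bit 33 on a data register wraps to bit 1 as well.
print(bchg(0x02, 33, is_memory=False))
```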

A few highlights

First of all, let me brag a little about the number of freshly implemented instructions. Right now 237 instructions out of 388 are implemented, so a solid 61% is done. (Previously the ratio was ~46%.)

More MacOSX versions are supported now: Mike fixed up the cache flushing a little bit and added support for the pre-10.4 versions too. Please read the included README file for the compiling instructions.

While I was working on the instructions I discovered a few bugs and glitches, which are now fixed in this release, improving the overall stability.

I have also managed to optimize the compiled code for some instructions. Together with the implementation of some previously missing instructions, this improved the results for the Mandelbrot test (mandel_though_hw.kick.gz among the test kick files) a bit compared to the previous results:

Interpretive: 108 seconds (no change there...);
JIT compiled without optimization: 44 seconds (previously it was 52 seconds);
JIT compiled with optimization: 27 seconds (previously it was 32 seconds).
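
To put these numbers in perspective, the short sketch below derives the speedup factors from the timings quoted above. The variable and function names are only illustrative; the figures are the ones from this post.

```python
# Mandelbrot test timings in seconds, as quoted in this post:
# (previous release, this release)
timings = {
    "interpretive": (108, 108),
    "JIT without optimization": (52, 44),
    "JIT with optimization": (32, 27),
}


def speedup(baseline: float, measured: float) -> float:
    """How many times faster `measured` is than `baseline`."""
    return baseline / measured


baseline = timings["interpretive"][1]
for mode, (previous, current) in timings.items():
    vs_interpreter = speedup(baseline, current)
    gain = (previous - current) / previous * 100
    print(f"{mode}: {vs_interpreter:.2f}x vs. interpreter, "
          f"{gain:.0f}% less time than the previous release")
```

On this test the optimized JIT is now about four times faster than the interpreter, and both JIT modes need roughly 15% less time than in the previous release.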

That was enough self-praise, now back to work...

15 comments:

  1. Where there is a will, there is a way. ;-)

  2. @Almos

    I did some tests:

    Aladdin and LionKing games: about 10% slower with JIT.

    Technological Death demo: it starts to work with JIT, also about 10% slower, and freezes somewhere in the third scene (about 20 seconds after starting). I.e. the bugs are still there, but it starts to work now.

    WHDLoad stuff: after a few seconds of loading there is a black screen and UAE freezes.

    Replies
    1. Is the JIT LED bright green while these are running?

    2. @Almos
      It's pretty much green all the time, mostly "light green", as if a lot of JIT compiling is happening.

    3. OK, I will check why the demo is slower than it should be. Not all instructions are done yet, but it should be at least as fast as the interpreted emulation.

    4. I had a look at Technological Death: first of all, this is an OCS demo which heavily uses the chipset (mostly the blitter), but not the processor. The JIT won't help in this case, because the demo is slow due to the sluggish chipset emulation.
      I am not sure how you figured out that it is slower by 10%; I see no difference at all. And yes, it doesn't work fully, there are still bugs to be fixed.

      It is pretty easy to find out whether the processor emulation or the chipset emulation is the bottleneck: if the emulation slows down on large movements on the screen, then the chipset emulation is to blame. For these games/apps the JIT won't help much (or at all).

      I would suggest testing something where the processor emulation is more important, like a paint program (Personal Paint for example) or a ray tracer which runs on a 68020 without FPU. Or maybe Quake1, which is compiled for 68020. Anything which runs on an emulated graphics card and not on ECS/AGA screens is most likely going to gain more speedup.

    5. @Almos
      Yep, it may not be faster, but IMHO it shouldn't be slower with JIT either? I checked the difference by just running cpu_metter in the background, and I can see that with JIT enabled the CPU load is about 10% higher than with JIT disabled.

      At least in theory everything should be the same, and not slower (but I assume that even in those old AGA games which use the chipset and so on, it should still speed up a few parts a bit here and there, right?)

    6. I am not sure I understood this correctly: you tried to measure how the free time of your PPC processor changes? That is meaningless, because the JIT emulates a non-existent processor which is "much faster" than any real one. It adjusts the chipset timing balance to increase the load on the processor emulation relative to the custom chip emulation, to let the emulated processor execute more code. So it skips more custom chip emulation in favor of the processor emulation.
      The side effect of this might be that the free PPC processor time drops, but why would you measure that? In most cases (especially for the old demos and games) the processor is barely doing anything but waiting. In this case it is waiting more faster. :)

      What is worth checking is whether the frame rate of a processor-intensive game is higher, or whether some calculation takes less time, when the emulation runs at maximum speed. For these tests you have to set up these configuration items:

      cpu_speed=max
      cpu_compatible=false
      cpu_cycle_exact=false

  3. Thank you, sounds good. I tried it with MorphOS and here I have two issues:
    1. It seems that only the 68020 will do JIT, no other processor
    2. On _every_ game now I get the message "JIT temporary register '1' is not allocated, but mapping info is requested"
    So I could not try it yet ;)

    Replies
    1. JIT is only working with CPU cache enabled. (See FAQ.)

      The other one sounds like a bug in one of the new instructions. Could you please tell me which games these are? Or maybe run it with comp_log=true set and send me the logs?

  4. Okay. Do you still have this bfd... address?
    I tried it with Gods, Apidya, Siedler and Lotus2. But I don't think it's the games, but the kickstart itself. When I comment out the ADF section, the error also appears.
    For Kick3.x the last lines are:
    JIT: Compiled code start: 2dff3b58
    JIT: Comp: 00f81a0e 4aaa 000a TST.L (A2,$000a) == $00c0005a
    JIT: Comp: 00f81a12 67f4 BEQ.B #$fffffff4 == 00f81a08 (FALSE)
    JIT: Done compiling
    JIT: Init compiling
    JIT: Compiled code start: 2dff3c6c
    JIT: Comp: 00f81a14 206a 000a MOVEA.L (A2,$000a) == $00c0005a,A0
    JIT: Comp: 00f81a18 2241 MOVEA.L D1,A1
    JIT: Comp: 00f81a1a b308 CMPM.B (A0)+,(A1)+
    JIT: Unsupported opcode: 0xb308
    JIT: Comp: 00f81a1c 66ea BNE.B #$ffffffea == 00f81a08 (TRUE)

    Then the error message appears.

    For the CPU cache, it seems that the other processor types don't enable the cache by default.

    Replies
    1. Um, no, that mail address is ancient. I put a contact field in the right column on this page, just send me a message and I will reply.

      I don't see any problem with my kickstart, so you probably have a different version.

      From the log I don't see anything particular which might trigger this error. Unfortunately, it is hard to tell where this error is coming from exactly because it is printed by one of the most widely used functions.

      Maybe if you could send me the kick file I would be able to debug it.
      Or you can also narrow it down by removing the supported instructions from the table68k_comp file until the error disappears. Just change the number '1' next to the instruction name to '0' and rebuild the code. Doing it one-by-one might take a while, but you can do this in bigger blocks, then refine it when the error disappears.
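
The block-wise narrowing described above can be sketched as a simple bisection, assuming that a single instruction is to blame. Here `error_appears` is a hypothetical stand-in for "switch the given instructions to '0' by hand, rebuild, run, and check whether the message still shows up"; neither it nor `find_culprit` exists in the project, and the sketch does not touch the table68k_comp file itself.

```python
from typing import Callable, Sequence, Set


def find_culprit(instructions: Sequence[str],
                 error_appears: Callable[[Set[str]], bool]) -> str:
    """Narrow down the one instruction whose JIT support triggers the error.

    `error_appears(disabled)` must report whether the error still shows up
    when the instructions in `disabled` are switched off.
    """
    suspects = list(instructions)
    if not suspects:
        raise ValueError("no instructions to check")
    while len(suspects) > 1:
        half = len(suspects) // 2
        block, rest = suspects[:half], suspects[half:]
        if error_appears(set(block)):
            # Error is still there with the block disabled: look in the rest.
            suspects = rest
        else:
            # Error disappeared: the culprit is inside the disabled block.
            suspects = block
    return suspects[0]


# Simulated run with some instructions from this update; the "culprit"
# here is picked arbitrarily for the demonstration.
new_instructions = ["BCHG.B Dx,mem", "BCLR.B Dx,mem", "BSET.B Dx,mem",
                    "CMPA.L reg,Ax", "NEG.x mem", "NOT.x mem", "RTS",
                    "TAS.B mem"]
print(find_culprit(new_instructions, lambda off: "NEG.x mem" not in off))
```

Each round halves the list of suspects, so the number of rebuilds grows only with the logarithm of the number of instructions instead of one rebuild per instruction.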
