This project is dedicated to the memory of William Morris (aka Frags), who was the main contributor to the bounty but was unable to see the final result.

Sunday, July 8, 2012

JIT Goes Blue

Although this is just a small update, it is a highly important one:
  • Thanks to Anonymous #1 Thore and Anonymous #2 itix (from the comments section of the previous post), MorphOS support for the JIT compiler is now implemented. (I had no way to test it, but fingers crossed...)
  • A bug is fixed in the memory read/write handling. It caused illegal memory accesses when Kickstart 3.x was running: the stack frame was trashed by a wrong offset calculation in the register saving.
    Unfortunately, this is not yet the fix needed to let AmigaOS boot, but it is at least one more baby step in that direction.
Enjoy!

P.S.: Anonymous MorphOS devs, don't you want to reveal yourselves? :)

19 comments:

  1. Anonymous #1: My name is Thore.
    Unfortunately, I still get random crashes when trying mandel_hw.kick or iamalive. Sometimes it works, sometimes not. And sometimes mandel crashes after drawing some lines.
    Btw: You forgot to include proto/exec.h and exec/system.h for MorphOS in memory.c.

    Replies
    1. Thanks, includes are fixed now.

      Do you have a crash log or anything else to go on? If it happens randomly, then most likely a register is getting trashed or the stack frame is still wrong somehow.

    2. You got rid of the anonymous function? ;) Okay.
      I do not have a crash log but I know when it happens:
      1. The bi->handler is set to "compile_p_at_start" (compemu_support.c/compile_block)
      2. CL List is updated (This one is important: without the update it works, but not in JIT mode)
      3. m68k_run_2a (newcpu.c) tries to execute the code (I compared the handler addresses, so this is exactly the compile_p_at_start)
      4. It won't return from this code, but other handlers worked 4 times before this happens.
      So it could indeed be a stackframe or register issue.

    3. What you described here is the call into the compiled code chunk.
      Could you please turn on the JIT logs (comp_log=true and comp_log_compiled)? That way you can check exactly which code chunk was translated before the crash.

    4. Here is the log; right after it the window stays gray and it crashes. This is the mandelbrot demo.
      JIT: Init compiling
      JIT: Compiled code start: 2b984f48
      JIT: Comp: 00f8005e 12d8 MOVE.B (A0)+,(A1)+
      JIT: Comp: 00f80060 51c8 fffc DBF.W D0,#$fffc == 00f8005e (FALSE)
      JIT: Unsupported opcode: 0x51c8
      JIT: M68k: 00f8005e 12d8 MOVE.B (A0)+,(A1)+
      JIT: Mblk: load_memory_long
      JIT: Dism: 2b984fac: lwz r15,64(r14)
      JIT: Mblk: load_memory_long
      JIT: Dism: 2b984fb0: lwz r3,68(r14)
      JIT: Mblk: rotate_and_copy_bits
      JIT: Dism: 2b984fb4: rlwimi r15,r3,16,26,26
      JIT: Mblk: load_memory_long
      JIT: Dism: 2b984fb8: lwz r3,32(r14)
      JIT: Mblk: copy_register_long
      JIT: Dism: 2b984fbc: mr r4,r3
      JIT: Mblk: add_register_imm
      JIT: Dism: 2b984fc0: addi r3,r3,1
      JIT: Mblk: load_memory_long
      JIT: Dism: 2b984fc4: lwz r5,36(r14)
      JIT: Mblk: copy_register_long
      JIT: Dism: 2b984fc8: mr r6,r5
      JIT: Mblk: add_register_imm
      JIT: Dism: 2b984fcc: addi r5,r5,1
      JIT: Mblk: save_reg_stack
      JIT: Dism: 2b984fd0: stw r6,16(r1)
      JIT: Mblk: save_memory_long
      JIT: Dism: 2b984fd4: stw r3,32(r14)
      JIT: Mblk: save_memory_long
      JIT: Dism: 2b984fd8: stw r5,36(r14)
      JIT: Mblk: load_memory_spec
      JIT: Dism: 2b984fdc: mr r3,r4
      JIT: Dism: 2b984fe0: rlwinm r0,r3,18,14,29
      JIT: Dism: 2b984fe4: lis r5,10669
      JIT: Dism: 2b984fe8: ori r5,r5,40152
      JIT: Dism: 2b984fec: lwzx r5,r5,r0
      JIT: Dism: 2b984ff0: lwz r5,8(r5)
      JIT: Dism: 2b984ff4: mtlr r5
      JIT: Dism: 2b984ff8: blrl
      JIT: Dism: 2b984ffc: mr r4,r3
      JIT: Mblk: load_reg_stack
      JIT: Dism: 2b985000: lwz r3,16(r1)
      JIT: Mblk: check_byte_register
      JIT: Dism: 2b985004: extsb. r0,r4
      JIT: Mblk: copy_nz_flags_to_register
      JIT: Dism: 2b985008: mfcr r5
      JIT: Mblk: rotate_and_copy_bits
      JIT: Dism: 2b98500c: rlwimi r15,r5,0,0,2
      JIT: Mblk: rotate_and_mask_bits
      JIT: Dism: 2b985010: rlwinm r15,r15,0,11,8
      JIT: Mblk: save_memory_spec
      JIT: Dism: 2b985014: rlwinm r0,r3,18,14,29
      JIT: Dism: 2b985018: lis r5,10669
      JIT: Dism: 2b98501c: ori r5,r5,40152
      JIT: Dism: 2b985020: lwzx r5,r5,r0
      JIT: Dism: 2b985024: lwz r5,20(r5)
      JIT: Dism: 2b985028: mtlr r5
      JIT: Dism: 2b98502c: blrl
      JIT: M68k: 00f80060 51c8 fffc DBF.W D0,#$fffc == 00f8005e (FALSE)
      JIT: Mblk: save_memory_long
      JIT: Dism: 2b985030: stw r15,64(r14)
      JIT: Mblk: save_memory_word
      JIT: Dism: 2b985034: sth r15,68(r14)
      JIT: Mblk: load_register_long
      JIT: Dism: 2b985038: lis r3,10543
      JIT: Dism: 2b98503c: ori r3,r3,53816
      JIT: Mblk: save_memory_long
      JIT: Dism: 2b985040: stw r3,76(r14)
      JIT: Mblk: opcode_unsupported
      JIT: Dism: 2b985044: li r3,20936
      JIT: Dism: 2b985048: lis r4,10669
      JIT: Dism: 2b98504c: ori r4,r4,39904
      JIT: Dism: 2b985050: bl 0x2cf64ee4
      JIT: Done compiling
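
A note for reading the log above: the disassembler prints immediates in decimal, so the constants are easier to recognise once they are recombined. The following C sketch is only an illustration of that arithmetic; the helper names `lis_ori`, `li` and `check_log_constants` are made up here and are not part of E-UAE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: value left in a register by
 *   lis rX,hi ; ori rX,rX,lo
 * lis places hi in the upper halfword, ori fills in the lower one. */
static uint32_t lis_ori(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

/* Hypothetical helper: value loaded by "li rX,imm"
 * (a sign-extended 16-bit immediate). */
static int32_t li(int16_t imm)
{
    return imm;
}

/* Recombine the decimal immediates printed in the log. */
void check_log_constants(void)
{
    assert(lis_ori(10669, 40152) == 0x29AD9CD8u); /* base register for lwzx */
    assert(lis_ori(10669, 39904) == 0x29AD9BE0u); /* r4 before the final bl */
    assert(lis_ori(10543, 53816) == 0x292FD238u); /* value stored at 76(r14) */
    assert(li(20936) == 0x51C8);                  /* r3 before the final bl */
}
```

The last assertion shows that the value passed in r3 to the final `bl` is the opcode the log reports as unsupported (0x51c8).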

    5. That is the very first compiled code chunk. It seems that something goes wrong as soon as it tries to execute it.
      As far as I can tell there might be two possibilities:
      1. the cache flushing for the translated code area is not working,
      2. the allocated memory for the code cache requires some special flag or MMU mapping.

      Unfortunately, I have no access to any MOS device, but you might try to cook up a simple test app that copies a few lines of code into an allocated memory area, flushes the cache, then tries to execute it, and see what happens.
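
A minimal sketch of such a copy, flush and execute test, written for a POSIX-like host with GCC or Clang rather than for MorphOS (there the allocation and the flush would go through the OS, which is not shown). The function name `run_copied_code`, the stub bytes and the use of `mmap`/`mprotect` are all assumptions made for this illustration.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS with strict -std= on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Machine code for "return 42", chosen per host architecture. */
#if (defined(__powerpc__) || defined(__PPC__)) && !defined(__powerpc64__)
static const uint32_t stub[] = { 0x3860002Au, 0x4E800020u }; /* li r3,42 ; blr */
#elif defined(__x86_64__) || defined(__i386__)
static const uint8_t stub[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 }; /* mov eax,42 ; ret */
#elif defined(__aarch64__)
static const uint32_t stub[] = { 0x52800540u, 0xD65F03C0u }; /* mov w0,#42 ; ret */
#else
#define NO_STUB 1
#endif

/* Returns the stub's result, or -1 if this host cannot run the experiment. */
int run_copied_code(void)
{
#ifdef NO_STUB
    return -1;
#else
    size_t len = 4096;
    unsigned char *buf;
    int (*fn)(void);
    int result;

    assert(sizeof stub <= len);
    buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* 1. Copy the code into the allocated area. */
    memcpy(buf, stub, sizeof stub);

    /* 2. Flush: write back the data cache, invalidate the instruction cache. */
    __builtin___clear_cache((char *)buf, (char *)buf + sizeof stub);

    /* Make the area executable; some systems refuse this. */
    if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, len);
        return -1;
    }

    /* 3. Execute the copied code. */
    fn = (int (*)(void))(uintptr_t)buf;
    result = fn();

    munmap(buf, len);
    return result;
#endif
}
```

On a host where the experiment can run, `run_copied_code()` should return 42; a hang or crash at step 2 or 3 would point at the flush or at the memory attributes, the two possibilities listed above.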

    6. Very frustrating. My test program did not crash, but it stopped at the cache flush. When I disabled the cache flush, the code ran correctly multiple times without stopping.
      So I tried some things in UAE.
      I disabled cache flushing. Then the mandel demo works, even multiple times.
      After running iamalive, which can work once, it becomes unstable. iamalive will not run a second time.
      With cache flush enabled, I cannot run any demo; it crashes immediately. Is there really no way around this cache flush?

      Here is the log for iamalive after the second attempt to run it (first one worked, second one crashed), note this is without cache flush!

      JIT: Compiled code start: 37ec4be8
      JIT: Comp: 0000001a 5281 ADD.L #$00000001,D1
      JIT: Comp: 0000001c 3141 0180 MOVE.W D1,(A0,$0180) == $00dff180
      JIT: Comp: 00000020 60f8 BT.B #$fffffff8 == 0000001a (TRUE)
      JIT: Unsupported opcode: 0x60f8
      JIT: M68k: 0000001a 5281 ADD.L #$00000001,D1
      JIT: Mblk: load_memory_long
      JIT: Dism: 37ec4c4c: lwz r15,64(r14)
      JIT: Mblk: load_memory_long
      JIT: Dism: 37ec4c50: lwz r3,68(r14)
      JIT: Mblk: rotate_and_copy_bits
      JIT: Dism: 37ec4c54: rlwimi r15,r3,16,26,26
      JIT: Mblk: load_memory_long
      JIT: Dism: 37ec4c58: lwz r3,4(r14)
      JIT: Mblk: load_register_long
      JIT: Dism: 37ec4c5c: li r4,1
      JIT: Mblk: add_with_flags
      JIT: Dism: 37ec4c60: addco. r3,r3,r4
      JIT: Mblk: copy_nzcv_flags_to_register
      JIT: Dism: 37ec4c64: mcrxr cr2
      JIT: Dism: 37ec4c68: mfcr r15
      JIT: Mblk: rotate_and_copy_bits
      JIT: Dism: 37ec4c6c: rlwimi r15,r15,16,26,26
      JIT: M68k: 0000001c 3141 0180 MOVE.W D1,(A0,$0180) == $00dff180
      JIT: Mblk: load_memory_long
      JIT: Dism: 37ec4c70: lwz r4,32(r14)
      JIT: Mblk: add_register_imm
      JIT: Dism: 37ec4c74: addi r5,r4,384
      JIT: Mblk: check_word_register
      JIT: Dism: 37ec4c78: extsh. r0,r3
      JIT: Mblk: copy_nz_flags_to_register
      JIT: Dism: 37ec4c7c: mfcr r6
      JIT: Mblk: rotate_and_copy_bits
      JIT: Dism: 37ec4c80: rlwimi r15,r6,0,0,2
      JIT: Mblk: rotate_and_mask_bits
      JIT: Dism: 37ec4c84: rlwinm r15,r15,0,11,8
      JIT: Mblk: save_memory_long
      JIT: Dism: 37ec4c88: stw r3,4(r14)
      JIT: Mblk: save_memory_spec
      JIT: Dism: 37ec4c8c: mr r4,r3
      JIT: Dism: 37ec4c90: mr r3,r5
      JIT: Dism: 37ec4c94: rlwinm r0,r3,18,14,29
      JIT: Dism: 37ec4c98: lis r5,12549
      JIT: Dism: 37ec4c9c: ori r5,r5,45400
      JIT: Dism: 37ec4ca0: lwzx r5,r5,r0
      JIT: Dism: 37ec4ca4: lwz r5,16(r5)
      JIT: Dism: 37ec4ca8: mtlr r5
      JIT: Dism: 37ec4cac: blrl
      JIT: M68k: 00000020 60f8 BT.B #$fffffff8 == 0000001a (TRUE)
      JIT: Mblk: save_memory_long
      JIT: Dism: 37ec4cb0: stw r15,64(r14)
      JIT: Mblk: save_memory_word
      JIT: Dism: 37ec4cb4: sth r15,68(r14)
      JIT: Mblk: load_register_long
      JIT: Dism: 37ec4cb8: lis r3,13673
      JIT: Dism: 37ec4cbc: ori r3,r3,32640
      JIT: Mblk: save_memory_long
      JIT: Dism: 37ec4cc0: stw r3,76(r14)
      JIT: Mblk: opcode_unsupported
      JIT: Dism: 37ec4cc4: li r3,24824
      JIT: Dism: 37ec4cc8: lis r4,12549
      JIT: Dism: 37ec4ccc: ori r4,r4,45152
      JIT: Dism: 37ec4cd0: bl 0x398467f4
      JIT: Done compiling
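
For readers decoding the memory-access sequence in this log: `rlwinm r0,r3,18,14,29` turns the 68k address into `(address >> 16) * 4`, which then serves as the byte offset for `lwzx`. The C sketch below models the instruction to check that; the function names are invented for this note, and reading the result as an index into a per-64 KB table of 4-byte entries is my interpretation of the disassembly, not something the log states.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* Mask with ones from bit mb to bit me in IBM numbering (bit 0 = MSB).
 * When mb > me the mask wraps around. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xFFFFFFFFu >> mb;
    uint32_t up_to_me = 0xFFFFFFFFu << (31u - me);
    return (mb <= me) ? (from_mb & up_to_me) : (from_mb | up_to_me);
}

/* Model of "rlwinm ra,rs,sh,mb,me": rotate left, then AND with the mask. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask(mb, me);
}

void check_bank_offset(void)
{
    uint32_t addr = 0x00DFF180u; /* target of MOVE.W D1,(A0,$0180) in the log */

    /* Upper 16 address bits, scaled by 4. */
    assert(rlwinm(addr, 18, 14, 29) == ((addr >> 16) << 2));
    assert(rlwinm(addr, 18, 14, 29) == 0x37Cu);

    /* The wrapped mask in "rlwinm r15,r15,0,11,8" clears IBM bits 9 and 10. */
    assert(rlwinm(0xFFFFFFFFu, 0, 11, 8) == 0xFF9FFFFFu);
}
```

In the log, the word fetched with `lwzx` is then used as a pointer: `lwz r5,16(r5)` loads a second word from offset 16 of it, and `mtlr`/`blrl` call that address.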

    7. There is no way around the cache flush. The data cache must be written back to memory and the instruction cache must be invalidated, so that the CPU reads the new instructions from memory.
      This should work, unless we are missing some very important detail.
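
The write-back and invalidate steps described here are usually done one cache line at a time on PowerPC. The sketch below shows the common sequence under some assumptions: the function name `sync_code_range` and the 32-byte stride are choices made for this example (a stride smaller than the real line size only costs time), and on non-PowerPC hosts it simply falls back to the GCC/Clang builtin. It is not the flush code E-UAE actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stride; real code should query the cache line size. */
#define CACHE_LINE 32u

/* Make freshly written code in [start, start + len) visible to
 * instruction fetch. */
void sync_code_range(void *start, size_t len)
{
    assert(start != NULL);
    if (len == 0)
        return;
#if defined(__powerpc__) || defined(__PPC__)
    {
        uintptr_t first = (uintptr_t)start & ~(uintptr_t)(CACHE_LINE - 1);
        uintptr_t end = (uintptr_t)start + len;
        uintptr_t p;

        /* Write the modified data cache lines back to memory. */
        for (p = first; p < end; p += CACHE_LINE)
            __asm__ volatile("dcbst 0,%0" : : "r"(p) : "memory");
        /* Wait until the write-backs have completed. */
        __asm__ volatile("sync" : : : "memory");
        /* Invalidate the matching instruction cache lines. */
        for (p = first; p < end; p += CACHE_LINE)
            __asm__ volatile("icbi 0,%0" : : "r"(p) : "memory");
        /* Discard anything already prefetched. */
        __asm__ volatile("sync\n\tisync" : : : "memory");
    }
#else
    /* Portable fallback provided by GCC and Clang. */
    __builtin___clear_cache((char *)start, (char *)start + len);
#endif
}
```

A caller would invoke it once after emitting a block, for example `sync_code_range(code_start, code_size)`, where both names stand for whatever the code generator tracks.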

    8. I checked the code several times and am confused about why this should crash. I found a site that describes the register usage; maybe something is messed up there?
      http://library.morphzone.org/An_Introduction_to_MorphOS_PPC_Assembly

    9. From that description MorphOS is also SysV ABI compliant (which is not too surprising). I don't think that anything is wrong with the register layout. Maybe the stack frame is different somehow, but I doubt that.

    10. Finally I tracked down the MorphOS "bug". I thought about the random crashes, and your hint "stackframe" brought me to the stack itself. So I decreased the cachesize (e.g. 512) in the config and increased the stack size of the CLI (e.g. stack 1000000), and then the mysterious crashes were gone. Even after the crash from trying to boot the Kickstart ROM, the demos still work.
      So it is indeed just the amount of free stack space. Now we can go further in looking for the "one mysterious bug" which prevents the OS from booting. Nice vacations :)

    11. Sounds weird. In Snoopium and SnoopDos, where small pieces of PPC assembly code are used to patch system calls, it works just fine. You could try querying the L2 cache line size or the MMU page size to see if it makes any difference. Or just use some fixed value (4k or 8k). The OS4 code is using MEMF_HW_ALIGNED, which is aligned to the MMU page size. It shouldn't make a difference, but at least you could try.

      This stack problem is a different issue. It looks like the UAE JIT is using the 68k stack. There are two stacks (stack pointers) in MorphOS: one is the legacy 68k stack pointer found in REG_A7 for 68k code (tc_SPReg, tc_SPLower, tc_SPUpper), and the other is the PPC stack pointer found in r1 and maintained in struct ETask. It is interesting that the UAE JIT is using the 68k stack pointer, but that is not necessarily bad. It is only a problem if the E-UAE JIT assumes a mixed 68k/PPC stack.

      int __stack = 1000000; only sets the PPC stack, because the 68k stack is rarely used and defaults to 2048 bytes. To fix this you should either use the StackSwap() call in Exec to set a properly sized 68k stack or modify the E-UAE JIT to use the PPC stack. I don't know which method is better. OS4 is using a mixed stack (PPC and 68k together), so maybe that stack layout is assumed in the JIT code and that is why it crashes. But I couldn't find any references to tc_SPReg or tc_SPLower or anything like that... This stack is definitely not working in MorphOS as it is intended to work.

    12. I don't see why the JIT would use the 68k stack on MOS. The code is the same on both MOS and OS4; it depends on the pointer in r1. On OS4, if the app was built as PPC, then there is no emulated 68k access to the stack and you can follow the SysV ABI stack frames all along the whole stack.
      E-UAE is a purely PPC app (in our case), so it won't try to access the 68k stack or depend on any 68k emulation.

      Since the JIT code is single threaded and never gets called directly through the GCC-compiled code, I can move the saved registers to the global context array structure instead of storing them on the stack. It is probably even possible to remove the SysV ABI stack frame creation altogether somehow; in that case there won't be any stack usage at all.

    13. Lowering the code cache size to 512 bytes might cause side effects, like frequent flushing of the compiled code or simply skipping compilation altogether. Do not do that; it makes no sense to use less than 1 MB of code cache.
      Maybe I should introduce a check for this setting.

    14. Sorry, I just realized that the code cache size is specified in KBytes. So specifying 512 as the cache size gives a 0.5 MB code cache, which should be fine.

  2. Wait, what? I'm not one of the anonymous posters, nor do I think I helped with anything. (In case the blog article has been edited and my name removed by the time you read this: my name is/was listed as one of the people who helped with MorphOS support. I am not one of the anonymous people, nor do I have any skills in coding in C, nor do I have MorphOS at this time.)

    Replies
    1. Sorry about that, somehow I connected your name with Anonymous #2.
      The real Anonymous #2, please stand up!

  3. Anon #2 is me, an ex-Kiwi ;)
