The Big E-UAE JIT blog: Peeking behind the curtain

What's new

I have just uploaded another small update to the Subversion repository. It contains the following changes:

Implementation of all forms of the following addressing modes, both as source and as destination:

immediate long, word, byte;
indirect addressing: (Ax);
indirect addressing with pre-decrement: -(Ax);
indirect addressing with post-increment: (Ax)+;

Implementation of instructions:

MOVE.x #imm,Dy;
MOVEA.x #imm,Ay;
MOVEQ #imm,Rx;
MOVE.x Rx,Ry

The code was refactored to avoid repeating the same chunks across the addressing modes and instructions.
Dumping of the compiled code to the console was implemented.
A new configuration setting (comp_log_compiled) was added to turn it on and off.

Featuring...

It is always exciting to discover some magic details about the internal behavior of a complex system. I remember when I first realized how text was stored in Commodore Plus/4 games, back in the day. I was amazed that I could change the text that showed up in the scroller at the bottom of Tycoon Tex.

Let me offer some small excitement to you all: I just implemented a funny little feature in the E-UAE JIT engine. Now we can turn on dumping the compiled code to the console, together with the original Motorola 68k instruction that was compiled and the macroblocks that describe the intermediate translation form.

The purpose of this feature wasn’t (purely) entertainment: I was really fed up with not being able to debug the generated code properly. Previously, I inserted a trap instruction (tw) into the translated code at some point so that I could look at the output in the Grim Reaper window (which was awesome, but let’s not confuse it with actual debugging).

Too bad that GDB is so limited: it cannot step into any code segment that wasn’t loaded by DOS (such as generated code). Not to mention how cumbersome the console interface is... (Or am I missing something? Enlighten me, please.)

I would like to thank Frank Wille for the sources of the PowerPC disassembler, which make it possible to list the translated code.

How to turn on this feature: there are two settings that control the logging. These are:

comp_log – if set to “true” or “yes”, JIT logging is turned on and dumped to the standard output.
comp_log_compiled – if set to “true” or “yes”, the compiled code is listed through the JIT logs.
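
To make the two switches a bit more concrete, here is a minimal C sketch of how such settings could be interpreted. This is not the actual E-UAE configuration parser: the struct and function names (jit_log_config, option_enabled, apply_option, should_dump_compiled) are made up for illustration, and the idea that the compiled-code listing needs both settings is my assumption, based on the listing going "through the JIT logs".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical holder for the two logging switches described above. */
typedef struct {
    bool log;           /* comp_log */
    bool log_compiled;  /* comp_log_compiled */
} jit_log_config;

/* A value switches a setting on when it is "true" or "yes"
 * (exact, case-sensitive match is an assumption of this sketch). */
bool option_enabled(const char *value) {
    return value != NULL &&
           (strcmp(value, "true") == 0 || strcmp(value, "yes") == 0);
}

/* Applies one key/value pair; returns false for keys this sketch does not know. */
bool apply_option(jit_log_config *cfg, const char *key, const char *value) {
    assert(cfg != NULL && key != NULL);
    if (strcmp(key, "comp_log") == 0) {
        cfg->log = option_enabled(value);
        return true;
    }
    if (strcmp(key, "comp_log_compiled") == 0) {
        cfg->log_compiled = option_enabled(value);
        return true;
    }
    return false;
}

/* Assumption: the listing is part of the JIT log, so both switches must be on. */
bool should_dump_compiled(const jit_log_config *cfg) {
    return cfg->log && cfg->log_compiled;
}
```

With both keys set to "yes" or "true", should_dump_compiled() returns true; any other value for either key turns the listing off in this sketch.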

Let’s see a small demonstration of this feature, shall we? (Not for the faint-hearted!)

The following listing is the output for a very simple test program, iamalive.asm, slightly edited and formatted for educational purposes:

M68k: ADD.L #$00000001,D1

Mblk: load_memory_long
Dism: lwz r15,64(r14)
Mblk: load_memory_long
Dism: lwz r3,68(r14)
Mblk: rotate_and_copy_bits
Dism: rlwimi r15,r3,16,26,26
Mblk: load_memory_long
Dism: lwz r3,4(r14)
Mblk: load_register_long
Dism: li r4,1
Mblk: add_with_flags
Dism: addco. r3,r3,r4
Mblk: copy_nzcv_flags_to_register
Dism: mcrxr cr2
Dism: mfcr r15
Mblk: rotate_and_copy_bits
Dism: rlwimi r15,r15,16,26,26

M68k: MOVE.W D1,(A0,$0180) == $00dff180

Mblk: load_memory_long
Dism: lwz r4,32(r14)
Mblk: add_register_imm
Dism: addi r5,r4,384
Mblk: check_word_register
Dism: extsh. r0,r3
Mblk: copy_nz_flags_to_register
Dism: mfcr r6
Mblk: rotate_and_copy_bits
Dism: rlwimi r15,r6,0,0,2
Mblk: rotate_and_mask_bits
Dism: rlwinm r15,r15,0,11,8
Mblk: save_memory_long
Dism: stw r3,4(r14)
Mblk: save_memory_spec
Dism: mr r4,r3
Dism: mr r3,r5
Dism: rlwinm r0,r3,18,14,29
Dism: lis r5,27315
Dism: ori r5,r5,23016
Dism: lwzx r5,r5,r0
Dism: lwz r5,16(r5)
Dism: mtlr r5
Dism: blrl

M68k: BT.B #$fffffff8 == 0000001a (TRUE)

Mblk: save_memory_long
Dism: stw r15,64(r14)
Mblk: save_memory_word
Dism: sth r15,68(r14)
Mblk: load_register_long
Dism: lis r3,27606
Dism: ori r3,r3,45096
Mblk: save_memory_long
Dism: stw r3,76(r14)
Mblk: opcode_unsupported
Dism: li r3,24824
Dism: lis r4,27315
Dism: ori r4,r4,21752
Dism: bl 0x7f91acc0

Done compiling

Colorful, isn't it? :)

Okay, let's try to understand what is going on.

I marked the three Motorola 68k instructions that were compiled here in orange; the code roughly does the following:

1. Increase register D1 by one;

2. Put the content of register D1 to the address calculated from register A0 plus an offset of 0x180 (A0 was previously initialized with the value 0xDFF000, which is the base of the custom chipset memory area). In layman's terms: load it into the background color.

3. Go back to step 1.

Now, let's see the second level of the list:

First of all the prefix "Mblk:" marks the macroblocks (white), "Dism:" is the actual PowerPC code (yellow).

As I mentioned earlier, some macroblocks can be optimized away (although this is not implemented yet). A macroblock translates to at least one PowerPC instruction, but it can also be a series of instructions.
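
The load_register_long macroblock in the listing is a handy illustration of this: it shows up once as a single li and once as a lis/ori pair. Below is a rough C sketch of how a macroblock could pick between the two forms. The function names and the choice rule (use li when the constant fits into a signed 16-bit immediate) are assumptions made for this example, not the real E-UAE code; only the PowerPC instruction encodings are standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encoders for three PowerPC D-form instructions (primary opcode in the top 6 bits). */
uint32_t ppc_addi(unsigned rd, unsigned ra, uint16_t imm) {
    return (14u << 26) | (rd << 21) | (ra << 16) | imm;
}

uint32_t ppc_addis(unsigned rd, unsigned ra, uint16_t imm) {
    return (15u << 26) | (rd << 21) | (ra << 16) | imm;
}

uint32_t ppc_ori(unsigned ra, unsigned rs, uint16_t imm) {
    return (24u << 26) | (rs << 21) | (ra << 16) | imm;
}

/* Hypothetical macroblock: load a 32-bit constant into a register.
 * Writes one or two instruction words to out and returns how many. */
size_t emit_load_register_long(uint32_t *out, unsigned reg, uint32_t value) {
    assert(out != NULL && reg < 32);

    /* The constant fits a sign-extended 16-bit immediate: a single li is enough. */
    if (value <= 0x7FFFu || value >= 0xFFFF8000u) {
        out[0] = ppc_addi(reg, 0, (uint16_t)value);          /* li reg,value */
        return 1;
    }

    /* Otherwise load the upper half, then OR in the lower half. */
    out[0] = ppc_addis(reg, 0, (uint16_t)(value >> 16));     /* lis reg,hi */
    out[1] = ppc_ori(reg, reg, (uint16_t)(value & 0xFFFFu)); /* ori reg,reg,lo */
    return 2;
}
```

Calling emit_load_register_long(buf, 4, 1) produces the single word 0x38800001, which is the encoding of li r4,1 from the listing, while a constant such as 0x6BD6B028 needs the two-instruction lis r3,27606 / ori r3,r3,45096 form.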

The steps can be interpreted roughly as:

1.1. Load the arithmetic flags from the memory where the interpretive emulator stores them.

1.2. Load the previous content of the emulated D1 register into a PPC register.

1.3. Load the constant for the add instruction (one) into a PPC register.

1.4. Add the second register to the first one (increase D1 by one).

1.5. Save the arithmetic flags after the operation.

2.1. Load the previous content of the emulated A0 register into a PPC register.

2.2. Add the offset (0x180) to the content of A0 and put the result into a new PPC register.

2.3. Check the content of the emulated D1 register to set up the arithmetic flags accordingly.

2.4. Save the modified D1 register back to memory for the interpretive emulator.

2.5. Calculate the offset, load the function address of the memory write handler (namely the custom chipset write handler) and call it. This is a function from the interpretive emulation and it was written in C, so we must store all volatile registers back to memory first, because the C code won't preserve them. (This is why we stored the D1 register in step 2.4.)

3.1. Save the arithmetic flags back to memory where the interpretive emulator stores them. (These were kept in a non-volatile register, so these were preserved while we called the helper function in step 2.5.)

3.2. Update the emulated PC register to the current state for the following instructions.

3.3. Call the interpretive emulation for the branch instruction (it is not implemented yet, so we reuse the interpretive implementation).

4. Done. Phew.
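
For those who want to dig into step 2.5: the rlwinm r0,r3,18,14,29 instruction in the listing turns the target address into a byte offset into a table of handlers, one entry per 64 KB of address space. Here is a small C sketch of the rotate-and-mask operation and of a table-driven write built on top of it. The rlwinm semantics follow the PowerPC definition, but the bank struct, the table and the handler names are hypothetical stand-ins of mine, not the emulator's real data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits. */
uint32_t rotl32(uint32_t x, unsigned n) {
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* PowerPC-style mask: bits mb..me set, where bit 0 is the MOST significant bit.
 * If mb > me the mask wraps around, as it does on the real CPU. */
uint32_t ppc_mask(unsigned mb, unsigned me) {
    assert(mb < 32 && me < 32);
    uint32_t from_mb = 0xFFFFFFFFu >> mb;
    uint32_t up_to_me = 0xFFFFFFFFu << (31u - me);
    return (mb <= me) ? (from_mb & up_to_me) : (from_mb | up_to_me);
}

/* rlwinm: rotate left, then AND with the mask. */
uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me) {
    return rotl32(rs, sh) & ppc_mask(mb, me);
}

/* Hypothetical bank descriptor with only the word-write handler. */
typedef void (*put_word_fn)(uint32_t addr, uint32_t value);
typedef struct { put_word_fn wput; } bank;

#define BANK_COUNT 65536u
static const bank *bank_table[BANK_COUNT];

/* Stand-in for the custom chipset write handler: it only records the write. */
static uint32_t last_addr, last_value;
static void custom_wput(uint32_t addr, uint32_t value) {
    last_addr = addr;
    last_value = value & 0xFFFFu;
}
static const bank custom_bank = { custom_wput };

void map_bank(uint32_t index, const bank *b) {
    bank_table[index & 0xFFFFu] = b;
}

void put_word(uint32_t addr, uint32_t value) {
    /* Same computation as in the listing: (addr >> 16) * 4, a byte offset
     * into a table of 4-byte pointers. C indexes by element, hence the / 4. */
    uint32_t byte_offset = rlwinm(addr, 18, 14, 29);
    const bank *b = bank_table[byte_offset / 4u];
    assert(b != NULL);
    b->wput(addr, value);
}
```

For the address 0xDFF180 the rotate-and-mask yields 0x37C, that is bank 0xDF times four. The other rlwinm in the listing (rlwinm r15,r15,0,11,8) uses a wrapped mask, 0xFF9FFFFF, so it keeps every bit of r15 except bits 9 and 10.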

Funny, eh? :)

If you are not familiar with assembly, then don't strain yourself too much trying to understand this techno-blahblah.

For the rest: who can spot what can be optimized in the compiled code?

4 comments:

kas1e, 16 May, 2012 03:32
Thumbs up for techno-blahblah !

Btw, its me, or PPC code in end are much larger/longer in compare with 68k originals ?

The Rainbow UI, 23 May, 2012 03:42
Nice. Keep up with the good work.

I know we are talking about UAE, but could be possible to realize an entire emulator (A sort of VM) that have all the chipset jit emulated? I mean: We have petutia... but what about to add the blitter and the other chipset components to act in the same way as petunia?

Thank you very much.

Tuesday, May 15, 2012
