This project is dedicated to the memory of William Morris (aka Frags), who was the main contributor to the bounty but was unable to see the final result.

Saturday, February 16, 2013

When things go bogo

I finally got around to fixing the reported issue with the bogomem (fake Fast RAM) setting enabled. The fix is simple and is included in the recent update:
  • Fixed MOVEA.W reg,Ax - source data was not sign-extended
Now the Kickstart starts whether bogomem is enabled or disabled.

Some details on the fix

Debugging an emulator needs a completely different approach from debugging any other type of application, simply because one or two misbehaving instructions do not mean that the emulated program stops working altogether. It just does weird things, and not in a good way.

In this specific case the error was the following: with the bogomem configuration enabled, the Kickstart went into the dreaded reboot loop, which is basically the result of an internal crash, usually caused by a wrong memory access somewhere or by an exception raised while another exception is being handled.

I had a closer look at what was going on and found that an instruction was trying to write into a read-only custom register of the disk controller. Never a good sign, especially when it is the Kickstart doing it, which always plays by the (hardware) rules.
Even more interesting, the offending instruction was only supposed to read from memory.
My reaction was a confused face with a hint of suspicion. I am still a bit puzzled by this, even after the fix: this must never happen.

But at least we have a crash!
It is always easier with a crash: it gives a starting point (or so I thought, at least).

First I logged the full execution and analyzed it for a while, but the only thing I found was that some hardware-handling loop runs for too long, which is probably why the Kickstart hits the custom register by accident. This did not help at all: usually it means that some initialization or exit condition for the loop went wrong, so I had to look further back, before the loop itself.

Luckily, the Kickstart worked with this configuration when none of the instructions were compiled. There is a simple method for finding out which instruction(s) are causing trouble: turn off compiling for all instructions and add them back one by one, restarting the emulation with the same settings over and over again.
This sounds tedious, but it is actually much easier than scrolling through megabytes of debug logs looking for something, because it is procedural. Unfortunately, this method does not work for every possible issue, especially when a combination of faulty instructions is causing the problem.
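
The one-by-one procedure can be sketched roughly as follows. This is only an illustration, not the emulator's actual code: the `boot_test_fn` callback, the `compiled` flag array and `demo_boots` are hypothetical stand-ins for "restart the emulation with this set of compiled instructions and see whether the Kickstart boots".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: runs the emulation with the given set of compiled
   instructions and reports whether the Kickstart booted properly. */
typedef bool (*boot_test_fn)(const bool *compiled, size_t count);

/* Turns compiling off for every instruction, then re-enables them one by
   one. Returns the index of the first instruction whose compiled version
   breaks the boot, or count if every run succeeded. */
size_t find_breaking_instruction(bool *compiled, size_t count,
                                 boot_test_fn boots)
{
    for (size_t i = 0; i < count; i++)
        compiled[i] = false;        /* start with nothing compiled */

    for (size_t i = 0; i < count; i++) {
        compiled[i] = true;         /* add one more compiled instruction */
        if (!boots(compiled, count))
            return i;               /* this one made the difference */
    }
    return count;
}

/* Stand-in for a real emulation run, for demonstration only: pretends
   that compiling instruction number 5 is what breaks the boot. */
bool demo_boots(const bool *compiled, size_t count)
{
    return !(count > 5 && compiled[5]);
}
```

With `demo_boots` as the test, the search stops at index 5. A real run would replace it with an actual restart of the emulator, which is where all the time goes.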

In the end I found that the MOVEA.W Ax/Dx,Ay instruction was the one to blame, and a quick look at the compiled code confirmed that the emulation was wrong: for every operation whose target is an address register, the data involved must be longword sized.
In this case this simply meant that the word-sized source data must be sign-extended while it is copied into the target address register. I had done this for every other similar instruction, but I missed one case.
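
To illustrate what sign-extension means here, a minimal sketch in plain C (not the JIT's generated code; the function name `movea_w` is made up for this example): the 16-bit source is widened to 32 bits by copying its top bit into the upper half, and the whole longword ends up in the address register.

```c
#include <stdint.h>

/* Hypothetical helper: the longword an address register should hold
   after MOVEA.W with the given word-sized source. */
uint32_t movea_w(uint16_t src)
{
    if (src & 0x8000u)                 /* bit 15 set: negative word */
        return 0xFFFF0000u | src;      /* fill the upper half with ones */
    return src;                        /* otherwise upper half is zero */
}
```

For example, a source word of 0x7FFF gives 0x00007FFF, while 0x8000 gives 0xFFFF8000. Without the extension, any word with the top bit set leaves the register pointing somewhere completely different.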

Now you can probably see why there was no way I could have found this bug by looking at the execution logs.

Thanks to kas1e, Thunder and MickJT for reporting bugs!

5 comments:

  1. @Almos
    Cool!

    But the question is: should we use bogomem or not at all to make the JIT work better? Does it matter for the speed of the JIT and how it works?

    Replies
    1. Bogomem is required for certain old programs that cannot handle any type of Fast RAM. It has nothing to do with the JIT at all.

      On the other hand, the fix in the MOVEA.W instruction was very important: that instruction is often used in compiled C code.
      So the working bogomem setting just comes as a bonus with the fixed instruction.

  2. Yes, the address target must always be longword, only the size of the data which is written to the target is of the specific size. This is not only with move.w but also move.b. And also with move.b/w data,(An)+ and such stuff.
    I tested it on MorphOS and now I just have to turn caches on to enable the JIT.
    Better than last time, very good. But Apidya crashes when the action begins. Without JIT it works.
    Funny thing, on Kick 1.3 the JIT LED shows green ;)

  3. Congratulations on all the progress, I've only been following this every couple of months or so, but look forward to trying out the results.

    "There is a simple method for finding out which instruction(s) causing trouble: turn off compiling of all instructions and add them back one by one while start the emulation with the same settings over and over again."

    Would it be quicker to iteratively subtract/add half the instructions at a time until the offending one is identified?

    Replies
    1. Thanks.

      Usually I follow that method: similarly to a binary search, I halve the enabled instructions and choose the half which shows the symptoms. This method works only if a single instruction causes the bug, though. Sometimes a combination of different instructions is not working properly together, especially when the automatic optimization is involved. Yet it is still much easier to find the root of the issue this way than by browsing through gigabytes of logs.
