- Implementation of TST.x mem, NOT.x Dy, EXT.x Dy
- Reverted the change for stopping the compiled block when the special flag is set. This change is not needed for the Kickstart and it doesn't seem to have any effect, but slows down the compiled code.
- Removed hack for MOVE.z Ax,-(Ax) and MOVE.z Ax,(Ax)+ instruction implementations.
- Fixed pre-decrement and post-increment addressing modes for destination. Re-enabled MOVE.x mem,mem and EOR.x reg,mem instructions, which were disabled previously to let the Kickstart run without the addressing mode fixes.
- Fixed register dependency in some memory addressing modes.
- Temporarily removed the check for tiny blocks: ignoring these blocks is just not the right solution for avoiding the overhead of block calling.
- Added ignoring blocks of pure unsupported instructions: these will be executed by the interpretive emulation.
- Fixed blocking of small blocks, the block was not raised in the cache list.
Probably the most important fix was the one for the pre-decrement and post-increment addressing modes. This bug blocked the Kickstart from booting for a while, and it is why I had to remove the support for the two instructions I mentioned in the change list.
As it turned out, the root of this bug was a limitation of the implementation. Each addressing mode has two compiler functions: one is called before the instruction is compiled, the other is called after it. But the situation is not always that simple, for example in this case:
MOVE.L (A7),-(A7)
This is a common instruction for copying memory to a lower address (like moving the content of an array one step toward the beginning, in this case the content of the stack). Seems so innocent, doesn't it? :)
Why was this one an issue? The handler for the destination address is called before the instruction, and since it is a destination addressing mode, it decremented the address in the emulated A7 register. But then the instruction was compiled, and it used the address from the already decremented A7 register as the source. So, what actually happened was something like this:
MOVE.L -(A7),(A7)
Now you can see: this operation is (mostly) pointless, it copies the data from one address to the same address. (Although sometimes it might make sense when communicating with the hardware, that obviously wasn't the case here.)
What was the fix? Pretty simple: I moved the destination address modification to the address handler which is executed after the instruction was compiled. This was a solution for this specific case, but I also had to make sure that all the other combinations that are possible with indirect addressing still work. One of the trickiest was:
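To make the difference concrete, here is a minimal sketch in Python (not the actual JIT code). Memory is modeled as a plain dictionary, and the function names `move_l_correct` and `move_l_buggy` are made up for this illustration. It only shows the order of the source read and the destination pre-decrement.

```python
def move_l_correct(mem, a7):
    """MOVE.L (A7),-(A7) as intended: read the source first, then pre-decrement."""
    value = mem[a7]          # source (A7) is read with the original address
    a7 -= 4                  # destination -(A7): decrement before the write
    mem[a7] = value          # the longword lands 4 bytes lower
    return a7


def move_l_buggy(mem, a7):
    """The faulty order: the destination handler decrements A7 before the source read."""
    a7 -= 4                  # destination handler ran before the instruction
    value = mem[a7]          # source now reads the already decremented address
    mem[a7] = value          # so the data is copied onto itself
    return a7


# Hypothetical stack content, only for this illustration.
mem_ok = {0x1000: 0xCAFE, 0x0FFC: 0}
mem_bug = dict(mem_ok)

sp_ok = move_l_correct(mem_ok, 0x1000)
sp_bug = move_l_buggy(mem_bug, 0x1000)
```

In both cases A7 ends up 4 bytes lower, but only the first version copies the value from the old top of the stack into the new slot.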
MOVE.L (Ax)+,(Ax)
You can try to guess why. :)
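The post leaves the answer open, so the following is only one plausible reading, sketched in Python with made-up function names and a dictionary standing in for memory. The assumption is that if the source post-increment were also postponed until after the instruction, the destination would still see the old register value.

```python
def move_l_postinc_correct(mem, ax):
    """MOVE.L (Ax)+,(Ax): the increment must be visible to the destination."""
    value = mem[ax]          # source (Ax)+ reads the original address
    ax += 4                  # post-increment happens before the destination is resolved
    mem[ax] = value          # destination (Ax) uses the incremented register
    return ax


def move_l_postinc_deferred(mem, ax):
    """Assumed failure mode: the source increment is postponed past the write."""
    value = mem[ax]          # source read is fine
    mem[ax] = value          # destination still sees the old register: a self-copy
    ax += 4                  # the increment arrives too late
    return ax


# Hypothetical memory content, only for this illustration.
mem_ok = {0x2000: 0xBEEF, 0x2004: 0}
mem_deferred = dict(mem_ok)

ax_ok = move_l_postinc_correct(mem_ok, 0x2000)
ax_deferred = move_l_postinc_deferred(mem_deferred, 0x2000)
```

The register ends with the same value in both versions, which is what makes this kind of mistake easy to miss: only the memory content differs.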
Anyway, this bug is finally out of the way and I can go on with implementing the missing instructions. Some of them are done, but there are still lots to go.
There was one more important change in this update: I removed the limit on the consecutive block length that applied when a special condition was triggered by some instruction. I found it completely pointless; everything seems to work without it. The limit had a bad side effect: after an instruction triggered a special condition for the emulation, all the following instructions were compiled one by one into separate blocks. The overhead of calling these tiny blocks was huge, which is why I introduced the rule of ignoring any block smaller than 3 instructions. But with that rule lots of the code was not JIT compiled at all. (As some of you guys mentioned: the JIT LED was mostly dark, because lots of blocks were not compiled.) This is fixed now, although I am a bit afraid of the side effects of delaying the handling of the special conditions. We will see how it goes.
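A toy model of this effect, written in Python, might look like the following. It is a sketch under simplifying assumptions, not the emulator's logic: the function names, the 20-instruction run and the position of the flag are invented, and the model assumes that every instruction after the flag ends up in its own block.

```python
MIN_BLOCK_LEN = 3  # from the post: blocks smaller than 3 instructions were ignored


def split_blocks(total, flag_at, stop_on_special):
    """Return block lengths for `total` instructions when instruction `flag_at` raises the flag."""
    if not stop_on_special:
        return [total]                       # new behavior: one consecutive block
    # old behavior: the block ends at the flagged instruction,
    # then every following instruction becomes its own block
    return [flag_at + 1] + [1] * (total - flag_at - 1)


def compiled_count(block_lengths, min_len=0):
    """Count the instructions that end up in JIT compiled blocks."""
    return sum(n for n in block_lengths if n >= min_len)


old_blocks = split_blocks(20, 4, stop_on_special=True)
new_blocks = split_blocks(20, 4, stop_on_special=False)

old_compiled = compiled_count(old_blocks, MIN_BLOCK_LEN)   # tiny blocks skipped
new_compiled = compiled_count(new_blocks)                  # tiny-block check removed
```

In this made-up run the old rules leave 15 of the 20 instructions to the interpreter, which matches the "mostly dark JIT LED" observation in spirit.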
I also spent some time on updating my old tool: DiskDaisy. The recent updates for AmigaOS4.1 triggered a bug in that app. Sometimes it is nice doing something completely different for a while, you know. :)
Thank you for the update. Unfortunately Lotus2 crashes at the beginning and Apidya crashes before playing, but in a different way than before (many graphic glitches).
Then more fixing needed, I guess... ;)
Have you tried these games on WinUAE using JIT? I doubt that Lotus2 would run on 68020 at all with caches enabled.
Hi Thunder. Would you mind introducing yourself? I know of another Thunder/Thunda on the Amiga forums, and I assumed he was you until he said otherwise. Do you post on Amiga forums and use another handle?
Álmos, I noticed on this build that the Workbench 3.1 disks will not successfully boot. There is a Software Failure requester for "FailAt". With the previous build, you could at least get to the main Workbench screen, after a couple of recoverable alerts.
I also noticed the Mandelbrot test is faster in this build.
Probably the new instructions and the fixes trigger some other bugs. I will check it later on, now I am trying to implement more instructions first.
The Mandelbrot test is much faster (or rather the same speed as before some fixes for the Kickstart) because of the change I mentioned in the post: the compiling won't stop when the special flag was set.
I believe CPU cache was not supported until WB 2.0.
2.05, to be precise. See FAQ:
http://euaejit.blogspot.co.nz/2011/11/while-you-are-waiting-faq.html
Answer for questions:
I tried to run Kickstart 1.3 and it crashed with the JIT enabled!
I tried to run and it crashed with the JIT enabled!
OK, I did my tests as well. All the settings files are the same as before, the JIT configs are the same, only the uae binary is new.
So far there is some good news:
1. There are no more yellow / red windows when I start any ADF game, neither in the cracktros nor in the games themselves.
2. The JIT LED blinks everywhere now, but it didn't help :)
3. WHDLoad games no longer crash with red windows at startup as before, but with the software failure "Program failed (error #80000002). Wait for disk activity to finish." 100% reproducible with any WHDLoad game.
The bad news:
With the JIT enabled everything is slower. For example, Aladdin is about 20% slower (CPU at 100% almost all the time, even in the menu). Lion King is about 15% slower (in the menu just 4-5%, in the game 10-20%). The Technological dead demo freezes the same way, right at the first scene.
On top of that, in Lion King some "blinks" in the animation now happen in one of the scenes.
Also, Kid Chaos now hangs hard with a black screen when the game should start, with a bunch of: "illegal instruction : 712c at 002cfac -> 003cf56".
But I assume it's better to finish all the instructions first, and then we can worry about bugs, do speed tests, etc. (?)
Thanks for the report. I am not surprised that most of the things are running slower at the moment, too many instructions are not finished yet and the optimization cannot be turned on.
@Rachy
If it is intended then good :) How many instructions are left, by the way? 20-30% of all of them?
@Almos
By the way, I know that it's outside of your goals, but as it makes testing a bit harder, maybe you can check this out as well :) The problem is: some older versions of UAE crashed on exit. After a while someone kind of "fixed" it (in the version from which you now do your builds), but some memory leak or similar still happens on exit. It is now just invisible to the casual user because the GR doesn't pop up, but MemGuard catches it pretty well (and in the end, after 10-20 run/exit cycles of euae, the system goes into an unstable state).
Here is the MemGuard hit at exit: http://kas1e.mikendezign.com/aos4/jit_tests/uae_hit_on_exit.txt
To reproduce it, all you need is to run MemGuard (from os4depot) in the background like this:
run >NIL: work:debug/memguard/memguard DumpWall ShowFails
If you want to redirect the output to a shell on screen (without using dumpdebugbuffer), then you can run Sashimi like this as well:
run >NIL: work:debug/sashimi/sashimi CONSOLE BUFK=64 NOPROMPT ASKEXIT ASKSAVE
If you are interested in checking this out, we can also build a debug version of euae with debug symbols, so the stack trace will be much more readable and point to the actual bug.
It is pretty easy to find out how many instructions are left, because all instructions (which will be supported by the JIT) are listed in the src/table68k_comp file. An instruction is implemented when the number next to the instruction name is 1 in that file.
The current state is not that pretty: 139 are implemented out of 388 (without the FPU instructions), so roughly 2/3 of the instructions are not implemented yet, which explains why the emulation is slower than the interpretive emulation alone.
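As a quick check of that ratio, using only the two numbers quoted above (the variable names are just for this illustration):

```python
implemented = 139   # instructions already handled by the JIT, as quoted above
total = 388         # all instructions planned for the JIT, without the FPU ones

remaining = total - implemented
share_remaining = remaining / total   # fraction still left to the interpreter

print(f"{remaining} of {total} instructions left ({share_remaining:.0%})")
```

That is a bit under two thirds, so "roughly 2/3" is a fair summary.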
Usually there are three versions of one instruction: byte, word and longword sized, mostly these three can be implemented together. Yet, lots to go as you can see.
Implementing an instruction is sometimes easy, sometimes complicated. The easy ones can be done in half an hour, the complicated ones are 1-2 hours. But I have to concentrate on it because it is very easy to do it wrong. For most of the instructions it is impossible to test the implementation fully with all possible flows. But finding the bug in the emulated program is even harder.
So, it takes time and cannot be rushed. On weekdays I don't have much time and I am way too tired to concentrate at night when I finally manage to get home... :(
Regarding the memory leak: I noticed this problem when I ran the emulation a lot of times in a row for testing. I cannot promise anything regarding the fix, but I guess you could try to fix it yourself if you had all the information.
@Rachy
> without the FPU instructions
Should the FPU instructions also be done in the JIT? I.e., would it make sense?
It does make sense for some instructions, not for all of them. At least for the constant loading and the data moving instructions compiling the special case would help a bit. But the FPU support is a long-term plan at the moment.
@MickJT: I'm Thore, the guy who compiles this stuff on MorphOS and tests it there. For the first posts here there was anonymous access, but now I needed an account ;)