TkSilver wrote: I am pretty sure Exo uses a lot of hacks and other tricks to get DraStic up to the speed it runs at, whereas I think the desktop DS emulators are more accurate.
There are some things about DeSmuME that are more accurate but expensive, like some details of DMA timing, and if you enable advanced bus timing it emulates cache misses, which improves accuracy at a big performance cost. DraStic tries to improve timing accuracy in some other ways that DeSmuME doesn't, like gamecard read latency (not that expensive) and load instruction interlocking (not expensive for a recompiler). While advanced bus timing can make a significant difference in accuracy for DeSmuME, I doubt it's ever enabled in any of its mobile ports, and I'm not sure how it interacts with its recompiler.
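To give a feel for why cache miss emulation is expensive, here's a toy sketch in Python: once you model a cache, every fetch has to pay for a tag lookup even when it hits. The geometry and penalty numbers here are made up for illustration, not the ARM946E-S's real parameters:

```python
# Toy direct-mapped cache model illustrating why per-access cache
# emulation costs so much: every memory access now involves a tag check.
# Geometry and penalties are illustrative, not real ARM946E-S values.

LINE_SIZE = 32        # bytes per cache line
NUM_LINES = 256       # 8 KB / 32
MISS_PENALTY = 8      # extra cycles charged on a miss (made up)

class ToyCache:
    def __init__(self):
        self.tags = [None] * NUM_LINES

    def access(self, addr):
        """Return the cycle cost of an access at addr."""
        line = (addr // LINE_SIZE) % NUM_LINES
        tag = addr // (LINE_SIZE * NUM_LINES)
        if self.tags[line] == tag:
            return 1                      # hit: single cycle
        self.tags[line] = tag             # fill the line
        return 1 + MISS_PENALTY           # miss: charge the refill

cache = ToyCache()
assert cache.access(0x2000000) == 1 + MISS_PENALTY  # cold miss
assert cache.access(0x2000004) == 1                 # same line: hit
```

Even this trivial model adds a divide/mask and a table lookup to every emulated access, which is why it's the kind of thing you only turn on behind an option.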
For 2D, 3D, and geometry the accuracy is pretty similar for the most part. DraStic is faster because it's more optimized and vectorizable, and contains a lot of NEON code. There are a few features missing in DraStic, like OBJ mosaic, but in other ways it's more accurate: the geometry engine operates at the right integer precision and the 3D rendering at least tries to be close, while DeSmuME does all of this with floating point, which is pretty significantly off. Neither emulator supports anti-aliasing. Because of this DraStic doesn't have half-blended edge marking; DeSmuME does, but only because it performs edge marking in a way that's totally different from how a real DS does it.
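To illustrate the fixed point vs. floating point difference: the DS geometry engine works in signed 20.12 fixed point, where multiply results are shifted back down and the low bits discarded. A float pipeline keeps those bits, so results drift from hardware. A minimal sketch (the 12 fractional bits match the DS vertex format, but the truncation shown here is a simplification of what the hardware actually does):

```python
# Signed 20.12 fixed point multiply vs. floating point: the fixed path
# truncates fractional bits the way integer hardware does, the float
# path keeps them, so the two answers differ in the low bits.

FRAC_BITS = 12

def to_fixed(x):
    return int(round(x * (1 << FRAC_BITS)))

def fixed_mul(a, b):
    # full-width product, arithmetic shift back to 20.12 (truncating)
    return (a * b) >> FRAC_BITS

a, b = to_fixed(1.2345), to_fixed(6.789)
fixed_result = fixed_mul(a, b) / (1 << FRAC_BITS)
float_result = 1.2345 * 6.789

# The drift is tiny per operation, but it's exactly the kind of thing
# that shows up as seams or z-fighting when a game depends on the
# hardware's bit-exact behavior.
assert fixed_result != float_result
```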
I'm not sure about what No$ supports these days and how accurate it is.
I think if DraStic were changed to be as accurate as DeSmuME in all ways (while still retaining the ways in which it's more accurate) it'd be decently slower than it is now, but still much faster than DeSmuME. Unfortunately I've kind of made it hard to really move in this direction, because when you try to make something more accurate it might fix some games while breaking others, and I don't have a lot of room to mess with this (I can say with confidence that DeSmuME's emulation is far from totally accurate here, but after enough experimentation they might have struck on heuristics that just happen to work with more games). For that reason there are some things, like DMA delays, that are implemented but only actually enabled for something like one game.
If anyone's interested these are all of the per-game hacks currently in DraStic:
Mario & Luigi: Bowser's Inside Story: +2 clocks per instruction, *2 DMA cycles, *4 geometry cycles.. this game has a really nasty race condition
Sonic Chronicles: +1 clock per instruction, apparently this isn't enough to make it actually work.. oops.
Art Academy: Hack to make VCOUNT read as 192 momentarily before the interrupt triggers, to get past an idle loop. On a real DS this works because of a nuance in interrupt latency. DeSmuME emulates this some other way without a per-game hack.. it might be possible to emulate correctly without a big expense, but I didn't want to take the risk for this one edge case.
Yu-Gi-Oh! 5D's series: +1 clock per instruction, DMA consuming CPU time
Spider-Man: Shattered Dimensions: +1 clock per instruction
Puppy Palace: +2 clocks per instruction
Ore ga Omae o Mamoru: +1 clock per instruction
Legend of Kay: +1 clock per instruction
American Girl games: geometry swaps stall the geometry engine (this originally wasn't a per-game hack, but it broke some other stuff; the same thing happened with DeSmuME.. the stall conditions aren't totally modeled right, it's complex)
Zhu Zhu Babies: +1 clock per instruction, only for the E version
Will o' Wisp DS: +1 clock per instruction
Element Hunters: +1 clock per instruction
Florist Shop: Force undeferred 2D. Normally the 2D rendering happens in between updates to IO regs/palette/OAM but not VRAM writes (because the VRAM mapping is complex and hard to defer), and this causes problems if the game writes to active VRAM in the middle of the frame, which this game does because it uses a dumb framebuffer that isn't properly vsync'd.. I'm sure I could handle this without resorting to hacks with a more sophisticated dynamic detection method, it just didn't feel worth it at the time.
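The Art Academy hack above could be sketched like this; all of the names and thresholds here are hypothetical, just to show the shape of the idea (a read hook on VCOUNT that lies when the vblank interrupt is imminent):

```python
# Hypothetical sketch of the Art Academy-style VCOUNT hack: when the
# game spins on VCOUNT waiting for scanline 192, report 192 on a read
# that lands just before the vblank interrupt fires, so the idle loop
# exits the same way it does on hardware (where interrupt latency lets
# the loop observe the new value first). All names/numbers are made up.

VBLANK_LINE = 192

def read_vcount(current_line, cycles_until_vblank_irq, hack_enabled):
    # If the vblank IRQ is imminent, pretend we're already on line 192.
    if (hack_enabled and cycles_until_vblank_irq < 2
            and current_line == VBLANK_LINE - 1):
        return VBLANK_LINE
    return current_line

assert read_vcount(191, 1, hack_enabled=True) == 192   # hack kicks in
assert read_vcount(191, 1, hack_enabled=False) == 191  # normal read
assert read_vcount(100, 500, hack_enabled=True) == 100 # mid-frame: untouched
```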
It can be seen that most of the hacks add an extra clock cycle per instruction (for the ARM9). This is done in lieu of the stalls you would see from cache misses, wait states on uncached accesses, write buffer fills, dynamic multiplication costs, and probably some other things. My general philosophy in timing emulation is to try to make it no slower than a real DS. Usually this results in games working, although sometimes they perform better than they did on the real thing. But sometimes a game has a weird race condition bug that breaks it, so these hacks make emulation a little slower to compensate. Usually the game only breaks in one place, and this gets it past that problem area.
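The "+N clocks per instruction" knob is conceptually very simple. A minimal sketch, with made-up game IDs and illustrative numbers:

```python
# Minimal sketch of a per-game "+N clocks per instruction" hack: the
# recompiler (or interpreter loop) charges a base cost per ARM9
# instruction plus a flat per-game penalty standing in for cache
# misses, write buffer fills, uncached wait states, etc.
# Game IDs and values here are made up for illustration.

GAME_HACKS = {
    "bowsers_inside_story": 2,   # +2 clocks per instruction
    "sonic_chronicles": 1,       # +1 clock per instruction
}

def block_cycles(instruction_count, base_cycles_per_insn, game_id):
    penalty = GAME_HACKS.get(game_id, 0)   # 0 for unhacked games
    return instruction_count * (base_cycles_per_insn + penalty)

# A 100-instruction block at 1 cycle/instruction:
assert block_cycles(100, 1, "bowsers_inside_story") == 300
assert block_cycles(100, 1, "unlisted_game") == 100
```

The appeal is that it's nearly free: a recompiler can fold the penalty into the per-block cycle constant at translation time.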
In hindsight it probably would have been better to add a statistical cost to memory instructions, since that would more closely model the real performance. I'd also like to experiment with other more accurate emulation options and their performance impact; it's just not something I've been able to get to.
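The statistical idea would look something like this sketch: charge extra cycles only on memory instructions, with a probability standing in for the miss rate, and use a seeded PRNG so timing stays deterministic between runs. All numbers are made up:

```python
# Sketch of a "statistical cost" timing model: instead of a flat +N on
# every instruction, only loads/stores pay extra, and only with a fixed
# probability approximating the game's real miss rate. A seeded PRNG
# keeps the result deterministic. Rates/costs are illustrative.
import random

def run_block(insns, miss_rate=0.1, miss_cost=8, seed=1234):
    rng = random.Random(seed)           # deterministic per block
    cycles = 0
    for kind in insns:                  # 'mem' or 'alu'
        cycles += 1
        if kind == 'mem' and rng.random() < miss_rate:
            cycles += miss_cost
    return cycles

trace = ['mem' if i % 3 == 0 else 'alu' for i in range(300)]
cheap = run_block(trace, miss_rate=0.0)
real = run_block(trace, miss_rate=0.3)
assert cheap == 300 and real > cheap
```

This concentrates the slowdown on the instructions that actually stall on hardware, so code that's mostly ALU work isn't penalized the way a flat +1 penalizes it.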
DeSmuME does some of this all the time, and more of it with advanced bus timing enabled. But it's not always that accurate. Here's an example: in Kirby Superstar Deluxe, DraStic's emulation being too fast causes the videos to run too fast. On DeSmuME with advanced bus timing disabled they run at the right speed, probably because of VRAM write wait states being added. But if you turn on advanced bus timing, they now run too slow! That could be because it doesn't emulate the ARM9 write buffer: when there's a cache miss on a store you pay the full price immediately, while on a real DS you'd only pay once the write buffer is full or a subsequent load stalls. So while you could say this is more accurate because it at least does something, it's definitely still a simplified model that makes some concessions.
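A toy model of that write buffer behavior, to make the difference concrete: buffered stores are nearly free until the buffer fills, and a cache-missing load has to drain whatever is pending first. The depth and cycle costs here are illustrative, not measured hardware values:

```python
# Toy ARM9-style write buffer: stores cost 1 cycle while there's a free
# slot; a full buffer stalls the store, and a cache-missing load must
# wait for all pending writes to drain. Depth/costs are made up.

BUFFER_DEPTH = 4
DRAIN_COST_PER_ENTRY = 6   # cycles to retire one buffered write

class WriteBuffer:
    def __init__(self):
        self.pending = 0

    def store(self):
        if self.pending < BUFFER_DEPTH:
            self.pending += 1
            return 1                       # buffered: store is cheap
        # full: stall while one entry drains, then buffer this write
        return 1 + DRAIN_COST_PER_ENTRY

    def load_miss(self):
        # A cache-missing load first waits for the buffer to drain.
        cost = 8 + self.pending * DRAIN_COST_PER_ENTRY
        self.pending = 0
        return cost

wb = WriteBuffer()
assert all(wb.store() == 1 for _ in range(4))  # first four are buffered
assert wb.store() == 1 + DRAIN_COST_PER_ENTRY  # fifth store stalls
```

A model that charges the full miss cost on every store (like the behavior described above) corresponds to pretending the buffer has depth zero, which overestimates store-heavy code.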