Just a question...

huckleberrypie
Posts:442
Joined:Sat May 31, 2014 4:21 am

Post by huckleberrypie » Thu Jun 26, 2014 1:30 am

Since the DS is an ARM-powered device, I was wondering how different its architecture/instruction set is from the ARM CPUs used in Android devices, and whether the similarity actually helps speed up emulation.

Exophase
Posts:1716
Joined:Mon Aug 05, 2013 9:08 pm

Re: Just a question...

Post by Exophase » Thu Jun 26, 2014 1:50 am

The instruction set is compatible, and it gives it a small advantage, but not a very big one. Only a small part of the emulation time is spent emulating actual CPU instructions that do real work.

huckleberrypie
Posts:442
Joined:Sat May 31, 2014 4:21 am

Re: Just a question...

Post by huckleberrypie » Thu Jun 26, 2014 2:04 am

Exophase wrote:The instruction set is compatible, and it gives it a small advantage, but not a very big one. Only a small part of the emulation time is spent emulating actual CPU instructions that do real work.
You mean instructions that are specific to the DS's processors?

Exophase
Posts:1716
Joined:Mon Aug 05, 2013 9:08 pm

Re: Just a question...

Post by Exophase » Thu Jun 26, 2014 2:15 am

huckleberrypie wrote:You mean instructions that are specific to the DS's processors?
There aren't instructions specific to the DS's processors (there are two, ARM9 and ARM7). All of them are in the ARMv7 CPUs in modern Android devices. But it doesn't make that much of a difference. Depending on the game, the emulator may only spend something like 20-30% of the time emulating the CPUs, with the rest of the time spent emulating 2D, 3D, geometry, audio, DMA, and other parts of the DS. Then there's time spent switching between emulating ARM9 and ARM7 and handling system events.
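
As a rough illustration of the switching cost mentioned above, here is a minimal Python sketch. It is not DraStic's actual code: the `Cpu` class, `run_frame`, and the 64-cycle timeslice are all made up for the example. The only hardware fact assumed is that the ARM9 is clocked at about twice the ARM7's rate.

```python
class Cpu:
    """Stand-in for one emulated core; a real core would decode and run code."""

    def __init__(self, name):
        self.name = name
        self.cycles = 0  # total cycles this core has executed

    def run(self, budget):
        # Pretend every instruction costs exactly one cycle.
        self.cycles += budget


def run_frame(arm9, arm7, frame_cycles, timeslice=64):
    """Run both cores for `frame_cycles` ARM7 cycles, in small slices."""
    switches = 0
    elapsed = 0
    while elapsed < frame_cycles:
        step = min(timeslice, frame_cycles - elapsed)
        arm9.run(step * 2)  # ARM9 runs at twice the ARM7 clock
        arm7.run(step)
        switches += 2       # each slice switches to the ARM9, then the ARM7
        elapsed += step
    return switches
```

A smaller timeslice keeps the two cores in tighter sync but raises the number of switches, which is one place the non-CPU overhead comes from.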

And within the CPU emulation, a lot of time is spent beyond doing the "work" that's part of the instructions. All memory accesses are checked to see if they go to fast paths, and ones that go to places like I/O have to go down different paths (and DS games can have a ton of I/O accesses; they're very bare metal). Stores have to be checked to see if they modify code. Cycle counters have to be checked to see if it's time to switch to emulating something else. Indirect branches have to check translation tables to determine where to find the recompiled code. All of these extra checks alter the emulated CPU flags, so there's overhead dealing with that. And you need more registers to handle this stuff, so there's overhead with that too. At the end of the day, the time you save because an ARM CPU can do something in one instruction that another CPU can't isn't that big of a deal. Given the choice, I'd rather be targeting ARM64 instead: it throws away a lot of the instructions from old ARMs like the ones in the DS, but it makes up for it by having a lot more registers.
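
The checks described above might look something like the following Python sketch. Everything here is hypothetical (the `Bus` class, the 256-byte page size, the string placeholder for a translated block) and it says nothing about how DraStic implements them; the base addresses follow the commonly documented DS memory map (main RAM at 0x02000000, I/O at 0x04000000).

```python
RAM_BASE, RAM_SIZE = 0x02000000, 0x1000      # tiny stand-in for main RAM
IO_BASE, IO_SIZE = 0x04000000, 0x01000000    # I/O register region


class Bus:
    def __init__(self):
        self.ram = bytearray(RAM_SIZE)
        self.io_writes = []      # log of (address, value) sent to I/O
        self.code_pages = set()  # RAM pages that hold translated code
        self.block_cache = {}    # guest PC -> translated block

    def translate(self, pc):
        """Indirect-branch lookup: find or create the block for a guest PC."""
        block = self.block_cache.get(pc)
        if block is None:
            block = f"block@{pc:#x}"  # placeholder for real host code
            self.block_cache[pc] = block
            self.code_pages.add((pc - RAM_BASE) >> 8)
        return block

    def write8(self, addr, value):
        offset = addr - RAM_BASE
        if 0 <= offset < RAM_SIZE:               # fast path: plain RAM
            self.ram[offset] = value & 0xFF
            page = offset >> 8
            if page in self.code_pages:          # store hit translated code
                self.invalidate(page)
        elif IO_BASE <= addr < IO_BASE + IO_SIZE:  # slow path: I/O register
            self.io_writes.append((addr, value & 0xFF))
        # any other region is ignored in this sketch

    def invalidate(self, page):
        """Drop every cached block that lives on a modified page."""
        self.code_pages.discard(page)
        stale = [pc for pc in self.block_cache if (pc - RAM_BASE) >> 8 == page]
        for pc in stale:
            del self.block_cache[pc]
```

Note that every single store pays for the range check and the code-page check, even though almost none of them touch I/O or code. That per-access cost is the same no matter which host CPU runs it.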

huckleberrypie
Posts:442
Joined:Sat May 31, 2014 4:21 am

Re: Just a question...

Post by huckleberrypie » Thu Jun 26, 2014 2:23 am

So the approach used in NDS4Droid would be considered more or less a waste if speed is a main consideration, right? I mean, since the target CPU's an ARM and it's more or less the same with the DS, why redundantly emulate things that can be done by the host itself?

Exophase
Posts:1716
Joined:Mon Aug 05, 2013 9:08 pm

Re: Just a question...

Post by Exophase » Thu Jun 26, 2014 2:32 am

huckleberrypie wrote:So the approach used in NDS4Droid would be considered more or less a waste if speed is a main consideration, right? I mean, since the target CPU's an ARM and it's more or less the same with the DS, why redundantly emulate things that can be done by the host itself?
I'm saying, all those things I listed in my last post, you have to do them all whether you're executing on ARM or not. And they take up most of the time. Just because both are ARM doesn't mean you can take the DS code and run it straight, there's a ton of problems with that idea. The only advantage of emulating ARM on ARM is that some instructions, particularly some more esoteric ones (a lot of which are rarely used) can be emulated in one instruction on ARM where it'd take a few on something else. That's it. It doesn't amount to a lot.
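
To give a feel for the "one instruction versus a few" difference, here is a small Python sketch of what an emulator has to compute by hand for a flag-setting add (ARM's ADDS) when the host does not provide ARM-style condition flags. The function name and return layout are hypothetical; on an ARM host the same operation can map to a single instruction.

```python
MASK32 = 0xFFFFFFFF


def adds(a, b):
    """Emulate a 32-bit ARM ADDS: return (result, n, z, c, v)."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    n = result >> 31       # negative: top bit of the result
    z = int(result == 0)   # zero
    c = full >> 32         # carry out of bit 31
    # overflow: both operands share a sign that the result does not
    v = ((~(a ^ b) & (a ^ result)) >> 31) & 1
    return result, n, z, c, v
```

For example, `adds(0x7FFFFFFF, 1)` wraps into the negative range and sets the overflow flag, while `adds(0xFFFFFFFF, 1)` produces zero with the carry flag set.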

DraStic doesn't really have a different approach vs nds4droid, except that a lot of time was spent making sure it'd be fast at pretty much every level of design and implementation. If everything Android were using x86-64 instead of ARM all along the performance probably wouldn't be that much different, since I would have optimized it heavily for x86-64 instead.

huckleberrypie
Posts:442
Joined:Sat May 31, 2014 4:21 am

Re: Just a question...

Post by huckleberrypie » Thu Jun 26, 2014 2:58 am

Suffice it to say, playing it safe by emulating the ARM architecture on ARM would be better. Which reminds me of the approach the Cxbx developers took: attempting to reimplement the Xbox API rather than emulate x86 on x86, on the grounds that the latter would be unnecessary. They did note, however, that the XBE-to-EXE conversion approach isn't as easy as it seems, which is why the Xbox emulation scene has yet to mature.

brokencodes
Posts:2
Joined:Sun Dec 26, 2021 4:06 am

Re: Just a question...

Post by brokencodes » Sun Dec 26, 2021 4:22 am

I hate to necropost, but I feel it necessary to add to Exophase's statements, and help those who misunderstand what emulation entails:

Emulation is more than just iterating through the CPU instructions.
You must host all of the periphery of the device that is being emulated.
All of the IO.
All of the components that the software speaks to through that IO.
Much of the ARM architecture is memory mapped, but each ARM SoC (system on chip) has its own idiosyncrasies.
You have to know how the MMU works if there is one, and you have to be that MMU in software.
You have to know how the interrupt controller works, if there is one, and you have to BE that interrupt controller.
You have to know how the GPU works, know its oddities, and you have to BE the GPU, and sometimes the GPU runs code itself, in the form of shader kernels, or as integral DSPs, mixers, adders, combiners, shader interpolators, etc...
You have to know how the sound is generated, and you have to BE the sound generator.
Now, to add gas to the fire: You have a time limit, and certain things MUST be done in as close to actual parallel as is possible, or responses to that CPU will not match, screens will have the wrong data, late data, early data etc...
Doing ALL THIS, in such a way that it can operate on more than a single target host device... this is not as simple as "Why doesn't X game work?"
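
One common way to keep all those parts in step is an event scheduler keyed on emulated cycles. The Python sketch below is only an illustration under that assumption: the `Scheduler` class, the event names, and the cycle delays are invented for the example and do not describe any particular emulator.

```python
import heapq


class Scheduler:
    """Orders hardware events by the emulated cycle at which they fire."""

    def __init__(self):
        self.now = 0
        self._queue = []  # heap of (cycle, sequence, callback)
        self._seq = 0     # tie-breaker so callbacks are never compared

    def schedule(self, delay, callback):
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run_until(self, target):
        """Fire every event due at or before `target`, in time order."""
        while self._queue and self._queue[0][0] <= target:
            when, _, callback = heapq.heappop(self._queue)
            self.now = when
            callback()
        self.now = target


log = []
sched = Scheduler()
sched.schedule(300, lambda: log.append("audio sample"))
sched.schedule(100, lambda: log.append("timer overflow"))
sched.schedule(200, lambda: log.append("hblank"))
sched.run_until(250)  # fires the timer and hblank events, in that order
```

The CPU cores would run only up to the next pending event, hand control back to the scheduler, and resume afterwards, which is how late or early data on screen gets avoided.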

The CPU is less than 20% of the picture.
The CPU is merely the conductor in a symphony of many instruments, that make the music you call an emulation.
Get any one part of this wrong, and it's broken.
Do any one part of this inefficiently, and you have a slow emulation.
Do it all perfectly, and you have a slow emulation also...
It's a balance.
With diminishing returns.
/rant
