Significant Overhaul of the Interpreter's Timing Model #2235

Draft
wants to merge 338 commits into
base: master

Jaklyy
Contributor

@Jaklyy Jaklyy commented Dec 13, 2024

Heavily reworks the ARM9 & ARM7 timing models to greatly improve accuracy (and slaughter performance).
Builds upon my work in #2125 and uses the excellent cache implementation from #1955, so those two should probably be merged first. (Hopefully building this PR on top of those two doesn't cause any stupid or weird issues with git. Fingers crossed.)

Implements:

  1. Cache streaming
  2. Write buffer
  3. Bus cycle rounding
  4. Main RAM contention
  5. Improvements to certain instruction timings
  6. Memory stage cycles are now distinguished from the execute stage
  7. Interlocks
  8. Improvements to memory access timings
  9. Minor improvements to DMA timings
  10. ARM9 now only stops for DMA when accessing the bus
  11. Fixes ExMemCnt having the incorrect default state (at least for direct boot; the non-direct-boot state shouldn't matter...?). Also prevents software from toggling certain bits.
  12. Removes a few non-existent cp15 cache commands
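
As a rough illustration of items 3 and 7 above, here is a minimal sketch of bus cycle rounding and a single-kind interlock tracker. None of this is taken from the PR's code: `AlignToBusCycle`, `InterlockTracker`, and the `clockShift` parameter are hypothetical names, and the 2:1 CPU-to-bus clock ratio implied by `clockShift = 1` is an assumption. The tracker treats every interlock the same way, which matches the simplification listed under Known Issues rather than real hardware.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: round a CPU timestamp up to the next bus-clock boundary.
// clockShift is log2(CPU cycles per bus cycle); 1 assumes a 2:1 ratio.
inline std::uint64_t AlignToBusCycle(std::uint64_t timestamp, unsigned clockShift)
{
    assert(clockShift < 64);
    const std::uint64_t mask = (std::uint64_t{1} << clockShift) - 1;
    return (timestamp + mask) & ~mask;
}

// Hypothetical tracker: remembers when each register's value becomes usable.
struct InterlockTracker
{
    std::array<std::uint64_t, 16> readyAt{}; // one timestamp per ARM register

    // Record that `reg` is produced late (for example by a load) and is
    // only usable from `readyTimestamp` onwards.
    void MarkPending(unsigned reg, std::uint64_t readyTimestamp)
    {
        assert(reg < 16);
        readyAt[reg] = readyTimestamp;
    }

    // Earliest time an instruction that reads `reg` can execute:
    // either right now, or once the pending result has arrived.
    std::uint64_t StallUntilReady(unsigned reg, std::uint64_t now) const
    {
        assert(reg < 16);
        return std::max(now, readyAt[reg]);
    }
};
```

A bus access would first pass its timestamp through `AlignToBusCycle`, and an instruction would take the maximum of `StallUntilReady` over all of its source registers before executing.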

Known Issues:

  1. JIT is completely broken and will most likely need a significant amount of effort to work again.
  2. Write Buffer is very approximate; it needs a lot more work to really be accurate...
  3. There are actually two different types of interlock; this treats all interlocks as identical, which is wrong.
  4. Most DSi stuff has either not been implemented or not been extensively tested yet.
  5. There are probably oodles of regressions, freezes, and crashes I have yet to spot.
  6. Main RAM DMA Timings are slightly worse for long DMAs.
  7. The interpreter runs at roughly half its previous speed. This is unfortunately just a consequence of chasing high levels of accuracy, and is unlikely to be fixed.
  8. ARM7 DMA has yet to be touched.
  9. Full ExMemCnt defaults have yet to be validated; all I know for sure is that bit 15 should be set by default. (TwilightMenu++ relies on this to boot).
  10. The write buffer also uses a shortcut of sorts: it doesn't actually use and increment the address value passed via the FIFO (which should be how hardware does it?). I'm not entirely sure why, but doing so caused issues.
  11. Nothing is included in savestates yet, so they may be a little broken.
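
For readers unfamiliar with the write buffer mentioned in issues 2 and 10, a very rough sketch of the general idea follows: writes are queued in a FIFO, the bus drains them one at a time, and the CPU only stalls when the FIFO is full. This is not the PR's implementation. `WriteBufferSketch`, its methods, the `busCost` parameter, and the capacity are all hypothetical, and the sketch ignores the address handling that issue 10 describes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical timing-only model of a write buffer (no memory is actually written).
class WriteBufferSketch
{
public:
    explicit WriteBufferSketch(std::size_t capacity) : Capacity(capacity)
    {
        assert(capacity > 0);
    }

    // Queue a write issued at CPU time `now` that occupies the bus for `busCost` cycles.
    // Returns the time at which the CPU may continue: `now` if a slot was free,
    // otherwise the time at which the oldest queued write completes.
    std::uint64_t Push(std::uint64_t now, std::uint32_t addr, std::uint32_t value,
                       std::uint64_t busCost)
    {
        Drain(now);
        if (Fifo.size() == Capacity)
        {
            now = Fifo.front().DoneAt; // stall until the oldest entry is written
            Fifo.pop_front();
        }
        // Entries drain back to back, so this one starts when the bus is next free.
        const std::uint64_t start = std::max(now, BusFreeAt);
        BusFreeAt = start + busCost;
        Fifo.push_back(Entry{addr, value, BusFreeAt});
        return now;
    }

    // Discard every entry whose bus write has completed by `now`.
    void Drain(std::uint64_t now)
    {
        while (!Fifo.empty() && Fifo.front().DoneAt <= now)
            Fifo.pop_front();
    }

    std::size_t Pending() const { return Fifo.size(); }

private:
    struct Entry
    {
        std::uint32_t Addr;
        std::uint32_t Value;
        std::uint64_t DoneAt; // time at which this write leaves the buffer
    };

    std::size_t Capacity;
    std::deque<Entry> Fifo;
    std::uint64_t BusFreeAt = 0;
};
```

The point of the model is that a burst of writes costs the CPU nothing until the buffer fills, after which each further write waits for the oldest entry to drain.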

Jaklyy added 30 commits June 25, 2024 11:20
add cycles to the instruction execution time rather than the timestamp directly.
our understanding of how it works is just too incomplete to be worth implementing yet
undo previous commit because actually code cycles *do* matter
too slow, not accurate enough.
we need to do a *lot* more research into the specifics of how this works with all the various aspects of the cpu's timings before we can make a good implementation
behavior seems to be a quirk of the way they made the interlock cycle mandatory
comes with a small-ish performance hit
also ig add some comments next to the svc funcs so that someone searching for "swi" can find them easier