DOSbox-X support #22

Open
Torinde opened this issue Aug 18, 2023 · 9 comments
Labels
DOSBox Issue related to DOSBox/DOSBox-X enhancement New feature or request

Comments

@Torinde
Contributor

Torinde commented Aug 18, 2023

DOSBox-X supports Win9x, so it would benefit greatly from the SoftGPU 3D-accelerated driver.

What will be needed to achieve that?

  • VBoxSVGA device provided by DOSbox-X (Issue 3405 there)
  • adaptations at SoftGPU side?
  • will SSE3/SSE4.2/AVX/AVX2 help performance with an emulated CPU, as in DOSBox? https://github.com/JHRobotics/mesa9x#requirements
  • The 17.x Win95 build runs on a Pentium II, while DOSBox-X emulates up to a Pentium III (SSE), so at least that should run?
  • Is SSE3 required for Win98/Me, or can 17.x be compiled for those without SSE3? (Issue at DOSBox-X for SSE2 and beyond)
  • Is SSE4.2 required for 21.x, or can 21.x be compiled without it?
  • What are the minimum requirements for 21.x on Win98?
  • Does it make sense for future Mesa versions to increase the minimum requirements, e.g. will DOSBox need more than SSE3 in the future?

DOSBox Pure is another DOSBox fork that officially supports Win9x.
86Box "Virtual PC" machine and related emulators would benefit from support as well.
86Box discussion
dosemu2 discussion

@JHRobotics
Owner

Hello, the main problem with the SSE requirement was a problematic CRT (C runtime) in the MSYS MinGW distribution. It was compiled with SSE instructions, so they were inserted into the code regardless of the “-march” flag. I was able to compile version 17.x with an older compiler, but not the newer versions (C++14 is required).

However, I found a working MinGW distribution (from here: https://github.com/niXman/mingw-builds-binaries/releases/tag/13.1.0-rt_v11-rev1) without this behaviour, so in the last release the “Windows 95” binaries have no SSE instructions in the runtime, but LLVMpipe is still able to use SSE instructions if they are present.

I appreciate the effort to get some additional graphics acceleration options into DOSBox, but I'm not entirely sure that VMware SVGA is the way to go. The problem is that quite a lot of calculations are done in the guest system, and in the newer version of the protocol (GPU gen. 10 or SVGA-III), for example, all the surfaces (textures, framebuffer, and other work buffers) and graphics structures are stored in ordinary guest RAM (on real HW they would be located in VRAM). Of course, this has its own logic: if you run multiple virtual machines, you must try to allocate resources as efficiently as possible. If everything is in guest memory, you have no hidden overhead, you can inflate the allocated RAM (memory ballooning) according to how much you really need, and you don't have dead textures somewhere in host memory.

In an emulator, on the contrary, you have to compute as few things as possible inside the guest, because even with a dynamic recompiler, emulation is very, very slow compared to native code. Therefore, it seems to me that a much better way is to use shared memory for surfaces (textures, framebuffer, ... = wherever the guest needs to write and read its data), plus a FIFO queue onto which guest API calls are pushed.
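A minimal sketch of the "shared memory plus FIFO queue" idea, assuming a single guest-side producer and a single host-side consumer. The record layout, the opcode numbers and the `CommandFifo` name are hypothetical illustrations, not part of SoftGPU, DOSBox-X or qemu-3dfx; a plain `bytearray` stands in for the memory region that guest and host would share.

```python
import struct

# Hypothetical record layout: one 32-bit opcode plus three 32-bit arguments.
# None of this is an existing SoftGPU or DOSBox-X interface.
CMD = struct.Struct("<IIII")

OP_CLEAR = 1  # hypothetical opcode: clear the framebuffer to a colour
OP_DRAW = 2   # hypothetical opcode: draw `b` vertices starting at index `a`


class CommandFifo:
    """Single-producer/single-consumer ring buffer over a flat byte region."""

    def __init__(self, slots):
        self.slots = slots
        self.mem = bytearray(slots * CMD.size)  # stands in for shared memory
        self.head = 0  # next slot the host will read
        self.tail = 0  # next slot the guest will write

    def push(self, opcode, a=0, b=0, c=0):
        """Guest side: queue one call; returns False when the ring is full."""
        nxt = (self.tail + 1) % self.slots
        if nxt == self.head:
            return False  # one slot is kept free to tell "full" from "empty"
        CMD.pack_into(self.mem, self.tail * CMD.size, opcode, a, b, c)
        self.tail = nxt
        return True

    def pop(self):
        """Host side: fetch the oldest call, or None when the ring is empty."""
        if self.head == self.tail:
            return None
        cmd = CMD.unpack_from(self.mem, self.head * CMD.size)
        self.head = (self.head + 1) % self.slots
        return cmd


if __name__ == "__main__":
    fifo = CommandFifo(slots=8)
    fifo.push(OP_CLEAR, 0x000000)
    fifo.push(OP_DRAW, 0, 3)
    # The host drains the queue and would forward each call to its own GPU API.
    while (cmd := fifo.pop()) is not None:
        print(cmd)
```

The point of the design is that the guest only writes small records and surface data, while all the heavy work happens in native host code when the queue is drained.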

3D acceleration is done this way in, for example, https://github.com/kjliew/qemu-3dfx, and basically also in DOSBox when Glide is emulated (the entire HW is not emulated; instead, individual calls are passed to the library in the host system). The only problem is that a driver has to be written for it, which is not entirely easy in the case of Windows 9x, but it still seems easier to me than either optimizing a complex driver to be faster in the emulator or emulating real (and mostly poorly documented) graphics HW.

Speaking of DOSBox, it should also be mentioned that some S3 Trio and especially S3 ViRGE cards are capable of 3D acceleration. Although the DDI version is only 5.0 (DirectX 7 maximum) and the real HW was so slow that it was nicknamed the “decelerator”, in theory it could work better in an emulator. In addition, there is a driver including source code (part of DDK98).

Sorry that my answer is so long, but even in a virtual machine the performance of the driver depends a lot on the performance of the CPU itself (e.g. comparing an Intel i7 4th gen. + GTX 1650 against an AMD Ryzen 5 3rd gen. or an Intel i5 11th gen. + integrated GPU, the newer CPU wins by quite a lot, regardless of the graphics card used). And I have a feeling that my driver will be really very slow in the emulator.

But I don’t want to only say "won't work". I'll try to follow the development of the implementation and, if I'm able, I'll try to contribute some code.

@Torinde
Contributor Author

Torinde commented Sep 4, 2023

Thank you! I appreciate the long answer!

So, you're saying that your implementation relies on high CPU performance in the guest, so you worry it may be too slow in emulators (vs hypervisors, joncampbell123/dosbox-x/issues/1089). Still worth a try.

Also, another implementation could be written that shifts the computation to the host and whose guest driver uses minimal resources.

@joncampbell123 - FYI, maybe you will be interested to work with @JHRobotics on that?

@joncampbell123

Maybe on x86/x86-64 platforms that support SSE, the emulator could execute the same or similar instruction for the guest to speed things up? Sort of like all the FPU code inherited from SVN.

@Torinde
Contributor Author

Torinde commented Sep 9, 2023

Latest readme says

If you decide to use 98/Me build, your (virtual) CPU needs support these instructions MMX, SSE, SSE2, SSE3, SSSE3, CX16, SAHF and FXSR (Intel Core2).

  • SAHF is an 8086 instruction that was missing in 64-bit mode on the earliest x86-64 CPUs
  • FXSR means FXSAVE and FXRSTOR, the instructions introduced with the Pentium II for saving and restoring x87/MMX/SSE state
  • CX16 - what is this? I assume the third option:
    • CMPXCHG r/m,r16 from 486
    • CMPXCHG8B m64 from Pentium
    • CMPXCHG16B m128 from x86-64, but missing in the earliest models

So, basically any real SSSE3 CPU will cover the requirements, except the 32-bit Atom, which lacks CMPXCHG16B (but maybe that instruction can be emulated additionally).
For emulators lacking one of the above, there is the 95 build.
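The choice between the two builds can be sketched as a simple set comparison. The feature names below are copied from the readme quote above; the function names and the idea of passing the emulated CPU's features as a set of strings are assumptions made only for this illustration.

```python
# Feature names as listed in the readme quote; everything else is hypothetical.
REQUIRED_98ME = frozenset(
    {"MMX", "SSE", "SSE2", "SSE3", "SSSE3", "CX16", "SAHF", "FXSR"}
)


def missing_for_98me(cpu_features):
    """Return the sorted list of required features the CPU does not offer."""
    have = {name.upper() for name in cpu_features}
    return sorted(REQUIRED_98ME - have)


def pick_build(cpu_features):
    """Pick the 98/Me build only when every listed feature is available."""
    return "95" if missing_for_98me(cpu_features) else "98/Me"


if __name__ == "__main__":
    # Assumed feature set for an emulated Pentium III class CPU.
    pentium3 = {"mmx", "sse", "fxsr", "sahf"}
    print(pick_build(pentium3), missing_for_98me(pentium3))
```

With the assumed Pentium III feature set, the sketch falls back to the 95 build because SSE2, SSE3, SSSE3 and CX16 are absent.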

@JHRobotics
Owner

The instruction requirements are from the GCC manual: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

In theory, the build can be optimized for any CPU you want.

I'm only worried about the code complexity of Mesa3D, because vmwsgl32.dll is still 15 MB even without debug symbols. But if Mesa in software mode works in QEMU without acceleration (KVM or WHPX), it'll work in DOSBox :-)

@Torinde
Contributor Author

Torinde commented Sep 11, 2023

Thanks, so it's CMPXCHG16B.

EDIT: The GCC link actually says "This option enables GCC to generate CMPXCHG16B instructions in 64-bit code" for the -mcx16 option, so I assume the 32-bit build won't use any of those (i.e. that switch is ignored for SoftGPU)?

Same "in 64-bit code" is mentioned for the -msahf option, but SAHF/LAHF are always supported in 32-bit CPUs/modes anyway.

@Torinde
Contributor Author

Torinde commented Sep 23, 2023

the emulator could execute the same or similar instruction for the guest to speed things up?

Isn't that similar to implementing a hypervisor core? Or do you plan some (easier to implement?) middle ground where only parts of the CPU execution are transferred to the host, but it's not a full hypervisor core?

Will that use AVX (or more) from the host (llvmpipe in SoftGPU benefits from AVX)? Or can it only go up to the maximum emulated by DOSBox-X (SSE)?

@JHRobotics
Owner

Isn't that similar to implementing a hypervisor core? Or do you plan some (easier to implement?) middle ground where only parts of the CPU execution are transferred to the host, but it's not a full hypervisor core?

It's relatively simple: for example, if the emulator finds the byte sequence 0F 58 CA (ADDPS xmm1, xmm2), it'll execute ADDPS xmm1, xmm2 on the host. The x87 FPU is implemented in DOSBox the same way (or it was; it has been a long time since I examined the DOSBox core, and this works only on x86-compatible hosts). The dynamic DOSBox core is a bit similar to the dynamic recompilers of older hypervisors (more than 10 years ago, for virtualization without HW assistance, but unsupported now). But there is a huge difference in purpose: a hypervisor is designed to run 32- or 64-bit ring-3 code with minimal performance penalty, and the rest is emulated (very precisely, but very slowly), so every I/O or privileged instruction costs a very large performance penalty. In DOSBox, what matters is speed in real/virtual 8086 mode, and even a DOS game that runs in PM-32 [1] still uses I/O or BIOS interrupts to communicate with HW. DOSBox also needs to emulate precise instruction timing, which is absolutely unimportant for hypervisors.
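As a rough illustration of the decoding step described above, here is a minimal sketch of an interpreter that recognises the register-to-register form of ADDPS (0F 58 /r with ModRM.mod == 3) and applies the four-lane single-precision add. It is not DOSBox code: a real dynamic core would emit or call the host's own ADDPS instead of doing the arithmetic lane by lane, and the `step` and `f32` names are hypothetical.

```python
import struct


def f32(x):
    """Round a Python float to single precision, as an SSE lane would hold it."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def step(code, xmm):
    """Interpret one instruction at the start of `code`; return bytes consumed.

    `xmm` is a list of eight registers, each a list of four floats.
    Only ADDPS xmm, xmm (0F 58 /r with ModRM.mod == 3) is handled here.
    """
    if len(code) >= 3 and code[0] == 0x0F and code[1] == 0x58:
        modrm = code[2]
        if modrm >> 6 == 0b11:          # mod == 3: both operands are registers
            dst = (modrm >> 3) & 7      # ModRM.reg selects the destination
            src = modrm & 7             # ModRM.rm selects the source
            xmm[dst] = [f32(a + b) for a, b in zip(xmm[dst], xmm[src])]
            return 3
    raise NotImplementedError("instruction not covered by this sketch")


if __name__ == "__main__":
    regs = [[0.0] * 4 for _ in range(8)]
    regs[1] = [1.0, 2.0, 3.0, 4.0]
    regs[2] = [0.5, 0.5, 0.5, 0.5]
    step(bytes([0x0F, 0x58, 0xCA]), regs)  # ModRM 0xCA: mod=3, reg=1, rm=2
    print(regs[1])
```

The ModRM byte 0xCA splits into mod = 3, reg = 1 and rm = 2, which is why the sequence decodes to ADDPS xmm1, xmm2.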

Will that use AVX (or more) from the host (llvmpipe in SoftGPU benefits from AVX)? Or can it only go up to the maximum emulated by DOSBox-X (SSE)?

AVX is useful only if rendering is done purely in software (with llvmpipe), but that is slow even on real hardware, and in a VM it is about half as fast. I did some tests on QEMU with the CPU accelerator disabled and it's really slow [2]. Pure software 3D in the guest isn't the way for now.

Anyway, I'm currently trying to pass 3D commands through to the host's GPU (like qemu-3dfx) in DOSBox-X with a modified S3 ViRGE driver. If I'm successful, I'll share the results (and code) :-)

Footnotes

  1. 32-bit protected mode

  2. "slow" isn't exactly the right word, I think a new word needs to be invented for it, because only "slow" doesn't describe the "slowness" of this operation.

@JHRobotics JHRobotics added enhancement New feature or request DOSBox Issue related to DOSBox/DOSBox-X labels Sep 24, 2023
@Torinde
Contributor Author

Torinde commented Sep 25, 2023

Great to hear that!

Regarding DDI/DirectX levels

  • Voodoo drivers are open source, correct? Although no real card supported more than DirectX 6, maybe some have DDI 6 or 7? If that's the case, I assume DOSBox-X can be modified to emulate the required Voodoo model (maybe using code from PCem/86Box)...
  • DOSBox-X also recently got ATI Mach64 support (I don't know how mature it is), but again, I assume its DDI levels are low.

:) new word for slow... virgespeed?


3 participants