
ResourcePool::allocate() may starve other threads, causing hitches or stutters with dlssg enabled #33

Open
Nukem9 opened this issue Mar 4, 2024 · 3 comments
Labels
need-info Need additional info

Comments

@Nukem9
Member

Nukem9 commented Mar 4, 2024

When ResourcePool::allocate() reaches a limit (e.g. the VRAM budget or maximum queue depth), it spins in a busy loop hoping another thread comes around and frees existing allocations via ResourcePool::recycle(). If that busy loop exceeds its time limit, it falls back to a brand new allocation instead. I'm assuming this loop doesn't execute under normal circumstances because Streamline doesn't spawn threads and games rarely parallelize slEvaluateFeature() calls.

However, once DLSS-G is enabled there are suddenly three threads competing with each other: a game (present) thread, a sl.pacer thread, and a sl.dlssg thread. The game and sl.pacer threads often contend on ResourcePool's mutex, leading to a problem where ::recycle() is unable to make progress after ::allocate() enters its busy loop. Streamline tries to mitigate this deadlock with the following code:

float resourcePoolWaitUs = bytesAvailable > footprint.totalBytes && allocated.second.size() < m_maxQueueSize ? 500.0f : 100000.0f;
// Use more precise timer
extra::AverageValueMeter meter;
meter.begin();
// Prevent deadlocks, time out after a reasonable wait period.
// See comments above about the wait time and VRAM consumption.
while (items.second.empty() && meter.getElapsedTimeUs() < resourcePoolWaitUs)
{
    lock.unlock();
    // Better than sleep for modern CPUs with hyper-threading
    YieldProcessor();
    lock.lock();
    meter.end();
}

There's an oversight on line 144: std::mutex does not guarantee fairness. YieldProcessor() is a single instruction and makes no real difference. Unlocking and relocking might wake other threads, but ::allocate() can reacquire the lock before anybody else gets a chance. This is often the case on my machine.

Based on my not-so-scientific testing, I usually hit that 100000us pause in games every 1-2 seconds, which results in a stuttery mess. This only occurs with vertical sync enabled through the Nvidia Control Panel; games are smooth with vertical sync off.

I annotated an Nsight trace while trying to understand what's happening. Possibly useful for someone.

@jake-nv
Collaborator

jake-nv commented Mar 5, 2024

Thanks for the detailed analysis and report. We're discussing this, but it will likely take a while to arrive at a fix everyone is happy with.

@kirillNVIDIA

Yeah - we need Sleep(1) instead of YieldProcessor() there. I think it should be a rare condition though. This happens only when there are no resources to recycle. Normally the presenting thread should release the resource by the time rendering thread needs it. To fix it properly - we need a detailed description of the repro case so we can repro the bad case, then fix it, and then verify that the issue is fixed.

@jake-nv jake-nv added the need-info Need additional info label Mar 26, 2024
@Nukem9
Member Author

Nukem9 commented Mar 28, 2024

> Yeah - we need Sleep(1) instead of YieldProcessor() there. I think it should be a rare condition though.

Agreed. Although I was kind of hoping you guys would use a condition variable instead.

> Normally the presenting thread should release the resource by the time rendering thread needs it. To fix it properly - we need a detailed description of the repro case so we can repro the bad case, then fix it, and then verify that the issue is fixed.

There's little information to add beyond what's posted above, and there's no easy repro. VRAM exhaustion is not a factor. What I do know is that I can reproduce minute stutters in a number of games (say, Cyberpunk 2077) in areas with light CPU load, probably because the kernel thread scheduler doesn't preempt the busy loop. I don't plan on root-causing it as there's no source code or symbols available for sl.dlss_g.dll.

No amount of fiddling with settings seems to change things, so I've accepted it as a consequence of my HW/OS (Windows Server 2022) configuration. I binary patched various Streamline DLLs and that's a good enough "fix" for me.

Given the rarity, I don't think it's worth spending time investigating.
