You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When ResourcePool::allocate() reaches a limit (i.e. VRAM budget or maximum queue depth) it'll spin in a busy loop hoping another thread comes around and frees existing allocations via ResourcePool::recycle(). If said busy loop exceeds its time limit, it'll fall back to a brand new allocation instead. I'm assuming this loop doesn't execute under normal circumstances because Streamline doesn't spawn threads and games rarely parallelize slEvaluateFeature() calls.
However, once DLSS-G is enabled there's suddenly 3 threads competing with each other: a game (present) thread, a sl.pacer thread, and a sl.dlssg thread. The game and sl.pacer threads are often contended on ResourcePool's mutex, leading to a problem where ::recycle() is unable to progress after ::allocate() enters its busy loop. Streamline tries to mitigate this deadlock with the following code:
// Prevent deadlocks, time out after a reasonable wait period.
// See comments above about the wait time and VRAM consumption.
while (items.second.empty() && meter.getElapsedTimeUs() < resourcePoolWaitUs)
{
lock.unlock();
// Better than sleep for modern CPUs with hyper-threading
YieldProcessor();
lock.lock();
meter.end();
}
There's an oversight on line 144 as std::mutex does not guarantee fairness. YieldProcessor() is one instruction and makes no real difference. Unlocking and relocking might wake other threads but ::allocate() can reacquire the lock before anybody else gets a chance. This is often the case on my machine.
Based on my not-so-scientific testing I usually hit that 100000us pause in games every 1-2s which results in a stuttery mess. This only occurs with vertical sync enabled through Nvidia Control Panel. Games are smooth with vertical sync off.
I annotated an Nsight trace while trying to understand what's happening. Possibly useful for someone:
The text was updated successfully, but these errors were encountered:
Yeah - we need Sleep(1) instead of YieldProcessor() there. I think it should be a rare condition though. This happens only when there are no resources to recycle. Normally the presenting thread should release the resource by the time rendering thread needs it. To fix it properly - we need a detailed description of the repro case so we can repro the bad case, then fix it, and then verify that the issue is fixed.
Yeah - we need Sleep(1) instead of YieldProcessor() there. I think it should be a rare condition though.
Agreed. Although I was kind of hoping you guys would use a condition variable instead.
Normally the presenting thread should release the resource by the time rendering thread needs it. To fix it properly - we need a detailed description of the repro case so we can repro the bad case, then fix it, and then verify that the issue is fixed.
There's little information to add besides what's posted above. There's no easy repro. VRAM exhaustion is not a factor. What I do know is that I can reproduce minute stutters in a number of games (say, Cyberpunk 2077) in areas with light CPU load - probably because the kernel thread scheduler doesn't preempt the busy loop. I don't plan on root causing it as there's no source code or symbols available for sl.dlss_g.dll.
No amount of fiddling with settings seems to change things, so I've accepted it as a consequence of my HW/OS (Windows Server 2022) configuration. I binary patched various Streamline DLLs and that's a good enough "fix" for me.
Given the rarity, I don't think it's worth spending time investigating.
When ResourcePool::allocate() reaches a limit (i.e. VRAM budget or maximum queue depth) it'll spin in a busy loop hoping another thread comes around and frees existing allocations via ResourcePool::recycle(). If said busy loop exceeds its time limit, it'll fall back to a brand new allocation instead. I'm assuming this loop doesn't execute under normal circumstances because Streamline doesn't spawn threads and games rarely parallelize slEvaluateFeature() calls.
However, once DLSS-G is enabled there's suddenly 3 threads competing with each other: a game (present) thread, a sl.pacer thread, and a sl.dlssg thread. The game and sl.pacer threads are often contended on ResourcePool's mutex, leading to a problem where ::recycle() is unable to progress after ::allocate() enters its busy loop. Streamline tries to mitigate this deadlock with the following code:
Streamline/source/platforms/sl.chi/generic.cpp
Lines 135 to 149 in 7ac42e4
There's an oversight on line 144 as std::mutex does not guarantee fairness. YieldProcessor() is one instruction and makes no real difference. Unlocking and relocking might wake other threads but ::allocate() can reacquire the lock before anybody else gets a chance. This is often the case on my machine.
Based on my not-so-scientific testing I usually hit that 100000us pause in games every 1-2s which results in a stuttery mess. This only occurs with vertical sync enabled through Nvidia Control Panel. Games are smooth with vertical sync off.
I annotated an Nsight trace while trying to understand what's happening. Possibly useful for someone:
The text was updated successfully, but these errors were encountered: