docs: "Stetching" algorithm visualization #166

Open · WofWca opened this issue Feb 26, 2024 · 0 comments
Labels: documentation (Improvements or additions to documentation)

WofWca (Owner) commented Feb 26, 2024

We have a high-level explanation in jumpcutter/README.md (lines 63 to 106 at ba9727d):

<details><summary>Details, why it's called "stretching"</summary>
The algorithm we just described cannot "look ahead" in the audio timeline.
It only looks at the current loudness, at the sample that we've already sent
to the audio output device.
But looking ahead (a.k.a. "Margin before") is important because, for example,
some of the sounds that a word can start with are not very loud,
and it's not good to skip them just because of that.
The speech would become harder to understand.
For example, "throb" would become "rob".
<!-- You'd probably still understand what's being said based on the context,
but you'd need to use more mental effort. -->
Here is where the "stretching" part comes in.
It's about how we're able to "look ahead" and slow down
shortly before a loud part.
Basically it involves slightly (~200ms) _delaying_ the audio
before outputting it (and that is for a purpose!).
Imagine that we're currently playing a silent part,
so the playback rate is higher.
Now, when we encounter a loud part, we go
"aha! That might be a word, and it might start with 'th'".
<!-- , which we might not have marked as loud, because 'th' is not that loud" -->
As said above, we always delay (buffer) the audio for ~200ms
before outputting it.
So we know that these 200ms of buffered audio
must contain that "th" sound,
and we want the user to hear that "th" sound.
But remember: at the time we recorded that sound,
the video was playing at _a high speed_,
but we want to play back that 'th' _at normal speed_.
So we can't just output it as is. What do we do?
What we do is we take that buffered (delayed) audio,
and we _slow it down_ (stretch and pitch-shift it)
so that it appears to have been played at normal speed!
Only then do we pass it to the system (which then passes it to your speakers).
And that, kids, is why we call it "the stretching algorithm".
For more details, you can check out the comments in its source code.
</details>
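
To make the quoted explanation a bit more concrete, here is a rough TypeScript sketch of the idea. This is NOT the extension's real code: names like `isLoud`, `timeStretch`, and `Chunk` are made up for illustration, and the real implementation works on Web Audio nodes rather than raw sample arrays.

```ts
// A conceptual sketch only, not the extension's actual code.
// `isLoud` and `timeStretch` stand in for the real loudness detection and
// the time-stretching/pitch-shifting, which actually happen on Web Audio nodes.

const SILENCE_SPEED = 2.5; // playback rate used during quiet parts

type Chunk = Float32Array;

function processChunk(
  chunk: Chunk,
  lookaheadBuffer: Chunk[],                          // the ~200 ms of audio captured but not yet output
  isLoud: (c: Chunk) => boolean,                     // "aha! That might be a word"
  timeStretch: (c: Chunk, factor: number) => Chunk,  // stretch + pitch-shift
  output: (c: Chunk) => void,
): void {
  lookaheadBuffer.push(chunk);

  if (isLoud(chunk)) {
    // A loud part has begun. The buffered ~200 ms may contain the quiet start
    // of the word (the "th" of "throb"), captured while playback was at
    // SILENCE_SPEED. Stretch it back to normal speed so the listener hears it
    // intact, then continue at normal speed.
    for (const buffered of lookaheadBuffer.splice(0)) {
      output(timeStretch(buffered, SILENCE_SPEED));
    }
  } else {
    // Quiet part: keep roughly 200 ms buffered and output (or skip) anything
    // older, still sped up. Buffer-length bookkeeping is omitted here.
  }
}
```

The whole trick is in the loud branch: the quiet lead-in of a word that is already sitting in the buffer gets slowed back down instead of being output sped-up or dropped.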

But I'm afraid reading the code would still make anyone's head explode.

I guess it can be visualized with the Chart, plus the dev mode that shows the stretcher node's delay, but there is no explanation of what the black vertical line is, or of the pink one (only shown in dev mode).

Pink line:

```ts
if (PLOT_STRETCHER_DELAY) {
  stretcherDelaySeries = new TimeSeries();
}
```
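
For context, smoothie's `TimeSeries` is fed via `append(timestamp, value)`, so the pink line presumably gets one point per telemetry update, roughly like this (the `stretcherDelay` field name is illustrative, not necessarily what the telemetry record actually calls it):

```ts
// Illustrative sketch, not the actual Chart code.
if (PLOT_STRETCHER_DELAY && stretcherDelaySeries) {
  stretcherDelaySeries.append(Date.now(), sToMs(latestTelemetryRecord.stretcherDelay));
}
```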

Black vertical line:

```ts
// The main algorithm may introduce a delay. This is to display what sound is currently on the output.
// Not sure if this is a good idea to use the canvas both directly and through a library. If anything bad
// happens, check out the commit that introduced this change – we were drawing this marker by smoothie's
// means before.
let chartEdgeTimeOffset: TimeDelta;
if (timelineIsMediaIntrinsic) {
  const momentCurrentlyBeingOutputContextTime = latestTelemetryRecord.contextTime - totalOutputDelayRealTime;
  const momentCurrentlyBeingOutputIntrinsicTime
    = toIntrinsicTime(momentCurrentlyBeingOutputContextTime, latestTelemetryRecord, prevPlaybackRateChange);
  const totalOutputDelayIntrinsicTime
    = latestTelemetryRecord.intrinsicTime - momentCurrentlyBeingOutputIntrinsicTime;
  // TODO this is incorrect because the delay introduced by `getExpectedElementCurrentTimeDelayed`
  // is not taken into account. But it's good enough, as that delay is unnoticeable currently.
  chartEdgeTimeOffset = totalOutputDelayIntrinsicTime;
} else {
  chartEdgeTimeOffset = totalOutputDelayRealTime;
}
const pixelOffset = (sToMs(chartEdgeTimeOffset) + chartJumpingOffsetMs) / millisPerPixelTweened;
// So it's not smeared across two pixels.
const pixelOffsetCentered = Math.floor(pixelOffset) + 0.5;
const x = widthPx - pixelOffsetCentered;
canvasContext.save();
canvasContext.beginPath();
canvasContext.lineWidth = 1;
canvasContext.strokeStyle = jumpPeriodMs === 0
  ? 'rgba(0, 0, 0, 0.3)'
  // So it's more clearly visible as it's moving across the screen.
  : 'rgba(0, 0, 0, 0.8)';
canvasContext.moveTo(x, 0);
canvasContext.lineTo(x, heightPx);
canvasContext.stroke();
canvasContext.restore();
```
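
For intuition about where that marker lands, here is the same arithmetic with made-up numbers (0.2 s total output delay, no jumping offset, 10 ms per pixel, a 500 px wide chart):

```ts
// Hypothetical numbers, just to illustrate the arithmetic above.
const chartEdgeTimeOffset = 0.2;     // s, total output delay
const chartJumpingOffsetMs = 0;
const millisPerPixelTweened = 10;
const widthPx = 500;

const pixelOffset = (chartEdgeTimeOffset * 1000 + chartJumpingOffsetMs) / millisPerPixelTweened; // 20
const pixelOffsetCentered = Math.floor(pixelOffset) + 0.5;                                       // 20.5
const x = widthPx - pixelOffsetCentered;                                                         // 479.5
// i.e. the black marker is drawn ~20 px left of the chart's right edge,
// lagging "now" by the total output delay.
```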

(Screenshot: the Chart with the vertical line markers.)

WofWca added the documentation label on Feb 26, 2024