docs: "Stetching" algorithm visualization #166

Open · WofWca opened this issue Feb 26, 2024 · 0 comments
Labels: documentation (Improvements or additions to documentation)

WofWca (Owner) commented Feb 26, 2024

We have a high-level explanation in jumpcutter/README.md (lines 63 to 106 at ba9727d):

<details><summary>Details, why it's called "stretching"</summary>
The algorithm we just described cannot "look ahead" in the audio timeline.
It only looks at the current loudness, at the sample that we've already sent
to the audio output device.
But looking ahead (a.k.a. "Margin before") is important because, for example,
some of the sounds that a word can start with are not very loud,
and it's not good to skip them just because of that.
The speech would become harder to understand.
For example, "throb" would become "rob".
<!-- You'd probably still understand what's being said based on the context,
but you'd need to use more mental effort. -->
Here is where the "stretching" part comes in.
It's about how we're able to "look ahead" and slow down
shortly before a loud part.
Basically it involves slightly (~200ms) _delaying_ the audio
before outputting it (and that is for a purpose!).
Imagine that we're currently playing a silent part,
so the playback rate is higher.
Now, when we encounter a loud part, we go
"aha! That might be a word, and it might start with 'th'".
<!-- , which we might not have marked as loud, because 'th' is not that loud" -->
As said above, we always delay (buffer) the audio for ~200ms
before outputting it.
So we know that these 200ms of buffered audio
must contain that "th" sound,
and we want the user to hear that "th" sound.
But remember: at the time we recorded that sound,
the video was playing at _a high speed_,
but we want to play back that 'th' _at normal speed_.
So we can't just output it as is. What do we do?
What we do is we take that buffered (delayed) audio,
and we _slow it down_ (stretch and pitch-shift it)
so that it appears to have been played at normal speed!
Only then do we pass it to the system (which then passes it to your speakers).
And that, kids, is why we call it "the stretching algorithm".
For more details, you can check out the comments in its source code.
</details>
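
To make the quoted explanation a bit more concrete, here is a rough TypeScript sketch of the idea. This is NOT the extension's real code: names like `isLoud`, `timeStretch`, and `Chunk` are made up for illustration, and the real implementation works on Web Audio nodes rather than raw sample arrays.

```ts
// A conceptual sketch only, not the extension's actual code.
// `isLoud` and `timeStretch` stand in for the real loudness detection and
// the time-stretching/pitch-shifting, which actually happen on Web Audio nodes.

const SILENCE_SPEED = 2.5; // playback rate used during quiet parts

type Chunk = Float32Array;

function processChunk(
  chunk: Chunk,
  lookaheadBuffer: Chunk[],                          // the ~200 ms of audio captured but not yet output
  isLoud: (c: Chunk) => boolean,                     // "aha! That might be a word"
  timeStretch: (c: Chunk, factor: number) => Chunk,  // stretch + pitch-shift
  output: (c: Chunk) => void,
): void {
  lookaheadBuffer.push(chunk);

  if (isLoud(chunk)) {
    // A loud part has begun. The buffered ~200 ms may contain the quiet start
    // of the word (the "th" of "throb"), captured while playback was at
    // SILENCE_SPEED. Stretch it back to normal speed so the listener hears it
    // intact, then continue at normal speed.
    for (const buffered of lookaheadBuffer.splice(0)) {
      output(timeStretch(buffered, SILENCE_SPEED));
    }
  } else {
    // Quiet part: keep roughly 200 ms buffered and output (or skip) anything
    // older, still sped up. Buffer-length bookkeeping is omitted here.
  }
}
```

The whole trick is in the loud branch: the quiet lead-in of a word that is already sitting in the buffer gets slowed back down instead of being output sped-up or dropped.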

But I'm afraid reading the code would still make anyone's head explode.

I guess it can be visualized with the Chart, plus the dev mode that shows the stretcher node's delay, but there is no explanation of what the black vertical line is, or of the pink one (only shown in dev mode).

Pink line:

```ts
if (PLOT_STRETCHER_DELAY) {
  stretcherDelaySeries = new TimeSeries();
}
```
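
For context, smoothie's `TimeSeries` is fed via `append(timestamp, value)`, so the pink line presumably gets one point per telemetry update, roughly like this (the `stretcherDelay` field name is illustrative, not necessarily what the telemetry record actually calls it):

```ts
// Illustrative sketch, not the actual Chart code.
if (PLOT_STRETCHER_DELAY && stretcherDelaySeries) {
  stretcherDelaySeries.append(Date.now(), sToMs(latestTelemetryRecord.stretcherDelay));
}
```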

Black vertical line:

```ts
// The main algorithm may introduce a delay. This is to display what sound is currently on the output.
// Not sure if this is a good idea to use the canvas both directly and through a library. If anything bad
// happens, check out the commit that introduced this change – we were drawing this marker by smoothie's
// means before.
let chartEdgeTimeOffset: TimeDelta;
if (timelineIsMediaIntrinsic) {
  const momentCurrentlyBeingOutputContextTime = latestTelemetryRecord.contextTime - totalOutputDelayRealTime;
  const momentCurrentlyBeingOutputIntrinsicTime
    = toIntrinsicTime(momentCurrentlyBeingOutputContextTime, latestTelemetryRecord, prevPlaybackRateChange);
  const totalOutputDelayIntrinsicTime
    = latestTelemetryRecord.intrinsicTime - momentCurrentlyBeingOutputIntrinsicTime;
  // TODO this is incorrect because the delay introduced by `getExpectedElementCurrentTimeDelayed`
  // is not taken into account. But it's good enough, as that delay is unnoticeable currently.
  chartEdgeTimeOffset = totalOutputDelayIntrinsicTime;
} else {
  chartEdgeTimeOffset = totalOutputDelayRealTime;
}
const pixelOffset = (sToMs(chartEdgeTimeOffset) + chartJumpingOffsetMs) / millisPerPixelTweened;
// So it's not smeared across two pixels.
const pixelOffsetCentered = Math.floor(pixelOffset) + 0.5;
const x = widthPx - pixelOffsetCentered;
canvasContext.save();
canvasContext.beginPath();
canvasContext.lineWidth = 1;
canvasContext.strokeStyle = jumpPeriodMs === 0
  ? 'rgba(0, 0, 0, 0.3)'
  // So it's more clearly visible as it's moving across the screen.
  : 'rgba(0, 0, 0, 0.8)';
canvasContext.moveTo(x, 0);
canvasContext.lineTo(x, heightPx);
canvasContext.stroke();
canvasContext.restore();
```
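
For intuition about where that marker lands, here is the same arithmetic with made-up numbers (0.2 s total output delay, no jumping offset, 10 ms per pixel, a 500 px wide chart):

```ts
// Hypothetical numbers, just to illustrate the arithmetic above.
const chartEdgeTimeOffset = 0.2;     // s, total output delay
const chartJumpingOffsetMs = 0;
const millisPerPixelTweened = 10;
const widthPx = 500;

const pixelOffset = (chartEdgeTimeOffset * 1000 + chartJumpingOffsetMs) / millisPerPixelTweened; // 20
const pixelOffsetCentered = Math.floor(pixelOffset) + 0.5;                                       // 20.5
const x = widthPx - pixelOffsetCentered;                                                         // 479.5
// i.e. the black marker is drawn ~20 px left of the chart's right edge,
// lagging "now" by the total output delay.
```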

(Screenshot: the Chart with the vertical line markers.)

WofWca added the documentation label on Feb 26, 2024