Rework silence interpolation to work around duplicated silence problem. #61

Open
wants to merge 1 commit into
base: master

synety-jdebp
Contributor

The duplicated silence problem is an artefact of some soft 'phones that exist in the wild.

The scenario is a soft 'phone being put on hold and taken off hold by the other end. When they come off hold, these particular soft 'phones sometimes send two RTP packets as follows.

  • The first packet has the next timestamp (e.g. 160 ticks after the last packet sent before the 'phone went on hold) and some audio that had been cut off when the 'phone was set to receive-only. Its arrival is delayed by the entire hold period.
  • The next packet is no further delayed, but has a timestamp incorporating the silence period.

A real receiving UA drops the audio in the first, delayed-arrival packet, and the silence is implicit. extractaudio uses silence interpolation to approximate this (to the extent that it can, given that it doesn't discard audio). But it has two triggers for interpolated silence, and this scenario fires both of them. One trigger is the delayed arrival of old-timestamped audio. The other is the gap in the timestamps. The result is that the silence period is effectively doubled in the output audio file. (Heartbeats have some slight effect on this, and also result in timestamps decreasing if the hold period is longer than 800 seconds.)
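To make the doubling concrete, here is a simplified model of the two triggers firing independently. It is not extractaudio's actual code: the function name `naive_silence`, the packet tuple layout, the 8000 Hz clock, and the 10-second hold are all assumptions chosen for illustration.

```python
CLOCK_HZ = 8000  # assumed RTP clock rate; 160 ticks is then 20 ms of audio


def naive_silence(packets):
    """Total interpolated silence, in ticks, when both triggers fire.

    packets: list of (rtp_timestamp, arrival_ms, payload_ticks) tuples.
    This is a hypothetical model, not extractaudio's real data structure.
    """
    total = 0
    prev_ts, prev_ms, prev_len = packets[0]
    for ts, ms, length in packets[1:]:
        # Trigger 1: gap between where the previous audio ended and this timestamp.
        ts_gap = max(0, ts - (prev_ts + prev_len))
        # Trigger 2: packet arrived later than its timestamp advance implies.
        elapsed = (ms - prev_ms) * CLOCK_HZ // 1000
        late = max(0, elapsed - (ts - prev_ts))
        total += ts_gap + late
        prev_ts, prev_ms, prev_len = ts, ms, length
    return total


# Last packet before hold, then the two packets sent on coming off a 10 s hold.
hold = [
    (0, 0, 160),          # last audio before hold
    (160, 10020, 160),    # next timestamp, but arrives 10 s late
    (80320, 10040, 160),  # arrives on time, timestamp includes the hold
]
print(naive_silence(hold) / CLOCK_HZ)  # 20.0 seconds of silence for a 10 s hold
```

The late packet contributes 80000 ticks through the arrival-time trigger and the following packet contributes another 80000 through the timestamp trigger, so a 10-second hold becomes 20 seconds of silence.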

This rework adjusts the silence interpolation as follows:

  • Silence interpolated from sender timestamp gaps is always generated.
  • Silence interpolated from arrival time delay is saved rather than emitted immediately, and is only detected if the delayed packet actually carries some audio data (thereby excluding heartbeats).
    • Arrival-time silence is added after the audio in the delayed packet, rather than before it. It is prepended to the next packet of audio.
    • If sender-indicated silence follows in the next packet, it is deducted from the amount of arrival-time silence.
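
The rules above can be sketched as follows. This is a minimal model under stated assumptions, not the code in this pull request: `reworked_output`, the packet tuple layout, and the 8000 Hz clock are hypothetical, and heartbeats are modelled simply as packets with a zero-length payload.

```python
CLOCK_HZ = 8000  # assumed RTP clock rate


def reworked_output(packets):
    """Return a list of ("audio", ticks) / ("silence", ticks) events.

    packets: list of (rtp_timestamp, arrival_ms, payload_ticks) tuples.
    Hypothetical model of the reworked interpolation rules.
    """
    out = []
    pending = 0  # saved arrival-time silence, in ticks
    prev_ts, prev_ms, prev_len = packets[0]
    if prev_len > 0:
        out.append(("audio", prev_len))
    for ts, ms, length in packets[1:]:
        # Sender-indicated silence is always generated.
        ts_gap = max(0, ts - (prev_ts + prev_len))
        # Sender-indicated silence is deducted from saved arrival-time silence.
        pending = max(0, pending - ts_gap)
        silence = ts_gap
        if length > 0:
            # Whatever remains is prepended to the next packet of audio.
            silence += pending
            pending = 0
        if silence > 0:
            out.append(("silence", silence))
        if length > 0:
            out.append(("audio", length))
            # Lateness is only detected for packets that carry audio,
            # and is saved so that it lands after this packet's audio.
            elapsed = (ms - prev_ms) * CLOCK_HZ // 1000
            pending = max(0, elapsed - (ts - prev_ts))
        prev_ts, prev_ms, prev_len = ts, ms, length
    return out


hold = [(0, 0, 160), (160, 10020, 160), (80320, 10040, 160)]
print(reworked_output(hold))
```

For the 10-second hold in this example the model emits a single 80000-tick silence, placed after the delayed audio, because the timestamp gap in the following packet cancels the saved arrival-time silence exactly.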

@sobomax sobomax force-pushed the master branch 6 times, most recently from 55700ad to 2131bec Compare January 13, 2023 15:11