Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

100% cpu usage when send data with channel.data #266

Open
VladTheJunior opened this issue Apr 15, 2024 · 6 comments
Open

100% cpu usage when send data with channel.data #266

VladTheJunior opened this issue Apr 15, 2024 · 6 comments

Comments

@VladTheJunior
Copy link

VladTheJunior commented Apr 15, 2024

Recently I got strange behavior on my app, it can stuck with 100% cpu usage forever, and only restart of proccess can fix that.
After some investigations, I found that it happens here:

pub async fn execute(&mut self, command: &str, input: Option<&[u8]>) -> Result<SshResult> {
        let mut stdout_buffer = vec![];
        let mut stderr_buffer = vec![];
        let mut channel = self.session.channel_open_session().await?;
        let mut status = None;
        channel.exec(true, command).await?;
        if input.is_some() {
            channel.data(input.unwrap()).await?; // <--- 100% cpu happens here
            channel.eof().await?;
        }

        while let Some(msg) = channel.wait().await {
            match msg {
                ChannelMsg::Data { ref data } => {
                    stdout_buffer.write_all(data).await?;
                }
                ChannelMsg::ExtendedData { ref data, ext } => {
                    if ext == 1 {
                        stderr_buffer.write_all(data).await?
                    }
                }
                ChannelMsg::ExitStatus { exit_status } => {
                    status = Some(exit_status);
                }
                _ => { }
            }
        }

        match status {
            Some(status) => {
                if status != 0 {
                    bail!(
                        "Failed with exit code {}\n{}\n{}",
                        status,
                        String::from_utf8_lossy(&stdout_buffer),
                        String::from_utf8_lossy(&stderr_buffer)
                    )
                }
                Ok(SshResult {
                    stdout: stdout_buffer,
                    stderr: stderr_buffer,
                    status: status,
                })
            }
            None => bail!("Did not exit cleanly"),
        }
    }

Debug log:

[2024-04-15T03:49:22Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:147 - writing, seqn = 6
[2024-04-15T03:49:22Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:150 - padding length 9
[2024-04-15T03:49:22Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:152 - packet_length 108
[2024-04-15T03:49:22Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:147 - writing, seqn = 7
[2024-04-15T03:49:22Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:150 - padding length 18
[2024-04-15T03:49:22Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:152 - packet_length 8220
[2024-04-15T03:49:22Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:147 - writing, seqn = 8
[2024-04-15T03:49:22Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:150 - padding length 18
[2024-04-15T03:49:22Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:152 - packet_length 8220
[2024-04-15T03:49:22Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:147 - writing, seqn = 9
[2024-04-15T03:49:22Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:150 - padding length 18
[2024-04-15T03:49:22Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:152 - packet_length 8220
[2024-04-15T03:49:22Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:187 - reading, len = [220, 62, 234, 212]
[2024-04-15T03:49:22Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:192 - reading, seqn = 6
[2024-04-15T03:49:22Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:195 - reading, clear len = 44
[2024-04-15T03:49:22Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:200 - read_exact 48
[2024-04-15T03:49:22Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:203 - read_exact done
[2024-04-15T03:49:22Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:210 - reading, padding_length 6
[2024-04-15T03:49:26Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:187 - reading, len = [30, 175, 215, 122]
[2024-04-15T03:49:26Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:192 - reading, seqn = 7
[2024-04-15T03:49:26Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:195 - reading, clear len = 60
[2024-04-15T03:49:26Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:200 - read_exact 64
[2024-04-15T03:49:26Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:203 - read_exact done
[2024-04-15T03:49:26Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:210 - reading, padding_length 18
[2024-04-15T03:49:26Z DEBUG russh::client::encrypted]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\client\encrypted.rs:628 - channel_window_adjust
[2024-04-15T03:49:26Z DEBUG russh::client::encrypted]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\client\encrypted.rs:633 - amount: 8192
[2024-04-15T03:49:26Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:147 - writing, seqn = 10
[2024-04-15T03:49:26Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:150 - padding length 18
[2024-04-15T03:49:26Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:152 - packet_length 8220
[2024-04-15T03:49:29Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:187 - reading, len = [186, 146, 45, 19]
[2024-04-15T03:49:29Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:192 - reading, seqn = 8
[2024-04-15T03:49:29Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:195 - reading, clear len = 60
[2024-04-15T03:49:29Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:200 - read_exact 64
[2024-04-15T03:49:29Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:203 - read_exact done
[2024-04-15T03:49:29Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:210 - reading, padding_length 18
[2024-04-15T03:49:29Z DEBUG russh::client::encrypted]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\client\encrypted.rs:628 - channel_window_adjust
[2024-04-15T03:49:29Z DEBUG russh::client::encrypted]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\client\encrypted.rs:633 - amount: 8192
[2024-04-15T03:49:29Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:147 - writing, seqn = 11
[2024-04-15T03:49:29Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:150 - padding length 18
[2024-04-15T03:49:29Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:152 - packet_length 8220
[2024-04-15T03:50:03Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:187 - reading, len = [105, 66, 190, 99]
[2024-04-15T03:50:03Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:192 - reading, seqn = 9
[2024-04-15T03:50:03Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:195 - reading, clear len = 60
[2024-04-15T03:50:03Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:200 - read_exact 64
[2024-04-15T03:50:03Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:203 - read_exact done
[2024-04-15T03:50:03Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:210 - reading, padding_length 18
[2024-04-15T03:50:03Z DEBUG russh::client::encrypted]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\client\encrypted.rs:628 - channel_window_adjust
[2024-04-15T03:50:03Z DEBUG russh::client::encrypted]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\client\encrypted.rs:633 - amount: 8192
[2024-04-15T03:50:03Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:147 - writing, seqn = 12
[2024-04-15T03:50:03Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:150 - padding length 18
[2024-04-15T03:50:03Z DEBUG russh::cipher]C:\Users\vladt\.cargo\registry\src\index.crates.io-6f17d22bba15001f\russh-0.43.0\src\cipher\mod.rs:152 - packet_length 8220

log.txt

@Eugeny
Copy link
Owner

Eugeny commented Apr 15, 2024

Could you please take a profile of your program to see where it's looping?

@VladTheJunior
Copy link
Author

How can I do that?

@Eugeny
Copy link
Owner

Eugeny commented Apr 15, 2024

@VladTheJunior
Copy link
Author

Is it enough?

perf.zip

   7.79%  tokio-runtime-w  test_agent         [.] memcpy
   7.77%  tokio-runtime-w  test_agent         [.] <russh::channels::io::tx::ChannelTx<S> as tokio::io::async_write::AsyncWrite>::poll_write
   6.62%  tokio-runtime-w  test_agent         [.] tokio::runtime::scheduler::multi_thread::worker::Context::run
   5.77%  tokio-runtime-w  test_agent         [.] tokio::runtime::scheduler::multi_thread::worker::<impl tokio::runtime::scheduler::multi_thread::ha   5.11%  tokio-runtime-w  test_agent         [.] tokio::sync::batch_semaphore::Semaphore::add_permits_locked
   4.53%  tokio-runtime-w  test_agent         [.] tokio::sync::mutex::Mutex<T>::lock_owned::{{closure}}
   4.52%  tokio-runtime-w  test_agent         [.] <tokio::sync::batch_semaphore::Acquire as core::future::future::Future>::poll
   3.76%  tokio-runtime-w  test_agent         [.] __libc_free
   3.42%  tokio-runtime-w  test_agent         [.] tokio::runtime::context::with_scheduler
   3.16%  tokio-runtime-w  test_agent         [.] __unlock
   2.95%  tokio-runtime-w  test_agent         [.] <tokio::io::util::copy::Copy<R,W> as core::future::future::Future>::poll
   2.79%  tokio-runtime-w  test_agent         [.] tokio::sync::batch_semaphore::Semaphore::release
   2.78%  tokio-runtime-w  test_agent         [.] __lock
   2.77%  tokio-runtime-w  test_agent         [.] tokio::runtime::task::state::State::transition_to_running
   2.46%  tokio-runtime-w  test_agent         [.] tokio::runtime::scheduler::multi_thread::worker::Context::run_task
   2.45%  tokio-runtime-w  test_agent         [.] tokio::runtime::task::state::State::transition_to_idle
   2.14%  tokio-runtime-w  test_agent         [.] tokio::runtime::task::waker::wake_by_ref
   2.04%  tokio-runtime-w  test_agent         [.] tokio::runtime::task::state::State::ref_dec
   1.81%  tokio-runtime-w  test_agent         [.] agent::ssh::SshSession::execute::{{closure}}
   1.80%  tokio-runtime-w  test_agent         [.] sccp
   1.77%  tokio-runtime-w  test_agent         [.] __libc_malloc_impl
   1.70%  tokio-runtime-w  test_agent         [.] tokio::runtime::task::harness::Harness<T,S>::poll
   1.59%  tokio-runtime-w  test_agent         [.] tokio::runtime::context::scoped::Scoped<T>::with
   1.53%  tokio-runtime-w  test_agent         [.] alloc_slot
   1.51%  tokio-runtime-w  test_agent         [.] enframe
   1.42%  tokio-runtime-w  test_agent         [.] get_meta
   1.30%  tokio-runtime-w  test_agent         [.] tokio::runtime::task::core::Core<T,S>::poll

@VladTheJunior
Copy link
Author

VladTheJunior commented Apr 16, 2024

There is also additional context of this problem. Probably root reason is target device.
I trying to execute command which sends ~200KB of data. But for unknown reason it recieves just 40KB and then I met 100% cpu usage on my side and stacked process on device with command (sh -c /cat > /some/file). BUT when I restarted this device problem is gone.
So I can't catch this bug now, but behavior of 100% cpu is bad and can break my app again in the next time (it happened rarely some times before, but I could detect it only yesterday since I have thousands of devices where this command is executing). It should generate some error instead or at least react to timeout

@ondt
Copy link

ondt commented Jul 1, 2024

I'm facing the same issue - 100% CPU usage (single core) when awaiting russh::Channel::data.

In comparison, ssh [email protected] "cat > /dev/null" < /dev/urandom consumes barely any CPU.

Minimal example:

use async_trait::async_trait;
use russh::client;
use russh::client::Config;
use russh_keys::key;
use std::sync::Arc;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let addrs = ("10.0.0.123", 22);

    let mut session = client::connect(Arc::new(Config::default()), addrs, Client).await?;

    let auth_ok = session.authenticate_password("user", "pass").await?;
    if !auth_ok {
        anyhow::bail!("auth failed");
    }

    let channel = session.channel_open_session().await?;
    channel.exec(true, "cat > /dev/null").await?;

    let infinite_data = tokio::io::repeat(b'A');

    println!("sending data");
    channel.data(infinite_data).await?;

    unreachable!();
}

struct Client;

#[async_trait]
impl client::Handler for Client {
    type Error = russh::Error;

    async fn check_server_key(
        &mut self,
        _server_public_key: &key::PublicKey,
    ) -> Result<bool, Self::Error> {
        Ok(true)
    }
}

(I'm using russh 0.44.0-beta.2)

Flamegraph: flamegraph.svg.gz
(russh is highlighted)
2024-07-01T23:05:11+02:00_3838x1893

I generated the flamegraph using cargo flamegraph --release. I had to let it run for a few minutes to make the data method more apparent in the perf data.

I also turned on debuginfo in the release profile:

# Cargo.toml

[profile.release]
debug = true

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants