Candle implementation of Mamba [1], for inference only. Mamba is an alternative to the transformer architecture. It uses State Space Models (SSMs) to stay computationally efficient on long sequences. The implementation is based on mamba.rs.
Compared to the mamba-minimal example, this version is far more efficient but works only for inference.
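To illustrate why a recurrent formulation suits inference, here is a minimal sketch of a single-channel, diagonal SSM step. It is not the code used in this example: the names `SsmState`, `step`, and `run_sequence` are hypothetical, and the parameters `a_bar`, `b_bar`, and `c` are fixed placeholder values, whereas Mamba computes them from the input at each step. The point is only that each new token updates a fixed-size state, so the per-token cost does not grow with sequence length.

```rust
/// Recurrent state for one channel of a diagonal, discretized SSM.
/// Hypothetical type for illustration; not part of candle.
struct SsmState {
    h: Vec<f32>, // hidden state, one entry per state dimension
}

impl SsmState {
    fn new(d_state: usize) -> Self {
        Self { h: vec![0.0; d_state] }
    }

    /// One recurrent step:
    ///   h[i] <- a_bar[i] * h[i] + b_bar[i] * x
    ///   y     = sum_i c[i] * h[i]
    /// All slices are assumed to have the same length as the state.
    fn step(&mut self, a_bar: &[f32], b_bar: &[f32], c: &[f32], x: f32) -> f32 {
        let mut y = 0.0;
        for i in 0..self.h.len() {
            self.h[i] = a_bar[i] * self.h[i] + b_bar[i] * x;
            y += c[i] * self.h[i];
        }
        y
    }
}

/// Feeds a sequence through the recurrence one element at a time.
/// The parameter values are placeholders chosen for the sketch.
fn run_sequence(xs: &[f32]) -> Vec<f32> {
    let a_bar = [0.5, 0.25];
    let b_bar = [1.0, 1.0];
    let c = [1.0, 2.0];
    let mut state = SsmState::new(a_bar.len());
    xs.iter()
        .map(|&x| state.step(&a_bar, &b_bar, &c, x))
        .collect()
}
```

Calling `run_sequence(&[1.0, 0.0, 0.0])` returns one output per input element while only ever holding a two-element state, which is the property that makes step-by-step generation cheap.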
Run the example with:
$ cargo run --example mamba --release -- --prompt "Mamba is the"