Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panick when use CosineSimilarity #17

Open
vaaaaanquish opened this issue Aug 9, 2021 · 9 comments
Open

panick when use CosineSimilarity #17

vaaaaanquish opened this issue Aug 9, 2021 · 9 comments
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@vaaaaanquish
Copy link

Hi.

With the latest hora, I get a panic when using hora::core::metrics::Metric::CosineSimilarity.

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/core/neighbor.rs:32:54

I think this is an error caused by partial_cmp.
partial_cmp expect it to be None when the given value cannot be ordered.

@salamer
Copy link
Contributor

salamer commented Aug 10, 2021

yes, you are right, because it's float type(https://en.wikipedia.org/wiki/IEEE_754) which have lots of situations, such as NaN. it can not be compared, so I have to make sure for every place where has use partial_cmp is valid.

I will fix it as soon as possible and do something to avoid invalid value(maybe just panic?).

@salamer salamer added bug Something isn't working good first issue Good for newcomers labels Aug 10, 2021
@salamer
Copy link
Contributor

salamer commented Aug 10, 2021

hi @vaaaaanquish, I’ve read a post about hora as I was googling this project. Is it written by you? ;-)

your application is great👍👍👍, and if you want to reach hora's full potential, you can enableSIMD support

firstly, use some command, such as lscpu to check which SIMD instruction your platform support, and export an env variable to enable it

export RUSTFLAGS='-C target-feature=+avx2,+fma'

and in your Cargo.toml to enable simd feature

hora = {version = "0.1.0", features = ["simd"]}

then, use nightly to run your application, you would find that blazingly fast!

PS: The lack of SIMD-supported documents is my carelessness 😢, I will fill it up immediately, for more information, you can check this documentation, feel free to contribute!

@vaaaaanquish
Copy link
Author

@salamer Thank you.
As you said, I had assumed that SIMD was enabled. It is very helpful.

PS: I published example, and I will do some PR in the future. Thx good project 😄
https://github.com/vaaaaanquish/rust-ann-search-example

@salamer
Copy link
Contributor

salamer commented Aug 10, 2021

Can I display this example link in hora's readme?

I think this is a very complete and great example. The model in my demo is still in Python (huggingface), the others are rust, using RPC to connect. not pure Rust 😂😂

Finally, thanks for using hora, I am very happy to see someone using it.

BTW, I'm continuing to study Japanese, 「小さな恋のうた」 is a song I once learned. 😂😂

@vaaaaanquish
Copy link
Author

Can I display this example link in hora's readme?

Of course. I'm glad :)

I'm continuing to study Japanese, 「小さな恋のうた」 is a song I once learned.

Wow. I was wondering why. It's an interesting connection!

@salamer
Copy link
Contributor

salamer commented Aug 10, 2021

my Japanese level may only be able to read but not the level of daily communication 😂.

and of course, the most direct benefit of learning Japanese is to help the Japanese doc which I am writing 😂.

@justinturpin
Copy link

As far as I can tell, this is because the dot product of a vector and itself is always negative, so the square root is always NaN. Is this intentional? It seems like it is as there is a test in calc.rs:

#[cfg(test)]
mod tests {
    use crate::core::calc::dot;
    #[test]
    fn test_dot() {
        let a = [1., 2., 3.];
        let b = [1., 2., 3.];
        assert_eq!(dot::<f32>(&a, &b).unwrap(), -14.0);
    }
}

And both the simd and non-simd implementations negate the sums:

fn dot_product(a: &[$type_id], b: &[$type_id]) -> Result<$type_id, &'static str> {
                assert_eq!(a.len(), b.len());

                #[cfg(feature = $simd_size)]
                {
                    let size = a.len() - (a.len() % $size);
                    let c = a
                        .chunks_exact($size)
                        .map($simd_type::from_slice_unaligned)
                        .zip(b.chunks_exact($size).map($simd_type::from_slice_unaligned))
                        .map(|(a, b)| a * b)
                        .sum::<$simd_type>()
                        .sum();
                    let d: $type_id = a[size..].iter().zip(&b[size..]).map(|(p, q)| p * q).sum();
                    Ok(-(c + d))  // negated here
                }
                #[cfg(not(feature = $simd_size))]
                {
                    Ok(-(a.iter().zip(b).map(|(p, q)| p * q).sum::<$type_id>())) // negated here
                }
            }

Removing the negation allows cosine similarity to work. Any suggestions about where to go from here?

@andrew-pyle
Copy link

Great repository! I was able to integrate hora search into a rust application even as an inexperienced rust programmer.

But I cannot use hora::core::metrics::Metric::CosineSimilarity because of the same panic: "called Option::unwrap() on a None value"

What is the status of a fix here?

@httpjamesm
Copy link

I'm getting the same crash. Euclidian works, but not Cosine Similarity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

5 participants