LSH Implementation with TFIDF Dense Matrix #101

girishmt4 · 2018-04-24T12:07:29Z

I am currently working on a document similarity project. We process text documents to generate TF-IDF vectors for each document in the corpus. In a nutshell, we are working with DENSE DATA, with the documents as the data points and the TF-IDF values of the terms occurring in each document as their features.
We succeeded in implementing LSH with sparse data, but it is not very efficient.
Is it possible to use FALCONN with dense data for LSH implementation?

ludwigschmidt · 2018-04-25T01:04:38Z

Yes, FALCONN supports dense data. In fact, the support for dense data is better than for sparse data. But if your data is very high-dimensional, the dense approach might not be efficient. What dimension do you work with?

girishmt4 · 2018-04-25T21:55:05Z

I am currently working with a dataset that stores TF-IDF values only for the terms that occur in a particular document, so every point has a different number of stored dimensions.
What is your take on this?

ludwigschmidt · 2018-04-26T03:08:55Z

In that case, using a sparse representation might be better.
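As a rough illustration of what a sparse representation could look like for this kind of data, the sketch below maps every term to an index in a shared vocabulary and stores each document as a sorted list of (index, value) pairs. The helper names `build_vocabulary` and `to_sparse`, the sample documents, and the pair layout are assumptions made for illustration; check the FALCONN documentation for the exact sparse vector type it expects.

```python
def build_vocabulary(docs):
    """Assign every distinct term a fixed index shared by all documents."""
    terms = sorted({term for doc in docs for term in doc})
    return {term: index for index, term in enumerate(terms)}


def to_sparse(doc, vocab):
    """Turn a {term: tfidf} mapping into (index, value) pairs sorted by index."""
    return sorted((vocab[term], weight) for term, weight in doc.items())


# Hypothetical per-document TF-IDF values; each document lists only its own terms.
docs = [
    {"hash": 0.5, "lsh": 0.8},
    {"lsh": 0.3, "tfidf": 0.9, "vector": 0.1},
]

vocab = build_vocabulary(docs)
sparse_docs = [to_sparse(doc, vocab) for doc in docs]

for sparse_doc in sparse_docs:
    print(sparse_doc)
```

With a shared vocabulary, all points live in the same space (its dimension is the vocabulary size), even though each document stores only its nonzero entries.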

girishmt4 · 2018-04-26T04:12:14Z

Can you explain the reason behind that? I am still wondering why a sparse representation can perform better than a dense one!

ludwigschmidt · 2018-04-26T19:45:39Z

With a dense representation, the code will be performing many unnecessary multiplications with zero.
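The point about multiplications with zero can be made concrete with a small sketch. It assumes the hash computation boils down to inner products between a data point and projection vectors (as in hyperplane-style LSH) and counts the multiplications each representation needs. The functions `dense_dot` and `sparse_dot`, the dimension, and the sample values are made up for illustration; this is not FALCONN's internal code.

```python
def dense_dot(point, direction):
    """Inner product over every coordinate, including the zeros."""
    total, multiplications = 0.0, 0
    for a, b in zip(point, direction):
        total += a * b
        multiplications += 1
    return total, multiplications


def sparse_dot(point, direction):
    """Inner product that visits only the stored (nonzero) coordinates."""
    total, multiplications = 0.0, 0
    for index, value in point.items():
        total += value * direction[index]
        multiplications += 1
    return total, multiplications


dimension = 10_000  # hypothetical vocabulary size

# A document with only three nonzero TF-IDF values, stored as {index: value}.
sparse_point = {3: 0.5, 120: 0.25, 9000: 0.75}

# The same document expanded to a dense vector that is almost entirely zeros.
dense_point = [0.0] * dimension
for index, value in sparse_point.items():
    dense_point[index] = value

# Deterministic stand-in for one random projection direction.
direction = [1.0 if i % 2 == 0 else -1.0 for i in range(dimension)]

dense_result, dense_mults = dense_dot(dense_point, direction)
sparse_result, sparse_mults = sparse_dot(sparse_point, direction)

print(dense_result, dense_mults)
print(sparse_result, sparse_mults)
```

Both versions produce the same inner product, but the dense one performs one multiplication per dimension while the sparse one performs one per nonzero entry.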

A-Guldborg pushed a commit to duckth/FOENNIX that referenced this issue May 12, 2024

sync to try on another build machine (FALCONN-LIB#101)

8aafc95
