The BM25F ranking formula is an extension of the BM25 ranking formula, modified to work on documents with several fields (see Wikipedia BM25F, or this article by Perez-Iglesias et al.).
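
As a rough illustration of the idea, here is a minimal Python sketch of BM25F scoring. It follows the formulation commonly attributed to the Perez-Iglesias et al. article (per-field length normalization, a boosted sum across fields, then a single saturation step with `k1`). It is not the code in this repo: the names `FieldStats`, `term_weight` and `bm25f_score`, the data layout, and the default `k1=1.2` are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class FieldStats:
    """Per-field parameters (hypothetical container, not a Solr/Lucene class)."""
    boost: float       # weight of the field in the combined term frequency
    b: float           # length-normalization strength for this field, in [0, 1]
    avg_length: float  # average length of this field over the collection


def idf(num_docs: int, doc_freq: int) -> float:
    """Classic BM25 idf; note it goes negative when doc_freq > num_docs / 2."""
    return math.log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def term_weight(term_freqs: dict, field_lengths: dict, stats: dict) -> float:
    """Combine one term's per-field frequencies into a single pseudo-frequency."""
    weight = 0.0
    for field, tf in term_freqs.items():
        s = stats[field]
        # Normalize the frequency by the field's length relative to its average.
        norm = (1.0 - s.b) + s.b * field_lengths[field] / s.avg_length
        weight += s.boost * tf / norm
    return weight


def bm25f_score(query_terms, doc_term_freqs, field_lengths, stats,
                doc_freqs, num_docs, k1=1.2) -> float:
    """Sum the saturated, idf-weighted pseudo-frequency of each query term."""
    score = 0.0
    for term in query_terms:
        w = term_weight(doc_term_freqs.get(term, {}), field_lengths, stats)
        if w > 0:
            # Saturation is applied once, after the fields have been combined.
            score += w / (k1 + w) * idf(num_docs, doc_freqs[term])
    return score
```

The point of the sketch is the order of operations: the fields are merged into one pseudo-frequency before saturation, which is what distinguishes BM25F from simply summing one BM25 score per field.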
I first wrote bm25f in 2010 while I was collaborating with Europeana, and then upgraded it several times (with help from Yorgos Mamakis) to newer versions of Solr, but I never submitted a patch (my bad, I was shy). I upgraded the old code to the Solr 6 interface during the Lucene4IR Hackathon and the London Lucene Solr Meetup Hackathon.
- Together with Henry Cleland, we ported the bm25f ranking function for single-term queries. The bm25f boolean (multi-term) query still needs to be fixed and tested. The code that still has to be fixed is commented in the repo;
- explain() can be improved (and in general all the code: some methods/variables are not used, finals can be added, ...);
- More unit tests can be added, adapting them from the old ones (available in the old repo);
- The documentation can be improved; again, I had some documentation in the old repo.
If you want to work on this, feel free to reach me at my email address: diego [dot] ceccarelli [at] gmail [dot] com