Skip to content

Latest commit

 

History

History
248 lines (204 loc) · 7.64 KB

45_Popularity.asciidoc

File metadata and controls

248 lines (204 loc) · 7.64 KB

Boosting by Popularity

Imagine that we have a website that hosts blog posts and enables users to vote for the blog posts that they like. We would like more-popular posts to appear higher in the results list, but still have the full-text score as the main relevance driver. We can do this easily by storing the number of votes with each blog post:

PUT /blogposts/post/1
{
  "title":   "About popularity",
  "content": "In this post we will talk about...",
  "votes":   6
}

At search time, we can use the function_score query with the field_value_factor function to combine the number of votes with the full-text relevance score:

GET /blogposts/post/_search
{
  "query": {
    "function_score": { (1)
      "query": { (2)
        "multi_match": {
          "query":    "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": { (3)
        "field": "votes" (4)
      }
    }
  }
}
  1. The function_score query wraps the main query and the function we would like to apply.

  2. The main query is executed first.

  3. The field_value_factor function is applied to every document matching the main query.

  4. Every document must have a number in the votes field for the function_score to work. If every document does not have a number in the votes field, then you must use the {ref}/query-dsl-function-score-query.html#function-field-value-factor[missing property] to provide a default value for the score calculation.

In the preceding example, the final _score for each document has been altered as follows:

new_score = old_score * number_of_votes

This will not give us great results. The full-text _score range usually falls somewhere between 0 and 10. As can be seen in Linear popularity based on an original _score of 2.0, a blog post with 10 votes will completely swamp the effect of the full-text score, and a blog post with 0 votes will reset the score to zero.

Linear popularity based on an original `_score` of `2.0`
Figure 1. Linear popularity based on an original _score of 2.0

modifier

A better way to incorporate popularity is to smooth out the votes value with some modifier. In other words, we want the first few votes to count a lot, but for each subsequent vote to count less. The difference between 0 votes and 1 vote should be much bigger than the difference between 10 votes and 11 votes.

A typical modifier for this use case is log1p, which changes the formula to the following:

new_score = old_score * log(1 + number_of_votes)

The log function smooths out the effect of the votes field to provide a curve like the one in Logarithmic popularity based on an original _score of 2.0.

Logarithmic popularity based on an original `_score` of `2.0`
Figure 2. Logarithmic popularity based on an original _score of 2.0

The request with the modifier parameter looks like the following:

GET /blogposts/post/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query":    "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field":    "votes",
        "modifier": "log1p" (1)
      }
    }
  }
}
  1. Set the modifier to log1p.

The available modifiers are none (the default), log, log1p, log2p, ln, ln1p, ln2p, square, sqrt, and reciprocal. You can read more about them in the {ref}/query-dsl-function-score-query.html#function-field-value-factor[field_value_factor documentation].

factor

The strength of the popularity effect can be increased or decreased by multiplying the value in the votes field by some number, called the factor:

GET /blogposts/post/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query":    "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field":    "votes",
        "modifier": "log1p",
        "factor":   2 (1)
      }
    }
  }
}
  1. Doubles the popularity effect

Adding in a factor changes the formula to this:

new_score = old_score * log(1 + factor * number_of_votes)

A factor greater than 1 increases the effect, and a factor less than 1 decreases the effect, as shown in Logarithmic popularity with different factors.

Logarithmic popularity with different factors
Figure 3. Logarithmic popularity with different factors

boost_mode

Perhaps multiplying the full-text score by the result of the field_value_factor function still has too large an effect. We can control how the result of a function is combined with the _score from the query by using the boost_mode parameter, which accepts the following values:

multiply

Multiply the _score with the function result (default)

sum

Add the function result to the _score

min

The lower of the _score and the function result

max

The higher of the _score and the function result

replace

Replace the _score with the function result

If, instead of multiplying, we add the function result to the _score, we can achieve a much smaller effect, especially if we use a low factor:

GET /blogposts/post/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query":    "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field":    "votes",
        "modifier": "log1p",
        "factor":   0.1
      },
      "boost_mode": "sum" (1)
    }
  }
}
  1. Add the function result to the _score.

The formula for the preceding request now looks like this (see Combining popularity with sum):

new_score = old_score + log(1 + 0.1 * number_of_votes)
Combining popularity with `sum`
Figure 4. Combining popularity with sum

max_boost

Finally, we can cap the maximum effect that the function can have by using the max_boost parameter:

GET /blogposts/post/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query":    "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field":    "votes",
        "modifier": "log1p",
        "factor":   0.1
      },
      "boost_mode": "sum",
      "max_boost":  1.5 (1)
    }
  }
}
  1. Whatever the result of the field_value_factor function, it will never be greater than 1.5.

Note
The max_boost applies a limit to the result of the function only, not to the final _score.