Change queries used for pagination when order_by parameter is passed
The previous logic relied on concatenation to generate a unique field
that could be both filtered and sorted over. This was done on the
assumption that databases would allow adding an index on such a
concatenated field. In practice this does not work: Postgres rejects an
index on `CONCAT` because the function is not marked `IMMUTABLE`, and
MySQL only supports indices on expressions (functional key parts) from
version 8.0.13 onwards.

Therefore, the queries used by the gem had very poor performance.

To combat this, the queries now filter on the given `order_by` field
and the ID field separately, and also list them as separate columns in
the `ORDER BY` clause. This allows the database to use a compound
(multicolumn) index on the combination of the two fields.
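
As a rough illustration, the two-part filter and the two-column ordering can be mimicked in plain Ruby over in-memory records. This is a sketch, not the gem's code: the `Post` struct, the sample rows, and the `after_cursor?` and `next_page` helpers are hypothetical, and Ruby compares strings byte by byte, whereas the database applies its own collation.

```ruby
# Hypothetical in-memory stand-in for rows of a `posts` table.
Post = Struct.new(:id, :author)

# Keyset predicate for forward pagination with a custom order column:
# keep rows whose author sorts after the cursor's author, or rows with
# the same author whose ID is greater than the cursor's ID.
def after_cursor?(post, cursor_field, cursor_id)
  post.author > cursor_field ||
    (post.author == cursor_field && post.id > cursor_id)
end

# Simulates: WHERE <predicate> ORDER BY author ASC, id ASC LIMIT <limit>
def next_page(posts, cursor_field, cursor_id, limit)
  posts
    .select { |post| after_cursor?(post, cursor_field, cursor_id) }
    .sort_by { |post| [post.author, post.id] }
    .first(limit)
end

posts = [
  Post.new(1, 'John'),
  Post.new(2, 'Jane'),
  Post.new(3, 'Jane'),
  Post.new(4, 'Jane'),
  Post.new(5, 'Jane'),
  Post.new(6, 'Jess')
]

# The cursor encodes ['Jane', 4].
page = next_page(posts, 'Jane', 4, 2)
puts page.map(&:id).inspect # => [5, 6]
```

Because the ID breaks ties within the same author, every row has a unique position, so a cursor always points at exactly one place in the ordering.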

Since both MySQL and Postgres can use a compound index efficiently when
filtering or ordering by either all of its columns or only its leftmost
columns, an index on `(<custom_column>, id)` ensures that these queries
perform well.
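
To picture why the compound index helps, a sorted array of `[author, id]` pairs can stand in for the index. Ruby's `Array#<=>` compares element by element, which gives the same lexicographic order that `ORDER BY author, id` produces, and a binary search can then seek straight to the cursor position. The sample entries and the `seek_after` and `entries_for` helpers are hypothetical; a real B-tree index is considerably more involved.

```ruby
# Hypothetical (author, id) entries, sorted the way a compound index on
# (author, id) would keep them: by author first, then by id.
index = [
  ['John', 1], ['Jane', 2], ['Jane', 3], ['Jane', 4],
  ['Jane', 5], ['Jess', 6], ['Jane', 10]
].sort

# Seek to the first entry strictly after the cursor and read `limit`
# entries from there, similar to an index range scan.
def seek_after(index, cursor, limit)
  start = index.bsearch_index { |key| (key <=> cursor) > 0 } || index.size
  index[start, limit]
end

# Leftmost-prefix use: all entries for one author are contiguous, so a
# lookup on `author` alone can use the same sorted structure.
def entries_for(index, author)
  start = index.bsearch_index { |key| key[0] >= author } || index.size
  index.drop(start).take_while { |key| key[0] == author }
end

p seek_after(index, ['Jane', 4], 2) # => [["Jane", 5], ["Jane", 10]]
p entries_for(index, 'Jess')        # => [["Jess", 6]]
```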

Furthermore, abandoning the concatenation also ensures that non-string
columns are ordered as expected. Previously, integer columns, for
example, were compared as text, leading to odd orderings like
`1, 10, 11, 2, 3, 4`. This is fixed by these changes as well.
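
The difference between the two orderings can be reproduced in a few lines of Ruby. This sketch only approximates the old behaviour by converting IDs to strings before sorting; Ruby's byte-wise `String#<=>` is close to, but not the same as, a database collation.

```ruby
ids = [1, 2, 3, 4, 10, 11]

# Old behaviour (approximated): the ID became part of a string, so values
# were compared character by character and "10" sorts before "2".
as_text = ids.sort_by(&:to_s)

# New behaviour: the column keeps its integer type and sorts numerically.
as_integers = ids.sort

p as_text     # => [1, 10, 11, 2, 3, 4]
p as_integers # => [1, 2, 3, 4, 10, 11]
```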

However, since this changes the order of the returned results, releasing
it is a breaking change.

Since the set of tables this gem can be used on will surely be fairly
limited, we want the string returned by the `Paginator#id_column`
method to be frozen. In Ruby < 3.0 this happens automatically through
the magic comment at the top of the file; in Ruby 3.0, however, the
magic comment no longer freezes interpolated strings. Therefore the
string has to be frozen manually, and the RuboCop rule (which checks for
Ruby 2.5 syntax) has to be adjusted accordingly.
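
A simplified sketch of the explicit freeze follows. It assumes the already-quoted table and column names are passed in as arguments, whereas the commit's `id_column` obtains them from `@relation.quoted_table_name` and `@relation.connection.quote_column_name(:id)`.

```ruby
# Simplified, hypothetical stand-in for `Paginator#id_column`.
def id_column(quoted_table_name, quoted_id_column)
  # On Ruby 3.0+, `# frozen_string_literal: true` no longer freezes
  # interpolated strings, so the result is frozen explicitly.
  "#{quoted_table_name}.#{quoted_id_column}".freeze
end

column = id_column('"posts"', '"id"')
p column         # => "\"posts\".\"id\""
p column.frozen? # => true
```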
nicolas-fricke committed Apr 19, 2021
1 parent bcbb3f1 commit 4009f7e
Showing 6 changed files with 105 additions and 65 deletions.
6 changes: 6 additions & 0 deletions .rubocop.yml
@@ -32,3 +32,9 @@ Layout/LineLength:

Naming/VariableNumber:
EnforcedStyle: snake_case

# In Ruby 3.0 interpolated strings will no longer be frozen automatically, so
# to ensure consistent performance, we have to manually call `String#freeze` in
# some places.
Style/RedundantFreeze:
Enabled: false
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -14,6 +14,9 @@ These are the latest changes on the project's `master` branch that have not yet
Follow the same format as previous releases by categorizing your feature into "Added", "Changed", "Deprecated", "Removed", "Fixed", or "Security".
--->

### Changed
- **Breaking change:** The way records are retrieved from a given cursor has been changed to no longer use `CONCAT` but instead simply use a compound `WHERE` clause in case of a custom order and having both the custom field as well as the `id` field in the `ORDER BY` query. This is a breaking change since it now changes the internal order of how records with the same value of the `order_by` field are returned.

## [0.1.3] - 2021-03-17

### Changed
30 changes: 22 additions & 8 deletions README.md
@@ -163,7 +163,19 @@ Of course, this can both be combined with `first`, `last`, `before`, and `after`

**Important:**
If your app regularly orders by another column, you might want to add a database index for this.
Say that your order column is `author` then your index should be on `CONCAT(author, '-', id)`.
Say that your order column is `author` then you'll want to add a compound index on `(author, id)`.
If your table is called `posts` you can use a query like this in MySQL or Postgres:
```sql
CREATE INDEX index_posts_on_author_and_id ON posts (author, id);
```
Or you can just do it via an `ActiveRecord::Migration`:
```ruby
class AddAuthorAndIdIndexToPosts < ActiveRecord::Migration
def change
add_index :posts, %i[author id]
end
end
```

Please take a look at the _"How does it work?"_ to find out more why this is necessary.

@@ -314,11 +326,11 @@ This will issue the following SQL query:
```sql
SELECT *
FROM "posts"
ORDER BY CONCAT(author, '-', "posts"."id") ASC
ORDER BY "posts"."author" ASC, "posts"."id" ASC
LIMIT 2
```

As you can see, this will now order by a concatenation of the requested column, a dash `-`, and the ID column.
As you can see, this will now order by the author first, and if two records have the same author it will order them by ID.
Ordering only the author is not enough since we cannot know if the custom column only has unique values.
And we need to guarantee the correct order of ambiguous records independent of the direction of ordering.
This unique order is the basis of being able to paginate forward and backward repeatedly and getting the correct records.
@@ -344,17 +356,19 @@ We get this SQL query:
```sql
SELECT *
FROM "posts"
WHERE CONCAT(author, '-', "posts"."id") > 'Jane-4'
ORDER BY CONCAT(author, '-', "posts"."id") ASC
WHERE (author > 'Jane' OR (author = 'Jane') AND ("posts"."id" > 4))
ORDER BY "posts"."author" ASC, "posts"."id" ASC
LIMIT 2
```

You can see how the cursor is being translated into the WHERE clause to uniquely identify the row and properly filter based on this.
You can see how the cursor is being used by the WHERE clause to uniquely identify the row and properly filter based on this.
We only want to get records that either have a name that is alphabetically _after_ `"Jane"` or another `"Jane"` record with an ID that is higher than `4`.
We will get the records #5 and #2 as response.

As you can see, when using a custom `order_by`, the concatenation is used for both filtering and ordering.
When using a custom `order_by`, this affects both filtering as well as ordering.
Therefore, it is recommended to add an index for columns that are frequently used for ordering.
In our test case we would want to add an index for `CONCAT(author, '-', id)`.
In our test case we would want to add a compound index for the `(author, id)` column combination.
Databases like MySQL and Postgres can then use the leftmost part of the index, in our case `author`, on its own _or_ combined with the `id` part of the index.

## Development

30 changes: 17 additions & 13 deletions lib/rails_cursor_pagination.rb
@@ -96,11 +96,11 @@
#
# SELECT *
# FROM "posts"
# ORDER BY CONCAT(author, '-', "posts"."id") ASC
# ORDER BY "posts"."author" ASC, "posts"."id" ASC
# LIMIT 2
#
# As you can see, this will now order by a concatenation of the requested
# column, a dash `-`, and the ID column. Ordering only the author is not
# As you can see, this will now order by the author first, and if two records
# have the same author it will order them by ID. Ordering only the author is not
# enough since we cannot know if the custom column only has unique values.
# And we need to guarantee the correct order of ambiguous records independent
# of the direction of ordering. This unique order is the basis of being able
@@ -128,18 +128,22 @@
#
# SELECT *
# FROM "posts"
# WHERE CONCAT(author, '-', "posts"."id") > 'Jane-4'
# ORDER BY CONCAT(author, '-', "posts"."id") ASC
# WHERE (author > 'Jane' OR (author = 'Jane') AND ("posts"."id" > 4))
# ORDER BY "posts"."author" ASC, "posts"."id" ASC
# LIMIT 2
#
# You can see how the cursor is being translated into the WHERE clause to
# uniquely identify the row and properly filter based on this. We will get
# the records #5 and #2 as response.
#
# As you can see, when using a custom `order_by`, the concatenation is used
# for both filtering and ordering. Therefore, it is recommended to add an
# index for columns that are frequently used for ordering. In our test case
# we would want to add an index for `CONCAT(author, '-', id)`.
# You can see how the cursor is being used by the WHERE clause to uniquely
# identify the row and properly filter based on this. We only want to get
# records that either have a name that is alphabetically after `"Jane"` or
# another `"Jane"` record with an ID that is higher than `4`. We will get the
# records #5 and #2 as response.
#
# When using a custom `order_by`, this affects both filtering as well as
# ordering. Therefore, it is recommended to add an index for columns that are
# frequently used for ordering. In our test case we would want to add a compound
# index for the `(author, id)` column combination. Databases like MySQL and
# Postgres can then use the leftmost part of the index, in our case
# `author`, on its own _or_ combined with the `id` part of the index.
#
module RailsCursorPagination
class Error < StandardError; end
85 changes: 49 additions & 36 deletions lib/rails_cursor_pagination/paginator.rb
@@ -34,8 +34,12 @@ class InvalidCursorError < ParameterError; end
# Cursor to paginate upto (excluding). Can be combined with `last`.
# @param order_by [Symbol, String, nil]
# Column to order by. If none is provided, will default to ID column.
# NOTE: this will cause an SQL `CONCAT` query. Therefore, you might want
# to add an index to your database: `CONCAT(<order_by_field>, '-', id)`
# NOTE: this will cause the query to filter on both the given column as
# well as the ID column. So you might want to add a compound index to your
# database similar to:
# ```sql
# CREATE INDEX <index_name> ON <table_name> (<order_by_field>, id)
# ```
# @param order [Symbol, nil]
# Ordering to apply, either `:asc` or `:desc`. Defaults to `:asc`.
#
@@ -374,38 +378,6 @@ def decoded_cursor_field
decoded_cursor.first
end

# The SQL identifier of the column we need to consider for both ordering and
# filtering.
#
# In case we have a custom field order, this is a concatenation
# of the custom order field and the ID column joined by a dash. This is to
# ensure uniqueness of records even if they might have duplicates in the
# custom order field. If we don't have a custom order, it just returns a
# reference to the table's ID column.
#
# This uses the fully qualified and escaped reference to the ID column to
# prevent ambiguity in case of a query that uses JOINs and therefore might
# have multiple ID columns.
#
# @return [String]
def sql_column
memoize :sql_column do
escaped_table_name = @relation.quoted_table_name
escaped_id_column = @relation.connection.quote_column_name(:id)

id_column = "#{escaped_table_name}.#{escaped_id_column}"

sql =
if custom_order_field?
"CONCAT(#{@order_field}, '-', #{id_column})"
else
id_column
end

Arel.sql(sql)
end
end

# Ensure that the relation has the ID column and any potential `order_by`
# column selected. These are required to generate the record's cursor and
# therefore it's crucial that they are part of the selected fields.
@@ -432,19 +404,60 @@ def relation_with_cursor_fields
#
# @return [ActiveRecord::Relation]
def sorted_relation
unless custom_order_field?
return relation_with_cursor_fields.reorder id: pagination_sorting.upcase
end

relation_with_cursor_fields
.reorder(sql_column => pagination_sorting.upcase)
.reorder(@order_field => pagination_sorting.upcase,
id: pagination_sorting.upcase)
end

# Return a properly escaped reference to the ID column prefixed with the
# table name. This prefixing is important in case of another model having
# been joined to the passed relation.
#
# @return [String (frozen)]
def id_column
escaped_table_name = @relation.quoted_table_name
escaped_id_column = @relation.connection.quote_column_name(:id)

"#{escaped_table_name}.#{escaped_id_column}".freeze
end

# Applies the filtering based on the provided cursor and order column to the
# sorted relation.
#
# In case a custom `order_by` field is provided, we have to filter based on
# this field and the ID column to ensure reproducible results.
#
# To better understand this, let's consider our example with the `posts`
# table. Say that we're paginating forward and add `order_by: :author` to
# the call, and if the cursor that is passed encodes `['Jane', 4]`. In this
# case we will have to select all posts that either have an author whose
# name is alphanumerically greater than 'Jane', or if the author is 'Jane'
# we have to ensure that the post's ID is greater than `4`.
#
# So our SQL WHERE clause needs to be something like:
# WHERE author > 'Jane' OR author = 'Jane' AND id > 4
#
# @return [ActiveRecord::Relation]
def filtered_and_sorted_relation
memoize :filtered_and_sorted_relation do
next sorted_relation if @cursor.blank?

sorted_relation.where "#{sql_column} #{filter_operator} ?", filter_value
unless custom_order_field?
next sorted_relation.where "#{id_column} #{filter_operator} ?",
decoded_cursor_id
end

sorted_relation
.where("#{@order_field} #{filter_operator} ?", decoded_cursor_field)
.or(
sorted_relation
.where("#{@order_field} = ?", decoded_cursor_field)
.where("#{id_column} #{filter_operator} ?", decoded_cursor_id)
)
end
end

16 changes: 8 additions & 8 deletions spec/rails_cursor_pagination/paginator_spec.rb
@@ -162,25 +162,25 @@
]
end
let(:posts_by_author) do
# Note that the ID is being used in a string sorting. Therefore, the order
# is '1' < '10' < '2' (instead of 1 < 2 < 10 as it would be for integers).
# Posts are first ordered by the author's name and then, in case of two
# posts having the same author, by ID
[
# All posts by "Jane"
post_13,
post_2,
post_3,
post_5,
post_7,
post_10,
post_13,
# All posts by "Jess"
post_9,
post_1,
post_10,
# All posts by "John"
post_11,
post_12,
post_1,
post_4,
post_6,
post_8
post_8,
post_11,
post_12
]
end
