Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redo caching in entries queries #592

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

nickdaugherty
Copy link
Contributor

Description

Introduces a few new functions to improve caching for entries requests.

Problem

The current approach to querying entries uses WPCOM_Liveblog_Entry_Query::get_all_entries_asc() to grab the full list of entries, then WPCOM_Liveblog_Entry_Query::find_between_timestamps() to slice out the relevant entries from the full list.

This is fast and generally works well, but it breaks down as the size of the Liveblog grows. When the list of entries is too big to fit in a single cache entry (1MB is the default for memcache), Liveblog will fall over completely as there is no caching happening.

The current approach is also not super efficient, as, despite the caching of the entries list, when WPCOM_Liveblog_Entry::for_json() is called in a loop on every entry from inside WPCOM_Liveblog::get_entries_by_time(), at least one additional, uncached DB query is made per entry (for the comment author, via get_comment_class()), and several other functions / filters are called, which is unnecessary CPU cycles when all that can be cached and reused by all subsequent requests.

Solution

Instead of caching everything in a giant blob, we should cache the results of WPCOM_Liveblog::get_entries_by_time(), so that we are only caching a few entries and are preventing any additional queries and processing. When the cache is hot, we can spit out results instantly without worrying about the total number of entries.

To get there, this PR adds 3 new functions:

WPCOM_Liveblog::get_entries_by_time_cached() - operates exactly the same as WPCOM_Liveblog::get_entries_by_time(), but internally does not query all entries at once, and caches the results.

WPCOM_Liveblog_Entry_Query::count_entries() - This returns a count of all entries in a Liveblog, removing deletions. This replaces the need to grab the full list, strip out deletions (via WPCOM_Liveblog::flatten_entries(), then count it. This function caches internally and the cache is busted when a new entry is added, as the last entry timestamp is part of the key.

WPCOM_Liveblog_Entry_Query::get_between_timestamps_with_query() - Replaces WPCOM_Liveblog_Entry_Query::get_between_timestamps() (which queries full list then slices) with a comments query with a date query. Maintains the same expected return value by running the results through WPCOM_Liveblog_Entry_Query::remove_replaced_entries(), just like WPCOM_Liveblog_Entry_Query::find_between_timestamps() does.

Considerations

Instead of the full list, we now do a comments query with a date query to find entries between two timestamps. Internally, get_comments() also caches all queries, so this should be pretty fast.

These new functions aren't yet hooked up anywhere, but have been designed to be a drop-in replacement for WPCOM_Liveblog::get_entries_by_time() inside the REST route handler for entries requests.

WPCOM_Liveblog::get_entries_paged() still uses the full entries list, and should be rewritten. This is less critical, as these requests are comparatively rare. Large enough lists of entries will still stop being cached inside this function, but it may be passable as most traffic is for entries polling.

I believe there is a bug in how pagination is calculated. WPCOM_Liveblog::flatten_entries() currently does not remove update entries (it does remove delete entries), so the total count is higher than the actual number of entries that would render to a page. The new code in WPCOM_Liveblog_Entry_Query::count_entries() matches this behavior by counting all entries that are not deletes. To me this is a bug, but at least the new stuff is matching what exists for now.

Currently there aren't tests for the new functions - those should be added.

This uses a direct get_comments() query (via $this->get() ) with a date_query to find the relevant entries.

This means we can get entries without using the full cache from get_all_entries_asc(), which doesn't scale beyond the number of entries that fit inside a single cache entry
Previously we had to get all entries and count that array, which is inefficient and doesn't scale beyond the size of a single cache entry (usually 1MB)
Better caching approach that scales out on larger Liveblogs, and prevents redundant DB queries and processing on all Liveblogs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

3 participants