Prevent caching based on response header #65

Closed
sandros94 opened this issue Oct 27, 2023 · 12 comments

@sandros94

sandros94 commented Oct 27, 2023

Long story short, I'm trying to speed up an old Drupal 7 website where more than 95% of the traffic comes from anonymous users and the rest is mostly admins, editors, and publishers working on the platform.
The problem is that Drupal 7 doesn't send an Authorization header, so responses for logged-in users get cached too.

While reading #34 I thought a good solution would be to cache based only on the response headers, since the bulk of the public content is pre-rendered by the CMS and tagged accordingly via headers. But, as far as I understand, Caddy has no way to run a matcher on the response, because by then it is already too late.

A regex based on path isn't an option, since the path of unpublished content is the same as when it would be published.

But I also have to admit that I still don't fully understand how Souin works, in particular what the cache keys are. Reading this and this, it almost seems that caching based only on response headers is already supported; it's just that I don't understand how to set it up.

@francislavoie
Member

I think you can respond with Cache-Control: no-cache?

@sandros94
Author

> I think you can respond with Cache-Control: no-cache?

I was indeed expecting that behavior from Drupal, but it doesn't seem consistent. I'm investigating that side as well.

@sandros94
Author

To give an update, so that you can judge whether to close this as not planned or leave it open for a potential future implementation (I would have liked to open a draft PR, but I'm not yet that familiar with Go).

What I've found

Drupal 7 (well, I've only tested 7.97) does indeed add Cache-Control: no-cache to the HTML page for logged-in users, but it also adds Cache-Control: max-age=N to the related JS for them. As a result the page itself is displayed correctly, but things like form submissions and interactivity break.

On further investigation, this looks like a design choice rather than a bug: Drupal essentially creates an internal, volatile cache of each user's JS, ready to be delivered, instead of regenerating it with PHP on every request (which could be quite expensive).

@francislavoie
Member

francislavoie commented Nov 18, 2023

Huh? Users have different JS payloads sent to them? That's unusual. Typically you have a single JS bundle sent to the client (or multiple files split up, depending on your bundler config) regardless of authentication, and config is set in the HTML on window ahead of time for the JS to read, to change how JS behaves.

@sandros94
Author

> That's unusual

Indeed, and I think this could be down to a Drupal config that aggregates multiple CSS/JS files together. The issue is definitely in the AJAX/jQuery code and how it is bundled, but I'm more interested in seeing what a more modern version like Drupal 10 does (I should look at modules too).

@darkweak
Collaborator

darkweak commented Dec 1, 2023

@sandros94 does the frontend send a SESSION cookie?

@sandros94
Author

> @sandros94 does the frontend send a SESSION cookie?

It doesn't. To my (limited) understanding, in Drupal 7 the session is fully handled server-side and it removes any indication of that particular session from headers and cookies. This feels so strange and unusual (I'm used to Vue and mostly do things myself).

I still need to check whether the setting that pre-compresses and aggregates both CSS and JS could be what mixes up anonymous and authenticated sessions when using any external cache (since I could now simply use Caddy to compress on request).

@francislavoie
Member

> the session is fully handled server-side and it removes any indication of that particular session from headers and cookies

That's wrong. Cookies are used to identify a session in the backend storage. Otherwise the server would have no idea who the client is.

Sometimes it's a cookie like PHPSESSID or whatever. It depends on the framework etc.

@darkweak
Collaborator

darkweak commented Dec 1, 2023

Drupal adds a session cookie in the browser for authenticated users (e.g. SESS49960de5880e8c687434170f6476605b). So you can use a Caddy matcher to detect whether the request has a cookie with the SESS prefix: if it does, bypass the cache; otherwise use it.

Maybe something like this would work:

@authCookie `header({'Cookie':'*SESS*'})`

route @authCookie {
    # only php_fastcgi
}

route {
    cache
    php_fastcgi
}
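
A possible tightening of the matcher above, as a sketch only: a `*SESS*` glob matches any Cookie header that contains "SESS" anywhere, including unrelated cookie names or values. Assuming Drupal 7 session cookies are named `SESS` (or `SSESS` over HTTPS) followed by a hexadecimal hash, as in the example cookie above, Caddy's `header_regexp` matcher could narrow the match. The pattern is an assumption and has not been tested against this site.

```
# Assumption: Drupal 7 session cookies look like SESS<hex>= or SSESS<hex>=
@authCookie header_regexp Cookie S?SESS[0-9a-f]+=
```

The rest of the routing would stay as in the snippet above; only the matcher definition changes.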

@francislavoie
Member

Or simpler:

@noAuthCookie not header Cookie *SESS*
cache @noAuthCookie

So cache is only used when there's no session cookie.
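
For context, here is a rough sketch of how that matcher might sit in a complete Caddyfile. The hostname, document root, and PHP-FPM socket path are placeholders that do not come from this thread, and the `order` line assumes a cache-handler build that still needs an explicit directive order, so treat it as a starting point rather than a verified config.

```
{
    # Assumption: explicit ordering; newer cache-handler builds may not need it.
    order cache before rewrite
    cache
}

# Placeholder hostname
drupal.example.com {
    # Placeholder document root
    root * /var/www/drupal

    # Requests without a Drupal session cookie are treated as anonymous
    # and are the only ones that go through the cache.
    @noAuthCookie not header Cookie *SESS*
    cache @noAuthCookie

    # Placeholder PHP-FPM socket
    php_fastcgi unix//run/php/php-fpm.sock
    file_server
}
```

Requests that carry a session cookie skip the `cache` handler and go straight to `php_fastcgi`, so responses for logged-in users are neither stored nor served from the cache.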

@sandros94
Author

sandros94 commented Dec 1, 2023

AH! Thank you so much for the explanation!
I quickly found the SESS cookie and will test the suggested config soon; I'll post an update.

UPDATE 1

This made me understand a bit more how Drupal 7.97 works.
Long story short: the suggested config, at least in a first test, does seem to have solved the content editing problem I was facing in the beginning (forms not posting/saving, images not uploading).

What I also understood is that Drupal 7 doesn't regenerate the CSS and JS per user but per cache cycle. All CSS and JS files share the same query string, derived from the current cache cycle and used as a cache buster.

This gets worse when using aggregated content (under `/admin/config/development/performance`), where each group of CSS/JS gets a composite string (a combination of the current page and the cache cycle). That increases the chance of a mismatch with, or unavailability of, the resources referenced in the original HTML (sometimes rendering broken CSS). And when this mismatch happens (a request arrives for only a sub-group rather than a full page load), Drupal tries to regenerate those caches and, more often than not, serves an unfinished one, corrupting the displayed resources.

As a next step, I need to understand whether this Drupal version/project can even have a cache/CDN in front of it.

@darkweak
Collaborator

darkweak commented Mar 5, 2024

@sandros94 we're closing this issue, reopen it if needed.

@darkweak darkweak closed this as completed Mar 5, 2024