serving databases (core.db, extra.db, community.db, ...) #82

Open
bernhard-da opened this issue Nov 20, 2021 · 7 comments
Labels
enhancement (New feature or request), feature-draft (detailed and structured description about a planned feature)

Comments

@bernhard-da

hi @nroi , first off, thx a lot for providing flexo. it is a really great and very useful piece of software!

i have however experienced one issue; I am working on a fully updated arch-system with the following flexo.toml in which I changed the path of the cache-directory to /storage/...

flexo.toml

cache_directory = "/storage/flexo/pkg"
low_speed_time_secs = 3
connect_timeout = 3000
mirrorlist_fallback_file = "/storage/flexo/state/mirrorlist"
mirrorlist_latency_test_results_file = "/storage/flexo/state/latency_test_results.json"
listen_ip_address = "0.0.0.0"
port = 7878
mirror_selection_method = "auto"
mirrors_predefined = []
num_versions_retain = 1
[mirrors_auto]
    mirrors_status_json_endpoint_fallbacks = [
        "https://raw.githubusercontent.com/nroi/archlinux-mirrors-status-fallback/main/mirrorlist.json",
    ]
    mirrors_blacklist = [ ]
    https_required = true
    ipv4 = true
    ipv6 = false
    max_score = 2.5
    num_mirrors = 8
    mirrors_random_or_sort = "sort"
    timeout = 350
    refresh_latency_tests_after = "8 days"
    allowed_countries = ["DE", "AT", "NL", "CZ"]

flexo serves cached packages to all clients in my lan flawlessly. however, i see the following entries in the server-log for all enabled repos when I do a pacman -Syu on a client.

log

{timestamp} {server} flexo[8289]: [{timestamp} INFO  flexo] Request served [CACHE MISS]: "core/os/x86_64/core.db"
{timestamp} {server} flexo[8289]: [{timestamp} INFO  flexo] core/os/x86_64/core.db.sig is not available at https://mirror.f4st.host/archlinux/
{timestamp} {server} flexo[8289]: [{timestamp} INFO  flexo] core/os/x86_64/core.db.sig was unavailable at all remote mirrors.
{timestamp} {server} flexo[8289]: [{timestamp} INFO  flexo] Request served [NO PAYLOAD]: "core/os/x86_64/core.db.sig"

i have tried different mirrors, but I cannot get flexo to also serve the databases from its cache. As I have quite a large number of internal clients the traffic (e.g from community.db) adds up over time. Do I have to set a specific config-setting to make this work or do you have an idea where I could start looking?

@nroi
Owner

nroi commented Nov 20, 2021

Hi @bernhard-da,
although the logs may look like there is some kind of problem, Flexo works as intended here: notice that the messages saying "xxx is not available" and "xxx was unavailable at all remote mirrors" only appear for files that end with .db.sig, while .db files are served just fine. Flexo does not find .db.sig files because they are simply not available at the remote mirror. Have a look at this thread where one of the Arch Linux maintainers explains:

Because the databases are not signed yet. The process for doing that is still being worked out...

So, the current status (even if you don't use Flexo) is that Pacman requests those files, receives a 404 response and then just silently ignores the response.

As I have quite a large number of internal clients the traffic (e.g from community.db) adds up over time.

Files ending with .db are another story: Flexo serves the .db files, but it does not cache them. This is intentional, and it cannot be changed at the moment. If Flexo cached database files like normal files, then clients would eventually receive outdated database files. Of course, one could implement some special caching logic for database files and only cache them for a configurable duration (e.g., so you can configure Flexo to serve the database from cache if the cached version is not more than one hour old). But I decided against this because I found that the benefit does not justify the added complexity. The community.db file is currently just ~6 MB, so I never saw an issue in downloading this file a couple of times.
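
To make the "configurable duration" idea concrete, here is a minimal Python sketch of such a staleness check. It is only an illustration of the approach described above, not Flexo's code (Flexo does not implement this); the function name `is_cache_fresh` and the `max_age_secs` parameter are hypothetical, and using the file's modification time as its age is an assumption.

```python
import os
import time
from typing import Optional


def is_cache_fresh(path: str, max_age_secs: float, now: Optional[float] = None) -> bool:
    """Return True if the cached file exists and is at most max_age_secs old.

    Hypothetical helper: the file's mtime is used as the time it was cached.
    """
    if not os.path.exists(path):
        return False
    # Allow the caller to inject the current time, which keeps the check testable.
    current = time.time() if now is None else now
    age_secs = current - os.stat(path).st_mtime
    return age_secs <= max_age_secs
```

With a one-hour limit, a proxy would serve the cached database when `is_cache_fresh(path, 3600)` is true and download it again otherwise. The drawback mentioned above remains: within that hour, clients can receive a database that is already outdated upstream.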

May I ask how fast your internet connection is? Did you notice this behavior because pacman was slow to download the database files, or did you notice this just by inspecting Flexo's logs?

@bernhard-da
Author

hi @nroi
thx a lot for your detailed answer; indeed I was not really wondering about the .sig files but about the [CACHE MISS] for the .db files;

your explanation does make perfect sense. to answer your question:

May I ask how fast your internet connection is? 
Did you notice this behavior because pacman was slow to download the database files, or did you notice this just by inspecting Flexo's logs?

yes, i have an unreliable internet-connection which is often slow too (max around 20mbit down) and also my isp throttles speeds after a specific amount of downloaded data; so i realized that pacman was slow (on many clients) downloading the same .db files and I also observed that the (total) size of downloaded .db files was quite high.

@nroi
Owner

nroi commented Nov 21, 2021

i have an unreliable internet-connection which is often slow too (max around 20mbit down) and also my isp throttles speeds after a specific amount of downloaded data; so i realized that pacman was slow (on many clients) downloading the same .db files and I also observed that the (total) size of downloaded .db files was quite high.

I see. I guess there are other users with similar issues. In that case, I might reconsider whether it makes sense to implement some caching mechanism for database files. This should probably be disabled by default, and the duration after which locally stored database files are considered stale and redownloaded should be configurable.

But don't expect this to be implemented very soon, I'm currently prioritizing changes that improve the code-maintainability over new features.

@bernhard-da
Author

@nroi fair enough. thx again for your comments and working on flexo :)

nroi added the enhancement label Nov 21, 2021
nroi reopened this Nov 21, 2021
@Zebradil

Zebradil commented Jan 27, 2022

I also see an opportunity for improvement here. Maybe it makes sense to check how pacman handles this, because, when I don't use flexo, database files are cached somehow.

sudo pacman -Sy
:: Synchronizing package databases...
 core is up to date
 extra is up to date
 community is up to date
 multilib is up to date

But when I use flexo, the database files are always being downloaded.

I can't check how pacman works right now, but I'll try to figure this out later.

@nroi
Owner

nroi commented Jan 30, 2022

@Zebradil Thanks for pointing this out. pacman sends the If-Modified-Since header, for example:

If-Modified-Since: Sun, 30 Jan 2022 10:17:26 GMT

Which means that the mirror may respond with a 304 Not Modified instead of sending the entire payload.

The timestamp seems to be set according to the Modify or Change timestamp of the file in /var/lib/pacman/sync. If you run sudo touch -m /var/lib/pacman/sync/core.db, then pacman sends a new If-Modified-Since timestamp.

It makes sense for flexo to behave like pacman, so this is something that should change in flexo.
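
For illustration, a header value like the one above can be derived from a file's timestamp roughly as follows. This is a minimal Python sketch, not pacman's or Flexo's code; the function name `if_modified_since_for` is hypothetical, and using the Modify timestamp (`st_mtime`) rather than the Change timestamp is an assumption.

```python
import os
from email.utils import formatdate


def if_modified_since_for(path: str) -> str:
    """Build an If-Modified-Since value from the file's modification time.

    Assumption: the Modify timestamp (st_mtime) is the relevant one.
    """
    mtime = os.stat(path).st_mtime
    # usegmt=True produces the HTTP date form, e.g. "Sun, 30 Jan 2022 10:17:26 GMT".
    return formatdate(mtime, usegmt=True)
```

Calling it on a file whose modification time is 2022-01-30 10:17:26 UTC yields exactly the header value shown above.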

@nroi
Owner

nroi commented Mar 5, 2023

Feature draft

This post is intended to summarize all information required to implement this feature, as well as information about what value this feature adds to Flexo.

Problem description:

Database files are currently not cached. With a large number of clients, this can add up in traffic. This is relevant especially for users with a slow internet connection or an ISP that throttles speed after a given amount of data has been downloaded (see also: #82 (comment)).

Background information:

Originally, it was not planned to implement any kind of caching for database files to avoid that Flexo serves any outdated files. However, it turns out that it should actually be possible to implement some kind of caching:
Consider the case when pacman is used without Flexo. When pacman requests a database file, it sends the If-Modified-Since header. The remote mirror then either serves this file as usual if the database file on the remote mirror is more recent than the timestamp in the header, or it just returns 304 Not Modified if no more up-to-date file is available.
We therefore aim to implement something comparable for Flexo: If a new database file is available at the remote mirror, then Flexo should always serve this file instead of a stale, cached version. On the other hand, if Flexo already has the database file in a version that is more recent or just as recent as the version on the remote mirror, then no new download from a remote mirror should be required.

Proposed solution:

  • Flexo stores database files locally.
  • Flexo behaves like pacman against the remote mirror: it sends the If-Modified-Since header when requesting database files. The value of this header should be the Modify or Change timestamp of the database file (need to find out which one pacman uses).
  • If the remote mirror then responds with 304, we just assume that the locally cached version is not stale, and serve this one to the requesting client.
  • If the remote mirror responds with 2xx, then we overwrite the locally stored version with the payload served by the remote mirror.
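
The steps above could be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not Flexo's implementation (Flexo is not written in Python): the names `serve_database` and `Fetch` are hypothetical, the HTTP client is abstracted behind an injected `fetch` callable, the Modify timestamp is used for the header, and copying the mirror's `Last-Modified` value onto the cached file is an additional assumption not stated in the draft.

```python
import os
from email.utils import formatdate, parsedate_to_datetime
from typing import Callable, Dict, Tuple

# Hypothetical fetcher: takes a URL and request headers and returns
# (status code, response headers, body). A real one would wrap an HTTP client.
Fetch = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], bytes]]


def serve_database(url: str, cache_path: str, fetch: Fetch) -> bytes:
    """Return the database payload, reusing the cached copy on 304."""
    headers: Dict[str, str] = {}
    if os.path.exists(cache_path):
        # Assumption: use the Modify timestamp of the cached file.
        mtime = os.stat(cache_path).st_mtime
        headers["If-Modified-Since"] = formatdate(mtime, usegmt=True)

    status, resp_headers, body = fetch(url, headers)

    if status == 304 and os.path.exists(cache_path):
        # The mirror has nothing newer: serve the locally cached version.
        with open(cache_path, "rb") as f:
            return f.read()

    if 200 <= status < 300:
        # Write to a temporary file first so readers never see a partial database.
        tmp_path = cache_path + ".part"
        with open(tmp_path, "wb") as f:
            f.write(body)
        # Assumption: adopt the mirror's Last-Modified as the local mtime so the
        # next If-Modified-Since reflects the mirror's clock, not ours.
        last_modified = resp_headers.get("Last-Modified")
        if last_modified:
            ts = parsedate_to_datetime(last_modified).timestamp()
            os.utime(tmp_path, (ts, ts))
        os.replace(tmp_path, cache_path)
        return body

    raise RuntimeError(f"unexpected status {status} for {url}")
```

Injecting `fetch` keeps the decision logic separate from the network code, so the 304 and 2xx branches can be exercised with a stub instead of a real mirror.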

nroi added the feature-draft label Mar 5, 2023