Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mirrors timing out in docker #116

Open
bwrsandman opened this issue Jan 20, 2024 · 7 comments
Open

Mirrors timing out in docker #116

bwrsandman opened this issue Jan 20, 2024 · 7 comments
Assignees

Comments

@bwrsandman
Copy link

bwrsandman commented Jan 20, 2024

Describe the bug
When running the flexo docker, no mirrors are reachable and they time out. Increasing the time out duration does nothing.
Running on two different machines (Arch Linux x86_64, TrueNAS Scale cobia x86_64) gives the same result: Locally, the mirrors are reachable, but in the docker they time out.

Starting up an arch linux docker and installing flexo from the AUR results in timeouts as well.

Running curl inside of the docker works

docker run -it --rm -e RUST_LOG=trace --entrypoint /bin/bash nroi/flexo:latest
root@d56af00b032a:/# curl https://mirror.0xem.ma/arch/extra/os/x86_64/extra.db > /tmp/extra.db

Installation method
Docker

Distribution
linux/amd64

Version
digest e9f3ba02b33a

Log
Running docker run -it --rm -e RUST_LOG=trace nroi/flexo:latest

The following settings were fetched from the TOML file or environment variables: MirrorConfig {
        cache_directory: "/var/cache/flexo/pkg",
        mirrorlist_fallback_file: "/var/cache/flexo/state/mirrorlist",
        mirrorlist_latency_test_results_file: Some(
            "/var/cache/flexo/state/latency_test_results.json",
        ),
        refresh_latency_tests_after: None,
        port: 7878,
        listen_ip_address: Some(
            "0.0.0.0",
        ),
        mirror_selection_method: Auto,
        mirrors_predefined: [],
        custom_repo: None,
        low_speed_limit: Some(
            128000,
        ),
        low_speed_limit_formatted: None,
        low_speed_time_secs: Some(
            3,
        ),
        connect_timeout: None,
        max_speed_limit: None,
        num_versions_retain: Some(
            3,
        ),
        mirrors_auto: Some(
            MirrorsAutoConfig {
                mirrors_status_json_endpoint: "https://archlinux.org/mirrors/status/json/",
                mirrors_status_json_endpoint_fallbacks: [],
                mirrors_blacklist: [],
                https_required: true,
                ipv4: true,
                ipv6: false,
                max_score: 2.5,
                num_mirrors: 8,
                mirrors_random_or_sort: Sort,
                timeout: 350,
                allowed_countries: [],
            },
        ),
    }

...

[2024-01-20T18:02:35.753Z DEBUG flexo::mirror_fetch] Remote mirror https://mirror.osbeck.com/archlinux/core/os/x86_64/core.db appears to use CloudFlare, this mirror will be ignored.
[2024-01-20T18:02:35.753Z DEBUG flexo::mirror_flexo] Skip mirror https://mirror.osbeck.com/archlinux/: Latency test did not succeed: Error { description: "Failed writing received data to disk/application", code: 23, extra: Some("Failed writing header") }
[2024-01-20T18:02:36.103Z DEBUG flexo::mirror_flexo] Skip mirror https://mirror.sunred.org/archlinux/ due to timeout.
[2024-01-20T18:02:36.453Z DEBUG flexo::mirror_flexo] Skip mirror https://archlinux.thaller.ws/ due to timeout.
[2024-01-20T18:02:36.705Z DEBUG flexo::mirror_flexo] Skip mirror https://mirror.moson.org/arch/: Latency test did not succeed: Error { description: "Couldn't connect to server", code: 7, extra: Some("Failed to connect to mirror.moson.org port 443: Connection timed out") }
[2024-01-20T18:02:37.055Z DEBUG flexo::mirror_flexo] Skip mirror https://mirror.cyberbits.eu/archlinux/ due to timeout.
[2024-01-20T18:02:37.338Z DEBUG flexo::mirror_flexo] Skip mirror https://mirror.f4st.host/archlinux/: Latency test did not succeed: Error { description: "Couldn't connect to server", code: 7, extra: Some("Failed to connect to mirror.f4st.host port 443: Connection timed out") }
[2024-01-20T18:02:37.598Z DEBUG flexo::mirror_flexo] Skip mirror https://archlinux.mailtunnel.eu/: Latency test did not succeed: Error { description: "Couldn't connect to server", code: 7, extra: Some("Failed to connect to archlinux.mailtunnel.eu port 443: Connection timed out") }
[2024-01-20T18:02:37.949Z DEBUG flexo::mirror_flexo] Skip mirror https://mirror2.sl-chat.ru/archlinux/ due to timeout.
[2024-01-20T18:02:38.299Z DEBUG flexo::mirror_flexo] Skip mirror https://arch.jensgutermuth.de/ due to timeout.
[2024-01-20T18:02:38.936Z DEBUG flexo::mirror_flexo] Skip mirror https://mirror3.sl-chat.ru/archlinux/ due to timeout.
[2024-01-20T18:02:39.285Z DEBUG flexo::mirror_flexo] Skip mirror https://geo.mirror.pkgbuild.com/ due to timeout.
[2024-01-20T18:02:39.636Z DEBUG flexo::mirror_flexo] Skip mirror https://mirror1.sl-chat.ru/archlinux/ due to timeout.
[2024-01-20T18:02:39.869Z DEBUG flexo::mirror_flexo] Skip mirror https://md.mirrors.hacktegic.com/archlinux/: Latency test did not succeed: Error { description: "Couldn't connect to server", code: 7, extra: Some("Failed to connect to md.mirrors.hacktegic.com port 443: Connection timed out") }
[2024-01-20T18:02:40.178Z DEBUG flexo::mirror_flexo] Skip mirror https://de.arch.mirror.kescher.at/: Latency test did not succeed: Error { description: "Couldn't connect to server", code: 7, extra: Some("Failed to connect to de.arch.mirror.kescher.at port 443: Connection timed out") }
[2024-01-20T18:02:40.486Z DEBUG flexo::mirror_flexo] Skip mirror https://at.arch.mirror.kescher.at/: Latency test did not succeed: Error { description: "Couldn't connect to server", code: 7, extra: Some("Failed to connect to at.arch.mirror.kescher.at port 443: Connection timed out") }
@bwrsandman
Copy link
Author

bwrsandman commented Jan 20, 2024

Additionally on the Truenas Docker I get this if I set a predefined server

root@flexo-ix-chart-fccd6b8b5-pjg8f:/# FLEXO_MIRROR_SELECTION_METHOD=predefined FLEXO_MIRRORS_PREDEFINED="['https://mirrors.kernel.org/archlinux']" FLEXO_CONNECT_TIMEOUT=50000 flexo
[2024-01-20T19:25:38.491Z DEBUG flexo] The following settings were fetched from the TOML file or environment variables: MirrorConfig {
        cache_directory: "/var/cache/flexo/pkg",
        mirrorlist_fallback_file: "/var/cache/flexo/state/mirrorlist",
        mirrorlist_latency_test_results_file: Some(
            "/var/cache/flexo/state/latency_test_results.json",
        ),
        refresh_latency_tests_after: None,
        port: 47878,
        listen_ip_address: Some(
            "0.0.0.0",
        ),
        mirror_selection_method: Predefined,
        mirrors_predefined: [
            "https://mirrors.kernel.org/archlinux",
        ],
        custom_repo: None,
        low_speed_limit: Some(
            128000,
        ),
        low_speed_limit_formatted: None,
        low_speed_time_secs: Some(
            3,
        ),
        connect_timeout: Some(
            50000,
        ),
        max_speed_limit: None,
        num_versions_retain: Some(
            3,
        ),
        mirrors_auto: None,
    }

[2024-01-20T19:25:41.552Z DEBUG flexo] Attempt to establish new connection
[2024-01-20T19:25:41.552Z DEBUG flexo] Establish a new connection to https://mirrors.kernel.org/archlinux
[2024-01-20T19:25:41.552Z DEBUG flexo::mirror_flexo] Fetch package from remote mirror: https://mirrors.kernel.org/archlinux/extra/os/x86_64/extra.db.
[2024-01-20T19:25:41.552Z DEBUG flexo::mirror_flexo] Set low_speed_time to 3 seconds.
[2024-01-20T19:25:41.552Z DEBUG flexo::mirror_flexo] No speed limit was set.
[2024-01-20T19:25:41.552Z INFO  flexo::mirror_flexo] File is still empty: extra.db-4c74feba-b487-4226-b248-8725366de239
[2024-01-20T19:25:41.552Z DEBUG flexo::mirror_flexo] Start download from https://mirrors.kernel.org/archlinux
[2024-01-20T19:25:41.552Z DEBUG flexo::mirror_flexo] Attempt to create file: "/tmp/flexo/uncacheable/multilib/os/x86_64/multilib.db-2707c6b3-8e99-4777-94ff-ee0b89bc1182"
[2024-01-20T19:25:41.553Z DEBUG flexo::mirror_flexo] Fetch package from remote mirror: https://mirrors.kernel.org/archlinux/multilib/os/x86_64/multilib.db.
[2024-01-20T19:25:41.553Z DEBUG flexo::mirror_flexo] Set low_speed_time to 3 seconds.
[2024-01-20T19:25:41.553Z DEBUG flexo::mirror_flexo] No speed limit was set.
[2024-01-20T19:25:41.553Z INFO  flexo::mirror_flexo] File is still empty: multilib.db-2707c6b3-8e99-4777-94ff-ee0b89bc1182
[2024-01-20T19:25:41.553Z DEBUG flexo::mirror_flexo] Start download from https://mirrors.kernel.org/archlinux
[2024-01-20T19:25:41.553Z DEBUG flexo] Establish a new connection to https://mirrors.kernel.org/archlinux
[2024-01-20T19:25:41.553Z DEBUG flexo::mirror_flexo] Attempt to create file: "/tmp/flexo/uncacheable/community/os/x86_64/community.db-6997fb06-a335-4b11-880a-c92732e3019d"
[2024-01-20T19:25:41.554Z DEBUG flexo::mirror_flexo] Fetch package from remote mirror: https://mirrors.kernel.org/archlinux/community/os/x86_64/community.db.
[2024-01-20T19:25:41.555Z DEBUG flexo::mirror_flexo] Set low_speed_time to 3 seconds.
[2024-01-20T19:25:41.555Z DEBUG flexo::mirror_flexo] No speed limit was set.
[2024-01-20T19:25:41.555Z INFO  flexo::mirror_flexo] File is still empty: community.db-6997fb06-a335-4b11-880a-c92732e3019d
[2024-01-20T19:25:41.555Z DEBUG flexo::mirror_flexo] Start download from https://mirrors.kernel.org/archlinux
[2024-01-20T19:25:46.967Z DEBUG flexo::mirror_flexo] Received complete header from remote mirror
[2024-01-20T19:25:46.967Z INFO  flexo::mirror_flexo] HTTP response code is 301
[2024-01-20T19:25:46.967Z DEBUG flexo::mirror_flexo] Received complete header from remote mirror
[2024-01-20T19:25:46.967Z INFO  flexo::mirror_flexo] HTTP response code is 301
[2024-01-20T19:25:46.967Z DEBUG flexo::mirror_flexo] Received complete header from remote mirror
[2024-01-20T19:25:46.967Z INFO  flexo::mirror_flexo] HTTP response code is 301
[2024-01-20T19:25:46.967Z DEBUG flexo::mirror_flexo] Server sent a redirect: Waiting for next header.
[2024-01-20T19:25:46.967Z DEBUG flexo::mirror_flexo] Server sent a redirect: Waiting for next header.
[2024-01-20T19:25:46.967Z DEBUG flexo::mirror_flexo] Server sent a redirect: Waiting for next header.
[2024-01-20T19:25:46.968Z DEBUG flexo::mirror_flexo] Received complete header from remote mirror
[2024-01-20T19:25:46.968Z INFO  flexo::mirror_flexo] HTTP response code is 301
[2024-01-20T19:25:46.968Z DEBUG flexo::mirror_flexo] Server sent a redirect: Waiting for next header.
[2024-01-20T19:25:48.551Z ERROR flexo] Timeout: Unable to obtain content length.
[2024-01-20T19:25:48.551Z DEBUG flexo::http_headers] Sending header to client: "HTTP/1.1 500 Internal Server Error\r\nServer: flexo\r\nDate: Sat, 20 Jan 2024 19:25:48 GMT\r\nFlexo-Payload-Origin: NoPayload\r\nContent-Length: 0\r\n\r\n"
[2024-01-20T19:25:48.551Z INFO  flexo] Request served [NO PAYLOAD]: "core/os/x86_64/core.db"
[2024-01-20T19:25:48.551Z DEBUG flexo] Reading header from client.
[2024-01-20T19:25:48.552Z ERROR flexo] Timeout: Unable to obtain content length.
[2024-01-20T19:25:48.552Z ERROR flexo] Timeout: Unable to obtain content length.
[2024-01-20T19:25:48.552Z DEBUG flexo::http_headers] Sending header to client: "HTTP/1.1 500 Internal Server Error\r\nServer: flexo\r\nDate: Sat, 20 Jan 2024 19:25:48 GMT\r\nFlexo-Payload-Origin: NoPayload\r\nContent-Length: 0\r\n\r\n"
[2024-01-20T19:25:48.552Z DEBUG flexo::http_headers] Sending header to client: "HTTP/1.1 500 Internal Server Error\r\nServer: flexo\r\nDate: Sat, 20 Jan 2024 19:25:48 GMT\r\nFlexo-Payload-Origin: NoPayload\r\nContent-Length: 0\r\n\r\n"

@nroi
Copy link
Owner

nroi commented Feb 4, 2024

My first guess would be that the mirror selection does not work well for you and flexo is therefore choosing mirrors with a too high latency.

You don't have anything unusual configured regarding networking, like a VPN or so? Also, on which continent are you located? In the past, some users from Australia had issues because of the latencies from Australia to mirrors located in Europe and elsewhere.

Could you please try the following:

  • Remove the file /var/cache/flexo/state/latency_test_results.json, if it exists.
    Use the default TOML config file and leave all default settings, with the only exceptions:
  • allowed_countries should be configured as described here.
  • If you have to change the port or the listen_ip_address, that's also fine, that won't have an influence on the issue you've described.
  • If that still doesn't work, try to increase max_score.

If that still doesn't work, please post the output of the following command, would be interesting to see how fast your connection is:

curl -w @- -o /dev/null -s "https://mirror.moson.org/arch/core/os/x86_64/core.db" <<'EOF' 
    time_namelookup:  %{time_namelookup}\n
       time_connect:  %{time_connect}\n
    time_appconnect:  %{time_appconnect}\n
   time_pretransfer:  %{time_pretransfer}\n
      time_redirect:  %{time_redirect}\n
 time_starttransfer:  %{time_starttransfer}\n
                    ----------\n
         time_total:  %{time_total}\n
EOF

@bwrsandman
Copy link
Author

Trying this on my laptop and it works file.

Trying this on my server which is a TrueNAS Scale x86_64 which uses kubernetes.
When I run in the container, unset all the FLEXO env variables, set the allowed_countries and use the default TOML, I still get time outs with flexo but using curl downloads fairly quickly even with some which timeout with flexo.

Flexo also exits because it can't find any non timed out mirror. Increasing max score to 10 and timeout to 30 seconds doesn't change anything.

Here's the output with moson.org

    time_namelookup:  5.059352
       time_connect:  5.172486
    time_appconnect:  5.300360
   time_pretransfer:  5.300383
      time_redirect:  0.000000
 time_starttransfer:  5.505811
                    ----------
         time_total:  5.707949

@nroi
Copy link
Owner

nroi commented Feb 5, 2024

Here's the output with moson.org

    time_namelookup:  5.059352
       time_connect:  5.172486
    time_appconnect:  5.300360
   time_pretransfer:  5.300383
      time_redirect:  0.000000
 time_starttransfer:  5.505811
                    ----------
         time_total:  5.707949

So it takes more than 5 seconds until name lookup has completed, that is very unusual. This is how it looks for me:

    time_namelookup:  0.001474
       time_connect:  0.009997
    time_appconnect:  0.020877
   time_pretransfer:  0.020898
      time_redirect:  0.000000
 time_starttransfer:  0.036836
                    ----------
         time_total:  0.055268

It's possible that flexo does not behave quite as expected under less-than-ideal networking conditions, but my impression is that some networking strangeness in your Kubernetes setup seems to be at least a contributing factor.

Have you verified that the settings are also applied correctly for Flexo? I'm not an expert on Kubernetes, but with plain docker, here's one example to set the allowed_countries setting via environment vars:

docker container run -it --rm  -e RUST_LOG=debug -e FLEXO_MIRRORS_AUTO_ALLOWED_COUNTRIES=DE,AT,CZ nroi/flexo

During startup, you should then see something like the following to show which settings are applied (this requires a log level of DEBUG or comparable):

[2024-02-05T17:24:18.005Z DEBUG flexo] The following settings were fetched from the TOML file or environment variables: MirrorConfig {
        cache_directory: "/var/cache/flexo/pkg",
        mirrorlist_fallback_file: "/var/cache/flexo/state/mirrorlist",
        mirrorlist_latency_test_results_file: Some(
            "/var/cache/flexo/state/latency_test_results.json",
        ),
        refresh_latency_tests_after: None,
        port: 7878,
        listen_ip_address: Some(
            "0.0.0.0",
        ),
        mirror_selection_method: Auto,
        mirrors_predefined: [],
        custom_repo: None,
        low_speed_limit: Some(
            128000,
        ),
        low_speed_limit_formatted: None,
        low_speed_time_secs: Some(
            3,
        ),
        connect_timeout: None,
        max_speed_limit: None,
        num_versions_retain: Some(
            3,
        ),
        mirrors_auto: Some(
            MirrorsAutoConfig {
                mirrors_status_json_endpoint: "https://archlinux.org/mirrors/status/json/",
                mirrors_status_json_endpoint_fallbacks: [],
                mirrors_blacklist: [],
                https_required: true,
                ipv4: true,
                ipv6: false,
                max_score: 2.5,
                num_mirrors: 8,
                mirrors_random_or_sort: Sort,
                timeout: 350,
                allowed_countries: [
                    "DE",
                    "AT",
                    "CZ",
                ],
            },
        ),
    }

@bwrsandman
Copy link
Author

What you say makes sense, the time takes less long on other devices.
On the truenas itself (outside of the container) I can bring the time_namelookup down to less than a millisecond.
I can also get very good times in other containers.
Is there something special done to the DNS in the nroi/flexo container?

You can see the settings i ran in my original post.

@bwrsandman
Copy link
Author

Using the dig command I was able to find out that the first of my nameservers was an old pihole.
It seems like 5 seconds is the re-try time until it selected my second nameserver which responds in miliseconds.
Changing my DNS settings fixed the issue and flexo is working in the container.

Don't know what we can learn from this to improve flexo since these issues only came up from a bad DNS config.

@nroi
Copy link
Owner

nroi commented Feb 5, 2024

Changing my DNS settings fixed the issue and flexo is working in the container.

Glad to hear that!

Don't know what we can learn from this to improve flexo since these issues only came up from a bad DNS config.

I might try to check if I can reproduce this issue somehow, but as long as it works fine for users with no unusual network settings, I won't invest too much time into it. There are other areas at flexo that have higher priority.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants