Implement fallback to IPv4 if IPv6 times out #25

Open
thesamesam opened this issue Jan 10, 2023 · 8 comments

@thesamesam
Contributor

We get a fair number of users who find that emerge --sync hangs when refreshing keys. Often, it turns out that their network has broken IPv6 connectivity.

We should fall back to IPv4 if IPv6 times out, given how common this is.

See also https://bugs.gentoo.org/779766.

@mgorny
Member

mgorny commented Apr 29, 2023

Well, I've implemented explicit timeouts — I wonder if this is sufficient to get some implicit IPv4 fallback to work.
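A rough illustration of why explicit timeouts might be enough: Python's socket.create_connection() tries each address returned by getaddrinfo() in turn, applying the timeout to each attempt and moving on to the next address when one fails, and urllib3 (which requests uses) appears to have an equivalent loop. The sketch below simulates that loop with a fake connector so it runs without a network; connect_in_order and fake_connect are hypothetical helpers, and the addresses are documentation-range placeholders. It also shows the catch: without a timeout, the first dead IPv6 address can block for as long as the OS keeps retrying, and even with one, each dead IPv6 address costs a full timeout before IPv4 is tried.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

# (address family, IP address) pairs standing in for getaddrinfo() results
Address = Tuple[int, str]


def connect_in_order(
    addresses: Iterable[Address],
    connect: Callable[[Address, float], object],
    timeout: float,
) -> Tuple[Address, object, float]:
    """Try each address in order, moving on whenever an attempt fails.

    Returns (address_used, connection, seconds_lost_to_timeouts), assuming
    every timed-out attempt costs the full `timeout`.
    """
    lost = 0.0
    last_error: Optional[OSError] = None
    for addr in addresses:
        try:
            return addr, connect(addr, timeout), lost
        except OSError as exc:  # TimeoutError is a subclass of OSError
            last_error = exc
            if isinstance(exc, TimeoutError):
                lost += timeout
    raise last_error if last_error is not None else OSError("no addresses to try")


def fake_connect(addr: Address, timeout: float) -> str:
    """Hypothetical connector: IPv6 is black-holed, IPv4 works."""
    family, ip = addr
    if family == socket.AF_INET6:
        raise TimeoutError(f"{ip}: timed out after {timeout}s")
    return f"connected to {ip}"


addrs = [
    (socket.AF_INET6, "2001:db8::1"),
    (socket.AF_INET6, "2001:db8::2"),
    (socket.AF_INET, "192.0.2.1"),
]
# Two dead IPv6 addresses at 5 s each: IPv4 is only reached after ~10 s.
used, conn, lost = connect_in_order(addrs, fake_connect, timeout=5.0)
```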

@oz123

oz123 commented May 15, 2023

I have the problem that my ISP very often breaks IPv6 or IPv4, which forces me to explicitly choose one of the two.
Here is what I do to make gemato use IPv4 only:


diff --git a/gemato/openpgp.py b/gemato/openpgp.py
index 483e15f..3543f81 100644
--- a/gemato/openpgp.py
+++ b/gemato/openpgp.py
@@ -37,6 +37,7 @@ from gemato.exceptions import (
 
 try:
     import requests
+    requests.packages.urllib3.util.connection.HAS_IPV6 = False
 except ImportError:
     requests = None

Of course, this is ugly, but we could turn it into a nice /etc/portage/make.conf option ...

If interested, I can contribute a patch.

@mgorny
Member

mgorny commented May 15, 2023

How is an option supposed to be nice when it can only be implemented using an ugly hack?

@oz123

oz123 commented May 16, 2023

Well, beauty is in the eye of the beholder. The urllib3 developers offer this switch for choosing IPv4. Adding a command-line flag for emerge --sync, or an environment variable, would be welcomed by many users.
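For comparison, restricting name resolution to one address family does have a public, documented interface in the standard library: the family argument of socket.getaddrinfo(). A minimal sketch of the environment-variable idea, assuming a hypothetical GEMATO_IP_FAMILY variable (neither gemato nor Portage reads it), might look like this. It only covers parsing the setting and resolving; getting requests to honour the chosen family is the hard part and is not shown, since, as the next comments point out, that currently means relying on urllib3 internals.

```python
import os
import socket
from typing import List, Mapping

# Hypothetical variable name: neither gemato nor Portage reads it today.
ENV_VAR = "GEMATO_IP_FAMILY"

_FAMILIES = {
    "": socket.AF_UNSPEC,  # unset or empty: both IPv4 and IPv6, as now
    "4": socket.AF_INET,   # IPv4 only
    "6": socket.AF_INET6,  # IPv6 only
}


def family_from_env(environ: Mapping[str, str] = os.environ) -> int:
    """Map the (hypothetical) environment variable to an address family."""
    value = environ.get(ENV_VAR, "").strip()
    if value not in _FAMILIES:
        raise ValueError(f"{ENV_VAR} must be '4', '6' or unset, not {value!r}")
    return _FAMILIES[value]


def resolve(host: str, port: int, family: int) -> List[tuple]:
    """Resolve host:port through getaddrinfo(), restricted to `family`."""
    infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    return [sockaddr for _fam, _type, _proto, _canon, sockaddr in infos]
```

Keeping AF_UNSPEC as the default means nothing changes for users who do not set the variable.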

@mgorny
Member

mgorny commented May 16, 2023

> The developers urllib3 offer this switch for choosing IPv4.

Is this documented anywhere? It looks like an implementation detail and not a public-facing "switch".

@oz123

oz123 commented May 16, 2023

You are right. It's not something public.

@ModernKiwi

ModernKiwi commented Aug 2, 2023

I am so happy to have come across this. I have been having issues with emerge --sync getting stuck at "Refreshing keys via WKD ...".

After some investigating, I found that my network was being assigned IPv6 via DHCP by my ISP and IPv6 DNS was working; however, any IPv6 traffic (e.g. ping -6 [URL | IPv6 address]) got no response.

So I did some digging into Portage's source code.

I found the following code snippet:

def _refresh_keys(self, openpgp_env):
    """
    Refresh keys stored in openpgp_env. Raises gemato.exceptions.GematoException
    or asyncio.TimeoutError on failure.

    @param openpgp_env: openpgp environment
    @type openpgp_env: gemato.openpgp.OpenPGPEnvironment
    """

    if openpgp_env.refresh_keys_wkd():
        out.eend(0)
        return
    out.eend(1)

This seems to tie in with the following class:

class IsolatedGPGEnvironment(SystemGPGEnvironment):
    """
    An isolated environment for OpenPGP routines. Used to get reliable
    verification results independently of user configuration.

    Remember to close() in order to clean up the temporary directory,
    or use as a context manager (via 'with').
    """

    def __init__(self, debug=False, proxy=None, timeout=None):
        super().__init__(debug=debug)
        self.proxy = proxy
        self.timeout = timeout
        self._home = tempfile.mkdtemp(prefix='gemato.')

Based on this information, it looks like a user can set a timeout somehow, given the following code:

if timeout is not None:
    # GPG doesn't accept sub-second timeouts
    gpg_timeout = math.ceil(timeout)
    f.write(f"""
# respect user-specified timeouts
resolver-timeout {gpg_timeout}
connect-timeout {gpg_timeout}
""")

However, I am unable to find any documentation indicating that this can be set via the command line or a .conf file; admittedly, I haven't done much digging into the source code.

After reviewing the refresh_keys_wkd method, I found this at line 589:

resp = requests.get(url, proxies=proxies, timeout=self.timeout)

I then dug further into the source code of the requests module and of urllib3, which requests uses.
At this point it got a bit over my head; however, this is what I was able to take away from it:

  • Gemato uses the Requests module to handle the request.
  • Requests then uses the urllib3 module.
  • urllib3 should then time out after its default time, as gemato doesn't seem to pass a default timeout to requests to pass on.

Eventually the request times out for me, and it continues without verifying the keys.

I have a similar issue when checking repositories; however, there the IPv6 connections time out and fall back to IPv4.

The only possible solution I can think of right now is to modify refresh_keys_wkd(self) so that resp = requests.get(url, proxies=proxies, timeout=self.timeout) is first called with a resolved IPv6 address (if one is available) and a short timeout (e.g. 10 seconds), and, if that fails with a timeout, retried with an IPv4 address. There is a caveat with this approach, though: I was unable to determine whether requests.get accepts an IP address or requires a hostname.
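A minimal sketch of that probe using only the standard library, under stated assumptions: probe and pick_family are hypothetical names, port 443 is assumed because WKD is fetched over HTTPS, and the 10-second timeouts come from the suggestion above. It tries every IPv6 address first, then IPv4, and reports which family and address accepted a TCP connection. On the caveat: requests.get() will accept a URL containing an IP literal (IPv6 ones in square brackets), but the server's TLS certificate is issued for the hostname, so certificate verification would most likely fail; using the probe only to decide which family to prefer seems more workable than putting the IP into the URL.

```python
import socket
from typing import Optional, Tuple


def probe(host: str, port: int, family: int, timeout: float) -> Optional[tuple]:
    """Return the first sockaddr of `family` that accepts a TCP connection, or None."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return None  # no usable address of this family (e.g. no AAAA record)
    for fam, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)  # bound each attempt instead of hanging
                sock.connect(sockaddr)
                return sockaddr  # reachable; the probe socket is closed on exit
        except OSError:
            continue  # timed out, refused or unreachable: try the next address
    return None


def pick_family(
    host: str, port: int = 443, v6_timeout: float = 10.0, v4_timeout: float = 10.0
) -> Optional[Tuple[int, tuple]]:
    """Prefer IPv6 and fall back to IPv4; return (family, sockaddr) or None."""
    for family, timeout in ((socket.AF_INET6, v6_timeout), (socket.AF_INET, v4_timeout)):
        sockaddr = probe(host, port, family, timeout)
        if sockaddr is not None:
            return family, sockaddr
    return None
```

On a network like the one described, with working IPv6 DNS but no IPv6 routing, pick_family would spend up to v6_timeout per IPv6 address before returning an IPv4 result.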

Sorry for the lengthy message. I hope this is of use to someone else trying to find a solution to this issue, as I spent a lot of time troubleshooting my network and then digging around for a fix before stumbling upon this post. Also, if I made any mistakes, feel free to let me know; it's late and I'm very tired.

EDIT:
I'm also posting this in case it helps with the message @mgorny posted:

Well, I've implemented explicit timeouts — I wonder if this is sufficient to get some implicit IPv4 fallback to work.

EDIT2: Fixed the part stating that it eventually falls back to IPv4, as that was emerge-webrsync.

thesamesam added a commit to thesamesam/portage that referenced this issue Aug 13, 2023
Respect --debug and pass it down to gemato so we get nice debugging output
when e.g. 'refreshing keys' is stuck.

Bug: https://bugs.gentoo.org/646194
Bug: https://bugs.gentoo.org/647696
Bug: https://bugs.gentoo.org/691666
Bug: https://bugs.gentoo.org/779766
Bug: https://bugs.gentoo.org/873133
Bug: https://bugs.gentoo.org/906875
Bug: projg2/gemato#7
Bug: projg2/gemato#25
Signed-off-by: Sam James <[email protected]>
thesamesam added a commit to thesamesam/portage that referenced this issue Aug 17, 2023
Respect --debug and pass it down to gemato so we get nice debugging output
when e.g. 'refreshing keys' is stuck.

In the process of looking at this, it became clear we weren't initialising
a logger at all for `emerge`, so fix that.

Bug: https://bugs.gentoo.org/646194
Bug: https://bugs.gentoo.org/647696
Bug: https://bugs.gentoo.org/691666
Bug: https://bugs.gentoo.org/779766
Bug: https://bugs.gentoo.org/873133
Bug: https://bugs.gentoo.org/906875
Bug: projg2/gemato#7
Bug: projg2/gemato#25
Signed-off-by: Sam James <[email protected]>
thesamesam added a commit to thesamesam/portage that referenced this issue Aug 17, 2023
thesamesam added a commit to thesamesam/portage that referenced this issue Aug 17, 2023
gentoo-bot pushed a commit to gentoo/portage that referenced this issue Aug 17, 2023
@thesamesam
Contributor Author

> I am so happy to have come across this. I have been having issues with emerge --sync getting stuck at Refreshing keys via WKD ...

Yeah, it's really frustrating. I've added --debug support to Portage, with mgorny's help on the gemato side, which at least helps here.


> Based on this information it looks like a user can set a timeout somehow due to the following code:

Yeah, it works as a command-line argument, --timeout (see e.g. gemato gpg-wrap -h), or via the class constructor (for use from e.g. Portage).


> I then did further digging into the requests module source code and the urllib3 module that requests uses. At this point it got a bit over my head, however, this is what I was able to take away from it.

I was hoping we could just ask requests to fall back, and use that to influence what we tell gpg to use later on, but apparently not: psf/requests#1691.
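Outside of requests, Python's own asyncio does implement this kind of fallback: since Python 3.8, loop.create_connection() and asyncio.open_connection() accept a happy_eyeballs_delay argument that enables Happy Eyeballs (RFC 8305). This is a swapped-in technique, not something gemato does today, and it would only cover connections opened through asyncio streams, not requests or gpg. A minimal sketch, where open_happy_eyeballs is a hypothetical helper and the 0.25 s delay is the value the Python documentation cites as the RFC's recommendation:

```python
import asyncio
import socket


async def open_happy_eyeballs(
    host: str, port: int, delay: float = 0.25
) -> socket.AddressFamily:
    """Connect to host:port using asyncio's Happy Eyeballs support.

    With happy_eyeballs_delay set, asyncio starts the next address attempt
    in parallel if the current one has not connected within `delay`
    seconds, instead of waiting for it to time out. interleave=1 alternates
    address families in the candidate list. Returns the family that won.
    """
    _reader, writer = await asyncio.open_connection(
        host, port, happy_eyeballs_delay=delay, interleave=1
    )
    try:
        return writer.get_extra_info("socket").family
    finally:
        writer.close()
        await writer.wait_closed()
```

With a dead IPv6 route, the IPv4 attempt would start after roughly `delay` seconds rather than after a full connect timeout.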


> For me eventually the request times out and continues on without verifying the keys.

> I have a similar issue with checking repositories, however the IPv6 connections timeout and fall back to IPv4.

FWIW, if it's that bad (not just affecting gpg's custom getaddrinfo), you should disable IPv6 fully (assuming fixing your network isn't an option) and/or edit /etc/gai.conf to prefer IPv4.

But I agree it'd be nice to have something here, given it's not easy for people to see what's going on at all.

> The only possible solution I can think of now is to modify the refresh_keys_wkd(self): in a way where resp = requests.get(url, proxies=proxies, timeout=self.timeout) is called with a resolved IPv6 address (if obtainable) with a short timeout (eg 10 seconds) and if the response was a failure from timeout to then try with an IPv4 address. There are caveats to this possible solution, however, and that is that I was unable to determine if requests.get supports an IP variable or has to be a hostname.

Right, we could do our own probe and use the resolved names. It's not as elegant as I was hoping for, though.

I'm left wondering if we should just add a timer to Portage's refresh stage that tells you to check your IPv6 connectivity if it takes >= 30s or so.
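The timer idea could be sketched with asyncio like this, assuming a hypothetical await_with_hint helper and treating the 30-second threshold and the message text as placeholders. It emits the hint once if the operation is still running after the threshold and keeps waiting rather than cancelling it. Since refresh_keys_wkd() is called synchronously in the code quoted earlier in this thread, a real integration would presumably need to run it in a thread first, for example via loop.run_in_executor().

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def await_with_hint(
    aw: Awaitable[T],
    threshold: float,
    warn: Callable[[str], None],
    hint: str = "Refreshing keys is taking a long time; check your IPv6 connectivity.",
) -> T:
    """Await `aw`; if it has not finished after `threshold` seconds,
    call warn(hint) once and keep waiting (the operation is NOT cancelled)."""
    task = asyncio.ensure_future(aw)
    done, _pending = await asyncio.wait({task}, timeout=threshold)
    if not done:
        warn(hint)
    return await task
```

Because nothing is cancelled, this only improves visibility; the actual timeout behaviour is still whatever gemato and gpg are configured with.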

palao pushed a commit to palao/portage that referenced this issue Oct 16, 2023
palao pushed a commit to palao/portage that referenced this issue Oct 22, 2023