AREDN network storms: What they look like, how they happen, and how to prevent them #106

Open
aanon4 opened this issue Aug 30, 2021 · 11 comments

Comments

@aanon4

aanon4 commented Aug 30, 2021

Network storms are the result of the OLSR daemon restarting on nodes. A restart randomly resets the sequence number used for new messages. Under certain circumstances, messages carrying the new sequence numbers can interact with old messages from the same source that are still circulating in the network, creating a message storm. The combination of old and new messages confuses the deduplication code so that messages always appear new, regardless of how many times a node has received them, and are always forwarded again to the node's neighbors. This continues until the time-to-live of every copy of these messages has expired on all nodes.

Full explanation and proposed solutions can be found here: https://docs.google.com/document/d/1OgURb2O36lWF518dKydJLBEEUTYt7khRTVzX8QChwqw/edit?usp=sharing
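
As a rough illustration of the wrap-around comparison that this kind of deduplication relies on, here is a minimal sketch assuming the 16-bit "greater than" rule from RFC 3626, section 19. The function name `seq_newer` and the example numbers are illustrative assumptions and are not taken from the olsrd source.

```python
MAXVALUE = 0xFFFF  # largest 16-bit sequence number


def seq_newer(s1: int, s2: int) -> bool:
    """Return True if s1 counts as more recent than s2 with wrap-around.

    Follows the rule in RFC 3626 section 19:
    S1 > S2 and S1 - S2 <= MAXVALUE/2, or S2 > S1 and S2 - S1 > MAXVALUE/2.
    """
    half = MAXVALUE // 2
    return (s1 > s2 and s1 - s2 <= half) or (s2 > s1 and s2 - s1 > half)


if __name__ == "__main__":
    # Normal wrap-around: 5 follows 65530 and is treated as newer.
    print(seq_newer(5, 65530))      # True

    # Hypothetical restart: the node last used 40000, then randomly
    # restarts at 10000. The fresh message compares as OLDER than the
    # stale one that may still be circulating.
    print(seq_newer(10000, 40000))  # False
    print(seq_newer(40000, 10000))  # True
```

This does not reproduce a storm. It only shows that, after a random reset, old and new messages from the same source can compare in an order that does not match the order in which they were actually sent.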

@storchi

storchi commented Aug 31, 2021

great work. thanks

@HRogge
Contributor

HRogge commented Aug 31, 2021

Maybe you want to take a look at the deduplication code in OLSRv2 too... I think it has a few more heuristics than the old one.
https://github.com/OLSR/OONF/blob/master/src/base/oonf_duplicate_set.c

@bittorf
Contributor

bittorf commented Aug 31, 2021

wow, that reminds me of old headaches - thanks a lot for finding the underlying issue!

@HRogge
Contributor

HRogge commented Sep 1, 2021

And it's not only the sequence number, it can happen with the ANSN (Advertised Neighbor Sequence Number) too...

@aanon4
Author

aanon4 commented Sep 1, 2021

> Maybe you want to take a look at the deduplication code in OLSRv2 too... I think it has a few more heuristics than the old one.
> https://github.com/OLSR/OONF/blob/master/src/base/oonf_duplicate_set.c

I had a quick look at the code and the IETF OLSRv2 draft, and they both still seem to use 16-bit sequence numbers with similar wrap-around comparison logic. Is there something specific?

@HRogge
Contributor

HRogge commented Sep 2, 2021

The sequence number handling code of (my) OLSRv2 implementation handles a jump in the sequence number by counting continuous sequences of "very old" numbers without any new ones... and triggers a "reset" in the duplication code after a while. Unfortunately this doesn't help with OLSRv1 and OLSRv2 ANSN, because they don't necessarily increase all the time (but in practice they do).
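
The heuristic described above could look roughly like the following sketch. It is not the OONF implementation: the class name, the `window` and `reset_after` parameters, and the choice to let any number that is not "very old" clear the counter are all assumptions made for illustration.

```python
MAXVALUE = 0xFFFF
HALF = MAXVALUE // 2


def seq_diff(new: int, cur: int) -> int:
    """Signed 16-bit wrap-around distance from cur to new (-32768..32767)."""
    d = (new - cur) & MAXVALUE
    return d - (MAXVALUE + 1) if d > HALF else d


class ResettingDuplicateFilter:
    """Duplicate filter that resets after a run of 'very old' numbers."""

    def __init__(self, window: int = 32, reset_after: int = 8):
        self.window = window            # how far back a number may lag (assumed)
        self.reset_after = reset_after  # run length that triggers a reset (assumed)
        self.current = None             # newest sequence number seen so far
        self.seen = set()               # recent numbers inside the window
        self.too_old_run = 0            # consecutive 'very old' numbers

    def _reset(self, seqno: int) -> None:
        self.current = seqno
        self.seen = {seqno}
        self.too_old_run = 0

    def accept(self, seqno: int) -> bool:
        """Return True if the message should be processed as new."""
        if self.current is None:
            self._reset(seqno)
            return True
        d = seq_diff(seqno, self.current)
        if d > 0:
            # Newer number: advance and drop entries that fell out of the window.
            self.current = seqno
            self.too_old_run = 0
            self.seen = {s for s in self.seen
                         if 0 <= seq_diff(seqno, s) < self.window}
            self.seen.add(seqno)
            return True
        if d > -self.window:
            # Inside the window: accept once, reject repeats.
            self.too_old_run = 0
            if seqno in self.seen:
                return False
            self.seen.add(seqno)
            return True
        # 'Very old': count it, and treat a long enough run as a restart.
        self.too_old_run += 1
        if self.too_old_run >= self.reset_after:
            self._reset(seqno)
            return True
        return False
```

With `reset_after=3`, a source that was at 40001 and then sends 10000, 10001, 10002 has its first two post-restart messages rejected as old; the third triggers the reset and is accepted. As noted in the comment above, a counter that stays constant would not be helped by this.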

@mathisono

mathisono commented Sep 2, 2021 via email

@HRogge
Contributor

HRogge commented Sep 2, 2021

Yes, it's in the current released code...

We (Fraunhofer FKIE) have issues with restarting OLSRv2 nodes, but I have never seen these "storms" you were talking about. What I have seen is that routers ignore changes in the attached network that happen together with a router restart (we use OLSRv2 with a dynamic attached network source)...

this "ignore attached network change" is because of the ANSN issue, which is similar to the SEQNO issue but more difficult to solve because ANSN can remain constant in theory (especially in emulated networks).

the easiest way might be to store the ANSN as well as the SEQNO numbers (to make router restarts generally faster), but I have yet to find the time to write code that stores both message sequence numbers, packet sequence numbers (per interface!) and the ANSN and tries to reload it on a router restart.
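
The idea of storing the counters and reloading them on restart might be sketched as follows. Nothing here comes from olsrd or OONF: the JSON file, the key names, and the `skip` margin (added in case the stored values are slightly stale) are all hypothetical.

```python
import json
import os
import tempfile


def save_counters(path: str, counters: dict) -> None:
    """Write the counters atomically so a crash cannot leave a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(counters, f)
    os.replace(tmp, path)


def load_counters(path: str, skip: int = 64):
    """Reload stored counters, advanced by `skip` modulo 2**16.

    Returns None if the file is missing or unreadable, in which case the
    caller would fall back to whatever it does today (e.g. random values).
    """
    try:
        with open(path) as f:
            stored = json.load(f)
    except (OSError, ValueError):
        return None
    return {name: (value + skip) & 0xFFFF for name, value in stored.items()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        state = os.path.join(d, "counters.json")  # hypothetical location
        # Hypothetical keys: message seqno, ANSN, per-interface packet seqno.
        save_counters(state, {"msg_seqno": 65530, "ansn": 7,
                              "pkt_seqno:wlan0": 1200})
        print(load_counters(state))
```

How often to save, and how large `skip` must be to cover numbers used since the last save, are open design questions that this sketch does not answer.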

@aanon4
Author

aanon4 commented Sep 7, 2021

I've submitted a downstream pull request for AREDN only (aredn/aredn_packages#5)

@PolynomialDivision
Collaborator

@aanon4 Could you submit a PR based on the patch, and also put everything you wrote in the GitHub message into the commit message?

PolynomialDivision added a commit to PolynomialDivision/routing that referenced this issue Jun 22, 2022
Fixes:
OLSR/olsrd#106

Signed-off-by: Nick Hainke <[email protected]>
(cherry picked from commit bb5bbc6)
1582130940 pushed a commit to 1582130940/OpenWrt-Lean-Routing that referenced this issue Oct 21, 2022
1582130940 pushed a commit to 1582130940/OpenWrt-Lean-Routing that referenced this issue Oct 31, 2022
1582130940 pushed a commit to 1582130940/OpenWrt-Lean-Routing that referenced this issue Nov 12, 2022
1582130940 pushed a commit to 1582130940/OpenWrt-Lean-Routing that referenced this issue Nov 22, 2022
1582130940 pushed a commit to 1582130940/OpenWrt-Lean-Routing that referenced this issue Nov 28, 2022
1582130940 pushed a commit to 1582130940/OpenWrt-Lean-Routing that referenced this issue Dec 16, 2022
@mathiashro
Contributor

Hi @PolynomialDivision , should we take over some patches here?
