AREDN network storms: What they look like, how they happen, and how to prevent them #106
Comments
great work. thanks |
Maybe you want to take a look at the deduplication code in OLSRv2 too... I think it has a few more heuristics than the old one. |
Wow, that reminds me of old headaches. Thanks a lot for finding the underlying issue! |
And it's not only the sequence number; it can happen with the ANSN (Advertised Neighbor Sequence Number) too... |
I had a quick look at the code and the IETF OLSRv2 draft, and both still seem to use 16-bit sequence numbers with similar wraparound comparison logic. Is there something specific? |
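For readers following along, here is a minimal sketch of what a 16-bit wraparound comparison typically looks like. It is not taken from the olsrd or OLSRv2 sources; the names `seqno_diff` and `is_newer` are hypothetical, and the half-range rule is the usual serial-number convention, assumed here for illustration.

```python
SEQNO_MOD = 1 << 16        # 16-bit sequence numbers: 0..65535
HALF_RANGE = SEQNO_MOD >> 1


def seqno_diff(a: int, b: int) -> int:
    """Signed distance from b to a on the 16-bit circle, in [-32768, 32767]."""
    d = (a - b) % SEQNO_MOD
    return d - SEQNO_MOD if d >= HALF_RANGE else d


def is_newer(a: int, b: int) -> bool:
    """True if a counts as newer than b under wraparound comparison."""
    return seqno_diff(a, b) > 0


# A normal wrap: 5 follows 65530 and is correctly seen as newer.
print(seqno_diff(5, 65530))    # 11
# A random jump after a restart: 40000 looks *older* than 100.
print(seqno_diff(40000, 100))  # -25636
```

The second case is the interesting one for this issue: a randomly chosen post-restart sequence number can land in the "older" half of the circle relative to what the network last saw.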
The sequence number handling code of (my) OLSRv2 implementation handles a jump in the sequence number by counting continuous sequences of "very old" numbers without any new ones... and triggers a "reset" in the duplication code after a while. Unfortunately this doesn't help with OLSRv1 and OLSRv2 ANSN, because they don't necessarily increase all the time (but in practice they do). |
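A rough sketch of the heuristic described in this comment, as I understand it: count consecutive "very old" sequence numbers, let any newer number clear the count, and reset the tracker once the run is long enough. This is not the OLSRv2 code. The class name, the thresholds `reset_after` and `very_old`, and the choice to leave the counter untouched on recent duplicates are all assumptions made for illustration.

```python
SEQNO_MOD = 1 << 16


def seqno_diff(a: int, b: int) -> int:
    """Signed 16-bit wraparound distance from b to a."""
    d = (a - b) % SEQNO_MOD
    return d - SEQNO_MOD if d >= SEQNO_MOD // 2 else d


class ResettingTracker:
    """Hypothetical duplicate tracker with a 'run of very old numbers' reset."""

    def __init__(self, reset_after: int = 16, very_old: int = -32):
        self.latest = None          # newest sequence number accepted so far
        self.stale_run = 0          # consecutive very old numbers seen
        self.reset_after = reset_after
        self.very_old = very_old    # diffs at or below this count as very old

    def accept(self, seqno: int) -> bool:
        """Return True if seqno should be treated as a new message."""
        if self.latest is None:
            self.latest = seqno
            return True
        d = seqno_diff(seqno, self.latest)
        if d > 0:                   # genuinely newer: accept, clear the run
            self.latest = seqno
            self.stale_run = 0
            return True
        if d > self.very_old:       # recent duplicate: reject, run unchanged
            return False
        self.stale_run += 1         # very old: may be a restarted originator
        if self.stale_run >= self.reset_after:
            self.latest = seqno     # enough evidence: resynchronise
            self.stale_run = 0
            return True
        return False
```

With `reset_after=3`, a tracker that last saw 100 rejects 40000 and 40001 and then accepts 40002. If a newer number such as 101 arrives in between, the run starts over, which is the "without any new ones" condition in the comment.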
Henning, is "the sequence number handling code of (my) OLSRv2 implementation" in the current release code?
Would it be safe to say that if AREDN stopped using the old OLSRv1 and used OLSRv2, we would stop having paralyzing storms on our networks?
|
Yes, it's in the current released code. We (Fraunhofer FKIE) have issues with restarting OLSRv2 nodes, but I have never seen the "storms" you are talking about. What I have seen is that routers ignore changes in the attached network that happen together with a router restart (we use OLSRv2 with a dynamic attached network source). This "ignore attached network change" behavior is caused by the ANSN issue, which is similar to the SEQNO issue but more difficult to solve because the ANSN can remain constant in theory (especially in emulated networks). The easiest way might be to store the ANSN as well as the SEQNO values (which would also make router restarts generally faster), but I have yet to find the time to write code that stores the message sequence numbers, the packet sequence numbers (per interface!), and the ANSN, and tries to reload them on a router restart. |
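To make the "store and reload" idea concrete, here is a small sketch under stated assumptions. It is not existing OLSRv2 code (the comment says that code has not been written). The JSON layout, the function names, and the `skip` margin added on reload are all hypothetical; the margin is my own assumption to cover numbers used after the last save.

```python
import json
import os
import tempfile

SEQNO_MOD = 1 << 16


def save_state(path: str, state: dict) -> None:
    """Write the counters atomically so a crash cannot leave a torn file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_state(path: str, skip: int = 256):
    """Reload counters after a restart, or return None if nothing usable."""
    try:
        with open(path) as f:
            saved = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    # Jump ahead by `skip` (assumed margin) so restarted counters land
    # after anything the network may still remember.
    return {
        "msg_seqno": (saved["msg_seqno"] + skip) % SEQNO_MOD,
        "ansn": (saved["ansn"] + skip) % SEQNO_MOD,
        "pkt_seqno": {
            iface: (n + skip) % SEQNO_MOD
            for iface, n in saved["pkt_seqno"].items()
        },
    }
```

Packet sequence numbers are keyed by interface name here because the comment notes they are per interface.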
I've submitted a downstream pull request for AREDN only (aredn/aredn_packages#5) |
@aanon4 Could you submit a PR based on the patch, and also put everything you wrote in the GitHub message into the commit message? |
Fixes: OLSR/olsrd#106 Signed-off-by: Nick Hainke <[email protected]>
Fixes: OLSR/olsrd#106 Signed-off-by: Nick Hainke <[email protected]> (cherry picked from commit bb5bbc6)
Hi @PolynomialDivision , should we take over some patches here? |
Network storms are the result of the OLSR daemon restarting on nodes. A restart randomly resets the sequence number that will be used for new messages. Under certain circumstances, messages carrying these new sequence numbers can interact with old messages from the same source that are still in the network and create a message storm. The combined old and new messages confuse the deduplication code so that messages always appear new, regardless of how many times a node has received them, and are always forwarded again to its neighbors. This continues until the time-to-live of every copy of these messages has expired on all nodes.
Full explanation and proposed solutions can be found here: https://docs.google.com/document/d/1OgURb2O36lWF518dKydJLBEEUTYt7khRTVzX8QChwqw/edit?usp=sharing
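The effect described above can be illustrated with a deliberately simplified toy model. This is not olsrd's deduplication code; see the linked document for the real analysis. The class `NaiveWindowDedup`, the window size of 32, and the "reset when out of window" rule are assumptions chosen only to show how interleaved old and new sequence numbers can defeat a window-based duplicate check.

```python
SEQNO_MOD = 1 << 16
WINDOW = 32  # assumed size of the remembered sequence-number window


def seqno_diff(a: int, b: int) -> int:
    """Signed 16-bit wraparound distance from b to a."""
    d = (a - b) % SEQNO_MOD
    return d - SEQNO_MOD if d >= SEQNO_MOD // 2 else d


class NaiveWindowDedup:
    """Toy per-originator duplicate check (hypothetical, not olsrd's)."""

    def __init__(self):
        self.latest = None
        self.seen = set()

    def is_new(self, seqno: int) -> bool:
        out_of_window = (
            self.latest is None
            or abs(seqno_diff(seqno, self.latest)) >= WINDOW
        )
        if out_of_window:
            # Assume the originator restarted: forget all history.
            self.latest = seqno
            self.seen = {seqno}
            return True
        if seqno in self.seen:
            return False
        self.seen.add(seqno)
        if seqno_diff(seqno, self.latest) > 0:
            self.latest = seqno
            # Drop entries that have fallen out of the window.
            self.seen = {
                s for s in self.seen
                if abs(seqno_diff(s, self.latest)) < WINDOW
            }
        return True


# One message (seqno 100) received five times: only the first copy is new.
d = NaiveWindowDedup()
print([d.is_new(100) for _ in range(5)])
# [True, False, False, False, False]

# Old (100) and post-restart (40000) copies interleaved: every copy looks new.
d = NaiveWindowDedup()
print([d.is_new(s) for s in [100, 40000] * 3])
# [True, True, True, True, True, True]
```

In the second run each arrival falls outside the window left by the previous one, so the history is thrown away every time and every copy would be forwarded again.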