ERR README.md
This defines a new test for ERR (Extended Route Retention). The test demonstrates a new Graceful Restart behavior that extends the existing behavior described in RFC4724 and RFC8538.

This test also relies on:

- Changes proposed to the gNOI.bgp proto in openconfig/gnoi#214
- Some OC paths for ERR which aren't available yet.
sachendras authored Sep 28, 2024
1 parent 9a67fc3 commit f5618d9
Showing 1 changed file with 94 additions and 35 deletions.
129 changes: 94 additions & 35 deletions feature/bgp/gracefulrestart/ate_tests/err/README.md
@@ -66,29 +66,30 @@ B <-- EBGP(ASN200) --> C[Port2:ATE];
...

* RT-1.35.1 Validate Graceful-Restart (Baseline)

```
TODO: Following OC-paths need to be added to the Yang model
* /network-instances/network-instance/protocols/protocol/bgp/neighbors/neighbor/graceful-restart/extended-route-retention/state/retention-time <?>
* /network-instances/network-instance/protocols/protocol/bgp/neighbors/neighbor/graceful-restart/extended-route-retention/state/retention-policy <?>
```
* Validate received capabilities at DUT and ATE reflect support for graceful restart and also verify that the restart-time = 220 Secs and stale-routes-timer = 250 Secs.
* Validate ERR retention-time is as configured i.e. 300s
* Validate the ERR retention-policy matches "STALE-ROUTE-POLICY"
* TODO: Following OC-paths need to be added to the Yang model
```
* /network-instances/network-instance/protocols/protocol/bgp/neighbors/neighbor/graceful-restart/extended-route-retention/state/retention-time <?>
* /network-instances/network-instance/protocols/protocol/bgp/neighbors/neighbor/graceful-restart/extended-route-retention/state/retention-policy <?>
```
* Ensure the DUT has learnt all the Prefixes over the IBGP and EBGP sessions and has the correct community list attached to the routes in its post-policy ADJ-RIBIN
a. Ensure the DUT has learnt all the Prefixes over the IBGP and EBGP sessions and has the correct community list attached to the routes in its post-policy ADJ-RIBIN
* IPv4Prefix1 and IPv6Prefix1 has community NO-ERR
* IPv4Prefix2 and IPv6Prefix2 has community ERR-NO-DEPREF
* IPv4Prefix3 and IPv6Prefix3 has community TEST-IBGP and has a local-preference of 200
* IPv4Prefix4 and IPv6Prefix4 has community NO-ERR
* IPv4Prefix5 and IPv6Prefix5 has community ERR-NO-DEPREF
* IPv4Prefix6 and IPv6Prefix6 has community TEST-EBGP and also has a MED value of 50
* On ATE:Port1, ensure the following received from DUT:
b. On ATE:Port1, ensure the following received from DUT:
* IPv4Prefix4 and IPv6Prefix4 with community NO-ERR
* IPv4Prefix5 and IPv6Prefix5 with community ERR-NO-DEPREF
* IPv4Prefix6 and IPv6Prefix6 prefixes are received with a MED of 50 and has the community TEST-EBGP and NEW-EBGP in that order.
* On ATE:Port2, ensure the following received from DUT:
* IPv4Prefix1 and IPv6Prefix1 has community NO-ERR
* IPv4Prefix2 and IPv6Prefix2 has community ERR-NO-DEPREF
* IPv4Prefix3 and IPv6Prefix3 has community TEST-IBGP and NEW-IBGP in that order. Also, ensure that these prefixes have an AS-PATH of "100, 100, 100"
c. On ATE:Port2, ensure the following received from DUT:
* IPv4Prefix1 and IPv6Prefix1 has community NO-ERR
* IPv4Prefix2 and IPv6Prefix2 has community ERR-NO-DEPREF
* IPv4Prefix3 and IPv6Prefix3 has community TEST-IBGP and NEW-IBGP in that order. Also, ensure that these prefixes have an AS-PATH of "100, 100, 100"
* Start traffic as per the Test flows above and ensure 100% success

If any of the above verifications fail, then the test is a failure.
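
The ATE-side baseline checks above can be sketched as a small expectation table plus a comparison helper. This is only an illustration: `BASELINE`, `baseline_mismatches` and the shape of the `observed` dictionary are hypothetical names chosen here, and each `PrefixN` key stands for both the IPv4PrefixN and IPv6PrefixN placeholders from the list above.

```python
# Expected attributes as seen at each ATE port in the RT-1.35.1 baseline.
# Keys such as "Prefix4" are placeholders for IPv4Prefix4/IPv6Prefix4 (assumption:
# both address families carry identical attributes, as the list above states).
BASELINE = {
    "ATE:Port1": {
        "Prefix4": {"communities": ["NO-ERR"]},
        "Prefix5": {"communities": ["ERR-NO-DEPREF"]},
        "Prefix6": {"communities": ["TEST-EBGP", "NEW-EBGP"], "med": 50},
    },
    "ATE:Port2": {
        "Prefix1": {"communities": ["NO-ERR"]},
        "Prefix2": {"communities": ["ERR-NO-DEPREF"]},
        "Prefix3": {
            "communities": ["TEST-IBGP", "NEW-IBGP"],
            "as_path": [100, 100, 100],
        },
    },
}


def baseline_mismatches(port, observed):
    """Return human-readable mismatches between observed routes and the baseline.

    `observed` maps a prefix placeholder to a dict of attributes, in the same
    shape as the entries of BASELINE (a hypothetical format for this sketch).
    """
    problems = []
    for prefix, expected in BASELINE[port].items():
        route = observed.get(prefix)
        if route is None:
            problems.append(f"{port}: {prefix} not received")
            continue
        for attr, want in expected.items():
            got = route.get(attr)
            # Lists are compared element by element, so community order matters.
            if got != want:
                problems.append(f"{port}: {prefix} {attr}: got {got!r}, want {want!r}")
    return problems
```

An empty return value corresponds to a passing baseline; any entry in the list is a reason to fail the test.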
@@ -101,17 +102,17 @@ B <-- EBGP(ASN200) --> C[Port2:ATE];
a. Restarting DUT speaker whose BGP process was killed gracefully. In this case ERR policy is attached to the BGP neighborship.
* Trigger DUT session restart by gracefully killing the BGP process in the DUT. Please use the `gNOI.killProcessRequest_Signal_Term` as per [gNOI_proto](https://github.com/openconfig/gnoi/blob/main/system/system.proto#L326).
* Please kill the right process to restart BGP. For Juniper it is the `RPD` process. For Arista and Cisco this is the `BGP` process. For Nokia this is `sr_bgp_mgr`.
* Once the BGP process on DUT is killed, configure both ATEs to delay the BGP reestablishment for 330 secs longer than the `HOLD-TIME` and start regular traffic between ATEs for the prefixes advertised and verify that the packets are treated as follows. If not, the test MUST fail.
* Once the BGP process on DUT is killed, configure both ATEs to delay the BGP reestablishment for 330 secs longer than the `HOLD-TIME` and start regular traffic between ATEs as directed above and verify that the packets are treated as follows. If not, the test MUST fail.
* Traffic between prefixes IPv4Prefix1, IPv6Prefix1, IPv4Prefix4 and IPv6Prefix4 MUST be successful until the "restart timer" expires and dropped after that.
* Traffic between prefixes IPv4Prefix2, IPv6Prefix2, IPv4Prefix5 and IPv6Prefix5 MUST be successful until the ERR retention-time expires and dropped after that. The routes for these prefixes must also have the community STALE added to the end of the community-list as received at the ATE end.
* Traffic between prefixes IPv4Prefix3, IPv6Prefix3, IPv4Prefix6 and IPv6Prefix6 MUST be successful until the retention-time expires and dropped after that. The routes for these prefixes must also have the Local-Preference of "0" and the community value of "STALE" attached to the end of the community-list.
* Post 330 secs, the ATEs are allowed to form the BGP neighborship with the DUT. Readvertisements of the EBGP and IBGP prefixes will take place and the state of the routes and their BGP attributes as well as traffic flow is expected to be the same as the baseline results in RT-1.35.1 above. If not, then the test MUST fail.
* Traffic between prefixes IPv4Prefix2, IPv6Prefix2, IPv4Prefix5 and IPv6Prefix5 MUST be successful until the ERR retention-time expires and dropped after that. The routes for these prefixes must also have the community STALE added to the end of the community-list as received at the ATE end (as per RT-1.35.1.b and RT-1.35.1.c).
* Traffic between prefixes IPv4Prefix3, IPv6Prefix3, IPv4Prefix6 and IPv6Prefix6 MUST be successful until the retention-time expires and dropped after that. The routes for these prefixes must also have the Local-Preference of "0" and the community value of "STALE" attached to the end of the community-list (as called out in RT-1.35.1.b and RT-1.35.1.c) as received by the ATEs.
* Post 330 secs, the ATEs are allowed to form BGP neighborship with the DUT. Readvertisements of the EBGP and IBGP prefixes will take place and the state of the routes and their BGP attributes as well as traffic flow is expected to be the same as the baseline results in RT-1.35.1 above. If not, then the test MUST fail.

b. Restarting DUT speaker whose BGP process was killed gracefully after removing the ERR policy
* In this case too, once the BGP process on DUT is killed, configure both ATEs to delay the BGP reestablishment for 330 secs longer than the `HOLD-TIME` and start regular traffic between ATEs for the prefixes advertised and verify that the packets are treated as follows. If not, the test MUST fail.
* When ERR has no ERR policy attached, behavior is expected to be as defined in RFC 8538 and RFC 4724 i.e. traffic flow between prefixes is successful only until the restart timer expires. After that, 100% packet drop is expected.
* Since there isn't any ERR policy attached, changes to the community and Local-Pref attributes as defined in that policy (STALE-ROUTE-POLICY) aren't expected. That is, the community-list attached to the routes learnt from the DUT will be the same as the baseline test above i.e. RT-1.35.1. If not, then the test MUST fail.
* Post 330 secs, the ATEs are allowed to form the BGP neighborship with the DUT. Readvertisements of the EBGP and IBGP prefixes will take place and the state of the routes and their BGP attributes as well as traffic flow is expected to be the same as the baseline results in RT-1.35.1 above. If not, then the test MUST fail.
* In this case too, once the BGP process on the DUT is killed, configure both ATEs to delay the BGP reestablishment for 330 secs longer than the `HOLD-TIME` and start regular traffic between ATEs as directed above and verify that the packets are treated as follows. If not, the test MUST fail.
* When ERR has no ERR policy attached, behavior is expected to be as defined in RFC 8538 and RFC 4724 i.e. traffic flow between prefixes is successful only until the "restart timer" expires. After that, 100% packet drop is expected.
* Since there isn't any ERR policy attached, changes to the community and Local-Pref attributes as defined in the ERR policy (STALE-ROUTE-POLICY) aren't expected. That is, the community-list attached to the routes learnt from the DUT as well as their local-preference values will be the same as the baseline test above i.e. RT-1.35.1. If not, then the test MUST fail.
* Post 330 secs, the ATEs are allowed to form the BGP neighborship with the DUT. Readvertisements of the EBGP and IBGP prefixes will take place and the state of the routes and their BGP attributes as well as traffic flow is expected to be the same as the baseline results in RT-1.35.1 above. Also, traffic must be 100% successful. If not, then the test MUST fail.
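
The expectations in RT-1.35.2.a and RT-1.35.2.b can be summarised as a function of elapsed time. The sketch below is a rough model, not part of the test plan: the helper names are hypothetical, each `PrefixN` stands for the IPv4/IPv6 pair, and it assumes that the restart timer (220s) and the ERR retention-time (300s) are both measured from the same starting point. It also does not model when the STALE-ROUTE-POLICY changes are first visible, only what they look like once applied.

```python
RESTART_TIME_S = 220    # GR restart-time validated in RT-1.35.1
RETENTION_TIME_S = 300  # ERR retention-time validated in RT-1.35.1

# Prefix pairs grouped by how STALE-ROUTE-POLICY treats them in RT-1.35.2.a.
NO_ERR = {"Prefix1", "Prefix4"}
ERR_NO_DEPREF = {"Prefix2", "Prefix5"}
ERR_DEPREF = {"Prefix3", "Prefix6"}


def traffic_expected(prefix, elapsed_s, err_policy_attached):
    """Return True if traffic for `prefix` should still be forwarded.

    `elapsed_s` is the time since the timers started. Assumption: both timers
    start together, which the README does not state explicitly.
    """
    if err_policy_attached and prefix in (ERR_NO_DEPREF | ERR_DEPREF):
        return elapsed_s < RETENTION_TIME_S
    # NO-ERR prefixes, or any prefix without the ERR policy: plain GR behavior.
    return elapsed_s < RESTART_TIME_S


def stale_attributes(prefix, baseline_communities, err_policy_attached):
    """Return (communities, local_pref) expected for a route retained under ERR.

    local_pref is None when the policy is not expected to change it.
    """
    if not err_policy_attached or prefix in NO_ERR:
        return list(baseline_communities), None
    communities = list(baseline_communities) + ["STALE"]  # appended at the end
    local_pref = 0 if prefix in ERR_DEPREF else None
    return communities, local_pref
```

With these values, every flow is expected to be dropped before the 330 sec reestablishment delay ends, which is why the delay is chosen to exceed the retention-time.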


...
@@ -151,31 +152,89 @@ B <-- EBGP(ASN200) --> C[Port2:ATE];
* Start traffic. Send `gNOI.killProcessRequest_Signal_KILL` as per `gNOI proto` to ATE:Port1 to stop its BGP process abruptly. Configure ATE:Port1 to delay the BGP reestablishment for 330 secs over the Hold-time. Expected behavior in this case is the same as RT-1.35.2.b
* Post 330Secs over Hold-time expiry, BGP on ATE:Port1 is expected to be up and all traffic is expected to be successful.
* Repeat the same test on ATE:Port2


...

* RT-1.35.6
a. Expected behavior when Soft Notification Sent to the peer and the ERR policy is attached

```
TODO: gNOI.ClearBGPNeighborRequest_GRACEFUL used in this case is under review in https://github.com/openconfig/gnoi/pull/214
```

a. Expected behavior when "Administrative Reset" Notification (rfc4486) sent to the peer while the ERR policy is attached to the neighborship
* Start traffic as per the flows above
* Trigger BGP soft Notification (code 6 subcode 4) from DUT:Port1 towards ATE:Port1. Please use the `gNOI.ClearBGPNeighborRequest_Soft` message as per [gNOI_proto](https://github.com/openconfig/gnoi/blob/main/bgp/bgp.proto#L41).
* Cease notification of Code 6, subcode 4 will result in TCP connection reset but the routes aren't flushed
* Configure ATE:Port1 to not send/accept any more TCP connections from the DUT:Port1 until the reset timer on the DUT expires.
* Expected behavior is the same as RT-1.35.2.a
* Trigger BGP Notification (code 6 subcode 4) from DUT:Port1 towards ATE:Port1. Please use the `gNOI.ClearBGPNeighborRequest_GRACEFUL` message.
* Cease notification of Code 6, subcode 4 will result in TCP connection reset but the routes aren't flushed
* Configure ATE:Port1 to not send/accept any more TCP connections from the DUT:Port1 until the "reset timer" on the DUT expires.
* Expected behavior is the same as RT-1.35.2.a
* Revert ATE configuration to allow for the BGP sessions to be up. Restart traffic and confirm that there is zero packet loss. Expected behavior is same as the base test in RT-1.35.1
* Restart the above procedure for the IBGP peering between DUT:Port-2 and ATE:Port-2
* Restart the above procedure for the EBGP peering between DUT:Port-2 and ATE:Port-2


* Expected behavior when Soft Notification Sent to the peer when ERR policy removed
* RT-1.35.6
* Expected behavior when Soft Notification received from the peer when ERR policy attached
* Expected behavior when Soft Notification received from the peer when ERR policy removed
b. Expected behavior when "Administrative Reset" Notification sent to the peer and ERR policy isn't attached.
* Follow the same process as RT-1.35.6.a. However, since the ERR policy isn't attached, expected behavior is the same as RT-1.35.2.b


...

* RT-1.35.7
* Expected behavior when Hard Notification Sent to the peer when ERR policy attached
* Expected behavior when Hard Notification Sent to the peer when ERR policy removed

```
TODO: gNOI.ClearBGPNeighborRequest_GRACEFUL used in this case is under review in https://github.com/openconfig/gnoi/pull/214
```

a. Expected behavior when "Administrative Reset" Notification (rfc4486) received from the peer while ERR policy is attached on the neighborship.
* Follow the same procedure as RT-1.35.6.a above. However this time, trigger BGP Notification (code 6 subcode 4) from ATE:Port1 towards DUT:Port1. Please use the `gNOI.ClearBGPNeighborRequest_GRACEFUL` message.
* Expected result is same as RT-1.35.2.a above
* Revert ATE configuration to allow for the BGP sessions to be up. Restart traffic and confirm that there is zero packet loss. Expected behavior is same as the base test in RT-1.35.1
* Restart the above procedure for the EBGP peering between DUT:Port-2 and ATE:Port-2

b. Expected behavior when "Administrative Reset" Notification (rfc4486) is received from the peer and ERR policy isn't attached on the neighborship
* Start traffic and then follow the same process as RT-1.35.7.a above. The only exception in this case is that the ERR policy isn't attached. Expected behavior is the same as the baseline test RT-1.35.2.b above.


...

* RT-1.35.8
* Expected behavior when Hard Notification received from the peer when ERR policy attached
* Expected behavior when Hard Notification received from the peer when ERR policy removed

```
TODO: gNOI.ClearBGPNeighborRequest_HARD used in this case is under review in https://github.com/openconfig/gnoi/pull/214
```

a. Expected behavior when "Hard Reset" Notification sent by the DUT and the ERR policy is attached per neighbor
* Start traffic as per the flows above
* Trigger BGP "HARD RESET" Notification from the DUT:Port1 and DUT:Port2 towards ATE:Port1 and ATE:Port2 respectively by using `gNOI.ClearBGPNeighborRequest_HARD` message of the gNOI PROTO.
* As per [rfc8538#section-3.1](https://datatracker.ietf.org/doc/html/rfc8538#section-3.1), when "N bit" exchanged between peers (i.e. GR negotiated), the "HARD RESET" notification of code 6 subcode 9 must be sent to the peer. However, the subcode for "Administrative Reset" i.e. code 6 subcode 4 must be carried in the data portion of subcode 9 notification message.
* On receipt of the "HARD RESET" Notification message from the DUT, the ATEs MUST flush all the routes. Hence, 100% packet loss MUST be experienced on all the flows irrespective of the ERR configuration and the `STALE-ROUTE-POLICY`. The test MUST fail if this isn't the behavior seen.
* As soon as the BGP peerings are up again between the ATEs and the DUT, traffic flow must be successful and the expected behavior must be the same as RT-1.35.1
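
The Hard Reset encoding described above can be illustrated at the byte level. This is a minimal sketch based on the BGP NOTIFICATION layout (16-octet marker, 2-octet length, 1-octet type, then error code, subcode and data) and on the RFC 8538 rule that the original code and subcode travel in the data portion. The function names are hypothetical and the sketch is not tied to any gNOI or ATE API.

```python
import struct

BGP_MARKER = b"\xff" * 16      # all-ones marker at the start of every BGP message
BGP_HEADER_LEN = 19            # marker (16) + length (2) + type (1)
MSG_TYPE_NOTIFICATION = 3
CEASE = 6                      # NOTIFICATION error code "Cease"
SUBCODE_ADMIN_RESET = 4        # RFC 4486 "Administrative Reset"
SUBCODE_HARD_RESET = 9         # RFC 8538 "Hard Reset"


def build_notification(code, subcode, data=b""):
    """Build a raw BGP NOTIFICATION message."""
    body = bytes([code, subcode]) + data
    header = BGP_MARKER + struct.pack("!HB", BGP_HEADER_LEN + len(body), MSG_TYPE_NOTIFICATION)
    return header + body


def build_hard_reset(inner_code=CEASE, inner_subcode=SUBCODE_ADMIN_RESET):
    """Build a Cease/Hard Reset carrying the original code and subcode as data."""
    return build_notification(CEASE, SUBCODE_HARD_RESET, bytes([inner_code, inner_subcode]))


def parse_notification(msg):
    """Return (code, subcode, data) from a raw BGP NOTIFICATION message."""
    length, msg_type = struct.unpack("!HB", msg[16:19])
    if msg[:16] != BGP_MARKER or msg_type != MSG_TYPE_NOTIFICATION or length != len(msg):
        raise ValueError("not a well-formed BGP NOTIFICATION")
    return msg[19], msg[20], msg[21:]


def is_hard_reset(msg):
    """True if the NOTIFICATION is a Cease with the Hard Reset subcode."""
    code, subcode, _ = parse_notification(msg)
    return code == CEASE and subcode == SUBCODE_HARD_RESET
```

A capture taken on the ATE side could be checked with `is_hard_reset` before asserting that all routes were flushed, while a plain code 6 subcode 4 message would return `False`.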

b. Expected behavior when Hard Notification Sent to the peer and the ERR policy isn't attached on the neighbor sessions
* Start traffic as per the flows above.
* Follow the steps in RT-1.35.8.a above. The expected results in this case are the same as RT-1.35.8.a since a HARD RESET notification MUST result in disconnecting the TCP session and flushing all routes irrespective of the ERR configuration
* Once the BGP sessions are up between the DUT and the ATEs, the expected behavior must be the same as RT-1.35.1




* RT-1.35.9

```
TODO: gNOI.ClearBGPNeighborRequest_HARD used in this case is under review in https://github.com/openconfig/gnoi/pull/214
```

a. Expected behavior when Hard Notification received from the peer while ERR policy is attached on the BGP neighborship
* Start traffic as per the flows above
* Trigger BGP "HARD RESET" Notification from the ATE:Port1 to DUT:Port1 by sending `gNOI.ClearBGPNeighborRequest_HARD` message to ATE:Port1. When this happens and the DUT receives BGP cease notification with subcode 9, the DUT is expected to FLUSH all IBGP learnt routes irrespective of the ERR configuration and therefore traffic between the flows will see 100% failure.
* Once the IBGP peering is reestablished, expected behavior is the same as RT-1.35.1
* Repeat the above process by sending gNOI.ClearBGPNeighborRequest_HARD to the ATE:Port2. Expected behavior here is the same as seen for the IBGP peering.

b. Expected behavior when Hard Notification received from the peer when ERR policy removed
* Start traffic as per the flows above.
* Follow the steps in RT-1.35.9.a above. The expected results in this case are the same as RT-1.35.9.a since a HARD RESET notification MUST result in disconnecting the TCP session and flushing all routes irrespective of the ERR configuration
* Once the BGP sessions are up between the DUT and the ATEs, the expected behavior must be the same as RT-1.35.1


* RT-1.35.9
* Expected behavior when routes have added communities part of the regular import and export policies apart from the ERR policy
