Add fixtures to call dpus on smartswitch directly #15695

Open
wants to merge 3 commits into
base: master
Choose a base branch
from


JibinBao (Contributor)

Description of PR

The management IP of the DPUs on a SmartSwitch is a private IP (169.254.200.0/24), so we cannot access a DPU directly from outside the switch. To operate a DPU the same way we operate a switch when implementing test cases (dash, and platform tests on the DPU), we add two fixtures: dpuhosts and fixture_dpuhosts. They work the same way as duthosts and fixture_duthosts.
Before calling the fixtures, complete the following steps:

  1. Enable the NAT configuration on the SmartSwitch by running sonic-dpu-mgmt-traffic.sh
    (see sonic-buildimage#20635, "[SmartSwitch] Added inbound traffic capability for DPU management traffic script")
    For example:

sonic-dpu-mgmt-traffic.sh inbound -e --dpus all --ports 5021,5022,5023,5024

  2. Add the DPU information to ansible/inventory (note: each ansible_ssh_port must match a port in the NAT config above)
    For example:

smartswitch-01 ansible_host=10.200.100.2 sonic_version=v2 sonic_hwsku=Mellanox-SN4280-O28 switch_type="switch"....
smartswitch-01-dpu-0 ansible_host=10.200.100.2 ansible_ssh_port=5021 sonic_version=v2 sonic_hwsku=Nvidia-bf3-com-dpu switch_type="dpu" ....
smartswitch-01-dpu-1 ansible_host=10.200.100.2 ansible_ssh_port=5022 sonic_version=v2 sonic_hwsku=Nvidia-bf3-com-dpu switch_type="dpu" ....
smartswitch-01-dpu-2 ansible_host=10.200.100.2 ansible_ssh_port=5023 sonic_version=v2 sonic_hwsku=Nvidia-bf3-com-dpu switch_type="dpu" ....
smartswitch-01-dpu-3 ansible_host=10.200.100.2 ansible_ssh_port=5024 sonic_version=v2 sonic_hwsku=Nvidia-bf3-com-dpu switch_type="dpu" ....

  3. Add the DPU information to ansible/testbed.yaml
    For example:

- conf-name: smartswitch-01-t1-28-lag
  group-name: vm-t2
  topo: t1-28-lag
  ptf_image_name: docker-ptf-mlnx
  ptf: ptf-smartswitch-01
  ptf_ip: 10.200.100.12/24
  ptf_ipv6:
  server: server_72
  vm_base: VM3701
  dut:
    - smartswitch-01
    - smartswitch-01-dpu-0
    - smartswitch-01-dpu-1
    - smartswitch-01-dpu-2
    - smartswitch-01-dpu-3
  comment: smartswitch testbed

  4. Add the DPU information to ansible/lab
    For example:

smartswitch-01-dpu-0 ansible_host=10.200.100.2 ansible_ssh_port=5021 ansible_hostv6="fe80::966d:aeff:fe04:1f58" sonic_version=v2 hwsku="Nvidia-bf3-com-dpu" iface_speed=100000 mgmt_subnet_mask_length=22 vm_base=VM0000
smartswitch-01-dpu-1 ansible_host=10.200.100.2 ansible_ssh_port=5022 ansible_hostv6="fe80::966d:aeff:fe04:1f58" sonic_version=v2 hwsku="Nvidia-bf3-com-dpu" iface_speed=100000 mgmt_subnet_mask_length=22 vm_base=VM0000
smartswitch-01-dpu-2 ansible_host=10.200.100.2 ansible_ssh_port=5023 ansible_hostv6="fe80::966d:aeff:fe04:1f58" sonic_version=v2 hwsku="Nvidia-bf3-com-dpu" iface_speed=100000 mgmt_subnet_mask_length=22 vm_base=VM0000
smartswitch-01-dpu-3 ansible_host=10.200.100.2 ansible_ssh_port=5024 ansible_hostv6="fe80::966d:aeff:fe04:1f58" sonic_version=v2 hwsku="Nvidia-bf3-com-dpu" iface_speed=100000 mgmt_subnet_mask_length=22 vm_base=VM0000
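
Since the DPU entries differ only in their index and SSH port, they can be generated instead of typed by hand. The following is a hypothetical Python sketch (dpu_inventory_lines is an illustrative name, not part of sonic-mgmt) for the ansible/inventory lines. It emits only the fields shown above, so the variables elided with "...." still have to be added. It also assumes that DPU i uses port 5021 + i, matching the --ports order given to sonic-dpu-mgmt-traffic.sh in step 1.

```python
from typing import List


def dpu_inventory_lines(switch: str, mgmt_ip: str, hwsku: str,
                        num_dpus: int, base_port: int = 5021) -> List[str]:
    """Return one ansible/inventory line per DPU of a SmartSwitch (hypothetical helper).

    Assumption: DPU i is reached through the switch mgmt IP on base_port + i,
    which must match the NAT ports configured with sonic-dpu-mgmt-traffic.sh.
    """
    lines = []
    for index in range(num_dpus):
        lines.append(
            f"{switch}-dpu-{index} ansible_host={mgmt_ip} "
            f"ansible_ssh_port={base_port + index} sonic_version=v2 "
            f"sonic_hwsku={hwsku} switch_type=\"dpu\""
        )
    return lines
```

For example, dpu_inventory_lines("smartswitch-01", "10.200.100.2", "Nvidia-bf3-com-dpu", 4) yields the four DPU lines from step 2 without their elided trailing variables.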

With this in place, each DPU can be reached directly from outside the switch, e.g. "ssh <user>@10.200.100.2 -p 5021" for dpu-0.
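
To confirm that the NAT rules forward each port before starting a test session, a plain TCP connect to every forwarded port is a quick check. The following is a minimal sketch using only the Python standard library. probe_ssh_ports is a hypothetical helper. A successful connect only shows that something accepted TCP on that port, not that SSH login works.

```python
import socket
from typing import Dict


def probe_ssh_ports(host: str, ports: Dict[str, int], timeout: float = 3.0) -> Dict[str, bool]:
    """Try a TCP connect to each forwarded port (hypothetical helper).

    Returns {dpu_name: True} when the connection was accepted and
    {dpu_name: False} on refusal, timeout, or any other socket error.
    """
    results = {}
    for name, port in ports.items():
        try:
            # The connection is closed again immediately; only reachability is checked.
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:
            results[name] = False
    return results
```

For example: probe_ssh_ports("10.200.100.2", {"smartswitch-01-dpu-0": 5021, "smartswitch-01-dpu-1": 5022}).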

Usage for running tests:
When running a case only on a DPU (existing tests such as platform and techsupport can run without any test change):

python3 -m pytest platform_tests/cli/test_show_platform.py --testbed smartswitch-01-t1-28-lag --host-pattern smartswitch-01-dpu-0 ....

When running a case on both the SmartSwitch (NPU) and its DPUs:

python3 -m pytest dash/test_dash_privatelink.py --testbed smartswitch-01-t1-28-lag --host-pattern smartswitch-01 --dpu-pattern smartswitch-01-dpu-0,smartswitch-01-dpu-1,smartswitch-01-dpu-2,smartswitch-01-dpu-3 .....

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

Add fixtures to operate DPUs directly from outside the switch, in the same way as operating a switch.

How did you do it?

Add the fixtures dpuhosts and fixture_dpuhosts.

How did you verify/test it?

Ran the platform cases on a DPU only, and the dash cases on the SmartSwitch.

Any platform specific information?

SmartSwitch

Supported testbed topology if it's a new test case?

Documentation

JibinBao changed the title "Add two fixtures dpuhosts and fixture_dpuhosts for calling dpu on sma…" to "Add fixtures to call dpus on smartswtich directly" on Nov 22, 2024
@param tbinfo: fixture provides information about testbed.
"""
try:
host = DutHosts(ansible_adhoc, tbinfo, get_specified_dpus(request),

theasianpianist (Contributor), Nov 26, 2024:

Would we be able to run sonic-dpu-mgmt-traffic.sh on the NPU automatically here? It should be possible to get the Ansible SSH ports using ansible_adhoc().options['inventory_manager'].get_host(<dpu hostname>).vars, which would let us avoid a manual step.

If we do this, I think it also makes sense to change the fixture scope to 'module' in case any test module causes a device reload; that way we can ensure the DPU mgmt traffic rules are always applied.
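
A rough sketch of the suggestion: build the sonic-dpu-mgmt-traffic.sh command line (in the same form as the example in the PR description) from per-DPU inventory variables. This is a hypothetical helper, not code from this PR. It assumes that the variables arrive as a plain dict per DPU host (for instance taken from the get_host(...).vars call quoted above), that DPU host names end in "-dpu-<index>", and that the inventory lists every DPU, because the command uses --dpus all with ports in DPU index order.

```python
import re
from typing import List, Mapping


def dpu_index(hostname: str) -> int:
    """Extract N from a name like 'smartswitch-01-dpu-N' (naming is an assumption)."""
    match = re.search(r"-dpu-(\d+)$", hostname)
    if match is None:
        raise ValueError(f"not a DPU host name: {hostname}")
    return int(match.group(1))


def build_nat_command(dpu_vars: Mapping[str, Mapping[str, object]]) -> str:
    """Build the NAT setup command from {dpu_hostname: inventory_vars} (hypothetical).

    Ports are listed in DPU index order so that port i maps to DPU i.
    """
    ordered = sorted(dpu_vars, key=dpu_index)
    ports: List[str] = [str(dpu_vars[name]["ansible_ssh_port"]) for name in ordered]
    return "sonic-dpu-mgmt-traffic.sh inbound -e --dpus all --ports " + ",".join(ports)
```

A fixture would then run the returned command on the NPU before creating the DPU hosts. As noted in the thread, it would have to run again after any reload or reboot.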

Contributor Author:

Thank you for your quick response.
Do you mean that sonic-dpu-mgmt-traffic.sh should run automatically when the SmartSwitch is up?
If sonic-dpu-mgmt-traffic.sh can persist the config across config reload and reboot, I think we can keep the fixture scope as session.

Contributor Author:

But I have a question: how would sonic-dpu-mgmt-traffic.sh get the SSH port information?
ansible_adhoc().options['inventory_manager'].get_host().vars depends on the inventory config, right?

Contributor:

I meant to call sonic-dpu-mgmt-traffic.sh at the start of every test session from inside of this fixture. If I understand the current implementation correctly, the user needs to manually call sonic-dpu-mgmt-traffic.sh on the NPU before starting the test session.

Also, I'm not sure whether the configs changed by sonic-dpu-mgmt-traffic.sh persist across a reload or reboot; it looks like the script mostly modifies iptables rules.

Contributor Author:

If so, dpuhosts will depend on duthosts, and we need to make sure duthosts is called before dpuhosts.
Additionally, sonic-dpu-mgmt-traffic.sh currently does not persist the config after a reload or reboot. So when a test does a reload or reboot, we need to re-run the script, which makes the test implementation more complex.
For now, to persist the NAT config, our regression runs the following commands on the NPU while deploying the image. Another option would be to add a pre-test that configures the NAT. What do you think?

# Enable IPv4 forwarding and persist it in /etc/sysctl.conf
sudo sed -i 's/#net.ipv4.ip_forward=1/net.ipv4.ip_forward=1/g' /etc/sysctl.conf
echo net.ipv4.conf.eth0.forwarding=1 | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
# Masquerade traffic from the DPU management subnet leaving through eth0
sudo iptables -t nat -A POSTROUTING -s 169.254.200.0/24 -o eth0 -j MASQUERADE
# Forward switch ports 5021-5024 to SSH (port 22) on 169.254.200.1-169.254.200.4
sudo iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 5021 -j DNAT --to-destination 169.254.200.1:22
sudo iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 5022 -j DNAT --to-destination 169.254.200.2:22
sudo iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 5023 -j DNAT --to-destination 169.254.200.3:22
sudo iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 5024 -j DNAT --to-destination 169.254.200.4:22
# Show the NAT table, then save the rules (the redirect must run as root)
sudo iptables -t nat -L
sudo sh -c 'iptables-save > /etc/iptables/rules.v4'
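
If the NAT setup moves into a pre-test or fixture as suggested above, the DNAT rules could be generated rather than hard-coded. The following is a hypothetical Python sketch (dpu_nat_rules is an illustrative name, not part of sonic-mgmt). It assumes, as in the four rules above, that DPU i is reachable at 169.254.200.(i + 1) and is exposed on switch port 5021 + i through eth0.

```python
from typing import List


def dpu_nat_rules(num_dpus: int, base_port: int = 5021,
                  subnet_prefix: str = "169.254.200", iface: str = "eth0") -> List[str]:
    """Return iptables commands mirroring the manual NAT setup (hypothetical helper)."""
    # Masquerade everything coming from the DPU management subnet.
    rules = [
        f"iptables -t nat -A POSTROUTING -s {subnet_prefix}.0/24 -o {iface} -j MASQUERADE"
    ]
    for index in range(num_dpus):
        # Assumption: DPU index i listens for SSH on <subnet_prefix>.(i + 1):22.
        rules.append(
            f"iptables -t nat -A PREROUTING -i {iface} -p tcp "
            f"--dport {base_port + index} -j DNAT "
            f"--to-destination {subnet_prefix}.{index + 1}:22"
        )
    return rules
```

Each string would still need to run with root privileges on the NPU. Because -A appends, re-running the same rules creates duplicates; a fixture could first test each rule with iptables -C (the same arguments with -C in place of -A) and only append it when the check fails.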

Contributor:

It sounds like a good idea, or possibly even doing it in a session-scoped fixture, but it might be better to address this in a separate PR. We can leave this as-is for now, could you add a short comment here about needing to call sonic-dpu-mgmt-traffic.sh before using the fixture?

Contributor Author:

Done.

############################
# SmartSwitch options #
############################
parser.addoption("--dpu-pattern", action="store", default=None, help="dpu host name")

Contributor:

Is there a scenario where we wouldn't want access to all DPUs on a device? It may be simpler to just grab all DPUs from the testbed file instead of specifying them manually.

Contributor Author:

The usage is the same as for duthost (switch): if the value is all, it grabs all DPUs.
For some common tests shared with regular switches, we don't need any DPUs.

theasianpianist (Contributor), Dec 9, 2024:

Can we remove this CLI parameter and just always assume we want to grab all available DPUs? I don't see the benefit in only getting a subset of the available DPUs

Contributor Author:

Hi @theasianpianist,
Being able to specify a single DPU is convenient when we only want to test that DPU, so it is better to keep the option.

Contributor:

In that case, what if we set the default value to "all" for this parameter? That way we don't need to change how the tests are called.

Contributor Author:

Good idea. Done
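
With "all" as the default, resolving the option into a list of DPU host names might look like the sketch below. select_dpus is a hypothetical helper, not the PR's get_specified_dpus (whose body is not shown here). It assumes that DPU entries in the testbed dut list can be recognized by a "-dpu-" infix, as in the examples in the description. An empty value selects no DPUs, matching the earlier point that common switch-only tests do not need any.

```python
from typing import List, Optional


def select_dpus(dpu_pattern: Optional[str], testbed_duts: List[str]) -> List[str]:
    """Resolve a --dpu-pattern value against a testbed's dut list (hypothetical helper).

    - "all" (the agreed default) selects every DPU in the testbed.
    - None or an empty string selects no DPUs.
    - Anything else is a comma-separated list of DPU host names.
    """
    # Assumption: DPU entries are recognizable by a "-dpu-" infix in the name.
    dpus = [dut for dut in testbed_duts if "-dpu-" in dut]
    if dpu_pattern is None or not dpu_pattern.strip():
        return []
    if dpu_pattern.strip() == "all":
        return dpus
    requested = [name.strip() for name in dpu_pattern.split(",") if name.strip()]
    unknown = [name for name in requested if name not in dpus]
    if unknown:
        raise ValueError("DPUs not in testbed: " + ", ".join(unknown))
    return requested
```

Rejecting unknown names early makes a typo in --dpu-pattern fail loudly instead of silently running against fewer DPUs.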

prabhataravind (Contributor) commented Nov 27, 2024:

@nissampa @rameshraghupathy for viz

JibinBao (Contributor Author) commented Dec 3, 2024:

Hi @theasianpianist, can you please review my reply?

tests/conftest.py (outdated, resolved)

mssonicbld (Collaborator): /azp run

Azure Pipelines successfully started running 1 pipeline(s).

JibinBao force-pushed the infra_ss_operate_dpu branch from 9606cbb to 92e1a77 on December 13, 2024 09:25

mssonicbld (Collaborator): /azp run

Azure Pipelines successfully started running 1 pipeline(s).

1. Remove DUTs whose names contain "dpu"; otherwise adding the topo would also try to add a topo for the DPUs, which is not needed.
2. Remove spaces from duts; otherwise creating the file with echo fails.

mssonicbld (Collaborator): /azp run

Azure Pipelines successfully started running 1 pipeline(s).

JibinBao (Contributor Author):

The commit updating testbed-cli.sh resolves the following issues:

  1. Remove the DPU DUTs, because they affect adding the topo for the NPU.
  2. Remove spaces from duts, because when the dut string includes spaces, creating a file with echo fails.
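
The two testbed-cli.sh fixes amount to a simple filter over the dut list. The Python below only illustrates that logic; the actual change lives in the shell script. npu_duts is a hypothetical name, and the comma-separated input format is an assumption.

```python
from typing import List


def npu_duts(duts: str) -> str:
    """Drop DPU entries from a comma-separated dut list and strip spaces (hypothetical).

    Mirrors the two testbed-cli.sh fixes: entries containing "dpu" are removed,
    and the result is joined without spaces so it can be echoed into a file.
    """
    names: List[str] = [name.strip() for name in duts.split(",")]
    kept = [name for name in names if name and "dpu" not in name]
    return ",".join(kept)
```

Matching on the substring "dpu" is simple, but it would also drop an NPU whose host name happens to contain "dpu". A stricter check, such as the "-dpu-<index>" suffix used in the inventory examples, would avoid that.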

theasianpianist (Contributor) left a comment:

This will be great to have, thanks!


6 participants