VLAN 42 bridging on Ubiquiti 60 GHz devices #299

Closed
pktpls opened this issue Aug 3, 2022 · 20 comments

@pktpls
Contributor

pktpls commented Aug 3, 2022

In the past few months we deployed a number of Ubiquiti 60 GHz devices.
These are mostly AirFiber 60-LR point-to-point devices,
and recently also one pair of the new Wave AP and Wave Nano.

The management VLAN 42 is supposed to not cross site boundaries.
In previous versions of AirOS, this was achieved by having the management VLAN
only on eth0/lan0. However AirOS on these 60 GHz devices doesn't currently
have this option. Instead it creates a bridge over wifi0 and eth0 and puts
VLAN 42 on this bridge.

The result is one big L2 management network between all sites with Ubiquiti 60 GHz.
See for yourself, for example: ssh [email protected] ping ff02::1%switch0.42

Disabling the management VLAN on the device doesn't help,
because then it still bridges eth0 and wifi0.

This large L2 network might also be the cause of various looping and reflection issues
which we've been experiencing in recent weeks.

Reports about this:

Which sites are affected?

According to the various snmp.yml definitions, the following 60 GHz links exist:

  • RHNK <-> (Philmel, RHXB, Kiehlufer, Zwingli)
  • RHXB <-> (L105, Flughafen, RHNK, DTMB)
  • Vaterhaus <-> (Zwingli)
  • Sama <-> (Saarbrücker, Zwingli)
  • Flughafen <-> (AK36, RHXB)
  • Philmel <-> (RHNK)
  • Zwingli <-> (Sama, RHNK, Agym, Vaterhaus)
  • Agym <-> (Zwingli)

What can we do?

This is a possible plan to mitigate the problem:

  • Make sure no VLAN 42 traffic enters or exits the 60 GHz devices.
    At some sites this can be handled on the managed switch,
    at other sites it needs to be handled on the corerouter.
  • Disable management VLAN on the 60 GHz devices,
    so that SSH and the Web UI can be accessed by other means,
    e.g. via link-local tunneling:
    ssh -L 8080:[fe80::265a:4cff:fe2f:f7e0%switch0.24]:443 [email protected]
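The link-local address in the tunnel command above has the shape of a standard EUI-64 address. As a rough sketch, assuming the device derives its link-local address from its MAC address in the usual way (the MAC used in the usage note is inferred from the address above, not read from a device), the address for another antenna could be computed like this:

```python
import ipaddress


def mac_to_link_local(mac: str) -> str:
    """Derive the EUI-64 based IPv6 link-local address for a MAC address."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe in the middle to form the 64-bit interface identifier.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    prefix = bytes.fromhex("fe80") + bytes(6)  # fe80::/64
    return str(ipaddress.IPv6Address(prefix + interface_id))
```

For example, `mac_to_link_local("24:5a:4c:2f:f7:e0")` yields `fe80::265a:4cff:fe2f:f7e0`, which still needs the `%switch0.24` zone suffix when used on the corerouter.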

Additionally we should:

  • Detect unexpected VLAN 42 traffic at sites.
  • Pressure Ubiquiti to bring back the old bridging configuration options.
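
Detecting unexpected VLAN 42 traffic comes down to reading the 802.1Q tag of frames seen on the 60 GHz ports. A minimal sketch, assuming raw Ethernet frames are already available as bytes (how they are captured is out of scope here, and `is_unexpected` and its default are hypothetical names and values):

```python
import struct
from typing import Iterable, Optional

ETHERTYPE_8021Q = 0x8100  # TPID of a single 802.1Q tag


def vlan_id(frame: bytes) -> Optional[int]:
    """Return the 802.1Q VLAN ID of an Ethernet frame, or None if untagged."""
    if len(frame) < 16:
        return None
    # Bytes 12-13 hold the TPID, bytes 14-15 the tag control information.
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != ETHERTYPE_8021Q:
        return None
    return tci & 0x0FFF  # the lower 12 bits of the TCI are the VLAN ID


def is_unexpected(frame: bytes, forbidden_vids: Iterable[int] = (42,)) -> bool:
    """True if the frame carries a VLAN ID that should not appear on this port."""
    return vlan_id(frame) in set(forbidden_vids)
```

Untagged frames are never flagged, since `vlan_id` returns `None` for them.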
@spolack
Member

spolack commented Aug 3, 2022

thanks for the writeup. Why not use unique MGMT VLAN IDs?

@PolynomialDivision
Contributor

Keep in mind that we want to query statistics from the devices. So we need proper addressing. :)

> At some sites this can be handled on the managed switch

Can you give an example how that should work?

@pktpls
Contributor Author

pktpls commented Aug 9, 2022

> Why not use unique MGMT VLAN IDs?

That's an option too, but I'm assuming that Site A's mgmt packets would still be bridged over to Site B, where they'd be dropped. This would successfully break the inter-site mgmt network, but it'd waste a bit of airtime. Ideally A's mgmt packets would never leave A.

Is that correct?

> Can you give an example how that should work?

The switch would exclude the 60 GHz ports from VLAN 42. Bigger switches like the one at RHNK can do this, but I'm not sure yet how smaller sites with dumb switches (or with only a corerouter) would do it.

@pktpls
Contributor Author

pktpls commented Aug 9, 2022

> Keep in mind that we want to query statistics from the devices. So we need proper addressing. :)

The AirOS web UI stuff does work over link-local addresses, but we'd need to check that the metrics tools can deal with that.

@Koltonowski
Contributor

Smaller switches like EdgeSwitch10XP can do this too: https://help.ui.com/hc/en-us/articles/360039311974-EdgeSwitch-EdgeSwitch-X-Port-Isolation. However, the router could then become a bottleneck if it is not potent enough.

@spolack
Member

spolack commented Sep 7, 2022

I would recommend against using features like "port based acl" aka port isolation in standard setups. IMO this increases complexity which is simply not necessary.

> That's an option too, but I'm assuming that Site A's mgmt packets would still be bridged over to Site B, where they'd be dropped. This would successfully break the inter-site mgmt network, but it'd waste a bit of airtime. Ideally A's mgmt packets would never leave A.
> Is that correct?

Well, that applies to broadcast frames, but in my opinion the wasted airtime is negligible. At my wilgu10 it's on average fewer than 10-20 frames per minute.

root@wilgu10-core:~# tcpdump -eni switch0.42 'broadcast or multicast'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on switch0.42, link-type EN10MB (Ethernet), capture size 262144 bytes

@pktpls
Contributor Author

pktpls commented Sep 16, 2022

Let's do the unique-VLAN-ID-per-location thing. Count up from 420?

  • 420 RHNK (originally 421)
  • 422 RHXB
  • 423 DTMB
  • 424 Philmel
  • 425 Zwingli
  • 426 Sama
  • 427 AK36
  • 428 Flughafen
  • 429 L105
  • 430 Saarbrücker
  • 431 Agym
  • 432 Kiehlufer

> I would recommend against using features like "port based acl" aka port isolation in standard setups. IMO this increases complexity which is simply not necessary.

You're right, I've thought about how this would be maintained and it seems very complex and fragile. Suddenly it would matter which port an antenna is plugged into, etc. Very easy to break the setup if one isn't super careful.

@Koltonowski
Contributor

> Let's do the unique-VLAN-ID-per-location thing. Count up from 420?
>
>   • 421 RHNK
>   • 422 RHXB
>   • 423 DTMB
>   • 424 Philmel
>   • 425 Zwingli
>   • 426 Sama
>   • 427 AK36
>   • 428 Flughafen
>   • 429 L105
>   • 430 Saarbrücker
>   • 431 Agym
>   • 432 Kiehlufer
>
> > I would recommend against using features like "port based acl" aka port isolation in standard setups. IMO this increases complexity which is simply not necessary.
>
> You're right, I've thought about how this would be maintained and it seems very complex and fragile. Suddenly it would matter which port an antenna is plugged into, etc. Very easy to break the setup if one isn't super careful.

We need a way to keep an overview of the VLANs in use. If we build new sites and have to go through the locations in alphabetical order each time and look at the respective network config, then we have more effort than necessary. Can anyone think of a solution for that, which also covers the case where a location goes missing or is dismantled? Otherwise, I can live with the solution and this problem will be solved for the next 50 years.

@spolack
Member

spolack commented Oct 2, 2022

For now we can maintain a list somewhere, IMO. Once we have migrated to some IPAM like NetBox, we can write it down there.
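
Such a list could be as simple as a mapping from VLAN ID to site. The sketch below uses the allocations proposed earlier in this thread (with RHNK on 420); the helper name `next_free_vid` and the choice to hand out the lowest unused ID, rather than always counting past the highest one, are assumptions for illustration:

```python
from typing import Mapping

# Allocations as proposed in this thread (RHNK ended up on 420).
MGMT_VIDS = {
    420: "RHNK", 422: "RHXB", 423: "DTMB", 424: "Philmel",
    425: "Zwingli", 426: "Sama", 427: "AK36", 428: "Flughafen",
    429: "L105", 430: "Saarbrücker", 431: "Agym", 432: "Kiehlufer",
}


def next_free_vid(allocated: Mapping[int, str], start: int = 420, end: int = 4094) -> int:
    """Return the lowest VLAN ID in [start, end] that is not yet allocated."""
    for vid in range(start, end + 1):
        if vid not in allocated:
            return vid
    raise RuntimeError("no free VLAN ID left in the configured range")
```

With the table above, `next_free_vid(MGMT_VIDS)` returns 421, because that ID was freed when RHNK moved to 420. Removing a dismantled site from the mapping makes its ID available again.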

@pktpls
Contributor Author

pktpls commented Nov 8, 2022

RHNK has been changed to VLAN 420: #369

@robertfoss
Contributor

robertfoss commented Dec 15, 2022

I have a suggestion. Could we generate the VLAN automatically in ansible? Something along the lines of using the location name to create a number.

The only downsides I see are:

  • Small chance of collisions
    • The collision would have to be between neighboring bbb nodes to matter
    • Even if two bbb nodes collide there wouldn't be a huge amount of broadcast spam between the two
  • Knowing the vlan of a router
    • Maybe a potential solution is having ansible output a list of all of the vlans along with the firmware binaries
    • Logging the VLAN of any image built
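
 - set_fact:
     # Derive a stable pseudo-random VLAN ID in [1000, 3968) from the location name.
     # "location_name" is a placeholder for whatever variable holds the site name.
     r: "{{ 3968 | random(start=1000, seed=location_name) }}"

 - debug:
     msg: "{{ r }}"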
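
The same idea can be tried outside of Ansible to see how often collisions would actually occur. This sketch swaps Ansible's seeded `random` filter for a SHA-256 hash of the location name, so the numbers it produces will not match what the playbook snippet generates; the function names are hypothetical:

```python
import hashlib
from typing import Dict, Iterable, List

VID_MIN, VID_MAX = 1000, 3967  # same span as range(1000, 3968)


def vid_for_location(name: str) -> int:
    """Map a location name to a stable VLAN ID within [VID_MIN, VID_MAX]."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return VID_MIN + int.from_bytes(digest[:8], "big") % (VID_MAX - VID_MIN + 1)


def find_collisions(names: Iterable[str]) -> Dict[int, List[str]]:
    """Return every VLAN ID that more than one of the given names maps to."""
    by_vid: Dict[int, List[str]] = {}
    for name in names:
        by_vid.setdefault(vid_for_location(name), []).append(name)
    return {vid: sites for vid, sites in by_vid.items() if len(sites) > 1}
```

Running `find_collisions` over all location names before building images would reveal any clash, and the same mapping could be written out next to the firmware binaries as suggested above.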

@pktpls
Contributor Author

pktpls commented Mar 8, 2023

To list all mgmt VIDs in bbb-configs: grep -C2 'role: mgmt' group_vars/*/* | grep vid
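
For use in a script or check, the grep pipeline can be approximated in Python. This is only a sketch: it assumes, like the grep command, that a `vid:` line sits within two lines of the matching `role: mgmt` line, and the function name is hypothetical:

```python
import re
from typing import List

VID_RE = re.compile(r"\bvid:\s*(\d+)")


def mgmt_vids(text: str, context: int = 2) -> List[int]:
    """Collect VLAN IDs found within `context` lines of a 'role: mgmt' line."""
    lines = text.splitlines()
    wanted = set()
    for i, line in enumerate(lines):
        if "role: mgmt" in line:
            # Same window as grep -C2: the match plus the lines around it.
            wanted.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    vids = []
    for i in sorted(wanted):
        match = VID_RE.search(lines[i])
        if match:
            vids.append(int(match.group(1)))
    return vids
```

Like the grep version, the context window can pick up the `vid:` of a neighbouring network entry if entries are very short, so the result is worth a quick sanity check.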

@pktpls
Contributor Author

pktpls commented Jun 21, 2023

On RHNK we ended up doing two things:

  • VLAN port isolation on the Mikrotik switch, which hard-binds each port to only certain VLANs. Obviously VLAN 42 isn't the only one that gets transparently bridged - all VLANs are treated this way. This was the reason for a brief period of OLSR peerings over multiple wifi hops, e.g. from Kiehlufer to RHXB.
  • Rename VLAN 42 to a different individual per-site number. We still want the 60g device's mgmt UI available, which means the mgmt VLAN can't be stripped away. Make sure that both ends of the link use different mgmt VLAN IDs.

Step one above means we need switches capable of VLAN port isolation wherever we have 60g Ubiquiti devices, at least until Ubiquiti reintroduces adequate VLAN settings like those in older AirOS versions, but that's probably never going to happen...
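
The rule that both ends of a link must use different mgmt VLAN IDs can be checked mechanically. A small sketch, using the 60 GHz links and the VLAN IDs listed earlier in this thread as example data (the real source of truth would be the site configs, and `check_links` is a hypothetical helper):

```python
from typing import Dict, Iterable, List, Tuple

LINKS = [
    ("RHNK", "Philmel"), ("RHNK", "RHXB"), ("RHNK", "Kiehlufer"),
    ("RHNK", "Zwingli"), ("RHXB", "L105"), ("RHXB", "Flughafen"),
    ("RHXB", "DTMB"), ("Zwingli", "Vaterhaus"), ("Zwingli", "Sama"),
    ("Zwingli", "Agym"), ("Sama", "Saarbrücker"), ("Flughafen", "AK36"),
]

MGMT_VID = {
    "RHNK": 420, "RHXB": 422, "DTMB": 423, "Philmel": 424,
    "Zwingli": 425, "Sama": 426, "AK36": 427, "Flughafen": 428,
    "L105": 429, "Saarbrücker": 430, "Agym": 431, "Kiehlufer": 432,
}


def check_links(links: Iterable[Tuple[str, str]], vids: Dict[str, int]) -> List[str]:
    """Report links whose ends share a mgmt VLAN ID or have none assigned."""
    problems = []
    for a, b in links:
        vid_a, vid_b = vids.get(a), vids.get(b)
        if vid_a is None or vid_b is None:
            problems.append(f"{a} <-> {b}: missing mgmt VLAN ID")
        elif vid_a == vid_b:
            problems.append(f"{a} <-> {b}: both ends use VLAN {vid_a}")
    return problems
```

With this example data the only finding is the Zwingli to Vaterhaus link, because Vaterhaus has no ID in the list above.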

@robertfoss
Contributor

robertfoss commented Jul 23, 2023

433 gub37

@Noki
Member

Noki commented Sep 21, 2023

434 w38b

@FFHener
Contributor

FFHener commented Feb 13, 2024

435 ilr

@PolynomialDivision
Contributor

436 scharni

@Noki
Member

Noki commented Jul 19, 2024

437 teufelsberg

@FFHener
Contributor

FFHener commented Jul 26, 2024

438 philmel

@Noki Noki pinned this issue Sep 25, 2024
@Noki Noki unpinned this issue Sep 25, 2024
@Noki
Member

Noki commented Sep 25, 2024

I consolidated this into a new issue that has a table so we won't have unused VLANs in the future and could keep track of the state of migration: #983

I also pinned the new issue and will therefore close this one.

@Noki Noki closed this as completed Sep 25, 2024