Running the suggested "script lab_setup_environment.sh" breaks the previously working Horizon GUI Browser to address 192.168.100.10 #39

Open
ColumGaynor opened this issue Sep 23, 2019 · 13 comments

@ColumGaynor

After vagrant up I have a working system, and a browser session from the host computer to the Horizon container in the controller-01 VM works using the address 192.168.100.10
(br-flat has eth3 enslaved).

After running the script "lab_setup_environment.sh", Horizon no longer works from the host machine. Several new Linux bridges are introduced, and on controller-01 the bridge br-flat no longer has eth3 enslaved. The Horizon GUI fails...
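
One way to confirm the enslavement described above is to read the interface's master link from sysfs. The following is only a sketch, not part of the lab scripts: the function name bridge_of and the SYSFS_NET override are hypothetical, and it assumes a Linux guest where /sys/class/net/<iface>/master is a symlink to the owning bridge.

```shell
# Print the bridge an interface is enslaved to, or "none".
# SYSFS_NET is a hypothetical override so the function can be tried
# against a fake directory tree instead of the live /sys.
bridge_of() {
    iface="$1"
    master="${SYSFS_NET:-/sys/class/net}/$iface/master"
    if [ -L "$master" ]; then
        # The symlink target looks like ../br-flat; keep only the name.
        basename "$(readlink "$master")"
    else
        echo "none"
    fi
}
```

Running bridge_of eth3 on controller-01 before and after the script should show whether eth3 has left br-flat and, if so, which bridge it has joined.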

Is this intended, and if not, any idea how to fix it? I can provide more information from the environment if you let me know what is needed.
-Colum

uksysadmin self-assigned this Sep 24, 2019
@uksysadmin
Contributor

Thanks for raising this - I'll run through things as soon as possible and report back.

@uksysadmin
Contributor

Weird. The script does nothing unusual - just executes openstack commands as user.
However I can confirm the problem and where it is.

The script first creates a flat network to use as the external shared public network.
Running openstack network create, and openstack subnet create on this flat network maintains the bridge.

# Create the external gateway network: 192.168.100.0/24 via 'flat' bridge (eth3 in the guest)

openstack network create --share --project admin --external --default --provider-network-type flat --provider-physical-network flat GATEWAY_NET
openstack subnet create --project admin --subnet-range 192.168.100.0/24 --dhcp --dns-nameserver 192.168.1.1 --gateway 192.168.1.1 --allocation-pool start=192.168.100.100,end=192.168.100.250 --network GATEWAY_NET GATEWAY_SUBNET

However, weirdly, running:

# Create a private tenant network (VXLAN)

openstack network create --project admin private-net

causes the bridge to be altered as described. This command shouldn't be touching the flat network, so there must be a misconfiguration somewhere.
Currently troubleshooting.

@ColumGaynor
Author

Thanks... I am on vacation from Sunday 29th Sep for two weeks, so if you would like anything
from my environment, please let me know tomorrow (Friday 27th) and I can upload it.
-Colum

@ColumGaynor
Author

ColumGaynor commented Dec 2, 2019

Hello again, I was wondering if you had any luck with troubleshooting this. We are resuming an OpenStack study group later this week and were trying to use the OpenStack Cookbook and its environment scripts.... Colum

@uksysadmin
Contributor

Unfortunately no - I got completely distracted away from this issue. I'll troubleshoot further tomorrow and report back. From the original troubleshooting it doesn't make sense. I've put time aside tomorrow to look into this. Apologies!

@ColumGaynor
Author

ColumGaynor commented Dec 4, 2019 via email

@uksysadmin
Contributor

I've been working on this. I can see exactly when it changes, but not yet why: it is (and this corrects my comment above) specifically after the subnet create on the flat external network:

openstack subnet create --project admin --subnet-range 192.168.100.0/24 --dhcp --dns-nameserver 192.168.1.1 --gateway 192.168.1.1 --allocation-pool start=192.168.100.100,end=192.168.100.250 --network GATEWAY_NET GATEWAY_SUBNET

It makes sense that this is the command causing it. However, I can't see why.

The interface drops out of the bridge. Access fails partly because the bridge is then wrong, but you actually lose access because of the routing:

Interface moves outside of the bridge and route gets added:

brq9b54aee1-a9  8000.080027ccd78e  no  eth3
                                       tapd325a094-91

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.2.2        0.0.0.0         UG    100    0        0 eth0
10.0.2.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
10.0.2.2        0.0.0.0         255.255.255.255 UH    100    0        0 eth0
10.0.3.0        0.0.0.0         255.255.255.0   U     0      0        0 lxcbr0
172.29.236.0    0.0.0.0         255.255.255.0   U     0      0        0 br-mgmt
172.29.240.0    0.0.0.0         255.255.255.0   U     0      0        0 br-vxlan
192.168.100.0   0.0.0.0         255.255.255.0   U     0      0        0 br-flat
192.168.100.0   0.0.0.0         255.255.255.0   U     0      0        0 eth3
192.168.122.0   0.0.0.0         255.255.255.0   U     0      0        0 virbr0
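
The telling detail in the table above is that 192.168.100.0 appears twice, once on br-flat and once on eth3. As a rough aid for spotting that kind of overlap, here is a sketch that reads route -n or netstat -rn style output on stdin. The function name duplicate_routes is made up for illustration, and it assumes the usual eight-column layout with Genmask in column 3 and Iface in column 8.

```shell
# Read `route -n` / `netstat -rn` style output on stdin and print every
# destination/genmask pair that is routed through more than one interface.
duplicate_routes() {
    awk '
        # Only lines whose first field is a dotted quad are routes;
        # this skips the two header lines.
        $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
            key = $1 "/" $3          # destination plus genmask
            iface = $8               # Iface column
            if (key in seen) {
                seen[key] = seen[key] "," iface
                dup[key] = 1
            } else {
                seen[key] = iface
            }
        }
        END { for (k in dup) print k, seen[k] }
    '
}
```

Typical use would be netstat -rn | duplicate_routes. Fed the table above, it reports 192.168.100.0/255.255.255.0 with interfaces br-flat,eth3 and nothing else.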

I've also noticed that br-vlan, although we specify it and expect it to be created, isn't present. So essentially I've some network work to do. I suspect this is too late to fix for your study group - but I'm chipping away to find out why this is the case.

@ColumGaynor
Author

Hi There and thanks a lot for the excellent analysis. It's not too late and I can show the current issue as a way of understanding the networking better, so I definitely appreciate you 'chipping away' on this one. Today we resume the activity and we continue next year, so getting this sorted any time will help... I will have a deeper look in my own environment based on the tips above!
-Colum

@ColumGaynor
Author

I just noticed that the eth3 route which you had in your printout above:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.2.2        0.0.0.0         UG    100    0        0 eth0
10.0.2.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
10.0.2.2        0.0.0.0         255.255.255.255 UH    100    0        0 eth0
10.0.3.0        0.0.0.0         255.255.255.0   U     0      0        0 lxcbr0
172.29.236.0    0.0.0.0         255.255.255.0   U     0      0        0 br-mgmt
172.29.240.0    0.0.0.0         255.255.255.0   U     0      0        0 br-vxlan
192.168.100.0   0.0.0.0         255.255.255.0   U     0      0        0 br-flat
192.168.100.0   0.0.0.0         255.255.255.0   U     0      0        0 eth3     <<<--------------- !!!
192.168.122.0   0.0.0.0         255.255.255.0   U     0      0        0 virbr0

is missing when I look inside my controller-01 node:


vagrant@controller-01:~$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         10.0.2.2        0.0.0.0         UG        0 0          0 eth0
10.0.2.0        0.0.0.0         255.255.255.0   U         0 0          0 eth0
10.0.3.0        0.0.0.0         255.255.255.0   U         0 0          0 lxcbr0
172.29.236.0    0.0.0.0         255.255.255.0   U         0 0          0 br-mgmt
172.29.240.0    0.0.0.0         255.255.255.0   U         0 0          0 br-vxlan
192.168.56.0    0.0.0.0         255.255.255.0   U         0 0          0 eth1
192.168.100.0   0.0.0.0         255.255.255.0   U         0 0          0 br-flat
192.168.122.0   0.0.0.0         255.255.255.0   U         0 0          0 virbr0
vagrant@controller-01:~$ 

@ColumGaynor
Author

Also, your table printout listed two routes for "192.168.100.0", one on device br-flat and one on eth3.
That looks odd?

@uksysadmin
Contributor

It occurs when you run this specific command, which puts the subnet onto the flat network:

openstack subnet create --project admin --subnet-range 192.168.100.0/24 --dhcp --dns-nameserver 192.168.1.1 --gateway 192.168.1.1 --allocation-pool start=192.168.100.100,end=192.168.100.250 --network GATEWAY_NET GATEWAY_SUBNET

where GATEWAY_NET is the flat network created in the previous step.

So is it missing when you first boot or is it missing after you run that command?
This is the problem I'm trying to solve - this moving of eth3 out of br-flat is why things aren't working.
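
To answer the before-or-after question reliably, it may help to capture the network state on either side of the command and diff the two. The following is a minimal sketch under some assumptions: snapshot_net and the SNAP_DIR variable are hypothetical names, and it relies on the iproute2 commands ip route show and ip -o link show (the latter prints "master <bridge>" for enslaved interfaces).

```shell
# Save the routing table and per-interface link info (including any
# "master <bridge>" field) to a file named after the given label,
# then print the path of that file.
snapshot_net() {
    label="$1"
    out="${SNAP_DIR:-/tmp}/net-$label.txt"
    {
        echo "## routes"
        ip route show
        echo "## links"
        ip -o link show
    } > "$out"
    echo "$out"
}
```

A possible sequence on controller-01: run snapshot_net before, run the subnet create command (or the whole lab_setup_environment.sh), run snapshot_net after, then diff /tmp/net-before.txt /tmp/net-after.txt. Any new route on eth3 or change of master should show up in the diff.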

@ColumGaynor
Author

Hard to say! Unfortunately I did not check the routing table before I ran the command. I only noticed this after running the command (in the script) and reading your first answer. I tried a bit of experimentation but messed up the environment badly! I tried to run vagrant destroy followed by vagrant up, but a new issue came up: the ansible run is now failing. I scrapped the whole environment and cloned everything fresh from git. The three machines are brought up and run fine in VirtualBox, but the ansible run is now failing 100% of the time. I captured a screenshot with colors preserved to a LibreOffice ODT file (+pdf), in case you would be interested in guiding me on why ansible is now failing. So close, and yet I still cannot take advantage of your fine work.

vagrant-openstackcookbook_latestrun-ansible_failure.pdf

@ColumGaynor
Author

Solved the ansible run failure, Kevin, and now I can start investigating the network issue again from scratch. More on that later.

To fix the Ansible failures, I found that a small change to the file
/roles/prereqs/tasks/main.yml is needed, as shown below:

main.yml


- name: Update Apt Cache
  apt:
    update_cache: yes
    cache_valid_time: 3600

- name: Install Pre-Req Packages
  apt:
    name: "{{ item }}"
    state: installed    # <<< change "installed" to "present" HERE
    cache_valid_time: 3600
    install_recommends: yes
    force: yes
  with_items:
    - bridge-utils
    - debootstrap
    - ifenslave
    - ifenslave-2.6
    - lsof
    - lvm2
    - tcpdump
    - vlan
    - aptitude
    - build-essential
    - git
    - ntp
    - ntpdate
    - python-dev
    - libyaml-dev
    - libpython2.7-dev
    - libffi-dev
    - libssl-dev
    - python-crypto
    - python-yaml
    - python-pip
    - lxc1
    - libvirt-bin
    - ifupdown
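
The same one-word change can be applied from the command line. This is a sketch only: fix_apt_state is a hypothetical helper name, and the explanation that newer Ansible releases no longer accept the "installed" alias for the apt module is my assumption about why the run started failing, not something confirmed in this thread.

```shell
# Rewrite "state: installed" to "state: present" in one tasks file.
# sed keeps a copy of the original next to it with a .bak suffix.
fix_apt_state() {
    file="$1"
    sed -i.bak \
        's/^\([[:space:]]*state:[[:space:]]*\)installed[[:space:]]*$/\1present/' \
        "$file"
}
```

From the repository root this would be invoked as fix_apt_state roles/prereqs/tasks/main.yml. Comparing the file against main.yml.bak afterwards shows exactly which lines changed.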
