Cannot add host to Project: Error 500 #2413

Open
1 task done
dnoliver opened this issue Apr 1, 2019 · 16 comments

Comments

@dnoliver

dnoliver commented Apr 1, 2019

Summary

After following the documentation in https://vmware.github.io/vic-product/assets/files/html/1.5, I cannot add a host to a project.

Admiral cannot communicate with the VCH instance
VCH instance logs show errors while trying to stat datastore

Environment information

vSphere 6.7
Single ESXi host 6.7
vCenter Server appliance with embedded Platform controller 6.7
VIC 1.5
VCH deployed with UI Wizard
one single datastore
a bridge network created with virtual switch
default VM Network as public network

vSphere and vCenter Server version

vSphere and vCenter 6.7 update 1

VIC Appliance version

vic-v1.5.2-7206-92ebfaf5

Configuration
  • Embedded or external PSC: Embedded
  • How was the OVA deployed? (Flex client, HTML5 client, ovftool): HTML5
  • Does the VIC appliance receive configuration by DHCP? YES
  • What stage of the Appliance Lifecycle is the VIC appliance in? Running (I think)
  • IP address of VIC appliance:
  • Hostname of VIC appliance:
  • IP address of vCenter Server:
  • Hostname of vCenter Server:

Details

I was following the documentation step by step to deploy the first VCH host.
The VCH host deployed successfully.
vic-machine-linux ls shows the host.
All checks are green in the VCH admin portal.
I used the default-project in Admiral and tried to add the VCH host to it.
No TLS is being used. When I tried to add the host, I got:

Error connecting to http://192.168.0.110:2376: Unexpected error: Connection refused: /192.168.0.110:2376

I used http because the docs say to use http when TLS is not used. I tried several combinations, and none of them work.
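
A "Connection refused" like the one above usually means nothing is listening on that port yet, whatever protocol is chosen. A minimal sketch, using only the Python standard library, to check that from the machine running Admiral. The function name port_is_open is hypothetical and not part of any VIC tooling.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_is_open("192.168.0.110", 2376) with the address from the error message would tell you whether the Docker API endpoint is up at all, before trying http versus https in Admiral.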

Changed type from VCH to DOCKER, received error 500.

I inspected the logs in the VCH admin portal. There are several ERROR messages (but the UI has all green checks).

The Docker Personality log shows this several times:

Apr  1 2019 23:10:04.393Z FATAL failed to initialize backend: [POST /storage][500] CreateImageStore default  &{Code:500 Message:cannot stat '[datastore1] virtual-container-host/VIC/423bf6bd-c91b-3c79-a0ac-ae0b26077784/images': No such file}

The Port Layer log shows the same:

Apr  1 2019 23:10:04.392Z ERROR op=264.404: Error getting image store 423bf6bd-c91b-3c79-a0ac-ae0b26077784: cannot stat '[datastore1] virtual-container-host/VIC/423bf6bd-c91b-3c79-a0ac-ae0b26077784/images': No such file

No problems in Init log

The VIC Admin log shows this error several times:

Apr  1 2019 23:10:11.204Z ERROR Process docker-engine-server not running: open /.tether/run/docker-engine-server.pid: no such file or directory
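
The "cannot stat" messages above name the exact datastore folder that is missing. A small sketch for pulling those paths out of a saved log, assuming the message format stays exactly as shown above. The names STAT_ERROR and missing_datastore_paths are made up for this example.

```python
import re

# Matches: cannot stat '[<datastore>] <path>': No such file
STAT_ERROR = re.compile(
    r"cannot stat '\[(?P<datastore>[^\]]+)\] (?P<path>[^']+)': No such file"
)


def missing_datastore_paths(log_text):
    """Return unique (datastore, path) pairs from 'cannot stat' errors, in order."""
    found = []
    for match in STAT_ERROR.finditer(log_text):
        pair = (match.group("datastore"), match.group("path"))
        if pair not in found:
            found.append(pair)
    return found
```

Run against the Docker Personality and Port Layer lines above, this should report a single missing folder on datastore1, even though the error is logged many times.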

Steps to reproduce

Follow docs to deploy VIC, create VCH
Assign VCH to default-project in Admiral

Actual behavior

A "cannot establish connection" error is shown.

Expected behavior

VCH should be added to default-project

Support information

Logs

I am not comfortable posting logs publicly; a private channel is OK.

See also

Troubleshooting attempted

  • [ x] Searched GitHub for existing issues. (Mention any similar issues under "See also", above.)
  • [ x] Searched the documentation for relevant troubleshooting guidance.
  • Searched for a relevant VMware KB article.
@dnoliver
Author

dnoliver commented Apr 2, 2019

This looks similar to the error described in https://vmware.github.io/vic-product/assets/files/html/1.5/vic_vsphere_admin/ts_imagestore_error.html. But in this case the error message is different, and I have not assigned any container to run yet.

@dnoliver
Author

dnoliver commented Apr 2, 2019

I manually browsed to the datastore1 folder and created the images folder there. At that point, the Docker Personality log reported success:

Apr  2 2019 18:52:56.080Z FATAL failed to initialize backend: [POST /storage][500] CreateImageStore default  &{Code:500 Message:500 Internal Server Error}
time="2019-04-02T18:52:59Z" level=info msg="Launching docker personality pprof server on 127.0.0.1:6062" 
Apr  2 2019 18:52:59.356Z ERROR Unable to load CAs for registry access in config
Apr  2 2019 18:52:59.356Z INFO  Waiting for portlayer to come up
Apr  2 2019 18:53:01.358Z INFO  Portlayer is up and responding to pings
Apr  2 2019 18:53:01.358Z INFO  Refreshing repository cache
Apr  2 2019 18:53:01.360Z INFO  Image cache initialized successfully
Apr  2 2019 18:53:01.360Z INFO  Repository cache updated successfully
Apr  2 2019 18:53:01.360Z INFO  Layer cache initialized successfully
Apr  2 2019 18:53:01.361Z INFO  Container cache updated successfully
Apr  2 2019 18:53:01.361Z INFO  Creating image store
Apr  2 2019 18:53:01.362Z INFO  TLS enabled
Apr  2 2019 18:53:01.363Z INFO  Listener created for HTTP on 192.168.0.110//tcp
Apr  2 2019 18:53:01.379Z INFO  API listen on 192.168.0.110:2376

But then I restarted the host, and the images folder I had created was removed again. Apparently that folder is managed by the VCH host, so I will run into the same problem after every reboot.
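
Since the log above mixes an older FATAL line with a later successful start, it helps to check whether "API listen on" appears after the last FATAL. A rough sketch, assuming the log format shown above. The function name api_listen_address is hypothetical.

```python
import re


def api_listen_address(log_text):
    """Return the address of the last 'API listen on' line that comes after
    the last FATAL line, or None if the engine did not come up afterwards."""
    address = None
    for line in log_text.splitlines():
        if " FATAL " in line:
            # A later FATAL invalidates any earlier successful start.
            address = None
        match = re.search(r"API listen on (\S+)", line)
        if match:
            address = match.group(1)
    return address
```

For the excerpt above this would give 192.168.0.110:2376, and it would give None again for a log captured after the reboot if the FATAL line comes back.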

@dnoliver
Author

dnoliver commented Apr 2, 2019

After this workaround, I was able to add the host to the project! :)

@dnoliver
Author

dnoliver commented Apr 2, 2019

But I am still having storage-related problems. Trying to deploy a container fails with the following error:

Retries are prevented. Failure: Error: Failed to write to image store: [POST /storage/{store_name}][500] WriteImage default \u0026{Code:500 Message:parent (scratch) doesn't exist in http:///storage/images/423bf6bd-c91b-3c79-a0ac-ae0b26077784: file does not exist}; Reason: {"errorDetail":{"message":"Failed to write to image store: [POST /storage/{store_name}][500] WriteImage default \u0026{Code:500 Message:parent (scratch) doesn't exist in http:///storage/images/423bf6bd-c91b-3c79-a0ac-ae0b26077784: file does not exist}"},"error":"Failed to write to image store: [POST /storage/{store_name}][500] WriteImage default \u0026{Code:500 Message:parent (scratch) doesn't exist in http:///storage/images/423bf6bd-c91b-3c79-a0ac-ae0b26077784: file does not exist}"}
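
The error above repeats the same message three times, once as text and twice inside the JSON after "Reason:". A small sketch to extract just the message from that JSON part, assuming the "Reason: {...}" layout shown above. The helper name reason_message is made up for this example.

```python
import json


def reason_message(error_text):
    """Return the message from the JSON payload after 'Reason: ', or None."""
    marker = "Reason: "
    index = error_text.find(marker)
    if index == -1:
        return None
    try:
        payload = json.loads(error_text[index + len(marker):].strip())
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    detail = payload.get("errorDetail")
    if isinstance(detail, dict) and detail.get("message"):
        return detail["message"]
    return payload.get("error")
```

Decoding the JSON also turns the \u0026 escape back into a plain &, which makes the message easier to compare with the Docker Personality log.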

@dnoliver
Author

dnoliver commented Apr 2, 2019

I tried destroying the VCH host and creating a new one, and I ran into the same issue. I cannot add the host to a project because of the same error, and after applying the workaround I can. But then I cannot run a container on the host because of the same error as in #2413 (comment).

@wjun
Contributor

wjun commented Apr 3, 2019

@dnoliver We have seen similar issues before when the VC user or the ops user used to create the VCH does not have the privilege to create the datastore folder. Is that your case?

@dnoliver
Author

dnoliver commented Apr 3, 2019

I followed this guide to create the vic-ops user https://vmware.github.io/vic-product/assets/files/html/1.5/vic_vsphere_admin/create_ops_user.html

In the https://vmware.github.io/vic-product/assets/files/html/1.5/vic_vsphere_admin/set_up_ops_user.html docs, I saw:

Grant Any Necessary Permissions
The operations user account must exist before you create a VCH. If you are deploying the VCH to a cluster, vSphere Integrated Containers Engine can configure the operations user account with all of the necessary permissions for you.

IMPORTANT: The option to grant any necessary permissions automatically only applies when deploying VCHs to clusters. If you are deploying the VCH to a standalone host that is managed by vCenter Server, you must configure the operations user account manually. For information about manually configuring the operations user account, see Manually Create a User Account for the Operations User.

I think I am doing a standalone host deployment, so maybe I have to either switch to a cluster deployment, or follow https://vmware.github.io/vic-product/assets/files/html/1.5/vic_vsphere_admin/ops_user_manual.html to assign permissions to that user manually if I want to keep the standalone host deployment.

I will try that and report back. Thank you @wjun!

@dnoliver
Author

dnoliver commented Apr 3, 2019

I definitely have a datastore permissions problem for my vic-ops user :) Thank you for the hint, @wjun.

  1. I am not using a cluster; I am deploying directly to a host. So the VCH deployment does not help me with permissions.
  2. I went through the manual permissions docs. I cannot guarantee that I did it correctly, given that there are several clicks to be done in several places. Custom permissions were applied to the root vCenter, the Datacenter, and the ESXi host. The datastore has the custom VCH - endpoint - datastore inherited permission.
  3. I created the VCH host again with the Wizard and left "apply permissions" checked. This caused my vic-user permissions to be replaced everywhere with the ones created by the tool, instead of the ones that I had spent time assigning manually. My mistake, but somebody could add a warning there!
  4. Same error deploying the VCH; workaround applied, but I cannot create a container (again).
  5. I re-applied the permissions that the wizard had overridden. I cannot guarantee that I did it correctly... it failed again.
  6. I gave the vic-ops user Administrator permission on the datastore, and I can deploy containers!
  7. I assigned the previous permissions on the datastore to the vic-ops user, and I ran into problems again. This confirms my permission problem.

My VCH - endpoint - datastore permission looks like this:

dvPort group
  Modify
  Policy operation
  Scope operation

Datastore
  Allocate space
  Browse datastore
  Configure datastore
  Low level file operations
  Remove file

Host
  Configuration
    System Management

Resource
  Assign virtual machine to resource pool
  Migrate powered off virtual machine

Virtual machine
  Change Configuration
    Add existing disk
    Add new disk
    Add or remove device
    Advanced configuration
    Modify device settings
    Remove disk
    Rename
  Edit Inventory
    Create new
    Register
    Remove
    Unregister
  Guest operations
    Guest operation modifications
    Guest operation program execution
    Guest operation queries
  Interaction
    Connect devices
    Power off
    Power on
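
Because these privileges are assigned by hand in several places, a mechanical comparison against the list in the documentation can rule out a missed checkbox. A minimal sketch, assuming both lists have been copied out as plain strings. The function privilege_diff is hypothetical, and DATASTORE_PRIVILEGES only repeats the Datastore section above; the required list has to come from the docs.

```python
def privilege_diff(granted, required):
    """Compare two privilege lists and report what is missing and what is extra."""
    granted_set, required_set = set(granted), set(required)
    return {
        "missing": sorted(required_set - granted_set),
        "extra": sorted(granted_set - required_set),
    }


# Datastore privileges exactly as listed in the role above.
DATASTORE_PRIVILEGES = [
    "Allocate space",
    "Browse datastore",
    "Configure datastore",
    "Low level file operations",
    "Remove file",
]
```

For example, privilege_diff(current_role, DATASTORE_PRIVILEGES) returns empty "missing" and "extra" lists when the role matches. That would not explain why Administrator works and the documented role does not, but it would at least exclude a typo in the role.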

Then the more accurate question is: why does the vic-ops user run into datastore permission problems while creating a VCH and/or running containers, if it has all the permissions specified by the documentation?

@dnoliver
Author

dnoliver commented Apr 3, 2019

I did the same deployment, but now using a cluster, and the problem is still there. I have to manually create the images folder to make the VCH host work the first time, and I need to give vic-user the Administrator role on the datastore for a container deployment to succeed. So this permission problem happens regardless of whether I do a cluster or a standalone host deployment.

@wjun
Contributor

wjun commented Apr 8, 2019

I tried VCH create from the CLI onto a VC cluster, and it works. Please note that --user is an admin user. --ops-user must be combined with --ops-grant-perms so that the VCH can assign the related permissions to this ops-user automatically.
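
The pairing rule in this comment can be checked before running a long create command. A sketch that only encodes the rule as stated here (--ops-user together with --ops-grant-perms); it does not call vic-machine, and the function name ops_user_flag_warnings is made up for illustration.

```python
def ops_user_flag_warnings(argv):
    """Warn when --ops-user and --ops-grant-perms are not used together."""

    def has_flag(name):
        # Accept both "--flag value" and "--flag=value" spellings.
        return any(arg == name or arg.startswith(name + "=") for arg in argv)

    warnings = []
    if has_flag("--ops-user") and not has_flag("--ops-grant-perms"):
        warnings.append("--ops-user given without --ops-grant-perms")
    if has_flag("--ops-grant-perms") and not has_flag("--ops-user"):
        warnings.append("--ops-grant-perms given without --ops-user")
    return warnings
```

Passing the argument list of a planned command, for example shlex.split(command)[1:], returns an empty list when both flags are present or both are absent.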

@dnoliver
Author

dnoliver commented Apr 8, 2019

Great @wjun, I have only tested this with the UI Wizard, where I think the --user [email protected] is implicit (that is the user I use to log in to vCenter Server). I will give the CLI command a shot to validate. Thanks!

@dnoliver
Author

dnoliver commented Apr 8, 2019

@wjun I have validated the API approach, and I ran into the same issue again.

The command used to deploy this VCH was:

./vic-machine-linux create --name virtual-container-host-1 \
    --compute-resource Cluster \
    --image-store 'datastore1 (1)' \
    --base-image-size 8GB \
    --volume-store 'datastore1 (1):default' \
    --bridge-network vic-bridge \
    --bridge-network-range 172.16.0.0/12 \
    --public-network 'VM Network' \
    --tls-cname virtual-container-host-1 \
    --certificate-key-size 2048 \
    --no-tlsverify --user [email protected] \
    --thumbprint <thumb> \
    --target 192.168.0.238/Datacenter \
    --ops-user [email protected] \
    --ops-grant-perms
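
A multi-line command like the one above only runs as a single command if every line except the last ends with a backslash. A tiny sketch for checking a pasted command before running it. The function name lines_missing_continuation is hypothetical.

```python
def lines_missing_continuation(command_text):
    """Return 1-based numbers of non-blank lines (except the last) that do not
    end with a backslash line continuation."""
    lines = [line for line in command_text.splitlines() if line.strip()]
    return [
        number
        for number, line in enumerate(lines[:-1], start=1)
        if not line.rstrip().endswith("\\")
    ]
```

An empty result means the shell will see one command; any number returned marks a line after which the remaining options would be dropped or run as a separate command.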

The command execution log is below:

INFO[0000] ### Installing VCH ####                      
INFO[0000] vSphere password for [email protected]:  
INFO[0003] Loaded server certificate virtual-container-host-1/server-cert.pem 
WARN[0003] Configuring without TLS verify - certificate-based authentication disabled 
INFO[0003] Validating supplied configuration            
INFO[0004] Network configuration OK on "vic-bridge"     
INFO[0004] vCenter settings check OK                    
INFO[0004] Firewall status: ENABLED on "/Datacenter/host/Cluster/192.168.0.217" 
INFO[0004] Firewall configuration OK on hosts:          
INFO[0004] 	"/Datacenter/host/Cluster/192.168.0.217"    
INFO[0004] vCenter settings check OK                    
INFO[0004] License check OK on hosts:                   
INFO[0004]   "/Datacenter/host/Cluster/192.168.0.217"   
INFO[0004] DRS check OK on:                             
INFO[0004]   "/Datacenter/host/Cluster"                 
WARN[0004] Only one host can access all of the image/volume datastores. This may be a point of contention/performance degradation and HA/DRS may not work as intended. 
INFO[0004]                                              
INFO[0005] Creating Resource Pool "virtual-container-host-1" 
INFO[0005] Creating appliance on target                 
INFO[0005] Network role "client" is sharing NIC with "public" 
INFO[0005] Network role "management" is sharing NIC with "public" 
INFO[0005] Creating the VCH folder                      
INFO[0005] Creating the VCH VM                          
INFO[0006] Creating directory [datastore1 (1)] VIC      
INFO[0006] Datastore path is [datastore1 (1)] VIC       
INFO[0007] Uploading ISO images                         
INFO[0008] Uploading appliance.iso as V1.5.2-20879-30B67A14-appliance.iso 
INFO[0027] Uploading bootstrap.iso as V1.5.2-20879-30B67A14-bootstrap.iso 
INFO[0045] Waiting for IP information                   
INFO[0052] Waiting for major appliance components to launch 
INFO[0052] Obtained IP address for client interface: "192.168.0.199" 
INFO[0052] Checking VCH connectivity with vSphere target 
INFO[0052] vSphere API Test: https://192.168.0.238 vSphere API target responds as expected 
ERRO[0225] vic/lib/install/management.(*Dispatcher).CheckDockerAPI: Create error: context deadline exceeded
vic/cmd/vic-machine/create.(*Create).Run:755 Create
vic/cmd/vic-machine/common.NewOperation:27 vic-machine-linux 
INFO[0225] Docker API endpoint check failed: context deadline exceeded 
INFO[0225] Collecting 598fc05d-88d3-4d9b-8c5a-f55a274e2db1 vpxd.log 
INFO[0225] 	API may be slow to start - try to connect to API after a few minutes: 
INFO[0225] 		Run command: docker -H 192.168.0.199:2376 --tls info 
INFO[0225] 		If command succeeds, VCH is started. If command fails, VCH failed to install - see documentation for troubleshooting. 
ERRO[0225] vic/cmd/vic-machine/create.(*Create).Run.func3: Create error: context deadline exceeded
vic/cmd/vic-machine/create.(*Create).Run:755 Create
vic/cmd/vic-machine/common.NewOperation:27 vic-machine-linux 
ERRO[0225] --------------------                         
ERRO[0225] vic-machine-linux create failed: Creating VCH exceeded time limit of 3m0s. Please increase the timeout using --timeout to accommodate for a busy vSphere target
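
In a long install log like the one above, the WARN and ERRO lines are easy to miss among the INFO lines. A short sketch that keeps only those, together with the elapsed seconds from the bracketed counter, assuming the LEVEL[seconds] prefix shown above. The names LOG_LINE and problem_lines are made up for this example.

```python
import re

# Matches lines such as: ERRO[0225] Create error: context deadline exceeded
LOG_LINE = re.compile(r"^(INFO|WARN|ERRO)\[(\d+)\]\s*(.*)$")


def problem_lines(log_text):
    """Return (elapsed_seconds, level, message) for every WARN and ERRO line."""
    problems = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match and match.group(1) in ("WARN", "ERRO"):
            problems.append(
                (int(match.group(2)), match.group(1), match.group(3).strip())
            )
    return problems
```

Applied to the log above, the elapsed values show that the Docker API check was still failing at 225 seconds, long after the appliance got its IP address at 52 seconds.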

At least this time I get an error, and not the silent failure of the UI Wizard. In the Docker personality log, I can see the same issue as before:

Apr  8 2019 20:26:07.096Z FATAL failed to initialize backend: [POST /storage][500] CreateImageStore default  &{Code:500 Message:cannot stat '[datastore1 (1)] virtual-container-host-1/VIC/423b9844-dafc-a118-d910-f6ce4a309745/images': No such file}

And I am sure the workaround still applies. If I create the /images directory manually and assign admin permissions on the datastore to the vic-ops user, this will start working.

I also tried to deploy a VCH keeping the Administrator access for vic-ops on the datastore and removing the --ops-grant-perms parameter, and it ran into the same issue as before:

Apr  8 2019 20:26:07.096Z FATAL failed to initialize backend: [POST /storage][500] CreateImageStore default  &{Code:500 Message:cannot stat '[datastore1 (1)] virtual-container-host-1/VIC/423b9844-dafc-a118-d910-f6ce4a309745/images': No such file}

So this /images folder error may not be a permissions problem (at least not on the datastore), but something else. The Administrator permission seems to solve the second issue, when trying to run a container, but not the initial creation of the images folder.

@wjun
Contributor

wjun commented Apr 9, 2019

@dnoliver I tried various combinations of ops-user and datastores, and cannot reproduce this in my local env. Could you also post your portlayer.log, which should contain error messages related to the images directory creation failure? Another option is to first remove --ops-user and --ops-grant-perms during VCH create and see whether you can reproduce the issue.

@dnoliver
Author

dnoliver commented Apr 9, 2019

Ok, will try to share the portlayer.log file.

The only special thing about my installation is that it is using VM Encryption. I have a KMS, an encryption storage policy, and a couple of encrypted VMs running on the same host. Is that relevant to this issue?

@bosco777

I hate to comment on an old thread, but I have vSAN encryption with vCenter KMS, and experienced the same problem with the 'grant all permissions needed' option, and needing to create the images folder manually for this to work. So this still seems to be an issue.
