Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix DHCPCD --deprovision bug #1920

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from
Open

Conversation

Bug-Reaper
Copy link

@Bug-Reaper Bug-Reaper commented Jun 26, 2020

Description

Issue #1890
#1890

Fixes bug where waagent --deprovision would error/crash when trying to delete cached DHCPCD leases.

This error occurs because DHCPCD mounts files onto sub-directories of /var/lib/dhcpcd. These mounted DHCPCD files cannot be deleted whilst mounted (even with the all mighty power of sudo rm). This causes waagent --deprovision to crash when attempting to delete cached DHCPCD leases.

This PR adds the new method umount(path) to common/utils/fileutil.py. If the path argument does not pass os.path.ismount() then nothing happens. Otherwise the argument path is unmounted.

Right before we delete cached DHCPCD leases in the del_dhcp_lease, we call the method fileutil.umount() against paths of known DHCPCD mountpoints that cause this error. Because some mountpoints cannot be unmounted while DHCPCD is in use, we also first make sure that the DHCPCD service is not running. (pa/deprovision/default.py)

In doing so this PR address the crash/error and has been tested to work.


PR information

  • [ x ] The title of the PR is clear and informative.
  • [ x ] There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • Except for special cases involving multiple contributors, the PR is started from a fork of the main repository, not a branch.
  • [ x ] If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made and Travis.CI is passing.

Quality of Code and Contribution Guidelines

___      ___            __       ___      ___       __      
|"  |    |"  |          /""\     |"  \    /"  |     /""\     
||  |    ||  |         /    \     \   \  //   |    /    \    
|:  |    |:  |        /' /\  \    /\\  \/.    |   /' /\  \   
 \  |___  \  |___    //  __'  \  |: \.        |  //  __'  \  
( \_|:  \( \_|:  \  /   /  \\  \ |.  \    /:  | /   /  \\  \ 
 \_______)\_______)(___/    \___)|___|\__/|___|(___/    \___)                                                                                                                          

This is a pretty clean commit. We're adding a umount method to `fileutil.py. Similar to how `rm_file` works, if you pass the umount functiona a path that doesn't exist or isn't mounted then nothing happens.

Then as part of the `del_dhcp_lease` deprovision workflow we're adding a step which removes dirs mounted by DHCPCD.

On all Arch based systems, DHCPCD mounts files which cannot be deleted, even via a `sudo rm`. In turn this causes `waagent --deprovision` to fail with a stack trace when it trys to clear out cached DHCPCD client leases. This commit addresses that issue by umounting dhcpcd related files. For more info see github issue:
Azure#1890
*Fix umount
*Kill DHCPCD service w/systemctl
@ghost
Copy link

ghost commented Jun 26, 2020

CLA assistant check
All CLA requirements met.

@codecov
Copy link

codecov bot commented Jun 26, 2020

Codecov Report

Merging #1920 into develop will decrease coverage by 0.01%.
The diff coverage is 53.84%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1920      +/-   ##
===========================================
- Coverage    69.60%   69.59%   -0.02%     
===========================================
  Files           85       85              
  Lines        11921    11931      +10     
  Branches      1670     1673       +3     
===========================================
+ Hits          8298     8303       +5     
- Misses        3252     3257       +5     
  Partials       371      371              
Impacted Files Coverage Δ
azurelinuxagent/common/utils/fileutil.py 76.81% <28.57%> (-2.58%) ⬇️
azurelinuxagent/pa/deprovision/default.py 68.53% <83.33%> (+0.67%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 71654c4...67e6f74. Read the comment docs.

Copy link
Contributor

@trstringer trstringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Along with the individual suggestions, how are you testing this solution? What are your steps? Thanks for the contribution!

Comment on lines +129 to +130
# find all possible mounts
for path in glob.glob(paths):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary indentation here after the comment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the indentation is unnecessary here. I'll amend that.

Comment on lines +132 to +133
command=["umount",path]
shellutil.run_command(command)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the usage of the command var necessary here? Looks like a single-use variable, consider combining this into a single line.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, we can shorten this to one line.

Comment on lines +114 to +120

# For Arch based systems. Kill DHCPCD service. Needed to clear cached leases.
shellutil.run("systemctl stop dhcpcd", chk_err=False)

# For Arch based systems. Files mounted by DHCPCD can't be deleted whilst mounted, not even by root.
dirs_to_umount = ["/var/lib/dhcpcd/run/systemd/journal", "/var/lib/dhcpcd/run/udev", "/var/lib/dhcpcd/sys", "/var/lib/dhcpcd/dev", "/var/lib/dhcpcd/proc"]
actions.append(DeprovisionAction(fileutil.umount, dirs_to_umount))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is Arch-specific logic, then it should go into the ArchDeprovisionHandler.

Copy link
Author

@Bug-Reaper Bug-Reaper Jun 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So perhaps I should rephrase my comment here. While it's true this is for Arch systems, it's also for any *Nix distro that's running the latest version of dhcpcd. A quick check on wiki shows a large number of distros can be configured with systemd/dhcpcd.
Screenshot_2020-06-26_17-18-33

The systems that can be configured with systemd/dhcpcd include: Arch Linux, CentOS, CoreOS, Fedora, Mageia, Manjaro Linux, openSUSE, SUSE Linux Enterprise Server, Red Hat Enterprise Linux, Solus, Linux Mint, Ubuntu and Debian.

Among those, many use systemd/dhcpcd by default.

I believe this is the correct place to run this, but I can update my comment to make it clear this is a DHCPCD specific issue and not an Arch one.

@Bug-Reaper
Copy link
Author

Initial testing is covered in the issue.

Testing methodology for this PR has gone something like this:
(On a virtual system that's using the latest systemd/dhcpcd)

  1. Install the latest waagent and run waagent --deprovision to confirm crash is still present on the latest version.
  2. Clone my fork to my home directory.
  3. Copy my forked version of the azurelinuxagent libs to replace the version installed by waagent.
  4. Run waagent --deprovision again and confirm it completes successfully without crashing.
  5. Shutdown my VM.
  6. Upload the corresponding VHD file into Azure storage.
  7. Create an Azure disk from that VHD file and deploy it as an Azure VM.
  8. Confirm that VM is running normally (by SSHing in) and also that valid information is being reported about that VM in the Azure web portal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants