Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix reboot_test.go and collectEntries #2663

Merged
merged 6 commits into from
Sep 17, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
47 changes: 32 additions & 15 deletions dctest/reboot_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ package dctest
import (
"encoding/json"
"fmt"
"slices"
"strings"
"time"

Expand Down Expand Up @@ -229,6 +230,26 @@ func testNecoRebooterRebootGracefully() {
return nil
}).Should(Succeed())

By("getting available racks")
stdout, _, err := execAt(bootServers[0], "kubectl", "get", "nodes", "-o", "json")
Expect(err).NotTo(HaveOccurred())
nodes := corev1.NodeList{}
err = json.Unmarshal(stdout, &nodes)
Expect(err).NotTo(HaveOccurred())
racksTmp := []string{}
for _, node := range nodes.Items {
rack := node.Labels["topology.kubernetes.io/zone"]
if len(rack) == 0 {
continue
}
racksTmp = append(racksTmp, rack[len(rack)-1:])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you cut out numbers here?
In most reference points, fmt.Sprintf("rack%s", racks[0]) is used. And it making the code difficult to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to extract only rack numbers because the format of "rack" is different between application.

  • k8s label: topology.kubernetes.io/zone=rack1
  • sabakan option: sabactl machines get --rack=1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know. How about extracting the number only in neco-rebooter args?
Also, does sh -c work as expected?

rebootCmd := fmt.Sprintf("yes | neco rebooter reboot-worker --rack=%s", racks[0][4:] /* extract a rack number: "rackN" -> "N" */)
execSafeAt(bootServers[0], "sh", "-c", "'"+rebootCmd+"'")

or

execSafeAt(bootServers[0], "yes", "|", "neco", "rebooter", "reboot-worker", "--rack="+racks[0][4:] /* extract a rack number: "rackN" -> "N" */)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand. and I fixed the code to extract number only in neco-rebooter command.

And, sh-c works fine.
It had been used before neco-rebooter was implemented like below.

execSafeAt(bootServers[0], "sh", "-c", "yes | neco reboot-worker "+necoRebootWorkerOptions)

}
slices.Sort(racksTmp)
racks := slices.Compact(racksTmp)
if len(racks) < 2 {
Fail("less than 2 racks are available")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about to use Expect() here too?


By("changing neco-rebooter config")
config := necorebooter.Config{
RebootTimes: []necorebooter.RebootTimes{
Expand Down Expand Up @@ -286,7 +307,7 @@ func testNecoRebooterRebootGracefully() {
},
},
NodeSelector: map[string]string{
"topology.kubernetes.io/zone": "rack1",
"topology.kubernetes.io/zone": fmt.Sprintf("rack%s", racks[0]),
},
},
}
Expand Down Expand Up @@ -317,18 +338,14 @@ func testNecoRebooterRebootGracefully() {
Expect(err).NotTo(HaveOccurred())

By("creating a pod")
stdout, stderr, err := execAtWithInput(bootServers[0], podYaml, "kubectl", "apply", "-f", "-")
if err != nil {
Fail(fmt.Sprintf("stdout: %s, stderr: %s, err: %v", stdout, stderr, err))
}
_, _, err = execAtWithInput(bootServers[0], podYaml, "kubectl", "apply", "-f", "-")
Expect(err).NotTo(HaveOccurred())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. But why don't you keep the stdout and stderr?
We can pass the optionalDescription, if we want.

https://github.com/onsi/gomega/blob/7cabed651e981da0c7d0219ad7208626cef58016/gomega_dsl.go#L524

By("creating a pdb")
stdout, stderr, err = execAtWithInput(bootServers[0], pdbYaml, "kubectl", "apply", "-f", "-")
if err != nil {
Fail(fmt.Sprintf("stdout: %s, stderr: %s, err: %v", stdout, stderr, err))
}
_, _, err = execAtWithInput(bootServers[0], pdbYaml, "kubectl", "apply", "-f", "-")
Expect(err).NotTo(HaveOccurred())

By("adding rack1 nodes to reboot-list")
execSafeAt(bootServers[0], "sh", "-c", "yes | neco rebooter reboot-worker --rack=1")
By(fmt.Sprintf("adding rack%s nodes to reboot-list", racks[0]))
execSafeAt(bootServers[0], "sh", "-c", fmt.Sprintf("yes | neco rebooter reboot-worker --rack=%s", racks[0]))

By("enable rebooting")
execSafeAt(bootServers[0], "ckecli", "rq", "enable")
Expand All @@ -346,16 +363,16 @@ func testNecoRebooterRebootGracefully() {
return nil
}).Should(Succeed())

By("adding rack2 nodes to reboot-list")
execSafeAt(bootServers[0], "sh", "-c", "yes | neco rebooter reboot-worker --rack=2")
By(fmt.Sprintf("adding rack%s nodes to reboot-list", racks[1]))
execSafeAt(bootServers[0], "sh", "-c", fmt.Sprintf("yes | neco rebooter reboot-worker --rack=%s", racks[1]))

By("waiting for skipping rack1 and moving to rack2")
By(fmt.Sprintf("waiting for skipping rack%s and moving to rack%s", racks[0], racks[1]))
Eventually(func() error {
stdout, stderr, err := execAt(bootServers[0], "neco", "rebooter", "show-processing-group")
if err != nil {
return fmt.Errorf("stdout: %s, stderr: %s, err: %v", stdout, stderr, err)
}
if string(stdout) != "rack2\n" {
if string(stdout) != fmt.Sprintf("rack%s\n", racks[1]) {
return fmt.Errorf("reboot-queue is not processed")
}
return nil
Expand Down
3 changes: 3 additions & 0 deletions pkg/neco-rebooter/controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -218,6 +218,9 @@ func (c *Controller) collectEntries(rebootListEntries []*neco.RebootListEntry, r
}
}
for _, rqEntry := range rebootQueueEntries {
if rqEntry.Status == cke.RebootStatusCancelled {
continue
}
rlEntry := findRebootListEntryFromRebootQueueEntry(rebootListEntries, *rqEntry)
if rlEntry == nil {
orphanedEntry = append(orphanedEntry, EntrySet{nil, rqEntry})
Expand Down
4 changes: 4 additions & 0 deletions pkg/neco-rebooter/controller_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -554,6 +554,10 @@ groupLabelKey: topology.kubernetes.io/zone
Node: "node7",
Status: cke.RebootStatusQueued,
},
{ // orphaned but cancelled node
Node: "node8",
Status: cke.RebootStatusCancelled,
},
}
collection := c.collectEntries(rebootListEntries, rebootQueueEntry, "group1")
if len(collection.CancelledEntry) != 1 {
Expand Down