feat(v2 upgrade): support engine live upgrade #241

Open
wants to merge 4 commits into
base: main
from

Conversation

derekbit
Member

@derekbit derekbit commented Nov 17, 2024

Which issue(s) this PR fixes:

Issue longhorn/longhorn#9104

Signed-off-by: Derek Su [email protected]

What this PR does / why we need it:

Special notes for your reviewer:

Additional documentation or context

@derekbit derekbit self-assigned this Nov 17, 2024

coderabbitai bot commented Nov 17, 2024

Walkthrough

The changes in this pull request introduce a new field, StandbyTargetPort, to the Engine struct across several files, including pkg/api/types.go and pkg/spdk/engine.go. The ProtoEngineToEngine function is updated to accommodate this new field. Additionally, various methods in pkg/spdk/engine.go are restructured to enhance error handling and streamline parameters. A new test suite is added in pkg/spdk/engine_test.go to validate the functionality of the updated methods. The EngineCreate method in pkg/client/client.go and pkg/spdk/server.go is also modified to remove the upgradeRequired parameter.

Changes

File Change Summary
pkg/api/types.go Added StandbyTargetPort int32 `json:"standby_target_port"` to Engine struct; updated ProtoEngineToEngine function.
pkg/spdk/engine.go Added StandbyTargetPort to Engine; restructured Create, handleFrontend, Delete, and SwitchOverTarget methods; added isNewEngine and checkInitiatorAndTargetCreationRequirements methods; updated error handling and logging.
pkg/spdk/engine_test.go Added tests: TestCheckInitiatorAndTargetCreationRequirements, TestIsNewEngine, and TestReleaseTargetAndStandbyTargetPorts.
pkg/client/client.go Updated EngineCreate method to remove upgradeRequired parameter; enhanced error handling in ReplicaRebuildingSrcStart.
pkg/spdk_test.go Removed TestSPDKEngineCreateWithUpgradeRequired; updated parameters in TestSPDKMultipleThread and TestSPDKMultipleThreadSnapshotOpsAndRebuilding.
pkg/spdk/server.go Updated EngineCreate method to remove req.UpgradeRequired parameter.
pkg/spdk/types.go Updated constant replicaTransportAckTimeout from 14 to 12.
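
As a rough illustration of the pkg/api/types.go row above, the following sketch shows a new field carried from a protobuf-style message into the API struct. ProtoEngine, the reduced Engine struct, and every JSON tag other than standby_target_port are assumptions made for the example, not the project's actual definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ProtoEngine is a hypothetical stand-in for the generated protobuf message.
type ProtoEngine struct {
	Name              string
	Port              int32
	TargetPort        int32
	StandbyTargetPort int32
}

// Engine is a trimmed-down illustration of the API type. Only the
// StandbyTargetPort field and its JSON tag come from the PR summary;
// the other fields and tags are assumptions.
type Engine struct {
	Name              string `json:"name"`
	Port              int32  `json:"port"`
	TargetPort        int32  `json:"target_port"`
	StandbyTargetPort int32  `json:"standby_target_port"`
}

// ProtoEngineToEngine copies each field, including the new standby port.
func ProtoEngineToEngine(p *ProtoEngine) *Engine {
	return &Engine{
		Name:              p.Name,
		Port:              p.Port,
		TargetPort:        p.TargetPort,
		StandbyTargetPort: p.StandbyTargetPort,
	}
}

func main() {
	e := ProtoEngineToEngine(&ProtoEngine{Name: "engine-1", Port: 20001, StandbyTargetPort: 20002})
	out, err := json.Marshal(e)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

If the mapping function forgets the new field, the JSON output silently reports a standby port of 0, which is why the summary calls out both the struct and the converter.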

Assessment against linked issues

Objective Addressed Explanation
Support live upgrade for control plane (#9104) The PR does not implement the requested feature.

Possibly related PRs

Suggested reviewers

  • shuo-wu
  • c3y1huang
  • innobead
  • DamiaSan

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1f1a32b and 9ab4265.

📒 Files selected for processing (1)
  • pkg/spdk/types.go (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • pkg/spdk/types.go


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (4)
pkg/api/types.go (1)

132-132: LGTM! Consider adding field documentation.

The new StandbyTargetPort field is well-positioned and follows the codebase conventions. Consider adding a comment to document its purpose in the context of live upgrades.

+	// StandbyTargetPort is used during live upgrades to maintain service availability
 	StandbyTargetPort int32                 `json:"standby_target_port"`
pkg/spdk/engine.go (3)

252-252: Possible misuse of variable in log message

At line 252, the log statement uses e.ReplicaModeMap to display the replicas being connected. However, e.ReplicaModeMap might not represent the list of replicas intended for logging.

Consider using replicaBdevList, which contains the list of replica block devices:

- e.log.Infof("Connecting all available replicas %+v, then launching raid during engine creation", e.ReplicaModeMap)
+ e.log.Infof("Connecting all available replicas %+v, then launching raid during engine creation", replicaBdevList)

This change will provide more accurate logging information about the replicas being connected.


685-688: Remove commented-out code to improve code clarity

The commented-out code at lines 685-688 is not used and can be removed to enhance readability and maintainability.

Apply this diff to remove the unused code:

- // podIP, err := commonnet.GetIPForPod()
- // if err != nil {
- //     return err
- // }

If this code is needed for future development, consider adding a comment explaining its purpose or moving it to a separate location.


Line range hint 1881-1881: Typographical error in function name closeRplicaClients

There is a typo in the function name closeRplicaClients. It should be closeReplicaClients for clarity and consistency.

Apply this diff to correct the function name:

- func (e *Engine) closeRplicaClients(replicaClients map[string]*client.SPDKClient) {
+ func (e *Engine) closeReplicaClients(replicaClients map[string]*client.SPDKClient) {

Make sure to update all references to this function accordingly.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between bf16180 and 668e4d2.

⛔ Files ignored due to path filters (2)
  • go.mod is excluded by !go.mod
  • vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (2)
  • pkg/api/types.go (2 hunks)
  • pkg/spdk/engine.go (15 hunks)
🧰 Additional context used
🪛 golangci-lint
pkg/spdk/engine.go

315-315: expected declaration, found '<<'

(typecheck)

🔇 Additional comments (5)
pkg/api/types.go (2)

153-153: LGTM! Field mapping is correct.

The StandbyTargetPort field is properly mapped from the protobuf struct to the Engine type.


Line range hint 132-153: Verify protobuf definition exists.

Ensure that the standby_target_port field is properly defined in the SPDK service protobuf files.

pkg/spdk/engine.go (3)

45-45: New field StandbyTargetPort added to Engine struct

The addition of the StandbyTargetPort field to the Engine struct is appropriate. Ensure that this new field is properly initialized and consistently used throughout the codebase to avoid any unexpected behavior.


108-110: Function isNewEngine correctly determines if the engine is new

The isNewEngine function effectively checks if the IP field is empty to determine if the engine instance is new. This is a simple and efficient approach.
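
A minimal sketch of the check described here, assuming a reduced Engine struct. The TargetIP field and the initAddresses helper are hypothetical names used only to show how such a check might gate first-time initialization.

```go
package main

import "fmt"

// Engine keeps only the fields needed for this illustration.
// TargetIP is an assumed field name.
type Engine struct {
	Name     string
	IP       string
	TargetIP string
}

// isNewEngine treats an engine without an assigned IP as not yet created.
func (e *Engine) isNewEngine() bool {
	return e.IP == ""
}

// initAddresses (hypothetical) records the addresses only when the engine
// is new, so a repeated call leaves the existing values untouched.
func (e *Engine) initAddresses(initiatorIP, targetIP string) {
	if e.isNewEngine() {
		e.IP = initiatorIP
		e.TargetIP = targetIP
	}
}

func main() {
	e := &Engine{Name: "engine-1"}
	fmt.Println("new before init:", e.isNewEngine())
	e.initAddresses("10.0.0.1", "10.0.0.1")
	fmt.Println("new after init:", e.isNewEngine())
}
```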


618-618: Field StandbyTargetPort included in getWithoutLock

The StandbyTargetPort field is correctly added to the Engine object returned by the getWithoutLock method. This ensures that the field will be included in any serialized representations of the engine's state.

pkg/spdk/engine.go (outdated, resolved)
@derekbit derekbit changed the title chore(vendor): update dependencies feat(v2 upgrade): support engine live upgrade Nov 17, 2024

@coderabbitai coderabbitai bot left a comment

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
pkg/spdk/engine.go (1)

Line range hint 2173-2224: Document target port management logic

The target switching logic correctly handles the port assignments, but would benefit from a comment explaining when and why StandbyTargetPort is reset to 0.

Add a comment before the condition:

+ // Reset StandbyTargetPort when switching to the pod's IP as the target,
+ // since this node is now the primary target
  if targetIP == podIP {
    e.TargetPort = targetPort
    e.StandbyTargetPort = 0
  }
🛑 Comments failed to post (1)
pkg/spdk/engine.go (1)

315-315: ⚠️ Potential issue

Critical: Resolve merge conflict

There is an unresolved merge conflict marker at line 315 (<<<<<<< HEAD). This needs to be resolved before the code can be merged.

Please resolve the merge conflict by:

  1. Running git merge or git rebase to update your branch
  2. Resolving the conflicts by choosing the appropriate code
  3. Removing all conflict markers (<<<<<<<, =======, >>>>>>>)
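
To confirm that step 3 is complete before pushing, a small scan for leftover markers can help. The sketch below is a hypothetical helper, not part of the repository; it assumes the default Git marker style, where the start and end markers are followed by a space and a label.

```go
package main

import (
	"fmt"
	"strings"
)

// findConflictMarkers returns the 1-based numbers of lines that look like
// Git conflict markers (<<<<<<<, =======, >>>>>>>).
func findConflictMarkers(src string) []int {
	var hits []int
	for i, line := range strings.Split(src, "\n") {
		line = strings.TrimRight(line, "\r") // tolerate CRLF line endings
		if strings.HasPrefix(line, "<<<<<<< ") ||
			strings.HasPrefix(line, ">>>>>>> ") ||
			line == "=======" {
			hits = append(hits, i+1)
		}
	}
	return hits
}

func main() {
	src := "package spdk\n<<<<<<< HEAD\nvar a = 1\n=======\nvar a = 2\n>>>>>>> v2-control-upgrade\n"
	fmt.Println(findConflictMarkers(src))
}
```

An empty result means no markers remain in the scanned text; the Go compiler would otherwise reject the file with the typecheck error reported by golangci-lint.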
🧰 Tools
🪛 golangci-lint

315-315: expected declaration, found '<<'

(typecheck)

@derekbit derekbit force-pushed the v2-control-upgrade branch 2 times, most recently from b15f126 to 7f0d7e1 Compare November 17, 2024 09:16

@coderabbitai coderabbitai bot left a comment

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
pkg/spdk/engine.go (1)

Line range hint 4-4: Consider adding tests for live upgrade scenarios

While the implementation looks solid, consider adding comprehensive test coverage for:

  • Standby target creation and cleanup
  • Target switchover scenarios
  • Error cases during live upgrades

Would you like me to help generate test cases for these scenarios?

🛑 Comments failed to post (1)
pkg/spdk/engine.go (1)

216-218: ⚠️ Potential issue

Add error handling for BdevRaidGet

The empty branch after BdevRaidGet error check could lead to silent failures. Consider adding appropriate error handling.

Apply this diff to add error handling:

if targetCreationRequired {
    _, err := spdkClient.BdevRaidGet(e.Name, 0)
    if err != nil {
+       if !jsonrpc.IsJSONRPCRespErrorNoSuchDevice(err) {
+           return nil, errors.Wrapf(err, "failed to get raid bdev %s", e.Name)
+       }
    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

	if targetCreationRequired {
		_, err := spdkClient.BdevRaidGet(e.Name, 0)
		if err != nil {
			if !jsonrpc.IsJSONRPCRespErrorNoSuchDevice(err) {
				return nil, errors.Wrapf(err, "failed to get raid bdev %s", e.Name)
			}
		}
🧰 Tools
🪛 golangci-lint

218-218: SA9003: empty branch

(staticcheck)
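
The pattern behind this suggestion, sketched with the standard library only: treat "no such device" as an expected answer and propagate every other failure. errNoSuchDevice and errTransport are hypothetical sentinels standing in for what jsonrpc.IsJSONRPCRespErrorNoSuchDevice detects, and the injected get function stands in for spdkClient.BdevRaidGet; fmt.Errorf with %w is used here in place of errors.Wrapf.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors used only for this illustration.
var (
	errNoSuchDevice = errors.New("no such device")
	errTransport    = errors.New("connection reset")
)

// raidExists reports whether the raid bdev is present. A missing device is
// a normal result; any other error is wrapped and returned to the caller.
func raidExists(name string, get func(name string) error) (bool, error) {
	err := get(name)
	switch {
	case err == nil:
		return true, nil
	case errors.Is(err, errNoSuchDevice):
		return false, nil
	default:
		return false, fmt.Errorf("failed to get raid bdev %s: %w", name, err)
	}
}

func main() {
	for _, getErr := range []error{nil, errNoSuchDevice, errTransport} {
		getErr := getErr
		exists, err := raidExists("engine-1", func(string) error { return getErr })
		fmt.Printf("exists=%v err=%v\n", exists, err)
	}
}
```

Making the three outcomes explicit also removes the empty branch that staticcheck flags.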


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
pkg/spdk/engine.go (3)

176-202: Consider simplifying the initialization logic

The current implementation has nested conditions that could be simplified for better readability and maintainability. Consider extracting the logic into separate helper functions.

Example refactor:

-	if podIP == initiatorIP && podIP == targetIP {
-		if e.Port == 0 && e.TargetPort == 0 {
-			e.log.Info("Creating both initiator and target instances")
-			initiatorCreationRequired = true
-			targetCreationRequired = true
-		} else if e.Port != 0 && e.TargetPort == 0 {
-			e.log.Info("Creating a target instance")
-			targetCreationRequired = true
-			if e.StandbyTargetPort != 0 {
-				e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)
-				return e.getWithoutLock(), nil
-			}
-		} else {
-			return nil, fmt.Errorf("invalid initiator and target address for engine %s creation", e.Name)
-		}
+   creationMode := determineCreationMode(podIP, initiatorIP, targetIP, e.Port, e.TargetPort)
+   switch creationMode {
+   case createBoth:
+       e.log.Info("Creating both initiator and target instances")
+       initiatorCreationRequired = true
+       targetCreationRequired = true
+   case createTargetOnly:
+       e.log.Info("Creating a target instance")
+       targetCreationRequired = true
+       if e.StandbyTargetPort != 0 {
+           e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)
+           return e.getWithoutLock(), nil
+       }
+   case createInitiatorOnly:
+       e.log.Info("Creating an initiator instance")
+       initiatorCreationRequired = true
+   default:
+       return nil, fmt.Errorf("invalid initiator and target address for engine %s creation", e.Name)
+   }
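
The refactor above calls determineCreationMode without defining it. One possible shape is sketched here. The first case follows the nested conditions being replaced; the two single-match cases (pod IP equal to only the initiator IP or only the target IP) are assumptions, since that part of the original code is not shown in this comment.

```go
package main

import "fmt"

// creationMode is a hypothetical enum for the switch in the suggestion.
type creationMode int

const (
	createInvalid creationMode = iota
	createBoth
	createTargetOnly
	createInitiatorOnly
)

// determineCreationMode decides which instances this pod should create.
func determineCreationMode(podIP, initiatorIP, targetIP string, port, targetPort int32) creationMode {
	switch {
	case podIP == initiatorIP && podIP == targetIP:
		// Mirrors the nested conditions from the original code.
		if port == 0 && targetPort == 0 {
			return createBoth
		}
		if port != 0 && targetPort == 0 {
			return createTargetOnly
		}
		return createInvalid
	case podIP == initiatorIP:
		// Assumption: only the initiator lives on this pod.
		return createInitiatorOnly
	case podIP == targetIP:
		// Assumption: only the target lives on this pod.
		return createTargetOnly
	default:
		return createInvalid
	}
}

func main() {
	fmt.Println(determineCreationMode("10.0.0.1", "10.0.0.1", "10.0.0.1", 0, 0) == createBoth)
	fmt.Println(determineCreationMode("10.0.0.1", "10.0.0.1", "10.0.0.1", 20001, 0) == createTargetOnly)
	fmt.Println(determineCreationMode("10.0.0.1", "10.0.0.2", "10.0.0.3", 0, 0) == createInvalid)
}
```

Keeping the decision in a pure function makes each branch testable without an SPDK client or a logger.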

397-399: Add documentation for standby target creation condition

The condition for standby target creation could benefit from a comment explaining when and why it's needed.

+   // Create a standby target if we have an active initiator (e.Port != 0)
+   // but no active target (e.TargetPort == 0)
    standbyTargetCreationRequired := false
    if e.Port != 0 && e.TargetPort == 0 {
        standbyTargetCreationRequired = true
    }

2169-2173: Enhance error handling for pod IP operations

The pod IP retrieval and port updates are critical for live upgrades. Consider adding more detailed error messages and logging.

    podIP, err := commonnet.GetIPForPod()
    if err != nil {
-       return err
+       return errors.Wrapf(err, "failed to get pod IP for engine %s target switchover", e.Name)
    }

    if targetIP == podIP {
+       e.log.Infof("Target IP matches pod IP, updating ports: target=%d, standby=0", targetPort)
        e.TargetPort = targetPort
        e.StandbyTargetPort = 0
    }

Also applies to: 2216-2220
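
A self-contained sketch of the switchover bookkeeping discussed here, under stated assumptions: the pod IP lookup is injected as a function (standing in for commonnet.GetIPForPod), errNoPodIP is a hypothetical error, and fmt.Errorf with %w replaces errors.Wrapf so that only the standard library is needed.

```go
package main

import (
	"errors"
	"fmt"
)

// Engine keeps only the port fields relevant to the switchover.
type Engine struct {
	Name              string
	TargetPort        int32
	StandbyTargetPort int32
}

// errNoPodIP is a hypothetical failure of the pod IP lookup.
var errNoPodIP = errors.New("pod IP is not available")

// switchOverTarget promotes the given target. When the target runs on this
// pod, the standby port becomes the active port and the standby is cleared.
func (e *Engine) switchOverTarget(targetIP string, targetPort int32, getPodIP func() (string, error)) error {
	podIP, err := getPodIP()
	if err != nil {
		return fmt.Errorf("failed to get pod IP for engine %s target switchover: %w", e.Name, err)
	}
	if targetIP == podIP {
		e.TargetPort = targetPort
		e.StandbyTargetPort = 0
	}
	return nil
}

func main() {
	e := &Engine{Name: "engine-1", StandbyTargetPort: 20002}
	err := e.switchOverTarget("10.0.0.1", 20002, func() (string, error) { return "10.0.0.1", nil })
	fmt.Println(err, e.TargetPort, e.StandbyTargetPort)

	err = e.switchOverTarget("10.0.0.1", 20003, func() (string, error) { return "", errNoPodIP })
	fmt.Println(err)
}
```

Because the error is wrapped rather than replaced, callers can still match the underlying cause with errors.Is while the log line names the engine and the operation.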

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b15f126 and 5b4f847.

📒 Files selected for processing (2)
  • pkg/api/types.go (2 hunks)
  • pkg/spdk/engine.go (14 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/api/types.go
🔇 Additional comments (5)
pkg/spdk/engine.go (5)

45-45: LGTM: Added StandbyTargetPort field for standby target management

The new field is well-integrated with the existing port fields and supports the live upgrade functionality.


108-110: LGTM: Clear and focused helper function

The function provides a clean way to determine if an engine is new based on its IP state.


204-207: LGTM: Clear initialization of engine IPs

The initialization logic correctly sets the IP values for new engines using the helper function.


476-485: LGTM: Clear port assignment logic

The code correctly handles port assignment for both initiator and target instances, including the standby target scenario.


Line range hint 2169-2220: Verify concurrent access during target switchover

The target switchover process involves multiple steps and state changes. Ensure there are no race conditions when:

  1. Multiple goroutines attempt switchover simultaneously
  2. A switchover occurs during engine creation/deletion
  3. Pod IP changes during switchover
✅ Verification successful

Let me generate additional verification scripts to check for proper locking mechanisms and concurrent access patterns.


Based on the verification results, I can now generate the final response:

Proper locking mechanisms are in place during target switchover

The code correctly handles concurrent access during target switchover:

  • SwitchOverTarget method acquires the engine lock before state modifications
  • Related operations like Create, Delete, and DeleteTarget also use proper locking
  • State changes (TargetPort, StandbyTargetPort) are protected by the mutex

The original concerns about race conditions are addressed by the existing implementation.
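
The locking pattern this verification relies on can be sketched as follows. The embedded sync.RWMutex is an assumption (the scripts only show that e.Lock() is called), and switchOverConcurrently is a hypothetical driver that exists only to exercise the method from several goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// Engine embeds a lock so that port updates are serialized.
type Engine struct {
	sync.RWMutex
	TargetPort        int32
	StandbyTargetPort int32
}

// SwitchOverTarget updates both ports while holding the write lock, so no
// reader can observe a new target port paired with a stale standby port.
func (e *Engine) SwitchOverTarget(targetPort int32) {
	e.Lock()
	defer e.Unlock()
	e.TargetPort = targetPort
	e.StandbyTargetPort = 0
}

// Ports returns a consistent snapshot under the read lock.
func (e *Engine) Ports() (targetPort, standbyTargetPort int32) {
	e.RLock()
	defer e.RUnlock()
	return e.TargetPort, e.StandbyTargetPort
}

// switchOverConcurrently starts n goroutines that each switch to a distinct
// port, then waits for all of them to finish.
func switchOverConcurrently(e *Engine, basePort int32, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(port int32) {
			defer wg.Done()
			e.SwitchOverTarget(port)
		}(basePort + int32(i))
	}
	wg.Wait()
}

func main() {
	e := &Engine{StandbyTargetPort: 20002}
	switchOverConcurrently(e, 20000, 8)
	target, standby := e.Ports()
	fmt.Printf("target=%d standby=%d\n", target, standby)
}
```

Which goroutine wins is not deterministic, but the final state is always one complete update. Running such a sketch with go run -race is a quick way to confirm that no unsynchronized access remains.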

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for potential concurrent access patterns
ast-grep --pattern $'func \(.*\) SwitchOverTarget\(.*\) {
  $$$
  e.Lock()
  $$$
}'

# Check for other methods that might interact with target state
rg -A 5 "e\.(TargetPort|StandbyTargetPort)"

Length of output: 6650


Script:

#!/bin/bash
# Check for all methods that modify engine state and their locking patterns
ast-grep --pattern $'func \(e \*Engine\) $_($$) {
  $$$
  e.Lock()
  $$$
}'

# Check for methods that might be called during switchover
rg -A 5 "func.*Engine.*\b(Create|Delete|Switch|Connect|Disconnect|Reload)" pkg/spdk/

Length of output: 1794


codecov bot commented Nov 18, 2024

Codecov Report

Attention: Patch coverage is 18.18182% with 189 lines in your changes missing coverage. Please review.

Project coverage is 8.87%. Comparing base (537b20a) to head (9ab4265).
Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
pkg/spdk/engine.go 17.54% 186 Missing and 2 partials ⚠️
pkg/spdk/server.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##            main    #241      +/-   ##
========================================
+ Coverage   8.42%   8.87%   +0.44%     
========================================
  Files         21      21              
  Lines       7158    7225      +67     
========================================
+ Hits         603     641      +38     
- Misses      6475    6501      +26     
- Partials      80      83       +3     
Flag Coverage Δ
unittests 8.87% <18.18%> (+0.44%) ⬆️
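
The percentages in the diff table can be reproduced from the hit and line counts. The sketch below uses integer arithmetic and truncates to two decimals; that the report truncates rather than rounds is an inference from the +0.44% figure, not something Codecov states here.

```go
package main

import "fmt"

// basisPoints returns num/den as a percentage scaled by 100 and truncated,
// so 842 stands for 8.42%.
func basisPoints(num, den int64) int64 {
	return num * 10000 / den
}

// format renders a basis-point value as a percentage string.
func format(bp int64) string {
	return fmt.Sprintf("%d.%02d%%", bp/100, bp%100)
}

func main() {
	// Counts taken from the coverage diff table.
	const baseHits, baseLines = 603, 7158
	const headHits, headLines = 641, 7225

	base := basisPoints(baseHits, baseLines)
	head := basisPoints(headHits, headLines)
	// Exact difference of the two fractions, truncated once at the end.
	delta := basisPoints(headHits*baseLines-baseHits*headLines, headLines*baseLines)

	fmt.Println("base: ", format(base))
	fmt.Println("head: ", format(head))
	fmt.Println("delta: +" + format(delta))

	// Hits, misses, and partials should add up to the line count.
	fmt.Println(603+6475+80 == baseLines, 641+6501+83 == headLines)
}
```

The computed values are 8.42%, 8.87%, and +0.44%. Subtracting the two rounded percentages would give 0.45 instead, which is why the delta is computed from the exact fractions.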

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (4)
pkg/spdk/engine_test.go (1)

98-115: Improve test execution logic and error handling.

The test execution can be improved in several ways:

  1. Use test case name in log message instead of array index:
-c.Logf("testing TestCheckInitiatorAndTargetCreationRequirements.%v", testName)
+c.Logf("testing TestCheckInitiatorAndTargetCreationRequirements: %s", testCase.name)
  2. Consider extracting engine creation to a helper function for better reusability:
func createTestEngine(port, targetPort, standbyTargetPort int32, name string) *Engine {
    return &Engine{
        Port:              port,
        TargetPort:        targetPort,
        StandbyTargetPort: standbyTargetPort,
        Name:              name,
        log:               logrus.New(),
    }
}
  3. Add validation of log messages to ensure proper error logging.
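
Taken together, the first two suggestions amount to a table-driven layout in which each case carries its own name. The sketch below shows that layout without the check.v1 framework or logrus so that it runs on the standard library alone. targetCreationRequired is a simplified stand-in for the function under test, and runCases is a hypothetical driver.

```go
package main

import "fmt"

// Engine keeps only the fields the helper sets (the logger is omitted).
type Engine struct {
	Name              string
	Port              int32
	TargetPort        int32
	StandbyTargetPort int32
}

// createTestEngine mirrors the helper proposed in the review.
func createTestEngine(port, targetPort, standbyTargetPort int32, name string) *Engine {
	return &Engine{
		Port:              port,
		TargetPort:        targetPort,
		StandbyTargetPort: standbyTargetPort,
		Name:              name,
	}
}

// targetCreationRequired is a simplified stand-in for the logic under test:
// a target is needed when neither an active nor a standby target exists.
func targetCreationRequired(e *Engine) bool {
	return e.TargetPort == 0 && e.StandbyTargetPort == 0
}

// testCase names each scenario so that failures identify themselves.
type testCase struct {
	name   string
	engine *Engine
	want   bool
}

// runCases evaluates every case and returns one message per failure.
func runCases(cases []testCase) []string {
	var failed []string
	for _, tc := range cases {
		if got := targetCreationRequired(tc.engine); got != tc.want {
			failed = append(failed, fmt.Sprintf("%s: got %v, want %v", tc.name, got, tc.want))
		}
	}
	return failed
}

func main() {
	cases := []testCase{
		{"new engine", createTestEngine(0, 0, 0, "e1"), true},
		{"initiator only", createTestEngine(20001, 0, 0, "e2"), true},
		{"standby already created", createTestEngine(20001, 0, 20002, "e3"), false},
	}
	fmt.Println("failures:", runCases(cases))
}
```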
pkg/spdk/engine.go (3)

Line range hint 398-498: Consider enhancing error handling in deferred function

The deferred function at line 429 contains complex logic for initiator assignment. Consider extracting this into a separate helper function for better maintainability and error handling.

Consider refactoring like this:

+ func (e *Engine) assignInitiator(initiator *nvme.Initiator, dmDeviceBusy bool, standbyTargetCreationRequired bool) {
+     if !standbyTargetCreationRequired {
+         e.initiator = initiator
+         e.dmDeviceBusy = dmDeviceBusy
+         e.Endpoint = initiator.GetEndpoint()
+         e.log = e.log.WithFields(logrus.Fields{
+             "endpoint":   e.Endpoint,
+             "port":      e.Port,
+             "targetPort": e.TargetPort,
+         })
+     }
+     e.log.Infof("Finished handling frontend for engine: %+v", e)
+ }

  defer func() {
      if err == nil {
-         if !standbyTargetCreationRequired {
-             e.initiator = initiator
-             e.dmDeviceBusy = dmDeviceBusy
-             ...
-         }
-         e.log.Infof("Finished handling frontend for engine: %+v", e)
+         e.assignInitiator(initiator, dmDeviceBusy, standbyTargetCreationRequired)
      }
  }()

208-214: Consider standardizing error handling patterns

While error handling is generally good, consider standardizing the approach across the codebase. For example, some errors are wrapped with context while others are returned directly.

Consider consistently using error wrapping with context:

- if err != nil {
-     return nil, err
- }
+ if err != nil {
+     return nil, errors.Wrapf(err, "failed to get pod IP during engine creation")
+ }

Also applies to: 2181-2184


261-262: Address TODO comment about logging

The TODO comment indicates that log messages need improvement. Consider enhancing the log message to include more context about the operation being performed.

Consider improving the log message:

- // TODO: improve the log message
- e.log.Infof("Connecting all available replicas %+v, then launching raid during engine creation", e.ReplicaStatusMap)
+ e.log.Infof("Connecting available replicas for engine %s (replicas: %+v) and initializing RAID configuration", e.Name, e.ReplicaStatusMap)
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 0aae707 and 3c6c4a3.

📒 Files selected for processing (3)
  • pkg/api/types.go (2 hunks)
  • pkg/spdk/engine.go (14 hunks)
  • pkg/spdk/engine_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/api/types.go
🔇 Additional comments (7)
pkg/spdk/engine_test.go (2)

1-10: LGTM!

The imports are clean and necessary. The test file is properly structured using the check.v1 testing framework.


1-115: Verify test coverage for live upgrade functionality.

The test file introduces basic validation for the standby target port, which is a step towards supporting live upgrades. However, to ensure robust support for engine live upgrades (as per PR objectives), consider adding more comprehensive test scenarios that validate:

  1. The complete upgrade workflow
  2. State transitions during upgrade
  3. Error recovery scenarios
  4. Upgrade rollback cases
✅ Verification successful

Based on the code inspection results, I can now generate the final response:

Test coverage appears sufficient for the current implementation scope

The test coverage aligns well with the current implementation, which focuses on the standby port allocation and basic validation aspects of the upgrade functionality. The test cases appropriately verify:

  • Standby target port validation
  • Port allocation scenarios
  • Creation requirements for different network configurations

While the initial review suggested adding more test scenarios, the current implementation scope is primarily focused on the port management aspect of upgrades, which is adequately tested. The suggested additional test scenarios would be premature as the complete upgrade workflow implementation is not yet present in the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check test coverage for upgrade-related code

# Look for other upgrade-related tests
rg -l "upgrade|standby" --type go

# Check for upgrade-related functions in the engine implementation
ast-grep --pattern 'func $_($$) {
  $$$
  standby
  $$$
}'

Length of output: 633


Script:

#!/bin/bash
# Let's examine the engine implementation and related test files more closely

# Check engine.go for standby-related functions and their implementation
rg "standby|Standby" pkg/spdk/engine.go -B 2 -A 2

# Look for upgrade-related functions in engine.go
rg "upgrade|Upgrade" pkg/spdk/engine.go -B 2 -A 2

# Check the API types related to upgrade
rg "upgrade|Upgrade" pkg/api/types.go -B 2 -A 2

Length of output: 2242

pkg/spdk/engine.go (5)

37-49: LGTM: Engine struct changes

The addition of StandbyTargetPort field is well-placed and consistent with other port-related fields in the struct.


108-110: LGTM: Clear and focused helper function

The isNewEngine function provides a clean abstraction for checking engine initialization state.


Line range hint 2181-2232: LGTM: Well-structured target switchover logic

The changes properly handle the standby target port during switchover operations, with appropriate error handling and state management.


112-142: ⚠️ Potential issue

Fix redundant condition check

The function contains a redundant check for e.StandbyTargetPort != 0 at line 123, which is unnecessary since it's already within a block that checks for e.Port != 0 && e.TargetPort == 0.

Apply this diff to fix the redundant condition:

if e.Port != 0 && e.TargetPort == 0 {
    e.log.Info("Creating a target instance")
-   if e.StandbyTargetPort != 0 {
        e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)
-   } else {
        targetCreationRequired = true
-   }
}

Likely invalid or redundant comment.


112-142: Ensure test coverage for new functionality

The new checkInitiatorAndTargetCreationRequirements function contains complex logic for standby target handling. Please ensure adequate test coverage is added.

pkg/spdk/engine_test.go (outdated, resolved)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
pkg/spdk_test.go (1)

Line range hint 225-322: Document the new upgrade mechanism architecture

The changes suggest a significant shift in how engine upgrades are handled, moving from a parameter-based approach to potentially a more sophisticated mechanism. To ensure maintainability and clarity:

  1. Consider adding documentation that explains the new upgrade mechanism
  2. Update or create integration tests that specifically validate the live upgrade scenarios
  3. Consider adding a migration guide for users who might be affected by this architectural change

Would you like assistance in creating:

  • Documentation template for the new upgrade mechanism?
  • Integration test scenarios for live upgrades?
  • Migration guide for existing users?
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 3c6c4a3 and 92bcd2f.

📒 Files selected for processing (2)
  • pkg/client/client.go (1 hunks)
  • pkg/spdk_test.go (2 hunks)
🔇 Additional comments (3)
pkg/spdk_test.go (2)

225-225: Verify the removal of upgradeRequired parameter

The EngineCreate call has been updated to remove the upgradeRequired parameter. This change should be verified against the engine creation logic to ensure it aligns with the new live upgrade support feature.

#!/bin/bash
# Description: Verify the changes in EngineCreate signature across the codebase

# Check for any remaining references to upgradeRequired in engine creation
rg -l "EngineCreate.*upgradeRequired" 

# Check the updated EngineCreate method signature
ast-grep --pattern 'func $_EngineCreate($_) {
  $$$
}'

250-322: ⚠️ Potential issue

Clarify the status of the commented test implementation

The TestSPDKEngineCreateWithUpgradeRequired test appears to contain important validations for the engine upgrade functionality. If this test is being replaced or modified as part of the live upgrade feature:

  1. Please clarify if this test will be replaced with new test cases
  2. Ensure the new test coverage includes validation of both initiator and target properties
  3. Consider preserving the valuable assertions in any new test implementation
pkg/client/client.go (1)

Line range hint 474-492: LGTM: Method implementation remains consistent

The implementation maintains proper parameter validation, error handling, and gRPC timeout management while removing the upgrade-related parameter.

pkg/client/client.go (resolved)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
pkg/spdk/engine.go (1)

112-142: Improve error messages and add test coverage

The function has clear logic but could benefit from:

  1. More descriptive error messages that explain why the addresses are invalid
  2. Unit tests to verify the complex branching logic

Consider improving the error messages:

-err = fmt.Errorf("invalid initiator and target address for engine %s creation", e.Name)
+err = fmt.Errorf("invalid initiator address %s and target address %s for engine %s creation: addresses must match pod IP %s", initiatorIP, targetIP, e.Name, podIP)
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 128-130: pkg/spdk/engine.go#L128-L130
Added lines #L128 - L130 were not covered by tests

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 92bcd2f and 2adf734.

📒 Files selected for processing (1)
  • pkg/spdk/engine.go (14 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
pkg/spdk/engine.go

[warning] 108-109: pkg/spdk/engine.go#L108-L109
Added lines #L108 - L109 were not covered by tests


[warning] 128-130: pkg/spdk/engine.go#L128-L130
Added lines #L128 - L130 were not covered by tests


[warning] 208-214: pkg/spdk/engine.go#L208-L214
Added lines #L208 - L214 were not covered by tests


[warning] 216-220: pkg/spdk/engine.go#L216-L220
Added lines #L216 - L220 were not covered by tests


[warning] 223-226: pkg/spdk/engine.go#L223-L226
Added lines #L223 - L226 were not covered by tests


[warning] 228-231: pkg/spdk/engine.go#L228-L231
Added lines #L228 - L231 were not covered by tests


[warning] 263-264: pkg/spdk/engine.go#L263-L264
Added lines #L263 - L264 were not covered by tests


[warning] 269-269: pkg/spdk/engine.go#L269
Added line #L269 was not covered by tests


[warning] 271-271: pkg/spdk/engine.go#L271
Added line #L271 was not covered by tests


[warning] 282-285: pkg/spdk/engine.go#L282-L285
Added lines #L282 - L285 were not covered by tests


[warning] 292-292: pkg/spdk/engine.go#L292
Added line #L292 was not covered by tests


[warning] 302-303: pkg/spdk/engine.go#L302-L303
Added lines #L302 - L303 were not covered by tests


[warning] 306-315: pkg/spdk/engine.go#L306-L315
Added lines #L306 - L315 were not covered by tests


[warning] 401-401: pkg/spdk/engine.go#L401
Added line #L401 was not covered by tests


[warning] 411-413: pkg/spdk/engine.go#L411-L413
Added lines #L411 - L413 were not covered by tests


[warning] 422-429: pkg/spdk/engine.go#L422-L429
Added lines #L422 - L429 were not covered by tests


[warning] 431-442: pkg/spdk/engine.go#L431-L442
Added lines #L431 - L442 were not covered by tests


[warning] 444-444: pkg/spdk/engine.go#L444
Added line #L444 was not covered by tests


[warning] 448-462: pkg/spdk/engine.go#L448-L462
Added lines #L448 - L462 were not covered by tests


[warning] 464-466: pkg/spdk/engine.go#L464-L466
Added lines #L464 - L466 were not covered by tests


[warning] 468-468: pkg/spdk/engine.go#L468
Added line #L468 was not covered by tests


[warning] 471-473: pkg/spdk/engine.go#L471-L473
Added lines #L471 - L473 were not covered by tests


[warning] 476-476: pkg/spdk/engine.go#L476
Added line #L476 was not covered by tests


[warning] 479-482: pkg/spdk/engine.go#L479-L482
Added lines #L479 - L482 were not covered by tests


[warning] 484-495: pkg/spdk/engine.go#L484-L495
Added lines #L484 - L495 were not covered by tests


[warning] 501-502: pkg/spdk/engine.go#L501-L502
Added lines #L501 - L502 were not covered by tests


[warning] 510-511: pkg/spdk/engine.go#L510-L511
Added lines #L510 - L511 were not covered by tests


[warning] 515-515: pkg/spdk/engine.go#L515
Added line #L515 was not covered by tests


[warning] 517-519: pkg/spdk/engine.go#L517-L519
Added lines #L517 - L519 were not covered by tests


[warning] 628-628: pkg/spdk/engine.go#L628
Added line #L628 was not covered by tests

🔇 Additional comments (3)
pkg/spdk/engine.go (3)

Line range hint 2183-2234: LGTM: Robust target switchover implementation

The target switchover implementation:

  1. Properly validates pod IP
  2. Handles standby target port appropriately for live upgrades
  3. Includes fallback logic for error cases

The error handling and state management look solid.


Line range hint 400-520: Add test coverage for critical paths

The live upgrade functionality includes critical paths that should be tested:

  1. Engine creation with standby target
  2. Target switchover scenarios
  3. Error handling and recovery paths

Consider adding integration tests to verify the end-to-end functionality.

#!/bin/bash
# Check existing test coverage
rg -l "func Test.*Target" 

108-110: Add documentation and tests for isNewEngine

The function logic is clear but would benefit from documentation explaining what constitutes a "new" engine. Additionally, consider adding unit tests to cover this helper function.
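As a rough illustration of the requested documentation and table-driven coverage, the sketch below uses a trimmed-down `engine` struct and an assumed definition of "new" (no initiator IP, no target IP, and no standby target port recorded). Both the struct and the predicate body are assumptions made for the example, not the PR's actual code.

```go
package main

import "fmt"

// engine is a hypothetical, trimmed-down stand-in for the Engine struct.
type engine struct {
	IP                string
	TargetIP          string
	StandbyTargetPort int32
}

// isNewEngine reports whether the engine has never been set up.
// Assumed definition for this sketch: an engine is "new" when it has no
// initiator IP, no target IP, and no standby target port recorded.
func isNewEngine(e *engine) bool {
	return e.IP == "" && e.TargetIP == "" && e.StandbyTargetPort == 0
}

func main() {
	cases := []struct {
		name string
		e    engine
		want bool
	}{
		{name: "empty engine", e: engine{}, want: true},
		{name: "initiator IP set", e: engine{IP: "192.168.1.1"}, want: false},
		{name: "target IP set", e: engine{TargetIP: "192.168.1.2"}, want: false},
		{name: "standby target port set", e: engine{StandbyTargetPort: 8001}, want: false},
	}
	for _, tc := range cases {
		got := isNewEngine(&tc.e)
		fmt.Printf("%s: got=%v want=%v\n", tc.name, got, tc.want)
	}
}
```

The same table could be moved into the suite-style tests used elsewhere in this PR, with one assertion per case.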

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 108-109: pkg/spdk/engine.go#L108-L109
Added lines #L108 - L109 were not covered by tests

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (7)
pkg/spdk/engine_test.go (2)

11-23: Add documentation for the test function and struct fields.

Consider adding documentation to improve code maintainability:

  • Add a function comment explaining the purpose and methodology of the test
  • Document the struct fields, especially explaining the relationship between different IP addresses and ports
+// TestCheckInitiatorAndTargetCreationRequirements validates the logic for determining
+// when new initiator and target instances need to be created based on various
+// network configurations and port settings.
 func (s *TestSuite) TestCheckInitiatorAndTargetCreationRequirements(c *C) {
 	testCases := []struct {
+		// name is a descriptive identifier for the test case
 		name                              string
+		// podIP is the IP address of the pod running the engine
 		podIP                             string
+		// initiatorIP is the IP address for the SPDK initiator
 		initiatorIP                       string
+		// targetIP is the IP address for the SPDK target
 		targetIP                          string

24-96: Add test cases for port validation and error conditions.

While the current test cases cover basic scenarios, consider adding these additional cases to improve coverage:

  1. Port validation:

    • Test negative port values
    • Test port numbers exceeding valid range
    • Test zero port with non-zero standby port
  2. Error conditions for standby target:

    • Test transition from active to standby when active port is in use
    • Test invalid port combinations
 		},
+		{
+			name:                              "Negative port values",
+			podIP:                             "192.168.1.1",
+			initiatorIP:                       "192.168.1.1",
+			targetIP:                          "192.168.1.1",
+			port:                              -1,
+			targetPort:                        -8000,
+			standbyTargetPort:                 -8001,
+			expectedInitiatorCreationRequired: false,
+			expectedTargetCreationRequired:    false,
+			expectedError:                     fmt.Errorf("invalid port values"),
+		},
+		{
+			name:                              "Invalid port combination",
+			podIP:                             "192.168.1.1",
+			initiatorIP:                       "192.168.1.1",
+			targetIP:                          "192.168.1.1",
+			port:                              0,
+			targetPort:                        0,
+			standbyTargetPort:                 8001,
+			expectedInitiatorCreationRequired: false,
+			expectedTargetCreationRequired:    false,
+			expectedError:                     fmt.Errorf("invalid port combination: standby port without active port"),
+		},
 	}
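The suggested cases expect errors that the function under test may not produce today. One way to make them meaningful is a small validation helper that runs before the creation check; the sketch below is hypothetical (`validatePorts`, `maxPort`, and the rule that a standby port is invalid only when both other ports are unset are all assumptions for illustration).

```go
package main

import (
	"errors"
	"fmt"
)

// maxPort is the largest valid TCP port number.
const maxPort = 65535

// validatePorts is a hypothetical helper that rejects the port values used in
// the suggested test cases. A value of 0 is treated as "not allocated yet".
func validatePorts(port, targetPort, standbyTargetPort int32) error {
	for _, p := range []int32{port, targetPort, standbyTargetPort} {
		if p < 0 || p > maxPort {
			return fmt.Errorf("invalid port value %d: must be in [0, %d]", p, maxPort)
		}
	}
	// Assumption: a standby target port makes no sense when neither the
	// initiator port nor the active target port has been allocated.
	if port == 0 && targetPort == 0 && standbyTargetPort != 0 {
		return errors.New("invalid port combination: standby port without active port")
	}
	return nil
}

func main() {
	fmt.Println(validatePorts(-1, -8000, -8001))
	fmt.Println(validatePorts(0, 0, 8001))
	fmt.Println(validatePorts(8000, 8000, 0))
}
```

Keeping the range check separate from the combination check lets each test case target exactly one failure mode.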
pkg/spdk_test.go (1)

Line range hint 1-322: Consider documenting the upgrade architecture changes

The changes suggest a significant architectural shift in how upgrades are handled:

  1. The EngineCreate signature has been simplified
  2. The dedicated upgrade test has been removed
  3. The basic engine creation test remains unchanged

Consider:

  1. Adding documentation to explain the new upgrade architecture
  2. Providing test coverage for the new upgrade mechanism
  3. Including examples of how to perform live upgrades with the new implementation

Would you like help with:

  1. Creating documentation for the new upgrade architecture?
  2. Designing new test cases for the upgrade functionality?
pkg/spdk/engine.go (4)

108-110: Add test coverage for the new helper function.

The isNewEngine function lacks test coverage. Consider adding unit tests to verify the behavior for both new and existing engines.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 108-109: pkg/spdk/engine.go#L108-L109
Added lines #L108 - L109 were not covered by tests


223-226: Consider adding debug level logging.

While the logging improvements are good, consider adding debug level logs to help with troubleshooting, especially around the decision points for initiator/target creation.

Also applies to: 306-315

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 223-226: pkg/spdk/engine.go#L223-L226
Added lines #L223 - L226 were not covered by tests


455-468: Document the retry mechanism for NVMe device info loading.

The retry mechanism is well implemented, but consider adding a comment explaining why the retry is necessary and what conditions might trigger retries.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 464-466: pkg/spdk/engine.go#L464-L466
Added lines #L464 - L466 were not covered by tests


[warning] 468-468: pkg/spdk/engine.go#L468
Added line #L468 was not covered by tests


Line range hint 208-519: Improve test coverage for new functionality.

Several new functions and code paths lack test coverage, including:

  • isNewEngine
  • checkInitiatorAndTargetCreationRequirements
  • Parts of the frontend handling logic
  • Target switchover logic

Consider adding comprehensive test cases to verify the behavior of these new components, especially around the upgrade scenarios.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 2adf734 and f654d45.

📒 Files selected for processing (5)
  • pkg/api/types.go (2 hunks)
  • pkg/client/client.go (1 hunks)
  • pkg/spdk/engine.go (14 hunks)
  • pkg/spdk/engine_test.go (1 hunks)
  • pkg/spdk_test.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/api/types.go
  • pkg/client/client.go
🧰 Additional context used
🪛 GitHub Check: codecov/patch
pkg/spdk/engine.go

[warning] 108-109: pkg/spdk/engine.go#L108-L109
Added lines #L108 - L109 were not covered by tests


[warning] 128-130: pkg/spdk/engine.go#L128-L130
Added lines #L128 - L130 were not covered by tests


[warning] 208-214: pkg/spdk/engine.go#L208-L214
Added lines #L208 - L214 were not covered by tests


[warning] 216-220: pkg/spdk/engine.go#L216-L220
Added lines #L216 - L220 were not covered by tests


[warning] 223-226: pkg/spdk/engine.go#L223-L226
Added lines #L223 - L226 were not covered by tests


[warning] 228-231: pkg/spdk/engine.go#L228-L231
Added lines #L228 - L231 were not covered by tests


[warning] 263-264: pkg/spdk/engine.go#L263-L264
Added lines #L263 - L264 were not covered by tests


[warning] 269-269: pkg/spdk/engine.go#L269
Added line #L269 was not covered by tests


[warning] 271-271: pkg/spdk/engine.go#L271
Added line #L271 was not covered by tests


[warning] 282-285: pkg/spdk/engine.go#L282-L285
Added lines #L282 - L285 were not covered by tests


[warning] 292-292: pkg/spdk/engine.go#L292
Added line #L292 was not covered by tests


[warning] 302-303: pkg/spdk/engine.go#L302-L303
Added lines #L302 - L303 were not covered by tests


[warning] 306-315: pkg/spdk/engine.go#L306-L315
Added lines #L306 - L315 were not covered by tests


[warning] 401-401: pkg/spdk/engine.go#L401
Added line #L401 was not covered by tests


[warning] 411-413: pkg/spdk/engine.go#L411-L413
Added lines #L411 - L413 were not covered by tests


[warning] 422-429: pkg/spdk/engine.go#L422-L429
Added lines #L422 - L429 were not covered by tests


[warning] 431-442: pkg/spdk/engine.go#L431-L442
Added lines #L431 - L442 were not covered by tests


[warning] 444-444: pkg/spdk/engine.go#L444
Added line #L444 was not covered by tests


[warning] 448-462: pkg/spdk/engine.go#L448-L462
Added lines #L448 - L462 were not covered by tests


[warning] 464-466: pkg/spdk/engine.go#L464-L466
Added lines #L464 - L466 were not covered by tests


[warning] 468-468: pkg/spdk/engine.go#L468
Added line #L468 was not covered by tests


[warning] 471-473: pkg/spdk/engine.go#L471-L473
Added lines #L471 - L473 were not covered by tests


[warning] 476-476: pkg/spdk/engine.go#L476
Added line #L476 was not covered by tests


[warning] 479-482: pkg/spdk/engine.go#L479-L482
Added lines #L479 - L482 were not covered by tests


[warning] 484-495: pkg/spdk/engine.go#L484-L495
Added lines #L484 - L495 were not covered by tests


[warning] 501-502: pkg/spdk/engine.go#L501-L502
Added lines #L501 - L502 were not covered by tests


[warning] 510-511: pkg/spdk/engine.go#L510-L511
Added lines #L510 - L511 were not covered by tests


[warning] 515-515: pkg/spdk/engine.go#L515
Added line #L515 was not covered by tests


[warning] 517-519: pkg/spdk/engine.go#L517-L519
Added lines #L517 - L519 were not covered by tests


[warning] 628-628: pkg/spdk/engine.go#L628
Added line #L628 was not covered by tests

🔇 Additional comments (6)
pkg/spdk_test.go (1)

250-322: ⚠️ Potential issue

Reconsider removing the upgrade test case

The commented-out test TestSPDKEngineCreateWithUpgradeRequired appears to contain valuable test coverage for upgrade scenarios, which aligns with the PR's objective of supporting live upgrades. Consider:

  1. Why was this test case removed?
  2. Is there alternative test coverage for the upgrade functionality?
  3. Should this test be adapted rather than removed to support the new upgrade implementation?

Let's check for other upgrade-related test coverage:

#!/bin/bash
# Description: Search for other upgrade-related tests that might provide coverage

# Search for upgrade-related test files and functions
rg -l "Test.*Upgrade" 

# Search for upgrade-related assertions
rg "c\.Assert.*[Uu]pgrade"
pkg/spdk/engine.go (5)

37-49: LGTM! New field addition looks good.

The StandbyTargetPort field is properly integrated with the existing port-related fields and follows the same type pattern.


208-214: LGTM! Good error handling and control flow.

The integration of checkInitiatorAndTargetCreationRequirements is well done with proper error handling and early returns.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 208-214: pkg/spdk/engine.go#L208-L214
Added lines #L208 - L214 were not covered by tests


400-401: LGTM! Good error handling and port management.

The frontend handling changes properly manage both regular and standby target ports with appropriate error handling.

Also applies to: 411-413

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 401-401: pkg/spdk/engine.go#L401
Added line #L401 was not covered by tests


123-128: ⚠️ Potential issue

Redundant condition check on e.StandbyTargetPort

The nested condition checking e.StandbyTargetPort != 0 is redundant since it's already checked in the outer if statement.

Apply this diff to fix the redundant condition:

if e.StandbyTargetPort != 0 {
-   if e.StandbyTargetPort != 0 {
    e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)
    return e.getWithoutLock(), nil
-   }
}

Likely invalid or redundant comment.


2183-2187: LGTM! Good target switchover implementation.

The target switchover logic properly handles port management and state updates. The pod IP check is a good addition for ensuring correct target configuration.

Also applies to: 2230-2234

✅ Verification successful

Pod IP validation is properly implemented across the codebase

The GetIPForPod() usage is consistently implemented across the codebase with proper error handling. The target switchover logic correctly validates pod IP against target and initiator IPs to determine the appropriate instance creation. The implementation in the reviewed code segment aligns with the established pattern seen in other files (replica.go, backup.go) and is followed by proper instance handling logic.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify the pod IP handling logic
ast-grep --pattern $'podIP, err := commonnet.GetIPForPod()
if err != nil {
  $$$
}
if podIP == targetIP {
  $$$
}'

Length of output: 115


Script:

#!/bin/bash
# Let's search for GetIPForPod usage and its surrounding context
rg -A 5 "GetIPForPod" --type go

# Also search for any IP comparison logic
rg -A 5 "podIP.*==" --type go

Length of output: 2957


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
pkg/spdk/server.go (1)

878-880: Consider adding documentation for live upgrade feature.

Since this change introduces support for live upgrades, it would be helpful to add documentation explaining:

  • The live upgrade process
  • The role of initiator and target addresses
  • Requirements and prerequisites for live upgrades
pkg/spdk/engine.go (1)

45-45: Add documentation for the StandbyTargetPort field

Please add a comment explaining that this field is used for managing standby targets during live upgrades.

+	// StandbyTargetPort is the port used by the standby target instance during live upgrades
 	StandbyTargetPort int32
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f654d45 and 5d51017.

📒 Files selected for processing (3)
  • pkg/spdk/engine.go (14 hunks)
  • pkg/spdk/engine_test.go (1 hunks)
  • pkg/spdk/server.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/spdk/engine_test.go
🧰 Additional context used
🪛 GitHub Check: codecov/patch
pkg/spdk/engine.go

[warning] 108-109: pkg/spdk/engine.go#L108-L109
Added lines #L108 - L109 were not covered by tests


[warning] 128-130: pkg/spdk/engine.go#L128-L130
Added lines #L128 - L130 were not covered by tests


[warning] 144-144: pkg/spdk/engine.go#L144
Added line #L144 was not covered by tests


[warning] 207-213: pkg/spdk/engine.go#L207-L213
Added lines #L207 - L213 were not covered by tests


[warning] 215-219: pkg/spdk/engine.go#L215-L219
Added lines #L215 - L219 were not covered by tests


[warning] 222-225: pkg/spdk/engine.go#L222-L225
Added lines #L222 - L225 were not covered by tests


[warning] 227-230: pkg/spdk/engine.go#L227-L230
Added lines #L227 - L230 were not covered by tests


[warning] 262-263: pkg/spdk/engine.go#L262-L263
Added lines #L262 - L263 were not covered by tests


[warning] 268-268: pkg/spdk/engine.go#L268
Added line #L268 was not covered by tests


[warning] 270-270: pkg/spdk/engine.go#L270
Added line #L270 was not covered by tests


[warning] 281-284: pkg/spdk/engine.go#L281-L284
Added lines #L281 - L284 were not covered by tests


[warning] 291-291: pkg/spdk/engine.go#L291
Added line #L291 was not covered by tests


[warning] 301-302: pkg/spdk/engine.go#L301-L302
Added lines #L301 - L302 were not covered by tests


[warning] 305-313: pkg/spdk/engine.go#L305-L313
Added lines #L305 - L313 were not covered by tests


[warning] 399-399: pkg/spdk/engine.go#L399
Added line #L399 was not covered by tests


[warning] 409-411: pkg/spdk/engine.go#L409-L411
Added lines #L409 - L411 were not covered by tests


[warning] 420-427: pkg/spdk/engine.go#L420-L427
Added lines #L420 - L427 were not covered by tests


[warning] 429-440: pkg/spdk/engine.go#L429-L440
Added lines #L429 - L440 were not covered by tests


[warning] 442-442: pkg/spdk/engine.go#L442
Added line #L442 was not covered by tests


[warning] 446-460: pkg/spdk/engine.go#L446-L460
Added lines #L446 - L460 were not covered by tests


[warning] 462-464: pkg/spdk/engine.go#L462-L464
Added lines #L462 - L464 were not covered by tests


[warning] 466-466: pkg/spdk/engine.go#L466
Added line #L466 was not covered by tests


[warning] 469-471: pkg/spdk/engine.go#L469-L471
Added lines #L469 - L471 were not covered by tests


[warning] 474-474: pkg/spdk/engine.go#L474
Added line #L474 was not covered by tests


[warning] 477-480: pkg/spdk/engine.go#L477-L480
Added lines #L477 - L480 were not covered by tests


[warning] 482-493: pkg/spdk/engine.go#L482-L493
Added lines #L482 - L493 were not covered by tests


[warning] 499-500: pkg/spdk/engine.go#L499-L500
Added lines #L499 - L500 were not covered by tests


[warning] 508-509: pkg/spdk/engine.go#L508-L509
Added lines #L508 - L509 were not covered by tests


[warning] 513-513: pkg/spdk/engine.go#L513
Added line #L513 was not covered by tests


[warning] 515-517: pkg/spdk/engine.go#L515-L517
Added lines #L515 - L517 were not covered by tests

🔇 Additional comments (5)
pkg/spdk/server.go (1)

879-879: LGTM! Verify live upgrade functionality.

The changes to the Create method call properly support engine live upgrades by passing the necessary parameters for initiator and target addresses.

Run the following script to verify the engine creation parameters:

✅ Verification successful

The changes to engine creation are correctly implemented for live upgrade support

The code changes are verified to be correct and consistent with the live upgrade functionality:

  • The Create method properly handles initiator and target addresses as parameters
  • The implementation includes proper validation and switching logic for target addresses
  • The changes are integrated with the existing port allocation and engine management system

Key validations:

  • Proper validation of target address requirement
  • Handling of target switching via isSwitchOverTargetRequired
  • Safety check for local target with matching addresses
  • Proper connection/disconnection handling during target operations
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the engine creation parameters and usage
# Test: Search for engine creation calls to ensure consistent parameter usage
rg -A 5 "e\.Create\(" 

Length of output: 371


Script:

#!/bin/bash
# Search for the Engine struct and Create method implementation
ast-grep --pattern 'type Engine struct {
  $$$
}'

echo "---"

ast-grep --pattern 'func ($_) Create($$$) {
  $$$
}'

Length of output: 7425


Script:

#!/bin/bash
# Search for the Create method implementation and its usage
rg -A 10 "func.*Create\(.*spdkClient.*ReplicaAddressMap" 

echo "---"

# Search for any tests related to the Create functionality
rg -A 5 "TestEngine.*Create"

Length of output: 104


Script:

#!/bin/bash
# Search for Create method in engine.go with broader context
rg -A 15 "func \(e \*Engine\) Create" pkg/spdk/engine.go

echo "---"

# Search for the usage of InitiatorAddress and TargetAddress in the codebase
rg "InitiatorAddress|TargetAddress" pkg/spdk/

Length of output: 2137

pkg/spdk/engine.go (4)

Line range hint 2181-2232: Enhance validation and error handling for target switchover

The target switchover logic is critical for live upgrades. Consider adding:

  1. Validation for the target port range
  2. Retries for port allocation failures
  3. Rollback mechanism for failed switchovers
  4. Metrics/events for monitoring switchover operations

Consider implementing a state machine pattern to manage the switchover lifecycle:

  1. PREPARING -> SWITCHING -> COMPLETED/FAILED
  2. Add metrics for each state transition
  3. Implement automatic rollback on failure
#!/bin/bash
# Check port validation
rg "ValidatePort|PortRange|port.*valid" 

# Check retry/rollback mechanisms
rg "Retry|Rollback|recover|revert" 

Line range hint 1-2232: Improve overall test coverage

The code changes implement critical functionality for live upgrades but have significant gaps in test coverage. Please add:

  1. Integration tests for the live upgrade flow
  2. Unit tests for new helper functions
  3. Error scenario tests
  4. Performance benchmarks for the switchover process
#!/bin/bash
# List the existing Go test files as a rough proxy for test coverage
fd '_test\.go$'

123-127: ⚠️ Potential issue

Remove redundant condition check

The nested condition checking e.StandbyTargetPort != 0 is redundant as it's already checked in the outer if statement.

-			if e.StandbyTargetPort != 0 {
-				e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)
-			} else {
-				targetCreationRequired = true
-			}
+			e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)

Likely invalid or redundant comment.


409-411: Add test coverage for frontend handling

The frontend handling logic, particularly around standby target creation and error handling, lacks test coverage. Please add tests to verify:

  1. Standby target creation logic
  2. Error handling scenarios
  3. Endpoint and logging updates

Also applies to: 420-440

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 409-411: pkg/spdk/engine.go#L409-L411
Added lines #L409 - L411 were not covered by tests


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
pkg/spdk/engine.go (1)

45-45: Add documentation for StandbyTargetPort field

To improve code readability and maintain consistency, consider adding a comment to document the purpose of the new StandbyTargetPort field in the Engine struct.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 5d51017 and 4469b89.

📒 Files selected for processing (2)
  • pkg/spdk/engine.go (14 hunks)
  • pkg/spdk/engine_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/spdk/engine_test.go
🧰 Additional context used
🪛 GitHub Check: codecov/patch
pkg/spdk/engine.go

[warning] 108-109: pkg/spdk/engine.go#L108-L109
Added lines #L108 - L109 were not covered by tests


[warning] 128-130: pkg/spdk/engine.go#L128-L130
Added lines #L128 - L130 were not covered by tests


[warning] 144-144: pkg/spdk/engine.go#L144
Added line #L144 was not covered by tests


[warning] 207-213: pkg/spdk/engine.go#L207-L213
Added lines #L207 - L213 were not covered by tests


[warning] 215-219: pkg/spdk/engine.go#L215-L219
Added lines #L215 - L219 were not covered by tests


[warning] 222-225: pkg/spdk/engine.go#L222-L225
Added lines #L222 - L225 were not covered by tests


[warning] 227-230: pkg/spdk/engine.go#L227-L230
Added lines #L227 - L230 were not covered by tests


[warning] 262-263: pkg/spdk/engine.go#L262-L263
Added lines #L262 - L263 were not covered by tests


[warning] 268-268: pkg/spdk/engine.go#L268
Added line #L268 was not covered by tests


[warning] 270-270: pkg/spdk/engine.go#L270
Added line #L270 was not covered by tests


[warning] 281-284: pkg/spdk/engine.go#L281-L284
Added lines #L281 - L284 were not covered by tests


[warning] 291-291: pkg/spdk/engine.go#L291
Added line #L291 was not covered by tests


[warning] 301-302: pkg/spdk/engine.go#L301-L302
Added lines #L301 - L302 were not covered by tests


[warning] 305-313: pkg/spdk/engine.go#L305-L313
Added lines #L305 - L313 were not covered by tests


[warning] 399-399: pkg/spdk/engine.go#L399
Added line #L399 was not covered by tests


[warning] 409-411: pkg/spdk/engine.go#L409-L411
Added lines #L409 - L411 were not covered by tests


[warning] 420-427: pkg/spdk/engine.go#L420-L427
Added lines #L420 - L427 were not covered by tests


[warning] 429-440: pkg/spdk/engine.go#L429-L440
Added lines #L429 - L440 were not covered by tests


[warning] 442-442: pkg/spdk/engine.go#L442
Added line #L442 was not covered by tests


[warning] 446-460: pkg/spdk/engine.go#L446-L460
Added lines #L446 - L460 were not covered by tests


[warning] 462-464: pkg/spdk/engine.go#L462-L464
Added lines #L462 - L464 were not covered by tests


[warning] 466-466: pkg/spdk/engine.go#L466
Added line #L466 was not covered by tests


[warning] 469-471: pkg/spdk/engine.go#L469-L471
Added lines #L469 - L471 were not covered by tests


[warning] 474-474: pkg/spdk/engine.go#L474
Added line #L474 was not covered by tests


[warning] 477-480: pkg/spdk/engine.go#L477-L480
Added lines #L477 - L480 were not covered by tests


[warning] 482-493: pkg/spdk/engine.go#L482-L493
Added lines #L482 - L493 were not covered by tests


[warning] 499-500: pkg/spdk/engine.go#L499-L500
Added lines #L499 - L500 were not covered by tests


[warning] 508-509: pkg/spdk/engine.go#L508-L509
Added lines #L508 - L509 were not covered by tests


[warning] 513-513: pkg/spdk/engine.go#L513
Added line #L513 was not covered by tests


[warning] 515-517: pkg/spdk/engine.go#L515-L517
Added lines #L515 - L517 were not covered by tests

🔇 Additional comments (4)
pkg/spdk/engine.go (4)

215-220: 🛠️ Refactor suggestion

Verify initialization logic for new engines

In the Create method, when isNewEngine(e) returns true, the code initializes e.IP, e.TargetIP, and potentially e.StandbyTargetPort. Ensure that all necessary fields are correctly initialized to prevent issues with uninitialized or stale data.

Consider initializing e.StandbyTargetPort explicitly if required.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 215-219: pkg/spdk/engine.go#L215-L219
Added lines #L215 - L219 were not covered by tests


409-411: ⚠️ Potential issue

Review logic for determining standbyTargetCreationRequired

The condition if e.Port != 0 && e.TargetPort == 0 sets standbyTargetCreationRequired to true. Confirm that this condition accurately reflects the scenarios where a standby target creation is needed, especially when e.StandbyTargetPort may have a value.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 409-411: pkg/spdk/engine.go#L409-L411
Added lines #L409 - L411 were not covered by tests


2228-2231: ⚠️ Potential issue

Potential issue when updating target ports during switchover

In the SwitchOverTarget function, setting e.TargetPort = targetPort and resetting e.StandbyTargetPort = 0 when targetIP == podIP could lead to issues if StandbyTargetPort should be preserved in certain cases. Please verify if this logic aligns with the intended behavior during target switchover.


123-127: ⚠️ Potential issue

Ensure correct handling of StandbyTargetPort condition

The if statement checks whether e.StandbyTargetPort != 0 and, if so, skips target creation; only the else block sets targetCreationRequired = true. This might skip target creation when it is actually needed. Please verify that the conditions reflect the intended behavior.

Run the following script to check for redundant or conflicting conditions involving StandbyTargetPort:

✅ Verification successful

Let me gather more context about the surrounding code to understand the complete logic flow.


Based on the context gathered, I can now provide the final response:

The StandbyTargetPort condition is correctly implemented

The logic is correct and serves its intended purpose. Here's why:

  • When StandbyTargetPort != 0, it means a standby target instance already exists, so skipping target creation is the right behavior
  • When StandbyTargetPort == 0, setting targetCreationRequired = true is correct as no standby target exists yet
  • This aligns with the engine's lifecycle management where StandbyTargetPort is used to track standby target instances and is properly reset to 0 during target switchover (as seen in the code)
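
The branch discussed above can be sketched roughly as follows. This is a minimal illustration rather than the PR's code: the `engine` struct and the `targetCreationRequired` function are hypothetical stand-ins that keep only the three port fields named in the review.

```go
package main

import "fmt"

// engine is a hypothetical, stripped-down stand-in for the PR's Engine struct.
type engine struct {
	Port              int32 // initiator port; 0 means no initiator yet
	TargetPort        int32 // active target port; 0 means no active target
	StandbyTargetPort int32 // standby target port; 0 means no standby target
}

// targetCreationRequired mirrors the reviewed branch: with an initiator
// present and no active target, a target is created only when no standby
// target exists yet. Every other combination is out of scope for this sketch.
func targetCreationRequired(e engine) bool {
	if e.Port != 0 && e.TargetPort == 0 {
		return e.StandbyTargetPort == 0
	}
	return false
}

func main() {
	fmt.Println(targetCreationRequired(engine{Port: 2000}))
	fmt.Println(targetCreationRequired(engine{Port: 2000, StandbyTargetPort: 2001}))
}
```

The port numbers are arbitrary; only the zero versus non-zero distinction matters here.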
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for conditions involving 'StandbyTargetPort' that might lead to logical errors.

# Find all instances where 'StandbyTargetPort' is compared
rg 'if.*StandbyTargetPort\s*(!=|==)\s*0' -A 3

# Check assignments to 'targetCreationRequired'
rg 'targetCreationRequired\s*=' -A 2

Length of output: 924


Script:

#!/bin/bash
# Get more context around the StandbyTargetPort usage
rg "StandbyTargetPort" -B 5 -A 10

# Get the complete function context where this condition exists
ast-grep --pattern 'func ($_ *Engine) $_($$$) $_ {
  $$$
  if $_.StandbyTargetPort != 0 {
    $$$
  }
  $$$
}'

Length of output: 11653

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (4)
pkg/spdk/engine.go (4)

45-45: Add field documentation for StandbyTargetPort

Please add a comment explaining the purpose of this field and its role in live upgrades.

+	// StandbyTargetPort is the port used by the standby target instance during live upgrades
 	StandbyTargetPort int32

128-130: Improve error message clarity

The error message could be more descriptive by including the current state information.

-			err = fmt.Errorf("invalid initiator and target addresses for engine %s creation with initiator address %v and target address %v", e.Name, initiatorIP, targetIP)
+			err = fmt.Errorf("invalid state for engine %s creation: initiator port %v, target port %v with initiator address %v and target address %v", e.Name, e.Port, e.TargetPort, initiatorIP, targetIP)
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 128-130: pkg/spdk/engine.go#L128-L130
Added lines #L128 - L130 were not covered by tests


262-263: Enhance TODO comment specificity

The TODO comment should be more specific about what aspects of the log message need improvement.

-		// TODO: improve the log message
+		// TODO: improve the log message to include RAID level and configuration details
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 262-263: pkg/spdk/engine.go#L262-L263
Added lines #L262 - L263 were not covered by tests


456-466: Extract retry logic into helper function

Consider extracting this retry logic into a reusable helper function to improve code maintainability.

+func (e *Engine) retryLoadNVMeDeviceInfo(initiator *nvme.Initiator) error {
+	for r := 0; r < maxNumRetries; r++ {
+		err := initiator.LoadNVMeDeviceInfo(initiator.TransportAddress, initiator.TransportServiceID, initiator.SubsystemNQN)
+		if err == nil {
+			e.log.Infof("Loaded NVMe device info for engine")
+			return nil
+		}
+		if !strings.Contains(err.Error(), "failed to get devices") {
+			return errors.Wrapf(err, "failed to load NVMe device info for engine %v", e.Name)
+		}
+		time.Sleep(retryInterval)
+	}
+	return fmt.Errorf("failed to load NVMe device info after %d retries", maxNumRetries)
+}
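
A generic form of such a retry helper might look like the sketch below. It is an assumption-laden illustration, not the PR's implementation: `retry` and `errTransient` are hypothetical names, and a sentinel error checked with `errors.Is` replaces the string match on "failed to get devices" used in the suggestion above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a hypothetical marker for failures that are worth retrying.
var errTransient = errors.New("transient failure")

// retry runs op up to maxAttempts times. It returns immediately on success or
// on a non-transient error, and sleeps for interval between failed attempts.
func retry(maxAttempts int, interval time.Duration, op func() error) error {
	if maxAttempts <= 0 {
		return errors.New("maxAttempts must be positive")
	}
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		lastErr = op()
		if lastErr == nil {
			return nil
		}
		if !errors.Is(lastErr, errTransient) {
			return lastErr
		}
		if attempt < maxAttempts {
			time.Sleep(interval) // no sleep after the final attempt
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	calls := 0
	err := retry(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	})
	fmt.Println(calls, err)
}
```

Passing the operation as a closure keeps the helper independent of the NVMe initiator type, so the same loop could serve other flaky calls.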
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 462-464: pkg/spdk/engine.go#L462-L464
Added lines #L462 - L464 were not covered by tests


[warning] 466-466: pkg/spdk/engine.go#L466
Added line #L466 was not covered by tests

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4469b89 and fe16578.

📒 Files selected for processing (3)
  • pkg/spdk/engine.go (14 hunks)
  • pkg/spdk/engine_test.go (1 hunks)
  • pkg/spdk/server.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/spdk/engine_test.go
  • pkg/spdk/server.go
🔇 Additional comments (1)
pkg/spdk/engine.go (1)

2228-2231: Add test coverage for target switching logic

This critical section handling standby target ports during live upgrades lacks test coverage. Please add test cases to verify:

  1. Target port assignment when target IP matches pod IP
  2. Standby target port reset
  3. Error cases

#!/bin/bash
# Check for existing test coverage
rg -A 5 "TestSwitchOverTarget"
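
The first two cases in the list could be exercised with a table-driven check along these lines. This is only a sketch: `applySwitchover` is a hypothetical extraction of the two assignments the review quotes from SwitchOverTarget, and the real method does much more, including the error paths that this example does not cover.

```go
package main

import "fmt"

// engine is a hypothetical stand-in holding only the two ports under test.
type engine struct {
	TargetPort        int32
	StandbyTargetPort int32
}

// applySwitchover mirrors the reviewed lines: when the new target runs in this
// pod, it becomes the active target and the standby port is cleared.
func applySwitchover(e *engine, podIP, targetIP string, targetPort int32) {
	if targetIP == podIP {
		e.TargetPort = targetPort
		e.StandbyTargetPort = 0
	}
}

func main() {
	cases := []struct {
		name     string
		targetIP string
		want     engine
	}{
		{"target in this pod", "10.0.0.1", engine{TargetPort: 2001}},
		{"target elsewhere", "10.0.0.2", engine{TargetPort: 2000, StandbyTargetPort: 2001}},
	}
	for _, tc := range cases {
		e := engine{TargetPort: 2000, StandbyTargetPort: 2001}
		applySwitchover(&e, "10.0.0.1", tc.targetIP, 2001)
		fmt.Printf("%s: got %+v, match=%v\n", tc.name, e, e == tc.want)
	}
}
```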

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (5)
pkg/spdk/engine_test.go (2)

97-99: Enable parallel test execution support.

Consider making the tests parallel-safe by using c.Parallel() at the start of each test case. This would require moving the test case execution into a separate function to ensure proper isolation:

-	for testName, testCase := range testCases {
-		c.Logf("testing checkInitiatorAndTargetCreationRequirements.%v", testName)
+	for _, tc := range testCases {
+		testCase := tc // capture range variable
+		c.Run(testCase.name, func(c *C) {
+			c.Parallel()

11-11: Document test scenarios and their relationship to live upgrade requirements.

To better align with the PR's objective of supporting live upgrades, consider adding documentation that:

  1. Explains how each test case relates to the live upgrade workflow
  2. Documents the assumptions and preconditions for each scenario
  3. Maps test cases to specific upgrade requirements

Add a comment block before the test function:

// TestCheckInitiatorAndTargetCreationRequirements validates the engine's ability
// to handle live upgrades by testing various scenarios:
// 1. Initial state: Only active target
// 2. Transition state: Both active and standby targets
// 3. Final state: New active target (previously standby)
// 
// Each scenario verifies:
// - Correct port allocation
// - Proper instance creation flags
// - Error handling for invalid configurations
pkg/spdk/engine.go (3)

108-110: Add test coverage for isNewEngine

While the helper function is well-structured, it currently lacks test coverage. Consider adding test cases to verify:

  • New engine (all fields empty/zero)
  • Existing engine with various field combinations
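
The two scenarios could be covered by something like the sketch below. The definition of `isNewEngine` here is an assumption made for illustration (no IP, no target IP, and no standby target port); the review does not show the helper's actual body, so the PR's version may differ.

```go
package main

import "fmt"

// engine is a hypothetical stand-in for the fields isNewEngine is assumed to read.
type engine struct {
	IP                string
	TargetIP          string
	StandbyTargetPort int32
}

// isNewEngine is an assumed definition: an engine counts as new when none of
// its address or standby fields has been populated yet.
func isNewEngine(e engine) bool {
	return e.IP == "" && e.TargetIP == "" && e.StandbyTargetPort == 0
}

func main() {
	cases := []struct {
		name string
		e    engine
		want bool
	}{
		{"all fields empty", engine{}, true},
		{"initiator IP set", engine{IP: "10.0.0.1"}, false},
		{"target IP set", engine{TargetIP: "10.0.0.1"}, false},
		{"standby port set", engine{StandbyTargetPort: 2001}, false},
	}
	for _, tc := range cases {
		fmt.Printf("%s: got %v, want %v\n", tc.name, isNewEngine(tc.e), tc.want)
	}
}
```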

112-142: Simplify complex branching logic

The function has multiple nested conditions that could be simplified for better maintainability. Consider breaking down the logic into smaller, more focused functions:

+func (e *Engine) shouldCreateInitiator(podIP, initiatorIP string) bool {
+    return podIP == initiatorIP
+}

+func (e *Engine) shouldCreateTarget(podIP, targetIP string) bool {
+    return podIP == targetIP && e.StandbyTargetPort == 0
+}

 func (e *Engine) checkInitiatorAndTargetCreationRequirements(podIP, initiatorIP, targetIP string) (bool, bool, error) {
-    // Current complex logic
+    initiatorCreationRequired := e.shouldCreateInitiator(podIP, initiatorIP)
+    targetCreationRequired := e.shouldCreateTarget(podIP, targetIP)
+    
+    if !initiatorCreationRequired && !targetCreationRequired {
+        return false, false, fmt.Errorf("invalid initiator and target address for engine %s creation", e.Name)
+    }
+    
+    return initiatorCreationRequired, targetCreationRequired, nil
 }

2181-2184: Add retry logic for IP retrieval

Consider retrying the pod IP lookup, since network operations can be flaky:

+func getPodIPWithRetry(maxRetries int, retryInterval time.Duration) (string, error) {
+    var lastErr error
+    for i := 0; i < maxRetries; i++ {
+        ip, err := commonnet.GetIPForPod()
+        if err == nil {
+            return ip, nil
+        }
+        lastErr = err
+        time.Sleep(retryInterval)
+    }
+    return "", fmt.Errorf("failed to get pod IP after %d retries: %v", maxRetries, lastErr)
+}
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between fe16578 and be48fa6.

📒 Files selected for processing (6)
  • pkg/api/types.go (2 hunks)
  • pkg/client/client.go (1 hunks)
  • pkg/spdk/engine.go (14 hunks)
  • pkg/spdk/engine_test.go (1 hunks)
  • pkg/spdk/server.go (1 hunks)
  • pkg/spdk_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/api/types.go
  • pkg/client/client.go
  • pkg/spdk/server.go
🔇 Additional comments (2)
pkg/spdk_test.go (1)

Line range hint 1-1354: Verify test coverage after removal of multi-threaded tests.

The removal of TestSPDKMultipleThread and TestSPDKMultipleThreadSnapshotOpsAndRebuilding functions could impact test coverage for concurrent operations. Please confirm:

  1. Are there equivalent tests elsewhere that cover these scenarios?
  2. Will new tests be added to verify concurrent operations with live upgrades?

Let's check for other test files that might cover these scenarios:

#!/bin/bash
# Description: Look for other test files that might cover concurrent operations

# Search for test files with concurrent operation tests
rg -l "sync.WaitGroup|go func\(\)" --type go --glob "*_test.go"

# Search for snapshot operation tests
rg -l "TestSnapshot|snapshot.*test" --type go --glob "*_test.go"
pkg/spdk/engine.go (1)

37-49: LGTM: Engine struct field addition

The addition of StandbyTargetPort field to track standby target instance ports during live upgrades is well-placed and properly typed.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
pkg/spdk/engine.go (3)

37-49: Add documentation for the new StandbyTargetPort field

The new field StandbyTargetPort should be documented to explain its purpose in the live upgrade process and its relationship with other port fields (Port and TargetPort).

Add a comment above the field:

 	TargetPort        int32 // Port of the target that is used for letting initiator connect to
+	// StandbyTargetPort is used during live upgrades to temporarily store the port
+	// of a standby target instance before switching over
 	StandbyTargetPort int32

116-127: Simplify nested conditions

The nested conditions checking StandbyTargetPort can be simplified to improve readability and reduce complexity.

 if podIP == initiatorIP && podIP == targetIP {
   if e.Port == 0 && e.TargetPort == 0 {
     e.log.Info("Creating both initiator and target instances")
     initiatorCreationRequired = true
     targetCreationRequired = true
   } else if e.Port != 0 && e.TargetPort == 0 {
     e.log.Info("Creating a target instance")
-    if e.StandbyTargetPort != 0 {
-      e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)
-    } else {
-      targetCreationRequired = true
-    }
+    targetCreationRequired = e.StandbyTargetPort == 0
+    if !targetCreationRequired {
+      e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)
+    }
   } else {

693-698: Improve error handling and state validation

The commented-out IP retrieval code and subsequent validation could be improved:

-    // podIP, err := commonnet.GetIPForPod()
-    // if err != nil {
-    //     return err
-    // }
-    if e.IP != e.TargetIP {
-        return nil
-    }
+    if e.IP == "" || e.TargetIP == "" {
+        return fmt.Errorf("invalid engine state: IP or TargetIP is empty")
+    }
+    if e.IP != e.TargetIP {
+        e.log.Debugf("Skipping validation for non-local target (IP: %s, TargetIP: %s)", e.IP, e.TargetIP)
+        return nil
+    }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between be48fa6 and 471a59d.

📒 Files selected for processing (6)
  • pkg/api/types.go (2 hunks)
  • pkg/client/client.go (1 hunks)
  • pkg/spdk/engine.go (14 hunks)
  • pkg/spdk/engine_test.go (1 hunks)
  • pkg/spdk/server.go (1 hunks)
  • pkg/spdk_test.go (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/api/types.go
  • pkg/client/client.go
  • pkg/spdk/engine_test.go
  • pkg/spdk/server.go
🔇 Additional comments (6)
pkg/spdk_test.go (5)

271-271: LGTM! Parameter removal aligns with engine upgrade support.

Removing the second false argument from the EngineCreate call is consistent with the PR's objective of supporting engine live upgrades.


354-354: LGTM! Consistent parameter update in test.

The change maintains consistency with the updated EngineCreate signature.


530-530: LGTM! Consistent parameter update in snapshot test.

The change maintains consistency with the updated EngineCreate signature in the snapshot operations test.


1228-1228: LGTM! Consistent parameter updates in revert helper.

Both EngineCreate calls in the revertSnapshot helper function have been updated consistently.

Also applies to: 1243-1243


1354-1354: LGTM! Consistent parameter update in target-only test.

The change maintains consistency with the updated EngineCreate signature in the target-only test case.

Let's verify that all EngineCreate calls have been updated consistently:

✅ Verification successful

All EngineCreate calls are consistent with the updated signature

The verification shows that all EngineCreate calls in the codebase follow the same pattern with 9 parameters (name, volumeName, frontend, size, replicaAddressMap, portCount, initiatorAddress, targetAddress, salvageRequested). The implementation in pkg/client/client.go confirms this signature, and all usage instances in pkg/spdk_test.go consistently follow this pattern. No instances of the old signature were found.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all EngineCreate calls have been updated to the new signature
# Expect: No instances of the old signature with two false parameters

# Search for any remaining EngineCreate calls with the old signature pattern
rg 'EngineCreate\([^)]*,[^)]*,[^)]*,[^)]*,[^)]*,[^)]*,[^)]*,[^)]*,[^)]*,[^)]*\)' --type go

Length of output: 92


Script:

#!/bin/bash
# Let's search for all EngineCreate calls to verify the signature pattern
rg "EngineCreate\(" --type go -A 1

Length of output: 3616

pkg/spdk/engine.go (1)

2228-2232: 🛠️ Refactor suggestion

Add validation for target port updates

The port update logic should validate that the ports are within valid ranges:

+    if targetPort <= 0 || targetPort > 65535 {
+        return fmt.Errorf("invalid target port %d", targetPort)
+    }
     if targetIP == podIP {
         e.TargetPort = targetPort
         e.StandbyTargetPort = 0
     }

Likely invalid or redundant comment.
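
For reference, the range check in the suggestion above can be isolated into a small helper. This is a sketch only: `validatePort` is a hypothetical name and does not appear in the PR.

```go
package main

import "fmt"

// validatePort reports an error unless port lies in the valid TCP range 1-65535.
func validatePort(port int32) error {
	if port <= 0 || port > 65535 {
		return fmt.Errorf("invalid target port %d", port)
	}
	return nil
}

func main() {
	for _, p := range []int32{0, 1, 20001, 65535, 65536} {
		fmt.Printf("%d: %v\n", p, validatePort(p))
	}
}
```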

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (5)
pkg/spdk/engine_test.go (3)

97-117: Improve test execution and error handling

The test execution can be improved in several ways:

  1. Use test case name instead of index for better logging
  2. Add cleanup after each test
  3. Enhance error messages in assertions

Apply these changes:

-	for testName, testCase := range testCases {
-		c.Logf("testing checkInitiatorAndTargetCreationRequirements.%v", testName)
+	for _, testCase := range testCases {
+		c.Logf("testing %s", testCase.name)
+		
+		// Clean up after each test
+		defer func() {
+			if engine != nil {
+				engine.log = nil
+			}
+		}()

 		engine := &Engine{

119-162: Add edge cases for engine state validation

The test cases should include additional scenarios:

  1. Edge cases:
    • Invalid IP formats
    • Special IP addresses (localhost, 0.0.0.0)
    • Maximum port values
  2. Upgrade scenarios:
    • Engine during upgrade state
    • Partially configured engine

Add these test cases:

 	}{
+		{
+			name: "Engine during upgrade with both ports configured",
+			engine: &Engine{
+				IP:                "",
+				TargetIP:          "",
+				TargetPort:        8000,
+				StandbyTargetPort: 8001,
+			},
+			expected: false,
+		},
+		{
+			name: "Engine with special IP (localhost)",
+			engine: &Engine{
+				IP:                "127.0.0.1",
+				TargetIP:          "",
+				StandbyTargetPort: 0,
+			},
+			expected: false,
+		},
 	}

163-168: Improve test iteration and error reporting

Similar to the previous test, improve the test execution:

  1. Use test case name instead of index
  2. Add more descriptive error messages

Apply these changes:

-	for testName, testCase := range testCases {
-		c.Logf("testing isNewEngine.%v", testName)
+	for _, testCase := range testCases {
+		c.Logf("testing %s", testCase.name)
 		result := testCase.engine.isNewEngine()
-		c.Assert(result, Equals, testCase.expected, Commentf("Test case '%s': unexpected result", testCase.name))
+		c.Assert(result, Equals, testCase.expected, 
+			Commentf("Test case '%s': got %v, expected %v", 
+				testCase.name, result, testCase.expected))
 	}
pkg/spdk/engine.go (2)

2177-2180: Add context to error return for better debugging

When returning an error from commonnet.GetIPForPod(), it's helpful to wrap it with additional context. This makes debugging easier by providing more information about where and why the error occurred.

Apply this diff:

podIP, err := commonnet.GetIPForPod()
if err != nil {
-    return err
+    return errors.Wrap(err, "failed to get pod IP")
}
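
The effect of wrapping can be seen in a small standalone sketch. It uses the standard library's `fmt.Errorf` with `%w` in place of the `errors.Wrap` call from the diff above, and `getPodIP`, `errNoPodIP`, and `switchOver` are hypothetical stand-ins rather than names from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoPodIP is a hypothetical sentinel for a failed pod IP lookup.
var errNoPodIP = errors.New("pod IP not found")

// getPodIP is a hypothetical stand-in for the PR's pod IP lookup; it always fails.
func getPodIP() (string, error) {
	return "", errNoPodIP
}

// switchOver wraps the lookup error so the message names the failing
// operation while the original cause stays reachable through errors.Is.
func switchOver(engineName string) error {
	if _, err := getPodIP(); err != nil {
		return fmt.Errorf("failed to get pod IP for engine %s target switchover: %w", engineName, err)
	}
	return nil
}

func main() {
	err := switchOver("e-0")
	fmt.Println(err)
	fmt.Println(errors.Is(err, errNoPodIP))
}
```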

477-480: Clarify log and error messages by rephrasing

The term "Blindly" in the log and error messages may be unclear or informal. Consider rephrasing to enhance clarity and professionalism.

Apply this diff:

-e.log.Info("Blindly stopping expose bdev for engine")
+e.log.Info("Stopping expose bdev for engine without prior checks")
if err := spdkClient.StopExposeBdev(e.Nqn); err != nil {
-    return errors.Wrapf(err, "failed to blindly stop expose bdev for engine %v", e.Name)
+    return errors.Wrapf(err, "failed to stop expose bdev for engine %v without prior checks", e.Name)
}
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 471a59d and 6228344.

📒 Files selected for processing (2)
  • pkg/spdk/engine.go (15 hunks)
  • pkg/spdk/engine_test.go (1 hunks)
🔇 Additional comments (1)
pkg/spdk/engine_test.go (1)

1-10: LGTM!

The package declaration and imports are appropriate for the testing requirements.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (5)
pkg/spdk/engine_test.go (2)

99-100: Use test case name instead of index in logs

Replace the numeric index with the test case name for better readability and debugging:

-	for testName, testCase := range testCases {
-		c.Logf("testing checkInitiatorAndTargetCreationRequirements.%v", testName)
+	for _, testCase := range testCases {
+		c.Logf("testing checkInitiatorAndTargetCreationRequirements: %s", testCase.name)

165-167: Use test case name instead of index in logs

Replace the numeric index with the test case name for better readability and debugging:

-	for testName, testCase := range testCases {
-		c.Logf("testing isNewEngine.%v", testName)
+	for _, testCase := range testCases {
+		c.Logf("testing isNewEngine: %s", testCase.name)
pkg/spdk/engine.go (3)

37-49: Add documentation for the StandbyTargetPort field

The new StandbyTargetPort field would benefit from documentation explaining its purpose in the live upgrade process and its relationship with Port and TargetPort.

 	Port              int32 // Port that initiator is connecting to
 	TargetIP          string
 	TargetPort        int32 // Port of the target that is used for letting initiator connect to
-	StandbyTargetPort int32
+	StandbyTargetPort int32 // Port of the standby target used during live upgrades

112-142: Simplify complex branching logic

The checkInitiatorAndTargetCreationRequirements method has complex nested conditions that could be simplified for better readability and maintainability.

Consider restructuring the logic:

 func (e *Engine) checkInitiatorAndTargetCreationRequirements(podIP, initiatorIP, targetIP string) (bool, bool, error) {
-	initiatorCreationRequired, targetCreationRequired := false, false
-	var err error
-
-	if podIP == initiatorIP && podIP == targetIP {
-		if e.Port == 0 && e.TargetPort == 0 {
-			e.log.Info("Creating both initiator and target instances")
-			initiatorCreationRequired = true
-			targetCreationRequired = true
-		} else if e.Port != 0 && e.TargetPort == 0 {
-			e.log.Info("Creating a target instance")
-			if e.StandbyTargetPort != 0 {
-				e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)
-			} else {
-				targetCreationRequired = true
-			}
-		} else {
-			e.log.Infof("Initiator instance with port %v and target instance with port %v are already created, will skip the creation", e.Port, e.TargetPort)
-		}
-	} else if podIP == initiatorIP {
-		e.log.Info("Creating an initiator instance")
-		initiatorCreationRequired = true
-	} else if podIP == targetIP {
-		e.log.Info("Creating a target instance")
-		targetCreationRequired = true
-	} else {
-		err = fmt.Errorf("invalid initiator and target addresses for engine %s creation with initiator address %v and target address %v", e.Name, initiatorIP, targetIP)
-	}
+	if podIP != initiatorIP && podIP != targetIP {
+		return false, false, fmt.Errorf("invalid initiator and target addresses for engine %s creation with initiator address %v and target address %v",
+			e.Name, initiatorIP, targetIP)
+	}
+
+	initiatorCreationRequired := podIP == initiatorIP && e.Port == 0
+	targetCreationRequired := podIP == targetIP && e.TargetPort == 0 && e.StandbyTargetPort == 0
+
+	if initiatorCreationRequired && targetCreationRequired {
+		e.log.Info("Creating both initiator and target instances")
+	} else if initiatorCreationRequired {
+		e.log.Info("Creating an initiator instance")
+	} else if targetCreationRequired {
+		e.log.Info("Creating a target instance")
+	} else if e.StandbyTargetPort != 0 {
+		e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)
+	} else {
+		e.log.Infof("Initiator instance with port %v and target instance with port %v are already created, will skip the creation",
+			e.Port, e.TargetPort)
+	}
 
-	return initiatorCreationRequired, targetCreationRequired, err
+	return initiatorCreationRequired, targetCreationRequired, nil
 }
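Before adopting a flattened version, it may help to restate the original branches as a pure function and run a small table through it. The sketch below is only an illustration: the `ports` type and `creationRequirements` function are hypothetical stand-ins, not code from the PR. One row worth checking is the initiator-only case with `Port` already set: the original code still requests initiator creation there, whereas the proposed `podIP == initiatorIP && e.Port == 0` expression would appear to return false.

```go
package main

import (
	"errors"
	"fmt"
)

// ports is a hypothetical stand-in for the three port fields on Engine.
type ports struct {
	Port              int32
	TargetPort        int32
	StandbyTargetPort int32
}

// creationRequirements restates the original nested branches as a pure
// function: it returns (initiatorRequired, targetRequired, err).
func creationRequirements(p ports, podIP, initiatorIP, targetIP string) (bool, bool, error) {
	switch {
	case podIP == initiatorIP && podIP == targetIP:
		switch {
		case p.Port == 0 && p.TargetPort == 0:
			return true, true, nil
		case p.Port != 0 && p.TargetPort == 0:
			// Only create a target when no standby target exists yet.
			return false, p.StandbyTargetPort == 0, nil
		default:
			return false, false, nil
		}
	case podIP == initiatorIP:
		return true, false, nil
	case podIP == targetIP:
		return false, true, nil
	default:
		return false, false, errors.New("pod IP matches neither the initiator nor the target address")
	}
}

func main() {
	const local, remote = "10.0.0.1", "10.0.0.2"
	cases := []struct {
		name                  string
		p                     ports
		initiatorIP, targetIP string
	}{
		{"fresh local engine", ports{}, local, local},
		{"standby target needed", ports{Port: 20001}, local, local},
		{"standby target exists", ports{Port: 20001, StandbyTargetPort: 20002}, local, local},
		{"initiator only, port already set", ports{Port: 20001}, local, remote},
		{"neither address is local", ports{}, remote, remote},
	}
	for _, tc := range cases {
		initiator, target, err := creationRequirements(tc.p, local, tc.initiatorIP, tc.targetIP)
		fmt.Printf("%-34s initiator=%-5v target=%-5v err=%v\n", tc.name, initiator, target, err)
	}
}
```

Keeping the decision logic free of logging and receiver state, as here, makes it straightforward to compare two candidate implementations row by row.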

2177-2180: Add context to the GetIPForPod error

The error returned by GetIPForPod is propagated unchanged. Wrapping it would make clear which operation failed when the pod IP cannot be retrieved.

 	podIP, err := commonnet.GetIPForPod()
 	if err != nil {
-		return err
+		return errors.Wrapf(err, "failed to get pod IP for engine %s target switchover", e.Name)
 	}
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 6228344 and f7988f5.

📒 Files selected for processing (2)
  • pkg/spdk/engine.go (16 hunks)
  • pkg/spdk/engine_test.go (1 hunks)
🔇 Additional comments (2)
pkg/spdk/engine.go (2)

2224-2228: 🛠️ Refactor suggestion

Add port range validation

The port assignment should validate that the port number is within the valid range (1-65535).

 	if targetIP == podIP {
+		if targetPort <= 0 || targetPort > 65535 {
+			return fmt.Errorf("invalid target port %d: must be between 1 and 65535", targetPort)
+		}
 		e.TargetPort = targetPort
 		e.StandbyTargetPort = 0
 	}
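If a range check like this were adopted, it could live in one small helper instead of being repeated at each assignment. The following is a minimal sketch; `validatePort` is a hypothetical name and is not part of the PR.

```go
package main

import "fmt"

// validatePort is a hypothetical helper: it rejects values outside the
// TCP port range 1-65535 and names the offending field in the error.
func validatePort(name string, port int32) error {
	if port < 1 || port > 65535 {
		return fmt.Errorf("invalid %s %d: must be between 1 and 65535", name, port)
	}
	return nil
}

func main() {
	for _, p := range []int32{0, 1, 20001, 65535, 65536} {
		fmt.Printf("%d -> %v\n", p, validatePort("target port", p))
	}
}
```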

Likely invalid or redundant comment.


108-109: Add unit tests for isNewEngine

The isNewEngine helper function lacks test coverage. Consider adding test cases to verify:

  • New engine (all fields empty/zero)
  • Existing engine with various field combinations
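The cases listed above could be covered with a small table. The sketch below assumes a reduced `engine` type that carries only the three fields `isNewEngine` reads; the real `Engine` struct has more fields, and the real test suite uses its own test framework.

```go
package main

import "fmt"

// engine is a reduced stand-in holding only the fields isNewEngine inspects.
type engine struct {
	IP                string
	TargetIP          string
	StandbyTargetPort int32
}

// isNewEngine has the same body as the helper quoted in the review.
func (e *engine) isNewEngine() bool {
	return e.IP == "" && e.TargetIP == "" && e.StandbyTargetPort == 0
}

func main() {
	cases := []struct {
		name string
		e    engine
		want bool
	}{
		{"all fields empty", engine{}, true},
		{"IP set", engine{IP: "10.0.0.1"}, false},
		{"TargetIP set", engine{TargetIP: "10.0.0.1"}, false},
		{"standby target port set", engine{StandbyTargetPort: 20002}, false},
	}
	for _, tc := range cases {
		got := tc.e.isNewEngine()
		fmt.Printf("%-26s got=%-5v want=%v\n", tc.name, got, tc.want)
	}
}
```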

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (3)
pkg/spdk/engine.go (3)

108-110: Add documentation for the isNewEngine helper function.

Consider adding a doc comment explaining the purpose and return value semantics of this helper function. This will help other developers understand when an engine is considered "new".

+// isNewEngine returns true if the engine instance has not been initialized with any IP addresses
+// or standby target port. This helps determine if this is the first time the engine is being created.
 func (e *Engine) isNewEngine() bool {
     return e.IP == "" && e.TargetIP == "" && e.StandbyTargetPort == 0
 }

429-444: Enhance error handling and logging in frontend handling.

The deferred cleanup logic could benefit from more detailed error logging and state validation.

 defer func() {
     if err == nil {
         if !standbyTargetCreationRequired {
+            e.log.Debug("Updating engine state with new frontend configuration")
             e.initiator = initiator
             e.dmDeviceBusy = dmDeviceBusy
             e.Endpoint = initiator.GetEndpoint()
             e.log = e.log.WithFields(logrus.Fields{
                 "endpoint":   e.Endpoint,
                 "port":       e.Port,
                 "targetPort": e.TargetPort,
             })
+        } else {
+            e.log.Debug("Skipping engine state update due to standby target creation")
         }
         e.log.Infof("Finished handling frontend for engine: %+v", e)
+    } else {
+        e.log.WithError(err).Error("Failed to handle frontend")
     }
 }()

37-49: Add high-level documentation for the Engine struct.

Consider adding comprehensive documentation for the Engine struct and its fields, particularly explaining the role of the new StandbyTargetPort field in the live upgrade process.

+// Engine represents a Longhorn SPDK engine instance that manages volume data access.
+// It handles the coordination between initiators and targets, supporting features like
+// live upgrades through standby target ports.
 type Engine struct {
     sync.RWMutex

     Name              string
     VolumeName        string
     SpecSize          uint64
     ActualSize        uint64
     IP                string
     Port              int32 // Port that initiator is connecting to
     TargetIP          string
     TargetPort        int32 // Port of the target that is used for letting initiator connect to
+    // StandbyTargetPort is used during live upgrades to temporarily store the port
+    // of a standby target instance, enabling seamless switchover between targets
     StandbyTargetPort int32
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 0f18b59 and 9b9c0f4.

📒 Files selected for processing (1)
  • pkg/spdk/engine.go (18 hunks)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (6)
pkg/spdk/engine_test.go (2)

99-100: Improve test execution pattern.

Range over the test cases by value and read the name from each case (tc.name) rather than from the loop key, for better readability and maintainability.

-	for testName, testCase := range testCases {
-		c.Logf("testing checkInitiatorAndTargetCreationRequirements.%v", testName)
+	for _, tc := range testCases {
+		c.Logf("testing %s", tc.name)

165-167: Improve test execution pattern.

Range over the test cases by value and read the name from each case (tc.name) rather than from the loop key, for better readability and maintainability.

-	for testName, testCase := range testCases {
-		c.Logf("testing isNewEngine.%v", testName)
-		result := testCase.engine.isNewEngine()
+	for _, tc := range testCases {
+		c.Logf("testing %s", tc.name)
+		result := tc.engine.isNewEngine()
pkg/spdk_test.go (1)

1228-1228: Consider extracting revertSnapshot helper function.

The revertSnapshot function contains two identical EngineCreate calls that could be refactored into a helper function to improve maintainability and reduce code duplication.

+ func createEngine(spdkCli *client.SPDKClient, engineName, volumeName string, frontend types.Frontend, size uint64, replicaAddressMap map[string]string, revCount int32, ip string) (*api.Engine, error) {
+     return spdkCli.EngineCreate(engineName, volumeName, frontend, size, replicaAddressMap, revCount, ip, ip, false)
+ }

Also applies to: 1243-1243

pkg/spdk/engine.go (3)

112-142: Simplify complex creation requirements logic

The function has complex branching logic that could be simplified for better maintainability. Consider:

  1. Breaking down the conditions into smaller, well-named helper functions
  2. Adding detailed comments explaining each condition
  3. Using early returns to reduce nesting

 func (e *Engine) checkInitiatorAndTargetCreationRequirements(podIP, initiatorIP, targetIP string) (bool, bool, error) {
+    // Check if pod is neither initiator nor target
+    if podIP != initiatorIP && podIP != targetIP {
+        return false, false, fmt.Errorf(
+            "invalid initiator and target addresses for engine %s creation with initiator address %v and target address %v",
+            e.Name, initiatorIP, targetIP)
+    }
+
+    // Case 1: Pod is both initiator and target
+    if podIP == initiatorIP && podIP == targetIP {
+        return e.checkLocalCreationRequirements()
+    }
+
+    // Case 2: Pod is only initiator
+    if podIP == initiatorIP {
+        e.log.Info("Creating an initiator instance")
+        return true, false, nil
+    }
+
+    // Case 3: Pod is only target
+    e.log.Info("Creating a target instance")
+    return false, true, nil
 }

+func (e *Engine) checkLocalCreationRequirements() (bool, bool, error) {
+    switch {
+    case e.Port == 0 && e.TargetPort == 0:
+        e.log.Info("Creating both initiator and target instances")
+        return true, true, nil
+    case e.Port != 0 && e.TargetPort == 0:
+        if e.StandbyTargetPort != 0 {
+            e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation",
+                e.StandbyTargetPort)
+            return false, false, nil
+        }
+        e.log.Info("Creating a target instance")
+        return false, true, nil
+    default:
+        e.log.Infof("Initiator instance with port %v and target instance with port %v are already created, will skip the creation",
+            e.Port, e.TargetPort)
+        return false, false, nil
+    }
+}

398-399: Add validation for standby target creation

The standby target creation logic should include additional validation:

  1. Verify port ranges
  2. Validate state transitions
  3. Add logging for standby target creation

 func (e *Engine) handleFrontend(spdkClient *spdkclient.Client, superiorPortAllocator *commonbitmap.Bitmap, portCount int32, targetAddress string,
     initiatorCreationRequired, targetCreationRequired bool) (err error) {
+    if superiorPortAllocator == nil {
+        return fmt.Errorf("invalid nil port allocator")
+    }
+    if portCount <= 0 {
+        return fmt.Errorf("invalid port count: %d", portCount)
+    }

     standbyTargetCreationRequired := false
     if e.Port != 0 && e.TargetPort == 0 {
+        e.log.Info("Standby target creation required")
         standbyTargetCreationRequired = true
     }

Also applies to: 409-411


Line range hint 2139-2236: Enhance error handling and recovery in target switchover

The target switchover logic should include:

  1. Atomic state transitions
  2. Rollback mechanism for failed operations
  3. Improved error context

 func (e *Engine) SwitchOverTarget(spdkClient *spdkclient.Client, newTargetAddress string) (err error) {
+    // Track original state for rollback
+    originalState := struct {
+        targetIP    string
+        targetPort  int32
+        standbyPort int32
+    }{
+        targetIP:    e.TargetIP,
+        targetPort:  e.TargetPort,
+        standbyPort: e.StandbyTargetPort,
+    }
+
+    // Rollback function
+    rollback := func() {
+        e.TargetIP = originalState.targetIP
+        e.TargetPort = originalState.targetPort
+        e.StandbyTargetPort = originalState.standbyPort
+        e.log.Info("Rolled back to original state after failed switchover")
+    }

     // ... existing code ...

     if err := e.connectTarget(newTargetAddress); err != nil {
+        rollback()
         return errors.Wrapf(err, "failed to connect target %s for engine %s", newTargetAddress, e.Name)
     }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9b9c0f4 and 4231599.

⛔ Files ignored due to path filters (4)
  • go.mod is excluded by !go.mod
  • go.sum is excluded by !**/*.sum, !go.sum
  • vendor/github.com/longhorn/types/pkg/generated/spdkrpc/spdk.pb.go is excluded by !**/*.pb.go, !**/generated/**, !vendor/**
  • vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (6)
  • pkg/api/types.go (2 hunks)
  • pkg/client/client.go (1 hunks)
  • pkg/spdk/engine.go (18 hunks)
  • pkg/spdk/engine_test.go (1 hunks)
  • pkg/spdk/server.go (1 hunks)
  • pkg/spdk_test.go (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/api/types.go
  • pkg/client/client.go
  • pkg/spdk/server.go
🔇 Additional comments (7)
pkg/spdk/engine_test.go (1)

13-98: 🛠️ Refactor suggestion

Add test cases for port validation edge cases.

The test suite should include validation for edge cases related to port values:

  • Negative port values
  • Port values exceeding valid ranges
  • Port conflicts between active and standby targets

Add these test cases:

 	}{
+		{
+			name:                              "Negative port values",
+			podIP:                             "192.168.1.1",
+			initiatorIP:                       "192.168.1.1",
+			targetIP:                          "192.168.1.1",
+			port:                              -1,
+			targetPort:                        -2,
+			standbyTargetPort:                 -3,
+			expectedInitiatorCreationRequired: false,
+			expectedTargetCreationRequired:    false,
+			expectedError:                     fmt.Errorf("invalid port values"),
+		},
+		{
+			name:                              "Port conflict between target and standby",
+			podIP:                             "192.168.1.1",
+			initiatorIP:                       "192.168.1.1",
+			targetIP:                          "192.168.1.1",
+			port:                              100,
+			targetPort:                        8000,
+			standbyTargetPort:                 8000,
+			expectedInitiatorCreationRequired: false,
+			expectedTargetCreationRequired:    false,
+			expectedError:                     fmt.Errorf("port conflict: standby target port must be different from active target port"),
+		},

Likely invalid or redundant comment.

pkg/spdk_test.go (3)

271-271: LGTM! Parameter removal aligns with engine upgrade support.

The removal of the upgradeRequired parameter from EngineCreate call is consistent with the PR objectives to support engine live upgrades.


354-354: Skip duplicate review.

This change is identical to the previous one, removing the upgradeRequired parameter.


530-530: Skip duplicate review.

This change is identical to the previous ones, removing the upgradeRequired parameter.

pkg/spdk/engine.go (3)

37-49: LGTM: Engine struct modification for standby target support

The addition of StandbyTargetPort field is well-structured and aligns with the existing port management pattern in the Engine struct.


108-110: Add unit tests for isNewEngine function

The function logic is correct, but it requires test coverage to ensure reliability across different engine states.


2406-2426: 🛠️ Refactor suggestion

Add validation for port release operations

The port release logic should:

  1. Validate port numbers before release
  2. Handle allocation bitmap errors gracefully
  3. Add logging for successful releases

 func (e *Engine) releaseTargetAndStandbyTargetPorts(superiorPortAllocator *commonbitmap.Bitmap) error {
+    if superiorPortAllocator == nil {
+        return fmt.Errorf("invalid nil port allocator")
+    }
+
+    // Validate port numbers
+    if e.TargetPort < 0 || e.TargetPort > 65535 {
+        return fmt.Errorf("invalid target port number: %d", e.TargetPort)
+    }
+    if e.StandbyTargetPort < 0 || e.StandbyTargetPort > 65535 {
+        return fmt.Errorf("invalid standby target port number: %d", e.StandbyTargetPort)
+    }

     releaseTargetPortRequired := e.TargetPort != 0
     releaseStandbyTargetPortRequired := e.StandbyTargetPort != 0 && e.StandbyTargetPort != e.TargetPort

     if releaseTargetPortRequired {
         if err := superiorPortAllocator.ReleaseRange(e.TargetPort, e.TargetPort); err != nil {
             return errors.Wrapf(err, "failed to release target port %d", e.TargetPort)
         }
+        e.log.Infof("Successfully released target port %d", e.TargetPort)
     }
     e.TargetPort = 0

     if releaseStandbyTargetPortRequired {
         if err := superiorPortAllocator.ReleaseRange(e.StandbyTargetPort, e.StandbyTargetPort); err != nil {
             return errors.Wrapf(err, "failed to release standby target port %d", e.StandbyTargetPort)
         }
+        e.log.Infof("Successfully released standby target port %d", e.StandbyTargetPort)
     }
     e.StandbyTargetPort = 0

     return nil
 }
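The release logic above can be exercised without a real bitmap allocator by depending on a narrow interface. In the sketch below, `portReleaser` and `recordingAllocator` are hypothetical test doubles; only the `ReleaseRange(start, end)` call shape is taken from the snippet above. It illustrates why the `StandbyTargetPort != TargetPort` guard matters: a port shared by both fields is released once.

```go
package main

import (
	"errors"
	"fmt"
)

// portReleaser is a hypothetical interface covering the single allocator
// call used by the release logic.
type portReleaser interface {
	ReleaseRange(start, end int32) error
}

// recordingAllocator is a test double that records every released port.
type recordingAllocator struct{ released []int32 }

func (a *recordingAllocator) ReleaseRange(start, end int32) error {
	for p := start; p <= end; p++ {
		a.released = append(a.released, p)
	}
	return nil
}

// engine is a reduced stand-in holding only the two port fields involved.
type engine struct {
	TargetPort        int32
	StandbyTargetPort int32
}

func (e *engine) releaseTargetAndStandbyTargetPorts(a portReleaser) error {
	if a == nil {
		return errors.New("invalid nil port allocator")
	}
	// Decide up front, before either field is zeroed.
	releaseTarget := e.TargetPort != 0
	releaseStandby := e.StandbyTargetPort != 0 && e.StandbyTargetPort != e.TargetPort

	if releaseTarget {
		if err := a.ReleaseRange(e.TargetPort, e.TargetPort); err != nil {
			return fmt.Errorf("failed to release target port %d: %w", e.TargetPort, err)
		}
	}
	e.TargetPort = 0

	if releaseStandby {
		if err := a.ReleaseRange(e.StandbyTargetPort, e.StandbyTargetPort); err != nil {
			return fmt.Errorf("failed to release standby target port %d: %w", e.StandbyTargetPort, err)
		}
	}
	e.StandbyTargetPort = 0
	return nil
}

func main() {
	alloc := &recordingAllocator{}
	e := &engine{TargetPort: 20002, StandbyTargetPort: 20002}
	if err := e.releaseTargetAndStandbyTargetPorts(alloc); err != nil {
		fmt.Println("release failed:", err)
		return
	}
	fmt.Println("released ports:", alloc.released)
}
```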

Likely invalid or redundant comment.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (5)
pkg/api/types.go (1)

43-53: LGTM! Consider adding documentation.

The new NVMe-related types are well-structured and follow Go conventions. The fields in NvmeDevicePath correctly capture NVMe-oF transport parameters.

Consider adding GoDoc comments to document these types, especially explaining:

  • The purpose and usage of each field in NvmeDevicePath
  • The relationship between paths in NvmeSubsystem
pkg/spdk/engine.go (4)

37-50: Add documentation for the StandbyTargetPort field

Consider adding a comment explaining the purpose and lifecycle of the StandbyTargetPort field, similar to how Port and TargetPort are documented. This would help clarify when this port is used during live upgrades.

 	TargetPort        int32 // Port of the target that is used for letting initiator connect to
+	StandbyTargetPort int32 // Port used for the standby target during live upgrades

117-147: Improve error message clarity and add test coverage

The function has complex branching logic that determines when to create initiator and target instances. Consider:

  1. The error message in the else branch could be more descriptive by including the pod IP
  2. The function would benefit from comprehensive test coverage
-		err = fmt.Errorf("invalid initiator and target addresses for engine %s creation with initiator address %v and target address %v", e.Name, initiatorIP, targetIP)
+		err = fmt.Errorf("invalid initiator and target addresses for engine %s creation: pod IP %v does not match initiator IP %v or target IP %v", e.Name, podIP, initiatorIP, targetIP)

Line range hint 2166-2264: Enhance error handling in target switchover

The target switchover implementation is thorough but could benefit from:

  1. More detailed error messages when specific operations fail
  2. Structured logging for better debugging
  3. Consider adding metrics or events for monitoring switchover operations
 	if err := e.disconnectTarget(currentTargetAddress); err != nil {
-		return errors.Wrapf(err, "failed to disconnect target %s for engine %s", currentTargetAddress, e.Name)
+		return errors.Wrapf(err, "failed to disconnect target %s for engine %s: %v", currentTargetAddress, e.Name, err)
 	}
+	e.log.WithFields(logrus.Fields{
+		"currentTarget": currentTargetAddress,
+		"newTarget":     newTargetAddress,
+		"engineName":    e.Name,
+	}).Info("Successfully disconnected from current target")

Line range hint 2107-2114: Improve error messaging in Suspend and Resume

The error handling in these state transition functions could be more descriptive:

  1. Add context about the current state when errors occur
  2. Include more details in log messages
-				e.log.WithError(err).Warn("Failed to suspend engine")
+				e.log.WithFields(logrus.Fields{
+					"currentState": e.State,
+					"error":       err,
+				}).Error("Failed to suspend engine")

-				e.log.WithError(err).Warn("Failed to resume engine")
+				e.log.WithFields(logrus.Fields{
+					"currentState": e.State,
+					"error":       err,
+				}).Error("Failed to resume engine")

Also applies to: 2138-2145

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4231599 and 752b14e.

⛔ Files ignored due to path filters (5)
  • go.mod is excluded by !go.mod
  • go.sum is excluded by !**/*.sum, !go.sum
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/initiator.go is excluded by !vendor/**
  • vendor/github.com/longhorn/types/pkg/generated/spdkrpc/spdk.pb.go is excluded by !**/*.pb.go, !**/generated/**, !vendor/**
  • vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (2)
  • pkg/api/types.go (5 hunks)
  • pkg/spdk/engine.go (20 hunks)
🔇 Additional comments (3)
pkg/api/types.go (2)

177-177: LGTM! Field mappings are correct.

The new field mappings in ProtoEngineToEngine are properly implemented and maintain consistency with existing patterns.

Also applies to: 186-186


144-144: LGTM! Verify field usage across the codebase.

The new fields support engine live upgrade functionality:

  • StandbyTargetPort for managing the standby target during upgrades
  • NvmeSubsystem for tracking NVMe paths

Let's verify the usage of these new fields:

Also applies to: 153-153

✅ Verification successful

LGTM! Both fields are properly integrated across the codebase

The verification shows comprehensive and consistent usage:

  • StandbyTargetPort is well-integrated:

    • Properly defined in protobuf and API types
    • Used in engine operations for managing standby targets
    • Covered by unit tests including various scenarios
    • Has proper release mechanisms in place
  • NvmeSubsystem shows complete implementation:

    • Defined in both API types and protobuf
    • Integrated into engine operations for NVMe path management
    • Used consistently for device path handling
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check usage of new Engine fields across the codebase

# Check StandbyTargetPort usage
echo "=== StandbyTargetPort usage ==="
rg "StandbyTargetPort|standby_target_port" --type go

# Check NvmeSubsystem usage
echo "=== NvmeSubsystem usage ==="
rg "NvmeSubsystem|nvme_subsystem" --type go

Length of output: 6529

pkg/spdk/engine.go (1)

113-115: LGTM! Clear and concise helper function

The function provides a clear way to check if an engine is in its initial state.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (5)
pkg/spdk/engine.go (5)

113-115: Add documentation for isNewEngine method

Consider adding a doc comment explaining the criteria that determines a new engine state.

+// isNewEngine returns true if the engine is in its initial state,
+// determined by empty IP, TargetIP and zero StandbyTargetPort.
 func (e *Engine) isNewEngine() bool {
     return e.IP == "" && e.TargetIP == "" && e.StandbyTargetPort == 0
 }

117-147: Improve error messages in checkInitiatorAndTargetCreationRequirements

The error message in the else clause could be more descriptive by including the pod IP.

-        err = fmt.Errorf("invalid initiator and target addresses for engine %s creation with initiator address %v and target address %v", e.Name, initiatorIP, targetIP)
+        err = fmt.Errorf("invalid initiator and target addresses for engine %s creation: pod IP %v does not match initiator address %v or target address %v", e.Name, podIP, initiatorIP, targetIP)

Line range hint 2166-2263: Consider implementing transaction-like behavior for SwitchOverTarget

The method handles complex state changes that could benefit from a more structured approach to ensure atomicity and rollback capability.

Consider:

  1. Creating a transaction struct to track state changes
  2. Implementing rollback for each step
  3. Using defer to ensure cleanup on failure

Example approach:

type switchoverTransaction struct {
    oldIP          string
    oldPort        int32
    oldTargetPort  int32
    oldStandbyPort int32
}

func (t *switchoverTransaction) backup(e *Engine) {
    t.oldIP = e.IP
    t.oldPort = e.Port
    t.oldTargetPort = e.TargetPort
    t.oldStandbyPort = e.StandbyTargetPort
}

func (t *switchoverTransaction) rollback(e *Engine) {
    e.IP = t.oldIP
    e.Port = t.oldPort
    e.TargetPort = t.oldTargetPort
    e.StandbyTargetPort = t.oldStandbyPort
}
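The third point, using defer, could be wired up roughly as follows. This is a sketch under assumptions: `engine` is a reduced stand-in for the real struct, and `switchOver` and its `connect` callback are hypothetical names standing in for the real switchover steps. It restores only the in-memory fields; undoing side effects on the initiator or target would need separate handling.

```go
package main

import (
	"errors"
	"fmt"
)

// engine is a reduced stand-in holding only the fields the snapshot covers.
type engine struct {
	IP                string
	Port              int32
	TargetPort        int32
	StandbyTargetPort int32
}

type switchoverTransaction struct {
	oldIP          string
	oldPort        int32
	oldTargetPort  int32
	oldStandbyPort int32
}

func (t *switchoverTransaction) backup(e *engine) {
	t.oldIP, t.oldPort = e.IP, e.Port
	t.oldTargetPort, t.oldStandbyPort = e.TargetPort, e.StandbyTargetPort
}

func (t *switchoverTransaction) rollback(e *engine) {
	e.IP, e.Port = t.oldIP, t.oldPort
	e.TargetPort, e.StandbyTargetPort = t.oldTargetPort, t.oldStandbyPort
}

// errConnect simulates a failed connection to the new target.
var errConnect = errors.New("connection refused")

// switchOver is hypothetical: it promotes newPort to the active target port
// and restores the previous field values if connect fails.
func switchOver(e *engine, newPort int32, connect func() error) (err error) {
	tx := &switchoverTransaction{}
	tx.backup(e)
	defer func() {
		// The named return value lets the deferred call see the final error.
		if err != nil {
			tx.rollback(e)
		}
	}()

	e.TargetPort = newPort
	e.StandbyTargetPort = 0
	if err = connect(); err != nil {
		return fmt.Errorf("failed to connect target on port %d: %w", newPort, err)
	}
	return nil
}

func main() {
	e := &engine{IP: "10.0.0.1", Port: 20001, TargetPort: 20002, StandbyTargetPort: 20003}
	err := switchOver(e, 20003, func() error { return errConnect })
	fmt.Printf("err=%v engine=%+v\n", err, *e)
}
```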

2107-2108: Standardize error handling in Suspend/Resume methods

The error handling could be more consistent between the two methods.

Consider extracting common error handling logic:

func (e *Engine) handleOperationError(op string, err error) {
    if e.State != types.InstanceStateError {
        e.log.WithError(err).Errorf("Failed to %s engine", op)
    }
    e.ErrorMsg = err.Error()
}

Also applies to: 2138-2139


621-632: Extract NvmeSubsystem conversion to helper method

Consider extracting the NvmeSubsystem conversion logic to improve readability and reusability.

+func (e *Engine) convertNvmeSubsystem() *spdkrpc.NvmeSubsystem {
+    nvmeSubsystem := &spdkrpc.NvmeSubsystem{
+        Paths: map[string]*spdkrpc.NvmeDevicePath{},
+    }
+    for pathName, path := range e.NvmeSubsystem.Paths {
+        nvmeSubsystem.Paths[pathName] = &spdkrpc.NvmeDevicePath{
+            Trtype:  path.Trtype,
+            Traddr:  path.Traddr,
+            Trsvcid: path.Trsvcid,
+            SrcAddr: path.SrcAddr,
+            State:   path.State,
+        }
+    }
+    return nvmeSubsystem
+}
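An extracted converter of this shape is easy to check in isolation. The sketch below uses hypothetical stand-in types in place of the API and generated spdkrpc types, and assumes every field is a string; the field names are the ones shown in the snippet above. It illustrates two properties of the copy: the result always has a non-nil map, and later changes to the source map do not leak into it.

```go
package main

import "fmt"

// devicePath and subsystem are hypothetical stand-ins for the engine-side
// types; rpcDevicePath and rpcSubsystem stand in for the generated types.
// String field types are an assumption.
type devicePath struct{ Trtype, Traddr, Trsvcid, SrcAddr, State string }

type subsystem struct{ Paths map[string]devicePath }

type rpcDevicePath struct{ Trtype, Traddr, Trsvcid, SrcAddr, State string }

type rpcSubsystem struct{ Paths map[string]*rpcDevicePath }

// toRPCSubsystem copies every path into a freshly allocated map.
// Ranging over a nil source map is safe and yields an empty result.
func toRPCSubsystem(s subsystem) *rpcSubsystem {
	out := &rpcSubsystem{Paths: make(map[string]*rpcDevicePath, len(s.Paths))}
	for name, p := range s.Paths {
		out.Paths[name] = &rpcDevicePath{
			Trtype:  p.Trtype,
			Traddr:  p.Traddr,
			Trsvcid: p.Trsvcid,
			SrcAddr: p.SrcAddr,
			State:   p.State,
		}
	}
	return out
}

func main() {
	src := subsystem{Paths: map[string]devicePath{
		"nvme0": {Trtype: "tcp", Traddr: "10.0.0.1", Trsvcid: "20001", State: "live"},
	}}
	got := toRPCSubsystem(src)
	fmt.Printf("%d path(s), nvme0 traddr=%s\n", len(got.Paths), got.Paths["nvme0"].Traddr)
}
```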
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 752b14e and 60c7419.

⛔ Files ignored due to path filters (1)
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/initiator.go is excluded by !vendor/**
📒 Files selected for processing (2)
  • pkg/api/types.go (5 hunks)
  • pkg/spdk/engine.go (20 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/api/types.go
🔇 Additional comments (3)
pkg/spdk/engine.go (3)

37-50: LGTM: Engine struct changes

The addition of StandbyTargetPort field is well-placed and logically grouped with other port-related fields.


2396-2428: 🛠️ Refactor suggestion

Enhance error handling in DeleteTarget

The method should consider implementing cleanup tracking to ensure resources are properly released even when errors occur.

Consider:

  1. Tracking successful operations
  2. Implementing partial cleanup on failure
  3. Adding debug logging for cleanup steps

 func (e *Engine) DeleteTarget(spdkClient *spdkclient.Client, superiorPortAllocator *commonbitmap.Bitmap) (err error) {
+    type cleanupState struct {
+        exposeBdevStopped bool
+        portsReleased     bool
+        raidDeleted       bool
+        replicasDisconnected map[string]bool
+    }
+    state := &cleanupState{
+        replicasDisconnected: make(map[string]bool),
+    }
+    defer func() {
+        if err != nil {
+            e.log.WithField("cleanupState", state).Error("Target deletion failed, partial cleanup state recorded")
+        }
+    }()

Likely invalid or redundant comment.


2433-2453: 🛠️ Refactor suggestion

Add port validation in releaseTargetAndStandbyTargetPorts

The method should validate port numbers before attempting to release them.

 func (e *Engine) releaseTargetAndStandbyTargetPorts(superiorPortAllocator *commonbitmap.Bitmap) error {
+    if superiorPortAllocator == nil {
+        return fmt.Errorf("invalid nil port allocator")
+    }
+
     releaseTargetPortRequired := e.TargetPort != 0
     releaseStandbyTargetPortRequired := e.StandbyTargetPort != 0 && e.StandbyTargetPort != e.TargetPort
 
+    // Validate port numbers
+    if releaseTargetPortRequired && (e.TargetPort < 1 || e.TargetPort > 65535) {
+        return fmt.Errorf("invalid target port number: %d", e.TargetPort)
+    }
+    if releaseStandbyTargetPortRequired && (e.StandbyTargetPort < 1 || e.StandbyTargetPort > 65535) {
+        return fmt.Errorf("invalid standby target port number: %d", e.StandbyTargetPort)
+    }

Likely invalid or redundant comment.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (6)
pkg/api/types.go (2)

43-53: Add documentation for the new NVMe types and fields.

The new types NvmeDevicePath and NvmeSubsystem would benefit from documentation explaining:

  • The purpose of each type
  • The meaning and expected values for each field
  • Any constraints or requirements for the fields

Add documentation like this:

+// NvmeDevicePath represents an NVMe over Fabrics (NVMeoF) transport path configuration
 type NvmeDevicePath struct {
+    // Trtype specifies the transport type (e.g., "tcp", "rdma")
     Trtype  string `json:"trtype"`
+    // Traddr specifies the transport address (e.g., IP address)
     Traddr  string `json:"traddr"`
     // ... (continue for other fields)
 }

144-144: Document the new Engine fields.

The new fields StandbyTargetPort and NvmeSubsystem would benefit from documentation explaining their roles in the live upgrade process.

Add documentation like this:

 type Engine struct {
     // ... existing fields ...
+    // StandbyTargetPort is the port number for the standby target used during live upgrades
     StandbyTargetPort int32                 `json:"standby_target_port"`
     // ... other fields ...
+    // NvmeSubsystem represents the NVMe subsystem configuration for this engine
     NvmeSubsystem     NvmeSubsystem         `json:"nvme_subsystem"`
 }

Also applies to: 153-153

pkg/spdk/engine.go (4)

113-115: Add documentation for isNewEngine function

Consider adding a function comment explaining the criteria for determining a new engine state.

+// isNewEngine returns true if the engine is in its initial state,
+// determined by empty IP, TargetIP and zero StandbyTargetPort.
 func (e *Engine) isNewEngine() bool {
     return e.IP == "" && e.TargetIP == "" && e.StandbyTargetPort == 0
 }

117-147: Simplify complex creation requirements logic

The function has good logging but could be more maintainable with simplified conditions.

Consider refactoring into smaller, more focused functions:

+func (e *Engine) shouldCreateBothInstances(podIP, initiatorIP, targetIP string) bool {
+    return podIP == initiatorIP && podIP == targetIP && e.Port == 0 && e.TargetPort == 0
+}

+func (e *Engine) shouldCreateTargetOnly(podIP, targetIP string) bool {
+    return podIP == targetIP && e.Port != 0 && e.TargetPort == 0 && e.StandbyTargetPort == 0
+}

 func (e *Engine) checkInitiatorAndTargetCreationRequirements(podIP, initiatorIP, targetIP string) (bool, bool, error) {
     initiatorCreationRequired, targetCreationRequired := false, false
-    if podIP == initiatorIP && podIP == targetIP {
-        if e.Port == 0 && e.TargetPort == 0 {
-            // ... existing code
-        }
+    if e.shouldCreateBothInstances(podIP, initiatorIP, targetIP) {
+        e.log.Info("Creating both initiator and target instances")
+        return true, true, nil
+    }
+    if e.shouldCreateTargetOnly(podIP, targetIP) {
+        e.log.Info("Creating a target instance")
+        return false, true, nil
     }
     // ... rest of the function
 }

Line range hint 2166-2264: Improve error handling in target switchover

The target switchover logic should use a transaction-like pattern to ensure proper cleanup on failure.

Consider implementing a rollback mechanism:

+type switchoverState struct {
+    oldTargetAddress string
+    oldTargetPort    int32
+    oldStandbyPort   int32
+    disconnected     bool
+}
+
+func (e *Engine) rollbackSwitchover(state *switchoverState) error {
+    if state.disconnected {
+        if err := e.connectTarget(state.oldTargetAddress); err != nil {
+            return fmt.Errorf("rollback failed: %v", err)
+        }
+        e.TargetPort = state.oldTargetPort
+        e.StandbyTargetPort = state.oldStandbyPort
+    }
+    return nil
+}

 func (e *Engine) SwitchOverTarget(spdkClient *spdkclient.Client, newTargetAddress string) (err error) {
+    state := &switchoverState{
+        oldTargetPort:  e.TargetPort,
+        oldStandbyPort: e.StandbyTargetPort,
+    }
+    defer func() {
+        if err != nil {
+            if rollbackErr := e.rollbackSwitchover(state); rollbackErr != nil {
+                e.log.WithError(rollbackErr).Error("Switchover rollback failed")
+            }
+        }
+    }()

2107-2108: Enhance error logging in Suspend/Resume

The error logging could provide more context about the failure.

-                e.log.WithError(err).Warn("Failed to suspend engine")
+                e.log.WithError(err).WithFields(logrus.Fields{
+                    "state": e.State,
+                    "frontend": e.Frontend,
+                }).Warn("Engine suspension failed")

-                e.log.WithError(err).Warn("Failed to resume engine")
+                e.log.WithError(err).WithFields(logrus.Fields{
+                    "state": e.State,
+                    "frontend": e.Frontend,
+                }).Warn("Engine resume failed")

Also applies to: 2138-2139

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 60c7419 and 59d33f4.

⛔ Files ignored due to path filters (4)
  • go.mod is excluded by !go.mod
  • go.sum is excluded by !**/*.sum, !go.sum
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/initiator.go is excluded by !vendor/**
  • vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (2)
  • pkg/api/types.go (5 hunks)
  • pkg/spdk/engine.go (20 hunks)
🔇 Additional comments (5)
pkg/api/types.go (2)

157-168: LGTM! The NvmeSubsystem initialization and mapping look good.

The implementation:

  • Properly initializes the NvmeSubsystem with an empty path map
  • Correctly copies paths from the protobuf object
  • Safely handles the new StandbyTargetPort field

Also applies to: 179-179, 188-188


144-144: Verify the usage of new fields across the codebase.

Let's ensure the new fields are properly handled throughout the codebase.

Also applies to: 153-153, 157-168, 179-179, 188-188

✅ Verification successful

New fields are properly integrated across the codebase

The verification shows comprehensive and consistent usage of the new fields:

  • StandbyTargetPort is well-integrated with proper initialization, handling, and cleanup in pkg/spdk/engine.go
  • Proper port management through releaseTargetAndStandbyTargetPorts function
  • Complete test coverage in pkg/spdk/engine_test.go
  • NvmeSubsystem and related structures are properly defined and used across the codebase
  • No potential nil pointer dereference issues found
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check usage of StandbyTargetPort and NvmeSubsystem fields

# Check for StandbyTargetPort field usage
echo "=== Checking StandbyTargetPort usage ==="
rg "StandbyTargetPort|standby_target_port" --type go

# Check for NvmeSubsystem related code
echo -e "\n=== Checking NvmeSubsystem usage ==="
rg "NvmeSubsystem|NvmeDevicePath|nvme_subsystem" --type go

# Check for potential nil pointer dereferences
echo -e "\n=== Checking for potential nil checks ==="
ast-grep --pattern 'if $x.NvmeSubsystem != nil { $$$ }'

Length of output: 9235

pkg/spdk/engine.go (3)

37-50: LGTM: Engine struct changes for standby target support

The addition of StandbyTargetPort field is well-structured and properly integrated with the existing fields. The field ordering and type are consistent with other port-related fields.


2433-2453: 🛠️ Refactor suggestion

Add validation for port release operations

The port release logic should validate port numbers before attempting to release them.

 func (e *Engine) releaseTargetAndStandbyTargetPorts(superiorPortAllocator *commonbitmap.Bitmap) error {
+    if superiorPortAllocator == nil {
+        return fmt.Errorf("invalid nil port allocator")
+    }
+
     releaseTargetPortRequired := e.TargetPort != 0
     releaseStandbyTargetPortRequired := e.StandbyTargetPort != 0 && e.StandbyTargetPort != e.TargetPort

+    // Validate port numbers
+    if releaseTargetPortRequired && (e.TargetPort < 1 || e.TargetPort > 65535) {
+        return fmt.Errorf("invalid target port number: %d", e.TargetPort)
+    }
+    if releaseStandbyTargetPortRequired && (e.StandbyTargetPort < 1 || e.StandbyTargetPort > 65535) {
+        return fmt.Errorf("invalid standby target port number: %d", e.StandbyTargetPort)
+    }

Likely invalid or redundant comment.


2396-2428: 🛠️ Refactor suggestion

Improve cleanup handling in DeleteTarget

The target deletion should handle partial failures better and ensure proper cleanup order.

Consider implementing a cleanup tracker:

+type cleanupTracker struct {
+    exposeBdevStopped    bool
+    portsReleased        bool
+    raidDeleted          bool
+    replicasDisconnected map[string]bool
+}

 func (e *Engine) DeleteTarget(spdkClient *spdkclient.Client, superiorPortAllocator *commonbitmap.Bitmap) (err error) {
+    tracker := &cleanupTracker{
+        replicasDisconnected: make(map[string]bool),
+    }
+    defer func() {
+        if err != nil {
+            e.rollbackFailedDeletion(tracker)
+        }
+    }()

Likely invalid or redundant comment.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
pkg/spdk/engine.go (2)

37-50: Add documentation for the new StandbyTargetPort field

The new field needs documentation to explain its purpose and relationship with other port fields (Port, TargetPort). Consider adding comments explaining:

  • When StandbyTargetPort is used vs TargetPort
  • Valid value ranges and constraints
  • State transitions during target switchover

117-147: Simplify complex branching logic

The function has nested if conditions that make it hard to follow. Consider refactoring to:

  1. Extract the conditions into well-named boolean variables
  2. Use early returns to reduce nesting
  3. Add debug logging for better observability

 func (e *Engine) checkInitiatorAndTargetCreationRequirements(podIP, initiatorIP, targetIP string) (bool, bool, error) {
+    isLocalPod := podIP == initiatorIP && podIP == targetIP
+    isInitiatorOnly := podIP == initiatorIP && podIP != targetIP
+    isTargetOnly := podIP != initiatorIP && podIP == targetIP
+
+    if !isLocalPod && !isInitiatorOnly && !isTargetOnly {
+        return false, false, fmt.Errorf("invalid initiator and target addresses for engine %s creation with initiator address %v and target address %v", 
+            e.Name, initiatorIP, targetIP)
+    }
+
+    if isInitiatorOnly {
+        e.log.Info("Creating an initiator instance")
+        return true, false, nil
+    }
+
+    if isTargetOnly {
+        e.log.Info("Creating a target instance")
+        return false, true, nil
+    }
+
+    // Handle local pod case
     if e.Port == 0 && e.TargetPort == 0 {
         e.log.Info("Creating both initiator and target instances")
         return true, true, nil
     }
     
     if e.Port != 0 && e.TargetPort == 0 {
         e.log.Info("Creating a target instance")
         if e.StandbyTargetPort != 0 {
             e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)
             return false, false, nil
         }
         return false, true, nil
     }
     
     e.log.Infof("Initiator instance with port %v and target instance with port %v are already created, will skip the creation", 
         e.Port, e.TargetPort)
     return false, false, nil
 }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 59d33f4 and 1774180.

⛔ Files ignored due to path filters (4)
  • go.mod is excluded by !go.mod
  • go.sum is excluded by !**/*.sum, !go.sum
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/initiator.go is excluded by !vendor/**
  • vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (2)
  • pkg/api/types.go (5 hunks)
  • pkg/spdk/engine.go (20 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/api/types.go
🔇 Additional comments (1)
pkg/spdk/engine.go (1)

2433-2453: 🛠️ Refactor suggestion

Add port validation and improve error handling

The port release logic should include validation and better error handling:

  1. Validate port numbers before release
  2. Add cleanup logic for partial failures
  3. Consider using constants for port ranges

 func (e *Engine) releaseTargetAndStandbyTargetPorts(superiorPortAllocator *commonbitmap.Bitmap) error {
+    const (
+        minPort = 1
+        maxPort = 65535
+    )
+
+    if superiorPortAllocator == nil {
+        return fmt.Errorf("invalid nil port allocator")
+    }
+
     releaseTargetPortRequired := e.TargetPort != 0
     releaseStandbyTargetPortRequired := e.StandbyTargetPort != 0 && e.StandbyTargetPort != e.TargetPort
 
+    // Validate port numbers
+    if releaseTargetPortRequired && (e.TargetPort < minPort || e.TargetPort > maxPort) {
+        return fmt.Errorf("invalid target port number: %d", e.TargetPort)
+    }
+    if releaseStandbyTargetPortRequired && (e.StandbyTargetPort < minPort || e.StandbyTargetPort > maxPort) {
+        return fmt.Errorf("invalid standby target port number: %d", e.StandbyTargetPort)
+    }
+
     // Release the target port
     if releaseTargetPortRequired {
         if err := superiorPortAllocator.ReleaseRange(e.TargetPort, e.TargetPort); err != nil {
             return errors.Wrapf(err, "failed to release target port %d", e.TargetPort)
         }
+        e.log.Infof("Released target port %d", e.TargetPort)
     }
     e.TargetPort = 0
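A self-contained sketch of the validate-then-release idea follows. Everything here is an assumption made for illustration: `rangeReleaser` and `fakeAllocator` are hypothetical stand-ins for the real port allocator (only a `ReleaseRange(start, end)` call is taken from the snippet above), and `releasePorts` takes the two ports as plain arguments instead of reading them from the engine.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	minPort int32 = 1
	maxPort int32 = 65535
)

// rangeReleaser is a hypothetical interface covering the single allocator
// method this sketch needs.
type rangeReleaser interface {
	ReleaseRange(start, end int32) error
}

// fakeAllocator is an in-memory stand-in used only for this example.
type fakeAllocator struct{ inUse map[int32]bool }

func (a *fakeAllocator) ReleaseRange(start, end int32) error {
	for p := start; p <= end; p++ {
		if !a.inUse[p] {
			return fmt.Errorf("port %d is not allocated", p)
		}
		delete(a.inUse, p)
	}
	return nil
}

func validPort(p int32) bool { return p >= minPort && p <= maxPort }

// releasePorts validates every port first, then releases the target port and,
// if it differs from the target port, the standby target port.
func releasePorts(alloc rangeReleaser, targetPort, standbyTargetPort int32) error {
	if alloc == nil {
		return errors.New("invalid nil port allocator")
	}
	var toRelease []int32
	if targetPort != 0 {
		toRelease = append(toRelease, targetPort)
	}
	if standbyTargetPort != 0 && standbyTargetPort != targetPort {
		toRelease = append(toRelease, standbyTargetPort)
	}
	for _, p := range toRelease {
		if !validPort(p) {
			return fmt.Errorf("invalid port number: %d", p)
		}
	}
	for _, p := range toRelease {
		if err := alloc.ReleaseRange(p, p); err != nil {
			return fmt.Errorf("failed to release port %d: %w", p, err)
		}
	}
	return nil
}

func main() {
	alloc := &fakeAllocator{inUse: map[int32]bool{20001: true, 20002: true}}
	fmt.Println(releasePorts(alloc, 20001, 20002))
	fmt.Println(releasePorts(alloc, 70000, 0))
}
```

Validating all ports before releasing any of them avoids a state where the target port is released and the standby port is then rejected.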

Likely invalid or redundant comment.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
pkg/spdk/engine.go (2)

Line range hint 2197-2294: Add port validation in target switchover

The target switchover logic should validate port numbers before proceeding with the operation to prevent invalid states.

Add port validation:

 func (e *Engine) SwitchOverTarget(spdkClient *spdkclient.Client, newTargetAddress string) (err error) {
     // ... existing code that derives newTargetIP and newTargetPort from newTargetAddress
+    if newTargetPort <= 0 || newTargetPort > 65535 {
+        return fmt.Errorf("invalid target port %d for engine %s target switchover", newTargetPort, e.Name)
+    }

2427-2484: Improve error handling in DeleteTarget

The target deletion process should use a transaction-like pattern to ensure proper cleanup even in case of partial failures. Consider tracking the state of each operation and implementing rollback capabilities.

Consider implementing a cleanup tracker:

+type cleanupState struct {
+    exposeBdevStopped    bool
+    portsReleased        bool
+    raidDeleted          bool
+    replicasDisconnected map[string]bool
+}
+
 func (e *Engine) DeleteTarget(spdkClient *spdkclient.Client, superiorPortAllocator *commonbitmap.Bitmap) (err error) {
+    state := &cleanupState{
+        replicasDisconnected: make(map[string]bool),
+    }
+    defer func() {
+        if err != nil {
+            e.rollbackFailedDeletion(state)
+        }
+    }()
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1774180 and bd9951c.

⛔ Files ignored due to path filters (8)
  • go.mod is excluded by !go.mod
  • go.sum is excluded by !**/*.sum, !go.sum
  • vendor/github.com/jinzhu/copier/.gitignore is excluded by !vendor/**
  • vendor/github.com/jinzhu/copier/License is excluded by !vendor/**
  • vendor/github.com/jinzhu/copier/README.md is excluded by !vendor/**
  • vendor/github.com/jinzhu/copier/copier.go is excluded by !vendor/**
  • vendor/github.com/jinzhu/copier/errors.go is excluded by !vendor/**
  • vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (1)
  • pkg/spdk/engine.go (22 hunks)
🧰 Additional context used
🪛 golangci-lint (1.62.2)
pkg/spdk/engine.go

692-692: Error return value of copier.Copy is not checked

(errcheck)


668-668: SA4000: identical expressions on the left and right side of the '!=' operator

(staticcheck)

🪛 GitHub Check: CodeFactor
pkg/spdk/engine.go

[notice] 667-667: pkg/spdk/engine.go#L667
Redefinition of the built-in function new. (redefines-builtin-id)

🔇 Additional comments (1)
pkg/spdk/engine.go (1)

39-52: LGTM: Engine struct changes are well-structured

The addition of StandbyTargetPort field with proper documentation aligns well with the live upgrade support objective.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (5)
pkg/api/types.go (1)

45-55: Consider adding documentation for the new types

The NvmeDevicePath and NvmeSubsystem types are well-structured and align with NVMe-oF specifications. Consider adding godoc comments to describe:

  • The purpose of each type
  • The significance of each field
  • Valid values for the State field

Example documentation:

// NvmeDevicePath represents an NVMe-oF connection path
// with transport and addressing information
type NvmeDevicePath struct {
    // Trtype specifies the NVMe transport type (e.g., "tcp", "rdma")
    Trtype  string                          `json:"trtype"`
    // ... (document other fields)
}
pkg/spdk/engine.go (4)

105-108: Add test coverage for NvmeSubsystem initialization.

The NvmeSubsystem initialization in NewEngine is not covered by tests. Consider adding test cases to verify proper initialization of the NvmeSubsystem and its Paths map.

Would you like me to help create test cases for the NvmeSubsystem initialization?

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 105-108: pkg/spdk/engine.go#L105-L108
Added lines #L105 - L108 were not covered by tests


Line range hint 2211-2308: LGTM! Consider adding more detailed logging.

The SwitchOverTarget implementation has good error handling and rollback capabilities. Consider adding more detailed logging for better observability during the switchover process.

Add debug logs before critical operations:

+    e.log.Debugf("Current target state - IP: %s, Port: %d, StandbyPort: %d", e.TargetIP, e.TargetPort, e.StandbyTargetPort)
     if newTargetIP == podIP {
         e.TargetPort = newTargetPort
         e.StandbyTargetPort = 0
     } else {
         e.StandbyTargetPort = e.TargetPort
         e.TargetPort = 0
     }
+    e.log.Debugf("New target state - IP: %s, Port: %d, StandbyPort: %d", e.TargetIP, e.TargetPort, e.StandbyTargetPort)
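The port bookkeeping in the snippet above can be isolated as a pure function, which makes the before/after states easy to inspect. This is a sketch only: the `ports` type and `afterSwitchover` are hypothetical names, and the IP addresses and port numbers are made up.

```go
package main

import "fmt"

// ports is a hypothetical holder for the two fields the snippet updates.
type ports struct {
	Target        int32
	StandbyTarget int32
}

// afterSwitchover returns the port state after switching to newTargetIP.
// A local target takes over as the active target; a remote target leaves
// the current local target behind as the standby.
func afterSwitchover(cur ports, podIP, newTargetIP string, newTargetPort int32) ports {
	if newTargetIP == podIP {
		return ports{Target: newTargetPort, StandbyTarget: 0}
	}
	return ports{Target: 0, StandbyTarget: cur.Target}
}

func main() {
	cur := ports{Target: 20001}
	fmt.Printf("remote: %+v\n", afterSwitchover(cur, "10.0.0.1", "10.0.0.2", 20005))
	fmt.Printf("local:  %+v\n", afterSwitchover(cur, "10.0.0.1", "10.0.0.1", 20005))
}
```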

2478-2498: Add port validation before release.

The releaseTargetAndStandbyTargetPorts function should validate port numbers before attempting to release them.

Add validation:

 func (e *Engine) releaseTargetAndStandbyTargetPorts(superiorPortAllocator *commonbitmap.Bitmap) error {
+    if superiorPortAllocator == nil {
+        return fmt.Errorf("invalid nil port allocator")
+    }
+
     releaseTargetPortRequired := e.TargetPort != 0
     releaseStandbyTargetPortRequired := e.StandbyTargetPort != 0 && e.StandbyTargetPort != e.TargetPort
 
+    // Validate port numbers
+    if releaseTargetPortRequired && (e.TargetPort <= 0 || e.TargetPort > 65535) {
+        return fmt.Errorf("invalid target port number: %d", e.TargetPort)
+    }
+    if releaseStandbyTargetPortRequired && (e.StandbyTargetPort <= 0 || e.StandbyTargetPort > 65535) {
+        return fmt.Errorf("invalid standby target port number: %d", e.StandbyTargetPort)
+    }

214-220: Add test coverage for critical engine operations.

Several critical operations lack test coverage, including:

  • Initiator and target creation requirements
  • Engine state management
  • Port allocation and release
  • Target switchover logic

Consider adding comprehensive test cases to verify these operations.

Would you like help creating test cases for these operations?

Also applies to: 222-226, 229-237

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 214-220: pkg/spdk/engine.go#L214-L220
Added lines #L214 - L220 were not covered by tests

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between bd9951c and 99f98a7.

⛔ Files ignored due to path filters (12)
  • go.mod is excluded by !go.mod
  • go.sum is excluded by !**/*.sum, !go.sum
  • vendor/github.com/jinzhu/copier/.gitignore is excluded by !vendor/**
  • vendor/github.com/jinzhu/copier/License is excluded by !vendor/**
  • vendor/github.com/jinzhu/copier/README.md is excluded by !vendor/**
  • vendor/github.com/jinzhu/copier/copier.go is excluded by !vendor/**
  • vendor/github.com/jinzhu/copier/errors.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/initiator.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/nvme.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/nvmecli.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/types/types.go is excluded by !vendor/**
  • vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (2)
  • pkg/api/types.go (6 hunks)
  • pkg/spdk/engine.go (22 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
pkg/spdk/engine.go

[warning] 105-108: pkg/spdk/engine.go#L105-L108
Added lines #L105 - L108 were not covered by tests


[warning] 135-137: pkg/spdk/engine.go#L135-L137
Added lines #L135 - L137 were not covered by tests


[warning] 151-151: pkg/spdk/engine.go#L151
Added line #L151 was not covered by tests


[warning] 214-220: pkg/spdk/engine.go#L214-L220
Added lines #L214 - L220 were not covered by tests


[warning] 222-226: pkg/spdk/engine.go#L222-L226
Added lines #L222 - L226 were not covered by tests


[warning] 229-237: pkg/spdk/engine.go#L229-L237
Added lines #L229 - L237 were not covered by tests


[warning] 269-270: pkg/spdk/engine.go#L269-L270
Added lines #L269 - L270 were not covered by tests


[warning] 275-275: pkg/spdk/engine.go#L275
Added line #L275 was not covered by tests


[warning] 277-277: pkg/spdk/engine.go#L277
Added line #L277 was not covered by tests


[warning] 288-291: pkg/spdk/engine.go#L288-L291
Added lines #L288 - L291 were not covered by tests


[warning] 298-298: pkg/spdk/engine.go#L298
Added line #L298 was not covered by tests


[warning] 308-309: pkg/spdk/engine.go#L308-L309
Added lines #L308 - L309 were not covered by tests


[warning] 312-320: pkg/spdk/engine.go#L312-L320
Added lines #L312 - L320 were not covered by tests


[warning] 406-406: pkg/spdk/engine.go#L406
Added line #L406 was not covered by tests


[warning] 416-418: pkg/spdk/engine.go#L416-L418
Added lines #L416 - L418 were not covered by tests


[warning] 427-434: pkg/spdk/engine.go#L427-L434
Added lines #L427 - L434 were not covered by tests


[warning] 436-447: pkg/spdk/engine.go#L436-L447
Added lines #L436 - L447 were not covered by tests


[warning] 449-449: pkg/spdk/engine.go#L449
Added line #L449 was not covered by tests


[warning] 453-467: pkg/spdk/engine.go#L453-L467
Added lines #L453 - L467 were not covered by tests


[warning] 469-471: pkg/spdk/engine.go#L469-L471
Added lines #L469 - L471 were not covered by tests


[warning] 473-473: pkg/spdk/engine.go#L473
Added line #L473 was not covered by tests


[warning] 476-478: pkg/spdk/engine.go#L476-L478
Added lines #L476 - L478 were not covered by tests


[warning] 481-481: pkg/spdk/engine.go#L481
Added line #L481 was not covered by tests


[warning] 484-487: pkg/spdk/engine.go#L484-L487
Added lines #L484 - L487 were not covered by tests


[warning] 489-500: pkg/spdk/engine.go#L489-L500
Added lines #L489 - L500 were not covered by tests


[warning] 506-507: pkg/spdk/engine.go#L506-L507
Added lines #L506 - L507 were not covered by tests


[warning] 515-516: pkg/spdk/engine.go#L515-L516
Added lines #L515 - L516 were not covered by tests


[warning] 520-520: pkg/spdk/engine.go#L520
Added line #L520 was not covered by tests


[warning] 522-524: pkg/spdk/engine.go#L522-L524
Added lines #L522 - L524 were not covered by tests


[warning] 623-634: pkg/spdk/engine.go#L623-L634
Added lines #L623 - L634 were not covered by tests

🪛 GitHub Check: CodeFactor
pkg/spdk/engine.go

[notice] 667-667: pkg/spdk/engine.go#L667
Redefinition of the built-in function new. (redefines-builtin-id)

🔇 Additional comments (5)
pkg/api/types.go (3)

7-8: LGTM: Import changes are appropriate

The addition of helpertypes import is necessary for the NVMeControllerState type used in the new NvmeDevicePath struct.


159-172: LGTM: Robust conversion implementation

The conversion logic:

  • Properly initializes the NvmeSubsystem map
  • Includes appropriate nil checks
  • Creates deep copies of path objects
  • Correctly handles the new StandbyTargetPort field

Also applies to: 183-183, 192-192
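For readers unfamiliar with the pattern being praised, a deep copy with nil checks might look roughly like the sketch below. The types are hypothetical: `devicePath` and `subsystem` only imitate the shape described above, the `Traddr` field is an assumption, and the real conversion works on protobuf objects that are not reproduced here.

```go
package main

import "fmt"

// devicePath and subsystem are hypothetical stand-ins for the API types.
type devicePath struct {
	Trtype string
	Traddr string // assumed field, for illustration only
}

type subsystem struct {
	Paths map[string]*devicePath
}

// copySubsystem always returns a subsystem with a non-nil Paths map and
// copies each non-nil path by value so the result shares no pointers with src.
func copySubsystem(src *subsystem) *subsystem {
	dst := &subsystem{Paths: map[string]*devicePath{}}
	if src == nil {
		return dst
	}
	for name, p := range src.Paths {
		if p == nil {
			continue
		}
		cp := *p
		dst.Paths[name] = &cp
	}
	return dst
}

func main() {
	src := &subsystem{Paths: map[string]*devicePath{
		"nvme0": {Trtype: "tcp", Traddr: "10.0.0.1"},
		"nvme1": nil,
	}}
	dst := copySubsystem(src)
	src.Paths["nvme0"].Traddr = "changed"
	fmt.Println(len(dst.Paths), dst.Paths["nvme0"].Traddr)
}
```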


146-146: Verify port allocation strategy for StandbyTargetPort

The addition of StandbyTargetPort suggests a high-availability setup. Please ensure:

  1. Port allocation doesn't conflict with existing TargetPort
  2. Port range validation is implemented in the engine creation logic

Also applies to: 155-155

✅ Verification successful

Based on the results, I can now generate the final response as there's sufficient evidence about the port allocation and validation strategy.

Port allocation for StandbyTargetPort is properly implemented

The codebase shows proper handling of StandbyTargetPort:

  • Port allocation uses the same superiorPortAllocator bitmap as the main TargetPort
  • Ports are properly released when no longer needed via ReleaseRange
  • Port conflicts are prevented as the code checks for existing port allocations
  • Port switchover logic correctly manages the transition between primary and standby ports
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for port allocation and validation logic
rg -A 5 "StandbyTargetPort.*=|standbyTargetPort.*=" 
rg -A 5 "func.*Create.*Engine"

Length of output: 6047

pkg/spdk/engine.go (2)

115-117: LGTM! Clean and focused helper function.

The isNewEngine helper function has clear logic and follows the single responsibility principle.
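A table-driven check of this predicate could be sketched as follows. The `engine` type is a hypothetical reduction of the real struct to the three fields the helper reads, and the sample values are made up.

```go
package main

import "fmt"

// engine is a hypothetical reduction of the Engine struct to the fields
// that isNewEngine inspects.
type engine struct {
	IP                string
	TargetIP          string
	StandbyTargetPort int32
}

// isNewEngine reports whether none of the identifying fields are set yet.
func (e *engine) isNewEngine() bool {
	return e.IP == "" && e.TargetIP == "" && e.StandbyTargetPort == 0
}

func main() {
	cases := []struct {
		name string
		e    engine
		want bool
	}{
		{"fresh engine", engine{}, true},
		{"initiator IP set", engine{IP: "10.0.0.1"}, false},
		{"target IP set", engine{TargetIP: "10.0.0.2"}, false},
		{"standby port set", engine{StandbyTargetPort: 20002}, false},
	}
	for _, c := range cases {
		got := c.e.isNewEngine()
		fmt.Printf("%-18s got=%v want=%v\n", c.name, got, c.want)
	}
}
```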


130-134: 🛠️ Refactor suggestion

Remove redundant condition check.

The condition check for e.StandbyTargetPort != 0 is redundant as it's already checked in the parent if statement.

Apply this diff to simplify the logic:

-            if e.StandbyTargetPort != 0 {
-                e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)
-            } else {
-                targetCreationRequired = true
-            }
+            if e.StandbyTargetPort != 0 {
+                e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)
+                return false, false, nil
+            }
+            targetCreationRequired = true

Likely invalid or redundant comment.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
pkg/spdk/engine.go (3)

37-46: Add documentation for the new StandbyTargetPort field.

The new field needs documentation to explain its purpose and usage in the context of live upgrades.

 	TargetPort        int32 // Port of the target that is used for letting initiator connect to
-	StandbyTargetPort int32
+	StandbyTargetPort int32 // Port used for the standby target during live upgrades

112-142: Simplify complex branching logic.

The method has nested if conditions that make it hard to follow. Consider restructuring to reduce complexity and improve readability.

 func (e *Engine) checkInitiatorAndTargetCreationRequirements(podIP, initiatorIP, targetIP string) (bool, bool, error) {
-    initiatorCreationRequired, targetCreationRequired := false, false
-    var err error
+    if podIP != initiatorIP && podIP != targetIP {
+        return false, false, fmt.Errorf("invalid initiator and target addresses for engine %s creation with initiator address %v and target address %v", 
+            e.Name, initiatorIP, targetIP)
+    }
 
-    if podIP == initiatorIP && podIP == targetIP {
-        if e.Port == 0 && e.TargetPort == 0 {
-            e.log.Info("Creating both initiator and target instances")
-            initiatorCreationRequired = true
-            targetCreationRequired = true
-        } else if e.Port != 0 && e.TargetPort == 0 {
-            // ... existing code
-        }
-    } else if podIP == initiatorIP {
-        // ... existing code
-    }
+    initiatorCreationRequired := podIP == initiatorIP && e.Port == 0
+    targetCreationRequired := podIP == targetIP && e.TargetPort == 0 && e.StandbyTargetPort == 0
+
+    if initiatorCreationRequired && targetCreationRequired {
+        e.log.Info("Creating both initiator and target instances")
+    } else if initiatorCreationRequired {
+        e.log.Info("Creating an initiator instance")
+    } else if targetCreationRequired {
+        e.log.Info("Creating a target instance")
+    } else {
+        e.log.Infof("Initiator instance with port %v and target instance with port %v are already created, will skip the creation", 
+            e.Port, e.TargetPort)
+    }
 
-    return initiatorCreationRequired, targetCreationRequired, err
+    return initiatorCreationRequired, targetCreationRequired, nil
 }

Line range hint 2139-2236: Improve error handling and cleanup in target switchover.

The error handling should ensure proper cleanup of resources in all failure scenarios. Consider using a deferred cleanup function.

 func (e *Engine) SwitchOverTarget(spdkClient *spdkclient.Client, newTargetAddress string) (err error) {
+    type switchState struct {
+        oldTargetDisconnected bool
+        newTargetConnected    bool
+        deviceReloaded        bool
+    }
+    state := &switchState{}
+
+    defer func() {
+        if err != nil && state.oldTargetDisconnected && !state.newTargetConnected {
+            if errRollback := e.connectTarget(currentTargetAddress); errRollback != nil {
+                e.log.WithError(errRollback).Error("Failed to rollback target switchover")
+            }
+        }
+    }()

     // ... existing code

     if err := e.disconnectTarget(currentTargetAddress); err != nil {
         return errors.Wrapf(err, "failed to disconnect target %s for engine %s", currentTargetAddress, e.Name)
     }
+    state.oldTargetDisconnected = true

     if err := e.connectTarget(newTargetAddress); err != nil {
         return errors.Wrapf(err, "failed to connect target %s for engine %s", newTargetAddress, e.Name)
     }
+    state.newTargetConnected = true
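The deferred-rollback idea can be tried in isolation with a fake connection. This is a sketch under assumptions: `connector`, `fakeConn`, and `switchOver` are hypothetical names, and the real method would also have to handle device reload and port bookkeeping, which are left out here.

```go
package main

import (
	"errors"
	"fmt"
)

// connector is a hypothetical abstraction over connect/disconnect calls.
type connector interface {
	Connect(addr string) error
	Disconnect(addr string) error
}

// fakeConn tracks one connection and can be told to fail for one address.
type fakeConn struct {
	connected string
	failOn    string
}

func (c *fakeConn) Connect(addr string) error {
	if addr == c.failOn {
		return fmt.Errorf("cannot reach %s", addr)
	}
	c.connected = addr
	return nil
}

func (c *fakeConn) Disconnect(addr string) error {
	if c.connected != addr {
		return errors.New("not connected to " + addr)
	}
	c.connected = ""
	return nil
}

// switchOver moves the connection from current to next. If the old target
// was disconnected but the new one could not be connected, the deferred
// function reconnects the old target.
func switchOver(c connector, current, next string) (err error) {
	oldDisconnected, newConnected := false, false
	defer func() {
		if err != nil && oldDisconnected && !newConnected {
			if rbErr := c.Connect(current); rbErr != nil {
				err = fmt.Errorf("%v; rollback failed: %w", err, rbErr)
			}
		}
	}()

	if err = c.Disconnect(current); err != nil {
		return fmt.Errorf("failed to disconnect target %s: %w", current, err)
	}
	oldDisconnected = true

	if err = c.Connect(next); err != nil {
		return fmt.Errorf("failed to connect target %s: %w", next, err)
	}
	newConnected = true
	return nil
}

func main() {
	c := &fakeConn{connected: "10.0.0.1:20001", failOn: "10.0.0.2:20005"}
	err := switchOver(c, "10.0.0.1:20001", "10.0.0.2:20005")
	fmt.Println(err, "| still connected to:", c.connected)
}
```

The named return value is what lets the deferred function see the final error and decide whether a rollback is needed.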
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 99f98a7 and 6e0cb47.

⛔ Files ignored due to path filters (8)
  • go.mod is excluded by !go.mod
  • go.sum is excluded by !**/*.sum, !go.sum
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/initiator.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/nvme.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/nvmecli.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/types/types.go is excluded by !vendor/**
  • vendor/github.com/longhorn/types/pkg/generated/spdkrpc/spdk.pb.go is excluded by !**/*.pb.go, !**/generated/**, !vendor/**
  • vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (6)
  • pkg/api/types.go (2 hunks)
  • pkg/client/client.go (1 hunks)
  • pkg/spdk/engine.go (18 hunks)
  • pkg/spdk/engine_test.go (1 hunks)
  • pkg/spdk/server.go (1 hunks)
  • pkg/spdk_test.go (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/api/types.go
  • pkg/client/client.go
  • pkg/spdk/engine_test.go
  • pkg/spdk/server.go
🔇 Additional comments (7)
pkg/spdk_test.go (6)

354-354: Consistent API update for EngineCreate


530-530: Consistent API update for EngineCreate


1228-1228: Consistent API update for EngineCreate


1243-1243: Consistent API update for EngineCreate


1354-1354: Consistent API update for EngineCreate


271-271: API change: Simplified EngineCreate signature

The EngineCreate function signature has been updated to remove the upgradeRequired boolean parameter, in line with the PR's objective of supporting engine live upgrades.

Let's verify the consistency of this change across the codebase:

✅ Verification successful

Let me verify the function signature change by checking the actual implementation.


Let me try a different approach to find the EngineCreate function definition.


Let me check the proto file to understand the API changes better.


Based on the evidence gathered from the codebase:

  1. The client implementation in pkg/client/client.go shows the current EngineCreate signature with only one boolean parameter salvageRequested
  2. All test calls in pkg/spdk_test.go have been updated to use the new signature with a single boolean parameter
  3. No instances of the old signature (with two boolean parameters) were found in the codebase

API change verified: EngineCreate signature update is consistent

The removal of one boolean parameter from the EngineCreate function has been consistently applied across the codebase. All test cases have been properly updated to match the new signature, and there are no remaining instances of the old two-boolean parameter signature.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all EngineCreate calls have been updated
# Expected: No instances of old signature with two boolean parameters

# Search for any remaining old-style calls
rg "EngineCreate.*,.*,.*,.*false.*false" --type go

# Search for updated calls to ensure consistency
rg "EngineCreate.*,.*,.*,.*false[^,]" --type go

Length of output: 1091


Script:

#!/bin/bash
# Search for EngineCreate function definition
ast-grep --pattern 'func $_EngineCreate($$$) {
  $$$
}'

Length of output: 57


Script:

#!/bin/bash
# Search for EngineCreate function definition using ripgrep
rg "func.*EngineCreate" -A 5 --type go

Length of output: 10623


Script:

#!/bin/bash
# Search for EngineCreateRequest definition in proto files
rg "message EngineCreateRequest" -A 15 --type proto

Length of output: 83

pkg/spdk/engine.go (1)

2406-2426: 🛠️ Refactor suggestion

Add port validation in releaseTargetAndStandbyTargetPorts.

The method should validate port numbers before attempting to release them.

 func (e *Engine) releaseTargetAndStandbyTargetPorts(superiorPortAllocator *commonbitmap.Bitmap) error {
+    if superiorPortAllocator == nil {
+        return fmt.Errorf("invalid nil port allocator")
+    }
+
     releaseTargetPortRequired := e.TargetPort != 0
     releaseStandbyTargetPortRequired := e.StandbyTargetPort != 0 && e.StandbyTargetPort != e.TargetPort

+    // Validate port numbers
+    if releaseTargetPortRequired && (e.TargetPort < 1 || e.TargetPort > 65535) {
+        return fmt.Errorf("invalid target port number: %d", e.TargetPort)
+    }
+    if releaseStandbyTargetPortRequired && (e.StandbyTargetPort < 1 || e.StandbyTargetPort > 65535) {
+        return fmt.Errorf("invalid standby target port number: %d", e.StandbyTargetPort)
+    }

     // ... existing code

Likely invalid or redundant comment.
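
The validation proposed above can be tried out separately from the Engine type. The following is a minimal sketch, not the PR's implementation: `validatePort` and `portsToRelease` are hypothetical helper names, and only the range check (1-65535) and the "standby differs from target" rule come from the suggestion.

```go
package main

import "fmt"

// validatePort rejects values outside the TCP port range 1-65535.
// Hypothetical helper, named here for illustration only.
func validatePort(label string, port int32) error {
	if port < 1 || port > 65535 {
		return fmt.Errorf("invalid %s port number: %d", label, port)
	}
	return nil
}

// portsToRelease returns the distinct non-zero ports that need releasing.
// A standby port equal to the target port is listed only once, mirroring
// the releaseStandbyTargetPortRequired condition in the suggestion.
func portsToRelease(targetPort, standbyTargetPort int32) ([]int32, error) {
	var ports []int32
	if targetPort != 0 {
		if err := validatePort("target", targetPort); err != nil {
			return nil, err
		}
		ports = append(ports, targetPort)
	}
	if standbyTargetPort != 0 && standbyTargetPort != targetPort {
		if err := validatePort("standby target", standbyTargetPort); err != nil {
			return nil, err
		}
		ports = append(ports, standbyTargetPort)
	}
	return ports, nil
}
```

Returning the list of ports first and releasing them afterwards keeps the validation free of side effects on the allocator.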

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
pkg/spdk_test.go (1)

1354-1354: Consider adding test coverage for live upgrade scenarios

While the code correctly updates the EngineCreate call, this test focuses only on target-related scenarios. Given that this PR implements live upgrade support, consider adding specific test cases to verify the live upgrade functionality.

Would you like me to help create additional test cases specifically for live upgrade scenarios?

pkg/spdk/engine.go (3)

37-49: Add documentation for the StandbyTargetPort field.

Consider adding a comment explaining the purpose and lifecycle of the StandbyTargetPort field, particularly its relationship with TargetPort during live upgrades.

 	TargetPort        int32 // Port of the target that is used for letting initiator connect to
-	StandbyTargetPort int32
+	StandbyTargetPort int32 // Port used for the standby target during live upgrades

112-142: Simplify complex branching logic.

The function has nested if conditions that make the logic hard to follow. Consider refactoring to:

  1. Extract the conditions into well-named boolean variables
  2. Use early returns to reduce nesting
  3. Add debug logging for better observability

 func (e *Engine) checkInitiatorAndTargetCreationRequirements(podIP, initiatorIP, targetIP string) (bool, bool, error) {
+    isLocalPod := podIP == initiatorIP && podIP == targetIP
+    isInitiatorPod := podIP == initiatorIP
+    isTargetPod := podIP == targetIP
+
+    if !isInitiatorPod && !isTargetPod {
+        return false, false, fmt.Errorf("invalid initiator and target addresses for engine %s creation with initiator address %v and target address %v", 
+            e.Name, initiatorIP, targetIP)
+    }
+
+    if !isLocalPod {
+        e.log.Infof("Creating %s instance", map[bool]string{true: "an initiator", false: "a target"}[isInitiatorPod])
+        return isInitiatorPod, isTargetPod, nil
+    }
+
+    if e.Port == 0 && e.TargetPort == 0 {
+        e.log.Info("Creating both initiator and target instances")
+        return true, true, nil
+    }
+
+    if e.Port != 0 && e.TargetPort == 0 {
+        if e.StandbyTargetPort != 0 {
+            e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)
+            return false, false, nil
+        }
+        e.log.Info("Creating a target instance")
+        return false, true, nil
+    }
+
+    e.log.Infof("Initiator instance with port %v and target instance with port %v are already created, will skip the creation", 
+        e.Port, e.TargetPort)
+    return false, false, nil
 }

Line range hint 2139-2240: Improve state tracking during switchover.

Consider implementing a transaction-like pattern to track the switchover state and ensure proper cleanup on failures. This would make the rollback logic more robust and easier to maintain.

+type switchoverState struct {
+    oldTargetDisconnected bool
+    newTargetConnected    bool
+    deviceReloaded       bool
+}
+
 func (e *Engine) SwitchOverTarget(spdkClient *spdkclient.Client, newTargetAddress string) (err error) {
+    state := &switchoverState{}
+    
+    defer func() {
+        if err != nil && state.oldTargetDisconnected && !state.newTargetConnected {
+            if errRollback := e.rollbackSwitchover(currentTargetAddress); errRollback != nil {
+                e.log.WithError(errRollback).Error("Failed to rollback target switchover")
+            }
+        }
+    }()
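
The fragment above leaves `rollbackSwitchover` and the state updates undefined. As a rough illustration of the pattern only, the sketch below replaces the SPDK calls with three hypothetical callbacks (`disconnectOld`, `connectNew`, `rollback`) and shows how a deferred function can use the recorded state to decide whether a rollback is needed.

```go
package main

import (
	"errors"
	"fmt"
)

// errConnect is a sample failure used when exercising the sketch.
var errConnect = errors.New("connect to new target failed")

// switchoverState records how far the switchover progressed.
type switchoverState struct {
	oldTargetDisconnected bool
	newTargetConnected    bool
}

// switchOver disconnects the old target, then connects the new one.
// If it fails after the old target is gone but before the new one is
// connected, the deferred function invokes rollback.
func switchOver(disconnectOld, connectNew, rollback func() error) (err error) {
	state := &switchoverState{}

	defer func() {
		if err == nil || !state.oldTargetDisconnected || state.newTargetConnected {
			return
		}
		if errRollback := rollback(); errRollback != nil {
			err = fmt.Errorf("%w (rollback also failed: %v)", err, errRollback)
		}
	}()

	if err = disconnectOld(); err != nil {
		return err
	}
	state.oldTargetDisconnected = true

	if err = connectNew(); err != nil {
		return err
	}
	state.newTargetConnected = true
	return nil
}
```

Because `err` is a named return value, the deferred function sees the final error and can extend it when the rollback itself fails.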
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 6e0cb47 and 5b164d0.

📒 Files selected for processing (6)
  • pkg/api/types.go (2 hunks)
  • pkg/client/client.go (1 hunks)
  • pkg/spdk/engine.go (18 hunks)
  • pkg/spdk/engine_test.go (1 hunks)
  • pkg/spdk/server.go (1 hunks)
  • pkg/spdk_test.go (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/api/types.go
  • pkg/client/client.go
  • pkg/spdk/engine_test.go
  • pkg/spdk/server.go
🔇 Additional comments (6)
pkg/spdk_test.go (4)

271-271: LGTM: Parameter removal aligns with live upgrade support

The removal of the upgradeRequired parameter from EngineCreate is consistent with the PR's objective to support live upgrades.


354-354: LGTM: Consistent parameter update in test

The change maintains consistency with the updated EngineCreate signature during replica rebuilding tests.


530-530: LGTM: Consistent parameter update in snapshot test

The change maintains consistency with the updated EngineCreate signature in snapshot operation tests.


1228-1228: LGTM: Consistent parameter updates in revert operations

The changes maintain consistency with the updated EngineCreate signature in snapshot revert operations, both for empty frontend and previous frontend cases.

Also applies to: 1243-1243

pkg/spdk/engine.go (2)

482-498: ⚠️ Potential issue

Add port range validation.

The port allocation logic should validate that allocated ports are within valid ranges (1-65535) before using them.

 port, _, err = superiorPortAllocator.AllocateRange(portCount)
 if err != nil {
     return errors.Wrapf(err, "failed to allocate port for engine %v", e.Name)
 }
+if port <= 0 || port > 65535 {
+    return fmt.Errorf("invalid port %d allocated for engine %v: must be between 1 and 65535", port, e.Name)
+}
 e.log.Infof("Allocated port %v for engine", port)

Likely invalid or redundant comment.


2410-2430: 🛠️ Refactor suggestion

Add port validation before release.

The port release logic should validate port numbers before attempting to release them to prevent potential issues with invalid port numbers.

 func (e *Engine) releaseTargetAndStandbyTargetPorts(superiorPortAllocator *commonbitmap.Bitmap) error {
+    if superiorPortAllocator == nil {
+        return fmt.Errorf("invalid nil port allocator")
+    }
+
     releaseTargetPortRequired := e.TargetPort != 0
     releaseStandbyTargetPortRequired := e.StandbyTargetPort != 0 && e.StandbyTargetPort != e.TargetPort
 
+    // Validate port numbers
+    if releaseTargetPortRequired && (e.TargetPort < 1 || e.TargetPort > 65535) {
+        return fmt.Errorf("invalid target port number: %d", e.TargetPort)
+    }
+    if releaseStandbyTargetPortRequired && (e.StandbyTargetPort < 1 || e.StandbyTargetPort > 65535) {
+        return fmt.Errorf("invalid standby target port number: %d", e.StandbyTargetPort)
+    }

Likely invalid or redundant comment.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (14)
pkg/spdk/replica.go (1)

1325-1327: Clarify error message for better readability.

The error message in the fmt.Errorf call is repetitive and may confuse readers. Consider rephrasing it to improve clarity.

Suggested change:

-			return fmt.Errorf("found mismatching between the required dst bdev NVMe controller name %s and the expected dst controller name %s for replica %s rebuilding src attach", dstRebuildingLvolName, controllerName, r.Name)
+			return fmt.Errorf("mismatch between expected dst bdev NVMe controller name %s and actual controller name %s for replica %s during rebuilding source attach", dstRebuildingLvolName, controllerName, r.Name)
pkg/spdk/disk/nvme/nvme.go (1)

18-26: Add documentation for timeout constants and verify values for live upgrades.

The newly added constants lack documentation explaining their purpose and impact. Since these timeouts are crucial for NVMe controller behavior during live upgrades, please consider:

  1. Adding detailed comments explaining:

    • The purpose of each timeout
    • The implications of their values
    • The relationship between related timeouts (e.g., FastIOFail and TransportAck)
  2. Verifying if these timeout values are optimal for live upgrade scenarios

 const (
-	// Timeouts for disk bdev
+	// NVMe controller timeout configurations critical for disk operations and live upgrades
+
+	// Maximum time to wait before declaring a controller as lost
 	diskCtrlrLossTimeoutSec  = 30
+	// Delay between reconnection attempts
 	diskReconnectDelaySec    = 2
+	// Maximum time to wait before failing I/O operations when controller is unresponsive
 	diskFastIOFailTimeoutSec = 15
+	// Maximum time to wait for transport layer acknowledgment
 	diskTransportAckTimeout  = 14
+	// Interval for NVMe keep-alive messages in milliseconds
 	diskKeepAliveTimeoutMs   = 10000
+	// Multipath configuration for NVMe devices
 	diskMultipath            = "disable"
 )
pkg/spdk/util.go (1)

Line range hint 80-99: LGTM! The timeout parameterization enhances upgrade reliability.

The changes improve the function's flexibility by allowing configurable timeouts, which is crucial for reliable live upgrades. The parameterization of ctrlrLossTimeout and fastIOFailTimeoutSec enables fine-tuned control over connection handling during upgrade scenarios.

Consider documenting the recommended timeout values for different scenarios:

  • Normal operations
  • Live upgrade operations
  • Recovery scenarios

This will help operators configure appropriate values based on their use case.

pkg/spdk/restore.go (3)

Line range hint 124-143: Add error handling and cleanup for NVMe initiator operations

The NVMe initiator creation and startup lacks proper cleanup in error paths. Consider the following improvements:

  1. Add cleanup in error paths to prevent resource leaks
  2. Add timeout handling for initiator operations
  3. Document or make configurable the true parameter in initiator.Start

Here's a suggested improvement:

 initiator, err := nvme.NewInitiator(lvolName, helpertypes.GetNQN(lvolName), nvme.HostProc)
 if err != nil {
+    // Clean up any partial initialization
+    if err := r.spdkClient.StopExposeBdev(helpertypes.GetNQN(lvolName)); err != nil {
+        r.log.WithError(err).Error("Failed to cleanup after initiator creation failure")
+    }
     return nil, "", errors.Wrapf(err, "failed to create NVMe initiator for lvol bdev %v", lvolName)
 }
+
+// Use context with timeout for initiator operations.
+// NOTE: initiator.Start below takes no context, so ctx must be passed to a
+// context-aware call; Go rejects an unused variable at compile time.
+ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+defer cancel()
+
-if _, err := initiator.Start(r.ip, strconv.Itoa(int(r.port)), true); err != nil {
+if _, err := initiator.Start(r.ip, strconv.Itoa(int(r.port)), true /* waitForConnection */); err != nil {
+    if _, errStop := initiator.Stop(true, true, false); errStop != nil {
+        r.log.WithError(errStop).Error("Failed to cleanup after initiator start failure")
+    }
     return nil, "", errors.Wrapf(err, "failed to start NVMe initiator for lvol bdev %v", lvolName)
 }

160-162: Improve error handling in device closure

The error handling in the NVMe device closure could be more robust. Consider adding context and documenting the cleanup order.

Here's a suggested improvement:

+// CloseVolumeDev performs cleanup in the following order:
+// 1. Close the NVMe device file handle
+// 2. Stop the NVMe initiator
+// 3. Unexpose the lvol bdev if needed
 r.log.Infof("Closing NVMe device %v", r.initiator.Endpoint)
+
+var errs []error
 if err := volDev.Close(); err != nil {
-    return errors.Wrapf(err, "failed to close NVMe device %v", r.initiator.Endpoint)
+    errs = append(errs, errors.Wrapf(err, "failed to close NVMe device %v", r.initiator.Endpoint))
 }

Line range hint 165-167: Document NVMe initiator stop parameters

The initiator.Stop call uses hardcoded boolean parameters without clear documentation of their purpose.

Go has no named arguments, so consider annotating each positional argument with a comment:

-if _, err := r.initiator.Stop(true, true, false); err != nil {
+// Stop the initiator with:
+// - waitForCompletion: true to ensure graceful shutdown
+// - removeDevice: true to clean up device nodes
+// - forceRemove: false to avoid forced cleanup
+if _, err := r.initiator.Stop(
+    true,  /* waitForCompletion */
+    true,  /* removeDevice */
+    false, /* forceRemove */
+); err != nil {
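
Another common Go way to make boolean flags self-describing is an options struct. The sketch below is purely illustrative: `stopOptions` and `recordingInitiator` are hypothetical, the field names are taken from the labels in this comment, and none of this is the actual go-spdk-helper API.

```go
package main

// stopOptions names the three flags. The field names are assumptions
// copied from the review comment, not from the real initiator API.
type stopOptions struct {
	WaitForCompletion bool
	RemoveDevice      bool
	ForceRemove       bool
}

// recordingInitiator is a hypothetical stand-in that only records
// the options it was stopped with.
type recordingInitiator struct {
	lastStop stopOptions
}

// Stop keeps the (bool, error) return shape used at the call site above.
func (i *recordingInitiator) Stop(opts stopOptions) (bool, error) {
	i.lastStop = opts
	return true, nil
}
```

A call site then reads `initiator.Stop(stopOptions{WaitForCompletion: true, RemoveDevice: true})`, with omitted fields defaulting to false.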
pkg/spdk/engine_test.go (1)

99-100: Improve test iteration and logging.

Replace index-based iteration with value-based iteration for better readability and maintainability.

-	for testName, testCase := range testCases {
-		c.Logf("testing checkInitiatorAndTargetCreationRequirements.%v", testName)
+	for _, tc := range testCases {
+		c.Logf("testing %s", tc.name)
pkg/spdk/disk.go (4)

Line range hint 89-91: Consider improving error message clarity

When diskUUID is not provided, the warning message could be more descriptive about the implications and potential risks.

-    log.Warn("Disk UUID is not provided, trying to get lvstore with disk name")
+    log.Warn("Disk UUID not provided: falling back to disk name for lvstore lookup. This may be less reliable for disk identification.")

Also applies to: 96-97


Line range hint 279-282: Consider adding retry mechanism for disk ID retrieval

During live upgrades, disk device numbers might temporarily change. Consider implementing a retry mechanism with backoff.

 func getDiskID(filename string) (string, error) {
+    maxRetries := 3
+    var lastErr error
+    for i := 0; i < maxRetries; i++ {
         executor, err := spdkutil.NewExecutor(commontypes.ProcDirectory)
         if err != nil {
-            return "", err
+            lastErr = err
+            continue
         }
 
         dev, err := spdkutil.DetectDevice(filename, executor)
         if err != nil {
-            return "", errors.Wrapf(err, "failed to detect disk device %v", filename)
+            lastErr = errors.Wrapf(err, "failed to detect disk device %v", filename)
+            continue
         }
 
         return fmt.Sprintf("%d-%d", dev.Major, dev.Minor), nil
+    }
+    return "", errors.Wrap(lastErr, "failed to get disk ID after retries")
 }
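
The loop above retries immediately, without the backoff mentioned in the comment. A minimal sketch of a retry helper that does wait between attempts, doubling the delay each time, might look as follows; `retryWithBackoff` is a hypothetical name and the callback stands in for the device detection.

```go
package main

import (
	"fmt"
	"time"
)

// retryWithBackoff calls fn up to attempts times. After each failure
// except the last it sleeps, doubling the delay for the next round.
func retryWithBackoff(attempts int, initialDelay time.Duration, fn func() (string, error)) (string, error) {
	if attempts < 1 {
		return "", fmt.Errorf("attempts must be at least 1, got %d", attempts)
	}

	var lastErr error
	delay := initialDelay
	for i := 0; i < attempts; i++ {
		result, err := fn()
		if err == nil {
			return result, nil
		}
		lastErr = err
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return "", fmt.Errorf("failed after %d attempts: %w", attempts, lastErr)
}
```

Wrapping the last error with `%w` keeps it available to `errors.Is` and `errors.As` in the caller.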

Line range hint 366-368: Critical: Validate lvstore state during creation

The error handling for lvstore creation should be more robust, especially during live upgrades where partial states might exist.

 if diskUUID == "" {
         log.Infof("Creating a new lvstore %v", lvstoreName)
-        return spdkClient.BdevLvolCreateLvstore(bdev.Name, lvstoreName, defaultClusterSize)
+        uuid, err := spdkClient.BdevLvolCreateLvstore(bdev.Name, lvstoreName, defaultClusterSize)
+        if err != nil {
+            return "", errors.Wrapf(err, "failed to create lvstore %v", lvstoreName)
+        }
+        // Verify the lvstore was created successfully
+        if _, err := spdkClient.BdevLvolGetLvstore(lvstoreName, uuid); err != nil {
+            return "", errors.Wrapf(err, "failed to verify newly created lvstore %v", lvstoreName)
+        }
+        return uuid, nil
 }

Line range hint 372-401: Consider caching disk information for performance

The lvstoreToDisk function makes multiple RPC calls which could impact performance during live upgrades with multiple concurrent operations.

Consider implementing a simple cache with TTL for disk information to reduce RPC overhead during high-concurrency scenarios. This would be particularly beneficial during live upgrades where multiple components might request disk information simultaneously.
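
As a rough idea of what such a cache could look like, here is a minimal sketch. It assumes disk information can be represented as a string keyed by disk name; `ttlCache`, `newTTLCache`, and the `load` callback (which stands in for the RPC round trip) are all hypothetical.

```go
package main

import (
	"sync"
	"time"
)

// cacheEntry holds a cached value and the time at which it expires.
type cacheEntry struct {
	value   string
	expires time.Time
}

// ttlCache is a small mutex-protected cache with a fixed time to live.
type ttlCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]cacheEntry
}

func newTTLCache(ttl time.Duration) *ttlCache {
	return &ttlCache{ttl: ttl, entries: make(map[string]cacheEntry)}
}

// get returns the cached value for key if it has not expired.
// Otherwise it calls load, stores the result, and returns it.
func (c *ttlCache) get(key string, load func() (string, error)) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	now := time.Now()
	if entry, ok := c.entries[key]; ok && now.Before(entry.expires) {
		return entry.value, nil
	}

	value, err := load()
	if err != nil {
		return "", err
	}
	c.entries[key] = cacheEntry{value: value, expires: now.Add(c.ttl)}
	return value, nil
}
```

Holding the lock while `load` runs serializes lookups, which is simple but means a slow RPC blocks other callers; failed loads are not cached, so the next call retries.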

pkg/spdk_test.go (1)

Line range hint 1-1354: Consider adding upgrade-specific test cases

While the parameter removal is consistent, there appears to be no explicit test coverage for the new live upgrade functionality that this PR aims to support.

Would you like me to help create test cases that specifically verify:

  1. Live upgrade behavior
  2. Upgrade state transitions
  3. Error handling during upgrades
pkg/spdk/engine.go (2)

Line range hint 1310-1315: Improve error handling in replica operations

The error handling for replica operations should be more robust and provide better cleanup.

+    defer func() {
+        if err != nil {
+            e.log.WithError(err).Error("Failed to handle replica operation")
+            if errCleanup := e.cleanupReplicaOperation(replicaName); errCleanup != nil {
+                e.log.WithError(errCleanup).Error("Failed to cleanup after replica operation failure")
+            }
+        }
+    }()

Line range hint 2149-2250: Add test coverage for target switchover functionality

The target switchover functionality lacks comprehensive test coverage. Consider adding tests for:

  • Successful switchover
  • Failed switchover with rollback
  • Edge cases in port allocation
  • Error conditions in target connection/disconnection

Would you like me to help create these test cases?

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 5b164d0 and 8fd6c25.

⛔ Files ignored due to path filters (8)
  • go.mod is excluded by !go.mod
  • go.sum is excluded by !**/*.sum, !go.sum
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/initiator.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/nvme.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/nvmecli.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/types/types.go is excluded by !vendor/**
  • vendor/github.com/longhorn/types/pkg/generated/spdkrpc/spdk.pb.go is excluded by !**/*.pb.go, !**/generated/**, !vendor/**
  • vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (15)
  • pkg/api/types.go (2 hunks)
  • pkg/client/client.go (2 hunks)
  • pkg/client/types.go (1 hunks)
  • pkg/spdk/backup.go (3 hunks)
  • pkg/spdk/disk.go (1 hunks)
  • pkg/spdk/disk/nvme/nvme.go (2 hunks)
  • pkg/spdk/engine.go (28 hunks)
  • pkg/spdk/engine_test.go (1 hunks)
  • pkg/spdk/replica.go (3 hunks)
  • pkg/spdk/restore.go (2 hunks)
  • pkg/spdk/server.go (3 hunks)
  • pkg/spdk/types.go (2 hunks)
  • pkg/spdk/util.go (3 hunks)
  • pkg/spdk_test.go (6 hunks)
  • pkg/util/block.go (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • pkg/client/types.go
  • pkg/spdk/backup.go
  • pkg/util/block.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/api/types.go
  • pkg/client/client.go
  • pkg/spdk/server.go
🔇 Additional comments (21)
pkg/spdk/disk/nvme/nvme.go (1)

58-58: Consider enabling multipath for live upgrade reliability.

The current implementation disables multipath support by default. For live upgrade scenarios, enabling multipath could provide better reliability and seamless transitions. Please verify if this is intentional and consider:

  1. Making multipath configurable based on the upgrade context
  2. Adding documentation explaining the implications of the multipath setting during live upgrades
pkg/spdk/util.go (1)

92-92: Verify the replicaReconnectDelaySec constant

The code uses replicaReconnectDelaySec constant but it's not visible in the provided code. Please ensure this constant is properly defined and accessible.

✅ Verification successful

Constant replicaReconnectDelaySec is properly defined

The constant replicaReconnectDelaySec is correctly defined in pkg/spdk/types.go with a value of 2 seconds, alongside other related RAID base bdev timeout constants. The usage in connectNVMfBdev is valid.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for replicaReconnectDelaySec definition
rg -l 'replicaReconnectDelaySec.*=.*' 

Length of output: 57


Script:

#!/bin/bash
# Let's check the actual definition of replicaReconnectDelaySec in types.go
rg 'replicaReconnectDelaySec' pkg/spdk/types.go -A 2 -B 2

Length of output: 234

pkg/spdk/restore.go (2)

13-14: LGTM: Clean addition of NVMe initiator support

The addition of the nvme package import and initiator field to the Restore struct is well-structured and aligns with the PR's objective of supporting live upgrades.

Also applies to: 41-41


Line range hint 124-143: Verify upgrade process with NVMe device handling

The NVMe device handling changes look good, but we should verify the behavior during live upgrades.

Let's check for any existing upgrade-related tests:

Also applies to: 160-167

✅ Verification successful

NVMe device handling during restore is properly implemented and verified

The code shows a robust implementation of NVMe device handling during restore operations with proper state management:

  • The restore process correctly handles device exposure with IsExposed state tracking
  • Proper cleanup is implemented with StopExposeBdev when needed
  • Error handling and rollback mechanisms are in place for NVMe device operations
  • The restore implementation includes status tracking and progress monitoring
  • The code supports both full and incremental restore operations with proper device management
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for upgrade-related tests
rg -t go "TestUpgrade|upgrade.*Test" 

# Look for upgrade-related configuration
rg -t go "upgrade.*Config|UpgradeConfig"

Length of output: 81


Script:

#!/bin/bash
# Let's search for NVMe-related tests and functionality
rg -t go "func.*Test.*NVMe" -A 5

# Search for live migration or upgrade-related code
rg -t go "(Live(Migration|Upgrade)|Migration|Upgrade)" -A 3

# Look for NVMe initiator usage
ast-grep --pattern 'nvme.NewInitiator($$$)'

# Check for any restore-related tests
rg -t go "func.*Test.*Restore" -A 5

Length of output: 5808


Script:

#!/bin/bash
# Let's look for the restore implementation and its usage
rg -t go "func.*Restore" -B 2 -A 10

# Search for NVMe device handling in the codebase
rg -t go "IsExposed|exposeSnapshotLvolBdev" -B 2 -A 5

# Look for any error handling or recovery scenarios
ast-grep --pattern 'func ($$$) {
  $$$
  if err != nil {
    $$$
    return $$$, errors.Wrapf(err, "failed to $$$")
  }
  $$$
}'

Length of output: 100384

pkg/spdk/types.go (2)

10-10: LGTM! Import statement is correctly placed.

The re-addition of the spdkrpc import is necessary for the ServiceLvolToProtoLvol function's type conversions.


35-43: Verify timeout values for high-availability scenarios.

The timeout constants for RAID base bdev operations are critical for ensuring reliable replica management during live upgrades. Let's verify these values:

  • replicaCtrlrLossTimeoutSec = 15: Time to wait before failing I/O after losing connection
  • replicaReconnectDelaySec = 2: Delay between reconnection attempts
  • replicaFastIOFailTimeoutSec = 10: Time before failing I/O operations quickly
  • replicaTransportAckTimeout = 14: Transport layer acknowledgment timeout
  • replicaKeepAliveTimeoutMs = 10000: Keep-alive timeout for connection health

Let's search for any existing timeout configurations or related discussions:

Consider adding inline documentation explaining:

  1. The rationale behind these specific timeout values
  2. How these values affect failover behavior during live upgrades
  3. Any potential impact on system availability
✅ Verification successful

The timeout values are consistent with the vendor defaults and well-documented

The timeout values in pkg/spdk/types.go exactly match the default values from the vendor package go-spdk-helper, which includes detailed documentation about their purpose:

  • replicaCtrlrLossTimeoutSec = 15: Matches vendor's DefaultCtrlrLossTimeoutSec
  • replicaReconnectDelaySec = 2: Matches vendor's DefaultReconnectDelaySec with a constraint that it can't exceed FastIOFailTimeout
  • replicaFastIOFailTimeoutSec = 10: Matches vendor's DefaultFastIOFailTimeoutSec
  • replicaTransportAckTimeout = 14: Matches vendor's documented value, which translates to 16.384 seconds (2^14 ms)
  • replicaKeepAliveTimeoutMs = 10000: Matches vendor's DefaultKeepAliveTimeoutMs

These values are properly tuned for TCP transport error detection and connection management during failover scenarios.
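
The relationships described above can be expressed as a small check. This sketch assumes, as the comment states, that the transport ACK timeout is an exponent interpreted as 2^n milliseconds and that the reconnect delay may not exceed the fast I/O fail timeout. The constant values are copied from the list above; `transportAckDuration` and `validateReconnectDelay` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"time"
)

// Values as listed in the review comment for pkg/spdk/types.go.
const (
	replicaCtrlrLossTimeoutSec  = 15
	replicaReconnectDelaySec    = 2
	replicaFastIOFailTimeoutSec = 10
	replicaTransportAckTimeout  = 14
	replicaKeepAliveTimeoutMs   = 10000
)

// transportAckDuration converts the exponent form into a duration,
// assuming the value means 2^exp milliseconds.
func transportAckDuration(exp uint) time.Duration {
	return time.Duration(1<<exp) * time.Millisecond
}

// validateReconnectDelay enforces the constraint that the reconnect
// delay may not exceed the fast I/O fail timeout.
func validateReconnectDelay(reconnectDelaySec, fastIOFailTimeoutSec int) error {
	if reconnectDelaySec > fastIOFailTimeoutSec {
		return fmt.Errorf("reconnect delay %ds exceeds fast I/O fail timeout %ds",
			reconnectDelaySec, fastIOFailTimeoutSec)
	}
	return nil
}
```

With these definitions, `transportAckDuration(replicaTransportAckTimeout)` is 16.384 seconds, matching the figure quoted in the comment.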

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for timeout configurations in the codebase
echo "Searching for existing timeout configurations..."
rg -A 2 "CtrlrLossTimeout|ReconnectDelay|FastIOFailTimeout|TransportAckTimeout|KeepAliveTimeout" --type go

echo "Searching for timeout-related discussions in docs..."
rg -A 5 "timeout|reconnect|failover" docs/

Length of output: 7662

pkg/spdk/engine_test.go (4)

1-11: LGTM!

The package structure and imports are well-organized and include all necessary dependencies.


165-167: Improve test iteration and logging.

Replace index-based iteration with value-based iteration for better readability and maintainability.

-	for testName, testCase := range testCases {
-		c.Logf("testing isNewEngine.%v", testName)
+	for _, tc := range testCases {
+		c.Logf("testing %s", tc.name)

222-232: 🛠️ Refactor suggestion

Enhance test execution and add bitmap verification.

  1. Replace index-based iteration with value-based iteration
  2. Add verification that ports are actually released in the bitmap

-	for testName, testCase := range testCases {
-		c.Logf("testing releaseTargetAndStandbyTargetPorts.%v", testName)
+	for _, tc := range testCases {
+		c.Logf("testing %s", tc.name)
 
 		bitmap, err := commonbitmap.NewBitmap(0, 100000)
 		c.Assert(err, IsNil)
 
-		err = testCase.engine.releaseTargetAndStandbyTargetPorts(bitmap)
-		c.Assert(err, DeepEquals, testCase.expectedError)
-		c.Assert(testCase.engine.TargetPort, Equals, testCase.expectedTargetPort)
-		c.Assert(testCase.engine.StandbyTargetPort, Equals, testCase.expectedStandbyTargetPort)
+		// Setup initial bitmap state
+		if tc.engine.TargetPort > 0 {
+			bitmap.Set(uint64(tc.engine.TargetPort))
+		}
+		if tc.engine.StandbyTargetPort > 0 {
+			bitmap.Set(uint64(tc.engine.StandbyTargetPort))
+		}
+
+		err = tc.engine.releaseTargetAndStandbyTargetPorts(bitmap)
+		c.Assert(err, DeepEquals, tc.expectedError,
+			Commentf("Test case '%s': unexpected error result", tc.name))
+		c.Assert(tc.engine.TargetPort, Equals, tc.expectedTargetPort,
+			Commentf("Test case '%s': unexpected target port", tc.name))
+		c.Assert(tc.engine.StandbyTargetPort, Equals, tc.expectedStandbyTargetPort,
+			Commentf("Test case '%s': unexpected standby target port", tc.name))
+
+		// Verify ports are released in bitmap
+		if tc.expectedError == nil {
+			if tc.engine.TargetPort > 0 {
+				c.Assert(bitmap.Test(uint64(tc.engine.TargetPort)), Equals, false,
+					Commentf("Test case '%s': target port not released", tc.name))
+			}
+			if tc.engine.StandbyTargetPort > 0 {
+				c.Assert(bitmap.Test(uint64(tc.engine.StandbyTargetPort)), Equals, false,
+					Commentf("Test case '%s': standby target port not released", tc.name))
+			}
+		}

Likely invalid or redundant comment.


13-98: 🛠️ Refactor suggestion

Add test cases for critical live upgrade scenarios.

Given that this PR's primary objective is to support live upgrades, consider adding these essential test cases:

  1. Port validation
  2. Live upgrade state transitions

 	}{
+		{
+			name:                              "Invalid negative standby target port",
+			podIP:                             "192.168.1.1",
+			initiatorIP:                       "192.168.1.1",
+			targetIP:                          "192.168.1.1",
+			port:                              100,
+			targetPort:                        8000,
+			standbyTargetPort:                 -1,
+			expectedInitiatorCreationRequired: false,
+			expectedTargetCreationRequired:    false,
+			expectedError:                     fmt.Errorf("invalid standby target port: must be positive"),
+		},
+		{
+			name:                              "Port conflict between active and standby targets",
+			podIP:                             "192.168.1.1",
+			initiatorIP:                       "192.168.1.1",
+			targetIP:                          "192.168.1.1",
+			port:                              100,
+			targetPort:                        8000,
+			standbyTargetPort:                 8000,
+			expectedInitiatorCreationRequired: false,
+			expectedTargetCreationRequired:    false,
+			expectedError:                     fmt.Errorf("port conflict: standby target port must be different from active target port"),
+		},

Likely invalid or redundant comment.

pkg/spdk/disk.go (3)

9-10: LGTM: Import changes are well-organized and necessary

The new imports support the enhanced disk management functionality and are logically grouped.

Also applies to: 15-16, 17-17, 22-22


Line range hint 98-108: Verify error handling during live upgrades

The current error handling for lvstore lookup needs verification in the context of live upgrades, particularly when transitioning between versions.

✅ Verification successful

Error handling during live upgrades is properly implemented

The current error handling for lvstore lookup is consistent with the codebase's patterns and is safe during live upgrades because:

  1. The code properly handles the "no such device" error case using jsonrpc.IsJSONRPCRespErrorNoSuchDevice(err), which is a standard pattern used throughout the codebase for handling transient states.
  2. The implementation has a fallback mechanism that tries both UUID and name-based lookups, making it resilient during version transitions:
    • First attempts with UUID: lvstores, err = spdkClient.BdevLvolGetLvstore("", diskUUID)
    • Falls back to name: lvstores, err = spdkClient.BdevLvolGetLvstore(diskName, "")
  3. The error handling includes appropriate logging and wrapping of errors, maintaining observability during upgrades.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential race conditions during lvstore lookups
# Test: Search for related error handling patterns across the codebase

# Look for similar lvstore error handling patterns
rg -A 5 "BdevLvolGetLvstore.*jsonrpc\.IsJSONRPCRespErrorNoSuchDevice"

# Check for upgrade-related error handling
rg -A 5 "upgrade.*lvstore"

Length of output: 100


Script:

#!/bin/bash
# Let's broaden our search to understand the error handling patterns better

# Search for all BdevLvolGetLvstore usages
rg -A 10 "BdevLvolGetLvstore"

# Look for error handling patterns with IsJSONRPCRespErrorNoSuchDevice
rg -A 5 "IsJSONRPCRespErrorNoSuchDevice"

# Search for upgrade-related code in SPDK
fd -e go -e md . | grep -i spdk | xargs rg -A 5 "upgrade"

# Check for any lvstore-related error handling
rg -A 5 "lvstore.*err"

Length of output: 25931


Line range hint 1-401: Verify live upgrade compatibility end-to-end

While the individual components appear sound, we should verify the entire disk management flow during live upgrades.

✅ Verification successful

Disk operations are safe for live upgrades

The codebase implements robust error handling and cleanup mechanisms that make it safe during upgrades:

  • All disk operations (Create/Delete/Get) have proper deferred cleanup handlers
  • State transitions are properly managed with rollback capabilities
  • Driver-specific operations (virtio-blk, virtio-scsi, nvme, aio) include proper cleanup on failures
  • Disk detection and validation occurs before any state changes

The code includes:

  • Proper state validation before operations
  • Graceful handling of missing components
  • Cleanup of partially completed operations
  • Consistent error propagation
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Analyze potential upgrade paths and error scenarios
# Test: Look for upgrade-related patterns and potential race conditions

# Check for upgrade-related error handling patterns
rg -A 5 "upgrade.*error"

# Look for potential race conditions in disk operations
ast-grep --pattern 'func.*Disk.*\(.*\).*error.*{
  $$$
  if.*err.*!=.*nil.*{
    $$$
  }
  $$$
}'

# Check for proper cleanup handling
rg -A 5 "defer.*cleanup"

Length of output: 1421


Script:

#!/bin/bash
# Check disk operation error handling and state management
# Look for disk state transitions and cleanup
rg -A 10 "disk\.(Create|Delete|Get)" pkg/

# Check for transaction-like patterns in disk operations
ast-grep --pattern 'func.*Disk.*\(.*\).*error.*{
  $$$
  defer.*{
    $$$
  }
  $$$
}'

# Look for version or compatibility checks
rg -A 5 "(version|compatibility)" pkg/

Length of output: 871


Script:

#!/bin/bash
# Check for disk state management and error handling patterns
rg -A 10 "svcDisk(Create|Delete|Get)" pkg/

# Look for any disk-related error handling and recovery
rg -A 5 "failed to.*disk" pkg/

# Check for any disk driver registration or initialization
ast-grep --pattern 'func init() {
  $$$
  disk.Register$$$
  $$$
}'

Length of output: 19615

pkg/spdk_test.go (2)

530-530: Consistent parameter removal across test functions

The upgradeRequired parameter has been consistently removed from all EngineCreate calls in:

  • TestSPDKMultipleThreadSnapshotOpsAndRebuilding
  • revertSnapshot helper function
  • TestSPDKEngineOnlyWithTarget

Also applies to: 1228-1228, 1243-1243, 1354-1354


271-271: LGTM: Parameter removal aligns with upgrade changes

The removal of the upgradeRequired parameter from EngineCreate is consistent with the PR's objective to support live upgrades.

Let's verify this change is consistent across the codebase:

pkg/spdk/engine.go (6)

39-52: LGTM: Engine struct changes

The addition of StandbyTargetPort field is well-organized with other port-related fields and properly documented.


117-119: LGTM: Clean implementation of isNewEngine

The method provides a clear way to check if an engine is in its initial state by verifying all relevant fields are empty/zero.


418-420: 🛠️ Refactor suggestion

Add validation for standby target creation

The standby target creation check should include validation of the port value and current engine state.

+    if e.Port <= 0 {
+        return fmt.Errorf("invalid port %d for standby target creation", e.Port)
+    }
     standbyTargetCreationRequired := false
     if e.Port != 0 && e.TargetPort == 0 {
         standbyTargetCreationRequired = true
     }

Likely invalid or redundant comment.


2149-2188: 🛠️ Refactor suggestion

Improve error handling and cleanup in target switchover

The error handling in the target switchover could be improved:

  1. Consider using a transaction-like pattern for cleanup
  2. Add more detailed logging for each step
  3. Ensure proper cleanup on partial failures
+    type switchoverState struct {
+        oldTargetDisconnected bool
+        newTargetConnected    bool
+        deviceReloaded       bool
+    }
+    state := &switchoverState{}
+
+    defer func() {
+        if err != nil && state.oldTargetDisconnected && !state.newTargetConnected {
+            if errRollback := e.rollbackSwitchover(currentTargetAddress); errRollback != nil {
+                e.log.WithError(errRollback).Error("Failed to rollback target switchover")
+            }
+        }
+    }()

Likely invalid or redundant comment.


2383-2400: 🛠️ Refactor suggestion

Improve cleanup handling in DeleteTarget

The deletion process should attempt to clean up all resources even if some operations fail. Consider implementing a cleanup tracker to ensure thorough cleanup.

+    type cleanupState struct {
+        exposeBdevStopped bool
+        portsReleased     bool
+        raidDeleted       bool
+        replicasDisconnected bool
+    }
+    state := &cleanupState{}
+
+    defer func() {
+        if err != nil {
+            e.log.WithError(err).Errorf("Failed to delete target, cleanup state: %+v", state)
+        }
+    }()

Likely invalid or redundant comment.


2420-2440: 🛠️ Refactor suggestion

Add validation for port release operations

The port release logic should validate port numbers before attempting to release them.

+    if superiorPortAllocator == nil {
+        return fmt.Errorf("invalid nil port allocator")
+    }
+
+    // Validate port numbers
+    if releaseTargetPortRequired && (e.TargetPort < 1 || e.TargetPort > 65535) {
+        return fmt.Errorf("invalid target port number: %d", e.TargetPort)
+    }
+    if releaseStandbyTargetPortRequired && (e.StandbyTargetPort < 1 || e.StandbyTargetPort > 65535) {
+        return fmt.Errorf("invalid standby target port number: %d", e.StandbyTargetPort)
+    }

Likely invalid or redundant comment.

Comment on lines 1330 to 1331
r.rebuildingSrcCache.dstRebuildingBdevName, err = connectNVMfBdev(spdkClient, dstRebuildingLvolName, dstRebuildingLvolAddress,
replicaCtrlrLossTimeoutSec, replicaFastIOFailTimeoutSec)

⚠️ Potential issue

Undefined variables replicaCtrlrLossTimeoutSec and replicaFastIOFailTimeoutSec.

The variables replicaCtrlrLossTimeoutSec and replicaFastIOFailTimeoutSec are not defined within the scope of the file, leading to a compilation error.

Please define these variables or ensure they are properly imported or accessible in this context.

Comment on lines 1438 to 1439
externalSnapshotBdevName, err := connectNVMfBdev(spdkClient, externalSnapshotLvolName, externalSnapshotAddress,
replicaCtrlrLossTimeoutSec, replicaFastIOFailTimeoutSec)

⚠️ Potential issue

Undefined variables replicaCtrlrLossTimeoutSec and replicaFastIOFailTimeoutSec.

The variables replicaCtrlrLossTimeoutSec and replicaFastIOFailTimeoutSec are used but not defined, which will cause a compilation error.

Define these variables or pass the correct parameters to the connectNVMfBdev function.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (5)
pkg/spdk/engine_test.go (2)

99-119: Improve test execution and add cleanup.

The test execution can be enhanced for better maintainability and reliability:

  1. Use range over test cases by value instead of index
  2. Add cleanup after each test
-	for testName, testCase := range testCases {
-		c.Logf("testing checkInitiatorAndTargetCreationRequirements.%v", testName)
+	for _, tc := range testCases {
+		c.Logf("testing %s", tc.name)
+		
+		// Setup
 		engine := &Engine{
-			Port:              testCase.port,
-			TargetPort:        testCase.targetPort,
-			StandbyTargetPort: testCase.standbyTargetPort,
+			Port:              tc.port,
+			TargetPort:        tc.targetPort,
+			StandbyTargetPort: tc.standbyTargetPort,
 			Name:              "test-engine",
 			log:               logrus.New(),
 		}
+		
+		// Cleanup
+		defer func() {
+			if engine != nil {
+				engine.Close()
+			}
+		}()

165-169: Improve test execution.

Replace index-based iteration with value-based iteration for better maintainability.

-	for testName, testCase := range testCases {
-		c.Logf("testing isNewEngine.%v", testName)
-		result := testCase.engine.isNewEngine()
-		c.Assert(result, Equals, testCase.expected, Commentf("Test case '%s': unexpected result", testCase.name))
+	for _, tc := range testCases {
+		c.Logf("testing %s", tc.name)
+		result := tc.engine.isNewEngine()
+		c.Assert(result, Equals, tc.expected, Commentf("Test case '%s': unexpected result", tc.name))
pkg/spdk_test.go (1)

1354-1354: Consider enhancing test coverage for live upgrade scenarios.

While the test verifies target-only configuration, consider adding explicit test cases to verify:

  1. Live upgrade behavior when only target is present
  2. State transitions during upgrade process
  3. Error handling during upgrade operations

This would strengthen test coverage for the new live upgrade feature.

Would you like me to help draft additional test cases for these scenarios?

pkg/spdk/engine.go (2)

39-52: Add documentation for the StandbyTargetPort field.

The new StandbyTargetPort field would benefit from documentation explaining its purpose, valid values, and relationship with TargetPort. This helps maintainers understand when and how this field is used during live upgrades.

 	TargetPort        int32 // Port of the target that is used for letting initiator connect to
-	StandbyTargetPort int32
+	StandbyTargetPort int32 // Port of the standby target used during live upgrades

121-151: Simplify complex branching logic.

The function has nested conditions that could be simplified for better readability and maintainability. Consider extracting the logic into smaller, focused functions.

 func (e *Engine) checkInitiatorAndTargetCreationRequirements(podIP, initiatorIP, targetIP string) (bool, bool, error) {
+    if podIP != initiatorIP && podIP != targetIP {
+        return false, false, fmt.Errorf("invalid initiator and target addresses for engine %s creation with initiator address %v and target address %v", 
+            e.Name, initiatorIP, targetIP)
+    }
+
+    // When initiator and target live on different nodes, the addresses alone decide.
+    initiatorCreationRequired := podIP == initiatorIP && podIP != targetIP
+    targetCreationRequired := podIP == targetIP && podIP != initiatorIP
+
+    if podIP == initiatorIP && podIP == targetIP {
+        if e.Port == 0 && e.TargetPort == 0 {
+            e.log.Info("Creating both initiator and target instances")
+            initiatorCreationRequired = true
+            targetCreationRequired = true
+        } else if e.Port != 0 && e.TargetPort == 0 && e.StandbyTargetPort == 0 {
+            e.log.Info("Creating a target instance")
+            targetCreationRequired = true
+        } else {
+            e.log.Infof("Instances already exist (port: %v, target: %v), skipping creation",
+                e.Port, e.TargetPort)
+        }
+    }
+
+    return initiatorCreationRequired, targetCreationRequired, nil
 }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8fd6c25 and 5236c00.

⛔ Files ignored due to path filters (4)
  • go.mod is excluded by !go.mod
  • go.sum is excluded by !**/*.sum, !go.sum
  • vendor/github.com/longhorn/types/pkg/generated/spdkrpc/spdk.pb.go is excluded by !**/*.pb.go, !**/generated/**, !vendor/**
  • vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (6)
  • pkg/api/types.go (2 hunks)
  • pkg/client/client.go (2 hunks)
  • pkg/spdk/engine.go (28 hunks)
  • pkg/spdk/engine_test.go (1 hunks)
  • pkg/spdk/server.go (3 hunks)
  • pkg/spdk_test.go (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/api/types.go
  • pkg/client/client.go
  • pkg/spdk/server.go
🔇 Additional comments (9)
pkg/spdk/engine_test.go (3)

1-12: LGTM!

The package structure and imports are well-organized and follow Go conventions.


222-232: 🛠️ Refactor suggestion

Enhance test execution and add bitmap verification.

  1. Replace index-based iteration with value-based iteration
  2. Add verification that ports are actually released in the bitmap
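One way to verify that ports are actually released is to assert on the allocator's state after the call. The sketch below is an illustration only: `portSet` is a hypothetical toy allocator standing in for the real bitmap, and `engine` and `releasePorts` are trimmed-down stand-ins. Only the release conditions (target port set; standby port set and different from the target port) are taken from the code under review.

```go
package main

import "fmt"

// portSet is a hypothetical toy allocator standing in for the real port bitmap.
type portSet map[int32]bool

// release frees a port and fails if the port was not allocated,
// which makes an accidental double release visible.
func (p portSet) release(port int32) error {
	if !p[port] {
		return fmt.Errorf("port %d is not allocated", port)
	}
	delete(p, port)
	return nil
}

// engine keeps only the two port fields relevant to the release logic.
type engine struct {
	TargetPort        int32
	StandbyTargetPort int32
}

// releasePorts releases the target port when set, and the standby port only
// when it is set and differs from the target port, then clears both fields.
func (e *engine) releasePorts(p portSet) error {
	releaseTarget := e.TargetPort != 0
	releaseStandby := e.StandbyTargetPort != 0 && e.StandbyTargetPort != e.TargetPort

	if releaseTarget {
		if err := p.release(e.TargetPort); err != nil {
			return err
		}
	}
	if releaseStandby {
		if err := p.release(e.StandbyTargetPort); err != nil {
			return err
		}
	}
	e.TargetPort, e.StandbyTargetPort = 0, 0
	return nil
}

func main() {
	ports := portSet{2000: true, 2005: true}
	e := &engine{TargetPort: 2000, StandbyTargetPort: 2005}
	err := e.releasePorts(ports)
	fmt.Println(len(ports), e.TargetPort, e.StandbyTargetPort, err)
}
```

A test built this way checks both the engine fields and the allocator contents, so a case where the fields are zeroed but the ports stay allocated would fail.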

13-98: 🛠️ Refactor suggestion

Add test cases for critical live upgrade scenarios.

Given that this PR's primary objective is to support live upgrades, consider adding these essential test cases:

  1. Live upgrade scenarios:

    • Transition from active to standby target
    • Concurrent active and standby targets
    • Failed upgrade scenarios
  2. Port validation:

    • Invalid port ranges (negative, zero, out of range)
    • Port conflicts between active and standby targets
 	}{
+		{
+			name:                              "Port conflict between active and standby targets",
+			podIP:                             "192.168.1.1",
+			initiatorIP:                       "192.168.1.1",
+			targetIP:                          "192.168.1.1",
+			port:                              100,
+			targetPort:                        8000,
+			standbyTargetPort:                 8000,
+			expectedInitiatorCreationRequired: false,
+			expectedTargetCreationRequired:    false,
+			expectedError:                     fmt.Errorf("port conflict: standby target port must be different from active target port"),
+		},
+		{
+			name:                              "Live upgrade transition",
+			podIP:                             "192.168.1.1",
+			initiatorIP:                       "192.168.1.1",
+			targetIP:                          "192.168.1.1",
+			port:                              100,
+			targetPort:                        8000,
+			standbyTargetPort:                 8001,
+			expectedInitiatorCreationRequired: false,
+			expectedTargetCreationRequired:    true,
+			expectedError:                     nil,
+		},

Likely invalid or redundant comment.

pkg/spdk_test.go (3)

271-271: LGTM! Test coverage maintained after parameter removal.

The modification to remove the upgradeRequired parameter aligns with the PR's objective of supporting live upgrades while maintaining test coverage for core engine functionality.


530-530: LGTM! Snapshot and rebuilding test coverage preserved.

The parameter removal maintains the test's comprehensive coverage of snapshot operations and rebuilding scenarios.


1228-1228: LGTM! Snapshot reversion handling remains robust.

The modifications preserve the helper function's ability to manage frontend state transitions during snapshot reversion operations.

Also applies to: 1243-1243

pkg/spdk/engine.go (3)

117-119: LGTM!

The isNewEngine() helper function is well-implemented and correctly checks all relevant fields.


Line range hint 2149-2255: Consider potential race conditions in rollback logic.

The rollback logic in the deferred function could face race conditions if multiple goroutines attempt to switch targets simultaneously. Consider:

  1. Adding synchronization for the rollback operations
  2. Documenting the concurrency assumptions
  3. Adding logging for rollback operations
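A minimal sketch of the first point, assuming a single mutex guards the whole switchover. The `target` type, `switchOver`, `current`, `errRefused`, and the connect helpers are hypothetical and only model the locking and rollback order; they are not the engine's actual API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errRefused is a hypothetical connection failure used for the demonstration.
var errRefused = errors.New("connection refused")

// target is a hypothetical, minimal model of an engine's active target address.
type target struct {
	mu      sync.Mutex
	address string
}

// switchOver holds the lock for the whole operation, so the deferred rollback
// cannot interleave with another goroutine's switchover.
func (t *target) switchOver(newAddress string, connect func(string) error) (err error) {
	t.mu.Lock()
	defer t.mu.Unlock()

	previous := t.address
	defer func() {
		// Deferred calls run in reverse order, so this runs before Unlock.
		if err != nil {
			t.address = previous
		}
	}()

	t.address = newAddress
	if err = connect(newAddress); err != nil {
		return fmt.Errorf("failed to connect to %s: %w", newAddress, err)
	}
	return nil
}

// current returns the active address under the same lock.
func (t *target) current() string {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.address
}

func failingConnect(string) error { return errRefused }
func okConnect(string) error      { return nil }

func main() {
	t := &target{address: "10.0.0.1:2000"}
	err := t.switchOver("10.0.0.2:2000", failingConnect)
	fmt.Println(t.current(), err)
}
```

If the engine already serializes these calls through an outer lock, documenting that assumption may be enough and no extra mutex is needed.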

2420-2440: 🛠️ Refactor suggestion

Add port validation before release.

The port release logic should validate port numbers before attempting to release them to prevent potential issues with invalid port values.

 func (e *Engine) releaseTargetAndStandbyTargetPorts(superiorPortAllocator *commonbitmap.Bitmap) error {
+    if superiorPortAllocator == nil {
+        return fmt.Errorf("invalid nil port allocator")
+    }
+
     releaseTargetPortRequired := e.TargetPort != 0
     releaseStandbyTargetPortRequired := e.StandbyTargetPort != 0 && e.StandbyTargetPort != e.TargetPort
 
+    // Validate port numbers
+    if releaseTargetPortRequired && (e.TargetPort < 1 || e.TargetPort > 65535) {
+        return fmt.Errorf("invalid target port number: %d", e.TargetPort)
+    }
+    if releaseStandbyTargetPortRequired && (e.StandbyTargetPort < 1 || e.StandbyTargetPort > 65535) {
+        return fmt.Errorf("invalid standby target port number: %d", e.StandbyTargetPort)
+    }

Likely invalid or redundant comment.

Comment on lines +172 to +220
func (s *TestSuite) TestReleaseTargetAndStandbyTargetPorts(c *C) {
	testCases := []struct {
		name                      string
		engine                    *Engine
		expectedTargetPort        int32
		expectedStandbyTargetPort int32
		expectedError             error
	}{
		{
			name: "Release both target and standby target ports",
			engine: &Engine{
				TargetPort:        2000,
				StandbyTargetPort: 2005,
			},
			expectedTargetPort:        0,
			expectedStandbyTargetPort: 0,
			expectedError:             nil,
		},
		{
			name: "Release target port only but standby target port is not set",
			engine: &Engine{
				TargetPort:        2000,
				StandbyTargetPort: 0,
			},
			expectedTargetPort:        0,
			expectedStandbyTargetPort: 0,
			expectedError:             nil,
		},
		{
			name: "Release target and standby ports when they are the same",
			engine: &Engine{
				TargetPort:        2000,
				StandbyTargetPort: 2000,
			},
			expectedTargetPort:        0,
			expectedStandbyTargetPort: 0,
			expectedError:             nil,
		},
		{
			name: "Release standby target port only",
			engine: &Engine{
				TargetPort:        0,
				StandbyTargetPort: 2000,
			},
			expectedTargetPort:        0,
			expectedStandbyTargetPort: 0,
			expectedError:             nil,
		},
	}

🛠️ Refactor suggestion

Add test cases for port validation.

Consider adding these test cases to improve validation coverage:

  1. Invalid port values (negative ports)
  2. Port values exceeding valid ranges
  3. Error scenarios
 	}{
+		{
+			name: "Invalid negative ports",
+			engine: &Engine{
+				TargetPort:        -1,
+				StandbyTargetPort: -2,
+			},
+			expectedTargetPort:        -1,
+			expectedStandbyTargetPort: -2,
+			expectedError:             fmt.Errorf("invalid port values"),
+		},
+		{
+			name: "Port exceeding valid range",
+			engine: &Engine{
+				TargetPort:        100001,
+				StandbyTargetPort: 0,
+			},
+			expectedTargetPort:        100001,
+			expectedStandbyTargetPort: 0,
+			expectedError:             fmt.Errorf("port value exceeds valid range"),
+		},

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +2383 to 2415
+// DeleteTarget deletes the target instance
 func (e *Engine) DeleteTarget(spdkClient *spdkclient.Client, superiorPortAllocator *commonbitmap.Bitmap) (err error) {
-    e.log.Infof("Deleting target")
+    e.log.Infof("Deleting target with target port %d and standby target port %d", e.TargetPort, e.StandbyTargetPort)
 
-    if err := spdkClient.StopExposeBdev(e.Nqn); err != nil {
-        return errors.Wrapf(err, "failed to stop expose bdev after engine %s target switchover", e.Name)
+    err = spdkClient.StopExposeBdev(e.Nqn)
+    if err != nil {
+        return errors.Wrapf(err, "failed to stop expose bdev while deleting target instance for engine %s", e.Name)
     }
 
-    if e.TargetPort != 0 {
-        if err := superiorPortAllocator.ReleaseRange(e.TargetPort, e.TargetPort); err != nil {
-            return err
-        }
-        e.TargetPort = 0
+    err = e.releaseTargetAndStandbyTargetPorts(superiorPortAllocator)
+    if err != nil {
+        return errors.Wrapf(err, "failed to release target and standby target ports while deleting target instance for engine %s", e.Name)
     }
 
-    e.log.Infof("Deleting raid bdev %s before target switchover", e.Name)
-    if _, err := spdkClient.BdevRaidDelete(e.Name); err != nil && !jsonrpc.IsJSONRPCRespErrorNoSuchDevice(err) {
-        return errors.Wrapf(err, "failed to delete raid bdev after engine %s target switchover", e.Name)
+    e.log.Infof("Deleting raid bdev %s while deleting target instance", e.Name)
+    _, err = spdkClient.BdevRaidDelete(e.Name)
+    if err != nil && !jsonrpc.IsJSONRPCRespErrorNoSuchDevice(err) {
+        return errors.Wrapf(err, "failed to delete raid bdev after engine %s while deleting target instance", e.Name)
     }
 
     for replicaName, replicaStatus := range e.ReplicaStatusMap {
-        e.log.Infof("Disconnecting replica %s after target switchover", replicaName)
-        if err := disconnectNVMfBdev(spdkClient, replicaStatus.BdevName); err != nil {
-            e.log.WithError(err).Warnf("Engine failed to disconnect replica %s after target switchover, will mark the replica mode from %v to ERR", replicaName, replicaStatus.Mode)
+        e.log.Infof("Disconnecting replica %s while deleting target instance", replicaName)
+        err = disconnectNVMfBdev(spdkClient, replicaStatus.BdevName)
+        if err != nil {
+            e.log.WithError(err).Warnf("Engine failed to disconnect replica %s while deleting target instance, will mark the replica mode from %v to ERR",
+                replicaName, replicaStatus.Mode)
             replicaStatus.Mode = types.ModeERR
         }
         replicaStatus.BdevName = ""
     }
     return nil
 }

🛠️ Refactor suggestion

Add error handling for replica disconnection.

The replica disconnection logic in DeleteTarget should handle errors more gracefully by:

  1. Attempting to disconnect all replicas even if some fail
  2. Collecting all errors and returning them together
  3. Ensuring proper cleanup of resources
 func (e *Engine) DeleteTarget(spdkClient *spdkclient.Client, superiorPortAllocator *commonbitmap.Bitmap) (err error) {
+    var errs []error
+
     e.log.Infof("Deleting target with target port %d and standby target port %d", e.TargetPort, e.StandbyTargetPort)
 
     err = spdkClient.StopExposeBdev(e.Nqn)
     if err != nil {
-        return errors.Wrapf(err, "failed to stop expose bdev while deleting target instance for engine %s", e.Name)
+        errs = append(errs, errors.Wrapf(err, "failed to stop expose bdev"))
     }
 
     err = e.releaseTargetAndStandbyTargetPorts(superiorPortAllocator)
     if err != nil {
-        return errors.Wrapf(err, "failed to release target and standby target ports while deleting target instance for engine %s", e.Name)
+        errs = append(errs, errors.Wrapf(err, "failed to release ports"))
     }
 
+    if len(errs) > 0 {
+        return fmt.Errorf("multiple errors during target deletion: %v", errs)
+    }

Committable suggestion skipped: line range outside the PR's diff.
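The "attempt every cleanup step, then report all failures" pattern can be sketched with the standard library alone. This is an assumption-laden illustration: it uses `errors.Join` (Go 1.20+) instead of `github.com/pkg/errors`, and `cleanupStep`, `runAll`, and `errBusy` are hypothetical names, not part of the engine.

```go
package main

import (
	"errors"
	"fmt"
)

// errBusy is a hypothetical failure used for the demonstration.
var errBusy = errors.New("device busy")

// cleanupStep is a hypothetical named cleanup action.
type cleanupStep struct {
	name string
	run  func() error
}

// runAll executes every step even if earlier ones fail and returns the
// joined errors, or nil when no step failed.
func runAll(steps []cleanupStep) error {
	var errs []error
	for _, s := range steps {
		if err := s.run(); err != nil {
			errs = append(errs, fmt.Errorf("failed to %s: %w", s.name, err))
		}
	}
	return errors.Join(errs...)
}

func main() {
	steps := []cleanupStep{
		{name: "stop expose bdev", run: func() error { return errBusy }},
		{name: "release ports", run: func() error { return nil }},
		{name: "delete raid bdev", run: func() error { return errBusy }},
	}
	fmt.Println(runAll(steps))
}
```

Naming the slice `errs` rather than `errors` matters here: a local variable called `errors` would shadow the package and break every later `errors.` call in the function.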

mergify bot commented Nov 29, 2024

This pull request is now in conflict. Could you fix it @derekbit? 🙏

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
pkg/spdk_test.go (1)

354-354: Consider adding test coverage for live upgrade scenarios

While the basic engine recreation is tested here, there's an opportunity to add specific test cases for live upgrade scenarios.

Consider adding test cases that verify:

  • Engine behavior during live upgrades
  • State preservation during upgrades
  • Rollback scenarios
pkg/spdk/engine.go (3)

112-142: Simplify nested conditions in creation requirements check.

The method has complex nested conditions that could be simplified. Consider extracting the port validation logic into a separate method for better readability.

 func (e *Engine) checkInitiatorAndTargetCreationRequirements(podIP, initiatorIP, targetIP string) (bool, bool, error) {
+    if err := e.validatePorts(); err != nil {
+        return false, false, err
+    }
     initiatorCreationRequired, targetCreationRequired := false, false
-    var err error
 
     if podIP == initiatorIP && podIP == targetIP {
-        if e.Port == 0 && e.TargetPort == 0 {
-            e.log.Info("Creating both initiator and target instances")
-            initiatorCreationRequired = true
-            targetCreationRequired = true
-        } else if e.Port != 0 && e.TargetPort == 0 {
-            e.log.Info("Creating a target instance")
-            if e.StandbyTargetPort != 0 {
-                e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)
-            } else {
-                targetCreationRequired = true
-            }
-        } else {
-            e.log.Infof("Initiator instance with port %v and target instance with port %v are already created, will skip the creation", e.Port, e.TargetPort)
-        }
+        return e.determineLocalCreationRequirements()
     } else if podIP == initiatorIP {
         e.log.Info("Creating an initiator instance")
         initiatorCreationRequired = true
     } else if podIP == targetIP {
         e.log.Info("Creating a target instance")
         targetCreationRequired = true
     } else {
-        err = fmt.Errorf("invalid initiator and target addresses for engine %s creation with initiator address %v and target address %v", e.Name, initiatorIP, targetIP)
+        return false, false, fmt.Errorf("invalid initiator and target addresses for engine %s creation with initiator address %v and target address %v", 
+            e.Name, initiatorIP, targetIP)
     }
 
-    return initiatorCreationRequired, targetCreationRequired, err
+    return initiatorCreationRequired, targetCreationRequired, nil
 }

+func (e *Engine) determineLocalCreationRequirements() (bool, bool, error) {
+    if e.Port == 0 && e.TargetPort == 0 {
+        e.log.Info("Creating both initiator and target instances")
+        return true, true, nil
+    }
+    if e.Port != 0 && e.TargetPort == 0 {
+        e.log.Info("Creating a target instance")
+        if e.StandbyTargetPort != 0 {
+            e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", 
+                e.StandbyTargetPort)
+            return false, false, nil
+        }
+        return false, true, nil
+    }
+    e.log.Infof("Initiator instance with port %v and target instance with port %v are already created, will skip the creation", 
+        e.Port, e.TargetPort)
+    return false, false, nil
+}

Line range hint 2139-2240: Add transaction-like handling for switchover operations.

The switchover operation involves multiple steps that should be handled in a more transactional way to ensure proper cleanup on failure.

 func (e *Engine) SwitchOverTarget(spdkClient *spdkclient.Client, newTargetAddress string) (err error) {
+    type switchoverState struct {
+        oldTargetDisconnected bool
+        newTargetConnected    bool
+        deviceReloaded       bool
+    }
+    state := &switchoverState{}
+
+    defer func() {
+        if err != nil && state.oldTargetDisconnected && !state.newTargetConnected {
+            if errRollback := e.rollbackSwitchover(); errRollback != nil {
+                e.log.WithError(errRollback).Error("Failed to rollback target switchover")
+            }
+        }
+    }()

     // ... rest of the implementation

2373-2405: Improve error handling in DeleteTarget.

The target deletion process should handle errors more gracefully by attempting to complete all cleanup operations even if some fail.

 func (e *Engine) DeleteTarget(spdkClient *spdkclient.Client, superiorPortAllocator *commonbitmap.Bitmap) (err error) {
+    var errs []error
+
     e.log.Infof("Deleting target with target port %d and standby target port %d", e.TargetPort, e.StandbyTargetPort)
 
     err = spdkClient.StopExposeBdev(e.Nqn)
     if err != nil {
-        return errors.Wrapf(err, "failed to stop expose bdev while deleting target instance for engine %s", e.Name)
+        errs = append(errs, errors.Wrapf(err, "failed to stop expose bdev"))
     }
 
     err = e.releaseTargetAndStandbyTargetPorts(superiorPortAllocator)
     if err != nil {
-        return errors.Wrapf(err, "failed to release target and standby target ports while deleting target instance for engine %s", e.Name)
+        errs = append(errs, errors.Wrapf(err, "failed to release ports"))
     }
 
+    if len(errs) > 0 {
+        return fmt.Errorf("multiple errors during target deletion: %v", errs)
+    }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 5236c00 and 5fa89fb.

⛔ Files ignored due to path filters (5)
  • go.mod is excluded by !go.mod
  • go.sum is excluded by !**/*.sum, !go.sum
  • vendor/github.com/longhorn/go-spdk-helper/pkg/types/types.go is excluded by !vendor/**
  • vendor/github.com/longhorn/types/pkg/generated/spdkrpc/spdk.pb.go is excluded by !**/*.pb.go, !**/generated/**, !vendor/**
  • vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (6)
  • pkg/api/types.go (2 hunks)
  • pkg/client/client.go (1 hunks)
  • pkg/spdk/engine.go (18 hunks)
  • pkg/spdk/engine_test.go (1 hunks)
  • pkg/spdk/server.go (1 hunks)
  • pkg/spdk_test.go (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/api/types.go
  • pkg/client/client.go
  • pkg/spdk/engine_test.go
  • pkg/spdk/server.go
🔇 Additional comments (8)
pkg/spdk_test.go (4)

271-271: LGTM: Engine creation parameter update aligns with live upgrade support

The removal of the upgradeRequired parameter from EngineCreate is consistent with the PR's objective of supporting live upgrades. The test case properly verifies the basic engine creation functionality.


530-530: LGTM: Snapshot operations properly tested with updated engine creation

The test case thoroughly verifies snapshot operations with the updated engine creation parameters, including complex scenarios with multiple snapshots and rebuilding operations.


1228-1228: LGTM: Frontend handling during snapshot revert is well tested

The test properly verifies engine recreation with different frontend configurations during snapshot revert operations, which is crucial for maintaining system stability during upgrades.

Also applies to: 1243-1243


1354-1354: LGTM: Target-only configuration properly tested

The test case verifies the engine creation with separate initiator and target IPs, which is important for distributed deployments and upgrade scenarios.

Let's verify the test coverage for target-related scenarios:

✅ Verification successful

Target IP configuration is properly tested with comprehensive assertions

The verification confirms that the test suite includes thorough validation of target-related scenarios:

  • Separate initiator and target IP configurations are tested ("127.0.0.1" vs ip)
  • Target IP and port validations are implemented with explicit assertions
  • Multiple test cases cover different target configurations including:
    • Same-node scenarios (where initiator IP equals target IP)
    • Distributed scenarios (where initiator and target IPs differ)
    • Target port availability checks
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check test coverage for target-related scenarios

# Look for target-related test cases
rg -l "TestSPDKEngine.*Target" --type go

# Check for target IP validation in tests
rg "TargetIP.*Assert" --type go

Length of output: 93


Script:

#!/bin/bash
# Let's examine the actual test implementations for target-related scenarios

# Check the content of the test file focusing on target IP configurations
rg -B 2 -A 2 "TargetIP|targetIP|target.*IP" pkg/spdk_test.go

# Look for engine creation test cases with IP configurations
rg -B 2 -A 2 "EngineCreate.*ip" pkg/spdk_test.go

# Search for any target-related validations or assertions
rg -B 2 -A 2 "Assert.*target|target.*Assert" pkg/spdk_test.go

Length of output: 2373

pkg/spdk/engine.go (4)

37-49: LGTM: Engine struct fields are well organized.

The new StandbyTargetPort field is appropriately placed with other port-related fields and follows the same type pattern.


108-110: LGTM: Clear and concise helper method.

The isNewEngine() method provides a clear way to check if an engine instance is newly created by verifying all connection-related fields are empty.
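A helper of this kind might look like the following minimal sketch. The review only says that the connection-related fields must be empty, so the exact field set used here (`IP`, `TargetIP`, `StandbyTargetPort`) is an assumption, and `engine` is a hypothetical trimmed-down stand-in for the real struct.

```go
package main

import "fmt"

// engine is a hypothetical, trimmed-down stand-in for the Engine struct.
type engine struct {
	IP                string
	TargetIP          string
	StandbyTargetPort int32
}

// isNewEngine reports whether none of the (assumed) connection-related fields is set yet.
func (e *engine) isNewEngine() bool {
	return e.IP == "" && e.TargetIP == "" && e.StandbyTargetPort == 0
}

func main() {
	fmt.Println((&engine{}).isNewEngine())
	fmt.Println((&engine{IP: "10.0.0.1"}).isNewEngine())
}
```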


482-485: 🛠️ Refactor suggestion

Add port range validation after allocation.

The port allocation should validate that the allocated port is within a valid range (1-65535).

     port, _, err = superiorPortAllocator.AllocateRange(portCount)
     if err != nil {
         return errors.Wrapf(err, "failed to allocate port for engine %v", e.Name)
     }
+    if port <= 0 || port > 65535 {
+        if err := superiorPortAllocator.ReleaseRange(port, port); err != nil {
+            e.log.WithError(err).Warnf("Failed to release invalid port %d", port)
+        }
+        return fmt.Errorf("allocated port %d is outside valid range (1-65535) for engine %v", port, e.Name)
+    }

Likely invalid or redundant comment.


2410-2430: 🛠️ Refactor suggestion

Add port range validation in releaseTargetAndStandbyTargetPorts.

The port release logic should validate port numbers before attempting to release them.

 func (e *Engine) releaseTargetAndStandbyTargetPorts(superiorPortAllocator *commonbitmap.Bitmap) error {
+    if superiorPortAllocator == nil {
+        return fmt.Errorf("invalid nil port allocator")
+    }
+
     releaseTargetPortRequired := e.TargetPort != 0
     releaseStandbyTargetPortRequired := e.StandbyTargetPort != 0 && e.StandbyTargetPort != e.TargetPort
 
+    // Validate port numbers
+    if releaseTargetPortRequired && (e.TargetPort < 1 || e.TargetPort > 65535) {
+        return fmt.Errorf("invalid target port number: %d", e.TargetPort)
+    }
+    if releaseStandbyTargetPortRequired && (e.StandbyTargetPort < 1 || e.StandbyTargetPort > 65535) {
+        return fmt.Errorf("invalid standby target port number: %d", e.StandbyTargetPort)
+    }

     // Rest of the implementation

Likely invalid or redundant comment.

Longhorn 9104

Signed-off-by: Derek Su <[email protected]>
… or resume engine

No need to mark engine as error after failing to suspend or resume engine.
Longhorn manager will retry the operation later.

Longhorn 9104

Signed-off-by: Derek Su <[email protected]>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (8)
pkg/spdk/engine_test.go (1)

13-119: Add test cases for edge cases and improve error messages.

The test coverage is good but could be enhanced by adding:

  1. Edge cases for port validation (negative ports, max port value)
  2. Error cases for invalid IP addresses
  3. Test case for concurrent active and standby target ports

Add these test cases:

 	}{
+		{
+			name:                              "Invalid negative port values",
+			podIP:                             "192.168.1.1",
+			initiatorIP:                       "192.168.1.1",
+			targetIP:                          "192.168.1.1",
+			port:                              -1,
+			targetPort:                        -2,
+			standbyTargetPort:                 -3,
+			expectedInitiatorCreationRequired: false,
+			expectedTargetCreationRequired:    false,
+			expectedError:                     fmt.Errorf("invalid port values"),
+		},
+		{
+			name:                              "Invalid IP address format",
+			podIP:                             "invalid",
+			initiatorIP:                       "192.168.1.1",
+			targetIP:                          "192.168.1.1",
+			port:                              0,
+			targetPort:                        0,
+			standbyTargetPort:                 0,
+			expectedInitiatorCreationRequired: false,
+			expectedTargetCreationRequired:    false,
+			expectedError:                     fmt.Errorf("invalid IP address format"),
+		},
pkg/spdk/engine.go (4)

39-51: Add documentation for the StandbyTargetPort field.

The new field StandbyTargetPort should be documented to explain its purpose and usage in the context of engine live upgrades.

     TargetPort        int32 // Port of the target that is used for letting initiator connect to
-    StandbyTargetPort int32
+    StandbyTargetPort int32 // Port used by the standby target during live upgrades

121-151: Simplify complex creation requirements logic.

The method contains complex nested conditions that could be simplified for better maintainability. Consider:

  1. Extracting the port validation logic into a separate method
  2. Using early returns to reduce nesting
  3. Adding debug logging for better observability

 func (e *Engine) checkInitiatorAndTargetCreationRequirements(podIP, initiatorIP, targetIP string) (bool, bool, error) {
+    // Validate inputs
+    if podIP == "" || initiatorIP == "" || targetIP == "" {
+        return false, false, fmt.Errorf("invalid empty IP addresses")
+    }
+
+    // Check if pod matches neither initiator nor target
+    if podIP != initiatorIP && podIP != targetIP {
+        return false, false, fmt.Errorf("invalid initiator and target addresses for engine %s creation with initiator address %v and target address %v", 
+            e.Name, initiatorIP, targetIP)
+    }
+
+    // Handle case where pod is both initiator and target
     if podIP == initiatorIP && podIP == targetIP {
-        if e.Port == 0 && e.TargetPort == 0 {
-            e.log.Info("Creating both initiator and target instances")
-            initiatorCreationRequired = true
-            targetCreationRequired = true
-        } else if e.Port != 0 && e.TargetPort == 0 {
-            e.log.Info("Creating a target instance")
-            if e.StandbyTargetPort != 0 {
-                e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)
-            } else {
-                targetCreationRequired = true
-            }
-        }
+        return e.checkLocalCreationRequirements()
+    }
+
+    // Handle initiator-only or target-only cases
+    if podIP == initiatorIP {
+        e.log.Info("Creating an initiator instance")
+        return true, false, nil
+    }
+    
+    e.log.Info("Creating a target instance") 
+    return false, true, nil
 }

+func (e *Engine) checkLocalCreationRequirements() (bool, bool, error) {
+    if e.Port == 0 && e.TargetPort == 0 {
+        e.log.Info("Creating both initiator and target instances")
+        return true, true, nil
+    }
+    
+    if e.Port != 0 && e.TargetPort == 0 {
+        if e.StandbyTargetPort != 0 {
+            e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", 
+                e.StandbyTargetPort)
+            return false, false, nil
+        }
+        e.log.Info("Creating a target instance")
+        return false, true, nil
+    }
+
+    e.log.Infof("Initiator instance with port %v and target instance with port %v are already created, will skip the creation", 
+        e.Port, e.TargetPort)
+    return false, false, nil
+}

Line range hint 407-529: Add port validation and improve error handling.

The port allocation and validation logic should be enhanced:

  1. Add validation for port numbers
  2. Improve error handling for port allocation failures
  3. Add cleanup in error cases

 func (e *Engine) handleFrontend(spdkClient *spdkclient.Client, superiorPortAllocator *commonbitmap.Bitmap, portCount int32, targetAddress string,
     initiatorCreationRequired, targetCreationRequired bool) (err error) {
+    // Validate inputs
+    if superiorPortAllocator == nil {
+        return fmt.Errorf("invalid nil port allocator")
+    }
+    if portCount <= 0 {
+        return fmt.Errorf("invalid port count: %d", portCount)
+    }

     // ... existing code ...

     port, _, err = superiorPortAllocator.AllocateRange(portCount)
     if err != nil {
+        // Clean up any previously allocated resources
+        if e.Port != 0 {
+            _ = superiorPortAllocator.ReleaseRange(e.Port, e.Port)
+        }
+        if e.StandbyTargetPort != 0 {
+            _ = superiorPortAllocator.ReleaseRange(e.StandbyTargetPort, e.StandbyTargetPort)
+        }
         return errors.Wrapf(err, "failed to allocate port for engine %v", e.Name)
     }
+    if port <= 0 || port > 65535 {
+        return fmt.Errorf("invalid allocated port %d for engine %v", port, e.Name)
+    }

     // ... rest of the code ...

Line range hint 2149-2255: Enhance error handling and rollback in target switchover.

The target switchover logic should be improved to handle failures more gracefully:

  1. Add transaction-like pattern for rollback
  2. Improve error context
  3. Add more detailed logging

 func (e *Engine) SwitchOverTarget(spdkClient *spdkclient.Client, newTargetAddress string) (err error) {
+    // Track switchover state for cleanup
+    type switchoverState struct {
+        oldTargetDisconnected bool
+        newTargetConnected    bool
+        deviceReloaded       bool
+    }
+    state := &switchoverState{}
+
+    defer func() {
+        if err != nil {
+            if state.oldTargetDisconnected && !state.newTargetConnected {
+                if errRollback := e.rollbackSwitchover(currentTargetAddress); errRollback != nil {
+                    e.log.WithError(errRollback).Error("Failed to rollback target switchover")
+                }
+            }
+        }
+    }()

     // ... existing validation code ...

     if err := e.disconnectTarget(currentTargetAddress); err != nil {
         return errors.Wrapf(err, "failed to disconnect target %s for engine %s", currentTargetAddress, e.Name)
     }
+    state.oldTargetDisconnected = true

     if err := e.connectTarget(newTargetAddress); err != nil {
         return errors.Wrapf(err, "failed to connect target %s for engine %s", newTargetAddress, e.Name)
     }
+    state.newTargetConnected = true

     // ... rest of the code ...
pkg/spdk_test.go (3)

Line range hint 271-354: Add test coverage for standby target scenarios.

The test should be enhanced to cover standby target functionality:

  1. Add test cases for standby target creation
  2. Verify port allocation and release
  3. Test error scenarios

Consider adding these test cases:

// Test standby target creation
func TestStandbyTargetCreation(t *testing.T) {
    // Setup code...
    
    // Test cases:
    // 1. Create engine with standby target
    // 2. Verify port allocation
    // 3. Test error scenarios
}

// Test port release
func TestPortRelease(t *testing.T) {
    // Setup code...
    
    // Test cases:
    // 1. Release target port
    // 2. Release standby target port
    // 3. Test error scenarios
}

Line range hint 530-1243: Add test coverage for error scenarios in snapshot operations.

The test should be enhanced to cover error scenarios:

  1. Test snapshot operations during target switchover
  2. Verify behavior when port allocation fails
  3. Test cleanup after failures

Consider adding these test cases:

// Test snapshot operations during target switchover
func TestSnapshotDuringTargetSwitchover(t *testing.T) {
    // Setup code...
    
    // Test cases:
    // 1. Create snapshot during switchover
    // 2. Delete snapshot during switchover
    // 3. Verify data integrity
}

// Test cleanup after failures
func TestCleanupAfterFailures(t *testing.T) {
    // Setup code...
    
    // Test cases:
    // 1. Port allocation failure
    // 2. Target creation failure
    // 3. Verify resource cleanup
}

1354-1354: Add assertions for target-only scenarios.

The test should include additional assertions to verify:

  1. Standby target port allocation
  2. Port release behavior
  3. Error scenarios

Add these assertions:

// Add after engine creation
c.Assert(engine.StandbyTargetPort, Equals, int32(0))
c.Assert(engine.TargetPort, Not(Equals), int32(0))

// Add after target deletion
c.Assert(engine.StandbyTargetPort, Equals, int32(0))
c.Assert(engine.TargetPort, Equals, int32(0))
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 5fa89fb and 1f1a32b.

⛔ Files ignored due to path filters (4)
  • go.mod is excluded by !go.mod
  • go.sum is excluded by !**/*.sum, !go.sum
  • vendor/github.com/longhorn/types/pkg/generated/spdkrpc/spdk.pb.go is excluded by !**/*.pb.go, !**/generated/**, !vendor/**
  • vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (6)
  • pkg/api/types.go (2 hunks)
  • pkg/client/client.go (1 hunks)
  • pkg/spdk/engine.go (18 hunks)
  • pkg/spdk/engine_test.go (1 hunks)
  • pkg/spdk/server.go (1 hunks)
  • pkg/spdk_test.go (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/api/types.go
  • pkg/client/client.go
  • pkg/spdk/server.go
🔇 Additional comments (4)
pkg/spdk/engine_test.go (2)

121-170: LGTM! Comprehensive test coverage for isNewEngine.

The test cases thoroughly validate the isNewEngine function, including:

  • New engine with empty fields
  • Existing engine with various field combinations
  • Proper handling of StandbyTargetPort

Line range hint 271-271: LGTM! Engine creation test cases updated correctly.

The test cases have been properly updated to reflect the removal of the upgradeRequired parameter from EngineCreate calls.

Also applies to: 354-354, 530-530, 1228-1228, 1243-1243

pkg/spdk/engine.go (2)

2383-2415: 🛠️ Refactor suggestion

Improve error handling in DeleteTarget.

The target deletion should continue cleanup even if some operations fail:

  1. Collect all errors and return them together
  2. Continue cleanup even after non-critical failures
  3. Add detailed logging for each step

 func (e *Engine) DeleteTarget(spdkClient *spdkclient.Client, superiorPortAllocator *commonbitmap.Bitmap) (err error) {
+    // Named errs so that it does not shadow the imported errors package
+    var errs []error
+    
     e.log.Infof("Deleting target with target port %d and standby target port %d", e.TargetPort, e.StandbyTargetPort)
 
     err = spdkClient.StopExposeBdev(e.Nqn)
     if err != nil {
-        return errors.Wrapf(err, "failed to stop expose bdev while deleting target instance for engine %s", e.Name)
+        errs = append(errs, errors.Wrap(err, "failed to stop expose bdev"))
     }
 
     err = e.releaseTargetAndStandbyTargetPorts(superiorPortAllocator)
     if err != nil {
-        return errors.Wrapf(err, "failed to release target and standby target ports while deleting target instance for engine %s", e.Name)
+        errs = append(errs, errors.Wrap(err, "failed to release ports"))
     }
 
+    if len(errs) > 0 {
+        return fmt.Errorf("multiple errors during target deletion: %v", errs)
+    }

Likely invalid or redundant comment.


2420-2440: 🛠️ Refactor suggestion

Add validation for port release operations.

The port release logic should validate port numbers before attempting to release them:

  1. Add validation for port numbers
  2. Improve error handling
  3. Add logging for port release operations

 func (e *Engine) releaseTargetAndStandbyTargetPorts(superiorPortAllocator *commonbitmap.Bitmap) error {
+    if superiorPortAllocator == nil {
+        return fmt.Errorf("invalid nil port allocator")
+    }
+
     releaseTargetPortRequired := e.TargetPort != 0
     releaseStandbyTargetPortRequired := e.StandbyTargetPort != 0 && e.StandbyTargetPort != e.TargetPort
 
+    // Validate port numbers
+    if releaseTargetPortRequired && (e.TargetPort < 1 || e.TargetPort > 65535) {
+        return fmt.Errorf("invalid target port number: %d", e.TargetPort)
+    }
+    if releaseStandbyTargetPortRequired && (e.StandbyTargetPort < 1 || e.StandbyTargetPort > 65535) {
+        return fmt.Errorf("invalid standby target port number: %d", e.StandbyTargetPort)
+    }

     // ... rest of the code ...

Likely invalid or redundant comment.
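If the range check were adopted, the two near-identical conditions could share one helper. A minimal sketch, assuming a hypothetical function named `validatePort` that is not part of the PR:

```go
package main

import "fmt"

// validatePort returns an error when port is outside the valid TCP range 1-65535.
// The label is only used to build the error message.
func validatePort(label string, port int32) error {
	if port < 1 || port > 65535 {
		return fmt.Errorf("invalid %s number: %d", label, port)
	}
	return nil
}

func main() {
	fmt.Println(validatePort("target port", 2000))
	fmt.Println(validatePort("standby target port", 70000))
}
```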

Comment on lines +172 to +232
func (s *TestSuite) TestReleaseTargetAndStandbyTargetPorts(c *C) {
testCases := []struct {
name string
engine *Engine
expectedTargetPort int32
expectedStandbyTargetPort int32
expectedError error
}{
{
name: "Release both target and standby target ports",
engine: &Engine{
TargetPort: 2000,
StandbyTargetPort: 2005,
},
expectedTargetPort: 0,
expectedStandbyTargetPort: 0,
expectedError: nil,
},
{
name: "Release target port only but standby target port is not set",
engine: &Engine{
TargetPort: 2000,
StandbyTargetPort: 0,
},
expectedTargetPort: 0,
expectedStandbyTargetPort: 0,
expectedError: nil,
},
{
name: "Release target and standby ports when they are the same",
engine: &Engine{
TargetPort: 2000,
StandbyTargetPort: 2000,
},
expectedTargetPort: 0,
expectedStandbyTargetPort: 0,
expectedError: nil,
},
{
name: "Release standby target port only",
engine: &Engine{
TargetPort: 0,
StandbyTargetPort: 2000,
},
expectedTargetPort: 0,
expectedStandbyTargetPort: 0,
expectedError: nil,
},
}

for testName, testCase := range testCases {
c.Logf("testing releaseTargetAndStandbyTargetPorts.%v", testName)

bitmap, err := commonbitmap.NewBitmap(0, 100000)
c.Assert(err, IsNil)

err = testCase.engine.releaseTargetAndStandbyTargetPorts(bitmap)
c.Assert(err, DeepEquals, testCase.expectedError, Commentf("Test case '%s': unexpected error result", testCase.name))
c.Assert(testCase.engine.TargetPort, Equals, testCase.expectedTargetPort, Commentf("Test case '%s': unexpected target port", testCase.name))
c.Assert(testCase.engine.StandbyTargetPort, Equals, testCase.expectedStandbyTargetPort, Commentf("Test case '%s': unexpected standby target port", testCase.name))
}

🛠️ Refactor suggestion

Enhance test coverage for port release validation.

The test cases should verify:

  1. Bitmap state after port release
  2. Error handling for bitmap operations
  3. Port range validation

Add bitmap state verification:

 		err = testCase.engine.releaseTargetAndStandbyTargetPorts(bitmap)
 		c.Assert(err, DeepEquals, testCase.expectedError)
+		// Verify ports are properly released in bitmap
+		if testCase.expectedError == nil {
+			if testCase.engine.TargetPort > 0 {
+				c.Assert(bitmap.Test(uint64(testCase.engine.TargetPort)), Equals, false,
+					Commentf("Target port %d not released", testCase.engine.TargetPort))
+			}
+			if testCase.engine.StandbyTargetPort > 0 {
+				c.Assert(bitmap.Test(uint64(testCase.engine.StandbyTargetPort)), Equals, false,
+					Commentf("Standby target port %d not released", testCase.engine.StandbyTargetPort))
+			}
+		}
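One caveat with the check above: every case in the test table expects both port fields to be zero after the call, so conditions such as `testCase.engine.TargetPort > 0` would not be true at that point. A minimal sketch of recording the ports before the call follows. The `portSet` map is a hypothetical stand-in for the allocator, and `releasePorts` only mimics the behaviour the test table expects; neither is the PR's actual code.

```go
package main

import "fmt"

// portSet is a hypothetical stand-in for the port allocator; true means allocated.
type portSet map[int32]bool

type engine struct {
	TargetPort        int32
	StandbyTargetPort int32
}

// releasePorts mimics what the test table expects: release both ports
// (only once if they are equal) and reset the fields to zero.
func (e *engine) releasePorts(ps portSet) {
	if e.TargetPort != 0 {
		delete(ps, e.TargetPort)
	}
	if e.StandbyTargetPort != 0 && e.StandbyTargetPort != e.TargetPort {
		delete(ps, e.StandbyTargetPort)
	}
	e.TargetPort, e.StandbyTargetPort = 0, 0
}

// releasedCleanly records the ports before the call and checks afterwards
// that none of them is still marked as allocated.
func releasedCleanly(e *engine, ps portSet) bool {
	before := []int32{e.TargetPort, e.StandbyTargetPort}
	e.releasePorts(ps)
	for _, p := range before {
		if p != 0 && ps[p] {
			return false
		}
	}
	return e.TargetPort == 0 && e.StandbyTargetPort == 0
}

func main() {
	ps := portSet{2000: true, 2005: true}
	e := &engine{TargetPort: 2000, StandbyTargetPort: 2005}
	fmt.Println(releasedCleanly(e, ps))
}
```

For the check to be meaningful in the real test, the ports would also need to be marked as allocated in the bitmap before the release call.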
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (s *TestSuite) TestReleaseTargetAndStandbyTargetPorts(c *C) {
testCases := []struct {
name string
engine *Engine
expectedTargetPort int32
expectedStandbyTargetPort int32
expectedError error
}{
{
name: "Release both target and standby target ports",
engine: &Engine{
TargetPort: 2000,
StandbyTargetPort: 2005,
},
expectedTargetPort: 0,
expectedStandbyTargetPort: 0,
expectedError: nil,
},
{
name: "Release target port only but standby target port is not set",
engine: &Engine{
TargetPort: 2000,
StandbyTargetPort: 0,
},
expectedTargetPort: 0,
expectedStandbyTargetPort: 0,
expectedError: nil,
},
{
name: "Release target and standby ports when they are the same",
engine: &Engine{
TargetPort: 2000,
StandbyTargetPort: 2000,
},
expectedTargetPort: 0,
expectedStandbyTargetPort: 0,
expectedError: nil,
},
{
name: "Release standby target port only",
engine: &Engine{
TargetPort: 0,
StandbyTargetPort: 2000,
},
expectedTargetPort: 0,
expectedStandbyTargetPort: 0,
expectedError: nil,
},
}
for testName, testCase := range testCases {
c.Logf("testing releaseTargetAndStandbyTargetPorts.%v", testName)
bitmap, err := commonbitmap.NewBitmap(0, 100000)
c.Assert(err, IsNil)
err = testCase.engine.releaseTargetAndStandbyTargetPorts(bitmap)
c.Assert(err, DeepEquals, testCase.expectedError, Commentf("Test case '%s': unexpected error result", testCase.name))
c.Assert(testCase.engine.TargetPort, Equals, testCase.expectedTargetPort, Commentf("Test case '%s': unexpected target port", testCase.name))
c.Assert(testCase.engine.StandbyTargetPort, Equals, testCase.expectedStandbyTargetPort, Commentf("Test case '%s': unexpected standby target port", testCase.name))
}
func (s *TestSuite) TestReleaseTargetAndStandbyTargetPorts(c *C) {
testCases := []struct {
name string
engine *Engine
expectedTargetPort int32
expectedStandbyTargetPort int32
expectedError error
}{
{
name: "Release both target and standby target ports",
engine: &Engine{
TargetPort: 2000,
StandbyTargetPort: 2005,
},
expectedTargetPort: 0,
expectedStandbyTargetPort: 0,
expectedError: nil,
},
{
name: "Release target port only but standby target port is not set",
engine: &Engine{
TargetPort: 2000,
StandbyTargetPort: 0,
},
expectedTargetPort: 0,
expectedStandbyTargetPort: 0,
expectedError: nil,
},
{
name: "Release target and standby ports when they are the same",
engine: &Engine{
TargetPort: 2000,
StandbyTargetPort: 2000,
},
expectedTargetPort: 0,
expectedStandbyTargetPort: 0,
expectedError: nil,
},
{
name: "Release standby target port only",
engine: &Engine{
TargetPort: 0,
StandbyTargetPort: 2000,
},
expectedTargetPort: 0,
expectedStandbyTargetPort: 0,
expectedError: nil,
},
}
for testName, testCase := range testCases {
c.Logf("testing releaseTargetAndStandbyTargetPorts.%v", testName)
bitmap, err := commonbitmap.NewBitmap(0, 100000)
c.Assert(err, IsNil)
err = testCase.engine.releaseTargetAndStandbyTargetPorts(bitmap)
c.Assert(err, DeepEquals, testCase.expectedError)
// Verify ports are properly released in bitmap
if testCase.expectedError == nil {
if testCase.engine.TargetPort > 0 {
c.Assert(bitmap.Test(uint64(testCase.engine.TargetPort)), Equals, false,
Commentf("Target port %d not released", testCase.engine.TargetPort))
}
if testCase.engine.StandbyTargetPort > 0 {
c.Assert(bitmap.Test(uint64(testCase.engine.StandbyTargetPort)), Equals, false,
Commentf("Standby target port %d not released", testCase.engine.StandbyTargetPort))
}
}
c.Assert(testCase.engine.TargetPort, Equals, testCase.expectedTargetPort, Commentf("Test case '%s': unexpected target port", testCase.name))
c.Assert(testCase.engine.StandbyTargetPort, Equals, testCase.expectedStandbyTargetPort, Commentf("Test case '%s': unexpected standby target port", testCase.name))
}
