Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IGNITE-22315 Make raft-client starting only once and only with raft-client and replica together #3956

Merged
merged 37 commits into from
Jul 9, 2024

Conversation

JAkutenshi
Copy link
Contributor

@JAkutenshi JAkutenshi commented Jun 19, 2024

JIRA Ticket: IGNITE-22315 | Make raft-client starting only once and only with raft-client and replica together

The goal

The goal of the ticket is to remove excessive raft-clients starting.

The reason

If a local node is in assignments, then and only then we should start a whole replication group, that consists of raft-node, raft-client and replica itself. Every entity is a single copy per group and neither should be started outside of the group. Before this ticket every partition started a raft-client.

The solution

  1. Related completion stage that create raft-client is removed.
  2. Tests are adjusted for such behavior and check that raft-clients started and updated only on replication group and only.
  3. ReplicaManager#startReplica chains all raft-entities starting together with replica starting to a common future.
  4. Replica manager now handle with replicas starting executor. The motivation do not grab it from table manager is better control of the lifecycle: the executor starts inside replica manager and there is terminated.
  5. TableManager#startPartitionAndStartClient is refactored due to simplicity of the code.

Abandoned PR due to unstable tests flaking is closed

@JAkutenshi JAkutenshi marked this pull request as draft June 19, 2024 16:17
@JAkutenshi JAkutenshi changed the title IGNITE-22315 [Draft] [Fresh redone branch] ]Make raft-client starting only once and only with raft-client and replica together IGNITE-22315 Make raft-client starting only once and only with raft-client and replica together Jun 21, 2024
@JAkutenshi JAkutenshi marked this pull request as ready for review June 21, 2024 18:05
if (localMemberAssignment == null) {
// (0) in case if node not in the assignments
return nullCompletedFuture();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make an empty line here, pls

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// TODO: will be replaced with replica usage in https://issues.apache.org/jira/browse/IGNITE-22218
RaftGroupService raftGroupService;
try {
raftGroupService = internalTable.tableRaftService().partitionRaftGroupService(partitionId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this try-catch now? Could you place a comment, why the simple continue is enough? Shouldn't we write the log with warn?

Copy link
Contributor Author

@JAkutenshi JAkutenshi Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test's logic is go through every partition and try to get raft client from TableRaftService if presented. The method TableRaftService#partitionRaftGroupService says in javadoc: IgniteInternalException if partition can't be found.. So, the only way for now safely check that there no raft client is to catch the exception. The good news is this code is already removed in the next related PR, so, I guess we can leave it now?

} catch (IgniteInternalException e) {
return false;
}
return raftClient != null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add empty lines here and there :) according to the recommendation "The source code should be broken into minimal semantic units and all such units must be separated by one empty line. To simplify the recognition of semantic units, every line of the source code should be separated by one empty line" from the https://cwiki.apache.org/confluence/display/IGNITE/Coding+Guidelines#CodingGuidelines-SemanticUnits

Copy link
Contributor Author

@JAkutenshi JAkutenshi Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrote it a little

@@ -713,7 +713,7 @@ public TableViewInternal startTable(String tableName, SchemaDescriptor schemaDes
storageIndexTracker,
completedFuture(listener)
);
} catch (NodeStoppingException e) {
} catch (Throwable e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it can be not only node stopping now, but any NPE for example and etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The try-catch is removed it looks like excessive there.

@JAkutenshi JAkutenshi requested a review from kgusakov June 24, 2024 16:09
# Conflicts:
#	modules/table/src/main/java/org/apache/ignite/internal/table/distributed/TableManager.java
@@ -713,9 +711,6 @@ public TableViewInternal startTable(String tableName, SchemaDescriptor schemaDes
storageIndexTracker,
completedFuture(listener)
);
} catch (NodeStoppingException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?
Do you mean that will through unchecked exception instead? Which one if true? Or there's no longer network communication on replica creation and thus it's not possible to get NodeStoppingException?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There deprecated startReplica is used that doesn't throw NodeStoppingException explicitly == there are no calls that throw the checked exception, but the true one startReplica does, but isn't used in the test and will be fixed in IGNITE-22373 | Delete startReplica overloads from ReplicaManager

@sanpwc sanpwc merged commit 84b760b into apache:main Jul 9, 2024
1 check passed
@sanpwc sanpwc deleted the ignite-22315-redone branch July 9, 2024 05:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants