Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport the cluster scripts to 2.1 #5174

Open
wants to merge 1 commit into
base: 2.1
Choose a base branch
from

Conversation

ctubbsii
Copy link
Member

This backports the improvements to the accumulo-cluster and accumulo-service scripts to 2.1 to improve script maintenance and functionality across the branches.

Notable differences with the main branch:

  • This version includes deprecated commands that are replaced by newer versions: start-non-tservers, start-servers, stop-servers, start-tservers, and stop-tservers
  • This vesion includes the ability to control starting a compaction-coordinator, which was removed in the main branch for 4.0
  • This version does not allow specifying tserver groups (unsupported by its cluster.yaml version) and uses a hard-coded value of "default"
  • As before these changes, this version checks for old deprecated config files present in conf/
  • As before these changes, this version parses the cluster.yaml that was supported for the 2.1 and 3.1 branches
  • This version is modified to translate the variables emitted from its version of the ClusterConfigParser to the newer names used in the main branch
  • This version does not require compactor groups to be configured (since external compactions are optional)
  • This version uses -a <host> instead of -o general.process.bind.address=<host> for the bind address
  • This version uses -q and -g for the group parameter

This includes relevant changes from #4966 to standardize the service names, #5066 to reduce ssh connections to hosts, #5114 to support starting/stopping groups of servers by type, #5126 to support resource group overrides, #5116 to refactor and simplify the script argument parsing using getopt, and the bugfixes and improvements from #5167

@ctubbsii ctubbsii added this to the 2.1.4 milestone Dec 12, 2024
@ctubbsii ctubbsii self-assigned this Dec 12, 2024
@ctubbsii
Copy link
Member Author

The diffs here are probably not going to be very helpful. The more relevant diff is between these changed files and the current contents of 3.1 and main. Some of these differences go away when merging into 3.1, and most of the rest goes away when merging into main. There are a few small improvements in here that I do expect to include in the main branch, though. Those are specifically:

diff --git a/assemble/bin/accumulo-cluster b/assemble/bin/accumulo-cluster
index a87ad4938a..31dff57d81 100755
--- a/assemble/bin/accumulo-cluster
+++ b/assemble/bin/accumulo-cluster
@@ -38,7 +38,7 @@ Commands:
   start-here                 Deprecated. Alias for "start --local --all"
   stop-here                  Deprecated. Alias for "stop --local --all"
   restart [--local] [--all | [--manager] [--gc] [--monitor] [--tservers[=group]] [--sservers[=group]] [--compactors[=group]]
-                             Restarts the Accumulo cluster
+                             Restarts Accumulo cluster services
   start [--local] [--all | [--manager] [--gc] [--monitor] [--tservers[=group]] [--sservers[=group]] [--compactors[=group]]
                              Starts Accumulo cluster services
   stop [--local] [--all | [--manager] [--gc] [--monitor] [--tservers[=group]] [--sservers[=group]] [--compactors[=group]]
@@ -53,7 +53,7 @@ Commands:
   accumulo-cluster start --local                           # start all local services
   accumulo-cluster start --local --manager                 # start local manager services
   accumulo-cluster start --tservers                        # start all tservers
-  accumulo-cluster start --tservers=group1                 # start all group1 tservers
+  accumulo-cluster start --sservers=group1                 # start all group1 sservers
   accumulo-cluster start --local --manager --tservers      # Start the local manager and local tservers
 
 EOF
@@ -140,7 +140,7 @@ function parse_args {
   if [[ $ARG_ALL == 1 && ($ARG_MANAGER == 1 ||
     $ARG_GC == 1 || $ARG_MONITOR == 1 || $ARG_TSERVER == 1 ||
     $ARG_SSERVER == 1 || $ARG_COMPACTOR == 1) ]]; then
-    echo "--all cannot be used with other options"
+    echo -- "--all cannot be used with other options"
     print_usage
     exit 1
   fi
@@ -152,20 +152,17 @@ function parse_args {
     ARG_ALL=1
   fi
 
-  if isDebug; then
-    echo "DEBUG=$DEBUG"
-    echo "ARG_ALL=$ARG_ALL"
-    echo "ARG_LOCAL=$ARG_LOCAL"
-    echo "ARG_MANAGER=$ARG_MANAGER"
-    echo "ARG_GC=$ARG_GC"
-    echo "ARG_MONITOR=$ARG_MONITOR"
-    echo "ARG_TSERVER=$ARG_TSERVER"
-    echo "ARG_TSERVER_GROUP=$ARG_TSERVER_GROUP"
-    echo "ARG_SSERVER=$ARG_SSERVER"
-    echo "ARG_SSERVER_GROUP=$ARG_SSERVER_GROUP"
-    echo "ARG_COMPACTOR=$ARG_COMPACTOR"
-    echo "ARG_COMPACTOR_GROUP=$ARG_COMPACTOR_GROUP"
-  fi
+  debug "ARG_ALL=$ARG_ALL"
+  debug "ARG_LOCAL=$ARG_LOCAL"
+  debug "ARG_MANAGER=$ARG_MANAGER"
+  debug "ARG_GC=$ARG_GC"
+  debug "ARG_MONITOR=$ARG_MONITOR"
+  debug "ARG_TSERVER=$ARG_TSERVER"
+  debug "ARG_TSERVER_GROUP=$ARG_TSERVER_GROUP"
+  debug "ARG_SSERVER=$ARG_SSERVER"
+  debug "ARG_SSERVER_GROUP=$ARG_SSERVER_GROUP"
+  debug "ARG_COMPACTOR=$ARG_COMPACTOR"
+  debug "ARG_COMPACTOR_GROUP=$ARG_COMPACTOR_GROUP"
 }
 
 function invalid_args {
diff --git a/assemble/bin/accumulo-service b/assemble/bin/accumulo-service
index ae5c424de8..e56e647227 100755
--- a/assemble/bin/accumulo-service
+++ b/assemble/bin/accumulo-service
@@ -111,7 +111,7 @@ function start_service() {
     rotate_log "$outfile"
     rotate_log "$errfile"
 
-    nohup "${bin}/accumulo" "$service_type" "$@" "${PROPERTY_OVERRIDES[@]}" >"$outfile" 2>"$errfile" </dev/null &
+    nohup "${bin}/accumulo" "$service_type" "$@" >"$outfile" 2>"$errfile" </dev/null &
     echo "$!" >"${pid_file}"
 
   done

This backports the improvements to the accumulo-cluster and
accumulo-service scripts to 2.1 to improve script maintenance and
functionality across the branches.

Notable differences with the main branch:

* This version includes deprecated commands that are replaced by newer
  versions: start-non-tservers, start-servers, stop-servers,
  start-tservers, and stop-tservers
* This vesion includes the ability to control starting a
  compaction-coordinator, which was removed in the main branch for 4.0
* This version does not allow specifying tserver groups (unsupported by
  its cluster.yaml version) and uses a hard-coded value of "default"
* As before these changes, this version checks for old deprecated config
  files present in conf/
* As before these changes, this version parses the cluster.yaml that was
  supported for the 2.1 and 3.1 branches
* This version is modified to translate the variables emitted from its
  version of the ClusterConfigParser to the newer names used in the main
  branch
* This version does not require compactor groups to be configured (since
  external compactions are optional)
* This version uses `-a <host>` instead of `-o
  general.process.bind.address=<host>` for the bind address
* This version uses `-q` and `-g` for the group parameter

This includes relevant changes from apache#4966 to standardize the service
names, apache#5066 to reduce ssh connections to hosts, apache#5114 to support
starting/stopping groups of servers by type, apache#5126 to support resource
group overrides, apache#5116 to refactor and simplify the script argument
parsing using getopt, and the bugfixes and improvements from apache#5167
Copy link
Contributor

@ddanielr ddanielr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added definitions for the external compaction services caused an error.

Section added to cluster.yaml

compaction:
  coordinator:
    - localhost
  compactor:
    - q1:
        - localhost
    - q2:
        - localhost
        
tservers_per_host: 2
sservers_per_host:
 - default: 1
compactors_per_host:
 - q1: 1
 - q2: 1

Error:

$./bin/accumulo-cluster start --dry-run
DEBUG: args:  --dry-run --
DEBUG: ARG_ALL=1
DEBUG: ARG_LOCAL=0
DEBUG: ARG_MANAGER=0
DEBUG: ARG_GC=0
DEBUG: ARG_MONITOR=0
DEBUG: ARG_COORDINATOR=0
DEBUG: ARG_TSERVER=0
DEBUG: ARG_TSERVER_GROUP=
DEBUG: ARG_SSERVER=0
DEBUG: ARG_SSERVER_GROUP=
DEBUG: ARG_COMPACTOR=0
DEBUG: ARG_COMPACTOR_GROUP=
DEBUG: LOCAL_HOST_ADDRESSES=::1 10-113-15-93 10.113.15.93 127.0.0.1 172.17.0.1 192.168.49.1 ip-10-113-15-93.evoforge.org ip-172-17-0-1.ec2.internal ip-192-168-49-1.ec2.internal localhost localhost4 localhost4.localdomain4 localhost6 localhost6.localdomain6 localhost.localdomain
DEBUG: Parsed config:
MANAGER_HOSTS="localhost"
MONITOR_HOSTS="localhost"
GC_HOSTS="localhost"
TSERVER_HOSTS="localhost"
COORDINATOR_HOSTS="localhost"
COMPACTION_QUEUES="q1 q2"
COMPACTOR_HOSTS_q1="localhost"
NUM_COMPACTORS_q1="1"
COMPACTOR_HOSTS_q2="localhost"
NUM_COMPACTORS_q2="1"
SSERVER_GROUPS="default"
SSERVER_HOSTS_default="localhost"
NUM_SSERVERS_default="1"
NUM_TSERVERS="${NUM_TSERVERS:=2}"
starting tablet servers for group default
DEBUG: ssh -qnf -o ConnectTimeout=2 localhost bash\ -c\ \'ACCUMULO_CLUSTER_ARG=2\ \"/workspace/fluo-uno/install/accumulo-2.1.4-SNAPSHOT/bin/accumulo-service\"\ tserver\ start\ -a\ localhost\ \' 
done
DEBUG: ssh -qnf -o ConnectTimeout=2 localhost bash\ -c\ \'ACCUMULO_CLUSTER_ARG=1\ \"/workspace/fluo-uno/install/accumulo-2.1.4-SNAPSHOT/bin/accumulo-service\"\ manager\ start\ -a\ localhost\ \' 
DEBUG: ssh -qnf -o ConnectTimeout=2 localhost bash\ -c\ \'ACCUMULO_CLUSTER_ARG=1\ \"/workspace/fluo-uno/install/accumulo-2.1.4-SNAPSHOT/bin/accumulo-service\"\ gc\ start\ -a\ localhost\ \' 
DEBUG: ssh -qnf -o ConnectTimeout=2 localhost bash\ -c\ \'ACCUMULO_CLUSTER_ARG=1\ \"/workspace/fluo-uno/install/accumulo-2.1.4-SNAPSHOT/bin/accumulo-service\"\ monitor\ start\ -a\ localhost\ \' 
./bin/accumulo-cluster: line 365: COMPACTION-COORDINATORS_PER_HOST_default: invalid variable name

Removed all the cluster changes and only adding a definition for the compaction-coordinator triggered the error.

compaction:
  coordinator:
    - localhost

@@ -31,3 +31,8 @@ instance.secret=DEFAULT

## Set to false if 'accumulo-util build-native' fails
tserver.memory.maps.native.enabled=true

## (optional) include additional property files for a resource group
## based on the ACCUMULO_RESOURCE_GROUP env var set in accumulo-service
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a comment about this additional property file not being supported for tservers since they do not have resource groups in 2.1?

Copy link
Contributor

@ddanielr ddanielr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes don't seem to be working for the external compaction functionality

@ctubbsii
Copy link
Member Author

Ah, right, I thought we might run into this. The bash variable is based on the service name, and the coordinator has a dash in the service name, which you can't have in a bash variable. I'll work around that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants