TaskExecutor Connection refused: wordcount-operator-example #238

Open
eyerra opened this issue Aug 24, 2021 · 0 comments

eyerra commented Aug 24, 2021

Task Executor Log:

Starting Task Manager
config file:
jobmanager.rpc.address: wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2
jobmanager.rpc.port: 6123
jobmanager.heap.size: 1024m
taskmanager.heap.size: 1024m
taskmanager.numberOfTaskSlots: 8
parallelism.default: 1
jobmanager.execution.failover-strategy: region
blob.server.port: 6124
query.server.port: 6125
Starting taskexecutor as a console application on host wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2.
2021-08-24 20:49:12,617 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - --------------------------------------------------------------------------------
2021-08-24 20:49:12,618 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Starting TaskManager (Version: 1.9.1, Rev:4d56de8, Date:30.09.2019 @ 11:32:19 CST)
2021-08-24 20:49:12,619 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  OS current user: flink
2021-08-24 20:49:12,619 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Current Hadoop/Kerberos user: <no hadoop dependency found>
2021-08-24 20:49:12,619 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.232-b09
2021-08-24 20:49:12,619 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Maximum heap size: 922 MiBytes
2021-08-24 20:49:12,619 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JAVA_HOME: /usr/local/openjdk-8
2021-08-24 20:49:12,619 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  No Hadoop Dependency available
2021-08-24 20:49:12,619 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JVM Options:
2021-08-24 20:49:12,619 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -XX:+UseG1GC
2021-08-24 20:49:12,619 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -Xms922M
2021-08-24 20:49:12,620 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -Xmx922M
2021-08-24 20:49:12,620 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -XX:MaxDirectMemorySize=8388607T
2021-08-24 20:49:12,620 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
2021-08-24 20:49:12,620 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
2021-08-24 20:49:12,620 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Program Arguments:
2021-08-24 20:49:12,620 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     --configDir
2021-08-24 20:49:12,620 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     /opt/flink/conf
2021-08-24 20:49:12,620 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Classpath: /opt/flink/lib/flink-table-blink_2.12-1.9.1.jar:/opt/flink/lib/flink-table_2.12-1.9.1.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.15.jar:/opt/flink/lib/flink-dist_2.12-1.9.1.jar:::
2021-08-24 20:49:12,620 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - --------------------------------------------------------------------------------
2021-08-24 20:49:12,621 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - Registered UNIX signal handlers for [TERM, HUP, INT]
2021-08-24 20:49:12,625 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - Maximum number of open file descriptors is 1048576.
2021-08-24 20:49:12,640 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.address, wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2
2021-08-24 20:49:12,640 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.port, 6123
2021-08-24 20:49:12,640 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.heap.size, 1024m
2021-08-24 20:49:12,641 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.heap.size, 1024m
2021-08-24 20:49:12,641 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.numberOfTaskSlots, 8
2021-08-24 20:49:12,641 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: parallelism.default, 1
2021-08-24 20:49:12,642 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.execution.failover-strategy, region
2021-08-24 20:49:12,643 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: blob.server.port, 6124
2021-08-24 20:49:12,643 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: query.server.port, 6125
2021-08-24 20:49:12,720 INFO  org.apache.flink.core.fs.FileSystem                           - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
2021-08-24 20:49:12,746 INFO  org.apache.flink.runtime.security.modules.HadoopModuleFactory  - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
2021-08-24 20:49:12,764 INFO  org.apache.flink.runtime.security.SecurityUtils               - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
2021-08-24 20:49:13,169 INFO  org.apache.flink.configuration.Configuration                  - Config uses fallback configuration key 'jobmanager.rpc.address' instead of key 'rest.address'
2021-08-24 20:49:13,174 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils            - Trying to select the network interface and address to use by connecting to the leading JobManager.
2021-08-24 20:49:13,174 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils            - TaskManager will try to connect for 10000 milliseconds before falling back to heuristics
2021-08-24 20:49:13,178 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Retrieved new target address wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2/10.10.1.126:6123.
2021-08-24 20:49:13,942 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Trying to connect to address wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2/10.10.1.126:6123
2021-08-24 20:49:13,942 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address 'wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2/10.10.1.126': Connection refused (Connection refused)
2021-08-24 20:49:19,550 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Trying to connect to address wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2/10.10.1.126:6123
2021-08-24 20:49:19,550 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address 'wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2/10.10.1.126': Connection refused (Connection refused)
2021-08-24 20:49:19,551 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.10.1.126': Connection refused (Connection refused)
2021-08-24 20:49:19,551 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.10.1.126': Connection refused (Connection refused)
2021-08-24 20:49:19,551 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)
2021-08-24 20:49:19,551 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.10.1.126': Connection refused (Connection refused)
2021-08-24 20:49:19,552 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)
2021-08-24 20:49:23,175 WARN  org.apache.flink.runtime.net.ConnectionUtils                  - Could not connect to wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2/10.10.1.126:6123. Selecting a local address using heuristics.
2021-08-24 20:49:23,178 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - TaskManager will use hostname/address 'wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2' (10.10.1.126) for communication.
2021-08-24 20:49:23,183 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils         - Trying to start actor system at 10.10.1.126:0
2021-08-24 20:49:23,878 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
2021-08-24 20:49:23,949 INFO  akka.remote.Remoting                                          - Starting remoting
2021-08-24 20:49:24,154 INFO  akka.remote.Remoting                                          - Remoting started; listening on addresses :[akka.tcp://flink@10.10.1.126:36883]
2021-08-24 20:49:24,262 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils         - Actor system started at akka.tcp://flink@10.10.1.126:36883
2021-08-24 20:49:24,276 INFO  org.apache.flink.runtime.metrics.MetricRegistryImpl           - No metrics reporter configured, no metrics will be exposed/reported.
2021-08-24 20:49:24,277 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils         - Trying to start actor system at 10.10.1.126:0
2021-08-24 20:49:24,298 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
2021-08-24 20:49:24,302 INFO  akka.remote.Remoting                                          - Starting remoting
2021-08-24 20:49:24,314 INFO  akka.remote.Remoting                                          - Remoting started; listening on addresses :[akka.tcp://flink-metrics@10.10.1.126:39517]
2021-08-24 20:49:24,323 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils         - Actor system started at akka.tcp://flink-metrics@10.10.1.126:39517
2021-08-24 20:49:24,328 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.metrics.dump.MetricQueryService at akka://flink-metrics/user/MetricQueryService_9ec096409864e7f929ccf67ddb82d623 .
2021-08-24 20:49:24,341 INFO  org.apache.flink.runtime.blob.PermanentBlobCache              - Created BLOB cache storage directory /tmp/blobStore-a7f62aeb-4ae4-49e6-9d98-700da67e6abe
2021-08-24 20:49:24,344 INFO  org.apache.flink.runtime.blob.TransientBlobCache              - Created BLOB cache storage directory /tmp/blobStore-85058363-5041-42b0-87d4-30a33f7bce70
2021-08-24 20:49:24,345 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - Starting TaskManager with ResourceID: 9ec096409864e7f929ccf67ddb82d623
2021-08-24 20:49:24,462 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerServices     - Temporary file directory '/tmp': total 19 GB, usable 16 GB (84.21% usable)
2021-08-24 20:49:24,465 INFO  org.apache.flink.runtime.io.disk.FileChannelManagerImpl       - FileChannelManager uses directory /tmp/flink-io-53a83a0b-8397-4e38-8a52-ccacec254556 for spill files.
2021-08-24 20:49:24,473 INFO  org.apache.flink.runtime.io.network.netty.NettyConfig         - NettyConfig [server address: /10.10.1.126, server port: 0, ssl enabled: false, memory segment size (bytes): 32768, transport type: NIO, number of server threads: 8 (manual), number of client threads: 8 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
2021-08-24 20:49:24,475 INFO  org.apache.flink.runtime.io.disk.FileChannelManagerImpl       - FileChannelManager uses directory /tmp/flink-netty-shuffle-0d1966a5-0a61-406a-9522-699f9e7fd945 for spill files.
2021-08-24 20:49:24,592 INFO  org.apache.flink.runtime.io.network.buffer.NetworkBufferPool  - Allocated 102 MB for network buffer pool (number of memory segments: 3278, bytes per segment: 32768).
2021-08-24 20:49:24,598 INFO  org.apache.flink.runtime.io.network.NettyShuffleEnvironment   - Starting the network environment and its components.
2021-08-24 20:49:24,638 INFO  org.apache.flink.runtime.io.network.netty.NettyClient         - Successful initialization (took 39 ms).
2021-08-24 20:49:24,684 INFO  org.apache.flink.runtime.io.network.netty.NettyServer         - Successful initialization (took 45 ms). Listening on SocketAddress /10.10.1.126:41177.
2021-08-24 20:49:24,686 INFO  org.apache.flink.runtime.taskexecutor.KvStateService          - Starting the kvState service and its components.
2021-08-24 20:49:24,686 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerServices     - Limiting managed memory to 0.7 of the currently free heap space (640 MB), memory will be allocated lazily.
2021-08-24 20:49:24,695 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration  - Messages have a max timeout of 10000 ms
2021-08-24 20:49:24,700 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at akka://flink/user/taskmanager_0 .
2021-08-24 20:49:24,712 INFO  org.apache.flink.runtime.taskexecutor.JobLeaderService        - Start job leader service.
2021-08-24 20:49:24,713 INFO  org.apache.flink.runtime.filecache.FileCache                  - User file cache uses directory /tmp/flink-dist-cache-fdb6f8ed-2afa-41b9-9033-aa71fb320f1e
2021-08-24 20:49:24,715 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Connecting to ResourceManager akka.tcp://flink@wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2:6123/user/resourcemanager(00000000000000000000000000000000).
2021-08-24 20:49:24,785 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2/10.10.1.126:6123
2021-08-24 20:49:24,785 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2:6123]] Caused by: [java.net.ConnectException: Connection refused: wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2/10.10.1.126:6123]
2021-08-24 20:49:24,789 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2:6123/user/resourcemanager..
2021-08-24 20:49:34,815 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2/10.10.1.126:6123
2021-08-24 20:49:34,816 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2:6123]] Caused by: [java.net.ConnectException: Connection refused: wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2/10.10.1.126:6123]
2021-08-24 20:49:34,816 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2:6123/user/resourcemanager..
2021-08-24 20:49:44,834 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2/10.10.1.126:6123
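
One detail visible in the log above: `jobmanager.rpc.address` and the host the task executor starts on are the same pod name (the `-tm-` pod), so the TaskManager is dialing port 6123 on itself. Below is a minimal sketch for checking this in other pods' logs, assuming the log contains the same two lines as above. `rpc_target_and_host` is a hypothetical helper name, not part of Flink or the operator.

```python
import re


def rpc_target_and_host(log_text):
    """Return (jobmanager.rpc.address, task executor host) parsed from a startup log.

    Hypothetical helper: it only looks for the two line shapes seen in the log above.
    Either element is None if the corresponding line is missing.
    """
    addr = re.search(r"^jobmanager\.rpc\.address:\s*(\S+)", log_text, re.MULTILINE)
    host = re.search(
        r"Starting taskexecutor as a console application on host (\S+?)\.?$",
        log_text,
        re.MULTILINE,
    )
    return (addr.group(1) if addr else None, host.group(1) if host else None)


# The relevant lines copied from the task executor log above.
SAMPLE = """\
jobmanager.rpc.address: wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2
jobmanager.rpc.port: 6123
Starting taskexecutor as a console application on host wordcount-operator-example-7f35b763-tm-8566b86844-l6dk2.
"""

address, host = rpc_target_and_host(SAMPLE)
# True when the configured JobManager address is the TaskManager's own host.
points_at_self = address is not None and address == host

print(f"jobmanager.rpc.address: {address}")
print(f"task executor host:     {host}")
print(f"points at own host:     {points_at_self}")
```

For this log the check reports that the RPC address points at the TaskManager's own host, which matches the repeated `Connection refused` on `10.10.1.126:6123`.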

Deployment YAML file:

apiVersion: flink.k8s.io/v1beta1
kind: FlinkApplication
metadata:
  name: wordcount-operator-example
  namespace: flink-operator
  annotations:
  labels:
    environment: development
spec:
  image: docker.io/lyft/wordcount-operator-example:e9054a414590b267178eefbe9331bd611863ff15
  deleteMode: None
  flinkConfig:
    taskmanager.heap.size: 200
    taskmanager.network.memory.fraction: 0.1
    taskmanager.network.memory.min: 10m
    state.backend.fs.checkpointdir: file:///checkpoints/flink/checkpoints
    state.checkpoints.dir: file:///checkpoints/flink/externalized-checkpoints
    state.savepoints.dir: file:///checkpoints/flink/savepoints
    web.upload.dir: /opt/flink
  jobManagerConfig:
    resources:
      requests:
        memory: "200Mi"
        cpu: "1"
    replicas: 1
  taskManagerConfig:
    taskSlots: 2
    resources:
      requests:
        memory: "200Mi"
        cpu: "0.5"
  flinkVersion: "1.8"
  jarName: "wordcount-operator-example-1.0.0-SNAPSHOT.jar"
  parallelism: 3
  entryClass: "org.apache.flink.WordCount"