ChunJun

Introduction

ChunJun (formerly known as FlinkX) is a data integration tool based on Flink. It is stable, easy to use, and efficient, and it integrates with the DataStream/DataSet API. It synchronizes and computes data across heterogeneous data sources. ChunJun has been deployed and is running stably in thousands of companies.

Official website of ChunJun: https://dtstack.github.io/chunjun/

Features of ChunJun

ChunJun abstracts different databases into reader/source plugins, writer/sink plugins and lookup plugins, and it has the following features:

Built on the real-time computing engine Flink; tasks can be configured with JSON templates or SQL scripts, and the SQL scripts are compatible with Flink SQL syntax;
Supports distributed operation, with flink-standalone, yarn-session, yarn-per-job, and other submission methods;
Supports one-click Docker deployment, as well as deploying and running on k8s;
Supports synchronization and calculation across more than 20 heterogeneous data sources, such as MySQL, Oracle, SQLServer, Hive, and Kudu;
Easy to extend and highly flexible: a newly added data source plugin can interoperate with existing data source plugins immediately, and plugin developers do not need to care about the code logic of other plugins;
Supports not only full synchronization but also incremental synchronization and interval polling;
Supports not only offline synchronization and calculation but also real-time scenarios;
Supports dirty data storage and provides metrics monitoring;
Works with the Flink checkpoint mechanism to support resuming from breakpoints and task disaster recovery;
Supports synchronizing not only DML data but also DDL statements, such as 'CREATE TABLE' and 'ALTER COLUMN'.

Build And Compilation

Get the code

Use git to clone the ChunJun code:

git clone https://github.com/DTStack/chunjun.git

Build

Execute the command in the project directory.

./mvnw clean package -DskipTests

Or execute

sh build/build.sh

Multi-platform compatibility

ChunJun currently supports the tdh and open-source Hadoop platforms. Each platform must be packaged with a different Maven command.

Hadoop Platform	Maven command	Comment
tdh	mvn clean package -DskipTests -P default,tdh	Package the inceptor plugin and the plugins supported by default.
default	mvn clean package -DskipTests -P default	Package all plugins except the inceptor plugin.
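The platform table above can be wrapped in a small helper so that the right profile list is chosen by name. This is only a sketch: the function names 'maven_profiles' and 'package_command' are hypothetical and are not part of ChunJun. The helper prints the command instead of running it.

```shell
# Hypothetical helper (not part of ChunJun): map a platform name from the
# table above to the Maven profile list used for packaging.
maven_profiles() {
    case "$1" in
        tdh)     echo "default,tdh" ;;
        default) echo "default" ;;
        *)       echo "unknown platform: $1" >&2; return 1 ;;
    esac
}

# Print the full packaging command for a platform without executing it.
package_command() {
    profiles=$(maven_profiles "$1") || return 1
    echo "mvn clean package -DskipTests -P $profiles"
}
```

For example, 'package_command tdh' prints the tdh packaging command from the table, which can then be reviewed and run from the project directory.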

Common problems

1. Cannot find dependencies

Solution: Some driver packages are located in the directory '$CHUNJUN_HOME/jars'. You can install these dependencies manually or execute the command below:

## windows
%CHUNJUN_HOME%\bin\install_jars.bat

## unix
$CHUNJUN_HOME/bin/install_jars.sh

2. Compiling module 'chunjun-core' throws 'Failed to read artifact descriptor for com.google.errorprone:javac-shaded'

Error message:

[ERROR]Failed to execute goal com.diffplug.spotless:spotless-maven-plugin:2.4.2:check(spotless-check)on project flinkx-core:
        Execution spotless-check of goal com.diffplug.spotless:spotless-maven-plugin:2.4.2:check failed:Unable to resolve dependencies:
        Failed to collect dependencies at com.google.googlejavaformat:google-java-format:jar:1.7->com.google.errorprone:javac-shaded:jar:9+181-r4173-1:
        Failed to read artifact descriptor for com.google.errorprone:javac-shaded:jar:9+181-r4173-1:Could not transfer artifact
        com.google.errorprone:javac-shaded:pom:9+181-r4173-1 from/to aliyunmaven(https://maven.aliyun.com/repository/public): 
        Access denied to:https://maven.aliyun.com/repository/public/com/google/errorprone/javac-shaded/9+181-r4173-1/javac-shaded-9+181-r4173-1.pom -> [Help 1]

Solution: Download 'javac-shaded-9+181-r4173-1.jar' from 'https://repo1.maven.org/maven2/com/google/errorprone/javac-shaded/9+181-r4173-1/javac-shaded-9+181-r4173-1.jar', then install it locally with the command below (the command expects the jar to be saved in './jars'):

mvn install:install-file -DgroupId=com.google.errorprone -DartifactId=javac-shaded -Dversion=9+181-r4173-1 -Dpackaging=jar -Dfile=./jars/javac-shaded-9+181-r4173-1.jar

Quick Start

The following table shows the correspondence between ChunJun branches and Flink versions. If the versions are not aligned, problems such as 'Serialization Exceptions' and 'NoSuchMethod Exception' may occur in tasks.

Branches	Flink version
master	1.12.7
1.12_release	1.12.7
1.10_release	1.10.1
1.8_release	1.8.3
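The branch table above can be turned into a quick pre-flight check before submitting a task. This is a minimal sketch: the function names 'expected_flink_version' and 'check_flink_alignment' are hypothetical, and how you obtain the Flink version actually installed is left to your environment.

```shell
# Hypothetical helper (not part of ChunJun): look up the Flink version that a
# ChunJun branch is built against, according to the table above.
expected_flink_version() {
    case "$1" in
        master|1.12_release) echo "1.12.7" ;;
        1.10_release)        echo "1.10.1" ;;
        1.8_release)         echo "1.8.3" ;;
        *)                   return 1 ;;
    esac
}

# Succeed only if the given Flink version matches the one the branch expects.
# Arguments: <chunjun-branch> <installed-flink-version>
check_flink_alignment() {
    expected=$(expected_flink_version "$1") || {
        echo "unknown branch: $1" >&2
        return 2
    }
    [ "$expected" = "$2" ]
}
```

For example, 'check_flink_alignment 1.10_release 1.12.7' returns a non-zero status, which signals a mismatch before any task is submitted.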

ChunJun supports running tasks in multiple modes. Different modes depend on different environments and require different steps. Each mode is described below.

Local

Local mode does not depend on a Flink or Hadoop environment. It starts a JVM process in the local environment to run the task.

Steps

Go to the directory of 'chunjun-dist' and execute the command below:

sh bin/chunjun-local.sh -job $SCRIPT_PATH

'$SCRIPT_PATH' is the path to the task script. After the command is executed, the task runs locally.

Standalone

Standalone mode depends on a Flink Standalone environment and does not depend on a Hadoop environment.

Steps

1. Start Flink Standalone Cluster

sh $FLINK_HOME/bin/start-cluster.sh

After the cluster starts successfully, the Flink web UI listens on port 8081 by default; the port can be changed in 'flink-conf.yaml'. Open port 8081 of the current machine in a browser to reach the Flink web UI of the standalone cluster.

2. Submit task

Go to the directory of 'chunjun-dist' and execute the command below:

sh bin/chunjun-standalone.sh -job chunjun-examples/json/stream/stream.json

After the command executes successfully, you can observe the task status on the Flink web UI.

Yarn Session

Yarn Session mode depends on the Flink jars and a Hadoop environment, and the yarn-session must be started before the task is submitted.

Steps

1. Start yarn-session environment

Yarn-session mode depends on Flink and a Hadoop environment. Set $HADOOP_HOME and $FLINK_HOME in advance, and upload 'chunjun-dist' with the '-t' parameter of yarn-session.

cd $FLINK_HOME/bin
./yarn-session.sh -t $CHUNJUN_HOME -d

2. Submit task

Get the application id of the yarn-session from the yarn web UI, then enter the 'chunjun-dist' directory and execute the command below, replacing SESSION_APPLICATION_ID with that id:

sh ./bin/chunjun-yarn-session.sh -job chunjun-examples/json/stream/stream.json -confProp {\"yarn.application.id\":\"SESSION_APPLICATION_ID\"}

'yarn.application.id' can also be set in 'flink-conf.yaml'. After the task is submitted successfully, its status can be observed on the yarn web UI.
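Escaping the JSON for '-confProp' by hand is easy to get wrong. The sketch below builds the same string that the shell passes to the script once it has removed the backslashes in the command above. The function name 'build_conf_prop' is hypothetical and is not part of ChunJun.

```shell
# Hypothetical helper (not part of ChunJun): build the JSON value for
# -confProp from a yarn application id.
build_conf_prop() {
    if [ -z "$1" ]; then
        echo "application id is required" >&2
        return 1
    fi
    printf '{"yarn.application.id":"%s"}\n' "$1"
}

# Possible usage from the 'chunjun-dist' directory, assuming the id is stored
# in the variable SESSION_APPLICATION_ID:
# sh ./bin/chunjun-yarn-session.sh -job chunjun-examples/json/stream/stream.json \
#     -confProp "$(build_conf_prop "$SESSION_APPLICATION_ID")"
```

Wrapping the command substitution in double quotes keeps the JSON as a single argument, so no manual backslash escaping is needed.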

Yarn Per-Job

Yarn Per-Job mode depends on Flink and a Hadoop environment. Set $HADOOP_HOME and $FLINK_HOME in advance.

Steps

Once the configuration is correct, a yarn per-job task can be submitted. Enter the 'chunjun-dist' directory and execute the command below:

sh ./bin/chunjun-yarn-perjob.sh -job chunjun-examples/json/stream/stream.json

After the task is submitted successfully, its status can be observed on the yarn web UI.

Docs of Connectors

For details, please visit: https://dtstack.github.io/chunjun/documents/

Contributors

Thanks to all contributors! We are very happy that you can contribute to ChunJun.

License

ChunJun is under the Apache 2.0 license. Please visit LICENSE for details.
