
[Optimization][CDCSOURCE] Optimized CDCSOURCE from Mysql to Doris, with support for light_schema_change #3146

Closed
3 tasks done
kindbgen opened this issue Feb 6, 2024 · 1 comment · Fixed by #3151
Labels: Optimization

Comments

@kindbgen
Contributor

kindbgen commented Feb 6, 2024

Search before asking

  • I had searched in the issues and found no similar optimization requirement.

Description

Optimizations to schema evolution when syncing an entire MySQL database to Doris via CDCSOURCE. The main changes are:

1. Fixed 'sink.connector' = 'datastream-doris-schema-evolution', which previously failed with the error: org.dinky.data.exception.MetaDataException: Missing DataSource Type:【datastream-doris-schema-evolution】

2. Doris tables are now created with light_schema_change=true by default, so schema changes are supported when syncing entire MySQL, Oracle, SQLServer, and PostgreSQL databases to Doris.

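3. Improved Doris schema change: column names, column types, default values, and comments are now synchronized, and modifying multiple columns at once and renaming columns are supported. This behavior must be enabled with the use-new-schema-change option.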

4. Fixed DorisWriter failing to write to Doris because the labelPrefix set by Dinky was too long.

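5. Added custom Debezium converters for MySQL, Oracle, SQLServer, and PostgreSQL, which fix the 8-hour offset in datetime values when syncing an entire database to Doris and support datetime formats with millisecond precision.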
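As an illustration of item 2, a Doris table with light schema change enabled could be declared as below. This is a sketch only: the database test, the table orders, and its columns are hypothetical, and the DDL that Dinky actually generates under 'sink.auto.create' = 'true' may differ. With light_schema_change enabled (a Doris 1.2+ table property), adding or dropping value columns is handled as a metadata-only change, and RENAME COLUMN requires the property to be on.

```sql
-- Hypothetical Doris target table; names and types are for illustration only.
CREATE TABLE IF NOT EXISTS test.orders (
    id INT NOT NULL COMMENT 'order id',
    customer_name VARCHAR(255) NULL COMMENT 'customer name',
    create_time DATETIME NULL COMMENT 'order creation time'
)
UNIQUE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
-- light_schema_change enables fast, metadata-only column changes
PROPERTIES (
    "replication_num" = "1",
    "light_schema_change" = "true"
);

-- With light_schema_change enabled, these complete as metadata changes
ALTER TABLE test.orders ADD COLUMN remark VARCHAR(255) NULL;
ALTER TABLE test.orders RENAME COLUMN remark note;
```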

The optimized FlinkSQL is as follows:
EXECUTE CDCSOURCE demo_doris_schema_evolution WITH (
'connector' = 'mysql-cdc',
'hostname' = '127.0.0.1',
'port' = '3306',
'username' = 'root',
'password' = '123456',
'checkpoint' = '10000',
'scan.startup.mode' = 'initial',
'parallelism' = '1',
'debezium.skipped.operations' = 'd',
'jdbc.properties.tinyInt1isBit' = 'false',
'source.server-time-zone' = 'Asia/Shanghai',
'source.schema.changes' = 'true',
'table-name' = 'test..*',
'sink.connector' = 'datastream-doris-schema-evolution',
'sink.url' = 'jdbc:mysql://127.0.0.1:9030',
'sink.fenodes' = '127.0.0.1:8030',
'sink.username' = 'root',
'sink.password' = '123456',
'sink.doris.batch.size' = '1000',
'sink.sink.max-retries' = '1',
'sink.sink.batch.interval' = '60000',
'sink.sink.db' = 'test',
'sink.table.identifier' = '#{schemaName}.#{tableName}',
'sink.auto.create' = 'true',
'sink.timezone' = 'Asia/Shanghai',
-- Sync column names, column types, default values, and comments; supports multi-column changes and column renames
'sink.sink.use-new-schema-change' = 'true',
-- Fix the Flink CDC time zone offset and datetime values that cannot be parsed
'debezium.converters' = 'datetime',
'debezium.datetime.type' = 'org.dinky.cdc.debezium.converter.MysqlDebeziumConverter',
'debezium.datetime.database.type' = 'mysql',
'debezium.datetime.format.date' = 'yyyy-MM-dd',
'debezium.datetime.format.time' = 'HH:mm:ss',
'debezium.datetime.format.datetime' = 'yyyy-MM-dd HH:mm:ss.SSS',
'debezium.datetime.format.timestamp' = 'yyyy-MM-dd HH:mm:ss.SSS',
'debezium.datetime.format.timestamp.zone' = 'Asia/Shanghai'
);
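With 'source.schema.changes' = 'true' and 'sink.sink.use-new-schema-change' = 'true' set as above, item 3 says source-side changes like the following are synchronized to Doris. This is an illustrative sketch in MySQL 8.0 syntax against a hypothetical table test.orders; exactly how each statement maps to a Doris schema change depends on the Dinky and Doris versions in use.

```sql
-- Run on the MySQL source (MySQL 8.0 syntax); test.orders is hypothetical.
-- Add several columns in one statement (multi-column change)
ALTER TABLE test.orders
    ADD COLUMN remark VARCHAR(255) NULL COMMENT 'order remark',
    ADD COLUMN amount DECIMAL(10, 2) NULL DEFAULT 0.00 COMMENT 'order amount';

-- Rename a column (RENAME COLUMN requires MySQL 8.0+)
ALTER TABLE test.orders RENAME COLUMN remark TO note;

-- Widen a column and change its default value and comment
ALTER TABLE test.orders
    MODIFY COLUMN note VARCHAR(512) NULL DEFAULT '' COMMENT 'order note';
```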

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

kindbgen added the Optimization and Waiting for reply labels on Feb 6, 2024

github-actions bot commented Feb 6, 2024

Hello @kindbgen, this issue is about CDC/CDCSOURCE, so I assign it to @aiwenmo. If you have any questions, you can comment and reply.


kindbgen changed the title from "[Optimization][Module Name] Optimization title" to "[Optimization][CDCSOURCE] Optimized CDCSOURCE from Mysql to Doris, with support for light_schema_change" on Feb 6, 2024
aiwenmo removed the Waiting for reply label on Feb 6, 2024
aiwenmo added this to the 1.0.0 milestone on Feb 6, 2024
aiwenmo moved this to Doing in Dinky Roadmap on Feb 6, 2024
Zzm0809 moved this from Doing to Done in Dinky Roadmap on Feb 28, 2024
Projects
Status: Done
2 participants