DataX
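DataX is the open-source version of Alibaba Cloud DataWorks Data Integration. Building on DataX, TAOS Data developed Writer and Reader plugins for TDengine, giving users a tool for ETL and data migration.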
PostgreSQL test data, configuration file, and error message:
2024-01-23 13:40:27.865 [0-0-0-writer] ERROR StdoutPluginCollector - Dirty data:
{"exception":"TDengine ERROR (0x80003002): Invalid data format","record":[{"byteSize":8,"index":0,"rawData":1705770149000,"type":"DATE"},{"byteSize":1,"index":1,"rawData":2,"type":"LONG"},{"byteSize":10,"index":2,"rawData":1705770318,"type":"LONG"},{"byteSize":3,"index":3,"rawData":169,"type":"LONG"},{"byteSize":9,"index":4,"rawData":"4.7264E-4","type":"DOUBLE"},{"byteSize":10,"index":5,"rawData":"0.00571764","type":"DOUBLE"},{"byteSize":23,"index":6,"rawData":"id_15","type":"STRING"},{"byteSize":5,"index":7,"rawData":"15","type":"STRING"}],"type":"writer"}
Row data: [balanced_state,tname=id_15,device_id=15 battery_state=3,end_time=1705770742,duration=424,energy=6.4E-7f64,capacity=6.4E-7f64 1705770318000]
2024-01-23 13:40:27.865 [0-0-0-writer] ERROR DefaultDataHandler - TDengine ERROR (0x80003002): Invalid data format

Reading works without problems, but writing fails with a data format error. The supertable had already been created in the 3.2.1.0 database, and the write goes through schemaless insertion. In that mode a field such as battery_state=3 is interpreted as a double, whereas it should be written as battery_state=3u8, so the insert fails with a format error.

Is there a solution? It looks as though DataX converts the data to its Long type but cannot convert it back to the column type used by the older version of the taos database.
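The mismatch described above comes down to the numeric type suffix of the schemaless line protocol. As far as the TDengine schemaless documentation describes it, a numeric field without a suffix is treated as a double, and suffixes such as `u8`, `i32` or `f64` select other column types. The sketch below shows that mapping only; `LineProtocolSuffix` and `format` are hypothetical names and are not part of DataX or the TDengine connector.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of DataX or taos-jdbcdriver.
class LineProtocolSuffix {
    // TDengine column type -> line-protocol numeric suffix.
    private static final Map<String, String> SUFFIX = new HashMap<>();
    static {
        SUFFIX.put("TINYINT", "i8");
        SUFFIX.put("TINYINT UNSIGNED", "u8");
        SUFFIX.put("SMALLINT", "i16");
        SUFFIX.put("SMALLINT UNSIGNED", "u16");
        SUFFIX.put("INT", "i32");
        SUFFIX.put("INT UNSIGNED", "u32");
        SUFFIX.put("BIGINT", "i64");
        SUFFIX.put("BIGINT UNSIGNED", "u64");
        SUFFIX.put("FLOAT", "f32");
        SUFFIX.put("DOUBLE", "f64");
    }

    // Appends the suffix that matches the existing column type to a numeric literal.
    static String format(String value, String columnType) {
        String key = columnType.trim().toUpperCase(Locale.ROOT);
        String suffix = SUFFIX.get(key);
        if (suffix == null) {
            throw new IllegalArgumentException("unsupported numeric type: " + columnType);
        }
        return value + suffix;
    }

    public static void main(String[] args) {
        // Assumes battery_state was created as TINYINT UNSIGNED in the supertable.
        System.out.println("battery_state=" + format("3", "TINYINT UNSIGNED")); // battery_state=3u8
    }
}
```

Whether the writer plugin can be made to emit these suffixes depends on the plugin version; the sketch only illustrates what the target row would need to look like.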
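Describe the bug
Importing data from multiple MongoDB collections into the same TDengine supertable through DataX fails with an OOM. Reproducibly, the first MongoDB collection imports successfully, and every import from the second collection onward fails with an OOM.

Database and DataX versions
MongoDB: 4.0.3
TDengine: 3.0.5.0
DataX: mongodbreader, tdengine30writer

To Reproduce
1. Create a new database in TDengine.
2. MongoDB holds N collections to migrate, each with hundreds of millions of records.
3. While the target supertable in TDengine is empty, migrating any one MongoDB collection into TDengine succeeds.
4. Once the target supertable already holds hundreds of millions of records, migrating any MongoDB collection through DataX (including a collection that was migrated successfully before) causes an OOM in DataX.

Troubleshooting so far
1. Raising the DataX memory to 6 GB still ends in an OOM.
2. A DataX configuration problem was ruled out.
3. While DataX was running into the OOM, CPU usage on the TDengine database spiked and the source MongoDB showed no export traffic, so the OOM appears to occur before any data is exported from MongoDB.

System monitoring screenshot

Screenshot of the DataX source code identified as the cause after tracing the hprof file:

Running the SQL directly in TDengine reproduced the same problem. The conclusion is that DataX loads all tagid values into its own memory, which causes the OOM.

Expected behavior
It is not clear why DataX needs to execute this code; it seems to serve little purpose. Could it be disabled, or could it fetch only the tagid of each subtable rather than loading the detailed tagid data?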
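One way to picture the requested behavior is to look up tag metadata per subtable on demand and keep only a bounded number of entries in memory, instead of loading every tag value up front. The sketch below is an illustration of that idea with a small LRU cache; it is not the tdengine30writer code, and `BoundedTagCache`, `tagsOf` and the loader function are hypothetical names. In a real plugin the loader would run a per-table query against TDengine, which is assumed here and not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: lazily loaded, size-bounded tag metadata cache.
class BoundedTagCache {
    private final Function<String, String> loader; // assumed: fetches tags for one subtable
    private final LinkedHashMap<String, String> cache;
    private int loads = 0;

    BoundedTagCache(final int maxEntries, Function<String, String> loader) {
        this.loader = loader;
        // accessOrder=true turns the map into an LRU structure.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries; // evict the least recently used entry
            }
        };
    }

    // Returns the tags of one subtable, loading them only on first use.
    String tagsOf(String tableName) {
        String tags = cache.get(tableName);
        if (tags == null) {
            tags = loader.apply(tableName);
            loads++;
            cache.put(tableName, tags);
        }
        return tags;
    }

    int size() { return cache.size(); }

    int loads() { return loads; }
}
```

With this shape, memory use is capped by `maxEntries` regardless of how many subtables the supertable already contains, at the cost of reloading entries that were evicted.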
[INFO] --------------------------------[ jar ]---------------------------------
Downloading from central: https://maven.aliyun.com/repository/central/com/alibaba/datax/tdenginewriter/tdenginewriter/0.0.1-SNAPSHOT/maven-metadata.xml
Downloading from central: https://maven.aliyun.com/repository/central/com/alibaba/datax/tdenginewriter/tdenginewriter/0.0.1-SNAPSHOT/tdenginewriter-0.0.1-SNAPSHOT.pom
[WARNING] The POM for com.alibaba.datax.tdenginewriter:tdenginewriter:jar:0.0.1-SNAPSHOT is missing, no dependency information available
Downloading from central: https://maven.aliyun.com/repository/central/com/alibaba/datax/tdenginewriter/tdenginewriter/0.0.1-SNAPSHOT/tdenginewriter-0.0.1-SNAPSHOT.jar
[INFO]
[INFO] ------------------------------------------------------------------------...
https://github.com/taosdata/DataX/blob/4c498354a166ff55e60e5da65f51e3a5cb6b449c/tdengine30writer/src/main/java/com/alibaba/datax/plugin/writer/tdengine30writer/SchemaManager.java#L115
https://github.com/taosdata/DataX/blob/4c498354a166ff55e60e5da65f51e3a5cb6b449c/tdengine30writer/src/main/java/com/alibaba/datax/plugin/writer/tdengine30writer/Schema3_0Manager.java#L146
Column names should be enclosed in backquotes at these two places so that uppercase column names are handled correctly.
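The suggested fix can be sketched as a small quoting helper. This is not the code at the linked lines; `BackquoteColumns`, `quote` and `columnList` are hypothetical names, and the table and column names in `main` are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper showing backquoted identifiers in generated SQL.
class BackquoteColumns {
    // Wraps one identifier in backquotes so its letter case is preserved.
    static String quote(String name) {
        if (name.contains("`")) {
            throw new IllegalArgumentException("identifier must not contain a backquote: " + name);
        }
        return "`" + name + "`";
    }

    // Builds a comma-separated, backquoted column list.
    static String columnList(List<String> names) {
        return names.stream().map(BackquoteColumns::quote).collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        // Illustrative names only.
        String sql = "SELECT " + columnList(Arrays.asList("ts", "DeviceId")) + " FROM " + quote("Meters");
        System.out.println(sql); // SELECT `ts`, `DeviceId` FROM `Meters`
    }
}
```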
java.lang.UnsatisfiedLinkError: Native Library C:\Windows\System32\taos.dll already loaded in another classloader
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1900) ~[na:1.8.0_261]
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1850) ~[na:1.8.0_261]
at java.lang.Runtime.loadLibrary0(Runtime.java:871) ~[na:1.8.0_261]
at java.lang.System.loadLibrary(System.java:1122) ~[na:1.8.0_261]
at com.taosdata.jdbc.TSDBJNIConnector.<clinit>(TSDBJNIConnector.java:28) ~[taos-jdbcdriver-2.0.42.jar:na]
at com.taosdata.jdbc.TSDBDriver.connect(TSDBDriver.java:162) ~[taos-jdbcdriver-2.0.42.jar:na]
at java.sql.DriverManager.getConnection(DriverManager.java:664) ~[na:1.8.0_261]...
### Corresponding error
2023-06-08 17:45:35.245 [job-0] INFO StandAloneJobContainerCommunicator - Total 0 records, 0 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s |...
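With DataX and TDengine, can several supertables be processed in one job? Can the column setting be omitted (listing columns leads to a duplicate-field problem)? Or can a whole database be synchronized directly?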
com.alibaba.datax.common.exception.DataXException: Code:[TDengineWriter-02], Description:[runtime exception]. - cannot find col: ts in columns: [ts, i_a, i_b, i_c, i_sum, elc, u_a, u_b, u_c, power, corp_id, equipid, line_id]
at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:30) ~[datax-common-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.writer.tdenginewriter.DefaultDataHandler.indexOf(DefaultDataHandler.java:552) [tdenginewriter-0.0.1-SNAPSHOT.jar:na]...
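The message above reports that `ts` is missing from a list that visibly contains `ts`. The trace does not say why. One thing worth ruling out, purely as an assumption, is that the two names differ in a way that is invisible in the log, such as leading whitespace, a byte order mark, or letter case. The diagnostic sketch below prints the code points of a name and does a tolerant lookup; `ColumnNameDebug` and its methods are hypothetical and are not the plugin's `indexOf`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical diagnostic helper; not the DefaultDataHandler implementation.
class ColumnNameDebug {
    // Renders each code point, which exposes invisible characters.
    static String describe(String name) {
        StringBuilder sb = new StringBuilder();
        name.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    // Strips a byte order mark and surrounding whitespace, then lowercases.
    static String normalize(String s) {
        return s.replace("\uFEFF", "").trim().toLowerCase(Locale.ROOT);
    }

    // Tolerant lookup: returns the index of the first normalized match, or -1.
    static int indexOfNormalized(List<String> columns, String wanted) {
        String w = normalize(wanted);
        for (int i = 0; i < columns.size(); i++) {
            if (normalize(columns.get(i)).equals(w)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList(" TS", "i_a", "i_b");
        System.out.println(describe(columns.get(0)));
        System.out.println(indexOfNormalized(columns, "ts"));
    }
}
```

If the code points of the configured name and the schema name turn out identical, this assumption is ruled out and the cause lies elsewhere.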
Exception in thread "main" java.lang.NoSuchMethodError: com.alibaba.fastjson.JSONArray.getTimestamp(I)Ljava/lang/Object;
at com.taosdata.jdbc.rs.RestfulResultSet.parseTimestampColumnData(RestfulResultSet.java:255)
at com.taosdata.jdbc.rs.RestfulResultSet.parseColumnData(RestfulResultSet.java:183)
at com.taosdata.jdbc.rs.RestfulResultSet.<init>(RestfulResultSet.java:98)
at com.taosdata.jdbc.rs.RestfulStatement.execute(RestfulStatement.java:88)
at com.taosdata.jdbc.rs.RestfulStatement.executeQuery(RestfulStatement.java:37)
at Test.main(Test.java:16)
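A `NoSuchMethodError` like this one usually means the class found at run time differs from the one the caller was compiled against. The descriptor in the trace, `getTimestamp(I)Ljava/lang/Object;`, asks for a method that takes an `int` and returns `Object`; the JVM matches the return type as well, so a `getTimestamp(int)` with a different return type would still fail. The sketch below uses reflection to report which return type the loaded class actually offers and where the class was loaded from. `ClasspathCheck` and its methods are hypothetical names, and the fastjson class name is taken from the trace.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic helper for classpath version conflicts.
class ClasspathCheck {
    // Returns the return type name of a public method, or null if the class or method is absent.
    static String returnTypeOf(String className, String methodName, Class<?>... paramTypes) {
        try {
            Method m = Class.forName(className).getMethod(methodName, paramTypes);
            return m.getReturnType().getName();
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return null;
        }
    }

    // Returns the jar or directory a class was loaded from.
    static String loadedFrom(String className) throws ClassNotFoundException {
        CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
        return src == null ? "(bootstrap or unknown)" : src.getLocation().toString();
    }

    public static void main(String[] args) {
        String cls = "com.alibaba.fastjson.JSONArray"; // class named in the stack trace
        System.out.println("getTimestamp(int) returns: " + returnTypeOf(cls, "getTimestamp", int.class));
        try {
            System.out.println("loaded from: " + loadedFrom(cls));
        } catch (ClassNotFoundException e) {
            System.out.println("fastjson is not on the classpath");
        }
    }
}
```

Running this with the same classpath as the failing program shows whether more than one fastjson version is in play and which one wins.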