[Bug] Synced data stored as unsequenced

Open pedropereira98 opened this issue 2 years ago • 3 comments

Search before asking

  • [X] I searched in the issues and found nothing similar.

Version

Both machines run Ubuntu 20.04 and the Docker image of IoTDB version 1.2.2.

Describe the bug and provide minimal steps to reproduce it

  1. Start two nodes (an edge node and a cloud node), each in standalone configuration
  2. Set up a pipe from the edge node to the cloud node
  3. Perform insertions at the edge node
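Step 2 can be sketched as follows. This is a minimal illustration, not the reporter's actual setup: it assumes the IoTDB 1.2.x Data Sync syntax with the iotdb-thrift-connector, and the pipe name edge2cloud, the IP 10.0.0.2 and port 6667 (the default DataNode RPC port) are hypothetical placeholders. The function only builds the SQL text to run on the edge node.

```python
from typing import List


def build_pipe_statements(pipe_name: str, cloud_ip: str, cloud_port: int) -> List[str]:
    """Build the SQL that creates and starts an edge-to-cloud pipe.

    Assumes the IoTDB 1.2.x Data Sync syntax with the iotdb-thrift-connector;
    the argument values used below are hypothetical placeholders.
    """
    create = (
        f"CREATE PIPE {pipe_name} WITH CONNECTOR ("
        "'connector'='iotdb-thrift-connector', "
        f"'connector.ip'='{cloud_ip}', "
        f"'connector.port'='{cloud_port}')"
    )
    # Assumption: in 1.2.x a newly created pipe has to be started explicitly.
    start = f"START PIPE {pipe_name}"
    return [create, start]


if __name__ == "__main__":
    # Run these on the edge node (CLI or a client session).
    for stmt in build_pipe_statements("edge2cloud", "10.0.0.2", 6667):
        print(stmt + ";")
```

Each statement could then be pasted into the CLI on the edge node, or sent with the apache-iotdb Python client's Session.execute_non_query_statement.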

What did you expect to see?

All inserted data should be stored in the same way on both nodes.

What did you see instead?

On the edge node the data is stored in sequence TsFiles, but the cloud node treats the synced data as delayed and stores it in unsequence TsFiles.
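To make "delayed" concrete, here is a toy model of how a point might be routed to sequence or unsequence space. It is a simplified sketch, not IoTDB's actual code: it assumes a single cutoff per device (the newest timestamp already accepted into sequence space) and ignores time partitions, memtable flushing and TsFile loading. Under this model, data written in time order, as on the edge node, is all classified as sequence. The device name root.gps.d1 is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def classify(points: Iterable[Tuple[str, int]]) -> List[str]:
    """Label each (device, timestamp) point, in arrival order, as
    'sequence' or 'unsequence'.

    Toy model (assumption): one cutoff per device, no time partitions.
    """
    newest: Dict[str, int] = {}  # newest sequence timestamp per device
    labels: List[str] = []
    for device, ts in points:
        last = newest.get(device)
        if last is None or ts > last:
            labels.append("sequence")
            newest[device] = ts
        else:
            # Not newer than what the device already has: treated as delayed.
            labels.append("unsequence")
    return labels


if __name__ == "__main__":
    in_order = [("root.gps.d1", t) for t in (1, 2, 3)]
    print(classify(in_order))  # ['sequence', 'sequence', 'sequence']
    late = [("root.gps.d1", 5), ("root.gps.d1", 3), ("root.gps.d1", 6)]
    print(classify(late))  # ['sequence', 'unsequence', 'sequence']
```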

On the cloud node, running du -h . in the data folder shows that most of the data is in unsequence files:

4.0K	./datanode/data/sequence
6.9G	./datanode/data/unsequence/root.gps/2/2808
6.9G	./datanode/data/unsequence/root.gps/2
6.9G	./datanode/data/unsequence/root.gps/1/2808
6.9G	./datanode/data/unsequence/root.gps/1
6.7G	./datanode/data/unsequence/root.gps/3/2808
6.7G	./datanode/data/unsequence/root.gps/3
6.4G	./datanode/data/unsequence/root.gps/4/2808
6.4G	./datanode/data/unsequence/root.gps/4
4.7G	./datanode/data/unsequence/root.gps/5/2808
4.7G	./datanode/data/unsequence/root.gps/5
32G	./datanode/data/unsequence/root.gps
32G	./datanode/data/unsequence
32G	./datanode/data

On the edge node, in contrast, running du -h . in the data folder shows that most of the data is in sequence files:

8.6G	./datanode/data/sequence/root.gps/2/2808
8.6G	./datanode/data/sequence/root.gps/2
6.9G	./datanode/data/sequence/root.gps/4/2808
6.9G	./datanode/data/sequence/root.gps/4
7.6G	./datanode/data/sequence/root.gps/1/2808
7.6G	./datanode/data/sequence/root.gps/1
7.1G	./datanode/data/sequence/root.gps/3/2808
7.1G	./datanode/data/sequence/root.gps/3
30G	./datanode/data/sequence/root.gps
30G	./datanode/data/sequence
26M	./datanode/data/unsequence/root.gps/2/2808
26M	./datanode/data/unsequence/root.gps/2
17M	./datanode/data/unsequence/root.gps/4/2808
17M	./datanode/data/unsequence/root.gps/4
26M	./datanode/data/unsequence/root.gps/1/2808
26M	./datanode/data/unsequence/root.gps/1
26M	./datanode/data/unsequence/root.gps/3/2808
26M	./datanode/data/unsequence/root.gps/3
94M	./datanode/data/unsequence/root.gps
94M	./datanode/data/unsequence
34G	./datanode/data
34G	./datanode
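The two listings can be compared with a short script that reads du -h output and reports what share of the data/sequence plus data/unsequence bytes sits in unsequence files. It is a hypothetical helper written for this comparison; it assumes du's binary K/M/G/T suffixes and only looks at the top-level data/sequence and data/unsequence lines. The sizes in the example are copied from the listings above.

```python
import re

# du -h uses powers of 1024 for its K/M/G/T suffixes.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> float:
    """Convert a du -h size such as '6.9G' or '4.0K' to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT]?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def unsequence_share(du_output: str) -> float:
    """Fraction of data/{sequence,unsequence} bytes stored as unsequence."""
    totals = {"sequence": 0.0, "unsequence": 0.0}
    for line in du_output.strip().splitlines():
        if not line.strip():
            continue
        size, path = line.split(None, 1)
        for space in totals:
            # Only the top-level directories, not the per-region subfolders.
            if path.strip().rstrip("/").endswith("/data/" + space):
                totals[space] = parse_size(size)
    total = totals["sequence"] + totals["unsequence"]
    return totals["unsequence"] / total if total else 0.0


if __name__ == "__main__":
    cloud = "4.0K\t./datanode/data/sequence\n32G\t./datanode/data/unsequence\n"
    edge = "30G\t./datanode/data/sequence\n94M\t./datanode/data/unsequence\n"
    print(f"cloud: {unsequence_share(cloud):.1%}")  # cloud: 100.0%
    print(f"edge:  {unsequence_share(edge):.1%}")   # edge:  0.3%
```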

Query tests suggest that this hurts query performance, since unsequence files make queries slower. These tests were run on IoTDB 1.1.1, but because synced data is still stored in unsequence files in 1.2.2, the behaviour should still be present.
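One plausible reason for the slowdown can be illustrated with a toy model of a time-range query. This is a simplified sketch, not IoTDB's reader: each file is reduced to a hypothetical (min_time, max_time) pair, and a query has to merge rows whenever the files it touches overlap in time. Sequence files have disjoint, increasing ranges, so their results can simply be concatenated; overlapping unsequence files force a merge. In this model, a file that sits in the unsequence directory but overlaps nothing would not need merging.

```python
from typing import List, Tuple

# Hypothetical per-file summary: (min_time, max_time), inclusive.
TimeRange = Tuple[int, int]


def files_for_query(files: List[TimeRange], start: int, end: int) -> List[TimeRange]:
    """Files whose time range intersects the query window [start, end]."""
    return [f for f in files if f[0] <= end and f[1] >= start]


def needs_merge(files: List[TimeRange]) -> bool:
    """True if any two files overlap in time, so their rows must be merged
    rather than concatenated. After sorting by start time, checking
    neighbouring pairs is enough to detect any overlap."""
    ordered = sorted(files)
    return any(prev[1] >= nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    sequence_files = [(0, 99), (100, 199), (200, 299)]
    unsequence_files = [(0, 299), (50, 250), (120, 180)]
    for name, files in (("sequence", sequence_files), ("unsequence", unsequence_files)):
        hit = files_for_query(files, 150, 250)
        print(name, len(hit), "files, merge needed:", needs_merge(hit))
```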

Anything else?

The edge node runs in a Docker container limited to 4 CPU cores, 4 GB of RAM, 44 MB/s disk reads, 40 MB/s disk writes, 2700 read IOPS and 1200 write IOPS. The cloud node runs in a Docker container with no limits. The host machine has a 6-core CPU and 16 GB of RAM.

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

pedropereira98 • Oct 31 '23 13:10

Could you give us more details? For example, is the pipe from 1.1.1 to 1.2.2? Do you mean the pipe plugin or TsFile sync? If you use the pipe plugin, TsFiles may be synced into unsequence files. Unsequence files do not imply unsequence data, though unsequence data does slow down some threads.

wanghui42 • Nov 01 '23 10:11

Apologies for the delay. This behaviour has been observed in setups where all nodes run the same IoTDB version, with both 1.1.1 and 1.2.2. Replication uses the Data Sync mechanism with pipes, as described in the documentation (https://iotdb.apache.org/UserGuide/V1.2.x/User-Manual/Data-Sync.html). Tests show a 5% to 15% increase in latency for several types of queries on data replicated from one or more nodes through pipes, compared with the same data inserted directly into the node.

pedropereira98 • Nov 08 '23 18:11

Yes, we have also noticed this. The current release has this problem (see https://github.com/apache/iotdb/pull/11414). We have fixed it, so you can download the latest master package or wait for the next release!

wanghui42 • Nov 29 '23 06:11