Segmentation fault while running scan_for_anomalies.sh

Open qoega opened this issue 4 years ago • 7 comments

I tried to scan the ClickHouse codebase with ControlFlag, but the scanner crashed. The ClickHouse codebase is available on GitHub:

git clone git@github.com:ClickHouse/ClickHouse.git clickhouse
scripts/scan_for_anomalies.sh -d /home/qoega/clickhouse/src -t ./c_lang_if_stmts_6000_gitrepos.ts -o /home/qoega/control-flag/out/
Training: start.
Trie L1 build took: 1010.554s
Trie L2 build took: 487.217s
Training: complete.
Storing logs in /home/qoega/control-flag/out/
scripts/scan_for_anomalies.sh: line 84: 72697 Segmentation fault      ${SCRIPTS_DIR}/../bin/cf_file_scanner -t ${TRAIN_FILE} -s ${SCAN_FILE_LIST} -c ${MAX_AUTOCORRECT_COST} -n ${MAX_AUTOCORRECT_RESULTS} -j ${NUM_SCAN_THREADS} -o ${OUTPUT_DIR} -a ${ANOMALY_THRESHOLD} -l ${LANGUAGE}

PS: Was c_lang_if_stmts_6000_gitrepos.ts trained on C projects only, or on C++ projects as well? I did not find https://github.com/ClickHouse/ClickHouse in the C++ projects list. It is written in C++ and has 20K stars and 800 contributors.

qoega avatar Oct 25 '21 13:10 qoega

Hi @qoega,

Thanks for trying out ControlFlag. c_lang_if_stmts_6000_gitrepos.ts is the dataset generated from repositories that use C as their primary language. It should also work for scanning C++ projects, although it is more effective on projects whose primary language is C.

I will try to reproduce the crash on my end. Just wanted to let you know that we have also released smaller training datasets for limited-memory devices (although memory capacity does not appear to be the issue behind this crash).

nhasabni avatar Oct 25 '21 18:10 nhasabni

I also encountered this bug. What is the current status of this issue?

Thank you

xback avatar Nov 23 '21 21:11 xback

Hi @xback,

Thanks for trying out ControlFlag. Did you try using a smaller version of the dataset? We have seen that most of these crashes happen when the dataset is larger than the memory available on the system. Thanks.

nhasabni avatar Nov 24 '21 17:11 nhasabni

Hi, the test ran on a system with 1 TB of RAM (really), of which more than 900 GB was free.

xback avatar Nov 24 '21 17:11 xback

Thanks for the info, @xback. Let us look into reproducing the issue. Would you mind pointing us to the repository that you have been scanning with ControlFlag (if it is a public repository)? That would help us expedite the process. Thanks.

nhasabni avatar Nov 24 '21 17:11 nhasabni

Would you mind pointing us to the repository that you have been scanning with ControlFlag (if it is a public repository)?

Unfortunately, the repo is not public, but I'll try to provide more details or a reproducer.

xback avatar Nov 24 '21 18:11 xback

Hi @xback, we scanned the ClickHouse code using the large version of the dataset, and the scan finished without any issues. In short, we do not see the crash on our end. Please provide us with a reproducer at your convenience. Thanks.

nhasabni avatar Nov 24 '21 20:11 nhasabni
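
For anyone trying to produce the reproducer requested above from a private codebase, one possible approach is to bisect the scan file list until a single crashing input remains. The following is only a sketch and not part of ControlFlag: it assumes the crash is deterministic and triggered by one input file, that the list has one newline-terminated path per line, and that you supply a function named run_scanner (a hypothetical name) which runs the scanner on the file list passed as its first argument.

```shell
#!/usr/bin/env bash
# Sketch: bisect a scan file list to isolate one file that crashes a scanner.
# Assumptions (not from ControlFlag itself): the crash is deterministic, it is
# caused by a single input file, and the caller defines run_scanner, a
# hypothetical function that runs the scanner on the list file given as $1.

# crashed LIST: succeeds if run_scanner was killed by a signal on LIST.
# Bash reports death by signal N as exit status 128+N (139 for SIGSEGV).
crashed() {
  run_scanner "$1" >/dev/null 2>&1
  [ "$?" -ge 128 ]
}

# bisect_crash LIST: prints the single path that still reproduces the crash.
bisect_crash() {
  local list tmp n half
  list=$(mktemp)
  cp "$1" "$list"
  if ! crashed "$list"; then
    echo "no crash with the full list" >&2
    rm -f "$list"
    return 1
  fi
  while n=$(wc -l < "$list") && [ "$n" -gt 1 ]; do
    half=$(( n / 2 ))
    tmp=$(mktemp)
    head -n "$half" "$list" > "$tmp"
    if ! crashed "$tmp"; then
      # The first half scans cleanly, so the culprit is in the second half.
      tail -n "+$(( half + 1 ))" "$list" > "$tmp"
    fi
    mv "$tmp" "$list"
  done
  cat "$list"
  rm -f "$list"
}
```

Here run_scanner could wrap the cf_file_scanner command shown in the log at the top of this issue, passing "$1" as the -s argument and keeping the other options unchanged. If the crash needs several files together or is not deterministic, this search will not converge on a meaningful result.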