Problems with MULTI-GPU scoring

Open c-soler-u opened this issue 3 years ago • 1 comments

🐛 Bug

Hi there! I found some problems with multi-GPU usage in COMET. When using 4 GPUs, the segment scores seem to come back out of order, making it impossible to retrieve the scores in the original order.

To Reproduce

I was running the following command:

comet-score -s source.txt -t target.txt --model wmt20-comet-qe-da --gpus 4

where 'source.txt' is a .txt file containing the source segments separated by the '\n' character, and 'target.txt' is a file containing the target segments to be scored, separated in the same way. Since I did not have a reference, I used the QE model.

I ran the scorer on a sample of source.txt on a CPU, on 1 GPU, and on 4 GPUs. The results for CPU and 1 GPU were exactly the same, but with 4 GPUs the segment scores came back in a different order (see below).

(Experiment on 1 GPU)
tgt.en Segment 0 score: 0.0000
tgt.en Segment 1 score: 0.0000
tgt.en Segment 2 score: 0.3730
tgt.en Segment 3 score: 0.0002
tgt.en Segment 4 score: 0.2696
tgt.en Segment 5 score: 0.2466
tgt.en Segment 6 score: 0.2901
tgt.en Segment 7 score: 0.2161
tgt.en Segment 8 score: 0.2324
tgt.en Segment 9 score: 0.2536
tgt.en score: 0.1882

(Experiment on 1 GPU)
tgt.en Segment 0 score: 0.0000
tgt.en Segment 1 score: 0.0000
tgt.en Segment 2 score: 0.3730
tgt.en Segment 3 score: 0.0002
tgt.en Segment 4 score: 0.2696
tgt.en Segment 5 score: 0.2466
tgt.en Segment 6 score: 0.2901
tgt.en Segment 7 score: 0.2161
tgt.en Segment 8 score: 0.2324
tgt.en Segment 9 score: 0.2536
tgt.en score: 0.1882

(Experiment on 4 GPUs)
tgt.en Segment 0 score: 0.0000
tgt.en Segment 1 score: 0.2695
tgt.en Segment 2 score: 0.2323
tgt.en Segment 3 score: 0.0000
tgt.en Segment 4 score: 0.2469
tgt.en Segment 5 score: 0.2536
tgt.en Segment 6 score: 0.3732
tgt.en Segment 7 score: 0.2899
tgt.en Segment 8 score: 0.0002
tgt.en Segment 9 score: 0.2165
tgt.en score: 0.1882
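The numbers above look consistent with round-robin sharding, where each of the 4 GPUs scores every 4th segment and the per-GPU outputs are concatenated. That is an inference from the posted scores only, not something confirmed in this thread. A minimal sketch under that assumption is below; `round_robin_order` and `restore_order` are hypothetical helper names, not part of COMET, and the sketch does not handle padding or shuffling.

```python
# Scores copied from the 1-GPU and 4-GPU runs reported above.
one_gpu = [0.0000, 0.0000, 0.3730, 0.0002, 0.2696,
           0.2466, 0.2901, 0.2161, 0.2324, 0.2536]
four_gpu = [0.0000, 0.2695, 0.2323, 0.0000, 0.2469,
            0.2536, 0.3732, 0.2899, 0.0002, 0.2165]


def round_robin_order(n_segments, n_ranks):
    """Original indices in output order, ASSUMING rank r scores segments
    r, r + n_ranks, r + 2 * n_ranks, ... and outputs are concatenated by rank."""
    return [i for rank in range(n_ranks)
            for i in range(rank, n_segments, n_ranks)]


def restore_order(scores, n_ranks):
    """Undo the assumed round-robin permutation (hypothetical helper)."""
    order = round_robin_order(len(scores), n_ranks)
    restored = [0.0] * len(scores)
    for position, original_index in enumerate(order):
        restored[original_index] = scores[position]
    return restored


restored = restore_order(four_gpu, n_ranks=4)
# Largest per-segment gap between the restored 4-GPU scores and the 1-GPU scores.
max_diff = max(abs(a - b) for a, b in zip(restored, one_gpu))
print(round_robin_order(10, 4))
print(restored)
print(max_diff)
```

If the assumption holds, the restored list should match the 1-GPU scores up to small numerical differences between the runs. A large gap would suggest a different sharding scheme.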

Environment

Experiments were carried out on an AWS EC2 g4dn.xlarge instance, using COMET version 1.0.1.

Thank you!

c-soler-u, Mar 03 '22 09:03

Just to add to this: system-level scores are correct. The problem is that we are currently losing the order of the segment-level scores.

This should not affect system comparisons, but it will affect any segment-by-segment analysis!
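To illustrate why the system-level score survives while the segment-level order does not: a mean does not depend on order, so any reordering of the same scores gives the same average. One common way to keep the order is to carry each segment's original index through the sharded scoring and sort on it afterwards. The sketch below is a toy under stated assumptions: `fake_score`, `score_shard`, and `score_sharded` are hypothetical stand-ins, the workers are simulated in a single process, and this is not how COMET implements or fixed its multi-GPU inference.

```python
from statistics import mean


def fake_score(segment):
    """Stand-in scorer (hypothetical): NOT a COMET model."""
    return len(segment) / 100.0


def score_shard(indexed_segments):
    # Each simulated worker returns (original_index, score) pairs,
    # not bare scores, so the order can be rebuilt later.
    return [(idx, fake_score(seg)) for idx, seg in indexed_segments]


def score_sharded(segments, n_workers):
    indexed = list(enumerate(segments))
    # Round-robin split: worker r gets items r, r + n_workers, ...
    shards = [indexed[rank::n_workers] for rank in range(n_workers)]
    # Concatenating per-worker results loses the original order.
    gathered = [pair for shard in shards for pair in score_shard(shard)]
    unordered = [score for _, score in gathered]
    # Sorting on the carried index restores it.
    ordered = [score for _, score in sorted(gathered, key=lambda p: p[0])]
    return unordered, ordered


segments = ["a", "bb" * 3, "ccc" * 5, "dddd", "e" * 9, "ff", "g" * 7]
unordered, ordered = score_sharded(segments, n_workers=4)
expected = [fake_score(s) for s in segments]

print(ordered == expected)
print(unordered == expected)
print(abs(mean(unordered) - mean(ordered)) < 1e-12)
```

In a real multi-process run the gathered pairs would come from separate devices, but the same idea applies: the index travels with the score.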

ricardorei, Mar 06 '22 17:03

I refactored the multi-GPU inference.

This issue is the same as #101.

The fix will be merged in the next release.

ricardorei, Dec 19 '22 13:12