CodeGen
UncompletedJobError: Job 195 (task: 0) with path /home/sushantk/anaconda3/codeGen/data/test_dataset/log/195_0_result.pkl has not produced any output (state: FINISHED) No output/error stream produced ! Check: /home/sushantk/anaconda3/codeGen/data/test_dataset/log/195_0_log.out
I am trying to run preprocess.py and am getting this unknown error. Can you tell me how to resolve it?
`run codegen_sources/preprocessing/preprocess.py data/test_dataset --mode obfuscation --langs python --mode obfuscation --train_splits 70 --job_mem 250 --tokenization_timeout 400 --bpe_timeout 220 --train_bpe_timeout 400 --bpe_mode fast --fastbpe_use_vocab False --fastbpe_vocab_path CodeGen/data/test_dataset --keep_comments False --fastbpe_code_path CodeGen/codegen_sources/model/tools/fastBPE --ncodes 40000 --percent_test_valid 20`
```
---------------------------------------------------------------------------
UncompletedJobError Traceback (most recent call last)
~/anaconda3/codeGen/codegen_sources/preprocessing/preprocess.py in <module>()
212 args.input_path = os.path.abspath(args.input_path)
213 multiprocessing.set_start_method("fork")
--> 214 preprocess(args)

~/anaconda3/codeGen/codegen_sources/preprocessing/preprocess.py in preprocess(args)
92 )
93 dataset.extract_data_and_tokenize(
---> 94 executor=cluster_tokenization, local_parallelism=args.local_parallelism
95 )
96

~/anaconda3/codeGen/codegen_sources/preprocessing/dataset_modes/dataset_mode.py in extract_data_and_tokenize(self, executor, local_parallelism)
178
179 for job in jobs:
--> 180 job.result()
181
182 def extract_from_json_and_tokenize(

~/anaconda3/envs/codeGen_env/lib/python3.6/site-packages/submitit/core/core.py in result(self)
264
265 def result(self) -> R:
--> 266 r = self.results()
267 assert not self._sub_jobs, "You should use `results()` if your job has subtasks."
268 return r[0]

~/anaconda3/envs/codeGen_env/lib/python3.6/site-packages/submitit/core/core.py in results(self)
287 return [tp.cast(R, sub_job.result()) for sub_job in self._sub_jobs]
288
--> 289 outcome, result = self._get_outcome_and_result()
290 if outcome == "error":
291 job_exception = self.exception()

~/anaconda3/envs/codeGen_env/lib/python3.6/site-packages/submitit/core/core.py in _get_outcome_and_result(self)
382 else:
383 message.append(f"No output/error stream produced ! Check: {self.paths.stdout}")
--> 384 raise utils.UncompletedJobError("\n".join(message))
385 try:
386 output: tp.Tuple[str, tp.Any] = utils.pickle_load(self.paths.result_pickle)

UncompletedJobError: Job 195 (task: 0) with path /home/sushantk/anaconda3/codeGen/data/test_dataset/log/195_0_result.pkl
has not produced any output (state: FINISHED)
No output/error stream produced ! Check: /home/sushantk/anaconda3/codeGen/data/test_dataset/log/195_0_log.out
```
Hi,
Are the log files (.out and .err) empty? Are any .tok files generated, or does it fail right away?
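If it helps, here is a minimal sketch of one way to check both at once. It assumes the logs sit in a `log` folder under the dataset directory (as the paths in the error message suggest) and that any tokenized files end in `.tok` somewhere under that same directory; the function name `summarize_preprocessing_outputs` is just a hypothetical helper, not part of CodeGen.

```python
from pathlib import Path


def summarize_preprocessing_outputs(dataset_dir):
    """Return (log file sizes in bytes, relative paths of .tok files)."""
    root = Path(dataset_dir)
    # Assumption: job logs are written to <dataset_dir>/log as *.out / *.err.
    log_dir = root / "log"
    log_sizes = {
        p.name: p.stat().st_size
        for p in sorted(log_dir.glob("*"))
        if p.suffix in {".out", ".err"}
    }
    # Assumption: tokenized files end in .tok and live under the dataset folder.
    tok_files = sorted(str(p.relative_to(root)) for p in root.rglob("*.tok"))
    return log_sizes, tok_files


if __name__ == "__main__":
    sizes, toks = summarize_preprocessing_outputs("data/test_dataset")
    for name, size in sizes.items():
        # A size of 0 means the job wrote nothing to that stream.
        print(f"{name}: {size} bytes" + (" (empty)" if size == 0 else ""))
    print(f"{len(toks)} .tok file(s) found")
```

Run it from the repository root. If every log is empty and no `.tok` file shows up, the job most likely died before producing any output.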
You may want to try local parallelism (`--local_parallelism 4`) in case it is failing due to memory issues.
By the way, this is not what causes the issue, but your path to the BPE codes is wrong. You should pass `--fastbpe_code_path data/bpe/cpp-java-python/codes --fastbpe_vocab_path data/bpe/cpp-java-python/vocab`.