Support IR extraction in decompression script
Description
Validation performed
Launched the package and verified that compression, decompression, IR extraction, and search still work from the command line.
A high-level concern: what would be good terms to distinguish (1) the general job launched by the decompress.py script, (2) a job that decompresses a file, and (3) a job that extracts IR?
I am currently using "x" as the decompression command, following CLP and CLO, but internally the job type is still called decompression, which is inconsistent.
Should we call both jobs decompression jobs and internally distinguish them by the commands "extraction" and "ir_extraction"?
How do we specify target-uncompressed-size?
I tried ./decompress.sh i --target-uncompressed-size 10240 --orig-file-id daf326b3-ab77-42ec-9fcf-056b541f948a 0 and also manually inserted a msgpack record of
{
"orig_file_id": "f6fa2faf-d686-4d54-b086-b68c3da04405",
"msg_ix": 1,
"target_uncompressed_size": 10240
}
which both yielded
[2024-07-11 06:28:05,643: INFO/ForkPoolWorker-7] job_orchestration.executor.query.extract_ir_task.extract_ir[78f14806-52e2-4e87-81d8-3f00fc2ac0d2]: Started IR extraction task for job 3
[2024-07-11 06:28:05,648: ERROR/ForkPoolWorker-7] Task job_orchestration.executor.query.extract_ir_task.extract_ir[78f14806-52e2-4e87-81d8-3f00fc2ac0d2] raised unexpected: TypeError('sequence item 8: expected str instance, int found')
Traceback (most recent call last):
File "/opt/clp/lib/python3/site-packages/celery/app/trace.py", line 453, in trace_task
R = retval = fun(*args, **kwargs)
File "/opt/clp/lib/python3/site-packages/celery/app/trace.py", line 736, in __protected_call__
return self.run(*args, **kwargs)
File "/opt/clp/lib/python3/site-packages/job_orchestration/executor/query/extract_ir_task.py", line 106, in extract_ir
return run_query_task(
File "/opt/clp/lib/python3/site-packages/job_orchestration/executor/query/utils.py", line 62, in run_query_task
logger.info(f'Running: {" ".join(task_command)}')
TypeError: sequence item 8: expected str instance, int found
Sorry, it looks like I forgot to do a type conversion.
I included the fix in https://github.com/y-scope/clp/pull/472/files#diff-c3c708ca5b9cee2be7634fcb5966f6fc622b85ec0ad8575491540e676414cbbaL48. If you need this change urgently, I can also make a separate PR for it.
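For reference, the failure in the traceback can be reproduced in isolation: `" ".join(...)` raises a `TypeError` as soon as the list contains a non-string element. The sketch below is only an illustration of the missing type conversion, not the actual change in the linked PR. The command contents and the helper name `build_task_command` are hypothetical placeholders, since the real command built in extract_ir_task.py is not shown here.

```python
from typing import List, Sequence, Union


def build_task_command(args: Sequence[Union[str, int]]) -> List[str]:
    """Convert every argument to str so the command can be joined and logged.

    Hypothetical helper for illustration; not part of the CLP codebase.
    """
    return [str(arg) for arg in args]


# Placeholder command: the binary name and flag are made up for this sketch.
# The size is an int, as it would be when read from a decoded job config.
raw_command = ["<binary>", "i", "<archive-path>", "--target-size", 10240]

# Joining the raw list fails because one element is an int.
try:
    " ".join(raw_command)
    error_message = ""
except TypeError as e:
    error_message = str(e)

# Converting each element to str first makes the join (and the log line) safe.
safe_command = build_task_command(raw_command)
joined = " ".join(safe_command)
print(f"Running: {joined}")
```

Converting at the point where the command list is built, rather than at the logging call, keeps the list usable both for logging and for launching the subprocess.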