
toil gets stuck in a recursive loop when an s3 object key is a folder (ends in "/")

Open boyangzhao opened this issue 3 years ago • 2 comments

It seems that toil gets stuck in a recursive loop when the AWS S3 bucket contains a "folder" as one of its keys (which can happen if you create a folder in the AWS console and then upload files into it). The error message is below:

Traceback (most recent call last):
File "/usr/local/bin/toil-cwl-runner", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/toil/cwl/cwltoil.py", line 3583, in main
builder = tool._init_job(initialized_job_order, runtime_context)
File "/usr/local/lib/python3.8/dist-packages/cwltool/process.py", line 808, in _init_job
get_listing(fs_access, job, recursive=(load_listing == "deep_listing"))
File "/usr/local/lib/python3.8/dist-packages/cwltool/utils.py", line 312, in get_listing
get_listing(fs_access, f, recursive=recursive)
File "/usr/local/lib/python3.8/dist-packages/cwltool/utils.py", line 328, in get_listing
get_listing(fs_access, ent, recursive)
File "/usr/local/lib/python3.8/dist-packages/cwltool/utils.py", line 328, in get_listing
get_listing(fs_access, ent, recursive)
File "/usr/local/lib/python3.8/dist-packages/cwltool/utils.py", line 328, in get_listing
get_listing(fs_access, ent, recursive)
[Previous line repeated 953 more times]
File "/usr/local/lib/python3.8/dist-packages/cwltool/utils.py", line 318, in get_listing
for ld in fs_access.listdir(loc):
File "/usr/local/lib/python3.8/dist-packages/toil/cwl/cwltoil.py", line 1287, in listdir
os.path.join(fn, entry.rstrip('/')) for entry in AbstractJobStore.list_url(fn)
File "/usr/local/lib/python3.8/dist-packages/toil/jobStores/abstractJobStore.py", line 544, in list_url
return otherCls._list_url(parseResult)
File "/usr/local/lib/python3.8/dist-packages/toil/jobStores/aws/jobStore.py", line 497, in _list_url
return list_objects_for_url(url)
File "/usr/local/lib/python3.8/dist-packages/toil/lib/retry.py", line 293, in call
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/toil/lib/aws/utils.py", line 379, in list_objects_for_url
for page in result:
File "/usr/local/lib/python3.8/dist-packages/botocore/paginate.py", line 269, in __iter__
response = self._make_request(current_kwargs)
File "/usr/local/lib/python3.8/dist-packages/botocore/paginate.py", line 357, in _make_request
return self._method(**current_kwargs)
File "/usr/local/lib/python3.8/dist-packages/botocore/client.py", line 508, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.8/dist-packages/botocore/client.py", line 898, in _make_api_call
http, parsed_response = self._make_request(
File "/usr/local/lib/python3.8/dist-packages/botocore/client.py", line 921, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py", line 119, in make_request
return self._send_request(request_dict, operation_model)
File "/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py", line 199, in _send_request
success_response, exception = self._get_response(
File "/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py", line 241, in _get_response
success_response, exception = self._do_get_response(
File "/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py", line 308, in _do_get_response
parsed_response = parser.parse(
File "/usr/local/lib/python3.8/dist-packages/botocore/parsers.py", line 251, in parse
parsed = self._do_parse(response, shape)
File "/usr/local/lib/python3.8/dist-packages/botocore/parsers.py", line 871, in _do_parse
self._add_modeled_parse(response, shape, final_parsed)
File "/usr/local/lib/python3.8/dist-packages/botocore/parsers.py", line 881, in _add_modeled_parse
self._parse_payload(response, shape, member_shapes, final_parsed)
File "/usr/local/lib/python3.8/dist-packages/botocore/parsers.py", line 923, in _parse_payload
body_parsed = self._parse_shape(shape, original_parsed)
File "/usr/local/lib/python3.8/dist-packages/botocore/parsers.py", line 331, in _parse_shape
return handler(shape, node)
File "/usr/local/lib/python3.8/dist-packages/botocore/parsers.py", line 436, in _handle_structure
parsed[member_name] = self._parse_shape(
File "/usr/local/lib/python3.8/dist-packages/botocore/parsers.py", line 331, in _parse_shape
return handler(shape, node)
File "/usr/local/lib/python3.8/dist-packages/botocore/parsers.py", line 982, in _handle_list
return super()._handle_list(shape, node)
File "/usr/local/lib/python3.8/dist-packages/botocore/parsers.py", line 413, in _handle_list
return super()._handle_list(shape, node)
File "/usr/local/lib/python3.8/dist-packages/botocore/parsers.py", line 339, in _handle_list
parsed.append(self._parse_shape(member_shape, item))
File "/usr/local/lib/python3.8/dist-packages/botocore/parsers.py", line 331, in _parse_shape
return handler(shape, node)
File "/usr/local/lib/python3.8/dist-packages/botocore/parsers.py", line 436, in _handle_structure
parsed[member_name] = self._parse_shape(
File "/usr/local/lib/python3.8/dist-packages/botocore/parsers.py", line 331, in _parse_shape
return handler(shape, node)
File "/usr/local/lib/python3.8/dist-packages/botocore/parsers.py", line 177, in _get_text_content
return func(self, shape, text)
File "/usr/local/lib/python3.8/dist-packages/botocore/parsers.py", line 534, in _handle_timestamp
return self._timestamp_parser(text)
File "/usr/local/lib/python3.8/dist-packages/botocore/utils.py", line 938, in parse_timestamp
return _parse_timestamp_with_tzinfo(value, tzinfo)
File "/usr/local/lib/python3.8/dist-packages/botocore/utils.py", line 919, in _parse_timestamp_with_tzinfo
return dateutil.parser.parse(value, tzinfos={'GMT': tzutc()})
File "/usr/local/lib/python3.8/dist-packages/dateutil/parser/_parser.py", line 1368, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/dateutil/parser/_parser.py", line 640, in parse
res, skipped_tokens = self._parse(timestr, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/dateutil/parser/_parser.py", line 719, in _parse
l = _timelex.split(timestr) # Splits the timestr into tokens
File "/usr/local/lib/python3.8/dist-packages/dateutil/parser/_parser.py", line 201, in split
return list(cls(s))
File "/usr/local/lib/python3.8/dist-packages/dateutil/parser/_parser.py", line 190, in __next__
token = self.get_token()
File "/usr/local/lib/python3.8/dist-packages/dateutil/parser/_parser.py", line 117, in get_token
if self.isword(nextchar):
File "/usr/local/lib/python3.8/dist-packages/dateutil/parser/_parser.py", line 206, in isword
return nextchar.isalpha()
RecursionError: maximum recursion depth exceeded while calling a Python object

From the logs, in normal cases you'd see toil loop through the folder structure, as in the examples below:

Found in ParseResult(scheme='s3', netloc='foldername', path='/path/to/file', params='', query='', fragment='') items: []
Found in ParseResult(scheme='s3', netloc='foldername', path='/path/to/folder', params='', query='', fragment='') items: ['file1', 'file2']

But when the recursion error occurs, the items array contains an empty-string item '':

Found in ParseResult(scheme='s3', netloc='foldername', path='/path/to/folder/with/issue', params='', query='', fragment='') items: ['']

If I query with boto3 to list all the objects in the bucket, this occurs in cases where an object's key points to a folder object instead of a file object, and I think toil somehow parses it as ''.

s3.ObjectSummary(bucket_name='bucket', key='folder/')
s3.ObjectSummary(bucket_name='bucket', key='folder/filename.ext')


boyangzhao commented Aug 02 '22 16:08

This is when you have a key with a name ending in / that is an actual S3 object, with associated data, and not just a prefix, right? And it happens when the CWL workflow wants to use a Directory that ends up containing that object, or maybe a File that points at it?

We definitely shouldn't hang and crash when we see this, but I think we also can't actually express that situation in a CWL Directory's listing, and such a structure can't be materialized on a local filesystem. If you wrote a CWL workflow to recursively copy one directory tree to another, and you ran it with this S3 structure as the source, it wouldn't be able to reproduce this structure at the destination, even if the destination were also on S3.

So we could either just ignore the data associated with keys that end in /, and hide them from the CWL workflows, or we could throw an error complaining that the input directory isn't something that CWL can understand, and make the user get rid of the offending keys with data that end in / before deigning to take it as an input.

I think secretly hiding them makes the most sense. It's pretty astonishing, but probably less astonishing than stopping to harangue the user, when the user presumably wanted the workflow to work.
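A minimal sketch of the "hide them" option, assuming the listing layer strips the prefix before joining child paths (hypothetical helper names, not Toil's actual fix): drop any entry that is empty after the prefix is stripped, since that entry is the pseudo-folder object whose key equals the prefix itself.

```python
import os

def safe_listdir(directory, raw_entries):
    """Join child paths, skipping the self-referencing '' entry that comes
    from a key equal to the listing prefix (e.g. a console-created 'folder/')."""
    return [
        os.path.join(directory, entry.rstrip("/"))
        for entry in raw_entries
        if entry.rstrip("/")  # '' would re-join into the parent directory
    ]

children = safe_listdir("s3://bucket/folder", ["", "file1.txt", "sub/"])
print(children)  # ['s3://bucket/folder/file1.txt', 's3://bucket/folder/sub']
```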

adamnovak commented Aug 17 '22 15:08

This happens as an artifact on AWS when a folder is created manually in the AWS console. S3 has no real concept of folder structure, but this method creates an S3 object whose key ends in "/". As a workaround for now, I avoid manually creating folders and instead upload folders directly to S3. For example, when uploading [folder/file1.txt, folder/file2.txt]:

  • if I first manually create the folder and then upload file1 and file2, this results in 3 S3 objects: [folder/, folder/file1.txt, folder/file2.txt]
  • if I upload the folder with the awscli, this results in 2 S3 objects: [folder/file1.txt, folder/file2.txt]

I've only seen this as an issue when the CWL input is a Directory and toil is doing its listing; if one of the objects is folder/, it gets stuck. But I wouldn't just ignore keys ending in "/", as there can legitimately be empty folders as part of the folder structure. One of my workflows has a tool that self-checks for a folder structure before running, and if a folder is missing (even an empty one), the tool won't run. So as a workaround for that, I had to create a dummy empty file so that an actual S3 file object exists.
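The distinction drawn above can be sketched as follows (a hypothetical helper, not anyone's proposed fix): a key ending in "/" is redundant when other keys already live under it, but it is the only record of an empty folder when nothing else shares its prefix, so blindly dropping all "/"-suffixed keys would lose empty directories.

```python
def classify_placeholders(keys):
    """Split '/'-suffixed keys into redundant placeholders (children exist)
    and genuine empty-folder markers (no other key shares the prefix)."""
    placeholders = [k for k in keys if k.endswith("/")]
    redundant, empty_dirs = [], []
    for p in placeholders:
        if any(k != p and k.startswith(p) for k in keys):
            redundant.append(p)   # children exist; placeholder adds nothing
        else:
            empty_dirs.append(p)  # placeholder is the empty folder itself
    return redundant, empty_dirs

keys = ["folder/", "folder/file1.txt", "empty/"]
print(classify_placeholders(keys))  # (['folder/'], ['empty/'])
```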

boyangzhao commented Aug 17 '22 23:08