distributed
A distributed task scheduler for Dask
### Use case 1

A task in flight fails to unpickle when it lands; this triggers a `GatherDepFailureEvent`.

### Use case 2

A task in flight unpickles successfully when it...
In https://github.com/dask/distributed/issues/6110#issuecomment-1105837219, we found that workers were running themselves out of memory to the point where the machines became unresponsive. Because the memory limit in the Nanny is implemented [at...
It would be handy to be able to record multiple measures at once in `MemorySampler`. Particularly, recording `process` and `managed_spilled` at the same time gives you a picture of both...
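As context for the request above: the existing `MemorySampler` in `distributed.diagnostics` records a single `measure` per `sample()` call, so capturing `process` and `managed_spilled` together requires two separate sampling contexts. Below is a minimal, self-contained sketch of what a multi-measure sampler could look like; the class name `MultiMeasureSampler` and the `measures=` argument are hypothetical illustrations, not part of distributed's API:

```python
import time
from collections import defaultdict


class MultiMeasureSampler:
    """Toy sketch: record several memory measures in a single pass.

    Stand-in for the proposed behaviour; the real MemorySampler in
    distributed.diagnostics currently accepts one `measure` per call.
    """

    def __init__(self, get_metrics, interval=0.01):
        # get_metrics: callable returning {measure_name: value}, e.g. a
        # function that queries the scheduler for cluster-wide memory.
        self.get_metrics = get_metrics
        self.interval = interval
        # measure name -> [(timestamp, value), ...]
        self.samples = defaultdict(list)

    def sample(self, measures, duration):
        """Poll all requested measures together for `duration` seconds."""
        start = time.monotonic()
        while time.monotonic() - start < duration:
            metrics = self.get_metrics()
            now = time.monotonic()
            for m in measures:
                self.samples[m].append((now, metrics[m]))
            time.sleep(self.interval)


# Demo with a fake metrics source standing in for a live cluster:
fake = {"process": 100.0, "managed_spilled": 20.0}
sampler = MultiMeasureSampler(lambda: dict(fake))
sampler.sample(["process", "managed_spilled"], duration=0.03)
```

Because both measures are read from the same metrics snapshot, the two series share timestamps, which is exactly what makes side-by-side plots of `process` vs. `managed_spilled` meaningful.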
This log message is also issued when the worker closes properly, which is a bit confusing. If it happens in a non-closing state, I consider it an error and promoted...
Note: If I do the below with adapt(minimum=32, maximum=32), it works repeatably with no failures. If I throw ~100 tasks at an AWS EC2Cluster with adapt(minimum=1, maximum=32) enabled, all tasks...
```
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: Exception in callback Worker._handle_stimulus_from_task(
```
I don't fully understand what's going on, but we've seen some fatal Windows errors on CI with this. This _should_ never happen, since the merge actually catches a `RecursionError`...
- Related to #5371
- Blocked by #6577

Currently this also crashes in the case without AMM. I believe this is due to running 10 nannies on 2-4 CPUs.
```
Aug 11 09:49:57 ip-10-0-12-62 cloud-init[1268]: 2022-08-11 09:49:57,015 - bokeh.core.property.validation - ERROR - 'start'
Aug 11 09:49:57 ip-10-0-12-62 cloud-init[1268]: Traceback (most recent call last):
Aug 11 09:49:57 ip-10-0-12-62 cloud-init[1268]:   File...
```
Backport from #6271