Client.submit *args is typehinted wrongly
Describe the issue:
While the distributed.Client class is missing actual typehints, there are loose mentions of types in the docstrings. Some IDEs (such as PyCharm for example) parse Docstrings to determine typehints it will then display and use for checking types statically. Unfortunately, the *args typehint is wrong here: https://github.com/dask/distributed/blob/5af14d668536340d046949cf854293021f48cd7f/distributed/client.py#L1864
def submit(
self,
func,
*args,
...,
*kwargs,
):
"""Submit a function application to the scheduler
Parameters
----------
func : callable
...
*args : tuple # <-- 'tuple' is wrong! Annotating nothing would be better
Optional positional arguments
...
""""
What this annotation would mean is that submit is a function taking an arbitrary number of *args which itself all are tuples individually.
Minimal Complete Verifiable Example:
This leads to a simple & common example already spitting out type errors.
from distributed import Client
def foo(bar: int, baz: str) -> None:
return None
client = Client()
client.submit(foo, 1, "1")
Will lead to:
Expected type 'tuple', got 'int' instead
Expected type 'tuple', got 'str' instead
Anything else we need to know?:
How to do this correctly: In typeshed, for the concurrent.futures.Executors, this is done as good as this can work with current constrains of the Python typehinting system: https://github.com/python/typeshed/blob/c1137ee36414a042e30712cd491b39481737a4ea/stdlib/concurrent/futures/_base.pyi#L57
Note that since Iterable[P.args] is unfortunately not allowed, they were forced to fall back to Any for the map function: https://github.com/python/typeshed/blob/c1137ee36414a042e30712cd491b39481737a4ea/stdlib/concurrent/futures/_base.pyi#L62
I prepared an implementation that would work for Dask:
from typing import TypeVar, Any, Callable, Iterable, TYPE_CHECKING
if TYPE_CHECKING:
from typing_extensions import ParamSpec
P = ParamSpec("P")
R = TypeVar("R")
class Client(SyncMethodMixin):
def submit(
self,
func: Callable[P, R],
*args: P.args,
key: str | None = None,
workers: str | Iterable[str] | None = None,
resources: dict | None = None,
retries: int | None = None,
priority: int = 0,
fifo_timeout: str = "100ms",
allow_other_workers: bool = False,
actor: bool = False,
actors: bool = False,
pure: bool = True,
**kwargs: P.kwargs,
) -> R: ...
def map(
self,
func: Callable[..., R],
*iterables: Iterable[Any],
key: str | list[str] | None = None,
workers: str | Iterable[str] | None = None,
retries: int | None = None,
resources: dict | None = None,
priority: int = 0,
allow_other_workers: bool = False,
fifo_timeout: str = "100ms",
actor: bool = False,
actors: bool = False,
pure: bool = True,
batch_size: int | None = None,
**kwargs: Any,
) -> Iterable[R]: ...
I'm not sure though if you want that, so I did not prepare a PR. If that's of interest, I could do that of course.
Environment:
- Dask version:
distributed.__version__ =='2023.12.1' - Python version: 3.9.18
- Operating System: Linux Manjaro
- Install method (conda, pip, source):
poetry install
Type hint improvements are always welcome. A fix to the docstring would be my first action, then adding more type hints to the API could come later.
I can take this task up. I use dask on a daily basis so I'd like to contribute to the codebase.