Simplify device count external API calls
Currently there are many external APIs related to getting the number of devices associated with PyTorch XLA. Those that I could find were:
- "global_runtime_device_count": returns the total number of devices across all processes/hosts, but it has "@functools.lru_cache()"
- "global_device_count": returns the total number of devices across all processes/hosts, but it has "@functools.lru_cache()"
- "addressable_runtime_device_count": returns the number of addressable devices visible to a process.
- "addressable_device_count": returns the number of addressable devices visible to a process. It specifically returns 1 in the case of SPMD.
- "local_device_count": multiplies the number of addressable devices by the number of local processes. This is equivalent to the number of devices running on a host.
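To make the relationship between these counts concrete, here is a minimal sketch of the arithmetic described in the list above. It does not call torch_xla: the "Topology" class, its fields, and the example numbers are hypothetical, and it assumes every host runs the same number of processes with the same number of devices each. The SPMD special case is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Hypothetical description of a multi-host setup (not a torch_xla type)."""

    num_hosts: int
    processes_per_host: int
    addressable_devices_per_process: int

    def addressable_device_count(self) -> int:
        # Devices visible to a single process.
        return self.addressable_devices_per_process

    def local_device_count(self) -> int:
        # Addressable devices times local processes: devices on one host.
        return self.addressable_devices_per_process * self.processes_per_host

    def global_device_count(self) -> int:
        # Devices across all hosts, assuming identical hosts.
        return self.local_device_count() * self.num_hosts


# Example values chosen for illustration only.
topo = Topology(num_hosts=2, processes_per_host=4, addressable_devices_per_process=1)
print(topo.addressable_device_count())  # per process
print(topo.local_device_count())        # per host
print(topo.global_device_count())       # across all hosts
```

With these example values the per-process, per-host, and global counts all differ, which is the situation where the naming matters most.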
From these, some observations are:
- "addressable_runtime_device_count" and "addressable_device_count" are extremely similar in implementation and name. Perhaps we should make the distinction clearer. There may be some context around "addressable_device_count" in particular that I don't fully grasp.
- The "local_device_count" terminology can be confusing when compared with JAX's concept of local devices in "jax.local_devices": "local_device_count" is the number of devices on the host, while JAX's definition is the number of devices in the process.
- We should deduplicate "global_runtime_device_count" and "global_device_count": just have one reference the other to remove multiple calls.
Related issues: #7653 #7657 #7658 cc @zpcore
Related comment in https://github.com/pytorch/xla/pull/9184/files#r2115084512
Is there a plan for how to consolidate the APIs? If so, maybe I can work on the implementation.
I believe that would be part of the issue, and it would be an interesting way to familiarize yourself with all the different APIs. It would likely entail breaking the problem down, illustrating all the relevant APIs [1][2][3][4], and stating the suggested replacement and deprecation for each. I think this is needed first to reach consensus here before the implementation.