Corey Adams

63 comments by Corey Adams

Thanks @melMass for the reminder. Using the library I posted above, which is a reproducer, I get the following. This one is the Python bindings:

```bash
$ otool -L /Users/corey.adams/miniconda3/lib/python3.8/site-packages/pybind11_test_symbols-1.0.0-py3.8-macosx-10.9-x86_64.egg/larcv/pylarcv.cpython-38-darwin.so
/Users/corey.adams/miniconda3/lib/python3.8/site-packages/pybind11_test_symbols-1.0.0-py3.8-macosx-10.9-x86_64.egg/larcv/pylarcv.cpython-38-darwin.so:...
```

Hey @cqc-alec, thanks for digging into this too. Interesting result, though I tried it in my reproducer and it's not so trivial to remove the link to Python: I have...

For what it's worth, I have done this and tested it on our Polaris supercomputer at Argonne National Lab. The changes are pretty small: one additional file (`Mpi4pyCluster.py`) and one...

OK, I'll put a PR together and send it in. I thought of another question: is it likely that these cluster functions get called more than once or twice at...

> By the way, we'd love to hear more about your workload, always great to hear about people using JAX at scale. Do you use JAX's distributed `jit` or `shard_map`,...

On our systems, we're using MPICH from HPE, but in my experience, when you are dealing with vendor-optimized MPI implementations (from HPE/Cray/Intel/IBM/etc.), the environment variables they set...

Just so we have something concrete to discuss: I opened PR #20174 based on what we've talked about here, using an exclusively opt-in method. Hopefully it proves useful!

The MPI I have installed locally doesn't have CUDA support, so I am using `MPI4JAX_USE_CUDA_MPI=0`. With `MPI4JAX_USE_CUDA_MPI=1`, the entire application breaks. I can go test on our cluster to see...

A data point: on A100s, I need a bigger buffer size to reproduce this, and this is without CUDA-aware MPI.

```bash
❯ MPI4JAX_USE_CUDA_MPI=0 python cg_test.py
NO JIT -...
```

Just tried; unfortunately it doesn't change anything. The `notoken` mode fails in the same way. A few things I've noticed while exploring this strange behavior, using the `allreduce`...