[ROCm] JAX-ROCm docker images

Open · reza-amd opened this issue 4 years ago · 15 comments

Hi,

As part of our effort to support JAX on the ROCm platform, we have published a preview release in the following Docker Hub repository: https://hub.docker.com/repository/docker/rocm/jax

We would appreciate your help with the following items:

  • Announcing this release to the JAX community so that users can try it and give us feedback to improve our support
  • We have asked users in the README to submit issues here (in google/jax). Please CC @reza-amd and @deven-amd on any ROCm-specific issue that is submitted. If you prefer that such issues be submitted elsewhere, we can track them in our forked repository
  • Helping us set up CI builds similar to your internal infrastructure
    • This issue is being tracked in https://github.com/google/jax/issues/7323
  • Helping us release JAX on ROCm as Python wheels
  • Guiding us in picking representative benchmarks for performance tuning and for detecting missing features

@hawkinsp

reza-amd, Aug 12 '21

Thanks so much for this amazing work, and for bringing it to our attention!

> Announcing this release to the JAX community so that users can try it and give us feedback to improve our support

I'll announce this to Google-internal users. We don't have a clear communication line to external folks... got any suggestions for what we should do on this front? We could mention it in the README.

> Helping us set up CI builds similar to your internal infrastructure
> Helping us release JAX on ROCm as Python wheels

@yashk2810 could you weigh in on this? (Note there's also #7323 as pointed out in the OP.)

(Assigning to Yash for now, to follow up on this point.)

> Guiding us in picking representative benchmarks for performance tuning and for detecting missing features

I'll ping some folks about this.

mattjj, Aug 13 '21

> This issue is being tracked in #7323 ([ROCm] running unit test in parallel)

I replied on that issue about open-sourcing the BUILD files for testing.

> Helping us set up CI builds similar to your internal infrastructure

Once the BUILD files are open-sourced, I can look into running Bazel tests using our internal infrastructure. I can't guarantee a timeline, but once the BUILD files for testing are open-sourced, that should at least give you a way to test using Bazel.

How does that sound?

yashk2810, Aug 13 '21

> Helping us release JAX on ROCm as Python wheels

This is interesting. Maybe I can hook something up for this when I work on the JAX release process. But if you already have a way to do this, feel free to try it out!

yashk2810, Aug 13 '21

@mattjj, thanks so much for your attention to this matter.

> I'll announce this to Google-internal users. We don't have a clear communication line to external folks... got any suggestions for what we should do on this front? We could mention it in the README.

Mentioning this in the README would be great.

@yashk2810

> Once the BUILD files are open-sourced, I can look into running Bazel tests using our internal infrastructure. I can't guarantee a timeline, but once the BUILD files for testing are open-sourced, that should at least give you a way to test using Bazel. How does that sound?

Thanks so much for your help. That sounds good. In the meantime, we are still using the approach described in the documentation.

reza-amd, Aug 19 '21

See also: https://github.com/google/jax/tree/main/build/rocm

brettkoonce, Jun 26 '22

See also: #2012

brettkoonce, Jun 26 '22

@reza-amd, is it possible to get a build with ROCm 5.7.1 support? I am interested in testing 7900 XTX compatibility!

brettkoonce, Oct 18 '23

@rahulbatra85, any luck with your updates, or is the plan to wait on this until ROCm 6.0? I got the PyTorch image working locally with 5.7.1 and was able to train simple models!

brettkoonce, Nov 25 '23

@brettkoonce We have been releasing wheels and Docker images for ROCm for a while now. Please see https://github.com/ROCmSoftwarePlatform/jax/releases

rahulbatra85, Nov 27 '23

@rahulbatra85 Do you want to send a PR improving https://jax.readthedocs.io/en/latest/installation.html ? I didn't quite know what to put there, and I think we could do a better job pointing to your releases.

hawkinsp, Nov 27 '23

@hawkinsp Yes, I will update it.

rahulbatra85, Nov 28 '23

@rahulbatra85 Thank you for the link to the images. I have been trying to use them for the past month or two without success, so I assumed that my card (7900 XTX) was still not officially supported (e.g., that ROCm 5.7.1 was required). I filed a bug (#18747) with notes on what I am seeing on my machine and would appreciate any advice!

brettkoonce, Nov 30 '23

@brettkoonce Sorry, I misunderstood your question. JAX does not currently support the 7900 XTX, but we plan to support it with ROCm 6.x. Our current best estimate is sometime next year.

I will keep you posted when I have an update!

Thanks!

rahulbatra85, Nov 30 '23

@rahulbatra85 Thanks for the update! Looking forward to it!

brettkoonce, Dec 02 '23

@rahulbatra85 Thank you for the updated Docker images. I am able to train networks using JAX and ROCm 6.1!

brettkoonce, Apr 20 '24