# [FEA] Support ARM64 & update to RAPIDS 25.02, CUDA 12.8, Ubuntu 24.04 - Phase 1

Open aucahuasi opened this issue 3 months ago • 4 comments

Modernizes node-rapids to RAPIDS 25.02, CUDA 12.8, and Ubuntu 24.04 with ARM64 (aarch64) support for GH200 Grace Hopper platforms.

Changes

  • Update to RAPIDS 25.02, CUDA 12.8, Ubuntu 24.04, Python 3.12
  • Add ARM64 (aarch64) support alongside x86_64
  • Update Arrow 9.0.0 to 19.0.0 (enable S3, Acero)
  • Update nvcomp 2.4.1 to 4.2.0.11 with ARM64 binaries
  • Update build system: cmake-js 7.3.1, node-gyp 10.2.0, CMake 3.30.5
  • Update TypeScript 4.5.5 to 5.3.3, Jest 26.5.3 to 29.7.0
  • Update @typescript-eslint 5.30.0 to 6.21.0 for TypeScript 5.3 compatibility
  • Update RMM bindings for RAPIDS 25.02 API changes (thrust::optional to std::optional, removed deprecated methods)
  • Remove BlazingSQL module (abandoned upstream)
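Supporting both architectures means reconciling two naming schemes: Node.js reports `x64` and `arm64` through `process.arch`, while toolchains and prebuilt binary archives usually use `x86_64` and `aarch64`. The sketch below shows one way to do that mapping. The names `TargetArch` and `toTargetArch` are hypothetical and are not taken from the node-rapids source.

```typescript
// Hypothetical helper for illustration only; not part of node-rapids.
type TargetArch = 'x86_64' | 'aarch64';

// Translate a Node.js architecture string (as reported by process.arch)
// into the name commonly used by toolchains and binary archives.
function toTargetArch(nodeArch: string): TargetArch {
  switch (nodeArch) {
    case 'x64':
      return 'x86_64';
    case 'arm64':
      return 'aarch64';
    default:
      // Fail loudly rather than guess at an unsupported platform.
      throw new Error('Unsupported architecture: ' + nodeArch);
  }
}
```

In a build script this would typically be called as `toTargetArch(process.arch)`. Taking the string as a parameter keeps the helper easy to test on either platform.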

Testing

The Phase 1 modules (core, cuda, rmm) pass all tests on Ubuntu 24.04 (x86_64 and ARM64) with CUDA 12.8, Python 3.12, and Node.js 16.15.1.

Phase 2 (a separate PR) will address the cudf module and its significant RAPIDS 25.02 API changes. Node.js stays at 16.x for this phase; Phase 2 may target Node.js 20.x, depending on testing and compatibility requirements.

aucahuasi avatar Oct 18 '25 15:10 aucahuasi

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot[bot] avatar Oct 18 '25 15:10 copy-pr-bot[bot]

I've successfully updated the toolchain (LLVM 18, GCC 13, sccache 0.10.0, Node.js 16.20.2) and tested the core, cuda, and rmm modules in the devel container. Everything builds and runs correctly with CUDA 12.8.

A couple of questions:

  1. Version bumping: Should external contributors handle version updates across the monorepo, or do you prefer to manage this during your release process?
  2. Packages workflow: The `yarn docker:build:devel:packages` command fails while trying to access S3 credentials. Is this workflow meant for RAPIDS internal use only, or should external contributors be able to run it? If the latter, `package.Dockerfile` would need to support local-only sccache (currently sccache 0.10.0 tries AWS autodiscovery even with an empty S3 config).
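For the second question, one possible shape for a local-only setup is to launch the build with an environment that points sccache at a disk cache and carries no S3 or AWS settings. The sketch below is an assumption, not a tested fix: `localOnlySccacheEnv` is a hypothetical helper, and the variables it removes (`SCCACHE_BUCKET`, `SCCACHE_REGION`, `SCCACHE_ENDPOINT`, and anything prefixed `AWS_`) may not be enough to stop the autodiscovery seen with sccache 0.10.0.

```typescript
// Hypothetical helper for illustration only; not part of node-rapids.
type Env = Record<string, string | undefined>;

// sccache S3 settings that should not reach a local-only build.
const S3_KEYS = ['SCCACHE_BUCKET', 'SCCACHE_REGION', 'SCCACHE_ENDPOINT'];

// Return a copy of `env` with S3/AWS settings removed and the
// local disk cache directory (SCCACHE_DIR) set to `cacheDir`.
function localOnlySccacheEnv(env: Env, cacheDir: string): Env {
  const out: Env = {};
  for (const key of Object.keys(env)) {
    const isAwsKey = key.indexOf('AWS_') === 0;
    const isS3Key = S3_KEYS.indexOf(key) !== -1;
    if (isAwsKey || isS3Key) {
      continue;  // drop anything that could steer sccache toward S3
    }
    out[key] = env[key];
  }
  out['SCCACHE_DIR'] = cacheDir;
  return out;
}
```

The result could be passed as the `env` option when spawning the build from Node.js. The input object is left unmodified, so the caller's own environment is unaffected.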

For now, I've verified the modules build and work in the devel container. Let me know what else you need for review!

aucahuasi avatar Oct 31 '25 19:10 aucahuasi

@aucahuasi thanks for this PR, I'll try to review it this weekend. Typically we'd update the RAPIDS version across all the projects, but I should be able to handle that. There's also some housekeeping tasks, like sccache, that I can tackle.

I'll push any updates to this branch if you don't mind, then approve and merge once I've double checked everything still works on my end.

trxcllnt avatar Oct 31 '25 22:10 trxcllnt

> @aucahuasi thanks for this PR, I'll try to review it this weekend. Typically we'd update the RAPIDS version across all the projects, but I should be able to handle that. There's also some housekeeping tasks, like sccache, that I can tackle.
>
> I'll push any updates to this branch if you don't mind, then approve and merge once I've double checked everything still works on my end.

Thanks @trxcllnt! That's perfect! Please go ahead and push any updates to the branch. I appreciate you handling the version updates and sccache configuration.

For context on the Node.js 16.20.2 choice: it aligns with our production environment requirements for ARM64/GH200 integration work.

Let me know if you need any additional work/testing or info from my end!

aucahuasi avatar Nov 01 '25 03:11 aucahuasi