Add latest cuGraph

Open gitbuda opened this issue 1 year ago • 9 comments

  • [ ] Make sure all API changes produce correct results -> TESTING 🧪
    • [ ] cugraph.balanced_cut_clustering -> NO TESTS -> add at least the empty test
    • [x] cugraph.betweenness_centrality -> unable to load symbol (SS below) -> tests are passing
    • [ ] cugraph.generator -> NO TESTS -> add at least the empty test
    • [x] cugraph.hits
    • [x] cugraph.katz_centrality
    • [ ] cugraph.leiden ⏳ -> unable to load symbol (SS below) -> free(): invalid pointer error
    • [ ] cugraph.louvain ⏳ -> invalid pointer error
    • [x] cugraph.pagerank
    • [x] cugraph.personalized_pagerank
    • [ ] cugraph.spectral_clustering -> NO TESTS -> add at least the empty test
  • [ ] Write a spec of how much memory each algorithm uses (update the documentation page + communicate to JosipM)
  • [x] Figure out undefined symbols under leiden and betweenness_centrality [Screenshot 2024-05-12 at 3 06 22 PM]
    • NOTE: The problem here was that even though /opt/conda/include/cugraph/algorithms.hpp contains functions that could be compiled, /opt/conda/lib/libcugraph.so only contains certain instantiations of these functions (NOTE: it's not just template arguments, whole functions are missing as well) -> a useful command to figure stuff out is `nm -C /opt/conda/lib/libcugraph.so | grep "cugraph::betweenness"`
  • [ ] Merge main and resolve conflicts
  • [ ] Minimize the image as much as possible in limited time
  • [ ] Make the pipeline to push image to Dockerhub
  • [ ] Push experimental image to Dockerhub
  • [ ] Figure out ML vs no-ML build (SLACK discussion)
  • [ ] --> DELIVERABLE#1 Follow up HERE (Discord) <--
  • [ ] Upgrade Python pip libs for the full mage with cugraph Docker image
  • [ ] Consider adding a template generator for Dockerfiles or some form of includes -> https://codeberg.org/devthefuture/dockerfile-x#include
  • [ ] Upgrade dgl -> any build (native/Docker) fails -> figure out how to compile
    • [x] Try with https://catalog.ngc.nvidia.com/orgs/nvidia/containers/dgl
      • [x] Install nvidia-container-toolkit; restart Docker with `systemctl restart docker`; run `docker run -it --rm --gpus all ubuntu nvidia-smi` to verify (source)
    • [ ] Both cuGraph and DGL projects recommend using conda -> use conda under the container
  • [ ] Extend mage/setup to pick individual libraries to install -> add that as a Docker argument because usually someone needs a specific binary (while the whole thing doesn't compile); consider adding an init bash script because installation of deps usually fails (e.g. dgl is a very complex dependency)
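
The symbol check from the NOTE above can be wrapped in a small helper. This is a minimal sketch, assuming GNU `nm` output in its default three-column form; the function name `defined_symbols` is hypothetical and not part of MAGE or cuGraph.

```shell
#!/bin/sh
# Hypothetical helper: keep only symbols that are actually defined in a library.
# Reads `nm -C` output on stdin; $1 is a plain substring to look for.
defined_symbols() {
  # Defined symbols are printed as "<address> <type> <name>".
  # T/t = code in the text section, W/w = weak definition.
  # Undefined symbols ("U") have no address column, so field 2 is not a type letter.
  awk -v pat="$1" '$2 ~ /^[TtWw]$/ && index($0, pat) > 0'
}

# Usage inside the container (library path taken from the NOTE above):
#   nm -C /opt/conda/lib/libcugraph.so | defined_symbols "cugraph::betweenness"
```

An empty result for a function declared in algorithms.hpp suggests that this instantiation is not shipped in the shared library, which would match the "unable to load symbol" failures listed above.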

Docker Build Commands

# make sure memgraph submodule under mage is initialized https://git-scm.com/book/en/v2/Git-Tools-Submodules

docker build -f Dockerfile.experiment_partial -t mage-cugraph-part --progress plain .
docker run -it --rm --gpus all mage-cugraph-part bash

docker build -f Dockerfile.experiment_full -t mage-cugraph-full --progress plain .

docker run -it --rm --gpus all mage-cugraph-full bash
cd /mage/cpp/build
cmake -DMAGE_CUGRAPH_ENABLE=ON -DMAGE_CUGRAPH_ROOT=/opt/conda ..
VERBOSE=1 make cugraph.pagerank
make cugraph.pagerank cugraph.personalized_pagerank cugraph.louvain cugraph.katz_centrality cugraph.leiden cugraph.betweenness_centrality cugraph.balanced_cut_clustering cugraph.spectral_clustering cugraph.hits cugraph.generator

python3 /mage/setup build --gpu --cpp-build-flags CMAKE_BUILD_TYPE=Release MAGE_CUGRAPH_ROOT=/opt/conda/ -p /usr/lib/memgraph/query_modules/

docker run -it --rm --gpus all -p 7687:7687 mage-cugraph-full --log-level=TRACE --also-log-to-stderr

# if building memgraph, do it outside /mage/cpp/memgraph because that directory is copied during the Docker build
cd /tmp/mage/cpp/dist
cp ../build/cugraph_module/cugraph.pagerank.so ./
# wherever memgraph is built, cd there
./memgraph --storage-properties-on-edges=True --query-modules-directory=/tmp/mage/cpp/dist --log-level=TRACE --also-log-to-stderr

cd /tmp/mage && ./test_e2e -k "pagerank_test-test_cugraph_influential"
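
Since the checklist tracks the status of each module separately, it may help to build the targets one by one so that a single failing module does not hide the state of the others. A minimal sketch, assuming it runs from /mage/cpp/build after the cmake step; `build_each`, `MAKE_CMD`, and `LOG_DIR` are hypothetical names introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: build each target on its own and print OK/FAIL per target.
# MAKE_CMD defaults to plain `make`; LOG_DIR defaults to /tmp (both are assumptions).
build_each() {
  failed=0
  for target in "$@"; do
    # Keep the full build output per target so a failure can be inspected later.
    if "${MAKE_CMD:-make}" "$target" >"${LOG_DIR:-/tmp}/$target.log" 2>&1; then
      echo "OK   $target"
    else
      echo "FAIL $target"
      failed=1
    fi
  done
  # Non-zero exit status if at least one target failed.
  return "$failed"
}

# Usage:
#   build_each cugraph.pagerank cugraph.louvain cugraph.leiden cugraph.hits
```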

Docker Base Images

  • https://hub.docker.com/r/rapidsai/base/tags
  • https://hub.docker.com/r/nvidia/cuda/tags
  • https://nvidia.github.io/cccl/libcudacxx/

Done

  • [x] Compile only the cuGraph module with latest everything
    • [x] python3 /mage/setup build --gpu --cpp-build-flags MAGE_CUGRAPH_ROOT=/opt/conda CMAKE_C_COMPILER=gcc CMAKE_CXX_COMPILER=g++ -p /usr/lib/memgraph/query_modules
    • [x] Figure out what's wrong with fmt when everything is compiled: again issues with fmt+spdlog; spdlog from /opt/conda is not OK (possibly relevant: https://github.com/gabime/spdlog/issues/2825); fmt v9 doesn't compile with the g++ in use, and fmt v10 doesn't work with the spdlog in /opt/conda [Screenshot 2024-04-07 at 4 49 05 PM]
      • [x] Potential solution -> https://github.com/gabime/spdlog/issues/1897
    • [x] cugraph/algorithms.hpp for some reason doesn't work -> make sure GPU+Docker is ok [Screenshot 2024-04-20 at 10 06 05 PM]
    • [x] Figure out the problem with cuda code under conda -> most likely some build config issue 🤔 -> https://github.com/NVIDIA/cuCollections/issues/331#issuecomment-1866958019 [Screenshot 2024-04-21 at 5 17 56 PM]
    • [x] API fixes [Screenshot 2024-04-25 at 5 04 22 PM]
  • [x] Find and link libcugraph-ops++.so and libraft.so -> taken from rapidsai/base Docker image under /opt/conda/lib [Screenshot 2024-05-12 at 1 53 02 PM]
  • [x] Make cugraph.pagerank work correctly, implement the package process and release a Docker image with only memgraph and pagerank inside

Experiment / Optional

  • [ ] Take a look at the edge mask feature (there is no masking of vertices)

Description

Please briefly explain the changes you made here.

Pull request type

  • [ ] Bugfix
  • [ ] Algorithm/Module
  • [ ] Feature
  • [ ] Code style update (formatting, renaming)
  • [ ] Refactoring (no functional changes, no api changes)
  • [ ] Build related changes
  • [ ] Documentation content changes
  • [ ] Other (please describe):

Related issues

Delete if this PR doesn't resolve any issues. Link the issue if it does.

######################################

Reviewer checklist (the reviewer checks this part)

Module/Algorithm

  • [ ] Core algorithm/module implementation
  • [ ] Query module implementation
  • [ ] Tests provided (unit / e2e)
  • [ ] Code documentation
  • [ ] README short description

Documentation checklist

  • [ ] Add the documentation label tag
  • [ ] Add the bug / feature label tag
  • [ ] Add the milestone for which this feature is intended
    • If not known, set for a later milestone
  • [ ] Write a release note, including added/changed clauses
    • [Release note text]
  • [ ] Link the documentation PR here
    • [Documentation PR link]
  • [ ] Tag someone from docs team in the comments

gitbuda avatar Mar 29 '24 16:03 gitbuda

Any update? It looks like cugraph.pagerank still produces different output compared with the CPU version of pagerank.

nad010286 avatar May 30 '24 16:05 nad010286

@nad010286 nothing yet, just didn't have a chance to finish this work, but it's still on the short-term TODO list 😄

gitbuda avatar May 30 '24 16:05 gitbuda

Hi @nad010286! I've made some progress on a small scale. Seems like the regular CPU pagerank and cugraph pagerank are producing the same outputs (CPU test case vs cugraph test case) 🤔

Can you please provide a test case from your side (something reasonable in size, max 100k nodes + edges), in the e2e test format 🙏:

  • The graph should be in the .cyp format (it's Cypher, one query per line; there can be many lines, but it's important not to split a single query across multiple lines) GRAPH INPUT EXAMPLE
  • The test case should be in the .yml format (make sure you have everything: the exact query and the exact outputs) TEST CASE EXAMPLE
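
The one-query-per-line requirement for the .cyp file can be checked before submitting a test case. This is only a heuristic sketch: it assumes every query ends with a semicolon, which the thread does not state, and the helper name `check_cyp` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: flag non-empty lines of a .cyp file that do not end with ";".
# Assumption: each Cypher query occupies exactly one line and ends with a semicolon.
check_cyp() {
  awk '
    NF > 0 && $0 !~ /;[ \t\r]*$/ {
      printf "line %d does not end with \";\"\n", NR
      bad = 1
    }
    END { exit bad }   # exit status 1 if any suspicious line was found
  ' "$1"
}

# Usage:
#   check_cyp input.cyp && echo "looks like one query per line"
```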

gitbuda avatar Jun 02 '24 17:06 gitbuda

Quality Gate failed

Failed conditions
6 Security Hotspots

See analysis details on SonarCloud

sonarqubecloud[bot] avatar Jun 02 '24 18:06 sonarqubecloud[bot]

Hey, I'm having issues building cuGraph and MAGE from source :( Are there any instructions or things I have to pay attention to? Or, if you can send me the Docker image that you already built, I'm happy to run some tests for you with loads of graph data.

nad010286 avatar Jun 06 '24 08:06 nad010286

I would also be very interested in an updated guide on how to build cugraph mage. I tried building the main branch with CUDA 11.8, but that nvcc compiler does not support C++20, so the build fails later on at unordered_map::contains.
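
A quick pre-build check of the toolkit version can catch this early. The sketch below assumes that nvcc accepts C++20 starting with CUDA 12.0 and that `nvcc --version` prints a line containing `release <major>.<minor>`; the helper names `nvcc_release` and `cuda_supports_cxx20` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract "<major>.<minor>" from `nvcc --version` output on stdin.
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: succeed if the given CUDA version is 12.0 or newer.
# Assumption: C++20 support in nvcc starts with CUDA 12.0.
cuda_supports_cxx20() {
  major=${1%%.*}
  [ "$major" -ge 12 ]
}

# Usage:
#   ver=$(nvcc --version | nvcc_release)
#   cuda_supports_cxx20 "$ver" || echo "CUDA $ver: nvcc is too old for C++20"
```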

intuitiveminds avatar Jun 07 '24 21:06 intuitiveminds

Managed to build and run some tests, @gitbuda. The results are similar, with very small differences, but processing time is worse than on the CPU. I'm not sure the GPU is used: monitoring it via nvidia-smi, I only see power consumption increase after the result is out, and no VRAM is used.
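
To make this kind of observation easier to reproduce, GPU utilisation and memory can be sampled to a file while the query runs and then summarised. A minimal sketch, assuming the samples come from `nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader,nounits`; the helper name `peak_gpu_usage` and the file name `gpu.log` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarise samples such as "37, 1024" read from stdin
# (GPU utilisation in percent, memory used in MiB), one sample per line.
peak_gpu_usage() {
  awk -F', *' '
    $1 + 0 > util { util = $1 + 0 }   # track the highest utilisation seen
    $2 + 0 > mem  { mem  = $2 + 0 }   # track the highest memory usage seen
    END { printf "peak_util=%d%% peak_mem=%dMiB\n", util, mem }
  '
}

# Usage: sample once per second in another terminal while the query runs,
# stop with Ctrl+C, then summarise:
#   nvidia-smi --query-gpu=utilization.gpu,memory.used \
#     --format=csv,noheader,nounits -l 1 > gpu.log
#   peak_gpu_usage < gpu.log
```

If both peaks stay near zero for the whole run, that would support the suspicion that the query did not execute on the GPU.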

nad010286 avatar Jun 19 '24 09:06 nad010286