Illegal memory access
Hi, I'm currently encountering the following problem when trying to use node2vec for node embedding:
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
GraphSolver<32, float32, uint32>
----------------- Resource -----------------
#worker: 1, #sampler: 11, #partition: 1
tied weights: no, episode size: 200
gpu memory limit: 3.53 GiB
gpu memory cost: 59.6 MiB
----------------- Sampling -----------------
augmentation step: 1, p: 1, q: 1
random walk length: 40
random walk batch size: 100
#negative: 1, negative sample exponent: 0.75
----------------- Training -----------------
model: node2vec
optimizer: SGD
learning rate: 0.025, lr schedule: linear
weight decay: 0.005
#epoch: 2000, batch size: 100000
resume: no
positive reuse: 1, negative weight: 5
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Batch id: 0 / 7122
loss = -nan
Check failed: error == cudaSuccess CUDA error an illegal memory access was encountered at /network/home/zhuzhaoc/.local/envs/build/conda-bld/graphvite_1584598935508/work/include/core/solver.h:1539
*** Check failure stack trace: ***
@ 0x7f5f873d24dd google::LogMessage::Fail()
@ 0x7f5f873da071 google::LogMessage::SendToLog()
@ 0x7f5f873d1ecd google::LogMessage::Flush()
@ 0x7f5f873d376a google::LogMessageFatal::~LogMessageFatal()
@ 0x7f5f87537b7b graphvite::WorkerMixin<>::train()
@ 0x7f5ff1791163 execute_native_thread_routine
@ 0x7f5fff02b609 start_thread
@ 0x7f5ffef52103 clone
@ (nil) (unknown)
The code I'm using is below. It runs inside a loop that loads a different graph on each iteration, each from a different edgelist_filename (a str):
# Prepare graph for Node2Vec
v_graph = vite_graph.Graph()
v_graph.load(edgelist_filename, as_undirected=False)
# Train Node2Vec hidden data
embed = vite_solver.GraphSolver(dim=32)
embed.build(v_graph)
embed.train(model='node2vec', num_epoch=2000, resume=False, augmentation_step=1, random_walk_length=40,
            random_walk_batch_size=100, shuffle_base=1, p=1, q=1, positive_reuse=1,
            negative_sample_exponent=0.75, negative_weight=5, log_frequency=1000)
# Extract embedded feature data
features = np.array(np.copy(embed.vertex_embeddings), dtype=np.float32)
# Clear memory and data on CPU and GPU
embed.clear()
The weird thing is that this happens completely sporadically: the next time I run the same code on the same edgelist_filename, it works. So the only real problem is that I have to keep rerunning the code until all my data is processed.
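The rerun-until-it-works approach can be automated. The log above shows a failed glog check inside a worker thread, which appears to abort the whole Python process, so a try/except around embed.train would probably not catch it. A minimal sketch, assuming the per-graph code is moved into a separate script (the name embed_one.py is hypothetical, as are the helper names), runs each graph in its own child process and retries when the child exits abnormally:

```python
import subprocess
import sys


def run_with_retries(cmd, max_attempts=3):
    """Run cmd in a child process and retry while it exits abnormally.

    Returns the number of attempts that were needed.
    Raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        # A child killed by a signal (e.g. abort) reports a negative code on POSIX.
        print(f"attempt {attempt} failed with return code {result.returncode}",
              file=sys.stderr)
    raise RuntimeError(f"{cmd!r} failed {max_attempts} times")


def embed_all(edgelist_filenames, worker_script="embed_one.py"):
    # worker_script is a hypothetical script that loads one edge list,
    # trains node2vec on it and saves the embeddings to disk.
    for filename in edgelist_filenames:
        run_with_retries([sys.executable, worker_script, filename])
```

Running every graph in a fresh process also means each run starts with a new CUDA context, so state left over from a crashed run cannot affect the next one. This only works around the crash; it does not address its cause.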
I'm using cudatoolkit=10.1 and graphvite version 0.2.2 build py37cuda101hd3e7edd from conda channel milagraph.
This is an illegal memory access on the GPU, which is really weird. Could you provide a graph dataset that reproduces the error?
Here is the smallest dataset I used to reproduce the error: deepgraphlearning-graphvite-issue-67.zip. I have also attached the terminal output and the filenames that were used. The program crashes while processing the last file (SAT_Competition2009/CRAFTED/rbsat/random/unforced/rbsat-v760c43649g4.cnf.edgelist).
Note that for the dataset above I used a dimension of 64 instead of the 32 shown in my original post. Everything else remained the same.
Thanks! We will try to reproduce that.
@KiddoZhu I'm encountering the same problem. Is there any update?
@KiddoZhu Hi, I'm also running into the same problem, with the following error:
Check failed: error == cudaSuccess CUDA error an illegal memory access was encountered at /network/home/zhuzhaoc/.local/envs/build/conda-bld/graphvite_1584598935508/work/include/core/solver.h:1539