[GraphBolt][Bug] Exception is thrown by __iter__ of FeatureFetcher
🐛 Bug
print graph Graph(num_nodes=104818, num_edges=1150630,
ndata_schemes={}
edata_schemes={})
Exception in thread Thread-1 (thread_worker):
Traceback (most recent call last):
File "/root/anaconda3/envs/GINO/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/root/anaconda3/envs/GINO/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/root/anaconda3/envs/GINO/lib/python3.10/site-packages/torchdata/datapipes/iter/util/prefetcher.py", line 69, in thread_worker
item = next(itr)
File "/root/anaconda3/envs/GINO/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 173, in wrap_generator
response = gen.send(None)
File "/root/anaconda3/envs/GINO/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 123, in __iter__
yield self._apply_fn(data)
File "/root/anaconda3/envs/GINO/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 88, in _apply_fn
return self.fn(data)
File "/root/anaconda3/envs/GINO/lib/python3.10/site-packages/dgl/graphbolt/minibatch_transformer.py", line 38, in _transformer
minibatch = self.transformer(minibatch)
File "/root/anaconda3/envs/GINO/lib/python3.10/site-packages/dgl/graphbolt/feature_fetcher.py", line 97, in _read
node_features[feature_name] = self.feature_store.read(
File "/root/anaconda3/envs/GINO/lib/python3.10/site-packages/dgl/graphbolt/impl/basic_feature_store.py", line 58, in read
return self._features[(domain, type_name, feature_name)].read(ids)
KeyError: "
This exception is thrown by __iter__ of
FeatureFetcher(
datapipe=MultiprocessingWrapper, edge_feature_keys=None, feature_store=
TorchBasedFeatureStore{
(<OnDiskFeatureDataDomain.NODE: 'node'>, 'float32', 'x_in') :
TorchBasedFeature(feature=tensor([
[ 0.9825, 0.4900, -0.0148],
[ 0.9803, 0.4900, -0.0105],
[ 0.9846, 0.2326, -0.3140],
...,
[ 0.1170, 0.2753, 0.7003],
[ 0.1107, 0.2824, 0.7003],
[ 0.1128, 0.2753, 0.7003]]),
metadata={},
),
(<OnDiskFeatureDataDomain.NODE: 'node'>, 'float32', 'area'):
TorchBasedFeature(feature=tensor([
[0.3695],
[0.3695],
[0.2457],
...,
[0.3603],
[0.3604],
[0.3604]]),
metadata={},
)},
node_feature_keys=['x_in', 'area'])"
To Reproduce
Steps to reproduce the behavior:
1. Download the data: https://drive.google.com/drive/folders/1esJ-4ThKsaDQQLQMtZVowwkSlY8thJxr?usp=drive_link
2. Run this script:
import torch
import dgl
import dgl.graphbolt as gb
graph = dgl.load_graphs("./graph.bin")[0][0]
print("print graph", graph)
feat_data = [
gb.OnDiskFeatureData(domain="node", type="float32", name="x_in",
format="numpy", path="./x_in.npy", in_memory=False),
gb.OnDiskFeatureData(domain="node", type="float32", name="area",
format="numpy", path="./area_in.npy", in_memory=False),
]
graph = gb.from_dglgraph(graph, True)
feature = gb.TorchBasedFeatureStore(feat_data)
item_set = gb.ItemSet(104818, names="seed_nodes")
datapipe = gb.ItemSampler(item_set, batch_size=1024, shuffle=False)
datapipe = datapipe.sample_neighbor(graph, [10, 10]) # 2 layers.
datapipe = datapipe.fetch_feature(feature, node_feature_keys=["x_in", "area"])
datapipe = datapipe.copy_to(torch.device('cuda'))
dataloader = gb.DataLoader(datapipe)
mini_batch = next(iter(dataloader))
print("\nmini_batch x_in : ", mini_batch.node_features["x_in"])
Expected behavior
The "x_in" features of the 1024 src nodes should be printed, along with the "x_in" features of the many sampled dst nodes.
Environment
- DGL Version (e.g., 1.0): dgl 2.0.0+cu116
- Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): torch 1.13.1+cu116
- OS (e.g., Linux): Linux
- How you installed DGL (conda, pip, source): pip
- Build command you used (if compiling from source):
- Python version: python 3.10
- CUDA/cuDNN version (if applicable): cuda 11.6 cuDNN 7.4
- GPU models and configuration (e.g. V100): A100 40GB
- Any other relevant information:
Additional context
Setting type to None works around this bug:
type="float32"  # does not work
type=None       # works
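The workaround is consistent with the KeyError above: the feature store is keyed by (domain, type_name, feature_name) tuples, and for a homogeneous graph the FeatureFetcher looks features up with type_name=None. A minimal sketch of the mismatch, using a plain dict with illustrative values (not the actual GraphBolt classes):

```python
# The feature store maps (domain, type_name, feature_name) -> feature.
# Passing type="float32" registers the key with "float32" as the type name.
store_with_dtype_as_type = {("node", "float32", "x_in"): "feature tensor"}
store_with_none_type = {("node", None, "x_in"): "feature tensor"}

# For a homogeneous graph, the fetcher reads with type_name=None.
lookup_key = ("node", None, "x_in")

print(lookup_key in store_with_dtype_as_type)  # False -> KeyError in read()
print(lookup_key in store_with_none_type)      # True  -> lookup succeeds
```

This is why type=None "works": the registered key then matches the key the fetcher uses for the lookup.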
Another severe issue (at least it bothers me a lot): the graph has to be saved to disk and loaded back, even though it already exists in my PC's memory:
dgl.save_graphs("./graph.bin", graph)
np.save("./x_in.npy", x_in.cpu().numpy())
np.save("./area.npy", area.cpu().numpy().reshape(-1, 1))
graph = dgl.load_graphs("./graph.bin")[0][0]
With 500 graphs, each training epoch incurs a huge IO cost for very little benefit to my model.
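One way to soften the per-epoch IO cost, independent of GraphBolt (a sketch with illustrative shapes, not the reporter's actual data): save each feature array once, then reopen it with np.load(..., mmap_mode="r") so rows are paged in lazily instead of the whole file being reloaded every epoch.

```python
import os
import tempfile

import numpy as np

tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "x_in.npy")

# Save once (illustrative shape; the real array would come from the model's tensors).
x_in = np.random.rand(104818, 3).astype(np.float32)
np.save(path, x_in)

# Reopen as a memory map: pages are faulted in on demand, so an epoch that
# touches only some rows does not reload the whole file from disk.
x_in_mm = np.load(path, mmap_mode="r")
batch = np.asarray(x_in_mm[:1024])  # materialize only the rows needed
print(batch.shape)  # (1024, 3)
```

This does not remove the initial save, but repeated epochs then read only the pages they actually touch.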
As the provided Google Drive link doesn't contain a file named "x_in.npy", I assume it was replaced by the file "node_feat.npy". I found that the matrix has shape (2, 3), which is inconsistent with the graph size.
x_in = np.load("node_feat.npy")
print(x_in.shape)  # (2, 3)
Could you check your data uploaded and also the shape of your local x_in matrix?
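A quick sanity check along those lines, using a hypothetical helper built on plain numpy, would catch such a mismatch before the feature store is even constructed:

```python
import os
import tempfile

import numpy as np

def check_feature_shape(path, num_nodes):
    """Hypothetical helper: verify a saved node-feature matrix covers the graph."""
    arr = np.load(path, mmap_mode="r")  # mmap avoids loading the whole file
    if arr.shape[0] != num_nodes:
        raise ValueError(
            f"{path}: {arr.shape[0]} rows, but the graph has {num_nodes} nodes"
        )
    return arr.shape

# Deliberately wrong matrix, mirroring the (2, 3) case above.
tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "node_feat.npy")
np.save(path, np.zeros((2, 3), dtype=np.float32))
try:
    check_feature_shape(path, 104818)
except ValueError as e:
    print("mismatch:", e)
```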
I didn't quite get the second question. What's the purpose of such periodic saving?
@mfbalin Could you help to comment here?
I haven't used a custom dataset before, including gb.OnDiskFeatureData. So I don't know what could be going wrong.
@wangguan1995 In the code snippet you shared, type is used incorrectly: it should be the node/edge type name, not the data type (type="float32"). Here is the correct way to instantiate OnDiskFeatureData:
# `write_tensor_to_disk` is a helper from DGL's test suite that saves a
# tensor under `test_dir` as "<name>.pt" (torch) or "<name>.npy" (numpy).
a = torch.tensor([[1, 2, 4], [2, 5, 3]])
b = torch.tensor([[[1, 2], [3, 4]], [[2, 5], [3, 4]]])
write_tensor_to_disk(test_dir, "a", a, fmt="torch")
write_tensor_to_disk(test_dir, "b", b, fmt="numpy")
feature_data = [
gb.OnDiskFeatureData(
domain="node",
type="paper",
name="a",
format="torch",
path=os.path.join(test_dir, "a.pt"),
),
gb.OnDiskFeatureData(
domain="edge",
type="paper:cites:paper",
name="b",
format="numpy",
path=os.path.join(test_dir, "b.npy"),
),
]
feature_store = gb.TorchBasedFeatureStore(feature_data)
Corresponding documentation is available here: https://docs.dgl.ai/generated/dgl.graphbolt.TorchBasedFeatureStore.html#dgl.graphbolt.TorchBasedFeatureStore
This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you.