[BugFIx] Support different dtype for `indptr`, `indices` and `data` in `COOToCSR`
π¨Work Item
IMPORTANT:
- This template is only for dev team to track project progress. For feature request or bug report, please use the corresponding issue templates.
- DO NOT create a new work item if the purpose is to fix an existing issue or feature request. We will directly use the issue in the project tracker.
Project tracker: https://github.com/orgs/dmlc/projects/2
Description
Background πΊοΈ
We now have 4 cpp functions responsible for converting COO to CSR, they are SortedCOOToCSR, UnSortedSmallCOOToCSR, UnSortedSparseCOOToCSR and UnSortedDenseCOOToCSR. The selection of the appropriate COOToCSR function among them is based on a heuristic approach (https://github.com/dmlc/dgl/blob/f0213d2163245cd0f0a90fc8aa8e66e94fd3724c/src/array/cpu/spmat_op_impl_coo.cc#L749).
Bug π
Currently, all 4 COOToCSR functions are defined with the template template <class IdType>. Let us take UnSortedSparseCOOToCSR as an example. https://github.com/dmlc/dgl/blob/f0213d2163245cd0f0a90fc8aa8e66e94fd3724c/src/array/cpu/spmat_op_impl_coo.cc#L413-L414
In the task of converting COO to CSR, the data type IdType is designated in https://github.com/dmlc/dgl/blob/f0213d2163245cd0f0a90fc8aa8e66e94fd3724c/src/array/array.cc#L809 , indicating that IdType is actually equal to the data type of coo.row.
And then, in the current implementation of these COOToCSR functions, the constructed ret_indptr, ret_indices and ret_data are all set to be of dtype IdType. https://github.com/dmlc/dgl/blob/f0213d2163245cd0f0a90fc8aa8e66e94fd3724c/src/array/cpu/spmat_op_impl_coo.cc#L426-L433
This is definitely not right because ret_indptr, ret_indices and ret_data do not necessarily have the same data type. Let's break them down in detail:
-
ret_indices: Its dtype is the same ascoo.row.dtype(IdType). This is the only correct part of the current implementation. -
ret_indptr: Its dtype depends on the number of non zero elements(NNZ). IfIdTypeisint32butNNZexceedsINT32_MAX, thenret_indptr.dtypeshould beint64, not the same asret_indices. -
ret_data: Its dtype should be exactly the same ascoo.data. It could beint,floator evenbool, not guaranteed to be the same asret_indices.
Question β
- Since the data types of
ret_indptr,ret_indicesandret_datamay be completely different, why do we set them all asIdType? - Furthermore, why do we need
IdType? It's only applicable toret_indices. The dtype ofret_indptrshould be determined dynamically; andret_dataare not even guaranteed to have ID-like dtype.
Working Plan π§
- Remove
template <class IdType>, makingCOOToCSRa non-template function. - Dynamically determine the dtypes of
ret_indptr,ret_indicesandret_dataas follow.
-
-
ret_indices.dtype<-coo.row.dtype,
-
-
-
ret_indptr.dtype<- whetherNNZexceedsINT32_MAX(however, ifcoo.rowis ofint64, we setret_indptrasint64anyway),
-
-
-
ret_data.dtype<-coo.data.dtype(ifcoo.datais null, set dtype the same asret_indptr).
-
Reference π
- #699 : This 5-year-old PR implemented the first
COOToCSRfunction withtemplate <DLDeviceType XPU, typename IdType, typename DType>. - #1251 : This 4-year-old PR removed
typename DTypebut kepttypename IdType. - #3326 : This 3-year-old PR extended
COOToCSRtoSorted,UnSortedDenseandUnSortedSparseversions, but still kepttypename IdType.
None of these venerable PRs explained why we need IdType. π€
Acknowledgement π
#7364 for reporting this bug.
@frozenbugs @Rhett-Ying @peizhou001 @mfbalin @BarclayII I'd be grateful for your opinions and suggestions on this issue. Because this involves modifying the code written many years ago, I'm afraid I may not have thought it through enough.
I think it's a historical issue that only IdType is used only that does not take more scenarios into consideration.
Dynamically determining the type of indptr, indices, data is the correct way. please go ahead with the changes required.
Before the change, is it possible to add a check for data type to throw exception in the scenario we hit in the issue?
I think it's a historical issue that only
IdTypeis used only that does not take more scenarios into consideration.Dynamically determining the type of
indptr,indices,datais the correct way. please go ahead with the changes required.Before the change, is it possible to add a check for data type to throw exception in the scenario we hit in the issue?
Yes, please help review #7459 .
After further investigation, I've discovered a possible reason why we need IdType. It's because the approaches of cuda implementation of COOToCSR are different for int32 and int64. See https://github.com/dmlc/dgl/blob/ed50c170dda9627730cb8ee4c7110205b6ea09de/src/array/cuda/coo2csr.cu#L25 and https://github.com/dmlc/dgl/blob/ed50c170dda9627730cb8ee4c7110205b6ea09de/src/array/cuda/coo2csr.cu#L100 .
If there is no way to merge these 2 approaches into an Integral one, we have to keep COOToCSR a template function and keep IdType parameter. But we still need to Dynamically determine the dtypes of ret_indptr, ret_indices and ret_data.