Optimizations and Corrections
(CORRECTION) GPMA Node labelling should start from 1
Node labelling in GPMA should start from 1. When GPMA is initialized, a sentinel value is inserted for every node by default, and that sentinel is (src,0) stored as a 64-bit element.
- If the GPMA row offset does not ignore the sentinel, every node gets a spurious edge to node 0 that is not actually present.
- If the GPMA row offset does ignore the sentinel, nodes that really have an edge to node 0 have that edge ignored.
We will probably have to perform a relabelling in preprocessing or find a better way to deal with this.
CORRECTION (CLOSED)
- The sentinel value is actually (src,0xFFFFFFFF), so we don't have to worry about node id 0 being ignored.
HENCE NO CHANGE IS REQUIRED
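To make the closed correction concrete, here is a minimal sketch of why a (src,0xFFFFFFFF) sentinel does not collide with a real edge to node 0. It assumes the 64-bit element keeps src in the high 32 bits and dst in the low 32 bits; that layout, and the helper names `pack_edge`, `unpack_edge`, `is_sentinel` and `row_neighbours`, are illustrative assumptions and not taken from the GPMA code.

```python
SENTINEL_DST = 0xFFFFFFFF  # low 32 bits of the per-node sentinel, as noted above


def pack_edge(src, dst):
    """Pack (src, dst) into one 64-bit key. ASSUMPTION: src sits in the high 32 bits."""
    return (src << 32) | dst


def unpack_edge(key):
    """Split a 64-bit key back into (src, dst)."""
    return key >> 32, key & 0xFFFFFFFF


def is_sentinel(key):
    """A key is a sentinel when its dst half is 0xFFFFFFFF."""
    return (key & 0xFFFFFFFF) == SENTINEL_DST


def row_neighbours(keys, src):
    """Return the real neighbours of src in key order, skipping the sentinel."""
    result = []
    for key in sorted(keys):
        key_src, key_dst = unpack_edge(key)
        if key_src == src and not is_sentinel(key):
            result.append(key_dst)
    return result
```

Under this assumed layout the sentinel is the largest key in each row, so an edge to node 0 is kept and only the sentinel is skipped. The one id that could not be used as a destination is 0xFFFFFFFF itself.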
(OPTIMIZATION) Coalesced memory access in GPMA
Since GPMA leaves gaps between elements, we do not fully benefit from coalesced memory access. We may be able to benefit from moving to shared memory, as shown here: https://www.tutorialspoint.com/cuda/cuda_memory_considerations.htm
(UNDERSTANDING) Edge Parallelism or Node Parallelism
Is edge parallelism or node parallelism better for the count_sort portion of building the reverse graph?
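As a reference point for this question, below is a sequential sketch of a counting sort that builds a reverse graph in CSR form from an edge list. It is not the existing count_sort code; the function name `build_reverse_csr` and the CSR output layout are assumptions made for illustration. The comments mark where an edge-parallel or node-parallel kernel would split the work; the sketch does not say which is faster.

```python
def build_reverse_csr(num_nodes, edges):
    """Counting sort of (src, dst) edges by dst. Returns (row_offset, col_idx)."""
    # Pass 1: histogram of in-degrees.
    # Edge-parallel: one thread per edge, with an atomic add on in_degree[dst].
    # Node-parallel: one thread per src walks that node's own edges instead.
    in_degree = [0] * num_nodes
    for _, dst in edges:
        in_degree[dst] += 1

    # Pass 2: exclusive prefix sum turns counts into row offsets.
    row_offset = [0] * (num_nodes + 1)
    for v in range(num_nodes):
        row_offset[v + 1] = row_offset[v] + in_degree[v]

    # Pass 3: scatter each edge into its destination row.
    # In a parallel version the cursor update is the contended step:
    # high in-degree nodes serialize on the same cursor entry.
    cursor = row_offset[:num_nodes]  # slicing copies, so row_offset is untouched
    col_idx = [0] * len(edges)
    for src, dst in edges:
        col_idx[cursor[dst]] = src
        cursor[dst] += 1

    return row_offset, col_idx
```

Row v of the result, `col_idx[row_offset[v]:row_offset[v + 1]]`, lists the in-neighbours of v. Passes 1 and 3 are the ones whose cost depends on the parallelization choice, so they are the natural places to measure.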
(CORRECTION) Evaluation support
We will need to think about how seastar can be used for evaluation, i.e. with no backprop.
- We can disable the state stack and the timestamp stack
- We would need the base forward graph to be cached. GPMA at the moment does not use caching since the backward prop brings the graph back to its original state. So we will have to introduce caching and test the validity of deepcopy.
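The caching idea above could be checked with something like the following sketch. `ToyGraph` is a hypothetical stand-in for the GPMA-backed graph, and `evaluate` is an illustrative name; neither comes from the seastar or GPMA code. It only shows the shape of the test: snapshot the base graph, apply forward-only updates, then restore from the snapshot.

```python
import copy


class ToyGraph:
    """HYPOTHETICAL stand-in for the GPMA-backed graph, holding edges in a set."""

    def __init__(self, edges):
        self.edges = set(edges)

    def add_edge(self, src, dst):
        self.edges.add((src, dst))

    def remove_edge(self, src, dst):
        self.edges.discard((src, dst))


def evaluate(graph, updates):
    """Apply forward-only updates, then restore the base graph from a cached copy."""
    cached = copy.deepcopy(graph)  # snapshot taken before any update

    for op, src, dst in updates:
        if op == "add":
            graph.add_edge(src, dst)
        else:
            graph.remove_edge(src, dst)
    final_edges = set(graph.edges)  # state an evaluation pass would read

    # No backward pass runs here, so the base state must come from the cache.
    graph.edges = copy.deepcopy(cached.edges)
    return final_edges, cached
```

For the real object, the open question is whether `copy.deepcopy` duplicates the underlying storage or only the Python-side handle. If the graph wraps device memory, a custom `__deepcopy__` may be needed, which is what the validity test should confirm.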