Performance of gap junction transfer may be easily improvable.
The performance of the #446 extension for source-to-target transfer (which, for example, allows transfer of ion concentrations) has not been compared with the previous implementation, which was less general and restricted to gap junctions. If performance turns out to be visibly worse, I would first try reordering the `SetupTransferInfo.src_xxx` and `SetupTransferInfo.tar_...` vectors so that the `src_index` and `tar_index` vectors are in increasing `NrnThread._data` order. See the comment in `coreneuron/network/partrans_setup.cpp` in `void nrn_partrans::gap_data_indices_setup(NrnThread* n) {`.
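The reordering could be sketched roughly as follows. This is only an illustration, not the actual CoreNEURON code: the `SrcInfo` struct, its `sid` member, and the helper names are hypothetical stand-ins, and the real `SetupTransferInfo` layout may differ. The idea is to compute one sort permutation from the index vector and apply it to every parallel vector so that the entries stay paired.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the per-thread source-side vectors.
// This is NOT the real SetupTransferInfo layout.
struct SrcInfo {
    std::vector<int> sid;    // hypothetical: source identifier of each entry
    std::vector<int> index;  // offset into NrnThread._data of each entry
};

// Return the permutation that puts `keys` in increasing order.
// stable_sort keeps the original relative order of equal keys.
std::vector<std::size_t> sort_permutation(const std::vector<int>& keys) {
    std::vector<std::size_t> perm(keys.size());
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::stable_sort(perm.begin(), perm.end(),
                     [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
    return perm;
}

// Build a reordered copy of `v` so that out[i] == v[perm[i]].
template <typename T>
std::vector<T> apply_permutation(const std::vector<T>& v,
                                 const std::vector<std::size_t>& perm) {
    assert(v.size() == perm.size());
    std::vector<T> out;
    out.reserve(v.size());
    for (std::size_t p : perm) {
        out.push_back(v[p]);
    }
    return out;
}

// Reorder all parallel vectors so that `index` is increasing.
void sort_by_data_index(SrcInfo& s) {
    assert(s.sid.size() == s.index.size());
    const std::vector<std::size_t> perm = sort_permutation(s.index);
    s.sid = apply_permutation(s.sid, perm);
    s.index = apply_permutation(s.index, perm);
}
```

The same permutation approach would apply to the target-side vectors. Whether the sorted order actually improves memory access enough to matter would still need to be measured.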
Also, that ordering may turn out to be complete for targets, so that in that case, as well as in the single-thread, non-MPI, or non-GPU cases, the general transfer sequence
`src_gather[i] = NrnThread._data[src_indices[i]]`
`outsrc_buf[outsrc_indices[j]] = src_gather[gather2outsrc[j]]`
`MPI_Alltoallv outsrc_buf -> insrc_buf`
`NrnThread._data[tar_indices[k]] = insrc_buf[insrc_indices[k]]`
may be simplified.
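To make the four steps concrete, here is a minimal single-process sketch. Several things are assumptions for illustration only: the `TransferIndices` struct and its `outsrc_buf_size` member are hypothetical, `MPI_Alltoallv` is replaced by a plain buffer copy (as if there were one rank), and `transfer_direct` is just one possible simplification, not necessarily the one intended here. It also assumes that no target location is itself a source location, since it reads and writes `data` in the same loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the index vectors named in the sequence.
struct TransferIndices {
    std::vector<int> src_indices;     // data offsets of the sources
    std::vector<int> gather2outsrc;   // src_gather slot read for each send entry
    std::vector<int> outsrc_indices;  // outsrc_buf slot written by each send entry
    std::size_t outsrc_buf_size = 0;  // hypothetical: length of outsrc_buf
    std::vector<int> insrc_indices;   // insrc_buf slot read by each target
    std::vector<int> tar_indices;     // data offsets of the targets
};

// The general four-step sequence on a single process.
void transfer_general(std::vector<double>& data, const TransferIndices& ti) {
    std::vector<double> src_gather(ti.src_indices.size());
    for (std::size_t i = 0; i < ti.src_indices.size(); ++i) {
        src_gather[i] = data[ti.src_indices[i]];
    }
    assert(ti.outsrc_indices.size() == ti.gather2outsrc.size());
    std::vector<double> outsrc_buf(ti.outsrc_buf_size);
    for (std::size_t j = 0; j < ti.outsrc_indices.size(); ++j) {
        outsrc_buf[ti.outsrc_indices[j]] = src_gather[ti.gather2outsrc[j]];
    }
    // Assumption: with one rank, the MPI_Alltoallv exchange reduces to a copy.
    const std::vector<double> insrc_buf = outsrc_buf;
    assert(ti.tar_indices.size() == ti.insrc_indices.size());
    for (std::size_t k = 0; k < ti.tar_indices.size(); ++k) {
        data[ti.tar_indices[k]] = insrc_buf[ti.insrc_indices[k]];
    }
}

// Setup-time step: compose the index maps so that each target knows
// the data offset of its source directly.
std::vector<int> build_direct_src(const TransferIndices& ti) {
    std::vector<int> buf2data(ti.outsrc_buf_size, -1);
    for (std::size_t j = 0; j < ti.outsrc_indices.size(); ++j) {
        buf2data[ti.outsrc_indices[j]] = ti.src_indices[ti.gather2outsrc[j]];
    }
    std::vector<int> direct(ti.tar_indices.size());
    for (std::size_t k = 0; k < ti.tar_indices.size(); ++k) {
        direct[k] = buf2data[ti.insrc_indices[k]];
        assert(direct[k] >= 0);  // every target must map to a filled buffer slot
    }
    return direct;
}

// Simplified transfer: one loop, no intermediate buffers.
// Valid only if source and target data offsets are disjoint.
void transfer_direct(std::vector<double>& data, const TransferIndices& ti,
                     const std::vector<int>& direct_src) {
    assert(direct_src.size() == ti.tar_indices.size());
    for (std::size_t k = 0; k < ti.tar_indices.size(); ++k) {
        data[ti.tar_indices[k]] = data[direct_src[k]];
    }
}
```

In this sketch the composed map is built once at setup time, so the per-timestep cost drops from three indexed copies to one. With multiple ranks the exchange step cannot be removed this way.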