sparse icon indicating copy to clipboard operation
sparse copied to clipboard

Support for DOK slicing and advanced indexing

Open stschroe opened this issue 6 years ago • 9 comments

Since the DOK array is versatile applicable I use them frequently in my projects. I needed support for slicing and advanced indexing and implemented it myself, see gist.

The indexing supports slicing, advanced (fancy) indexing and as I need it most of the time "array indexing". This is a fast version for advanced indexing. Only vectors (1d sequences) are accepted for indexing. The getitem method always returns dense ndarrays in contrast to normal advanced indexing of sparse arrays which create new copies of sparse arrays. The setitem method allows only vectors and scalars as values. This allows fast calculations over ndarray methods without the expensive creation of temporary sparse arrays between operations. This is of course only useful for small vectors, which fit easily in to memory.

If there is interest in adding this to pydata.sparse, I could help. But as I have never contributed to any open project, I have no idea if I could handle the necessary amount of work.

stschroe avatar Nov 06 '19 13:11 stschroe

Hi! First of all, thanks for the offer to contribute! It means a lot.

Second, your code looks pretty much perfect, except that one thing I'd like is for parts of it to be ported to Numba, along with some of the other indexing logic.

If you're willing to accept that burden, it'd be great to have your contribution in.

hameerabbasi avatar Nov 06 '19 14:11 hameerabbasi

During working on the indexing code I also made some experiments in speeding up the code with numba. But I decided to not use numba for now and let it mature for some time. Especially the numba typed list and dict objects as well as the slice object get some development work at the moment. So it may be much less work in the near future.

stschroe avatar Nov 06 '19 14:11 stschroe

Fair enough. Were the experiments successful or are you not using them for some other reason? I'm also willing to accept your code and speed it up in the future.

hameerabbasi avatar Nov 06 '19 14:11 hameerabbasi

I have done my experiments to evaluate, if all the functionality could be written in nopython mode. And the conclusion is yes. But as I wrote the code, it turned out, the numba version would result in a much bigger code base therefore much more work. For example I now use the slice indices method and the itertools product function. And the replecement of the usage of the itertools product function alone wold be not so easy. And as you can imagine, at the start, I had no idea how complex the whole indexing logic is.

stschroe avatar Nov 06 '19 14:11 stschroe

In that case, feel free to make a PR.

hameerabbasi avatar Nov 06 '19 14:11 hameerabbasi

Is this bychance related to this issue found over at xarray? Or should I post a separate issue?

k-a-mendoza avatar Nov 08 '19 16:11 k-a-mendoza

@El-minadero It seems the issue you're looking for is #114.

hameerabbasi avatar Nov 08 '19 17:11 hameerabbasi

@hameerabbasi I'd like to pick up this issue again, and if possible turn @stschroe's gist into a PR. Fancy indexing for item access and assignment for DOK arrays would be very useful for us over at napari to support sparse labels layers (see the issue on the napari repo here

Is this something you would still be open to? If so, I think we can get a PR in very soon.

DragaDoncila avatar Mar 03 '21 03:03 DragaDoncila

Of course. If it doesn’t have a wontfix label, we’re open to PRs.

hameerabbasi avatar Mar 03 '21 03:03 hameerabbasi