Port `Datetime` and `Timedelta` to server-side objects
I think these are great candidates for the next server-side objects because:
- CI is already written
- They are fairly simple wrappers around single integer arrays without too much functionality
- Having them in the server would solve a lot of current problems where these objects revert to bare `int64` arrays after being passed to functions like `ak.concatenate`, `ak.unique`, etc.
I'm having serious second thoughts about this approach, mainly because the code changes are monstrous. I had thought that the benefit of having objects retain their type through server operations would be worth the extra code, but now I'm not so sure. In hindsight, the code cost for these time-related types is extra high because they support arithmetic. By contrast, supporting an IP address type (for example) would not be as much code, but still a lot.
I am now leaning towards keeping these "fancy" dtypes in the client, and handling type preservation with the approach akutil takes. See, for example, akutil.concatenate:
https://github.com/Bears-R-Us/arkouda/blob/697d8682afccc310a4f8051ca7dd4bfdc0fd5b0a/akutil/akutil/util.py#L10-L39
Arguments to `akutil.concatenate` take one of two code paths:
- If they define a `.concat` method, then that method takes over all the logic and returns the correct type
- If they don't have a `.concat` method, they must subclass `pdarray`, in which case:
  a. the values go through the vanilla concatenate function and lose their fancy type
  b. the result gets converted back to the fancy type via a callback
If we standardize this template for all fancy dtype classes, I think we can make this work for similar functions that take and return arkouda arrays.
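To make the two code paths concrete, here is a minimal, self-contained sketch of the dispatch. It is not the actual akutil code: `PlainArray`, `vanilla_concatenate`, `FancyDatetime`, and `FancyCategorical` are hypothetical stand-ins for `pdarray`, `ak.concatenate`, and two fancy dtype classes, and the "callback" is assumed to be the class constructor.

```python
# Hypothetical stand-ins only; none of this is arkouda's or akutil's real code.

class PlainArray:
    """Stands in for pdarray: a bare array with no fancy type."""
    def __init__(self, values):
        self.values = list(values)


def vanilla_concatenate(arrays):
    """Stands in for ak.concatenate: always returns a PlainArray."""
    out = []
    for a in arrays:
        out.extend(a.values)
    return PlainArray(out)


class FancyDatetime(PlainArray):
    """Path 2: subclasses the plain array and defines no .concat method.
    The constructor is assumed to act as the conversion callback."""


class FancyCategorical:
    """Path 1: not a PlainArray subclass, so it supplies its own .concat."""
    def __init__(self, codes, categories):
        self.codes = list(codes)
        self.categories = list(categories)

    def concat(self, others):
        codes = list(self.codes)
        for other in others:
            if other.categories != self.categories:
                raise ValueError("categories must match in this sketch")
            codes.extend(other.codes)
        return FancyCategorical(codes, self.categories)


def concatenate(arrays):
    """Type-preserving concatenate following the two paths described above."""
    first = arrays[0]
    cls = type(first)
    if any(type(a) is not cls for a in arrays):
        raise TypeError("all arguments must have the same type")
    # Path 1: the class owns all of the logic.
    if hasattr(first, "concat"):
        return first.concat(arrays[1:])
    # Path 2: must be a PlainArray subclass.
    if not isinstance(first, PlainArray):
        raise TypeError("expected a .concat method or a PlainArray subclass")
    result = vanilla_concatenate(arrays)   # a. the fancy type is lost here
    if cls is PlainArray:
        return result
    return cls(result.values)              # b. converted back via the callback
```

With this template, `concatenate([FancyDatetime([1, 2]), FancyDatetime([3])])` comes back as a `FancyDatetime` rather than a bare array, and a new fancy class only has to pick one of the two paths.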
@ronawho @mhmerrill @glitch @pierce314159 @hokiegeek2 @bmcdonald3 Thoughts about keeping fancy dtypes in the client vs. porting to the server?
@reuster986 Can you point me to a branch you're working in so I can get an idea of the changes? Thanks.
Hi @glitch, it's the time-classes branch: https://github.com/Bears-R-Us/arkouda/compare/time-classes
(I put this link under the word "monstrous" above, but I probably should have made it more obvious, sorry)
I played around a bit with this today using BroadcastMsg as an example. Unfortunately I couldn't figure out how to trick the Chapel compiler into working out what the raw type is (int, real, bool, etc.). If we could figure that out it would shrink the code in a lot of places, but if it were easy or even possible, I'm guessing it would have been done before. I'm going to look at ways to slim this down again tomorrow. Essentially you're just making a wrapper around another int array with extra meta info, and I feel like this should be less painful than what you've had to go through.
I'm thinking there might be a way to update some of the toSymEntry / addEntry procs to accept meta info so you don't have to make your own set of when conditions all over the place.
Ok, I'm actually feeling a bit better about this. I think part of the pain here is that you're running into the incremental change of adding the typing system. Had some of these changes already been implemented, I think this would have been easier. We have a couple of different usage patterns for creating entries / adding them to the table, here are the current function signatures:
- `addEntry(name: string, len: int, type t): borrowed SymEntry(t)`
- `addEntry(name: string, in entry: shared AbstractSymEntry): borrowed AbstractSymEntry`
- `addEntry(name: string, len: int, dtype: DType): borrowed AbstractSymEntry`
- new `addTimeEntry(name: string, len: int, dtype: DType): borrowed TimeEntry`
Sometimes we care about the return value of addEntry and other times we don't. This loosely matches two patterns:
- allocate an array for me so I can fill it (e.g. BroadcastMsg)
- I've already made the array but I want it to go in as the proper type according to the DType (e.g. ReductionMsg, SortMsg, etc.)
I see this show up a lot:
if gEnt.dtype == DType.Int64 {
    st.addEntry(sortedName, new shared SymEntry(sorted));
} else {
    st.addEntry(sortedName, new shared TimeEntry(sorted, gEnt.dtype));
}
If we were able to clean up our access patterns, I think a lot of those types of operations would go away. We already have a proc `addEntry(name, len, dtype)`, but it seems we need a similar one that accepts an already allocated array. That should centralize some of the logic inside MultiTypeSymbolTable, so anything with a dtype and a raw array just gets wrapped in the appropriate entry. Since we generally don't do anything with that entry, we don't really care about its return type. In the cases where we do go on to use it, we can do one of two things: cast locally using a convenience function, or add a finer-grained addEntry proc that returns the intermediate type you're looking for (e.g. the "allocate an entry so I can fill the array" version).
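As a rough illustration of that centralization, here is a sketch in Python rather than Chapel, kept language-agnostic on purpose. Everything in it is hypothetical: `SymbolTable.add_entry`, the `DType` enum, and the two entry classes only mimic the shape of `MultiTypeSymbolTable`, `SymEntry`, and `TimeEntry`, and the proposed overload does not exist in the server as far as this thread shows.

```python
from enum import Enum


class DType(Enum):
    # Stand-in for the server's DType enum; only the members needed here.
    INT64 = "int64"
    DATETIME64 = "datetime64"
    TIMEDELTA64 = "timedelta64"


class SymEntry:
    """Stand-in for a plain int64 entry."""
    def __init__(self, data):
        self.data = data
        self.dtype = DType.INT64


class TimeEntry(SymEntry):
    """Stand-in for an int64 entry that also remembers its time dtype."""
    def __init__(self, data, dtype):
        super().__init__(data)
        self.dtype = dtype


TIME_DTYPES = frozenset({DType.DATETIME64, DType.TIMEDELTA64})


class SymbolTable:
    """Stand-in for the symbol table, reduced to a name -> entry dict."""
    def __init__(self):
        self.tab = {}

    def add_entry(self, name, data, dtype):
        """Hypothetical overload: wrap an already allocated array according
        to its dtype, so callers never branch on the dtype themselves."""
        if dtype in TIME_DTYPES:
            entry = TimeEntry(data, dtype)
        elif dtype is DType.INT64:
            entry = SymEntry(data)
        else:
            raise TypeError(f"unsupported dtype in this sketch: {dtype}")
        self.tab[name] = entry
        return entry
```

Under these assumptions, the recurring `if gEnt.dtype == DType.Int64 { ... } else { ... }` branch at each call site would collapse to a single call such as `st.add_entry(sorted_name, sorted_values, g_ent.dtype)`.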
Another main theme I noticed is related to how we handle dtypes & aliasing over in the client Python code. We have specific DTypes which we've declared as enums in Chapel, but we don't really have a way to group them. Over in the client dtypes.py we have Unions of the various scalar types. If we had something similar on the server, it would help clean up stuff like this:
select (gAr1.dtype, gAr2.dtype) {
    when (DType.Int64, DType.Int64),
         (DType.Datetime64, DType.Datetime64),
         (DType.Timedelta64, DType.Timedelta64) {
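One way to picture the grouping idea, sketched in Python for brevity: name the group once and test membership instead of listing every pair. The `DType` enum below is a stand-in that reuses member names from this thread, while `INT_BACKED` and `same_int_backed` are hypothetical names, not existing arkouda identifiers.

```python
from enum import Enum, auto


class DType(Enum):
    # Stand-in enum; the real server enum has more members.
    Int64 = auto()
    Float64 = auto()
    Bool = auto()
    Datetime64 = auto()
    Timedelta64 = auto()


# Hypothetical group: dtypes whose underlying storage is an int64 array.
INT_BACKED = frozenset({DType.Int64, DType.Datetime64, DType.Timedelta64})


def same_int_backed(left, right):
    """True for exactly the pairs listed in the `when` clause above:
    both dtypes are identical and belong to the int-backed group."""
    return left is right and left in INT_BACKED
```

Adding another int-backed dtype would then mean touching the group definition once rather than every `select` that enumerates the pairs.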
This leaves OperatorMsg, which I think https://github.com/Bears-R-Us/arkouda/issues/1045 is going to help with.
I'm going to try and move some stuff around and I'll post a PR with some changes.