Feature/raja vec
Added vectorization in stream/add
Perhaps we should take a step back and do a PR for the Lambda changes first? What do you think, @vsrana01?
I agree with
> Perhaps we should take a step back and do a PR for the Lambda changes first? What do you think, @vsrana01?
@rhornung67 those changes were needed. @ajkunen I agree with you. I can do one PR for just the Lambda changes and then a second one for the vec stuff. I can test the new Lambda changes across the different compilers and see whether there is a performance difference.
@vsrana01 and @ajkunen if the lambda args changes are needed for the vectorization stuff, then it may be a good idea to figure out a good way to keep both forms of the non-vector variants (with and without the 'Segs' business). My main concern is that both versions of each kernel should perform the same with each compiler. If they do not, this is a good place for vendors to dig into why. But let's not do that now.
I suggest only making additions you need to support the vector variants and leave all non-vector variants as is for now. Does that make sense?
@rhornung67 I think the current RAJA develop branch imposes the Lambda requirements, which means the new Lambda notation is necessary. I think if @vsrana01 does a PR for RAJAPerf with just the Lambda changes (and updated RAJA), we can see whether there is a performance difference there. Once that's complete, the vector_exec work will be more narrowly scoped, and we can test its performance separately.
But we can't do the vector_exec stuff now without the Lambda stuff.
@ajkunen and @rhornung67 when I build with Adam's vectorization branch of RAJA, I get compiler errors due to the new Lambda requirements. I will start looking at how the new Lambda requirements change the performance of the kernels and create a new PR.
@vsrana01 and @ajkunen OK. I misunderstood the constraints.
I think it would be best to do a PR with a new variant added (RAJA_Seq_Args) so we can assess performance. Then, move on from there. Agree?
@rhornung67 @ajkunen agreed.