Samuel Omlin

112 comments by Samuel Omlin

Reopened, as the foreseen GPU optimizations should also make using LoopVectorization feasible with little or no divergence in approach between CPU and GPU code generation.

Thanks, Chris, for your suggestion and your interest in ParallelStencil. We would be very happy to do some comparison against the cuDNN wrappers. Do you have an example using cuDNN...

We do not have any experience with cuDNN, so we will almost certainly not be able to create an example that uses cuDNN in its most performant way, as would be needed...

Thanks, @chriselrod, for sharing your benchmarks with us! Here are a couple of thoughts with respect to the **cross-over point** that you have mentioned:

- The problem fits in the...

> Execution using PS shows now somehow lower perf for the GPU compared to the CPU.

This is to be expected for this problem size that fits in the CPU...

The function `isdef` is defined as follows (copied from [here](https://github.com/FluxML/MacroTools.jl/blob/master/src/utils.jl#L234)):

```julia
isdef(ex) = isshortdef(ex) || longdef1(ex) !== nothing
```

The problem is that `longdef1(ex) !== nothing` does not at all...

@cstjean, can you review and merge this?

Thanks a lot, @mortenpi! The idea is to enable this kind of documentation pattern, where content is presented in whichever custom categories are appropriate (not covered by `Modules` and...

I am glad if ParallelStencil can help - let me know what your needs are!

I think the following, brought up in #3061, would still be good to discuss further (@staticfloat, any further comments?):

Me:
> @staticfloat, @vchuravy: how about having in the...