Qinglei Cao
Thanks for your explanation. Yes, the performance of SSL2 is similar to what is shown [on this page](https://github.com/flame/blis/blob/master/docs/Performance.md#a64fx): about 64 Gflop/s on a single core and about 60 Gflop/s per core if...
Thanks for your quick reply. Yes, that's the configuration in DPLASMA: single-threaded BLIS kernels coupled with pthreads. Is it fine to call many single-threaded BLIS kernels, e.g., dgemm, simultaneously?
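The calling pattern in question (many single-threaded kernels running at once, one per pthread) can be sketched as below. This is only an illustration of the pattern, not DPLASMA's scheduler: it assumes each thread owns private n x n tiles so no locking is needed, and `st_dgemm` is a hypothetical naive stand-in for the single-threaded BLIS dgemm, used so the sketch compiles without BLIS. Whether concurrent BLIS calls scale well on a given machine is the open question and is not answered by this code.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* HYPOTHETICAL stand-in for a single-threaded BLIS dgemm:
 * C := A*B + C, all n x n, column-major. Replace with the real BLIS call. */
void st_dgemm(int n, const double *A, const double *B, double *C) {
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i) {
            double sum = 0.0;
            for (int p = 0; p < n; ++p)
                sum += A[i + (size_t)p * n] * B[p + (size_t)j * n];
            C[i + (size_t)j * n] += sum;
        }
}

/* One job per thread; each job owns its tiles, so threads never share C. */
typedef struct {
    int n;
    double *A, *B, *C;
} gemm_job;

static void *gemm_worker(void *arg) {
    gemm_job *job = arg;
    st_dgemm(job->n, job->A, job->B, job->C);
    return NULL;
}

/* Launch nthreads workers that each run one single-threaded kernel
 * concurrently, then join them. Returns 0 on success, -1 on failure. */
int run_concurrent_gemms(int nthreads, gemm_job *jobs) {
    pthread_t *tids = malloc(sizeof *tids * (size_t)nthreads);
    if (!tids)
        return -1;
    int created = 0, rc = 0;
    for (; created < nthreads; ++created) {
        if (pthread_create(&tids[created], NULL, gemm_worker, &jobs[created]) != 0) {
            rc = -1; /* stop launching, but still join the threads already started */
            break;
        }
    }
    for (int t = 0; t < created; ++t)
        pthread_join(tids[t], NULL);
    free(tids);
    return rc;
}
```

Keeping tiles private per thread mirrors the "single-threaded kernel per worker" setup; any slowdown observed with this pattern would then come from shared hardware resources (caches, memory bandwidth) rather than from synchronization.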
Not sure whether SSL2 is available on Ookami, but on Fugaku it's not as good as OpenMP because of a sector cache issue. Not sure about the details, but maybe it's related...
Nope. It just runs on a single core. The test is simple: (1) initialize A, B, and C; (2) start the timer; (3) call DGEMM from the single-threaded version of BLIS; (4) timer...
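Steps (1) to (4) can be sketched in C roughly as follows, assuming column-major storage and the usual DGEMM flop count of 2mnk, so the rate is 2mnk / t. `naive_dgemm` is a hypothetical stand-in that lets the sketch compile without BLIS; the actual test calls the single-threaded BLIS DGEMM, which could be substituted here, for example through `cblas_dgemm` if BLIS was built with CBLAS support.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime / CLOCK_MONOTONIC */
#include <stddef.h>
#include <time.h>

/* HYPOTHETICAL stand-in for the single-threaded BLIS DGEMM:
 * C := alpha*A*B + beta*C, column-major; A is m x k, B is k x n, C is m x n.
 * With CBLAS this would be:
 * cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, m, n, k,
 *             alpha, A, m, B, k, beta, C, m); */
void naive_dgemm(int m, int n, int k, double alpha, const double *A,
                 const double *B, double beta, double *C) {
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += A[i + (size_t)p * m] * B[p + (size_t)j * k];
            C[i + (size_t)j * m] = alpha * sum + beta * C[i + (size_t)j * m];
        }
}

/* An m x n x k DGEMM performs 2*m*n*k floating-point operations. */
double dgemm_flops(int m, int n, int k) {
    return 2.0 * m * n * k;
}

/* Monotonic wall-clock time in seconds. */
double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Steps (2)-(4): time one DGEMM call and return Gflop/s.
 * A, B, and C must already be initialized by the caller (step 1). */
double time_dgemm_gflops(int m, int n, int k, const double *A,
                         const double *B, double *C) {
    double t0 = now_seconds();               /* (2) start the timer */
    naive_dgemm(m, n, k, 1.0, A, B, 1.0, C); /* (3) call DGEMM      */
    double t1 = now_seconds();               /* (4) stop the timer  */
    double elapsed = t1 - t0;
    return elapsed > 0.0 ? dgemm_flops(m, n, k) / elapsed * 1e-9 : 0.0;
}
```

For meaningful numbers the matrices should be large enough that a single call takes well above the timer resolution; repeating the call and keeping the best time is a common refinement.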
I will take a look this weekend. Thanks.

> On Wed, Apr 24, 2024 at 4:56 PM Aurelien Bouteiller wrote:
> This should be restudied after 3935f31...
> > I will take a look this weekend. Thanks.
>
> We never documented what was the outcome of this study [@QingleiCao](https://github.com/QingleiCao) It seems it's not...
> I don't understand why we skipped version 0 data transfer. Does it work if you remove that line?

I'm not sure of the reason; maybe the reshape of desc sets...
This desc will be reshaped to arena_ADT. Could that be related?
I can reproduce this issue with the following error:

```
testing_issue624: /home/qcao3/parsec/parsec/remote_dep_mpi.c:1582: remote_dep_mpi_save_put_cb: Assertion `0 != deps->pending_ack' failed.
```

And changing the flow order in the task class `start` can...
There are some discussions about this recursive issue [here](https://github.com/ICLDisco/parsec/issues/464).