David Parks

23 comments by David Parks

@cparrott73 The 128-byte alignment is an optimization that minimizes the number of times the `stwcx.` can fail, i.e., the number of times the reservation is cleared by another...
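A minimal C11 sketch of the idea, assuming a POWER-style reservation granule of 128 bytes (that size and the names `GRANULE_BYTES`, `padded_counter` and `padded_counter_increment` are illustrative assumptions, not taken from the original code). The shared word is aligned and padded to 128 bytes, so stores to neighbouring data cannot land in the same granule and clear the reservation. On POWER, compilers typically implement the compare-and-swap loop below with an `lwarx`/`stwcx.` pair.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>

/* Assumed reservation-granule size; 128 bytes is an assumption for
   POWER-class CPUs, not a value queried from the hardware. */
#define GRANULE_BYTES 128

/* Hypothetical padded counter: alignas places the atomic word at the start
   of its own 128-byte block, and the padding keeps the next object out of it. */
typedef struct {
    alignas(GRANULE_BYTES) atomic_uint value;
    char pad[GRANULE_BYTES - sizeof(atomic_uint)];
} padded_counter;

static_assert(sizeof(padded_counter) == GRANULE_BYTES, "one counter per granule");
static_assert(alignof(padded_counter) == GRANULE_BYTES, "counter starts a granule");

/* Compare-and-swap increment. On POWER this loop typically becomes
   lwarx/stwcx.; stwcx. fails (and the loop retries) if the reservation
   was lost, e.g. because another CPU stored into the same granule. */
static unsigned padded_counter_increment(padded_counter *c) {
    unsigned old = atomic_load_explicit(&c->value, memory_order_relaxed);
    /* On failure, compare_exchange_weak reloads `old` with the current value. */
    while (!atomic_compare_exchange_weak_explicit(&c->value, &old, old + 1u,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed)) {
    }
    return old + 1u;
}
```

An array of `padded_counter`, one element per thread, gives each thread's counter its own 128-byte block. The trade-off is memory: every counter now occupies a full granule.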

@martin-frbg The enhanced mod, as provided, was developed with consideration given only to the OpenMP path. If you do like the mod, it might just be a special case for...
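One way to keep such a mod confined to the OpenMP path is a compile-time guard on the standard `_OPENMP` macro, which compilers define when OpenMP is enabled (for example with `-fopenmp`). This is a hypothetical sketch: `threading_backend` and the `BACKEND_*` names are illustrative, not OpenBLAS functions, and a real build might key on its own configuration macro instead.

```c
/* Hypothetical labels for the two compile-time paths. */
enum backend { BACKEND_DEFAULT = 0, BACKEND_OPENMP = 1 };

/* Reports which threading path this translation unit was compiled for.
   _OPENMP is defined by the compiler only when OpenMP is enabled. */
static enum backend threading_backend(void) {
#ifdef _OPENMP
    /* Special-cased path: where an OpenMP-only modification would live. */
    return BACKEND_OPENMP;
#else
    /* Non-OpenMP builds (e.g. pthreads or single-threaded) keep the original code. */
    return BACKEND_DEFAULT;
#endif
}
```

Because the choice is made by the preprocessor, non-OpenMP builds never compile the special-cased code at all.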

@Flamefire, no objections to your comments. The mod was developed to get around the runtime failures we were experiencing in-house, and I expected that the OpenBLAS developers would modify...

Hi, with regard to the inquiry about the requirement in the comment above: what we strive for in the numerical intrinsic libraries is having the scalar and vector versions produce...

The approach outlined by @fpetrogalli-arm seems reasonable. I do have a few questions, though: What is the conceptual difference between vfma_() and vmla_()? Is vfma() only to be used if the hardware...

Hi Shibata-san, Perfect. This is clearer and cleaner. Best regards, Dave On Tue, Oct 24, 2017 at 10:05 AM, Naoki Shibata wrote: > Regarding this, I am planning to change...

Hi Shibata-san, Our situation is a little more complicated than just using your SLEEF routines. We have adopted your intrinsic notation and accompanying header files for all our new CPU...

Those symbols are the parallel routines generated for the parallel regions; each name has the form `*_F1L<line>_1_`, where `<line>` is the line at which the region starts.

They should be defined in the corresponding .obj file (for example, I'd expect `__nv_zhetrd_hb2st__F1L467_1_` to be defined in `zhetrd_hb2st.F.obj`).

I don't believe the routines you're referencing are user-callable; you might want to double-check whether these routines are internal before changing the semantics to...