FastMemcpy issues

2

https://github.com/skywind3000/FastMemcpy/blob/8fea5f666be174c6548d0ae4010e81b0a742c853/FastMemcpy.h#L644 Hi, this value puzzles me, since 128 seems to be a reasonable choice. Was it derived from experiments, or is it related to the mechanism of prefetching itself?

amosbird
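One way to explore the prefetch-distance question empirically is to make the distance a compile-time knob and time the copy at several values. The following is a minimal sketch under stated assumptions, not the code in FastMemcpy.h: `copy_with_prefetch` and `PREFETCH_DISTANCE` are hypothetical names, the default of 256 is only a placeholder, and the 64-byte block size is assumed to match a typical x86 cache line. On targets without SSE the prefetch is simply compiled out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#if defined(__SSE__) || defined(__SSE2__)
#include <xmmintrin.h>          /* _mm_prefetch, _MM_HINT_NTA */
#define HAVE_SSE_PREFETCH 1
#endif

/* Hypothetical knob: bytes ahead of the current read position to prefetch.
 * Rebuild with -DPREFETCH_DISTANCE=128 (or 64, 512, ...) to compare. */
#ifndef PREFETCH_DISTANCE
#define PREFETCH_DISTANCE 256
#endif

/* Copies n bytes in 64-byte blocks, issuing a non-temporal prefetch
 * PREFETCH_DISTANCE bytes ahead of the block being copied. */
void *copy_with_prefetch(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    size_t i = 0;

    for (; i + 64 <= n; i += 64) {
#ifdef HAVE_SSE_PREFETCH
        /* Only prefetch addresses that lie inside the source buffer. */
        if (i + PREFETCH_DISTANCE < n)
            _mm_prefetch((const char *)(s + i + PREFETCH_DISTANCE), _MM_HINT_NTA);
#endif
        memcpy(d + i, s + i, 64);
    }
    memcpy(d + i, s + i, n - i);    /* tail of 0..63 bytes */
    return dst;
}
```

Timing this function over large buffers at a few distances would show whether the choice matters on a given machine; the best value is likely to depend on memory latency and on how much work each loop iteration does.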

Slower on later GCC

5

This actually appears to be slower on GCC 5.4

> benchmark(size=32 bytes, times=16777216):
> result(dst aligned, src aligned): memcpy_fast=42ms memcpy=48 ms
> result(dst aligned, src unalign): memcpy_fast=46ms memcpy=54 ms
>...

aaronovz1
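Results like these depend heavily on how the timing loop is compiled, so it can help to reproduce them with a small harness of your own. The sketch below is an assumption about how such a measurement could be set up, not the project's benchmark: `bench_copy` and `copy_fn` are hypothetical names, and `clock()` measures CPU time rather than wall-clock time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Any function with memcpy's signature can be measured. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Returns the CPU time, in milliseconds, spent on `times` copies of
 * `size` bytes from src to dst using fn. */
double bench_copy(copy_fn fn, void *dst, const void *src,
                  size_t size, unsigned long times)
{
    assert(fn != NULL && dst != NULL && src != NULL);
    volatile unsigned char sink = 0;
    clock_t start = clock();

    for (unsigned long i = 0; i < times; i++) {
        fn(dst, src, size);
        /* Volatile read of the destination so the copy stays observable. */
        sink ^= *(volatile unsigned char *)dst;
    }

    clock_t stop = clock();
    (void)sink;
    return 1000.0 * (double)(stop - start) / (double)CLOCKS_PER_SEC;
}
```

A call such as `bench_copy(memcpy, dst, src, 32, 16777216UL)` mirrors the size and iteration count quoted above. Note that compilers commonly expand small fixed-size `memcpy` calls inline, so whether the library routine is actually being called can differ between compiler versions and optimization flags.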

[Question] about when to use `mm_sfence`

This may be a naive question, but why is `mm_sfence` used when size >= L2 cache size? https://github.com/skywind3000/FastMemcpy/blob/master/FastMemcpy.h#L680 And what if the assumed L2 cache size (0x200000) is not the actual L2 cache size, is there any...

dirtysalt
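For background on the fence: non-temporal (streaming) stores such as `_mm_stream_si128` bypass the cache and are weakly ordered, so a store fence is normally issued after a run of them to make the data globally visible before later stores. The sketch below illustrates that general pattern only; `copy_streaming` is a hypothetical name, and this is not a claim about how FastMemcpy.h chooses its size threshold. It falls back to `memcpy` when SSE2 is unavailable or the destination is not 16-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>          /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/* Copies n bytes using cache-bypassing stores where possible. */
void *copy_streaming(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
#if defined(__SSE2__)
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    size_t i = 0;

    /* _mm_stream_si128 requires a 16-byte-aligned destination. */
    if (((uintptr_t)d & 15u) != 0)
        return memcpy(dst, src, n);

    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
        _mm_stream_si128((__m128i *)(d + i), v);   /* non-temporal store */
    }
    /* Order the streaming stores before any stores that follow. */
    _mm_sfence();

    memcpy(d + i, s + i, n - i);    /* tail of 0..15 bytes */
    return dst;
#else
    return memcpy(dst, src, n);
#endif
}
```

Streaming stores tend to pay off only when the copied data would not fit in cache anyway, which is presumably why a cache-size threshold guards that path; if the hard-coded size does not match the real cache, the likely cost is performance rather than correctness.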

GCC 10.2.1 Results

6

`gcc version 10.2.1 20201007 releases/gcc-10.2.0-350-g136256c32d (Clear Linux OS for Intel Architecture)`

> ./FastMemcpy
> benchmark(size=32 bytes, times=16777216):
> result(dst aligned, src aligned): memcpy_fast=48ms memcpy=35 ms
> result(dst aligned, src...

victorstewart

Some minor issues

4

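I compared this against the current MCFCRT. Because MCFCRT does not plan to support AVX, I only tested the SSE version (in truth I was just too lazy to change it; it would be fairly simple, since the copy operations are currently all packed as pairs of consecutive `movups`, and tweaking that part would be enough to support AVX): ![4311](https://user-images.githubusercontent.com/5071344/35473846-b0aeb952-03c0-11e8-8ffa-5ed06979666d.png) ```plaintext gcc (gcc-7-branch HEAD with MCF thread model, built by LH_Mouse.) 7.3.1 20180125 Copyright (C) 2017 Free...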

lhmouse

FastMemcpy{,_Avx}.c: Mark `memcpy()` as `dllimport` on MinGW and ming…

…w-w64 targets. On MinGW and mingw-w64 targets, `memcpy()` is imported from MSVCRT.DLL. For benchmarking purposes, we have to eliminate the overhead of implicit importation by specifying `dllimport` explicitly....

lhmouse

Do you plan to support SIMD for ARM64 architecture?

1

As more and more people use servers with the arm64 architecture, supporting the arm64 architecture with SIMD becomes meaningful.

JackyWoo
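To give a rough idea of what an ARM64 SIMD path could look like, here is a minimal sketch using NEON intrinsics. It is purely illustrative: `copy_neon` is a hypothetical name, nothing like it is confirmed to exist in the project, and it makes no performance claim against the platform `memcpy`. On targets without NEON it degrades to a plain `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>           /* vld1q_u8, vst1q_u8 */
#endif

/* Copies n bytes, 16 at a time through a NEON register when available. */
void *copy_neon(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    size_t i = 0;

#if defined(__ARM_NEON)
    /* vld1q_u8 / vst1q_u8 do not require aligned addresses. */
    for (; i + 16 <= n; i += 16)
        vst1q_u8(d + i, vld1q_u8(s + i));
#endif

    memcpy(d + i, s + i, n - i);    /* tail, or the whole copy without NEON */
    return dst;
}
```

A real port would also need size-class dispatch and wider unrolling comparable to the SSE/AVX paths, which this sketch omits.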


[Revise] add latency and bandwidth in benchmark

What is the reason of using 256 for prefetchnta?
