`mem.span(p)` is 10x slower than `strlen(p)`
Zig Version
0.10.0-dev.1120+0ea51f7f4
Steps to Reproduce
- Save this file as `long.zig`:
```zig
const std = @import("std");

// libc's strlen, linked in via -lc.
extern fn strlen(ptr: [*c]const u8) usize;

pub fn main() anyerror!void {
    // The last command-line argument is the path of the file to scan.
    var args = try std.process.argsAlloc(std.heap.c_allocator);
    var file = try std.fs.cwd().openFileZ(args[args.len - 1], .{ .mode = .read_only });
    // Read the whole file and copy it into a null-terminated buffer.
    var contents = try std.heap.c_allocator.dupeZ(u8, try file.readToEndAlloc(std.heap.c_allocator, std.math.maxInt(usize)));

    // Time Zig's search for the 0 terminator.
    var time = try std.time.Timer.start();
    {
        var slice = std.mem.sliceTo(contents, 0);
        std.mem.doNotOptimizeAway(&slice);
    }
    const zig = time.read();

    // Time libc's strlen on the same buffer.
    time = try std.time.Timer.start();
    {
        var len = strlen(contents);
        std.mem.doNotOptimizeAway(&len);
    }
    const c = time.read();

    std.debug.print(
        \\Reading {s} ({any})
        \\
        \\ std.mem.sliceTo: {any}
        \\ strlen: {any}
    ,
        .{
            std.mem.span(args[args.len - 1]),
            std.fmt.fmtIntSizeBin(contents.len),
            std.fmt.fmtDuration(zig),
            std.fmt.fmtDuration(c),
        },
    );
}
```
- Compile with:
```
zig build-exe -OReleaseFast long.zig -lc
```
- Run it on a file, e.g. `./long 512mb.file`
Expected Behavior
Performance in the same order of magnitude as `strlen` (or better) for medium-to-long strings (> 1 KB), and equal or better performance for short strings.
Actual Behavior
```
❯ ./long 512mb.file
Reading 512mb.file (512MiB)
std.mem.sliceTo: 172.243ms
strlen: 11.298ms
❯ ./long 1mb.file
Reading 1mb.file (1MiB)
std.mem.sliceTo: 809.583us
strlen: 50.708us
❯ ./long 254.js
Reading 254.js (157B)
std.mem.sliceTo: 167ns
strlen: 83ns
```
This makes sense: `strlen` has probably been heavily optimized over the years, likely using SIMD.
Note: this is on macOS.
A few random things that I think are worth noting:
- `std.mem.sliceTo` forces calculation of the length by searching for the terminator, even for slices that already have a length. This is not true of `std.mem.span` or `std.mem.len`, which would give you `contents.len` back immediately.
  - EDIT: This is no longer the case as of https://github.com/ziglang/zig/pull/14417
- Here's musl's `strlen` implementation, which has an optimization to read `sizeof(size_t)` bytes at a time when `__GNUC__` is defined: https://github.com/ziglang/zig/blob/master/lib/libc/musl/src/string/strlen.c
  - When targeting musl, `strlen` is ~2x slower than glibc's for the 512mb file (so still much faster than Zig's). When commenting the optimization out and targeting musl, the performance of Zig's `sliceTo` and musl's `strlen` is comparable.
- Here's a Stack Overflow thread on glibc's `strlen` optimizations: https://stackoverflow.com/questions/57650895/why-does-glibcs-strlen-need-to-be-so-complicated-to-run-quickly
  - It seems like, for most targets, glibc's `strlen` will use a hand-optimized assembly version.
    - EDIT: Another Stack Overflow answer with more info on the glibc assembly optimizations (see the "strlen is the canonical example" section)