`mem.span(p)` is 10x slower than `strlen(p)`

Open Jarred-Sumner opened this issue 3 years ago • 1 comment

Zig Version

0.10.0-dev.1120+0ea51f7f4

Steps to Reproduce

  1. Save this file:
const std = @import("std");

extern fn strlen(ptr: [*c]const u8) usize;

pub fn main() anyerror!void {
    var args = try std.process.argsAlloc(std.heap.c_allocator);
    var file = try std.fs.cwd().openFileZ(args[args.len - 1], .{ .mode = .read_only });
    var contents = try std.heap.c_allocator.dupeZ(u8, try file.readToEndAlloc(std.heap.c_allocator, std.math.maxInt(usize)));

    var time = try std.time.Timer.start();
    {
        var slice = std.mem.sliceTo(contents, 0);
        std.mem.doNotOptimizeAway(&slice);
    }
    const zig = time.read();

    time = try std.time.Timer.start();
    {
        var len = strlen(contents);
        std.mem.doNotOptimizeAway(&len);
    }
    const c = time.read();

    std.debug.print(
        \\Reading {s} ({any})
        \\
        \\  std.mem.sliceTo: {any}
        \\           strlen: {any}
    ,
        .{
            std.mem.span(args[args.len - 1]),
            std.fmt.fmtIntSizeBin(contents.len),
            std.fmt.fmtDuration(zig),
            std.fmt.fmtDuration(c),
        },
    );
}
  2. Compile with:
zig build-exe -OReleaseFast long.zig -lc
  3. Run it on a file.

Expected Behavior

Performance within the same order of magnitude as strlen (or better) for medium-to-long strings (> 1 KB), and equal or better performance for short strings.

Actual Behavior

❯ ./long 512mb.file

Reading 512mb.file (512MiB)

  std.mem.sliceTo: 172.243ms
           strlen: 11.298ms

❯ ./long 1mb.file

Reading 1mb.file (1MiB)

  std.mem.sliceTo: 809.583us
           strlen: 50.708us

❯ ./long 254.js

Reading 254.js (157B)

  std.mem.sliceTo: 167ns
           strlen: 83ns 

The gap makes sense: strlen has probably been heavily optimized over the years, and it is probably using SIMD.

Note: this is on macOS

Jarred-Sumner • May 11 '22 08:05

A few random things that I think are worth noting:

  • std.mem.sliceTo forces calculation of the length by searching for the terminator, even for slices that already have a length. This is not true of std.mem.span or std.mem.len, which would give you contents.len back immediately.
    • EDIT: This is no longer the case as of https://github.com/ziglang/zig/pull/14417
  • Here's musl's strlen implementation, which has an optimization to read sizeof(size_t) bytes at a time when __GNUC__ is defined: https://github.com/ziglang/zig/blob/master/lib/libc/musl/src/string/strlen.c
    • When targeting musl, strlen is ~2x slower than glibc's for the 512mb file (so still much faster than Zig's). With that optimization commented out and still targeting musl, Zig's sliceTo and musl's strlen perform comparably.
  • Here's a Stack Overflow thread on glibc's strlen optimizations: https://stackoverflow.com/questions/57650895/why-does-glibcs-strlen-need-to-be-so-complicated-to-run-quickly
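
For anyone unfamiliar with the "read sizeof(size_t) bytes at a time" trick mentioned above, here is a minimal C sketch of the general technique. It is not musl's actual source: the name `word_strlen` and the macro names are made up for illustration, and it assumes 8-bit bytes. It also relies on aligned word loads being allowed to read a few bytes past the terminator, which ISO C does not guarantee; libc implementations get away with it because an aligned load cannot cross a page boundary.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 0x0101...01 and 0x8080...80 for the platform's word size (assumes 8-bit bytes). */
#define ONES  ((size_t)-1 / UCHAR_MAX)
#define HIGHS (ONES * (UCHAR_MAX / 2 + 1))
/* Nonzero if and only if at least one byte of w is zero. */
#define HAS_ZERO(w) (((w) - ONES) & ~(w) & HIGHS)

/* Hypothetical helper: strlen that scans one machine word per iteration. */
size_t word_strlen(const char *s)
{
    assert(s != NULL);
    const char *p = s;

    /* Step byte by byte until p sits on a word boundary. */
    for (; (uintptr_t)p % sizeof(size_t) != 0; p++) {
        if (*p == '\0')
            return (size_t)(p - s);
    }

    /* Scan aligned words until one of them contains a zero byte.
       memcpy sidesteps strict-aliasing issues; compilers typically
       lower it to a single load. The load may cover bytes after the
       terminator (see the assumption stated above). */
    for (;;) {
        size_t w;
        memcpy(&w, p, sizeof w);
        if (HAS_ZERO(w))
            break;
        p += sizeof w;
    }

    /* Find the exact position of the terminator inside that word. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

A plain byte-at-a-time loop does one compare and one branch per byte, while this version does one check per 8 bytes on a 64-bit target. That is roughly consistent with the observation above that disabling the optimization in musl makes its strlen comparable to sliceTo. glibc's SIMD versions go further by checking 16 or more bytes per iteration.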

squeek502 • May 11 '22 23:05