`mem.span(p)` is 10x slower than `strlen(p)`

Open Jarred-Sumner opened this issue 3 years ago • 1 comment

Zig Version

0.10.0-dev.1120+0ea51f7f4

Steps to Reproduce

Save this file:

const std = @import("std");

extern fn strlen(ptr: [*c]const u8) usize;

pub fn main() anyerror!void {
    var args = try std.process.argsAlloc(std.heap.c_allocator);
    var file = try std.fs.cwd().openFileZ(args[args.len - 1], .{ .mode = .read_only });
    var contents = try std.heap.c_allocator.dupeZ(u8, try file.readToEndAlloc(std.heap.c_allocator, std.math.maxInt(usize)));

    // Time Zig's std.mem.sliceTo on the NUL-terminated buffer.
    var time = try std.time.Timer.start();
    {
        var slice = std.mem.sliceTo(contents, 0);
        std.mem.doNotOptimizeAway(&slice);
    }
    const zig = time.read();

    // Time libc's strlen on the same buffer.
    time = try std.time.Timer.start();
    {
        var len = strlen(contents);
        std.mem.doNotOptimizeAway(&len);
    }
    const c = time.read();

    std.debug.print(
        \\Reading {s} ({any})
        \\
        \\  std.mem.sliceTo: {any}
        \\           strlen: {any}
    ,
        .{
            std.mem.span(args[args.len - 1]),
            std.fmt.fmtIntSizeBin(contents.len),
            std.fmt.fmtDuration(zig),
            std.fmt.fmtDuration(c),
        },
    );
}

Compile with:

zig build-exe -OReleaseFast long.zig -lc

Run it on a file

Expected Behavior

Performance within the same order of magnitude, or better, for medium-to-long strings (> 1 KB), and equal or better performance for short strings.

Actual Behavior

❯ ./long 512mb.file

Reading 512mb.file (512MiB)

  std.mem.sliceTo: 172.243ms
           strlen: 11.298ms

❯ ./long 1mb.file

Reading 1mb.file (1MiB)

  std.mem.sliceTo: 809.583us
           strlen: 50.708us

❯ ./long 254.js

Reading 254.js (157B)

  std.mem.sliceTo: 167ns
           strlen: 83ns

It makes sense why: strlen has probably been optimized a lot over the years, and it's probably using SIMD.

Note: this is on macOS

May 11 '22 08:05 Jarred-Sumner

A few random things that I think are worth noting:

- std.mem.sliceTo forces calculation of the length by searching for the terminator, even for slices that already have a length. This is not true of std.mem.span or std.mem.len, which would give you contents.len back immediately.
  - EDIT: This is no longer the case as of https://github.com/ziglang/zig/pull/14417
- Here's musl's strlen implementation, which has an optimization to read sizeof(size_t) bytes at a time when __GNUC__ is defined: https://github.com/ziglang/zig/blob/master/lib/libc/musl/src/string/strlen.c
  - When targeting musl, strlen is ~2x slower than glibc's for the 512mb file (so still much faster than Zig's). With the optimization commented out and still targeting musl, the performance of Zig's sliceTo and musl's strlen is comparable.
- Here's a Stack Overflow thread on glibc's strlen optimizations: https://stackoverflow.com/questions/57650895/why-does-glibcs-strlen-need-to-be-so-complicated-to-run-quickly
  - It seems like, for most targets, glibc's strlen will use a hand-optimized assembly version.
  - EDIT: Another Stack Overflow answer with more info on the glibc assembly optimizations (see the "strlen is the canonical example" section)
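
For reference, here is a minimal C sketch of the word-at-a-time trick mentioned in the musl point above. It is modeled on the linked strlen.c but is not a verbatim copy. The name word_strlen is hypothetical, and the macro names are just labels for the usual "does this word contain a zero byte" bit trick. It assumes GCC or Clang (for the may_alias attribute) and, like the real thing, relies on aligned word reads past the terminator being harmless in practice, which ISO C does not strictly guarantee.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* 0x0101...01 and 0x8080...80 for whatever width size_t has. */
#define ONES  ((size_t)-1 / UCHAR_MAX)
#define HIGHS (ONES * (UCHAR_MAX / 2 + 1))
/* Nonzero if and only if at least one byte of x is zero. */
#define HASZERO(x) (((x) - ONES) & ~(x) & HIGHS)

/* Hypothetical name; a sketch of the technique, not musl's exact code. */
size_t word_strlen(const char *s)
{
    const char *start = s;
#if defined(__GNUC__)
    /* may_alias lets us read the char buffer through a size_t pointer. */
    typedef size_t __attribute__((__may_alias__)) word;

    /* Go byte by byte until s is aligned to a word boundary. */
    for (; (uintptr_t)s % sizeof(size_t) != 0; s++)
        if (*s == '\0')
            return (size_t)(s - start);

    /* Skip whole words while none of their bytes is zero. */
    const word *w = (const void *)s;
    while (!HASZERO(*w))
        w++;
    s = (const char *)w;
#endif
    /* Find the exact terminator inside the last word (or do the
       whole scan byte by byte when the fast path is compiled out). */
    while (*s != '\0')
        s++;
    return (size_t)(s - start);
}
```

The plain byte loop at the end is roughly what remains when the optimization is commented out, which is the configuration that performed comparably to sliceTo in the measurement above.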

May 11 '22 23:05 squeek502