fmt shouldn't touch up hex escape sequences in zon and extern identifiers (`.@"\x0a"` => `.@"\n"`)

Open SeriousBusiness101 opened this issue 1 year ago • 6 comments

Input:

.{
                      .@"\x01" = .{""}, .@"\x02" = .{""}, .@"\x03" = .{""},
    .@"\x04" = .{""}, .@"\x05" = .{""}, .@"\x06" = .{""}, .@"\x07" = .{""},
    .@"\x08" = .{""}, .@"\x09" = .{""}, .@"\x0A" = .{""}, .@"\x0B" = .{""},
}

Currently fmt outputs:

.{
    .@"\x01" = .{""},
    .@"\x02" = .{""},
    .@"\x03" = .{""},
    .@"\x04" = .{""},
    .@"\x05" = .{""},
    .@"\x06" = .{""},
    .@"\x07" = .{""},
    .@"\x08" = .{""},
    .@"\t" = .{""},
    .@"\n" = .{""},
    .@"\x0b" = .{""},
}

Instead of:

.{
    .@"\x01" = .{""},
    .@"\x02" = .{""},
    .@"\x03" = .{""},
    .@"\x04" = .{""},
    .@"\x05" = .{""},
    .@"\x06" = .{""},
    .@"\x07" = .{""},
    .@"\x08" = .{""},
    .@"\x09" = .{""},
    .@"\x0a" = .{""},
    .@"\x0b" = .{""},
}

Edit:

Most relevant for ZON and extern. Should probably only apply to those? What I'm suggesting is that fmt shouldn't canonicalize interchangeable byte representations in places where they're known to be exchanged outside of Zig's realm.

SeriousBusiness101 avatar Mar 21 '24 19:03 SeriousBusiness101

Is there a strong motivation for this beyond a desire to align source code? Keep in mind that you can disable the auto-canonicalization with `// zig fmt: off`. The original motivation for canonicalizing identifiers was greppability; see #166 and its linked pull request. ASCII bytes in identifiers currently have only one canonical representation each.
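
As a rough sketch of that workaround (assuming the directive is honored inside a `.zon` file the same way it is in regular Zig source, and that a matching `// zig fmt: on` re-enables formatting), the last row of the input above could be fenced off like this:

```zig
.{
    // zig fmt: off
    // Assumption: fmt leaves everything up to "zig fmt: on" as written,
    // so the hex escapes and their casing are not rewritten.
    .@"\x08" = .{""}, .@"\x09" = .{""}, .@"\x0A" = .{""}, .@"\x0B" = .{""},
    // zig fmt: on
}
```

The trade-off is that the fenced region also loses the one-field-per-line layout and any other formatting fmt would normally apply.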

castholm avatar Mar 21 '24 21:03 castholm

@castholm The motivation, as in my other issues, is first and foremost object notation. The desire is 8-bit cleanness in situations where you're reaching outside of a vacuum. I don't think it serves well to arbitrarily canonicalize interchangeable byte representations at the crossroads where they're interchanged. Andrew's words from https://github.com/ziglang/zig/issues/14534:

Even JSON has the ability to represent null bytes in map keys. It would be a crime to introduce a new data exchange format and have it be worse than JSON at representing a mere map data structure.

SeriousBusiness101 avatar Mar 21 '24 22:03 SeriousBusiness101

Not really sure how Andrew's comment there is relevant here. Canonicalizing helps with searchability, as @castholm says, and if you really want to disable it you can turn off zig fmt for those lines.

silversquirl avatar Mar 21 '24 22:03 silversquirl

Actually this form of canonicalization isn't a measure that helps with searchability, most certainly not in the example I presented here @silversquirl. Andrew knows a crime when he sees one.

SeriousBusiness101 avatar Mar 21 '24 22:03 SeriousBusiness101

Canonicalizing helps with searchability in the common case because it means you can search for `@"\t"` and find all the fields that are named a single tab character. Your use case (non-textual field names) is uncommon, and not one that I think Zig should design around.

silversquirl avatar Mar 21 '24 22:03 silversquirl

@silversquirl It's not a non-textual use case. It's disengaged from an exclusive canonicalization for a specific encoding, in what is essentially its own interchange format. This makes sense in extern too. There isn't profound complexity to grep conditions, and I don't think this friction in particular is bad for that matter.

SeriousBusiness101 avatar Mar 21 '24 23:03 SeriousBusiness101