std.fmt proposal: improve format string syntax

Open HydroH opened this issue 1 year ago • 6 comments

Extracted from #14436 and #19473. Currently, a std.fmt format string looks like this: [argument][specifier]:[fill][alignment][width].[precision]. It lacks the following capabilities when formatting numbers:

  • Ability to specify whether or not a positive sign should be shown before positive numbers;
  • Ability to pad the number with leading zeroes.

As mentioned here, formatting numbers with leading zeroes and padding strings with fill characters are very different things, and zero-padding probably deserves a separate option. If options for explicit signs and zero-padding were introduced, what syntax should they use? What choices should they offer?

HydroH avatar Mar 30 '24 04:03 HydroH

For reference, the Python format spec looks like this: [[fill]align][sign]["z"]["#"]["0"][width][grouping_option]["." precision][type] The sign field can be +, - or a space, which respectively mean: show a sign on both positive and negative numbers, show a sign on negative numbers only, and put a leading space before positive numbers. The 0 field specifies that numbers should be padded with leading zeroes. There are other useful options that may be worth introducing into the Zig formatting system.
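To make the sign and 0 fields concrete, here is a small sketch that uses only Python's built-in format(); the variable names are just for illustration:

```python
# Python's three sign options, applied to a positive and a negative integer.
for spec in ("+d", "-d", " d"):
    print(repr(spec), repr(format(42, spec)), repr(format(-42, spec)))

# A "0" before the width requests sign-aware zero padding:
# the zeroes go between the sign and the digits.
padded_positive = format(42, "+06d")   # '+00042'
padded_negative = format(-42, "06d")   # '-00042'
print(padded_positive, padded_negative)
```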

HydroH avatar Mar 30 '24 04:03 HydroH

Is the Python format spec generally considered to be good and popular? If so, let's just copy their spec exactly unless it has some kind of glaring issue.

andrewrk avatar Jul 01 '24 23:07 andrewrk

Decimal alignment for float/fixed (align with decimal point in specific location) would be useful.
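As a rough illustration of what decimal alignment means, here is a sketch in Python (not Zig). The helper name align_on_decimal_point is made up for this example, and it assumes the values print in plain decimal notation without an exponent:

```python
def align_on_decimal_point(numbers):
    """Pad each number so that all decimal points land in the same column."""
    # Split the default text form of each number into whole part, point, fraction.
    parts = [str(n).partition(".") for n in numbers]
    int_width = max(len(whole) for whole, _, _ in parts)
    frac_width = max(len(frac) for _, _, frac in parts)
    lines = []
    for whole, dot, frac in parts:
        # Right-align the whole part, left-align the fraction.
        lines.append(whole.rjust(int_width) + (dot or " ") + frac.ljust(frac_width))
    return lines

aligned = align_on_decimal_point([1.5, 123.25, -42.0])
for line in aligned:
    print(repr(line))
```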

Paul-Dempsey avatar Jul 02 '24 00:07 Paul-Dempsey

Is the Python format spec generally considered to be good and popular?

If Zig isn't interested in innovating in this space we could certainly do a lot worse than copying Python and Rust, which both use the same general placeholder syntax and which are both popular enough that such a syntax should feel familiar to most users:

Python:
    replacement_field = "{" [field_name] ["!" conversion] [":" format_spec] "}"
    format_spec       = [[fill]align][sign]["z"]["#"]["0"][width][grouping_option]["." precision][type]

Rust:
    format      = '{' [ argument ] [ ':' format_spec ] [ ws ] * '}'
    format_spec = [[fill]align][sign]['#']['0'][width]['.' precision]type

However, I think these specs still run into the underlying problem of conflating alignment with sign/prefix-aware zero-padding and the way they work around this feels ad-hoc and unsatisfying.

In Python, the = alignment option inserts the padding after the sign/prefix but before the digits. The = option is only valid for numeric types and raises an error when used with e.g. strings. As a workaround, you can prefix the width option with a leading zero, which can be seen as a shorthand for a 0= fill/alignment for numeric types and 0< for e.g. strings.

>>> '{:0>8}'.format(123)
'00000123'
>>> '{:0>8}'.format(-123)
'0000-123'
>>> '{:0=8}'.format(-123)
'-0000123'
>>> '{:0=8}'.format('abc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: '=' alignment not allowed in string format specifier
>>> '{:08}'.format(-123)
'-0000123'
>>> '{:08}'.format('abc')
'abc00000'

In Python, an explicit fill/alignment option overrides the leading zero. Rust doesn't support the = alignment, only the leading zero shortcut, which overrides any fill/alignment options and causes them to be ignored.
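The Python half of that can be checked with a few more cases (a sketch; the Rust behavior is not shown here):

```python
# Explicit fill and alignment: the "0" is just the first digit of the width.
explicit_fill = "{:*>08}".format(-123)   # '****-123'
# Explicit alignment only: "0" still supplies the fill character, but ">" is kept.
align_only = "{:>08}".format(-123)       # '0000-123'
# Neither given: "0" means sign-aware zero padding, the same as "0=".
zero_flag = "{:08}".format(-123)         # '-0000123'
print(explicit_fill, align_only, zero_flag)
```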

I also think it's a bit awkward that zero-padding works by specifying the minimum width of the formatted text, not how many digits you want to pad the integer part to. If you want to e.g. pad a number to 5 integer digits and 5 fractional digits, you have to take all other options into account and do a bunch of math in your head to figure out which width to specify, which makes the true intent of the format placeholder less clear to the reader. The width also doesn't account for the sign, which can leave negative numbers with one fewer digit.

>>> '{:011.5f}'.format(123.45)
'00123.45000'
>>> '{:011.5f}'.format(-123.45)
'-0123.45000'
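The mental math in question can be spelled out as a sketch. Both zero_pad_width and pad_fixed are hypothetical helpers written for this example, not part of any library:

```python
def zero_pad_width(int_digits, frac_digits, negative):
    """Total field width needed to get int_digits integer digits (hypothetical helper)."""
    sign_chars = 1 if negative else 0
    point_chars = 1 if frac_digits > 0 else 0
    return sign_chars + int_digits + point_chars + frac_digits

def pad_fixed(value, int_digits, frac_digits):
    """Zero-pad value to a fixed number of integer and fractional digits."""
    width = zero_pad_width(int_digits, frac_digits, value < 0)
    return f"{value:0{width}.{frac_digits}f}"

print(pad_fixed(123.45, 5, 5))    # width 11 -> '00123.45000'
print(pad_fixed(-123.45, 5, 5))   # width 12 -> '-00123.45000'
```

The width has to change with the sign of the value, which a fixed width in the format string cannot express.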

I talk about some of the problems with the status quo a bit more in #20152 but I think Zig would greatly benefit from not making the same mistake of mixing numeric formatting options with generic padding/alignment options that most other string formatting syntaxes have made. The exact syntax can be discussed (it might still be worth copying Python/Rust's general '{'[argument][':' format_spec]'}' syntax where the type/specifier is placed after the : and all the options) but having something like {d+3.3: >10} which clearly separates

format the number in decimal with a positive sign, zero-pad the integer part to a minimum of 3 digits, round/zero-pad the fractional part to 3 digits

from

then right align the formatted text, padding with space until a width of 10 bytes

would probably be beneficial.
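To illustrate the two-stage idea, here is a sketch in Python of one possible reading of {d+3.3: >10}. This is not an existing Zig or Python API; format_decimal and align_right are names invented for the example:

```python
def format_decimal(value, show_plus, int_digits, frac_digits):
    """Stage 1: numeric formatting only (sign, integer digits, fractional digits)."""
    sign = "-" if value < 0 else ("+" if show_plus else "")
    whole, _, frac = f"{abs(value):.{frac_digits}f}".partition(".")
    text = sign + whole.zfill(int_digits)
    return text + "." + frac if frac else text

def align_right(text, fill, width):
    """Stage 2: generic padding that knows nothing about numbers.

    str.rjust counts characters; for ASCII output that equals the byte count.
    """
    return text.rjust(width, fill)

result = align_right(format_decimal(1.5, True, 3, 3), " ", 10)
print(repr(result))   # '  +001.500'
```

Because stage 2 only ever sees finished text, the sign and zero-padding rules never interact with the fill character or alignment.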

castholm avatar Jul 02 '24 12:07 castholm

Python has three built-in formatting mechanisms: %, format() and f-strings. They look similar but have differences and their own limitations. The most commonly used is format(), followed by f-strings. %-formatting is a holdover from Python 2.

I doubt that it's feasible to extend Python's syntax in a non-breaking (100% compatible) and non-cryptic way, so how about having two formatting libraries:

  • The Zig standard library: copy Python's most used formatting scheme (most likely format()) and be 100% compatible with it so as to ease transition to Zig,
  • An advanced formatting library for use cases not covered by the standard library, for example zero-format a negative number and at the same time pad it with spaces (e.g. "-00042.5 ")

With the advanced formatting library (the 2nd bullet point) we should try to stay as close as possible to the standard library formatting (the 1st bullet point), because nobody likes learning an entirely new formatting scheme. Moving too far from the existing scheme raises the barrier to adoption.
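For what it's worth, the "-00042.5 " case needs two passes in Python today, because a single format spec has only one fill character. A minimal sketch:

```python
# Pass 1: numeric formatting with sign-aware zero padding to 8 characters.
zero_padded = format(-42.5, "08.1f")   # '-00042.5'
# Pass 2: generic string padding, left-aligned in 9 characters with spaces.
result = format(zero_padded, "<9")     # '-00042.5 '
print(repr(result))
```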

eric-saintetienne avatar Jul 15 '24 10:07 eric-saintetienne

Is there any option to print .0 when a float is a whole number?

emar-kar avatar Aug 02 '24 17:08 emar-kar