Proposal: Number format rework
Hi! First of all, I'd like to express my sincere appreciation for umya-spreadsheet. It's truly fantastic! 🎉
I'm developing a Typst package that converts Excel tables into Typst format using WASM and umya-spreadsheet. During implementation, I ran into some issues with number format parsing (see https://github.com/hongjr03/typst-rexllent/issues/9). After examining the code, I noticed that number formats aren't being fully parsed. My current workaround is to use numfmt.js, which supports locales and full number format parsing.
I was wondering if you have any thoughts or suggestions on how to improve or rework the number format feature?
@hongjr03
Thank you for your report.
I implemented the number format handling with reference to PhpSpreadsheet, but the process was too complicated for me to fully understand. It passes some tests, but it still seems to have many defects.
It is possible to improve the accuracy by fixing the values that cause the defects one by one, but that will take some time for the reasons above.
If there is a good crate that does this process, we would really like to use it.
I came across this issue today while trying to match a formatted spreadsheet, so I had Claude take all of the tests from numfmt.js and build the crate I believe you wanted. I called it ssfmt (short for spreadsheet format) and just posted it to crates.io. It compiles formats into an AST, so hopefully it's not too slow if you use it that way. Please take a look and let me know if you'd consider using it. It fixed some of the issues I was running into.
Whether or not you integrate this, it would be helpful if you exposed the 1904 date system flag from the Excel files so that consumers of umya-spreadsheet can use it when formatting dates. Thanks for the great crate, @MathNya!
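To illustrate why the 1904 flag matters to consumers, here is a minimal sketch of turning a serial date into a calendar date under either date system. The names `civil_from_days` and `serial_to_ymd` are hypothetical and not part of umya-spreadsheet or ssfmt, and the `date1904` argument is assumed to come from the workbook's `date1904` attribute (`workbookPr` in `xl/workbook.xml`). The day-count conversion uses the well-known "civil from days" algorithm.

```rust
/// Convert days since 1970-01-01 to (year, month, day) in the proleptic
/// Gregorian calendar, using the widely used "civil from days" algorithm.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = z - era * 146_097; // day of era, 0..=146_096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, counted from March 1
    let mp = (5 * doy + 2) / 153; // month index, 0 = March
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Hypothetical helper: map a spreadsheet serial date to (year, month, day).
/// `date1904` is assumed to mirror the workbook's 1904 date system flag.
/// Returns `None` for negative or non-finite serials.
pub fn serial_to_ymd(serial: f64, date1904: bool) -> Option<(i64, u32, u32)> {
    if !serial.is_finite() || serial < 0.0 {
        return None;
    }
    let whole = serial.floor() as i64; // the fractional part is the time of day
    let days_since_unix_epoch = if date1904 {
        whole - 24_107 // 1904 system: serial 0 is 1904-01-01
    } else if whole == 60 {
        return Some((1900, 2, 29)); // the 1900 system's fictitious leap day
    } else if whole < 60 {
        // 1900 system: serial 1 is 1900-01-01. Serial 0 falls through to
        // 1899-12-31 here, a simplification of how Excel displays it.
        whole - 25_568
    } else {
        whole - 25_569 // serials after the fictitious leap day are shifted by one
    };
    Some(civil_from_days(days_since_unix_epoch))
}
```

The same serial lands 1462 days later in the 1904 system: `serial_to_ymd(45292.0, false)` gives 2024-01-01, while `serial_to_ymd(45292.0, true)` gives 2028-01-02. Without the flag, a formatter has no way to tell which of the two is intended.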
Using Codex, I wrote a Rust implementation based on the ideas from borgar/numfmt: hongjr03/numfmt-rs.
It hasn’t been very thoroughly tested yet, but so far it already covers many of my own number-formatting use cases. I think it could serve as a useful reference implementation for discussion or further development.
@hongjr03 that's funny that you had the same idea! My crate is at https://docs.rs/ssfmt/latest/ssfmt/ and https://github.com/ketbra/ssfmt
Had I known about your crate, I wouldn't have created ssfmt! Both of our crates have very good coverage. I ran mine against the full SheetJS test suite (more than 16 million test cases, I believe), and it passes everything that SheetJS's SSF doesn't skip. I took a slightly different approach than you did. Rather than a straight port, I tried to make ssfmt more idiomatic Rust, and it has the option of pre-compiling a format into an AST once and running it many times without re-parsing. This seemed to make sense for spreadsheets, where there are usually many more cells than formats.
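For readers unfamiliar with the compile-once idea, here is a minimal sketch of the pattern. It is not ssfmt's actual API: `CompiledFormat`, `Token`, `compile`, and `format` are hypothetical names, and the toy grammar only understands runs of `0`, an optional `.` with `0` decimals, `%`, and double-quoted literals. Rounding follows Rust's `format!`, which can differ from Excel on exact ties.

```rust
/// One parsed piece of a format pattern (hypothetical toy AST).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Literal(String),
    Number { min_int: usize, decimals: usize },
}

/// A pattern parsed once and reusable for any number of values.
#[derive(Debug, Clone, PartialEq)]
pub struct CompiledFormat {
    tokens: Vec<Token>,
    scale: f64, // multiplied by 100 for each '%' in the pattern
}

impl CompiledFormat {
    /// Parse the pattern into tokens. This is the step that runs once per format.
    pub fn compile(pattern: &str) -> CompiledFormat {
        let mut tokens = Vec::new();
        let mut scale = 1.0;
        let mut chars = pattern.chars().peekable();
        while let Some(c) = chars.next() {
            match c {
                '0' => {
                    // Count the integer zeros, then the zeros after an optional '.'.
                    let mut min_int = 1;
                    while chars.next_if_eq(&'0').is_some() {
                        min_int += 1;
                    }
                    let mut decimals = 0;
                    if chars.next_if_eq(&'.').is_some() {
                        while chars.next_if_eq(&'0').is_some() {
                            decimals += 1;
                        }
                    }
                    tokens.push(Token::Number { min_int, decimals });
                }
                '%' => {
                    scale *= 100.0;
                    tokens.push(Token::Literal("%".to_string()));
                }
                '"' => {
                    // Everything up to the closing quote is literal text.
                    let text: String = chars.by_ref().take_while(|&ch| ch != '"').collect();
                    tokens.push(Token::Literal(text));
                }
                other => tokens.push(Token::Literal(other.to_string())),
            }
        }
        CompiledFormat { tokens, scale }
    }

    /// Render one value by walking the tokens. No parsing happens here.
    pub fn format(&self, value: f64) -> String {
        let scaled = value * self.scale;
        let mut out = String::new();
        if scaled < 0.0 {
            out.push('-'); // the sign goes in front of the whole output
        }
        for token in &self.tokens {
            match token {
                Token::Literal(text) => out.push_str(text),
                Token::Number { min_int, decimals } => {
                    // Total width = integer digits + '.' + decimals (when present).
                    let width = min_int + if *decimals > 0 { decimals + 1 } else { 0 };
                    out.push_str(&format!(
                        "{:0width$.prec$}",
                        scaled.abs(),
                        width = width,
                        prec = *decimals
                    ));
                }
            }
        }
        out
    }
}
```

A caller would build one `CompiledFormat` per distinct format string in the workbook, for example `let fmt = CompiledFormat::compile("0.00");`, and then call `fmt.format(value)` for every cell that uses it, so the parsing cost is paid per format and not per cell.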
I had Claude run a performance comparison of the two crates against the SSF test suite, and it was helpful to see where yours was faster for one-shot (uncompiled) conversions. I used it as a reference, per your suggestion, and was able to improve some of the date and time parsing that originally wasn't as efficient in ssfmt. If you want to see the timing results from Claude's testing, they can be found here: https://github.com/ketbra/ssfmt/blob/main/docs/COMPARISON.md
Thanks for sharing, and let me know if you'd like to collaborate!
It was cool to see the comparison results, and I am happy to collaborate and exchange ideas!