Deprecation
@laurmaedje Hi! I'm giving up on this project. I have no further plans to work on it. I know you're using it in typst, so it probably affects you.
I plan to either archive it or pass to someone else.
As for resvg, which is the reason this project exists, I haven't decided yet. But basically I have a "choice" of using this deprecated version for now, switching to harfbuzz bindings, or trying out swash (I'm very skeptical).
Sad to see this :( We use it in Graphite currently, and will depend on it even more when we add more typesetting and desktop publishing related features.
I know COSMIC Text is also built upon rustybuzz so it might be worth pinging @jackpot51 as well.
I noticed you called out Swash but didn't mention Allsorts. The latter is still maintained. I'm not very familiar with either of them in terms of their maturity and features, but I was wondering if there's a reason you didn't mention Allsorts or comment on its status by comparison to Swash.
I would be willing to take over maintenance for bug fixes (although I probably won't add any features or make any significant optimizations). As cosmic-text and theo also both depend on this crate, I would like to see this maintained, and I'm sure there are many others who would as well.
@RazrFalcon It seems this crate is in the same boat as some of your other crates, like tiny-skia: used extensively throughout the ecosystem yet you no longer have the time/energy to maintain it. How would you feel about creating a GH organization, transferring some of your repos there, and then adding some interested volunteers to maintain those crates? I would definitely be happy to participate given the option.
This news might also be worth posting to Reddit. It's an invaluable part of the ecosystem and that could garner contributors. But I definitely see how...
Since v2.7.1, harfbuzz received 5813 commits.
...could be a pretty daunting proposition to keep up with.
@Keavon cosmic-text also supports swash, so they should be fine. Both shapers are basically dead anyway.
I do mention allsorts at the end of the readme. It should be pretty good, but it lacks some features (variable fonts for one) and focuses on subsetting.
In general, it's extremely hard to compare various shaping libraries. I guess running one against harfbuzz's test suite is the only option right now, because it's the gold standard.
This news might also be worth posting to Reddit.
@notgull I need someone who would be back-porting harfbuzz changes. Which no one probably will. Otherwise there is no point.
The problem with porting harfbuzz in the first place, with all due respect to the authors, is that it uses a very complex C++ subset. And despite the fact that I was a C++ developer myself for almost 10 years, I simply cannot read it. Templates with CRTP, custom iterators, custom std, macros, and so on make it extremely hard to follow. Sure, I'm a C++ hater, but I still cannot read C++14 and newer. I just physically can't. This is why I eventually switched to Rust and Swift. You basically have to rewrite harfbuzz into a simple/sane C++ first, spending days with a debugger, and then rewrite it in Rust. This is what I did! I wasn't porting harfbuzz, but rather a harfbuzz fork. And don't get me started on ragel...
Just to illustrate, it took me 6-8 months to port harfbuzz and 2-3 to port Skia (tiny-skia). And the amount of code is roughly the same. Skia's codebase is the best C++ codebase I've ever seen, similar to Qt. It has its quirks, but it's still very manageable and intuitive.
It's an invaluable part of the ecosystem
Well, I consider it a failure. I knew it was a bad idea even while writing it. Fun fact, I technically gave up 2/3 of the way through. If not for @laurmaedje, it would never have been finished. The way harfbuzz and rustybuzz are written is highly different. It's not a C++ port to Rust, but rather a rewrite of the harfbuzz core algorithm in Rust. Which makes back-porting new changes extremely difficult.
@notgull
It seems this crate is in the same boat as some of your other crates, like tiny-skia
Well, all of my crates are sort of dead. And there are many reasons for that. One of which is a complete lack of time. But I can still accept patches, which no one really sends. So... I do have time for maintaining them, but not for developing them. Maybe after a couple of years I will have some free time.
The issue with rustybuzz in particular is that it must be updated/synced.
Hi, I’m the developer of Caxton and a contributor to Typst. I’m also willing to help in any way that I can.
I think the first action item would be to port this code to match later versions of Harfbuzz. The first step would probably be to sync up with Harfbuzz v2.7.4, aka #37. Here's a list of all of the commits between v2.7.1 (which is what the current version of rustybuzz is built to match) and v2.7.4: https://github.com/harfbuzz/harfbuzz/compare/2.7.1...2.7.4
Another important thing to do would be to make the current code more maintainable. There's a few instances where machine-generated parser code has been hand-translated to Rust, and I'd like to port that code to use code that more accurately matches the Ragel code that it was generated from. This way it's much easier to match what the Harfbuzz team is doing.
I would normally reach for something like nom in this use case, but @RazrFalcon has a policy to use as few external dependencies as possible in their projects. Therefore it would probably be best to write a mini-parser-library inside of rustybuzz and then rewrite all of the parsing code using that.
At the moment I find myself preoccupied with getting Smol v2.0 out the door. But in a week (hopefully) I'll have freed myself up to focus on this.
@bluebear94 You're welcome!
Note that rustybuzz already has a couple of fixes from later versions, sort of. For example, this fix is already present. So don't be surprised.
@notgull
I would normally reach for something like nom in this use case
Why would you need nom? Ragel is a state-machine generator, not a parser generator.
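To make the distinction concrete, here is a minimal sketch of the kind of hand-written state machine that can stand in for Ragel output. Everything in it is hypothetical: the `Cat` categories, the states, and the pattern `Consonant (Halant Consonant)* Vowel?` are simplified for illustration and are not taken from any actual harfbuzz or rustybuzz machine.

```rust
/// Simplified, hypothetical character categories (not harfbuzz's real ones).
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Cat {
    Consonant,
    Halant,
    Vowel,
    Other,
}

/// States of the hand-written machine.
#[derive(Clone, Copy)]
enum State {
    Start,
    AfterConsonant,
    AfterHalant,
    Done,
}

/// Returns the length of the longest prefix of `cats` that matches
/// `Consonant (Halant Consonant)* Vowel?`, or 0 if nothing matches.
pub fn match_syllable(cats: &[Cat]) -> usize {
    let mut state = State::Start;
    let mut last_accept = 0;
    for (i, &cat) in cats.iter().enumerate() {
        // One transition per input item; no backtracking, no parse tree.
        state = match (state, cat) {
            (State::Start, Cat::Consonant) => State::AfterConsonant,
            (State::AfterConsonant, Cat::Halant) => State::AfterHalant,
            (State::AfterConsonant, Cat::Vowel) => State::Done,
            (State::AfterHalant, Cat::Consonant) => State::AfterConsonant,
            _ => break,
        };
        // AfterConsonant and Done are the accepting states.
        if matches!(state, State::AfterConsonant | State::Done) {
            last_accept = i + 1;
        }
    }
    last_accept
}
```

The machine only classifies a run of input and reports where the match ends; it does not build any structure from it, which is why a parser-combinator library is not an obvious fit.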
Different folks are going to need different things. For cosmic-text I need a solid shaping solution with limited dependencies and the current version of rustybuzz is exactly that. I don't need it to stay in sync with harfbuzz, except when it changes what a user sees for the better. There are also numerous improvements that could be made to the API for my use case that would likely make backporting harfbuzz changes harder, but would improve performance of shaping and especially font fallback in cosmic-text. Due to having different needs than the other users, I am likely to fork rustybuzz into a component that is part of cosmic-text. The swash shaper would have to be as feature complete as rustybuzz for me to consider using it, a regression in cosmic-text capabilities is not acceptable.
@jackpot51 The problem with text shaping is that it cannot be "finished". It's a moving target. OpenType/AAT updates, Unicode updates. rustybuzz is a decent Unicode 12 shaper, not a Unicode 15 one. You would probably not see any issues in the near future, but they will crop up. Not to mention actual bugs in rb/hb. That's why rustybuzz must be in sync with harfbuzz. Otherwise there is no point. I don't care about version numbers and other superficial stuff. The newer harfbuzz is actually a better shaper.
As for the API, I know it's meh, but it's what harfbuzz provides as well. You're free to open issues and send patches. Some caching would help with performance as well, but it's hard to do in a safe way, unlike in harfbuzz.
@RazrFalcon if it is the case that none of the independent pure Rust shaping projects will be able to keep up with harfbuzz, I will look into using harfbuzz directly.
@jackpot51 Honestly, making a harfbuzz wrapper is the best solution for now. Especially since cosmic-text is mainly a Linux-only library, to my understanding.
But for something like resvg it's a nightmare, because it needs a C++ toolchain, it breaks wasm (afaik) and adds a lot of bloat, because it includes a lot of things you do not need, including C++ stuff (not a problem when dynamically linking on Linux).
There is really no ideal solution. And text shaping is an absurdly complicated task to "just write" one.
All of those problems still apply to cosmic-text as well, which supports all major OS platforms as well as no_std usage.
FWIW, in wezterm, I vendor in and wrap harfbuzz directly. For my purposes this gives me a consistent version of harfbuzz on all platforms. I don't have any wasm or no_std platforms to support so the C++ toolchain is an acceptable dependency for me.
https://github.com/wez/wezterm/tree/main/deps/harfbuzz - "-sys" crate equivalent
https://github.com/wez/wezterm/blob/main/wezterm-font/src/hbwrap.rs - slightly higher level bindings
I do something similar with freetype, because there is some inter-dependence between these two libraries.
FWIW HarfBuzz doesn't link to or require libc++ / libstdc++. I don't know how it would break wasm. We definitely ship HB wasm in https://github.com/harfbuzz/harfbuzzjs
Also: I agree that trying to track and port harfbuzz changes into a rustybuzz or some other project is a large ongoing undertaking. The harfbuzz folks are actively innovating and improving all the time.
Perhaps an alternate strategy to solve these problems from the perspective of the rust community would be to get a sense of whether there is interest/desire amongst the harfbuzz folks to see harfbuzz itself migrate to being implemented in rust and working with them to incrementally migrate from the inside out? That's also a huge undertaking, but it would be a bounded undertaking with lasting effects.
Rust's wasm32-unknown-unknown target (currently) can't link to C/C++ because of ABI incompatibilities.
Perhaps an alternate strategy to solve these problems from the perspective of the rust community would be to get a sense of whether there is interest/desire amongst the harfbuzz folks to see harfbuzz itself migrate to being implemented in rust and working with them to incrementally migrate from the inside out? That's also a huge undertaking, but it would be a bounded undertaking with lasting effects.
We are definitely interested. And it's in the scope for the https://github.com/googlefonts/oxidize project. I just would hate to give up some of the conveniences and optimizations of the C++ implementation. Namely the zero-parsing model; giving that up is a nonstarter to me.
cc @rsheeter
We are definitely interested. And it's in the scope for the https://github.com/googlefonts/oxidize project.
Great to hear! I have some experience in incremental migration to rust from my time in a FAANG, and also with using harfbuzz's API. I'm interested to see that work progress and perhaps even participate... if there's potential to get funded to do that, that will help as well :)
Namely the zero-parsing model
Can you clarify what you mean by that? Is that essentially memory mapping / casting buffers as the associated structs, or deferred/lazy parsing for certain parts of the data?
(happy to relocate this discussion to the oxidize project if it feels like we're getting too far off-topic from the fate of rustybuzz)
Namely the zero-parsing model
Can you clarify what you mean by that? Is that essentially memory mapping / casting buffers as the associated structs,
Yes. And relying on operator overloading to do byte-order swapping...
I see no reason why that couldn't be made to work in Rust. It could be done via some newtype wrappers, or in some cases a proc macro could derive accessor functions that handle the byte swapping on deref/access, and so on.
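A rough sketch of the newtype idea, to show the shape of it. The type names `U32Be` and `TableRecord` and the `from_bytes` helper are made up for this example; they are not harfbuzz's, read-fonts', or ttf-parser's API. The field layout follows the OpenType table directory entry (tag, checksum, offset, length). The single `unsafe` cast is the part a real implementation would likely hide behind a vetted crate or a macro.

```rust
use core::mem::size_of;

/// Big-endian u32 stored as raw bytes. Alignment is 1, so it can overlay
/// any position in a font buffer.
#[derive(Clone, Copy)]
#[repr(transparent)]
pub struct U32Be([u8; 4]);

impl U32Be {
    /// Byte-swaps on access instead of at load time.
    pub fn get(self) -> u32 {
        u32::from_be_bytes(self.0)
    }
}

/// One entry of an OpenType table directory, laid out exactly as on disk.
#[repr(C)]
pub struct TableRecord {
    pub tag: [u8; 4],
    pub checksum: U32Be,
    pub offset: U32Be,
    pub length: U32Be,
}

impl TableRecord {
    /// Reinterprets the start of `data` as a record without copying.
    pub fn from_bytes(data: &[u8]) -> Option<&TableRecord> {
        if data.len() < size_of::<TableRecord>() {
            return None;
        }
        // SAFETY: every field is a byte array (alignment 1, no padding, every
        // bit pattern valid), and the length check above covers the struct.
        Some(unsafe { &*(data.as_ptr() as *const TableRecord) })
    }
}
```

With this, `TableRecord::from_bytes(buf)?.offset.get()` reads the offset straight out of the mapped buffer; nothing is copied or converted until a field is actually accessed.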
For context, https://lib.rs/crates/read-fonts aims to provide HB-style reading. However, we won't get to shaping until we (optimistic hat firmly on) finish landing https://lib.rs/crates/skrifa.
@behdad
Namely the zero-parsing model; giving that up is a nonstarter to me.
Is it really that faster? ttf-parser has zero unsafe and does basically the same. Maybe with a slightly higher overhead due to imperative code instead of C++ templates-based DSL.
swash even claims to be faster than HB, while using the same idea. But maybe it pulls ahead in non-parsing stages, which is also strange, because it would be hard to beat state machines.
Last time I checked, rustybuzz wasn't that much slower (sure, 50% is a lot, but not at ms scale) and there are a lot of optimization opportunities left. Meanwhile it is 100% memory safe and has 0 memory leaks. A very tempting trade-off.
HarfBuzz doesn't link to or require libc++ / libstdc++
I'm aware of that, and this in itself is a strange feature, but by a toolchain I meant the compiler as well. Right now you do not need a C++ compiler to build rustybuzz and resvg, which is the core idea and would not change.
@wez
happy to relocate this discussion to the oxidize project if it feels like we're getting too far off-topic from the fate of rustybuzz
All good. This is the right place for such a discussion. I think it's a good illustration of how complex text shaping is, when Rust has 3 independent implementations and all of them are sort of dead. Writing one is one thing. Keeping it up to date is completely different.
At least allsorts is company-backed, which is the only way for such a library to survive, imho. Sadly, allsorts has different priorities.
Is it really that faster?
ttf-parser has zero unsafe and does basically the same. Maybe with a slightly higher overhead due to imperative code instead of C++ templates-based DSL.
It's not faster. But a lot more memory efficient. See for example:
https://docs.google.com/document/d/12jfNpQJzeVIAxoUSpk7KziyINAa1msbGliyXqguS86M/preview
I know for example Android cares a lot about that.
@rsheeter Yeah, I'm following fontations. I'm interested to see how well code generation would work. I've tried it myself, but it felt needlessly complicated and limited with little benefits. Sadly, most of TrueType tables are not just POD structures. CFF is a good example of that. Or even glyf.
(Just to interject, for Graphite we need Wasm support; currently we use RustyBuzz but had planned to switch to Cosmic Text so our preference would be keeping that pure Rust if at all possible.)
I'm planning to maintain no_std and wasm support in cosmic-text no matter what happens.
@behdad Yes, I saw this paper, but ttf-parser doesn't allocate either. rustybuzz does allocate some GSUB/GPOS metadata, but it's a temporary hack. Otherwise memory usage between rustybuzz and harfbuzz should be identical.
The only overhead Rust has over C++ in this case is mandatory bounds checking, which swash and allsorts try to avoid with some unsafe. I'm not a good programmer, so I tend to avoid unsafe completely.
The main benefit of Rust's/ttf-parser approach is that there is no need for separate validation and parsing steps, like in harfbuzz, which makes the code much simpler. But you do have to pay the higher price when parsing the same data over and over again.
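As an illustration of that trade-off, here is a minimal sketch of a bounds-checked reader. The `Stream` type, the `value_at` function, and the table layout (a u16 count followed by that many u16 values) are hypothetical and only written for this comment; they are not ttf-parser's actual API or a real OpenType table.

```rust
/// A cursor over raw font bytes. Every read is bounds-checked, so there is
/// no separate up-front validation pass.
#[derive(Clone, Copy)]
pub struct Stream<'a> {
    data: &'a [u8],
    offset: usize,
}

impl<'a> Stream<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Stream { data, offset: 0 }
    }

    /// Reads a big-endian u16, or returns None if the data is truncated.
    pub fn read_u16(&mut self) -> Option<u16> {
        let end = self.offset.checked_add(2)?;
        let bytes = self.data.get(self.offset..end)?;
        self.offset = end;
        Some(u16::from_be_bytes([bytes[0], bytes[1]]))
    }
}

/// Hypothetical table: a u16 count followed by `count` u16 values.
/// Looks up one value by index, parsing on demand.
pub fn value_at(table: &[u8], index: u16) -> Option<u16> {
    let mut s = Stream::new(table);
    let count = s.read_u16()?;
    if index >= count {
        return None;
    }
    // Jump straight to the requested entry; a truncated table yields None.
    let start = 2 + usize::from(index) * 2;
    Stream::new(table.get(start..)?).read_u16()
}
```

Malformed or truncated data simply turns into `None` at the point of use. The cost is visible too: every call to `value_at` re-reads and re-checks the count, which is the "parsing the same data over and over" price.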
And honestly, I was so burnt out by this port that correctness was my only priority. I spent no time optimizing it.