rfcs Format-preserving code printer


We have been talking for a while about an option to preserve the output format after running Babel, so I wrote down how I think it should be done. Feedback welcome!

Jun 21 '24 16:06 nicolo-ribaudo

Thank you very much for the detailed description!

Would it be faster/more accurate/easier to implement if we required the corresponding options to be enabled in the parser or used tokens?

Also, I'm curious: how do users expect the edited code to be formatted? Perhaps it would be better to make appropriate adjustments (such as indentation) to the edited code?

Jun 24 '24 20:06 liuxingbaoyu

> Would it be faster/more accurate/easier to implement if we required the corresponding options to be enabled in the parser or used tokens?

Yes, the plan is to use tokens, since there is otherwise no way of knowing the proper location of every single output token.
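To illustrate why token locations matter, here is a minimal sketch, not the proposed implementation. It assumes each token carries `start`/`end` offsets into the original source; the `tokenize` and `reprint` helpers are hypothetical and written only for this example (the toy tokenizer is not Babel's). Copying the original text between consecutive tokens lets every output token land where it was, even when a token's value changes.

```javascript
// Toy tokenizer (an assumption for this sketch, not Babel's tokenizer):
// identifiers, numbers, and single punctuation characters.
function tokenize(source) {
  const tokens = [];
  const pattern = /[A-Za-z_$][\w$]*|\d+|[^\s\w]/g;
  for (const match of source.matchAll(pattern)) {
    tokens.push({
      value: match[0],
      start: match.index,
      end: match.index + match[0].length,
    });
  }
  return tokens;
}

// Reprint the tokens, copying the original text found between consecutive
// tokens and optionally substituting a new value for some of them.
function reprint(source, tokens, renames = new Map()) {
  let output = "";
  let previousEnd = 0;
  for (const token of tokens) {
    output += source.slice(previousEnd, token.start); // original gap
    output += renames.get(token.value) ?? token.value;
    previousEnd = token.end;
  }
  return output + source.slice(previousEnd); // trailing text, if any
}

const source = "let  x =\n    [ 1,2 ] ;";
const tokens = tokenize(source);

const unchanged = reprint(source, tokens);
const renamed = reprint(source, tokens, new Map([["x", "total"]]));
console.log(unchanged === source);
console.log(renamed);
```

With no renames the output is identical to the input, and renaming `x` keeps the unusual spacing and the line break intact.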

> Also, I'm curious: how do users expect the edited code to be formatted? Perhaps it would be better to make appropriate adjustments (such as indentation) to the edited code?

This is something that we need to explore :) I think for now it would be good enough to have the "standard formatting" when using injected code, and only respect the format for the lines/nodes that are not modified.
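A coarse, statement-level sketch of that idea, under stated assumptions: nodes that came from the parser carry `start`/`end` offsets, injected nodes do not, and `standardPrint` is a hypothetical stand-in for the regular generator. The node shape is invented for this example, and the real design would work at a finer granularity than whole statements.

```javascript
// Hypothetical stand-in for the regular printer with its "standard formatting".
function standardPrint(node) {
  return `${node.kind} ${node.name} = ${node.value};`;
}

// Untouched nodes still know where they came from, so their original text is
// copied verbatim; injected nodes fall back to the standard printer.
function printStatement(node, source) {
  const hasLocation =
    typeof node.start === "number" && typeof node.end === "number";
  return hasLocation ? source.slice(node.start, node.end) : standardPrint(node);
}

const source = "let   a=1 ;\nlet b   =   2;";

// Toy program: two statements from `source`, one injected in between.
const program = [
  { kind: "let", name: "a", value: "1", start: 0, end: 11 },
  { kind: "let", name: "injected", value: "0" }, // no location: injected
  { kind: "let", name: "b", value: "2", start: 12, end: 26 },
];

const output = program.map((node) => printStatement(node, source)).join("\n");
console.log(output);
```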

Jun 28 '24 12:06 nicolo-ribaudo

You can also go about the same goal by using a CST, which is to say by storing a node's syntactic tokens together with that node instead of having them in a single flat tokens array. Then instead of having to go through a costly heuristic process of trying to figure out which tokens belong to which nodes, you'll just know.
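As a rough illustration of the difference (a sketch with a made-up node shape, not any existing CST format): each node owns an ordered `children` list that mixes its own tokens, whitespace included, with nested nodes. Printing is then plain concatenation, and replacing a child leaves the surrounding formatting untouched. `printCst` and `replaceChild` are hypothetical helpers.

```javascript
// Hypothetical CST shape: `children` holds token strings (whitespace too)
// and nested nodes, in source order.
const emptyArray = { type: "ArrayExpression", children: ["[", " ", "]"] };
const outerArray = {
  type: "ArrayExpression",
  children: ["[", "   ", emptyArray, "   ", "]"],
};

// Printing needs no token lookup: every node already knows its own tokens.
function printCst(node) {
  return node.children
    .map((child) => (typeof child === "string" ? child : printCst(child)))
    .join("");
}

// Swap one child node for another without touching the parent's tokens.
function replaceChild(parent, oldChild, newChild) {
  return {
    ...parent,
    children: parent.children.map((child) =>
      child === oldChild ? newChild : child
    ),
  };
}

const identifier = { type: "Identifier", children: ["items"] };
const original = printCst(outerArray);
const edited = printCst(replaceChild(outerArray, emptyArray, identifier));
console.log(original);
console.log(edited);
```

The trade-off is that every transform has to keep `children` in sync with the rest of the node, which a flat tokens array does not require.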

Jul 17 '24 11:07 conartist6

A note: this would make Babel a first-class codemod engine, but it would remain unable to do one of the basic tasks of codemodding: inserting a blank line.

Also, as far as I can tell, when using the expression templating engine, formatting from the template expression would not be preserved. E.g.

expr`[   ${expr` [ ]   `} ] ;` // [[]] instead of [    []    ]

Jul 17 '24 11:07 conartist6

> You can also go about the same goal by using a CST, which is to say by storing a node's syntactic tokens together with that node instead of having them in a single flat tokens array. Then instead of having to go through a costly heuristic process of trying to figure out which tokens belong to which nodes, you'll just know.

I am worried that this would need significant changes to transform plugins to preserve the CST info, while the goal of this proposal is to require changes (almost) only in the code generator.

Jul 17 '24 15:07 nicolo-ribaudo

Yeah, I think implementing this proposal is probably the best step Babel can take right now. I'm mostly trying to draw a distinction between what is a small incremental improvement and what it would look like to solve this problem in the long run.

Jul 18 '24 10:07 conartist6

I think it's also worth noting that this puts Babel on a collision course with ESLint and Prettier. When things started out it was clear what was what. I'd describe it like this:

Babel is the mathematically robust system of code transformation, but it discards syntactic information
ESLint sees both structure and style of code, but respects your existing style choices
Prettier only touches style but brings strong opinions with it

Each of these tools has since started to run into similar problems, such as precedence, comment attachment, and multilingualism. As a result, it is no longer easy to explain which tool to use and when, and it is especially hard to explain how the tools are meant to be used together (programmatically, I mean, not in the sense of "install all their VSCode extensions").

Jul 18 '24 13:07 conartist6