Simple `proc_macro::Span` operations should not require going through the bridge
Motivation
This was mentioned in #149229 and since there doesn't seem to be a dedicated issue tracking this, I decided to make one.
Essentially, Span::join is a pretty key operation on spans: the very simple example of combining the open and close spans for a group to get the span of the whole group applies beyond groups to all sorts of parser implementations. The fact that this is not a trivial operation and requires going through the bridge means that it incurs a substantially larger performance penalty than you'd expect.
Potential Solutions
I decided to keep the problem statement intentionally vague to ensure that the solution set is large enough to allow compromises between performance and compatibility. It's been proposed that the solution to this is to make Span have a set representation across bridges, but this isn't actually necessary to solve this problem: we could, for example, represent Span as an opaque "source" ID and a start and end index. To actually get information about what the source represents, e.g. a file or macro call site, or a concrete line/column number, you would still have to go through the bridge. But, simple operations like joining spans could easily be done without having to go through the bridge.
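To make the idea concrete, here is a minimal sketch of such a span with a bridge-free join. `LocalSpan`, its field names, and the use of plain `u32` values are assumptions for illustration only, not an existing or proposed `proc_macro` API. Returning `None` for spans from different sources mirrors the `Option` that `Span::join` returns today.

```rust
/// Hypothetical bridge-free span: an opaque source ID plus a byte range.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct LocalSpan {
    /// Opaque handle; resolving it to a file or call site would still need the bridge.
    source: u32,
    /// Start index within the source (inclusive).
    start: u32,
    /// End index within the source (exclusive).
    end: u32,
}

impl LocalSpan {
    pub fn new(source: u32, start: u32, end: u32) -> Self {
        assert!(start <= end, "span start must not exceed its end");
        LocalSpan { source, start, end }
    }

    /// Smallest span covering both inputs, or `None` if the sources differ.
    /// This only compares and copies integers, so no bridge call is involved.
    pub fn join(self, other: LocalSpan) -> Option<LocalSpan> {
        if self.source != other.source {
            return None;
        }
        Some(LocalSpan {
            source: self.source,
            start: self.start.min(other.start),
            end: self.end.max(other.end),
        })
    }
}
```

With this shape, joining the open and close delimiter spans of a group is two integer comparisons, while anything that needs to know what `source` refers to still goes to the compiler.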
Just a rough idea of a possible representation. Spans in proc_macro are currently 4 bytes, and this representation keeps them the same size. It has room for 64 file IDs (6 bits), which is probably enough for most proc macros, since a single expansion is likely to encounter source text from only a few files. The file ID field could maybe be even smaller, to free up more bits for position and length. The exact bit counts are a guess, though; some sort of measurement would be ideal to derive better numbers.
It can store 16384 starting positions and fragments up to 2047 bytes long inline, using 14 and 11 bits respectively. The fragments a proc macro typically handles are probably pretty short, so maybe additional bits could be given to the position. If the inline file ID could be smaller, more bits could be given to these fields as well.
If the MSB is set, the span is stored inline. If the MSB is zero, the span is too large to fit inline in the 4 bytes, and the rest of the bits are used as an ID starting at one.
```rust
use std::num::NonZeroU32;

#[repr(transparent)]
#[derive(Copy, Clone, PartialEq, Eq)]
struct SpanRepr(NonZeroU32);

impl SpanRepr {
    const LEN_MSK: u32 = 0x7ff;
    const POS_MSK: u32 = 0x3fff;
    const FILE_MSK: u32 = 0x3f;
    // These start at 1 to allow Non-zero niche
    const SP_ID_MSK: u32 = u32::MAX >> 1;

    pub fn lo(self) -> u32 {
        // Any value above SP_ID_MSK has the MSB set, i.e. the span is inline.
        if self.0.get() > Self::SP_ID_MSK {
            return self.inline_lo();
        }
        let _id = self.0.get() & Self::SP_ID_MSK;
        // probably some thread_local table.
        todo!("look up data based off this id");
    }

    pub fn len(self) -> u32 {
        if self.0.get() > Self::SP_ID_MSK {
            return self.inline_len();
        }
        let _id = self.0.get() & Self::SP_ID_MSK;
        todo!("look up data based off this id");
    }

    pub fn hi(self) -> u32 {
        if self.0.get() > Self::SP_ID_MSK {
            return self.inline_hi();
        }
        let _id = self.0.get() & Self::SP_ID_MSK;
        todo!("look up data based off this id");
    }

    pub fn file_id(self) -> u32 {
        if self.0.get() > Self::SP_ID_MSK {
            return self.inline_file_id();
        }
        let _id = self.0.get() & Self::SP_ID_MSK;
        todo!("look up data based off this id");
    }

    // Inline layout, low to high: 14 bits position, 11 bits length, 6 bits file id, 1 tag bit.
    #[inline]
    fn inline_lo(self) -> u32 {
        self.0.get() & Self::POS_MSK
    }

    #[inline]
    fn inline_hi(self) -> u32 {
        // End position is start plus length.
        self.inline_lo() + self.inline_len()
    }

    #[inline]
    fn inline_len(self) -> u32 {
        (self.0.get() >> 14) & Self::LEN_MSK
    }

    #[inline]
    fn inline_file_id(self) -> u32 {
        (self.0.get() >> 25) & Self::FILE_MSK
    }
}
```
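The accessors above only read the packed form, so here is a sketch of the encoding side under the same assumed layout (14 bits position, 11 bits length, 6 bits file ID, MSB as the inline tag). `pack_inline` and `unpack_inline` are hypothetical helper names, and returning `None` on overflow stands in for "fall back to a table ID", which is not implemented here.

```rust
use std::num::NonZeroU32;

// Assumed layout, from least to most significant bit:
// 14 bits position | 11 bits length | 6 bits file id | 1 tag bit (set = inline).
const POS_BITS: u32 = 14;
const LEN_BITS: u32 = 11;
const FILE_BITS: u32 = 6;
const INLINE_TAG: u32 = 1 << 31;

/// Packs a span inline, or returns `None` if any field does not fit its bit width.
/// A caller would then have to store the span out of line instead.
pub fn pack_inline(file_id: u32, pos: u32, len: u32) -> Option<NonZeroU32> {
    if pos >= (1 << POS_BITS) || len >= (1 << LEN_BITS) || file_id >= (1 << FILE_BITS) {
        return None;
    }
    let bits = INLINE_TAG | (file_id << (POS_BITS + LEN_BITS)) | (len << POS_BITS) | pos;
    // The tag bit is always set, so the value is never zero.
    NonZeroU32::new(bits)
}

/// Unpacks `(file_id, pos, len)`, or returns `None` if the value is a table ID.
pub fn unpack_inline(raw: NonZeroU32) -> Option<(u32, u32, u32)> {
    let bits = raw.get();
    if bits & INLINE_TAG == 0 {
        return None;
    }
    let pos = bits & ((1 << POS_BITS) - 1);
    let len = (bits >> POS_BITS) & ((1 << LEN_BITS) - 1);
    let file_id = (bits >> (POS_BITS + LEN_BITS)) & ((1 << FILE_BITS) - 1);
    Some((file_id, pos, len))
}
```

Because the tag bit guarantees a non-zero value for inline spans and table IDs start at one, `Option<SpanRepr>` would stay 4 bytes in either case.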
The fact that this is not a trivial operation and requires going through the bridge means that it incurs a substantially larger performance penalty than you'd expect.
To raise some awareness, going through the bridge can be vastly more expensive on rust-analyzer's side, depending on the operation, as it will incur a whole RPC roundtrip from the proc-macro server to rust-analyzer and back to the proc-macro server (once we have this implemented properly). In theory this can happen on Span::join, depending on the argument spans, for implementation reasons.
I will also add that making Span less coupled to bridge calls makes it easier to get proc_macro usable outside of proc-macro contexts (e.g. https://github.com/rust-lang/rust/issues/130856). This is because Span is everywhere: it's part of every token tree type, so doing this has more benefits than just performance.
In case anyone has other ideas: I have been messing around with a few myself. From that experimenting, I have noticed that whatever solution is used probably needs to somehow account for SyntaxContext. It will likely need more than a unique file ID and the start and end positions. Further, this extra bit of data, however it ends up tracked, likely means Spans will either need to be larger than 4 bytes or all become IDs into a table.
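For the "every span is an ID into a table" variant, a minimal sketch could look like the following. `SpanData`, `intern`, and `lookup` are hypothetical names, `ctxt` is just a placeholder `u32` for whatever syntax-context data ends up tracked, and the table does no deduplication, so this illustrates the shape rather than a tuned design.

```rust
use std::cell::RefCell;
use std::num::NonZeroU32;

/// Hypothetical full span record kept on the proc-macro side.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct SpanData {
    pub file_id: u32,
    pub lo: u32,
    pub hi: u32,
    /// Placeholder for syntax-context information.
    pub ctxt: u32,
}

thread_local! {
    // IDs are `index + 1`, so they start at one and fit the NonZeroU32 niche.
    static SPAN_TABLE: RefCell<Vec<SpanData>> = RefCell::new(Vec::new());
}

/// Stores a span record and returns its 4-byte ID (no deduplication).
pub fn intern(data: SpanData) -> NonZeroU32 {
    SPAN_TABLE.with(|table| {
        let mut table = table.borrow_mut();
        table.push(data);
        NonZeroU32::new(table.len() as u32).expect("length is non-zero after a push")
    })
}

/// Resolves an ID back to its record, or `None` if the ID is unknown.
pub fn lookup(id: NonZeroU32) -> Option<SpanData> {
    SPAN_TABLE.with(|table| table.borrow().get(id.get() as usize - 1).copied())
}
```

This keeps the span handle at 4 bytes at the cost of a thread-local lookup on every access, which is the trade-off against simply making spans larger.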
Currently my best idea involves treating SyntaxContext as an opaque handle, much like spans are treated today. This does mean that on the proc_macro side we are limited to equality comparisons between SyntaxContext handles. If an operation involves two spans in the same file and both syntax context handles are equal, then all is good. It gets trickier if they are not equal. The options are to pick one of the two contexts or to use the root syntax context. The other option is to fall back to a call over the bridge and let the compiler handle it, which is less than ideal since that's exactly what this is trying to avoid.
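A sketch of that decision logic, taking the bridge-fallback option for mismatched contexts. All of the names here (`CtxtHandle`, `CtxtSpan`, `LocalJoin`, `join_local`) are made up for illustration, and the sketch only reports that a fallback is needed rather than performing any bridge call.

```rust
/// Hypothetical opaque syntax-context handle: only equality is meaningful here.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct CtxtHandle(pub u32);

/// Hypothetical span that carries a context handle next to its byte range.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct CtxtSpan {
    pub file_id: u32,
    pub lo: u32,
    pub hi: u32,
    pub ctxt: CtxtHandle,
}

/// Outcome of trying to join two spans without crossing the bridge.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum LocalJoin {
    /// Same file and equal context handles: joined locally.
    Joined(CtxtSpan),
    /// Different files: there is no joined span.
    DifferentFiles,
    /// Same file but different contexts: the caller has to pick a policy,
    /// for example deferring to the compiler over the bridge.
    NeedsBridge,
}

pub fn join_local(a: CtxtSpan, b: CtxtSpan) -> LocalJoin {
    if a.file_id != b.file_id {
        return LocalJoin::DifferentFiles;
    }
    if a.ctxt != b.ctxt {
        return LocalJoin::NeedsBridge;
    }
    LocalJoin::Joined(CtxtSpan {
        file_id: a.file_id,
        lo: a.lo.min(b.lo),
        hi: a.hi.max(b.hi),
        ctxt: a.ctxt,
    })
}
```

Swapping the `NeedsBridge` arm for "pick one" or "use the root context" would only change that one branch, so the policies are easy to compare experimentally.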