Register access is not inlined during compile
Hi,
I recently found out that the code generated by svd2rust (the one of stm32f1 crate) is very slow if compiled for STM32F103 as --release (about five times slower than expected). Disassembling the resulting code has shown that:
- All
Derefimplementations for all registers are called as functions and not inlined. - All
.ptr()methods of all registers are called as functions and not inlined.
Adding #[inline(always)] manually to these methods solved the issue (I got correct timings and well-optimized assembly code).
Perhaps svd2rust should generate #[inline] or even #[inline(always)] attribute for such calls, as otherwise the compiler sometimes fails to optimize here.
Regards, Alexey
That is very odd. Can you provide some assembly for comparison?
Of course. Relevant part of the code being compiled:
let mut led = gpioc.pc13.into_push_pull_output(&mut gpioc.crh);
loop {
led.set_high();
led.set_low();
}
Compiling with rustc 1.33 (stable) for thumbv7m-none-eabi with --release option. The following assembler code is generated:
80001f6: f000 fa1d bl 8000634 <_ZN7stm32f19stm32f1035GPIOC3ptr17he8d61d57514c9137E>
80001fa: 6105 str r5, [r0, #16]
80001fc: f000 fa1a bl 8000634 <_ZN7stm32f19stm32f1035GPIOC3ptr17he8d61d57514c9137E>
8000200: 6104 str r4, [r0, #16]
8000202: e7f8 b.n 80001f6 <main+0xc6>
In more complex examples (involving timer delays) I also observed calls to deref instead of ptr.
The today's nightly version generates the same code.
Adding #[inline] to ptr() like that:
impl GPIOC {
#[doc = r" Returns a pointer to the register block"]
#[inline]
pub fn ptr() -> *const gpioa::RegisterBlock {
1073811456 as *const _
}
}
resulted in the following machine code:
80001fa: 6020 str r0, [r4, #0]
80001fc: 6025 str r5, [r4, #0]
80001fe: e7fc b.n 80001fa <main+0xca>
which is the expected one. For more complex example I had to add #[inline] to Deref as well.
In my case it was enough to add just #[inline], there seems to be no need to use #[inline(always)].
Upd: enabling lto fixes the problem as well, but I believe it is expected to work without lto too.
The same thing happens with register bit set() and clear(): they are called as separate functions. I don't think that it's ok, because several bit-sets can be merged into one OR operation.
@Disasm merging register access is generally not a good idea since the hardware may behave differently depending on the order of operations. I.e. pulling data high, then pulling clock high is not the same as pulling data and clock high together. Changing multiple bits at once must be done in a single function call.
@agalakhov Yes, but I am talking about situation like this: https://github.com/mvirkkunen/stm32f103xx-usb/blob/57d23751367461bec5f39322727bdd65e5c2aa30/src/bus.rs#L162-L169
Which optimization level is this happening with? I believe --release uses opt-level=3 by default, which optimizes for speed. opt-level=s or z should optimize for size. (not saying that these call shouldn't be inlined at opt-level=3, just noting that it might be worth playing around with this).
@hannobraun it is opt-level=3 --release with enable_lto=false. While it is "optimizing by speed", it has problems exactly with the speed. Enabling LTO fixes the issue, but unfortunately LTO is not always applicable and off by default.
I believe the reason is, these function calls reside in separate compiler units, so the compiler just does not know that they could be inlined unless we add an inline hint.
@Disasm makes sense, you're observing the same issue.
I'm running into the same issue. The resulting (optimized) binary contains lots of ptr and deref and quite a few other functions, that should've been inlined.
Rustc doesn't inline functions across crate boundaries, unless there are generics involved, or #[inline] was specified.
I think there are no non-#[inline] functions left anymore after #389 and #390. All fns inside quote!{}s are now preceded by #[inline].
Related pull request https://github.com/rust-embedded/svd2rust/pull/515