Reduce monomorphization output by using more specialized code paths for deserializers
This PR reduces the size of the deserializers generated by rmp-serde. Instead of matching every decoded marker against all possible MessagePack markers, each code path now uses a specialized `match` expression, generated by a macro, that handles only the markers relevant to it.
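The following minimal, self-contained sketch illustrates the idea. It does not use rmp-serde's real types: `Marker`, `DecodeError`, `expect_markers!`, `read_map_len` and `read_str_len` are hypothetical stand-ins. The macro expands to a `match` with one arm per relevant marker plus a single catch-all error arm. As a result, a function such as a map-length reader only carries handling code for the markers it can actually accept.

```rust
// Illustrative sketch only, NOT rmp-serde's implementation.
// All names below are hypothetical.

/// Small hypothetical subset of MessagePack markers
/// (the real `rmp::Marker` enum has many more variants).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Marker {
    FixPos(u8),
    U8,
    FixStr(u8),
    Str8,
    FixMap(u8),
    Map16,
    Null,
}

#[derive(Debug, PartialEq)]
enum DecodeError {
    /// The marker is not one this code path accepts.
    TypeMismatch(Marker),
}

/// Generates a `match` on `$marker` containing only the listed arms.
/// Every other marker goes to one shared error arm, so the expanded code
/// has no per-marker handling for markers the caller cannot accept.
macro_rules! expect_markers {
    ($marker:expr, { $($pat:pat => $body:expr),+ $(,)? }) => {
        match $marker {
            $($pat => Ok($body),)+
            other => Err(DecodeError::TypeMismatch(other)),
        }
    };
}

/// Map path: only map markers are relevant (two arms plus the fallback).
/// `next_u16` stands in for the 16-bit length that would follow `Map16`.
fn read_map_len(marker: Marker, next_u16: u16) -> Result<u32, DecodeError> {
    expect_markers!(marker, {
        Marker::FixMap(n) => u32::from(n),
        Marker::Map16 => u32::from(next_u16),
    })
}

/// String path: a different, equally small set of arms.
/// `next_u8` stands in for the 8-bit length that would follow `Str8`.
fn read_str_len(marker: Marker, next_u8: u8) -> Result<u32, DecodeError> {
    expect_markers!(marker, {
        Marker::FixStr(n) => u32::from(n),
        Marker::Str8 => u32::from(next_u8),
    })
}
```

For example, `read_map_len(Marker::Map16, 300)` yields `Ok(300)`, while `read_map_len(Marker::FixStr(5), 0)` is rejected by the fallback arm. The alternative is to route every call through one generic "decode any value" function that handles all markers. When that function is generic over the reader and visitor, each monomorphized copy carries every branch. With the specialized approach, each entry point only instantiates the few arms it needs.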
Filtered output of `cargo llvm-lines` on one of Garage's largest crates, before this change:
```
3456 (0.3%, 45.3%) 72 (0.2%, 26.4%) tokio::runtime::task::core::Core<T,S>::set_stage
3461 (0.3%, 45.0%) 63 (0.2%, 26.1%) <rmp_serde::decode::MapAccess<R,C> as serde::de::MapAccess>::next_key_seed
3816 (0.3%, 44.7%) 72 (0.2%, 25.9%) tokio::runtime::task::core::Core<T,S>::poll::{{closure}}
--
4509 (0.4%, 40.2%) 9 (0.0%, 22.1%) garage_net::endpoint::Endpoint<M,H>::call_streaming::{{closure}}
4639 (0.4%, 39.8%) 63 (0.2%, 22.1%) <rmp_serde::decode::SeqAccess<R,C> as serde::de::SeqAccess>::next_element_seed
4644 (0.4%, 39.4%) 36 (0.1%, 21.9%) tokio::runtime::scheduler::current_thread::Handle::spawn
--
5826 (0.5%, 35.0%) 255 (0.8%, 17.1%) core::result::Result<T,E>::map
5916 (0.5%, 34.5%) 102 (0.3%, 16.3%) <&mut rmp_serde::encode::Serializer<W,C> as serde::ser::Serializer>::serialize_newtype_variant
6379 (0.5%, 34.0%) 94 (0.3%, 15.9%) core::iter::traits::iterator::Iterator::try_fold
--
14904 (1.3%, 24.3%) 432 (1.4%, 3.3%) std::panic::catch_unwind
30468 (2.6%, 23.1%) 196 (0.6%, 1.9%) rmp_serde::decode::read_str_data
103382 (8.8%, 20.5%) 197 (0.6%, 1.3%) rmp_serde::decode::any_num
138318 (11.7%, 11.7%) 196 (0.6%, 0.6%) rmp_serde::decode::Deserializer<R,C>::any_inner
1180351 31016 (TOTAL)
----- ------ -------------
Lines Copies Function name
```
And after this change:
```
3212 (0.4%, 33.3%) 1 (0.0%, 22.8%) garage_api_admin::router_v2::<impl garage_api_admin::api::AdminApiRequest>::from_request::{{closure}}
3216 (0.4%, 33.0%) 16 (0.1%, 22.8%) <&mut rmp_serde::encode::Serializer<W,C> as serde::ser::Serializer>::collect_seq
3240 (0.4%, 32.6%) 72 (0.3%, 22.8%) tokio::runtime::task::harness::Harness<T,S>::release
--
3456 (0.4%, 31.2%) 72 (0.3%, 22.1%) tokio::runtime::task::core::Core<T,S>::set_stage
3461 (0.4%, 30.8%) 63 (0.2%, 21.9%) <rmp_serde::decode::MapAccess<R,C> as serde::de::MapAccess>::next_key_seed
3522 (0.4%, 30.4%) 15 (0.1%, 21.6%) <&mut rmp_serde::decode::Deserializer<R,C> as serde::de::Deserializer>::deserialize_map
3816 (0.4%, 30.0%) 72 (0.3%, 21.6%) tokio::runtime::task::core::Core<T,S>::poll::{{closure}}
--
3908 (0.4%, 29.2%) 39 (0.1%, 20.7%) alloc::vec::Vec<T,A>::extend_desugared
3972 (0.4%, 28.8%) 12 (0.0%, 20.6%) <&mut rmp_serde::decode::Deserializer<R,C> as serde::de::Deserializer>::deserialize_seq
3999 (0.4%, 28.3%) 30 (0.1%, 20.5%) <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::fold
--
4509 (0.5%, 25.6%) 9 (0.0%, 19.3%) garage_net::endpoint::Endpoint<M,H>::call_streaming::{{closure}}
4639 (0.5%, 25.1%) 63 (0.2%, 19.3%) <rmp_serde::decode::SeqAccess<R,C> as serde::de::SeqAccess>::next_element_seed
4644 (0.5%, 24.6%) 36 (0.1%, 19.1%) tokio::runtime::scheduler::current_thread::Handle::spawn
--
5402 (0.6%, 21.9%) 235 (0.9%, 17.5%) alloc::boxed::Box<T>::new
5916 (0.6%, 21.3%) 102 (0.4%, 16.6%) <&mut rmp_serde::encode::Serializer<W,C> as serde::ser::Serializer>::serialize_newtype_variant
6212 (0.7%, 20.7%) 767 (2.9%, 16.2%) <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
--
11149 (1.2%, 11.6%) 362 (1.4%, 6.0%) tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
11539 (1.3%, 10.4%) 73 (0.3%, 4.7%) rmp_serde::decode::read_str_data
12427 (1.4%, 9.1%) 571 (2.2%, 4.4%) <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
14904 (1.6%, 7.7%) 432 (1.6%, 2.2%) std::panic::catch_unwind
25430 (2.8%, 6.1%) 72 (0.3%, 0.6%) <&mut rmp_serde::decode::Deserializer<R,C> as serde::de::Deserializer>::deserialize_identifier
30260 (3.3%, 3.3%) 75 (0.3%, 0.3%) <&mut rmp_serde::decode::Deserializer<R,C> as serde::de::Deserializer>::deserialize_struct
911552 26229 (TOTAL)
----- ------ -------------
Lines Copies Function name
```
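This reduces the generated LLVM IR by 268799 lines (from 1180351 to 911552), about 22.8% of all the IR generated for this crate. It also cuts the number of monomorphized function copies from 31016 to 26229. The two largest entries before the change, `rmp_serde::decode::Deserializer<R,C>::any_inner` (138318 lines) and `rmp_serde::decode::any_num` (103382 lines), no longer appear in the filtered output.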
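For reference, the relative reductions follow directly from the two `(TOTAL)` rows (lines, then copies):

$$
\frac{1180351 - 911552}{1180351} = \frac{268799}{1180351} \approx 0.2277,
\qquad
\frac{31016 - 26229}{31016} = \frac{4787}{31016} \approx 0.154
$$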
In this example, most of the serialization and deserialization routines are for structs defined in this file.
Neat! In my use case this does better than https://github.com/3Hren/msgpack-rust/pull/350: 3436229 LLVM lines (vs. 3632031) and a 59 s compile time (vs. 65 s).