rust-multibase icon indicating copy to clipboard operation
rust-multibase copied to clipboard

Make it no-alloc compatible

Open vmx opened this issue 5 years ago • 10 comments

The data-encoding library is no_alloc compatible, you can use e.g. encode_mut.

We could provide a similar trait, which writes into a mutable slice.

Things to keep in mind:

  • base-x currently needs alloc. It's used for bases which are not power of two.
  • It increases the API surface, currently this library is pretty small, should we change that?
  • Can we make it work without making this library hard to use when you don't need no_std and no-alloc compatibility?
  • How would it relate to rust-cid? Could we follow the idea of rust-multihash and use GenericArray when converting into a certain Multibase? Given that we know the maximum digest size of the CID, we also know the possible maximum base encoded size at compile time. Can this be wrapped in a nice to use API?

vmx avatar Nov 14 '20 00:11 vmx

Feature flags can do a lot of work here to get no-alloc available. If for now we just want to have it as an option, we could throw in the necessary feature flags and be on our way. Default features would be std, with options to specify if they want no_std with alloc, or no features whatsoever (no base-x, no alloc, no std). It would just be a matter of making the effects of the features obvious, so probably a section in README.

For API, copying decode_len, decode_mut, etc. from data_encoding is only growing the amount of functions by 4 and it's only purpose would be for the specific use case of no alloc. Optionally it might be nice to add as/from_str

Generic array would really be a whole new section of the API (encode x as y with the size of z) as it could easily interfere with the core of the lib (encode x as y). If complexity of API is of concern it would probably be best for cid to do the extra logic at runtime for the time being. If it is really beneficial to cid then it might be worth the effort though currently I can't think of a clean way of doing it (at least without const generics and features from the RFC that don't even work currently).

mriise avatar Nov 16 '20 08:11 mriise

It also looks like with some work base-x can be converted to no-alloc as well.

mriise avatar Nov 16 '20 08:11 mriise

If complexity of API is of concern it would probably be best for cid to do the extra logic at runtime for the time being.

This aligns with my thoughts. I find it hard to come up with good/usable APIs without a proper consumer/use case. Hence I would lean towards making it possible to use rust-multibase with no-alloc somehow, but don't spend to much time on a super tight integration with rust-cid. Though I would use rust-cid use case as a consumer in the sense of "what would it look like if you would want to convert a CID into a base-encoded string in a no-alloc environment.

There wouldn't need to be a fleshed out API for rust-cid. I think it would be good enough to have it in the rust-cid documentation a section with an example on how to do it (or an example as code in the rust-cid repo). If it then really becomes used by someone we could think about a tighter integration.

vmx avatar Nov 16 '20 11:11 vmx

Decided to get back to work on the no-alloc base-x https://github.com/mriise/smol-base-x Currently its rather rough, and im considering if supporting utf over 1 byte is worth it. It would simply things a fair amount, maybe just as a feature flag?

Also working on an API using const generics to generate the backing arrays.

mriise avatar Feb 27 '22 05:02 mriise

im considering if supporting utf over 1 byte is worth it.

I don't think it's worth it. Base encodings are usually ASCII, which is also often the point of them. The only use case are those emoji encodings that I've so far only seen to be used for fun or experimenting with packing more data into less bytes (e.g. for Tweets). So from a practical level, I think having the limitation of supporting ASCII based alphabets only is totally fine.

vmx avatar Feb 28 '22 09:02 vmx

@vmx i'd like to re-open this for some more discussion now, as i finally got around to releasing a version of https://docs.rs/smol-base-x/0.1.0/smol_base_x/

mriise avatar Jul 04 '22 14:07 mriise

One option is modifying trait BaseCodec to

pub(crate) trait BaseCodec<En, De> {
    const CODE: char;
    
    /// Encode with the given byte slice.
    fn encode(input: impl AsRef<[u8]>) -> Result<En>;
    /// Decode with the given string (as slice).
    fn decode(input: impl AsRef<str>) -> Result<De>;
}

std impl will look like

impl BaseCodec<String, Vec<u8>> for $type

no-alloc could be (likely to change)

impl<const S: usize, const B: usize> BaseCodec<heapless::String<S>, heapless::Vec<B>> for $type

no_alloc public fn might look like

fn encode_arr<const S: usize>(
    base: BaseEnum,
    input: impl AsRef<[u8]>,
) -> Result<heapless::String<S>> {
    use impls::smol::Base10;
    use impls::BaseCodec;
    let input = input.as_ref();
    // needs to be a different error
    // String size <S> needs to be defined, though Decode <D> doesnt matter and isn't used so it can be whatever
    let mut out =
        <Base10 as BaseCodec<heapless::String<S>, heapless::Vec<u8, 0>>>::encode(input).unwrap();
    // SAFETY: this trusts that encode() leaves an open byte in the begining
    unsafe { out.as_mut_vec()[0] = base.code() as u8 };
    Ok(out)
}

This would allow for preservation of current API since BaseCodec is pub(crate). We will likely need a new Code enum for no_alloc encoding, but offering From (and probably TryInto for std -> no_std code) between the two should make it easy enough to work with.

mriise avatar Jul 04 '22 20:07 mriise

I'm using heapless here but we can move to an alternative non-alloc output container.

mriise avatar Jul 04 '22 20:07 mriise

Thanks, all this sounds good to me.

We will likely need a new Code enum for no_alloc encoding

Is this because you need different implementations of the same functions for alloc/no-alloc?

vmx avatar Jul 05 '22 09:07 vmx

Is this because you need different implementations of the same functions for alloc/no-alloc?

yes, it can be done with feature flags per code [adding feature(..) and feature(not(..)) ] but then you are forced to use either alloc or no alloc, when it is simple enough to just have two enums where one is removed when alloc is disabled.

mriise avatar Jul 05 '22 16:07 mriise