rfcs RFC: Adding `ArrayVec` to `core`

Rendered

This RFC seeks to add a representation for variable-length data within a fixed-size array, with an interface similar to Vec. Goals are to unite and clarify existing concepts in core, provide language users with a representation for a common data structure that is not heap-depdent, and ease some FFI relationships.

Current use (subject to change) looks like:

use core::collections::ArrayVec;

const BUF_LEN: usize = 16;

let mut v: ArrayVec<i32, BUF_LEN> = ArrayVec::new();
v.push(1).unwrap();
v.push(2).unwrap();

Biggest issues to address

[ ] 1. Location: core::collections (new), core::array (existing), and core::ext (nex) have all been offered as suggestions
[x] 2. ~~to panic or not to panic simple operations like push() that will never (reasonably) fail for Vec are extremely failable for ArrayVec. I elected to make all potentiall failable methods return a Result or similar, based on the belief that core should attempt to never panic (in no_std environments, panicking is generally UB)~~ Any API should mirror Vec's existing API as much as possible, with possible additions for checked operations
[ ] 3. Generics interface: @thomcc brought up the possibility of using ArrayVec<[T; N]> with deref traits to allow for some code reuse, see the comment here. I personally think that the interface has some more potential for confusion than ArrayVec<T, N>, but the benefits are notable (I don't know enough to know if the code duplication comes from a compiler limitation or a language limitation).
[ ] 4. Slice backing: I see notable benefits to allowing an ArrayVec (or perhaps more accurately in this case, BufVec) being able to be backed by any slice, rather than just strictly arrays. I have yet to come up with a clean possible implementation for this, but think it is worth doing if it can be done without overhead or added confusion

Links

Original PR: #2990 (this was closed by the author due to inactivity, not because of a decision against)
Original Zulip thread
Active Zulip thread

Sep 15 '22 04:09 tgross35

I don't know enough to know if the code duplication comes from a compiler limitation or a language limitation

The code duplication is unlikely to be resolvable via either language or compiler improvements. It's somewhat fundamental, as the code won't be 100% identical (it's similar to taking &[T; N] vs &[T]).

I have yet to come up with a clean possible implementation for this, but think it is worth doing if it can be done without overhead or added confusion

That is similar to what I proposed, although perhaps I'm misunderstanding.

Sep 15 '22 04:09 thomcc

The tinyvec crate has a SliceVec type, if that's any help or inspiration.

Sep 15 '22 04:09 Lokathor

The code duplication is unlikely to be resolvable via either language or compiler improvements. It's somewhat fundamental, as the code won't be 100% identical (it's similar to taking &[T; N] vs &[T]).

Thank you for the clarification, that nuance is out of my league so I leave it to you & the other maintainers to decide

That is similar to what I proposed, although perhaps I'm misunderstanding.

I think it is possible with your implementation (which I mentioned in the doc) but just isn't in your linked example since there's no constructor for ArrVec<[T]>

The tinyvec crate has a SliceVec type, if that's any help or inspiration.

Yes that was exactly my thought! It looks like TinyVec uses Thom's implementation, so maybe that's a good place to start and just add the unsafe optimizations

Sep 15 '22 05:09 tgross35

an alternative is the storages proposal (wg-allocators issue) that allows an ArrayVec to be written Vec<T, [MaybeUninit<T>; N]> where an array is used instead of an allocator, the array is stored inline in the Vec rather than being allocated on the heap. you can also do a whole lot more with storages, e.g. implement a storage that stores a few elements inline and if there's too many stores them on the heap instead, you can also use a &mut [MaybeUninit<T>] as a storage, allowing reusing an existing buffer.

also, I think ArrayVec is a better name than StackVec since StackVec is imho a misleading name, it can easily be stored on the heap: Box<StackVec<[T; N]>> or Arc<MyStructContainingAStackVec>

also there's two Rendered links in the top comment, the broken one at the bottom should be removed.

Sep 15 '22 05:09 programmerjake

an alternative is the storages proposal that allows an ArrayVec to be written Vec<T, [MaybeUninit<T>; N]> where an array is used instead of an allocator, the array is stored inline in the Vec rather than being allocated on the heap. you can also do a whole lot more with storages, e.g. implement a storage that stores a few elements inline and if there's too many stores them on the heap instead, you can also use a &mut [MaybeUninit<T>] as a storage, allowing reusing an existing buffer.

That is extremely clean, great suggestion. core itself and anything no_std don't stand to benefit without some moving around, correct? And, is it possible to use a slice instead of an array as the allocator here? (those answers may be in the thread, I am still reading through)

also, I think ArrayVec is a better name than StackVec since StackVec is imho a misleading name, it can easily be stored on the heap: Box<StackVec<[T; N]>> or Arc<MyStructContainingAStackVec>

You are correct, ArrayVec is what is in the document. I just named my branch off the original proposal and named the PR wrong after it. Corrected the title

also there's two Rendered links in the top comment, the broken one at the bottom should be removed.

Thanks for the catch, fixed this

Sep 15 '22 05:09 tgross35

an alternative is the storages proposal that allows an ArrayVec to be written Vec<T, [MaybeUninit<T>; N]> where an array is used instead of an allocator, the array is stored inline in the Vec rather than being allocated on the heap. you can also do a whole lot more with storages, e.g. implement a storage that stores a few elements inline and if there's too many stores them on the heap instead, you can also use a &mut [MaybeUninit<T>] as a storage, allowing reusing an existing buffer.

That is extremely clean, great suggestion. core itself and anything no_std don't stand to benefit without some moving around, correct?

yes. i'd expect Vec and String (and maybe more) to move to core if storages are implemented. the only difference in alloc and std is that the type aliases for Vec and String would have the storage type argument default to the global allocator.

And, is it possible to use a slice instead of an array as the allocator here? (those answers may be in the thread, I am still reading through)

yes, that's what the &mut [MaybeUninit<T>] storage is.

Sep 15 '22 05:09 programmerjake

Wow, that concept really blows this right out of the water and I wish I had seen it earlier. It seems a bit like discussion has stalled unfortunately, the linked thread ended over a year ago and this one hasn't seen too much activity

I will do some poking on it tomorrow

Sep 15 '22 05:09 tgross35

based on the belief that core should attempt to never panic

This is an overstatement. There's even panics in built-ins, like [1][1].

That doesn't mean these APIs should panic, but I don't think it can be immediately discarded.

Sep 15 '22 07:09 scottmcm

Generics interface: @thomcc brought up the possibility of using ArrayVec<[T; N]> with deref traits to allow for some code reuse, see https://github.com/rust-lang/rfcs/pull/2990#issuecomment-848962572. I personally think that the interface has some more potential for confusion than ArrayVec<T, N>, but the benefits are notable (I don't know enough to know if the code duplication comes from a compiler limitation or a language limitation).

Couldn't the same be done with ArrayVec<T, N> and a separate type that it derefs to?

Sep 15 '22 08:09 SkiFire13

RE panics: I'd probably prefer that we have both push and try_push, for example. It mirrors Vec and I think that's worth something.

RE storage: It would be quite amazing to have Vec<T, [MaybeUninit<T>; N]> be a thing, and it does kind of sound like that would completely supplant any hypothetical ArrayVec. So maybe we need to make progress on the storages proposal first to see whether that's something that can move forward.

I am overall :+1: at least on the idea of having an abstraction like this in core though.

Sep 15 '22 12:09 BurntSushi

It seems a bit like [Storage] discussion has stalled unfortunately

I'm working on it, and will be bringing forward a project proposal early October. https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Recruiting.3A.20Storage.20API.20Project.20Group

Sep 15 '22 13:09 CAD97

I think the generic storage idea is neat and might have some use cases, but it’s well into diminishing-returns territory IMO in terms of how complex it is compared to what it brings. It could live in a crate for the (comparatively) few cases where it’s useful, while more specific Vec and ArrayVec would be in the standard library.

Sep 15 '22 15:09 SimonSapin

I think the generic storage idea is neat and might have some use cases, but it’s well into diminishing-returns territory IMO in terms of how complex it is compared to what it brings. It could live in a crate for the (comparatively) few cases where it’s useful, while more specific Vec and ArrayVec would be in the standard library.

imho storages would be far more useful specifically because it would fit into the standard library's types in the way allocators currently do in nightly, that wouldn't really be possible just using an external crate because you couldn't use it with the standard types then, you'd have to specifically opt into using those storage-enabled types which is much less likely to happen throughout the crate ecosystem.

with storages in the standard library 3rd-party crates could easily use them while still working with the standard types:

fn third_party_code<S: Storage>(v: &mut Vec<i32, S>) { ... }

// works great:
let mut my_vec = vec![1, 2, 3];
third_party_code(&mut my_vec);

that would not work with storages being a 3rd-party crate because the Vec and other types they would have to use won't be the standard Vec types. the below code can't work, so third party crates won't try to be generic over storages:

fn third_party_code<S: storages::Storage>(v: &mut storages::Vec<i32, S>) { ... }

// can't work:
let mut my_vec = vec![1, 2, 3];
third_party_code(&mut my_vec); // type mismatch Vec != storages::Vec

Sep 15 '22 15:09 programmerjake

Couldn't the same be done with ArrayVec<T, N> and a separate type that it derefs to?

Not really, the cleverness there is actually not the deref but the use of unsizing. Actually, I'm not sure why the deref is needed -- it's surprising that deref is needed at all there. It shouldn't be, but it seems to help method resolution perform the unsizing coersion.

Sep 15 '22 16:09 thomcc

I love the storages idea a lot, especially because I felt greedy even mentioning the possibility of ArrayString in this RFC - with storages, you get that for free. And any other collections, I'm actually really curious what the future of core and std will look like here since you could move a lot to core with this change.

@CAD97 is there anything publicly available to help shape the storages design, or are you sort of attempting to get something working quietly before discussion adds noise? (completely understandable if so).

Re: push -> Result or push -> maybe panic, I will adjust the RFC for the changes proposed. Perhaps my belief that core should never panic (comes from Linus's comment on allocating, and from my own work on embedded) would be better redirected into "there is advantage to presenting a non-panicking API for a lot of things". Curious how it will propegate to storages where your allocator may very llikely run out of space, and a lot of Vec usage don't really expect that.

I will keep this RFC around as a draft while we see storages develop, because having a "desirable" API for this sort of construct will probably help shape the goals there, things like methods to construct the object on an existing memory buffer (and, I suppose, if storages wind up not making it - which I hope isn't the case). And it probably wouldn't be totally unreasonable to provide something like type ArrayVec<T, N> = Vec<T, [MaybeUninit<T>; N]>; to help guide users. (also makes me wonder where Thom's comments about asm duplication fall into storages).

Sep 15 '22 16:09 tgross35

also makes me wonder where Thom's comments about asm duplication fall into storages

Sadly, storages must be monomorphized over the storage type, and it looks there's no way to perform the unsizing trick I used. I actually worry that this is likely to cause storages, if adopted in the stdlib, to cause a significant hit in compile times and code bloat due to additional monomorphizations.

That said, I agree that if we go with storages, it should likely be the canonical ArrayVec, even if a better one is definable without.

Sep 15 '22 16:09 thomcc

Sadly, storages must be monomorphized over the storage type, and it looks there's no way to perform the unsizing trick I used. I actually worry that this is likely to cause storages, if adopted in the stdlib, to cause a significant hit in compile times and code bloat due to additional monomorphizations.

I am still super naive to the concepts involved, but it continues to seem a bit weird to me that a kind of trick is required to prevent code bloat. It seems like the compiler would be able to notice cases where unsizing would help monomorphization and automatically apply them, that seems helpful outside the scope of this concept. Polymorphization application perhaps?

Sep 15 '22 16:09 tgross35

It would require some compiler heroics and in practice is impossible. Keep in mind that the code is genuinely different when monomorphized.

Sep 15 '22 16:09 thomcc

It seems like if sufficient care is taken, i.e. if the Vec API is precisely followed, an ArrayVec type could eventually be replaced with a type alias that uses the storages approach, e.g.

pub type ArrayVec<T, const N: usize> = Vec<T, [MaybeUninit<T>; N]>;

...which could allow an ArrayVec type to be stabilized first without depending on the stabilization of allocator_api and a hypothetical storages feature, but also in a way that can eliminate the duplication when they are stabilized.

Sep 15 '22 16:09 tarcieri

That's not possible, because a trait can be implemented separately for ArrayVec<T> and for Vec<T, Arr>, but would violate coherence if you make ArrayVec into a type alias. This means it must always stay a separate type, even if just a transparent wrapper.

Sep 15 '22 17:09 afetisov

@CAD97 is there anything publicly available to help shape the storages design

https://github.com/CAD97/storages-api (docs) is the current API draft. See the Zulip link for what details exist for the project group's plan; TL;DR

October: refine the proposed API design
November: write the RFC / API Change Proposal / whatever
December: propose & defend the proposal

Just add a message into the Zulip thread if you want to be CCd as a potential participant in the Project Group. Once that spins up, we'll have a proper setup for discussing/iterating the API; in the meantime feel free to open issues and/or discussions on my draft's repo.

The main purpose of the Project Group is to assess whether the Storage API is appropriate for stdlib. Complexity is a large concern, but it should be noted that the draft as it exists today is significantly simpler than the original drafts.

a trait can be implemented separately for ArrayVec<T> and for Vec<T, Arr>, but would violate coherence if you make ArrayVec into a type alias

The idea is that (as currently) the second type parameter on Vec would be unstable; stable code would only be able to name Vec<_> and ArrayVec<_, const _>, but not Vec<_, _>. If ArrayVec<T, N> is defined as an alias to Vec<T, InlineStorage<[T; N]>> before using a custom storage with Vec is made stable, it will not be breaking, as only Vec<_, AllocStorage<Global>> will have been namable for trait implementation prior.

Sep 15 '22 17:09 CAD97

It is my understanding that, eventually, the allocator (or storage) parameter of collections should become stabilized. Did something change on that front? It is quite unfortunate and poorly composable that currently arena crates need to provide their own duplicates of stdlib collections.

Sep 15 '22 17:09 afetisov

Yes, of course.

What I am saying is specifically that the storage parameter cannot be stable while ArrayVec is a distinct type if we want to eventually make ArrayVec an alias to Vec. (And that doing so is sufficient to make the shift nonbreaking.)

The point of the storage project group is to help advance us towards eventual stabilization of the parameter, by determining if std collections should be parameterized over storages or only indirection allocators.

Sep 15 '22 18:09 CAD97

an alternative is the storages proposal (wg-allocators issue) that allows an ArrayVec to be written Vec<T, [MaybeUninit<T>; N]> where an array is used instead of an allocator, the array is stored inline in the Vec rather than being allocated on the heap.

What if ArrayVec is introduced now as a new type, and if the storages proposal is accepted it is changed into a type alias?

Sep 16 '22 05:09 dlight

If ArrayVec is stable, that can't be done in a backwards-compatible way most likely. If any downstream has impl Foo for ArrayVec and impl Foo for Vec, they may conflict, particularly if they're generic over the storage on Vec already. I think there may be other cases, but that's the first that comes to mind.

Sep 16 '22 15:09 CraftSpider

What if ArrayVec is introduced now as a new type, and if the storages proposal is accepted it is changed into a type alias?

See four comments above: https://github.com/rust-lang/rfcs/pull/3316#issuecomment-1248384297

Sep 16 '22 16:09 Xiretza

@CraftSpider

As @CAD97 mentioned above, since the second generic parameter of Vec<T, A> which specifies the Allocator (as well as the Allocator trait itself) are unstable (gated behind the allocator_api feature), it's possible to stabilize ArrayVec in such a way it can be "upgraded" to a type alias later when that generic parameter of Vec is stabilized.

Unless that parameter is stable, it's not possible to write a conflicting impl on stable Rust, since any impl you can write today on stable Rust uses the default A = Global.

Sep 16 '22 17:09 tarcieri

it should be possible to upgrade Vec<T, A> where A: Allocator and a separate ArrayVec<T, N> to Vec<T, S> where S: Storage and type ArrayVec<T, const N: usize> = Vec<T, [MaybeUninit<T>; N]> if there's the impl:

impl<T, const N: usize> !Allocator for [T; N] {}

Sep 16 '22 20:09 programmerjake

(whether Allocator: Storage is undecided; I personally lean towards having a wrapper AllocStorage<A>. additionally, it's almost certainly going to be InlineStorage rather than directly an array, because Storage is (now) an untyped interface to memory. i'd also prefer InlineStorage<SIZE, ALIGN> (or better, a const Layout) to InlineStorage<[T; N]> but unfortunately it's not possible to #[repr(align(CONST))] yet.)

(but yes; theoretically, if there is a blanket implementation from Allocator to Storage, it would be possible to stabilize being allocator generic and then later generalize to storage generic. however, it would likely still require using a newly stabilized ArrayStorage type rather than just raw arrays for array storage, as std doesn't bound the generic on the struct definition (only on the impls) so it's perfectly allowed to name (and implement traits on) e.g. Vec<T, ()> despite () not implementing Allocator.)

Sep 16 '22 22:09 CAD97

however, it would likely still require using a newly stabilized ArrayStorage type rather than just raw arrays for array storage, as std doesn't bound the generic on the struct definition (only on the impls) so it's perfectly allowed to name (and implement traits on) e.g. Vec<T, ()> despite () not implementing Allocator.)

that seems to error out for me: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=09313310ddda8252d11bce5bdf7e853c

can you make an example that doesn't error?

Sep 16 '22 23:09 programmerjake