phobos icon indicating copy to clipboard operation
phobos copied to clipboard

Remove deallocate and reallocate from GCAllocator

Open pbackus opened this issue 3 years ago • 25 comments

This is necessary to make clients of GCAllocator (e.g., generic containers) usable in @safe code.

Memory allocated by GC allocator will still be freed by the GC when it is no longer reachable.

Rejected alternatives:

  • Making GCAllocator.deallocate a no-op method that always returns false. The documentation in std.experimental.allocator.building_blocks specifically says not to do that.

  • Special-casing client code for GCAllocator to avoid calling its 'deallocate' method, while still calling 'deallocate' for other allocators.

'deallocate' has been documented as an optional method since std.experimental.allocator was introduced in 2015 (commit 8b249a6240), so this change will not break code that correctly adheres to the documented allocator interface (i.e., checks for the presence of 'deallocate' before calling it).

Fixes issue 23318.


Submitted as a draft to see what breaks on Buildkite.

This is going to be pretty disruptive, but better to rip the band-aid off now while allocators are still in std.experimental than to be stuck with a design mistake forever.

TODO

pbackus avatar Sep 03 '22 01:09 pbackus

Thanks for your pull request and interest in making D better, @pbackus! We are looking forward to reviewing it, and you should be hearing from a maintainer soon. Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Auto-close Bugzilla Severity Description
23318 enhancement GCAllocator should not implement deallocate

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#8554"

dlang-bot avatar Sep 03 '22 01:09 dlang-bot

What about a use case where you want to use the GC as a standard allocator, but get the benefits of having the memory scanned because it's in the GC?

I'm wondering if there might be template config flags we should use on the GCAllocator. I'd also like to see a way to generate blocks with NO_SCAN bit set.

schveiguy avatar Sep 03 '22 02:09 schveiguy

What about a use case where you want to use the GC as a standard allocator, but get the benefits of having the memory scanned because it's in the GC?

We can add a new allocator for this if it's actually something people really want (perhaps GCHeapMallocator?). My guess is that it's not, though.

Worth noting that emsi_containers handles GC scanning at the container level; see e.g. the supportGC template parameter in DynamicArray. This has the advantage of also working with allocators like Mallocator that don't use the GC heap.

I'm wondering if there might be template config flags we should use on the GCAllocator. I'd also like to see a way to generate blocks with NO_SCAN bit set.

Out of scope for this PR, but please file an enhancement request on bugzilla so this doesn't get lost.

pbackus avatar Sep 03 '22 02:09 pbackus

  • Making GCAllocator.deallocate a no-op method that always returns false. The documentation in std.experimental.allocator.building_blocks specifically says not to do that.

What about making GCAllocator.deallocate a no-op method which always returns true? I think that would be most meaningful.

I would interpret that if an allocator does not have deallocate (or as the documentation says, has one but it always returns false), then it's a grow-only allocator that cannot free memory. This will make it perfectly suitable for grow-only data structures (or other grow-only allocators), but data structures which actually need to deallocate ought to fail to compile when given such an allocator. Otherwise, it would be easy to accidentally pass in a grow-only allocator to a data structure which needs an allocator capable of deallocation, and as a result get a silent memory leak.

CyberShadow avatar Sep 03 '22 16:09 CyberShadow

I subjectively disagree with this paragraph from the issue description rationalizing this change:

Since GCAllocator is intended to be std.experimental.allocator's interface to D's built-in GC, it should match the behavior of the GC as closely as possible.

I don't think so. The definition is an implementation of a std.allocator allocator backed by memory managed by the D GC. I don't see why the fact that the allocated memory is coming from the managed heap leads to the conclusion that it must also follow semantics of the language's use of the GC (i.e. things like new).

CyberShadow avatar Sep 03 '22 16:09 CyberShadow

What about making GCAllocator.deallocate a no-op method which always returns true? I think that would be most meaningful.

A no-op deallocate method does not free any memory, so having it return true would be lying to the caller.

In support of my interpretation: there is precedent for having a no-op deallocate return false:

I cannot find any precedent for a no-op deallocate method that returns true.

I would interpret that if an allocator does not have deallocate (or as the documentation says, has one but it always returns false), then it's a grow-only allocator that cannot free memory.

The only thing you can conclude from the absence of a deallocate method is that it is not the allocator client's responsibility to free the memory it allocates. That may be because it is a grow-only allocator, or it may be because some other party is responsible for freeing the memory (such as the D runtime, or the user). You cannot tell the difference just by looking at the allocator's interface.

data structures which actually need to deallocate ought to fail to compile when given such an allocator.

Of course. But that's a pretty niche requirement—most data structures are perfectly capable of working with garbage collection.

pbackus avatar Sep 03 '22 17:09 pbackus

The definition [of GCAllocator] is an implementation of a std.allocator allocator backed by memory managed by the D GC. I don't see why the fact that the allocated memory is coming from the managed heap leads to the conclusion that it must also follow semantics of the language's use of the GC (i.e. things like new).

The DDoc comment describes it as "D's built-in garbage-collected allocator." That seems pretty unambiguous to me.

pbackus avatar Sep 03 '22 17:09 pbackus

A no-op deallocate method does not free any memory, so having it return true would be lying to the caller.

Yeah, well, not really?

How do you define when the method returns true anyway? That the memory is returned to the OS? Not true for most allocators. That the memory becomes instantly reusable and used for the next allocation? Not true for round-robin allocators. That the memory has been marked as free, and will eventually be reclaimed? But wait, with the GC, all memory is essentially "marked free" unless proven otherwise, and will be reclaimed when no longer referenced, unless you explicitly pin it with GC.addRoot or such.

I agree that it's not great because it can still cause regressions. Memory that was previously freed will now not be freed, potentially for a long time if some garbage somewhere (dangling pointer that is never going to be dereferenced) is or is interpreted as a reference to it, and will potentially result in an infinite memory leak if the application chains these events forming an accidental ever-growing linked list.

In support of my interpretation: there is precedent for having a no-op deallocate return false:

* [`StatsCollector.deallocate`](https://github.com/dlang/phobos/blob/v2.100.1/std/experimental/allocator/building_blocks/stats_collector.d#L577-L587) (not a true no-op; has unrelated side effects)

* [`CAllocatorImpl.deallocate`](https://github.com/dlang/phobos/blob/v2.100.1/std/experimental/allocator/package.d#L2959-L2973) (true no-op; required to implement `IAllocator`)

Right, that's not what I meant and not related to what I was trying to say.

I cannot find any precedent for a no-op deallocate method that returns true.

Yes, I think garbage-collector backed allocators are a bit of a special case here.

The only thing you can conclude from the absence of a deallocate method is that it is not the allocator client's responsibility to free the memory it allocates. That may be because it is a grow-only allocator, or it may be because some other party is responsible for freeing the memory (such as the D runtime, or the user). You cannot tell the difference just by looking at the allocator's interface.

Um, I think looking at the allocator's interface is literally the core of Design by Introspection, which is what std.allocator is bulit on.

But it's an interesting idea, is there precedent for this?

Of course. But that's a pretty niche requirement—most data structures are perfectly capable of working with garbage collection.

Sorry, what does that have to do with garbage collection? We may be talking past each other.

CyberShadow avatar Sep 03 '22 17:09 CyberShadow

The DDoc comment describes it as "D's built-in garbage-collected allocator." That seems pretty unambiguous to me.

I don't understand how you're reaching that conclusion just from that description. By itself it seems at odds with the general description of std.allocator allocators, which aim to facilitate manual memory management, right?

CyberShadow avatar Sep 03 '22 17:09 CyberShadow

By itself it seems at odds with the general description of std.allocator allocators, which aim to facilitate manual memory management, right?

My understanding is that std.allocator's allocators are supposed to allow memory-management strategies to be decoupled from containers, in the same way that ranges allow algorithms to be decoupled from data structures. So they should allow the user to choose between manual and automatic memory management.

pbackus avatar Sep 03 '22 18:09 pbackus

That makes sense. So it comes back to, if the question is "can this allocator deallocate?", and the answer is "yes, but it's not going to happen right away", how to represent this in the allocator interface in a meaningful way.

Generalizing that to "not immediately, but it's not something you should worry about" as you mentioned does makes sense to me... Though, we already can represent the "don't worry about it" part in the current allocator interface: just return true from deallocate. The problem with it though is that an inner allocator implementation won't know the circumstances in which it is invoked, so it can't know if its inability to deallocate is going to lead to a memory leak or not.

As such, a more correct approach to me here seems to be to add a layer which adds a no-op deallocate which returns true; the program building the allocator stack could then insert this layer as appropriately within the hierarchy to signal to inner allocator users that the inner-most allocators' inability to deallocate is not something they need to worry about.

Does that make sense?

Also, perhaps it would be useful regardless to have an allocator backed by the GC heap that does call GC.free.

CyberShadow avatar Sep 03 '22 18:09 CyberShadow

Another reason I'm uneasy about this change:

We have the following:

  • if a deallocate returns false, it means deallocation failed
  • a deallocate which always returns false is the same as no deallocate

We want to add the following:

  • if a deallocate returns false (or there is no deallocate method), it's not something that the caller needs to worry about; it may happen later.

However, I think that might be incompatible with things like round-robin allocators, which could try to deallocate a piece of memory from all child allocators in turn until one returns true. (owns seems very relevant here but the I don't see a description of the relationship between owns and deallocate in the documentation.)

CyberShadow avatar Sep 03 '22 18:09 CyberShadow

~~Oh, I guess I should add that, since "a deallocate which always returns false is the same as no deallocate", it doesn't make sense to say that it's OK to remove GCAllocator.deallocate but not OK to make GCAllocator.deallocate a no-op which returns false?~~ Right, they're synonymous but the latter is redundant, so might as well remove it.

CyberShadow avatar Sep 03 '22 18:09 CyberShadow

So it comes back to, if the question is "can this allocator deallocate?", and the answer is "yes, but it's not going to happen right away", how to represent this in the allocator interface in a meaningful way.

I think the relevant question to ask is: "should the client of this allocator call deallocate?"

Since we're using DbI, the presence of deallocate signals to clients that the answer to that question is "yes"; its absence signals "no."

For the GC, we know the answer should be "no", because the entire point of GC is that you don't free memory manually. So in order to communicate that answer to clients of GCAllocator, we should make sure GCAllocator does not implement deallocate.

Technically, it would not break anything to implement a no-op deallocate, or use a shim layer, or employ some other workaround. But it would violate the principles of design by introspection and structural typing.

Also, perhaps it would be useful regardless to have an allocator backed by the GC heap that does call GC.free.

Yes, perhaps. @schveiguy suggested the same thing in an earlier comment. I'm not opposed to it at all, as long as the name GCAllocator is reserved for the version that actually uses garbage collection.

pbackus avatar Sep 03 '22 18:09 pbackus

I think the relevant question to ask is: "should the client of this allocator call deallocate?"

Since we're using DbI, the presence of deallocate signals to clients that the answer to that question is "yes"; its absence signals "no."

I'm not sure this is useful. It's also not how DbI is used for other methods.

For shim allocators, introspection is mainly done so that the primitive's presence is propagated up the chain.

For other cases, the presence is used like, "Does the underlying object support operation X? If yes, great, I'll use it! If not, oh well, I'll do something less efficient or signal an error". E.g. for reallocate and expand, the fallback is allocate + deallocate.

I understand that you're trying to say that not calling deallocate if we need to because it's not defined is as bad as not calling reallocate (and instead calling allocate + deallocate) because reallocate is not defined. Well, no, they're not the same, because one of them leaks memory.

This sort of thing ought to be explicit.

(Edit: deleted a message I need to think more about)

CyberShadow avatar Sep 03 '22 18:09 CyberShadow

It's also not how DbI is used for other methods.

Granted, but I'm not talking about other methods, I'm talking about deallocate. And a container must call deallocate if it is present—to do otherwise would be a bug.

This sort of thing ought to be explicit.

When the user passes GCAllocator as a template parameter to a generic container, they are explicitly telling the container to use the GC for memory management. Is that not enough?

pbackus avatar Sep 03 '22 19:09 pbackus

A shim layer is the conceptually correct solution here, because it makes a statement of information that would otherwise need to be transmitted downwards across layers (this would mean e.g. making all allocators a template template, with the inner template getting instantiated by parent allocators)

You've completely lost me here.

As far as I can tell, transmission of information is entirely one-way, even with no shim layer:

  • The user transmits their choice of memory-management strategy to the container via their choice of allocator.
  • The allocator transmits information about that strategy to the container via its interface.

pbackus avatar Sep 03 '22 19:09 pbackus

Granted, but I'm not talking about other methods, I'm talking about deallocate. And a container must call deallocate if it is present—to do otherwise would be a bug.

Okay. But I think it would also be a bug to ignore deallocate's absence if it would have been called otherwise.

When the user passes GCAllocator as a template parameter to a generic container, they are explicitly telling the container to use the GC for memory management. Is that not enough?

No, that's not enough. The user should be able to treat the container as a black box. The container describes its allocator requirements by mandating that the given allocator implements the needed primitives. Different containers may have different needs:

  • No need for deallocation at all, it is a grow-only container
  • The container allocates and deallocates memory
  • The container aggressively allocates and deallocates memory, and expects that deallocation actually performs deallocation

Currently we're not looking at the distinction between the last two points. For the first two points, this proposal enables accidentally passing an allocator which actually doesn't (and will never) deallocate to the second type of container and the program will happily compile. I think this is not OK, and we need the design to address it.

The current design does. Absence of deallocate is the same as deallocate returning false. deallocate returning false (or not existing) when it should have returned true is a logic bug. This change muddles the design more, if anything, by complicating that contract.

You have not addressed this BTW: https://github.com/dlang/phobos/pull/8554#issuecomment-1236178651

In short, this is a bad idea and makes the std.allocator interface less useful and more complicated for the sake of one corner case.

You've completely lost me here.

Yes sorry I already retracted that post.

CyberShadow avatar Sep 03 '22 19:09 CyberShadow

Here is a proposal for a way forward:

  • Go ahead and remove deallocate and reallocate from GCAllocator
  • Add a ManagedMemoryAllocator OSLT with the current unsafe behavior of GCAllocator
  • Add a StubDeallocator shim which has a no-op deallocate which returns true
    • This can be inserted at the user's discretion in the container/allocator chain in cases where the allocator-user component wants to deallocate, the used allocator does not have deallocate, but that's fine because a far-away mechanism will deallocate the memory anyway (a bottom-level allocator, the GC, or even the OS if the program is short-lived).
    • Should assert at compile-time that its parent allocator doesn't have deallocate
    • I think it should be possible to easily implement this shim with alias this.

CyberShadow avatar Sep 03 '22 19:09 CyberShadow

  • Go ahead and remove deallocate and reallocate from GCAllocator
  • Add a ManagedMemoryAllocator OSLT with the current unsafe behavior of GCAllocator

Sure, sounds good. I'll submit a new PR for the unsafe one.

  • Add a StubDeallocator shim which has a no-op deallocate which returns true

I'm not opposed on principle, but I don't see any reason to do this now, since it doesn't block anything. Either way, it's a separate PR.

pbackus avatar Sep 03 '22 20:09 pbackus

I'm not opposed on principle, but I don't see any reason to do this now, since it doesn't block anything. Either way, it's a separate PR.

Agreed, just making sure we have a path forward for this use case.

Thank you !!!

Sure, sounds good. I'll submit a new PR for the unsafe one.

I guess we should merge that first, so that e.g. ae could be updated to use it, then GCAllocator can have its methods removed.

CyberShadow avatar Sep 03 '22 20:09 CyberShadow

I left a comment on the issue itself, and I'll repeat it here: I'm not sure GCAllocator should exist. What is its purpose? If it deallocates, then how is it different from Mallocator (I'm guessing it's worse because it's slower). In what situations would one want to parameterise on the allocator type and slot in GCAllocator?

The GC has guarantees that GCAllocator doesn't, and can't, have.

atilaneves avatar Sep 09 '22 13:09 atilaneves

I'm not sure GCAllocator should exist. What is its purpose?

Currently? Not much.

If this PR goes through? To provide an interface between generic allocator-aware library code (e.g., containers) and D's built-in garbage collector.

If it deallocates, then how is it different from Mallocator (I'm guessing it's worse because it's slower).

There are some minor differences (no need to call addRange, GC will act as a backstop if you forget to free), but mostly, prior to this PR, it isn't different. That's why I think it should be changed.

In what situations would one want to parameterise on the allocator type and slot in GCAllocator?

Currently? Almost none.

If this PR goes through? The same situations where you'd currently use a GC-based container like the ones in std.container.

The GC has guarantees that GCAllocator doesn't, and can't, have.

Currently, yes.

If this PR goes through, they will provide almost exactly the same guarantees.

pbackus avatar Sep 09 '22 17:09 pbackus

to provide an interface between generic allocator-aware library code (e.g., containers) and D's built-in garbage collector.

I'm still trying to figure out what reason one would have to want to do that.

GC will act as a backstop if you forget to free

That's the problem - if client code forgets to deallocate, it will be wrong for every other allocator. The documentation does say to not support deallocate if it never does anything, but that means (like in this PR) a static if everywhere where memory would be deallocated, which is a burden on the user I'm not sure I'm comfortable with.

Since client code has to know when to possibly deallocate, why would one ever want to use GCAllocator in a context that isn't benchmarking it against other allocators?

There is however something to be said about using an allocator that can't possibly deallocate, since then we'll never have aliasing issues when freeing - it can't happen. But reallocation still remains a problem in that case.

If this PR goes through? The same situations where you'd currently use a GC-based container like the ones in std.container.

Those containers aren't allocator-aware, any that are need to deallocate.

atilaneves avatar Sep 09 '22 18:09 atilaneves

I'm still trying to figure out what reason one would have to want to do that.

Same reasons you'd want to use GC over manual memory management in any other context.

The most obvious one is memory safety. If this PR is merged, then containers instantiated with the new version of GCAllocator will be usable in @safe code, just like containers that allocate directly with new.

The documentation does say to not support deallocate if it never does anything, but that means (like in this PR) a static if everywhere where memory would be deallocated, which is a burden on the user I'm not sure I'm comfortable with.

That burden exists regardless of whether this PR is merged. deallocate has always been documented as an optional method of the allocator interface, and code that calls deallocate without a static if check while claiming to support arbitrary allocators has always been incorrect.

(If desired, I can split the addition of these static if checks out into their own PR—they are a necessary bug fix regardless of whether the changes to GCAllocator are merged.)

There is however something to be said about using an allocator that can't possibly deallocate, since then we'll never have aliasing issues when freeing - it can't happen. But reallocation still remains a problem in that case.

Reallocation will be handled by std.experimental.allocator.common.reallocate.

Those containers aren't allocator-aware, any that are need to deallocate.

The point is that if this PR is merged, an allocator-aware container instantiated with the new version of GCAllocator will behave the same way (w.r.t. memory management) as the containers in std.container, which allocate with new.

pbackus avatar Sep 09 '22 19:09 pbackus

@atilaneves Ping, anything I can do to help move this and #8555 forward? Would be happy to discuss options at this weekend's online beerconf, if that would help.

pbackus avatar Sep 23 '22 12:09 pbackus

@atilaneves Ping.

pbackus avatar Oct 15 '22 14:10 pbackus

I missed the first ping somehow, sorry about that. We should probably chat about this, feel free to email me.

atilaneves avatar Oct 18 '22 12:10 atilaneves

@pbackus @atilaneves have you reached a consensus on this?

RazvanN7 avatar Nov 15 '22 14:11 RazvanN7

Not yet, since we haven't talked, and I really think we should.

atilaneves avatar Nov 28 '22 21:11 atilaneves