Add Buffer debloating blog post
"instabilities" — do you mean "imbalances" or "varying conditions"? ("Instabilities" refers to failures rather than to merely unstable conditions.)
Let's discuss it later, but I think "instabilities" is the more accurate word, since I mean situations where the conditions/environment are not stable (for example, constantly losing and re-establishing network connections).
why "minimum possible overhead"? Do you mean "minimum latency"? (The typical pairing is "maximum throughput" and/vs. "minimum latency".)
Latency in general is not so important here. What matters is latency for aligned checkpoints and memory usage for unaligned checkpoints, so I am trying to find a word that covers both cases. Perhaps something like "maximum throughput" vs. "minimum effective memory" (or "memory overhead").
For Java people not familiar with networking/Netty, make clear that the term "buffer" is used here both as a data structure that stores records and as the unit that is transferred via the network.
Isn't that exactly what is described in "Network buffer"? -- Flink gathers records into the network buffer, which is the smallest unit of data sent to subtasks. Perhaps it makes sense to rephrase it, but I think the explanation is exactly as you described.
With "instabilities" you're in fact right that the parameters are not stable. Nevertheless, I find it confusing because "instability" usually refers to some "collapse" scenario, which is not the main aspect here, right? I find "varying conditions" a bit more precise here, because varying conditions require adaptation, which is what buffer debloating tries to achieve.
"gather records" and "smallest unit of sending data" are a bit confusing/non-technical. For example: Flink buffers outbound records into so-called network buffers, which are, roughly speaking, byte arrays of limited size and represent the smallest unit of data that is then sent via the network.
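If a concrete illustration helps in the post, something like the following toy sketch could make the "byte array of limited size" idea explicit (hypothetical class name, not Flink's actual internals; 32 KiB is Flink's documented default buffer size):

```java
import java.nio.ByteBuffer;

// Toy illustration (not Flink's real classes): a "network buffer" is, roughly,
// a fixed-size byte array that outbound records are serialized into. Once the
// buffer is full (or a flush timeout fires), it is sent downstream as a whole —
// the smallest unit of data transferred via the network.
public class NetworkBufferSketch {
    private final ByteBuffer buffer = ByteBuffer.allocate(32 * 1024); // 32 KiB default

    /** Returns true if the serialized record fit; false means: flush first, then retry. */
    boolean append(byte[] serializedRecord) {
        if (buffer.remaining() < serializedRecord.length) {
            return false; // buffer full -> hand it to the network stack
        }
        buffer.put(serializedRecord);
        return true;
    }
}
```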
Question here: for the case of backpressure vs. no backpressure, what exactly is meant? Is it backpressure on the upstream task or from the downstream task? I assume the first, but I would suggest making this explicit here.
I didn't get the question. It is backpressure on the upstream task from the downstream task (there is no "or" there). It means that the current task cannot proceed because its output buffers are full. Maybe I need to clarify something, but I don't really understand what exactly.
Is it important to note that, in theory, one could adjust both parameters? Or is there a conceptual reason to adjust only the buffer size? If not, I would remove that sentence.
I think it is important, because if the user knows about the two parameters (buffer size, number of buffers), the obvious question is what buffer debloating changes and how. Ideally we would change both parameters, but for simplicity (at least in the first implementation) we decided to change only the buffer size. Perhaps we can just simplify the sentence, e.g.: "Currently, buffer debloating manages memory usage by adjusting only the size of the network buffers, while the number of network buffers always remains constant."
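It might also help readers to show how this is switched on. To my knowledge these are the relevant `flink-conf.yaml` options (worth double-checking against the current configuration reference):

```yaml
# Enable buffer debloating: Flink adjusts the network buffer size dynamically
# so that buffered in-flight data corresponds to the configured target time.
taskmanager.network.memory.buffer-debloat.enabled: true

# Target time to consume the in-flight data; buffer sizes are derived from
# measured throughput and this target. The number of buffers stays constant.
taskmanager.network.memory.buffer-debloat.target: 1s
```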
Looks good, but why is the 'Multiple inputs and unions' section from the linked manual missing?
We have actually already implemented that, so we should fix the original documentation instead. At the very least, we need to check what has already been implemented. I will do that.
Regarding checkpointing:
> Question here: for the case of backpressure vs. no backpressure, what exactly is meant? Is it backpressure on the upstream task or from the downstream task? I assume the first, but I would suggest making this explicit here.
> I didn't get the question. It is backpressure on the upstream task from the downstream task (there is no "or" there). It means that the current task cannot proceed because its output buffers are full. Maybe I need to clarify something, but I don't really understand what exactly.
You're right, I find my question confusing myself. What I actually meant is: why do you distinguish here between the backpressure and no-backpressure scenarios? That's not so intuitive to me, because the benefits could theoretically (I admit only theoretically and probably never in practice) be visible even without backpressure, i.e., if the input buffers have a high fill level while never being completely full, so that buffer debloating still limits the data in the buffers and thereby reduces both checkpoint time and checkpoint size. Nevertheless, if I get your idea right, I would suggest a little transition that makes your intuition explicit, e.g.:
The benefits of buffer debloating are best visible when backpressure occurs. The reason is that no backpressure means the downstream task is processing input faster than (or as fast as) data arrives, so its input buffers are barely filled and, consequently, buffer debloating has almost no visible effect.
The opposite is true when backpressure occurs, as it means the input buffers are frequently full. Since buffer debloating limits the amount of data in the input buffers, we observe the following benefits with regard to checkpointing:
...
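To make the backpressure intuition concrete, the post could include a back-of-envelope calculation like the one below (all numbers are made-up assumptions for illustration, not measurements):

```java
// Back-of-envelope sketch: with aligned checkpoints, a checkpoint barrier has
// to wait behind all buffered in-flight data, so under backpressure the
// alignment delay is roughly inFlightBytes / throughput. Buffer debloating
// shrinks inFlightBytes and therefore the delay.
public class AlignmentDelaySketch {
    static long alignmentDelaySeconds(long inFlightBytes, long throughputBytesPerSec) {
        return inFlightBytes / throughputBytesPerSec;
    }

    public static void main(String[] args) {
        long slowThroughput = 10 * 1024; // 10 KiB/s, i.e. heavy backpressure
        long full = alignmentDelaySeconds(8L * 32 * 1024, slowThroughput);  // 8 full 32 KiB buffers
        long debloated = alignmentDelaySeconds(8L * 4 * 1024, slowThroughput); // debloated to 4 KiB
        System.out.println(full + "s vs " + debloated + "s"); // -> 25s vs 3s
    }
}
```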
Regarding the buffer debloating mechanism:
I think it is important, because if the user knows about the two parameters (buffer size, number of buffers), the obvious question is what buffer debloating changes and how. Ideally we would change both parameters, but for simplicity (at least in the first implementation) we decided to change only the buffer size.
I would add a side phrase saying exactly that:
For simplicity, buffer debloating currently only caps the maximum used buffer size ...
and at the end of the paragraph mention:
Nevertheless, the benefits of buffer debloating with regard to checkpointing remain effective, as described before.
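If the paragraph needs a concrete illustration of the sizing idea, a simplified sketch could work (my own simplification, not Flink's actual algorithm, which uses smoothed throughput measurements and configurable bounds): aim for roughly "target time" worth of data in flight and derive a per-buffer size from it.

```java
// Simplified sketch of the buffer-debloating sizing idea (not Flink's real
// implementation): total in-flight data ≈ throughput × target consumption
// time; divide by the (constant) number of buffers to get a per-buffer size,
// clamped between a minimum and the default maximum buffer size.
public class BufferDebloatSketch {
    static int computeBufferSize(long throughputBytesPerSec, long targetMs,
                                 int numBuffers, int minSize, int maxSize) {
        long totalInFlight = throughputBytesPerSec * targetMs / 1000;
        long perBuffer = totalInFlight / numBuffers;
        return (int) Math.max(minSize, Math.min(maxSize, perBuffer));
    }

    public static void main(String[] args) {
        // High throughput: stays capped at the 32 KiB default maximum.
        System.out.println(computeBufferSize(10_000_000, 1000, 8, 256, 32 * 1024)); // -> 32768
        // Low throughput (backpressure): shrinks down to the minimum.
        System.out.println(computeBufferSize(1_000, 1000, 8, 256, 32 * 1024));      // -> 256
    }
}
```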
Other comments:
I think you accidentally reverted some changes, e.g., in the summary: https://github.com/apache/flink-web/pull/524/commits/2f5be59dad428aa70eb97ecf018a412af13f37df
@akalash Didn't you want to ultimately complete this blog post?
Oh, that is a good point. I definitely have to find time for this.