
Rate adaptation seemingly unstable

Open samhurst opened this issue 3 years ago • 53 comments

Hello,

I've been working with SCReAM a bit more lately, and I've encountered an issue where, when using the GStreamer elements over a constrained network link, the rate adaptation seems unstable. I'm trying to simulate what happens when, during a stream, the network capacity drops below the value configured as the maximum bit rate for the SCReAM sender. The sender is configured with a maximum bit rate of 10Mbit/s, using the settings -initrate 2500 -minrate 500 -maxrate 10000 -nosummary. The full GStreamer sending pipeline is below:

export SENDPIPELINE="videotestsrc is-live=true pattern=\"smpte\" horizontal-speed=10 ! video/x-raw,format=I420,width=1920,height=1080,framerate=25/1 ! x264enc name=video threads=4 speed-preset=ultrafast tune=fastdecode+zerolatency ! queue ! rtph264pay ssrc=1 ! queue max-size-buffers=2 max-size-bytes=0 max-size-time=0 ! screamtx name=screamtx params=\" -initrate 2500 -minrate 500 -maxrate 10000 -nosummary\" ! udpsink host=10.0.0.168 port=5000 sync=true rtpbin name=r udpsrc port=6001 address=10.0.0.194 ! queue ! screamtx.rtcp_sink screamtx.rtcp_src ! r.recv_rtcp_sink_0 "

I'm using the netem tool to add 40ms of latency at each end of the link (i.e. 80ms RTT) and to limit the sending rate of both machines to 8Mbit/s:

sudo tc qdisc add dev enp0s31f6 root netem delay 40ms rate 8Mbit limit 40

This is what a graph of the actual transmission rate (green) and the target encoder bit rate (blue) looks like with the network restriction applied for a full five minutes:

scream-gst-x264enc-bw8Mbit-40ms-20220408-3

I think it's safe to say that the target bit rate selected is quite erratic, and it doesn't seem to match up with the graphs shown in README.md, where the line wobbles a bit but stays tightly bound around one point. I've also run the scream_bw_test_tx/rx application and I get results like this, which show a still unstable target encoder bitrate, but one that is a lot more closely grouped:

scream-scream_bw_test_tx-bw8Mbit-40ms-20220408-3

Using iperf3 in UDP mode, I see that the actual performance of the network is fairly stable, sending 10Mbit/s of traffic results in a pretty uniform 7.77Mbit/s of actual throughput.

I suppose my real question is - is this expected behaviour? The huge swings in target bit rate cause a lot of decoding artifacts in the video stream, and I see a lot of packet loss as it keeps bouncing off the limit. If this is not expected behaviour, can you tell me how best to optimise my sending pipeline to suit?

samhurst avatar Apr 08 '22 15:04 samhurst

Hi, I have actually had issues when applying rate limitations with netem. I suspect that this is because it implements rate policing and that the integration time is rather long. You may want to try this instead:

sudo tc qdisc del dev p4p1 root
sudo tc qdisc add dev p4p1 root handle 1: htb default 1
sudo tc class add dev p4p1 parent 1: classid 1:1 htb rate 10000kbit

Unfortunately this does not work along with netem, so you need to apply the netem delay on the reverse path.

IngJohEricsson avatar Apr 08 '22 16:04 IngJohEricsson

BTW... replace p4p1 with the applicable interface name :-)

IngJohEricsson avatar Apr 08 '22 16:04 IngJohEricsson

Hello,

Thanks for getting back to me. I had done some testing with the tbf qdisc instead of just straight netem, but that didn't perform that well either. I've tried using the htb qdisc as you suggested, but I'm still seeing a lot of variability (including SCReAM's target bitrate shooting way beyond the 8Mbit/s limit that I'm trying to set):

scream-gst-x264enc-htb

I'm wondering if this reaction is due to some bursting behaviour in the htb qdisc that I don't know how to control. I've tried adding a hard ceiling of 8Mbit/s with a burst of 6000 and a cburst of 1500, but that doesn't seem to help much:

scream-gst-x264enc-htb-8m-ciel-burst

samhurst avatar Apr 11 '22 10:04 samhurst

Hi, this looks really strange and I have not seen these large variations before. One observation is that you run with quite a large resolution (1920x1080); the test computers that I use cannot handle x264enc at this high a resolution without overloading the CPU. Can you try the same experiment with a lower resolution, e.g. 640x480?

/Ingemar


IngJohEricsson avatar Apr 11 '22 11:04 IngJohEricsson

Hi Ingemar,

The i7-1185G7 that I'm running these tests on doesn't seem to struggle with 10Mbit/s 1080p (~24% CPU usage, or 3% across all 8 threads of the host CPU). I tried a test with a resolution of 640x480, but the encoder seems to top out at around 6.7Mbit/s there, so it didn't hit the network bit rate limit. Instead, I changed the test so that the -maxrate option to screamtx was 7000, and limited the network bitrate to 5Mbit/s. I still observe the same behaviour: scream-gst-x264enc-htb-5m-640x480

Many thanks, -Sam

samhurst avatar Apr 11 '22 12:04 samhurst

This is really strange. I probably need to try this out myself, but I don't believe I have time to do it until next week at the earliest. Is it possible for you to try the SCReAM BW test application on the same bottleneck?

IngJohEricsson avatar Apr 11 '22 13:04 IngJohEricsson

Just ran that, and here's the result, with the -maxrate parameter on scream_bw_test_tx also set to 7000:

scream-scream_bw_test_tx-bw5Mbit

In case it helps, here's the CSV from my screamtx run above: gst-x264enc-htb-5m-640x480.csv

samhurst avatar Apr 11 '22 13:04 samhurst

OK, this looks more reasonable. I have to look into what causes the problems with the plugin

IngJohEricsson avatar Apr 11 '22 13:04 IngJohEricsson

Can you by any chance (with the plugin example) also log the RTP bitrate, i.e. the bitrate that comes from the video encoder? Or alternatively post the entire log?

IngJohEricsson avatar Apr 11 '22 13:04 IngJohEricsson

By "entire log", do you mean the whole GStreamer log? And I can probably get the actual bit rate using GstShark; I'm not sure whether the x264enc element directly reports its output bit rate. If that's what you have in mind, I'll give that a go.

samhurst avatar Apr 11 '22 13:04 samhurst

You should get a quite verbose log on stdout; if you collect it, then I should be able to dig up the necessary info (I hope) /Ingemar




IngJohEricsson avatar Apr 11 '22 13:04 IngJohEricsson

Here's the GStreamer log with GstShark bitrate logging as well as GST_DEBUG="*:2,scream:9". I didn't seem to get much output from GStreamer by itself, so I turned up the element debugging. Hopefully this gets you what you're looking for.

gst-x264enc-htb-5m-640x480-gstshark-dbg9.csv gst-x264enc-htb-5m-640x480-gstshark-dbg9-trimmed.log.gz

samhurst avatar Apr 11 '22 14:04 samhurst

Thanks for the log. I plotted a graph. It looks like the video encoder's internal rate control is quite sluggish; it lags more than 2 seconds behind the target bitrate. I have not seen this behavior with x264enc before; perhaps it is related to some parameter setting?

image

IngJohEricsson avatar Apr 12 '22 11:04 IngJohEricsson

That is possible. Before raising this ticket, I had also performed testing with vaapih264enc, but that had even worse performance (combined results with the first graph on this issue, although the colours are different):

scream-gst-vaapih264enc-bw8Mbit-40ms-20220408-3
gst-x264enc-bw8Mbit-40ms-20220408-3

I'm using the version of libx264 that came with my distribution, version 160. The parameters which I'm passing into x264enc are threads=4 speed-preset=ultrafast tune=fastdecode+zerolatency, which is the same as is used in the sender.sh example in the gstscream/scripts/ directory.

If I run another test without any tuning, then the graph looks like this:

scream-gst-x264enc-notune-htb-5m-640x480-gstshark-dbg9

However, I'm almost certain this is because the encoder is hitting its internal encoding bit rate limit for a 640x480 picture; if I up the frame size to 1024x576 then we're back to the yo-yo:

scream-gst-x264enc-notune-htb-5m-1024x576-dbg gst-x264enc-notune-htb-5m-1024x576-dbg.csv gst-x264enc-notune-htb-5m-1024x576-dbg-gst-trimmed.log.gz

samhurst avatar Apr 12 '22 13:04 samhurst

OK, thanks. In the vaapi example it looks like an I-frame is generated every 4 seconds or so. You can perhaps try to set keyframe-period=100; the spikes should then occur less often. It is quite obvious that x264enc does not act optimally for this kind of rate adaptive use. I tried to look through the parameter set but I don't find anything that can be tuned. Perhaps one can try with an increased qp-step (default is 4) and set vbv-buf-capacity to a lower value than the default 600. Is there any chance you can try with nvenc or the h264 encoders that come with the NVIDIA Jetson Nano?

IngJohEricsson avatar Apr 12 '22 13:04 IngJohEricsson

I've tried playing with the qp-step and vbv-buf-capacity parameters like you suggested, but cranking up the qp-step to 32 and the vbv-buf-capacity down to 100 milliseconds doesn't make much difference.

Sadly, I don't have any immediate access to a Jetson nano, nor any other NVidia encoding/decoding hardware.

samhurst avatar Apr 12 '22 15:04 samhurst

OK. You can perhaps try adding -rateincrease 1000 -ratescale 0.2 after -maxrate 10000 in the sender.sh script. This will make the rate increase slower and thus reduce the risk of overshoot when the video encoder rate control loop is slow/sluggish (I had this issue with omxh264enc on the Raspberry Pi).

IngJohEricsson avatar Apr 13 '22 06:04 IngJohEricsson

I tried with -rateincrease 1000 -ratescale 0.2, and I could count fewer peaks, but still saw the same large drops. I've also tried dropping them to -rateincrease 500 -ratescale 0.1, but that just spaces the peaks out further, and doesn't seem to stop the overshoot with the large drop in response.

scream-gst-x264enc-htb-5m-1024x576-dbg-rateinc500-ratescale0 1

samhurst avatar Apr 13 '22 10:04 samhurst

OK. Yes, one should expect more sparse peaks. I was hoping that it would also reduce the drops, but that does not work. Not sure what more can be done; it is difficult to handle such cases when the video coder rate control loops are this slow. Somehow I believe that it must be some setting. Perhaps it is the ultrafast preset that makes things bad? But I am just speculating here.

IngJohEricsson avatar Apr 13 '22 13:04 IngJohEricsson

It's better to have tc set up on a dedicated device. When this is not possible:
When using Linux tc to constrain the rate of traffic flows, an application can receive backpressure forcing it to slow down the rate at which it sends traffic. For example, when backpressure is applied, an iperf3 flow configured to send a UDP traffic flow at a certain rate is not able to sustain that rate but will adapt to the rate configured in tc. On the other hand, in a situation without backpressure being applied, the iperf3 flow will continue sending at the configured rate, and tc will drop packets when the tc queue surpasses the configured limit. Whether application backpressure is applied or tc drops packets can be controlled by the settings of the socket send buffer and the qdisc limit. For example, in the case of iperf3:

  1. using a socket send buffer of 212992 and a tc qdisc limit down to around 100 packets (varying depending on the rate of the flows and the size of the packets), application backpressure is experienced without tc packet drops.
  2. using a socket send buffer of 720896 and a tc qdisc limit of 256 or more, there is no application backpressure, even with flows well above the configured tc rate limit.

The default and max socket send buffer sizes are set using commands such as:

sysctl -w net.core.wmem_max=212992
sysctl -w net.core.wmem_default=212992

sysctl -w net.core.wmem_max=720896
sysctl -w net.core.wmem_default=720896

sysctl -w net.core.wmem_max=2097152
sysctl -w net.core.wmem_default=2097152

By changing these, all send buffers will get the larger default. The settings can be read using:

cat /proc/sys/net/core/wmem_default
cat /proc/sys/net/core/wmem_max

To make the change permanent, add the following line to the /etc/sysctl.conf file, which is read during the boot process:

net.core.wmem_default=720896

jacobt21 avatar Apr 28 '22 17:04 jacobt21

Apologies for being quiet here for a while; a mixture of other projects taking time as well as some personal leave last month means it's taken longer to get this response out than I'd hoped.

I ended up going away and trying to engage with both the GStreamer and x264 developers to see if there was any way of reducing the latency of the rate controller within the encoder, but this exercise did not bear much fruit. However, as part of this effort I did end up writing a simple test that takes the SCReAM algorithm out of the loop and just lets me see how the encoder reacts to standalone bit rate changes. I note that, especially during times of observed congestion or rate limiting, the SCReAM algorithm can update the bitrate property several times a second, making it difficult to actually observe the reaction to any one change. Here's an example of x264enc changing from 10Mbit/s to 8Mbit/s, when running with a GOP length of 10:

x264-buffer-bitrate-test-idr-10-10mb-to-8mb

The data shown in the graph above is taken from gst-shark's buffer and bitrate tracers: each blue cross is the size of an encoded frame (against the left y-axis), the golden line is the bitrate set on the encoder (against the right y-axis), and the red line is a per-second average bitrate of the buffers that are flowing from the encoder. It takes at least a second for the x264 encoder to even begin ramping down its encoded bit rate, and over two seconds before the average has reached the requested bit rate. Interestingly, it doesn't even seem to track with the GOP length; I'd expect the encoder to use a GOP boundary as a natural point to step its target bit rate, but x264enc doesn't seem to do this.
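For reference, the red per-second average line can be reproduced from the tracer samples with a sliding-window sum like the sketch below. The struct and function names are illustrative only; the real data comes from gst-shark's buffer tracer, not from any API shown here.

```cpp
#include <cstdint>
#include <vector>

// Sketch: sum the encoded frame sizes that fall inside a sliding 1-second
// window ending at nowSec, and convert bytes to bits per second.
struct FrameSample {
    double tSec;     // buffer timestamp in seconds
    uint32_t bytes;  // encoded frame size in bytes
};

double avgBitrateBps(const std::vector<FrameSample>& frames, double nowSec) {
    uint64_t bytesInWindow = 0;
    for (const FrameSample& f : frames)
        if (f.tSec > nowSec - 1.0 && f.tSec <= nowSec)
            bytesInWindow += f.bytes;
    return static_cast<double>(bytesInWindow) * 8.0;  // bits over 1 s = bit/s
}
```

Because the window is a full second long, a single oversized IDR frame inflates the average for a whole second after it leaves the encoder, which is worth keeping in mind when reading the spikes in these plots.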

As you suggested (https://github.com/EricssonResearch/scream/issues/44#issuecomment-1096733102), I managed to get some testing done with an NVidia GPU (RTX 3080) using nvenc. Using the same test as earlier, I can see that nvenc reacts quite differently to x264enc, with a reaction to the bit rate change occurring almost immediately:

nvh264-firebrand-test-strict-gop15-10to8mb

However, something I did notice that is different to the behaviour of x264enc is that whenever the encoder is reconfigured with a new bit rate, it abandons the current GOP and creates a new IDR. The first few buffers of this are then fairly large, and no matter how much I try to tune nvenc, I can't seem to tame that behaviour. The encoder certainly does its best to keep the average bitrate down after this, and the average paced out over a second is well below the requested bit rate.

I then moved on to running screamtx with nvenc, and I feel that this issue, whereby every reconfiguration with a new bit rate produces a new IDR, starts to cause serious problems. I restricted the bandwidth to 8Mbit/s overall again, with a maximum allowed bandwidth of 10Mbit/s (and with buffer sizes set as Jacob suggests).

scream-firebrand-nvh264enc-scream-sender-8Mbit-nozerolatency

firebrand-nvh264enc-scream-sender-8Mbit-nozerolatency

(Top graph is plotted from the SCReAM CSV file, bottom graph is plotted using output from gst-shark similarly to the ones above)

It looks like the SCReAM rate controller sets its initial bandwidth, the encoder massively overshoots, and then the rate controller tries to turn down the rate, which causes the encoder to keep overshooting. This happens so much that the rate controller seems to just keep trending along the bottom of the allowed bit rate range.

Is there any way of backing off the SCReAM congestion controller so that it doesn't do quite so many updates? I feel that this might solve this particular problem.

samhurst avatar Jun 08 '22 11:06 samhurst

Hi Sam, The problem with nvenc that you’ve noticed: “that whenever the encoder is reconfigured with a new bit rate, it abandons the current GOP and creates a new IDR.”

Related to this change in https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad.git :

09fd34dbb0 sys/nvcodec/gstnvbaseenc.c (Seungha Yang 2019-08-31 17:34:13 +0900 1705) reconfigure_params.resetEncoder = TRUE;
09fd34dbb0 sys/nvcodec/gstnvbaseenc.c (Seungha Yang 2019-08-31 17:34:13 +0900 1706) reconfigure_params.forceIDR = TRUE;
09fd34dbb0 sys/nvcodec/gstnvbaseenc.c (Seungha Yang 2019-08-31 17:34:13 +0900 1707) reconfigure = TRUE;

09fd34dbb0 sys/nvcodec/gstnvbaseenc.c (Seungha Yang 2019-08-31 17:34:13 +0900 2439) GST_VIDEO_CODEC_FRAME_SET_FORCE_KEYFRAME (frame);

You might want to try to revert that patch or part of that patch and rebuild nvenc plugins.

“Is there any way of backing off the SCReAM congestion controller so that it doesn't do quite so many updates? I feel that this might solve this particular problem.”

Let us think about this issue.

Regards,

jacobt21 avatar Jun 08 '22 14:06 jacobt21

Hi, yes, it should be possible to add a hysteresis function that inhibits video rate updates until the target bitrate increases or decreases by more than e.g. +/- 5%. This change however requires some testing, as there is a certain risk that the rate control can deadlock. I cannot promise any update to this in the near future (i.e. before the summer).

IngJohEricsson avatar Jun 09 '22 08:06 IngJohEricsson

Thanks to Jacob's pointer, I've modified the nvbaseenc code to allow me to set those values to FALSE, so that the NVidia encoder no longer generates a new IDR every time there's a reconfiguration. This fixes the issue I was seeing with the rate never getting off the bottom of the graph, as it now behaves fairly normally:

scream-firebrand-nvh264enc-scream-sender-no-idr-on-reconfigure-8Mbit

However, I'm still seeing the oscillation that I was seeing with x264enc, even though this encoder is much better at reacting to bit rate changes.

samhurst avatar Jun 09 '22 15:06 samhurst

I see the same pattern, but I attribute it to the encoder, not SCReAM. In this case SCReAM keeps the rate constant: targetBitrate_rateRtp

jacobt21 avatar Jun 09 '22 16:06 jacobt21

Hi Jacob,

The green line on my graph indicates the targetBitrate as set by the SCReAM congestion controller. I should have specified that my testing was performed with a network-level limitation of 8Mbit/s, and I'm trying to understand what the behaviour of the scream congestion controller is when faced with a network that has a lower amount of bandwidth than what the SCReAM congestion controller was originally configured to use. For example, a mobile user streaming video that moves to a new mast that has higher levels of congestion and/or a lower peak throughput available for that user.

From my previous discussions with Ingemar, it seemed like the expected behaviour would be that the congestion controller would trend towards the network limit, and not keep going over and then dropping ~25% of the bit rate in reaction to network congestion. Currently, the only way I get a flat line for the target bitrate is if the configured SCReAM maximum bit rate is lower than the bandwidth available (i.e. network has 9Mbit/s of bandwidth, SCReAM configured with a maximum of 8Mbit/s).

-Sam

samhurst avatar Jun 10 '22 10:06 samhurst

Hi Sam. As SCReAM adapts based on the detection of increased queue delay, you'll indeed get the behavior shown in your figure. The reason is that once the queue starts to grow, you are essentially one round trip behind with the rate reduction, so you'll get an overshoot. There are ways to reduce the magnitude of this oscillation; try for instance these extra options: -rateincrease 5000 -ratescale 0.2. That slows down the rate increase, but it also reduces the overshoot. /Ingemar
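The "one round trip behind" effect can be illustrated with a toy control loop. All constants and the control law here are invented for illustration; this is not the SCReAM algorithm, only a minimal model of delayed feedback.

```cpp
#include <deque>

// Toy model of why delay-based rate control overshoots: the controller only
// sees the queue one round trip late, so the rate keeps climbing past the
// link capacity before the back-off can kick in.
double simulatePeakRateKbps(double capacityKbps, double rttSec) {
    const double dt = 0.02;                         // 20 ms control interval
    int lag = static_cast<int>(rttSec / dt + 0.5);  // feedback delay in steps
    if (lag < 1) lag = 1;
    std::deque<double> feedback(lag, 0.0);          // queue samples in flight

    double rateKbps = 2500.0;                       // like -initrate 2500
    double queueKbits = 0.0;
    double peak = 0.0;
    for (int step = 0; step < 1000; ++step) {
        double observedQueue = feedback.front();    // measurement, one RTT old
        feedback.pop_front();
        feedback.push_back(queueKbits);

        if (observedQueue < 1.0)
            rateKbps += 100.0;                      // ramp while path looks clear
        else
            rateKbps *= 0.9;                        // back off on queuing delay

        queueKbits += (rateKbps - capacityKbps) * dt;  // queue growth/drain
        if (queueKbits < 0.0) queueKbits = 0.0;
        if (rateKbps > peak) peak = rateKbps;
    }
    return peak;
}
```

Running this model, the peak rate always lands above the link capacity, and the overshoot grows with the feedback delay, which matches the observation that a longer RTT spaces out and enlarges the excursions.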

IngJohEricsson avatar Jun 10 '22 10:06 IngJohEricsson

Hi

I have now added a -hysteresis option that should reduce the number of small rate changes quite considerably. For instance, with -hysteresis 0.1 the bitrate must increase more than 10% or decrease more than 2.5% (one quarter of the value) for a new rate value to be presented to the encoder. If that condition is not met, then the previous rate value is returned by the getTargetBitrate(..) function. The wrapper_lib that is used by the gstreamer plugin still needs to be updated. /Ingemar
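As a sketch of what that hysteresis does: a new value only passes through when it rises more than 10% above, or falls more than 2.5% below, the last reported one. The class and member names below are illustrative, not the actual SCReAM code.

```cpp
// Hysteresis filter in the spirit of the described -hysteresis option:
// small target bitrate changes are suppressed so the encoder is not
// reconfigured several times a second.
class RateHysteresis {
public:
    explicit RateHysteresis(float hysteresis)
        : hysteresis_(hysteresis), lastRate_(0.0f) {}

    float filter(float targetBitrate) {
        if (lastRate_ <= 0.0f ||
            targetBitrate > lastRate_ * (1.0f + hysteresis_) ||
            targetBitrate < lastRate_ * (1.0f - hysteresis_ / 4.0f)) {
            lastRate_ = targetBitrate;  // change is large enough: accept it
        }
        return lastRate_;               // otherwise repeat the old value
    }

private:
    float hysteresis_;  // e.g. 0.1 for +10% / -2.5% thresholds
    float lastRate_;    // last value actually handed to the encoder
};
```

The asymmetric down-threshold keeps reductions responsive (important under congestion) while still damping the many small upward adjustments that trigger encoder reconfiguration.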

IngJohEricsson avatar Jun 10 '22 12:06 IngJohEricsson

Hi Ingemar,

With what you say about the increased queueing delay, would a potential fix be to make the queue larger so that it covers multiple round trips? I could also try making the round trip itself longer using tc, and experiment with that. At the moment, I'm running on two hosts connected directly to one another so the round trip time is a couple of milliseconds at worst.

I've tried adding the options you described, but all it appears to do is decrease the frequency of overshoots that I see, not the amplitude of the reaction from the congestion controller.

Many thanks for all your help to date by the way, this has all been quite helpful. I look forward to testing the hysteresis and see if that helps matters.

-Sam

samhurst avatar Jun 10 '22 15:06 samhurst

Hi. Actually, the proposed settings should reduce the frequency of the overshoots; try setting them really low (-rateincrease 5000 -ratescale 0.2). It is actually quite difficult to target a specific queue delay over a longer time span. In essence it requires that the following conditions are fulfilled:

  1. The video coder delivers video frames with an exact size
  2. The bottleneck has a constant throughput

1) above is rarely true, and 2) is essentially never true with cellular access technology like 4G and 5G. This actually leads us to L4S technology, which allows quite fast rate adaptation and very little overshoot. So instead of trying to engineer around the overshoot problem, we strive to add a congestion signal from the network. That will allow for good performance in the presence of issues 1) and 2) above.

Yes, a very short RTT will itself increase the rate adaptation speed in SCReAM; we have mostly tried links with an RTT of 10ms or more.

/Ingemar

IngJohEricsson avatar Jun 10 '22 16:06 IngJohEricsson