Improve retries for non-live remote streams.
The short story behind this commit is that with this change, attempts will be made to resume non-live remote streams which end because of a failure at the Player level, such as because of a data corruption, (as opposed to the Network level) when previously playback would have ended. This is most noticable on long running streams like those hosted on Mixcloud. It potentially relieves some of the symptoms reported in issue #130.
The longer story with some history is as follows:
The _willRetry subroutine was added to Slim::Player::StreamingController in version 7.6 of slimserver with the last commit to change the subroutine in September 2011. The functional purpose of the subroutine is to retry playing a stream when if fails at the Player level. However it would appear via code inspection and testing that that _willRetry will not run as intended when called from at least two out of the three subroutines from which it is called and it seems likely that it's never worked.
One of the first things that _willRetry does on line 1394 is check whether $retryData is defined.
However the only place we have found $retryData to be assigned is at line 921 of StreamingController in the _RetryOrNext subroutine, which is only called from the ReadyToStream state.
$retryData does appear in the constructor of Slim::Player::Song at line 74 but is not mentioned anywhere else in the codebase that we can find or in any of the plugins that we've searched.
And looking at the expected contents of $retryData in StreamingController, we don't think that it was ever intended to be defined outside of StreamingController. The expected contents are "start" and "count" where the first is the time when StreamingController started retrying and the second is the number of retries.
The three places that _willRetry is called from are: _Stream, _StopNextIfMore and _SyncStopNext.
_StopNextIfMore and _SyncStopNext are both called when in the StreamingFailed state (lines 205 and 206). But $retryData will have been set to undef in _Playing at line 370 which will have been called before the StreamingFailed state can be reached.
So, as far as we can tell, _willRetry will always return 0 at line 1396 when called from _StopNextIfMore or _SyncStopNext.
Also, the current implentation only handles remote radio streams where playback is restarted from the begining. See on line 918 the check for isLive and line 921 where seekdata is omitted in the call to _Stream.
In this commit we have attempted to understand how the _willRetry subroutine was originally intended to work and implement the same logic. So we've kept the same retry intervals and retry limits for example.
When streaming to players that need the remote stream to be transcoded, such as legacy hardware players which cannot play AAC for example, the same problem manifests itself but the code to relieve the issue needed to be added to the _RetryOrNext subroutine. An else statement has been added to the inner if statement in the _RetryOrNext subroutine so previously functioning restarts of radio streams should be unchanged but now transcoded non-live streams will also be restarted.
The test case we have been using to to develop this commit is using the Mixcloud plugin with the following stream which usually fails to play to completion:
https://www.mixcloud.com/johndigweed/transitions-with-john-digweed-and-denney/
It is not entirely predictable if and when this stream will fail so testing this commit with it requires patience and a lot of time.
Failure at the Player level seems to occur more frequently when using "Normal streaming" on the Settings -> Advanced -> Network screen than when using "Persistent mode". We do not know why this would be the case but would suggest that logically the difference is likely to be in the way the remote server handles the streamed data.
I have not been able to look at the PR and I'm not sure I can find time but FYI I did some modification around a similar issue not so long ago. It was to handle retry when player vanished because of reboot or power loss. I'm not saying at all that your PR is wrong, but just wanted to say that there are a fair bit of difficulties when this happens, with a fair bit of permutations and combinations, including when player comes back after the "forget" period.
@philippe44 thanks for the comment. The changes in this PR are focused on problems with remote server stream and are in different subroutines from those that handle local player reconnect. We've not changed the _Stream, playerReconnect or continue subroutines, which are where all the references to local player reconnect occur.
@philippe44 thanks for the comment. The changes in this PR are focused on problems with remote server stream and are in different subroutines from those that handle local player reconnect. We've not changed the _Stream, playerReconnect or continue subroutines, which are where all the references to local player reconnect occur.
Understood but you call _Stream() in more cases. Maybe you could make sure that when players vanishes/reboot/power off, they come back as before, whether the forgetClient timer elapsed or not. I'm sorry I cannot help more at this point
@philippe44 noted. With a remote stream playing, we've tested player power off/on and player restart within the 5 minute forgetClient timer period and after the forgetClient timer elapsed. We tested all these scenarios both before a retry caused by remote server and after. We've also tested all of these scenarios with both native and server transcoded playback. In all cases the player behaved as expected with playback resuming at the correct position in the stream if the player comes back within the timer period and not resuming if the player comes back after the timer period.
That's great then, thanks - testing is always the most difficult. @mherger might ask for the opportunity of code factorization as two of changes seems similar, but AFAIC, I have no useful comment to make.
@philippe44 Thanks. Have responded to your inline comments and reverted code in the spots you hightlighted.
Thanks guys! I haven't forgotten this PR. But it touches on code I'm not too familiar with... so please give me some time. I'll hopefully be able to get back to this by the end of the week.
One question to get started: what side-effects could you think of? What should I be looking for outside the Mixcloud use case?
@michaelherger Side effects are only likely in streaming failure scenarios, noting that normal end-of-file events can be considered as failure scenarios in the context of the StreamingController, beside stream corruptions or errors opening a file/stream for example.
It would be sensible to consider local files as well as remote streams - both are handled by StreamingController. Slimsever server-side transcoded files/stream are handled differently than non-transcoded files/streams so it would also be sensible to look there.
The scenarios where the new and changed code is called are as follows:
-
In the _Stream subroutine when there is is an error opening the file/stream.
-
In _StopNextIfMore and _SyncStopNext where both are called when StreamingFailed and a Buffering, Wait-to-Sync or Streamout (end of file) event occurs.
-
And _RetryOrNext is called when ReadyToStream and playing-state is Playing and a Streamout event occurs.
The new code puts a boundary around end-of-file events so that it will only retry if more than 10 seconds of a track has played and more than 10 seconds is remaining to play. Existing code is unchanged for live radio where the duration is not known.
Failures that trigger the new and changed code via 2 above are most likely when playing remote streams. 3 above specifically excludes local files by checking isRemote on the Song. 1 above is the only scenario where a local file is really likely to trigger new and changed code.
Hope that helps.
So I thought that if already I didn't have time to look into this, I'd give it a try with the various sources I'm using. Radio Paradise Lossless was working fine. But Spotty with SqueezeAMP wouldn't like your changes: ten seconds before the end of the track odd things would happen. Sometimes it would repeat the last seconds of the first track 2-4x before jumping to the next track. But only the audio would play the next track. The UI (state) would stick on the previous track, and stay one behind for the continuation.
This is with LMS 8.3 on a Pi4, running Pi OS. I could revert back to git trunk, and playback would be fine. Apply the patch again, and the same oddities would happen. It's definitely reproducible on my production server.
Unfortunately on my dev machine (macOS 12.3.1) things are working ok (with a local SqueezePlay instance). So this probably isn't easily reproducible. Maybe it's a question of buffer sizes in the player? I don't even know the sizes for my two software players I've tested...
Thanks for feedback @michaelherger. We'll have a think about why that might be happening with Spotty on Pi OS. Which player where you using with Pi?
I first tried with a SqueezeAMP (SqueezeAmp.16.1023.master-cmake). But I double checked with a Touch and am seeing the same behaviour there. Interestingly the macOS Squeezeplay would still behave correctly, even when connected to the same server. And the Touch would play correctly when connected to macOS LMS...
I'll see whether I can confirm these findings with other players when I have more time (probably tomorrow night).
@michaelherger Because you said the last seconds are repeated 2-4x before jumping the problem you are seeing is likely on line 1429 in the _willRetry subroutine.
Suspect that either or both of $song->duration() and $elapsed (which is defined as $self->playingSongElapsed()) are not reporting the same values as on the other server/player combination.
You could test this theory by adding logging lines to output these two values on the line before the if at line 1429.
There are checks in playingSongElapsed() for undefined $song->duration() so we may need to add a check for that on 1429. Something like this:
if (!$song->duration() || $song->retryData->{'count'} > scalar @retryIntervals || $elapsed < 10 || ($song->duration() - $elapsed < 10)) {
This would mean that Spotty streams on the server/player combinations you highlighted will not retry until the $song->duration problem is resolved elsewhere in codebase though.
Please do let us know how you get on and thanks for looking at this.
BTW, playingSongElapsed is not always reliable. Players that can't sync might even not report it
@philippe44 Thanks for confirming. We have added checks for undefined elapsed and duration at line 1429 to the PR accordingly.
@philippe44 any idea which players are affected? Can't immediately see how those players can be made to retry at this level without those two pieces of information. Seems a shame for users of those players but doesn't mean we shouldn't fix retries for those players that can support it.
I've created a while ago these bridges that transform UPnP, Chromecast and AirPlay into LMS (slimproto). UPnP are a real pain and have a lot of diversity in behavior. Some do not comply to standard and don't provide time progress. I've tried to fake time position for these, but I'm not sure it always works. This is one example I can think of. Other than that, I would have to look at SongElapsed code but I think there was some convoluted pieces to create / recreate true position from what player reports. Remember that players report playtime since last slimproto "play" command, not an absolute position of which there are not aware of. For example if you seek 2 mins in a track, LMS sends audio data starting from the estimated offset and players will report time from there, being totally unaware that this is not the beginning of the track.
Ok, here's a recording I did with the latest version of your changes: around ten seconds before the end of the track you'll see how progress bars are starting to jump both in the Material skin and on device. Then those ten seconds are repeated twice, before audio (and audio only!) continues to the next track. But the progress bar and metadata are stuck on the previous track.
https://www.dropbox.com/s/czexhefd4ai5wws/repeating.mov?dl=0
Corresponding log file: server 2.log.zip
Thanks for testing @michaelherger and for the attachments. We've reviewed the video and the logfile and made two changes to the PR as a result:
-
The same undef checks needed to be added to StreamingController in the _RetryOrNext subroutine because elapsed and song duration can be undefined even when transcoding, as is the case when streaming using Spotty.
-
We've added a sanity check next to the undef checks in _willRetry and _RetryOrNext to check that elapsed is less than duration. We can't see how any of the changes in this PR could cause the values of track time remaining to become incorrectly formated strings as seen in the web UI in the video you sent. At one point in the video the track time remaining reads "-00:0-1:0-1" for example. So we suspect that this might be a cause of the code changed in the PR misbehaving rather than a symptom. The sanity check should work around this.
@michaelherger you could check the second of these points by reverting the StreamingController and watching the last 10 seconds of the same track play out in the web UI again. We've added debug log lines to the PR also so re-running the same test you ran before should produce a more useful log output.
I'm not worried about the UI artifact. It's just its expression of confusion :joy:. What I wanted to show with the video was how ten seconds before the end of the track it somehow jumped to the end, just to jump around some more thereafter. Plus die audible prove of the repetition.
In the new log I seeked three minutes in to shorten the wait for the issue to kick in. That happened around timestamp 10:22:56.5576. Whatever comes thereafter is what is related to the jumping at the end of that track.
Thanks for this @michaelherger.
Your initial seek can be seen at lines 82 and 141: Line 82 [22-05-08 10:22:14.9267] Slim::Player::Song::open (370) Adding seekdata { timeOffset => "170.804928131417" } Line 141 [22-05-08 10:22:26.7117] Slim::Player::Song::open (370) Adding seekdata { timeOffset => "170.804928131417" }
Then _RetryOrNext is called twice when the player is reporting ~12.5 seconds left to play and restarts. The third time _RetryOrNext is called with <1 second left to play and then does not restart. The interesting lines are as follows:
Line 194 [22-05-08 10:22:33.3820] Slim::Player::Source::_readNextChunk (378) end of file or error on socket, song pos: 10925127 Line 215 [22-05-08 10:22:44.9890] Slim::Player::StreamingController::_RetryOrNext (925) Elapsed: 187.319153095559 Line 216 [22-05-08 10:22:44.9904] Slim::Player::StreamingController::_RetryOrNext (926) Duration: 199.006 Line 217 [22-05-08 10:22:44.9916] Slim::Player::StreamingController::_RetryOrNext (932) Getting seek data from protocol handler. Line 218 [22-05-08 10:22:44.9929] Slim::Player::StreamingController::_RetryOrNext (934) Restarting playback at time offset: 187.319153095559
Line 249 [22-05-08 10:22:46.1325] Slim::Player::Source::_readNextChunk (378) end of file or error on socket, song pos: 11337005 Line 277 [22-05-08 10:22:56.5532] Slim::Player::StreamingController::_RetryOrNext (925) Elapsed: 187.485696039513 Line 278 [22-05-08 10:22:56.5549] Slim::Player::StreamingController::_RetryOrNext (926) Duration: 199.006 Line 279 [22-05-08 10:22:56.5563] Slim::Player::StreamingController::_RetryOrNext (932) Getting seek data from protocol handler. Line 280 [22-05-08 10:22:56.5576] Slim::Player::StreamingController::_RetryOrNext (934) Restarting playback at time offset: 187.48569603951
Line 318 [22-05-08 10:22:57.6124] Slim::Player::Source::_readNextChunk (378) end of file or error on socket, song pos: 11739298 Line 339 [22-05-08 10:23:07.7950] Slim::Player::StreamingController::_RetryOrNext (925) Elapsed: 198.744084175423 Line 340 [22-05-08 10:23:07.7959] Slim::Player::StreamingController::_RetryOrNext (926) Duration: 199.006 Line 341[22-05-08 10:23:07.7968] Slim::Player::StreamingController::_RetryOrNext (927) Will not retry - no player sync or track is within 10 seconds of end.
Looking at lines 194, 249 and 318 Slim::Player::Source::_readNextChunk is reporting the byte position when the end of file or error was reached. Clearly, the legitimate end of the track could not be at three different byte positions, so it seems that something really has gone wrong, in code not changed by this PR, at least the first two times.
So, @philippe44 and @michaelherger can you think of a better way of checking that a retry is desirable than <10 seconds of track remaining? Or is the best option just to increase the amount of time remaining when no retry should be attempted and what should we set the limit to? We're keeping in mind that the use-case this PR is primarily targeting is long-running tracks where tracks are being truncated long before the legitimate end of the track.
So, @philippe44 and @michaelherger can you think of a better way of checking that a retry is desirable than <10 seconds of track remaining? Or is the best option just to increase the amount of time remaining when no retry should be attempted and what should we set the limit to? We're keeping in mind that the use-case this PR is primarily targeting is long-running tracks where tracks are being truncated long before the legitimate end of the track.
TBH: I don't know, really. I've been lucky enough not to have to deal with this part of the code... Maybe one reason why I tried to avoid it is because I feared the side effects any change to this relatively stable system could have.
What I'm wondering is how your use case is different from the issues @philippe44 addressed in the http code changes he did. Aren't you dealing with http streams?
I will try to have a look and wrap my head around the issue in the coming days
Thanks @michaelherger and @philippe44.
@michaelherger we understand your reluctance to alter something that you've not worked on before and that has remained untouched for more than a decade.
@michaelherger do you think you could send a server.log produced while playing back the last 30 seconds of the same track with StreamingController.pm from the base branch please? We'd like to check whether we can see the track being truncated there also.
@philippe44 predicted that you'd ask about his recent changes to this module and said: "@mherger might ask for the opportunity of code factorization as two of changes seems similar, but AFAIC, I have no useful comment to make." We looked through the history and assume your referring to PR #699. If so, then the changes are quite different. PR #699 deals with what happens when a player powers off and then on or restarts. PR #786 is focused on players that, while still connected to the slimserver, have experienced a playback error like an unexpected end of file/stream or a stream/file corruption.
It would be nice to be able to establish why these tracks/streams are failing and fix the route causes. However not all potential causes for failure are in the slimserver code so there will always be a need for slimserver to handle failures more gracefully than it does currently. By setting the amount of time at the end of a track when a retry should be attempted appropriately we're hoping that an improvement can be made for the majority of users but recognise that it is a balancing act.
The changes I was thinking of are #581 and #591, which make http streaming more robust.
Here's a log with standard LMS 8.3: server.log.zip
Thanks!
@michaelherger we'll take a deeper look at #581 and #591 but neither changed StreamingController.pm.
Here's a commentary on your latest log file:
The song duration according to video you sent is 199 seconds.
You used seek to get to 154.051334702259 seconds in the track. So time remaining to play should be ~45 seconds.
Line 21 [22-05-09 22:30:11.5526] Slim::Player::Song::open (370) Adding seekdata { timeOffset => "154.051334702259" }
Then the player announces it is playing at 22:30:13.1861. So track should end at approximately 22:30:58.
Line 46 [22-05-09 22:30:13.1861] Slim::Player::StreamingController::playerTrackStarted (2201) 24:6f:28:d2:d3:64
We see the EoF error after ~15 seconds at 22:30:28:
Line 78 [22-05-09 22:30:28.9611] Slim::Player::Source::_readNextChunk (378) end of file or error on socket, song pos: 13301365
The player announces it is stopped at 22:30:32.
Line 90 [22-05-09 22:30:32.0292] Slim::Player::StreamingController::playerEndOfStream (2287) 24:6f:28:d2:d3:64
Then the next track is announced as playing at 22:30:58.0732, which was the calculated end of the last track.
Line 165 [22-05-09 22:30:58.0732] Slim::Player::StreamingController::playerTrackStarted (2201) 24:6f:28:d2:d3:64
If the logging can be taken literally then 26 seconds of the track was truncated.
So @michaelherger did the track sound truncated? Did you have a 26 second silent gap between tracks?
The track played fine to the last tone, of course. I wouldn’t complain about broken playback with your patch if it had been broken before already 😉.
I didn’t mention the other PRs for their solution (the files they modified), but for the problem they addressed.
As for philippe44‘s remark about me requesting re-factoring: I don’t think it’s related to my remark, but probably to me requesting changing your code, as it might introduce some repetition in itself or within that file? Don’t know really, as I haven’t looked into the code changes yet.
As for philippe44‘s remark about me requesting re-factoring: I don’t think it’s related to my remark, but probably to me requesting changing your code, as it might introduce some repetition in itself or within that file? Don’t know really, as I haven’t looked into the code changes yet.
Yes, I think I saw some opportunity to remove some redundancy (tbc) and I know you (rightfully) pay attention to that.
Players announce playerTrackStarted/PLAYING/STREAMING in the server.log when they start playing.
Players announce playerEndOfStream/STREAMOUT/STOPPED/IDLE/playerReadyToStream in the server.log when they finish streaming, not when they finish playing.
Players buffer stream and play from buffer between end of streaming and end of stream but this is not logged in server.log.
For transcoded streams sub _RetryOrNext in PR now checks that there is more than the duration of the player buffer remaining to stream before retrying.
For non-transcoded streams sub _willRetry in PR now checks that the player has not received all of the stream bytes before retrying.
On test server Spotify tracks do not report streambitrate in Slim::Player::Song so duration of player's buffer cannot be calculated and therefore Spotify tracks will not retry.
Ok, it looks as if this time you nailed it! At least my playback issue 😁. My test now passes. Great!
Now I should probably take some time to actually review the change... One first comment: please rewrite all log statements in the way of
main::DEBUGLOG && $log->is_debug && $log->debug("log that thing");
It's a minor change, but helps lower the load in these busy parts of the code. For blocks where you have many different log statements and nothing else (like you introduced it with the latest commit), you can of course have one condition around them all.
A few early comments, as I'm having a hard time to understand what the actual code changes are. Is the description at the top of this PR still up to date and valid?
- please use local variables to prevent repeated calls of things like
master($self)->songBytes. It makes the code noisy and hard to read. In one place you even removed such a variable ($retry), which caused more unnecessary changes. - avoid changing code which doesn't really change in functionality (like eg. the removal of the aforementioned variable) or improve readability/code quality
- use tabs to indent the code
- use some whitespace (lines) to "group" chunks of code