upmpdcli-docker icon indicating copy to clipboard operation
upmpdcli-docker copied to clipboard

Very large playlist timeout

Open superceu opened this issue 4 months ago • 49 comments

Hello, I've configured upmpdcli-docker as a media server for Navidrome. I would like to be able to play from my renderer (a Wiim mini) the Navidrome favorite songs playlist, a huge playlist (~1500 songs). The problem is that ~200 songs is the maximum number that seems to work in a reasonable time: CMDTALK: subsonic-app.py: browse executed (element [pl]) in [14.055]

The big playlist goes in timeout:

CMDTALK: subsonic-app.py: browse: args: --{'cmdtalk:proc': 'browse', 'count': '100', 'offset': '0', 'flag': 'children', 'objid': '0$subsonic$/eyJuIjogInRhZyIsICJ2IjogInBsIiwgInJuZCI6ICJ3cnFoankifQ==/eyJuIjogInBsIiwgInYiOiAiamJSdnhhZDhUSkJNN21CTjloZUlzciIsICJybmQiOiAid3FiZnNwIn0='}--
CMDTALK: subsonic-app.py: browse: item_identifier name: --pl-- value: --jbRvxad8TJBM7mBN9heIsr--
CMDTALK: subsonic-app.py: browse: should serve element: --pl-- [jbRvxad8TJBM7mBN9heIsr]
CMDTALK: subsonic-app.py: browse: found elem handler for: --ElementType.PLAYLIST--
:3:../src/mediaserver/cdplugins/cmdtalk.cpp:140::CmdTalk:readDataElement: fatal timeout (60 S)

I enabled SUBSONIC_ENABLE_IMAGE_CACHING but it didn't help.

Since this doesn't seem to be a CPU issue (hardware is old, but CPU is far from 100%), what's the issue here? Is there something that I can do?

superceu avatar Oct 11 '25 19:10 superceu

Hello, I would try to suggest a few things to try.

  • if you have postman and you are confortable using it, maybe try to connect to the navidrome instance and invoke the playlist endpoint (https://www.subsonic.org/pages/api.jsp#getPlaylist) and see how long it takes to get the response back.

  • I just analyzed the code, and I verified that there are no particularly expensive processing steps involved for showing the songs in a playlist. The most "expensive" thing it to get the album art. If spotify integration is active in your navidrome instance, can you repeat the test without it?

Also enabling verbose logging on the container might help. This can be done by setting

subsonicverboselogging = 1

But there is no dedicated variable for this in the docker image, so you would have to create a text file somewhere and mount it as /user/config/upmpdcli-additional.txt:ro

So if your file is named upmpdcli-additional.txt and is located where the docker compose is, place this line in the volumes section:

  - ./upmpdcli-additional.txt:/user/config/upmpdcli-additional.txt:ro

And then recreate the container. We can then understand a little bit more of what is happening.

GioF71 avatar Oct 12 '25 08:10 GioF71

Ah and image caching might make things even worse in this scenario where lots of files are involved. Image caching might be helpful if the subsonic instance is remote and considerably slower. The plugin will try and see if cover art files are available locally before downloading them from the actual server.

GioF71 avatar Oct 12 '25 08:10 GioF71

I used this request: http://navidrome_local_ip:4533/rest/getPlaylist?id=jbRvxad8TJBM7mBN9heIsr&u=admin&p=REDACTED&v=1.16.1&c=TestPostman

First response: 12s Next ones: 1.6s

superceu avatar Oct 12 '25 09:10 superceu

Ok thank you, so it"s on the plugin side. I will investigate and let you know!

GioF71 avatar Oct 12 '25 09:10 GioF71

Can you post the full log with verbose logging enabled?

GioF71 avatar Oct 12 '25 09:10 GioF71

Can you post the full log with verbose logging enabled?

Startup: upmpdcli_00_startup.log

From the Wiim app: Browse > Media server upmpdcli-subsonic > Subsonic > Playlists > playlist_name upmpdcli_01_playlist.log

superceu avatar Oct 12 '25 10:10 superceu

Thank you, this is very helpful. There are network issues but we need to understand why. What is running at local ip 192.168.10.151? Maybe a samba server? There are these issues on the log you provided:

Oct 12 12:11:00 debnas systemd-navidrome[11041]: time="2025-10-12T10:11:00Z" level=error msg="Error sending response to client" endpoint=/rest/getCoverArt error="write tcp 10.88.0.6:4533->192.168.10.151:57241: write: broken pipe" payload="{\"subsonic-response\":{\"status\":\"failed\",\"version\":\"1.16.1\",\"type\":\"navidrome\",\"serverVersion\":\"0.58.0 (9dbe0c18)\",\"openSubsonic\":true,\"error\":{\"code\":0,\"message\":\"Internal Server Error: write tcp 10.88.0.6:4533-\\u003e192.168.10.151:57241: write: broken pipe\"}}}" requestId=49286c487594/mz8LTBY0Mn-000222

If so, would it be possible to run navidrome where the files are, so we can rule out network issues loading files from a NAS?

I've done some testing, and while I did not create a 1500 tracks playlist yet, my 201 tracks took only 2.2 seconds to load.

GioF71 avatar Oct 12 '25 10:10 GioF71

Under these conditions (slow server), image caching might actually help. After a few iterations, the upmpdcli server will be able to store all the relevant images and avoid to invoke getCoverArt, which in turns might trigger loading files from the samba server. But in order to effectively have image caching enabled, you must also provide a path for the variable WEBSERVER_DOCUMENT_ROOT.

GioF71 avatar Oct 12 '25 10:10 GioF71

Maybe also check navidrome logs for more information about the issue.

GioF71 avatar Oct 12 '25 10:10 GioF71

Thank you, this is very helpful. There are network issues but we need to understand why. What is running at local ip 192.168.10.151? Maybe a samba server? There are these issues on the log you provided:

192.168.10.151 is my phone, where the wiim app ("browse > etc") is running.

If so, would it be possible to run navidrome where the files are, so we can rule out network issues loading files from a NAS?

Navidrome and upmpdcli are running in containers in the same VM. The VM is running on the NAS. Library directory is mounted in the VM using NFS from the NAS. So, NFS traffic is local inside the NAS.

The log was collected with "journalctl -f | tee -a file.log". Navidrome log messages are already included.

superceu avatar Oct 12 '25 11:10 superceu

Ok right... but then, why these errors:

"Internal Server Error: write tcp 10.88.0.6:4533-\u003e192.168.10.151:57239"

There seem to be issues serving images to your phone apparently. Can you try another control point? Maybe on a wired network? Maybe linn kazoo, upplay...

GioF71 avatar Oct 12 '25 11:10 GioF71

There seem to be issues serving images to your phone apparently. Can you try another control point? Maybe on a wired network? Maybe linn kazoo, upplay...

This is from upplay:

upmpdcli_02_upplay.log

superceu avatar Oct 12 '25 11:10 superceu

Similar errors from navidrome:

Oct 12 13:09:43 debnas systemd-navidrome[11041]: time="2025-10-12T11:09:43Z" level=error msg="SQL: `UPDATE player SET client = {:p0}, id = {:p1}, ip = {:p2}, last_seen = {:p3}, max_bit_rate = {:p4}, name = {:p5}, report_real_path = {:p6}, scrobble_enabled = {:p7}, transcoding_id = {:p8}, user_agent = {:p9}, user_id = {:p10} WHERE id = {:p11}`" args="map[p0:NavidromeUI p1:7kWgCHV8P5OS3Rl31b1asU p10:tSKLtRDNvhqfVprpND5TP2 p11:7kWgCHV8P5OS3Rl31b1asU p2:192.168.10.100 p3:2025-10-12 11:09:39.044127335 +0000 UTC m=+4422.267557573 p4:0 p5:NavidromeUI [Firefox/Windows] p6:false p7:true p8: p9:Firefox/Windows]" elapsedTime=4.45s error="context deadline exceeded" requestId=49286c487594/mz8LTBY0Mn-000391 rowsAffected=0
Oct 12 13:09:43 debnas systemd-navidrome[11041]: time="2025-10-12T11:09:43Z" level=warning msg="Could not save player" client=NavidromeUI error="context deadline exceeded" id=7kWgCHV8P5OS3Rl31b1asU requestId=49286c487594/mz8LTBY0Mn-000391 type=Firefox/Windows username=admin
Oct 12 13:09:43 debnas systemd-navidrome[11041]: time="2025-10-12T11:09:43Z" level=error msg="Could not register player" client=NavidromeUI error="context deadline exceeded" requestId=49286c487594/mz8LTBY0Mn-000391 username=admin

is it possible to avoid NFS somehow?

GioF71 avatar Oct 12 '25 11:10 GioF71

Can this issue be related?

GioF71 avatar Oct 12 '25 11:10 GioF71

is it possible to avoid NFS somehow?

Not easily. Maybe I could try SMB.

Those errors seem unrelated to me. Navidrome DB is in a local filesystem, inside the VM. NFS isn't used for that.

In the meantime, this is the same operation with upplay and image caching activated (with many cache hits):

upmpdcli_03_upplay_imagecache_on.log

superceu avatar Oct 12 '25 11:10 superceu

I made a few testing and in this scenario, using caching is actually more expensive, unless the server is very slow. In my case the navidrome server runs on an old celeron j1900 with local files and there is a reverse proxy involved. With image caching enabled, the first run takes 20 seconds for 200 entries. Then it becomes faster, but still: the first run is expensive. In the first run all the images are saved in the upmpdcli webdocumentroot directory. But with these numbers, a timeout when the number of tracks grows to 1200 is very likely. Without caching, the result is just a bit more than 2 seconds for those 200 tracks, and that indicates me that in my scenario, it's not even worth caching.

Are you using spotify and last.fm integrations in navidrome?

GioF71 avatar Oct 12 '25 13:10 GioF71

If you can send me an email address here or some form of direct contact (or discord maybe), I can send you the address and credentials of a lightweight music server, so that you can connect upmpdcli to it and see if the error persists.

GioF71 avatar Oct 12 '25 13:10 GioF71

I launched navidrome and upmpdcli containers on my computer, with library too on SSD, and created a dummy 1600 songs "Test1" playlist. I also recreated the navidrome database.

getPlaylist responds always in 0.2 sec. Now Navidrome is blazingly fast.

With upplay (servers > workdeb-subsonic > Subsonic > playlists > Test1): CMDTALK: subsonic-app.py: browse executed (element [pl]) in [9.188]

Wiim app works too (but limited to 1000 songs max) in a similar time.

Then I configured upmpdcli on the NAS to use Navidrome on my PC. With upplay (servers > upmpdcli-subsonic > Subsonic > playlists > Test1):

  • first run: with lots of cache miss ("build_cover_art_url for"), timeout after 60 sec.
  • second run: cache hits, but timeout after 60 sec.

In the logs, I see a very similar number (~1000) of "build_cover_art_url" calls both when the NAS's upmpdcli uses the NAS's Navidrome and when it uses the Navidrome instance on my PC.

superceu avatar Oct 12 '25 14:10 superceu

So it seems the issue is the I/O against the NAS... isn't it? This value

CMDTALK: subsonic-app.py: browse executed (element [pl]) in [9.188]

is pretty much aligned with my 2.1 seconds for 201 tracks in the playlist

GioF71 avatar Oct 12 '25 14:10 GioF71

about this aspect:

Then I configured upmpdcli on the NAS to use Navidrome on my PC.
With upplay (servers > upmpdcli-subsonic > Subsonic > playlists > Test1):

are the upmpdcli volumes on NFS shares as well?

GioF71 avatar Oct 12 '25 14:10 GioF71

are the upmpdcli volumes on NFS shares as well?

No, upmpdcli volumes are inside the VM.

On the NAS upmpdcli I disabled subsonicverboselogging and image caching, to reduce I/O. I can barely see any disk activity with htop. What I see in the VM is one core (of 3) at 100%.

If it were a NAS disk I/O problem, when I use the NAS's upmpdcli with the Navidrome instance on my PC, it should work, shouldn't it?

superceu avatar Oct 12 '25 14:10 superceu

I also tried the other way around: upmpdcli on my PC -> NAS navidrome. This worked in ~9 seconds, the same time as when upmpdcli and Navidrome were both on my PC.

superceu avatar Oct 12 '25 14:10 superceu

This seems to be a CPU-related issue, rather than a disk issue. It's obviously not at all certain that this will be something easy (or even possible) to optimize. If there's nothing that can be done, so be it! I'll still continue to limit Navidrome playlists to 200 songs as workaround.

Thank you for your support!

superceu avatar Oct 12 '25 14:10 superceu

Hello again, glad it's ok now. I never ran the subsonic plugin on anything stronger than the celeron j1900, and never experienced similar issues. When it was slow in the past, it was because of some excessive calls to the api, but the current version seems to be working decently well. Of course there are limitations (e.g. a playlist with lots of tracks as I learned today, because it's not something I usually do). Hope you like the plugin. Keep opening issues for improvement suggestions, etc. Lastly, from you logs it looks like we are compatriots, so ciao, alla prossima :-)

GioF71 avatar Oct 12 '25 17:10 GioF71

Hello @superceu, thanks to your issue, I took some time to have a look at a few portions of the plugin code. Regardless of the issues you were experiencing, seeing the plugin take 2.1 seconds to load just 200 items did not sound very good to me. So, the usual suspect was the call to build_cover_art_url. Just by removing it, the elapsed time dropped to less than 0.100 sec after the api call to load the playlist. To cut the long story short, in order to build the url for the cover image, there was a call to a "connect()" method which would of course trigger a connection to the subsonic server, for each item to be processed. That was because the py-sonic library would make some information (salt, encoded password) available only after a successful connection. But I need those information to prepare the image urls (and ultimately, also the streaming urls). So I had to bring some code from py-sonic to subsonic-connector, not a lot I have to say, but now I don't need to call the connect() method just to prepare a cover art url. The plugin is now a lot faster. Image caching actually slows things down but can be useful when the server is actually remote (e.g. on the internet) and/or actually significantly slower than if run locally. Ah, and also, when loading a playlist, the entries have bitdepth, samplerate and channel count correctly set (like when showing album tracks). This was a minor issue and would not hinder bit-perfect playback, but the control point would not show the correct metadata. If you want to try it, use the edge tag of the image (link here) of the image if you are not using it already. Let me know if the new version works for you as well. Cheers!

GioF71 avatar Oct 14 '25 16:10 GioF71

Great work!

About the big playlist (1445 songs) loading time: 1st time right after container startup: 22 sec From the 2nd time onwards: 9 sec!!! Impressive!

You're right about image caching, it slows things down, and it causes the loading to time out even when the images are already cached.

One thing still doesn't quite sound right to me: when I click on 'Albums > Favorite Albums' it takes about 11 seconds to show just 26 albums, which seems like a lot compared to the (upgraded) times for the playlist.

Great work again, and thank you for the top notch support!

superceu avatar Oct 14 '25 21:10 superceu

Hello, 11 seconds is quite a lot for 26 albums. But do you have "only" 26 albums in favorites, or it is just the content of a page with a "next" button? Because one aspect is that at the api level, getStarred always returns ALL the favorite albums. Can you post the logs?

GioF71 avatar Oct 14 '25 21:10 GioF71

Maybe also try to execute the getStarred2 endpoint here. https://subsonic.org/pages/api.jsp#getStarred2

GioF71 avatar Oct 14 '25 21:10 GioF71

Sorry I was wrong, that part does not use getStarred2, but getAlbumList. So there is pagination after all, I'll have to investigate. Do the other views (recently added, newest, etc) behave similarly?

GioF71 avatar Oct 15 '25 08:10 GioF71

Sorry for spamming, but from a quick look at the code, looks like caching to the local database of some information (like associating an album to a genre, an album to an artist, all done for showing covers quickly) might be somehow slow on your setup.

Also, the thing you said about image caching triggering timeout is not expected, it is just barely noticeable on my setup.

Does this happen also when you run upmpdcli on your computer, instead than on your NAS/VM?

GioF71 avatar Oct 15 '25 08:10 GioF71