'Input tensor dimension should be 3d' error when decoding Engine DJ stems

Open Noir- opened this issue 1 year ago • 1 comments

Denon's Engine DJ recently gained support for stem separation. The file the desktop software produces appears to be in STEMS format, but something about it is different. Running the stem2files utility on the file I attached (for the demo I took a short snippet of a CC-BY-NC licensed track by "Timbre", which can be found here) produces the following:

$ stem2files ./22\ e9f9eb56-b8cb-4669-a5a9-ac4235ae1983.stems 
Traceback (most recent call last):
  File "/Users/noir/projects/ng/./stems/bin/stem2files", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/Users/noir/projects/ng/stempeg/stempeg/cli.py", line 69, in cli
    stem2files(
  File "/Users/noir/projects/ng/stempeg/stempeg/cli.py", line 110, in stem2files
    write_stems(
  File "/Users/noir/projects/ng/stempeg/stempeg/write.py", line 776, in write_stems
    raise RuntimeError(f"Input tensor dimension should be 3d")
RuntimeError: Input tensor dimension should be 3d

I'm on macOS 14.7.1 with ffmpeg 7.1.

The .stems file: 22 e9f9eb56-b8cb-4669-a5a9-ac4235ae1983.stems.zip

Noir- avatar Dec 08 '24 19:12 Noir-

@Noir- this is very interesting! It seems that this isn't the same stems format that Native Instruments uses. In fact, ffprobe returns

Stream #0:0[0x1](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, 8 channels, fltp, 646 kb/s (default)

So it's a single AAC audio stream with 8 channels, which I guess means 4 stereo pairs (4*2 channels) interleaved. I can't decode it with ffmpeg though, so I guess we need to dig deeper...
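
If the 4*2 guess holds, then once the stream can be decoded into a `(samples, 8)` array, splitting it back into stems would be a plain reshape. Below is a rough sketch of that step only. It assumes the channels are ordered as consecutive stereo pairs (stem 0 left/right, stem 1 left/right, and so on), which is unconfirmed for Engine DJ files, and `split_interleaved_stems` is a hypothetical helper, not part of stempeg.

```python
import numpy as np


def split_interleaved_stems(audio, channels_per_stem=2):
    """Split a (samples, total_channels) array into (stems, samples, channels).

    Hypothetical helper. Assumes each stem occupies a run of
    `channels_per_stem` consecutive channels, which is only a guess
    for Engine DJ files.
    """
    audio = np.asarray(audio)
    if audio.ndim != 2:
        raise ValueError(f"expected a 2d (samples, channels) array, got {audio.ndim}d")
    samples, total_channels = audio.shape
    if total_channels % channels_per_stem != 0:
        raise ValueError(
            f"{total_channels} channels cannot be split into groups of {channels_per_stem}"
        )
    n_stems = total_channels // channels_per_stem
    # (samples, stems, channels) -> (stems, samples, channels)
    return audio.reshape(samples, n_stems, channels_per_stem).transpose(1, 0, 2)


# Dummy data standing in for a decoded 8-channel stream: 3 samples, 8 channels.
decoded = np.arange(3 * 8, dtype=np.float32).reshape(3, 8)
stems = split_interleaved_stems(decoded)
print(stems.shape)
```

The result is 3-dimensional, which is the dimensionality the `write_stems` check in the traceback above asks for. Whether the channel order matches what Engine DJ actually writes would still need to be verified by ear on a file that decodes.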

faroit avatar Dec 09 '24 10:12 faroit