'Input tensor dimension should be 3d' error when decoding Engine DJ stems
Denon's Engine DJ recently gained support for stem separation. The file the desktop software produces appears to be in STEMS format, but something about it is different. Running the stem2files utility on the file I attached (for the demo I took a short snippet of a CC-BY-NC licensed track by "Timbre", which can be found here) gives the following:
$ stem2files ./22\ e9f9eb56-b8cb-4669-a5a9-ac4235ae1983.stems
Traceback (most recent call last):
  File "/Users/noir/projects/ng/./stems/bin/stem2files", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/Users/noir/projects/ng/stempeg/stempeg/cli.py", line 69, in cli
    stem2files(
  File "/Users/noir/projects/ng/stempeg/stempeg/cli.py", line 110, in stem2files
    write_stems(
  File "/Users/noir/projects/ng/stempeg/stempeg/write.py", line 776, in write_stems
    raise RuntimeError(f"Input tensor dimension should be 3d")
RuntimeError: Input tensor dimension should be 3d
I'm on macOS 14.7.1 with ffmpeg 7.1.
The .stems file: 22 e9f9eb56-b8cb-4669-a5a9-ac4235ae1983.stems.zip
@Noir- this is very interesting! It seems that this isn't the same stems format that Native Instruments uses. In fact, ffprobe returns:
Stream #0:0[0x1](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, 8 channels, fltp, 646 kb/s (default)
So it's a single AAC audio stream with 8 channels, rather than one stream per stem. My guess is that they interleave 4 stereo stems as 4*2 channels. I can't decode it with ffmpeg though, so I guess we need to dig deeper...
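If the 4*2 guess holds and the stream can eventually be decoded into a `(samples, 8)` array, regrouping it into the 3d layout that `write_stems` asks for would be a plain reshape. Below is a rough sketch of that step only. The helper name `split_interleaved_stems` is made up for illustration, and the channel order (channels 0-1 are the first stem, 2-3 the second, and so on) is an assumption that has not been verified against an Engine DJ file.

```python
import numpy as np


def split_interleaved_stems(audio, channels_per_stem=2):
    """Regroup a (samples, n_channels) array into (n_stems, samples, channels_per_stem).

    Hypothetical helper. Assumes each stem occupies a block of consecutive
    channels, e.g. channels 0-1 = stem 0, channels 2-3 = stem 1, ...
    """
    audio = np.asarray(audio)
    if audio.ndim != 2:
        raise ValueError("expected a 2d array of shape (samples, channels)")
    n_samples, n_channels = audio.shape
    if n_channels % channels_per_stem != 0:
        raise ValueError("channel count is not a multiple of channels_per_stem")
    n_stems = n_channels // channels_per_stem
    # Split the channel axis into (stem, channel), then move the stem axis first.
    grouped = audio.reshape(n_samples, n_stems, channels_per_stem)
    return grouped.transpose(1, 0, 2)


if __name__ == "__main__":
    # Dummy data standing in for a decoded 8-channel stream.
    decoded = np.zeros((44100, 8), dtype=np.float32)
    stems = split_interleaved_stems(decoded)
    print(stems.shape)
```

This does not solve the decoding problem itself. It only shows that, once samples are available, the 2d array can be turned into a 3d one without copying stem by stem.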