Profiling report
I profiled a few samples, and this is what I get:
Master audio (XLL):
% cumulative self self total
time seconds seconds calls us/call us/call name
24.42 0.21 0.21 37680 5.57 8.49 interpolate_sub32_fixed
17.44 0.36 0.15 7536 19.91 32.65 parse_frame_data
13.95 0.48 0.12 dcadec_waveout_write
12.79 0.59 0.11 602880 0.18 0.18 inverse_dct32_fixed
11.63 0.69 0.10 23145408 0.00 0.00 bits_get_signed_rice
6.98 0.75 0.06 8790544 0.01 0.01 bits_get_signed
3.49 0.78 0.03 5968286 0.01 0.01 bits_get
2.33 0.80 0.02 7536 2.65 2.65 interpolate_lfe_fixed_fir
2.33 0.82 0.02 7536 2.65 16.28 parse_frame
2.33 0.84 0.02 bits_get_unsigned_rice
1.16 0.85 0.01 1055040 0.01 0.01 bits_get_unsigned_vlc
1.16 0.86 0.01 7536 1.33 46.45 filter_hd_ma_frame
0.00 0.86 0.00 1521769 0.00 0.00 bits_get1
0.00 0.86 0.00 135648 0.00 0.00 xll_map_ch_to_spkr
0.00 0.86 0.00 120576 0.00 0.00 bits_skip
0.00 0.86 0.00 90434 0.00 0.00 ta_get_size
0.00 0.86 0.00 60900 0.00 0.00 bits_seek
0.00 0.86 0.00 45216 0.00 0.00 xll_get_lsb_width
0.00 0.86 0.00 37681 0.00 0.00 bits_init
0.00 0.86 0.00 30144 0.00 0.00 bits_check_crc
0.00 0.86 0.00 22609 0.00 0.00 bits_skip1
0.00 0.86 0.00 15073 0.00 0.01 read_frame
0.00 0.86 0.00 10528 0.00 0.00 bits_get_signed_linear
0.00 0.86 0.00 7536 0.00 0.00 bits_align1
0.00 0.86 0.00 7536 0.00 45.12 core_filter
0.00 0.86 0.00 7536 0.00 32.70 core_parse
0.00 0.86 0.00 7536 0.00 0.11 exss_parse
0.00 0.86 0.00 7536 0.00 0.00 reorder_samples
0.00 0.86 0.00 7536 0.00 0.00 xll_assemble_msbs_lsbs
0.00 0.86 0.00 7536 0.00 0.00 xll_filter_band_data
0.00 0.86 0.00 7536 0.00 16.28 xll_parse
0.00 0.86 0.00 22 0.00 0.00 ta_zalloc_size
0.00 0.86 0.00 17 0.00 0.00 ta_free
0.00 0.86 0.00 6 0.00 0.00 ta_alloc_size
0.00 0.86 0.00 5 0.00 0.00 interpolator_create
Core 96kHz:
% cumulative self self total
time seconds seconds calls us/call us/call name
84.91 10.91 10.91 138360 78.86 78.86 interpolate_sub64_float
4.75 11.52 0.61 27672 22.05 28.27 parse_frame_data
4.20 12.06 0.54 dcadec_waveout_write
2.33 12.36 0.30 27672 10.84 20.08 parse_x96_frame_data
1.87 12.60 0.24 19098423 0.01 0.01 bits_get_signed_vlc
0.78 12.70 0.10 26956937 0.00 0.00 bits_get
0.31 12.74 0.04 19174096 0.00 0.00 bits_get_signed
0.23 12.77 0.03 4427520 0.01 0.01 bits_get_unsigned_vlc
0.16 12.79 0.02 8135577 0.00 0.00 bits_get1
0.16 12.81 0.02 27672 0.72 0.72 reorder_samples
0.08 12.82 0.01 166032 0.06 0.06 ta_get_size
0.08 12.83 0.01 27672 0.36 394.72 core_filter
0.08 12.84 0.01 27672 0.36 48.90 core_parse
0.04 12.85 0.01 dcadec_context_filter
0.04 12.85 0.01 dcadec_context_free_exss_info
0.00 12.85 0.00 138360 0.00 0.00 bits_skip
0.00 12.85 0.00 83016 0.00 0.00 bits_skip1
0.00 12.85 0.00 55345 0.00 0.07 read_frame
0.00 12.85 0.00 55344 0.00 0.00 bits_init
0.00 12.85 0.00 55344 0.00 0.00 bits_seek
0.00 12.85 0.00 27672 0.00 0.06 alloc_x96_sample_buffer
0.00 12.85 0.00 16 0.00 0.00 ta_zalloc_size
0.00 12.85 0.00 12 0.00 0.00 ta_free
0.00 12.85 0.00 6 0.00 0.00 ta_alloc_size
0.00 12.85 0.00 5 0.00 0.00 interpolate_sub64_float_init
0.00 12.85 0.00 5 0.00 0.00 interpolator_create
Core 48kHz:
% cumulative self self total
time seconds seconds calls us/call us/call name
69.77 0.60 0.60 28884 20.77 20.77 interpolate_sub32_float
10.47 0.69 0.09 7221 12.46 21.97 parse_frame_data
9.30 0.77 0.08 dcadec_waveout_write
4.65 0.81 0.04 9415936 0.00 0.00 bits_get_signed
2.33 0.83 0.02 3399280 0.01 0.01 bits_get
1.16 0.84 0.01 1010940 0.01 0.01 bits_get1
1.16 0.85 0.01 7221 1.38 1.38 reorder_samples
1.16 0.86 0.01 dcadec_stream_read
0.00 0.86 0.00 808752 0.00 0.00 bits_get_unsigned_vlc
0.00 0.86 0.00 36105 0.00 0.00 bits_skip
0.00 0.86 0.00 36105 0.00 0.00 ta_get_size
0.00 0.86 0.00 21663 0.00 0.00 bits_skip1
0.00 0.86 0.00 14443 0.00 0.01 read_frame
0.00 0.86 0.00 14442 0.00 0.00 bits_init
0.00 0.86 0.00 7221 0.00 0.00 bits_seek
0.00 0.86 0.00 7221 0.00 83.10 core_filter
0.00 0.86 0.00 7221 0.00 22.13 core_parse
0.00 0.86 0.00 12 0.00 0.00 ta_zalloc_size
0.00 0.86 0.00 5 0.00 0.00 ta_alloc_size
0.00 0.86 0.00 4 0.00 0.00 interpolate_sub32_float_init
0.00 0.86 0.00 4 0.00 0.00 interpolator_create
0.00 0.86 0.00 4 0.00 0.00 ta_free
Not surprisingly, the transform takes most of the time, and frame parsing comes next.
Regarding this line in the XLL profile:
0.00 0.86 0.00 7536 0.00 0.00 xll_filter_band_data
If we're already sharing... On my XLL sample it spends more time there:
64-bit build:
% cumulative self self total
time seconds seconds calls us/call us/call name
19.01 27.24 27.24 2918320721 0.01 0.01 bits_get_signed_rice
16.28 50.57 23.33 4754970 4.91 8.09 interpolate_sub32_fixed
13.94 70.54 19.97 950994 21.00 21.00 xll_filter_band_data
10.58 85.70 15.16 76079520 0.20 0.20 idct_perform32_fixed
10.03 100.07 14.37 _mcount_private
8.45 112.18 12.11 parse_frame_data
6.30 121.21 9.03 __fentry__
4.54 127.72 6.51 950994 6.85 6.85 dcadec_waveout_write
2.21 130.88 3.16 988609488 0.00 0.00 bits_get_signed
2.20 134.03 3.15 950994 3.31 3.31 interpolate_lfe_fixed_fir
1.95 136.82 2.79 filter_hd_ma_frame
1.59 139.10 2.28 parse_frame
1.37 141.06 1.96 770208131 0.00 0.00 bits_get
32-bit build:
% cumulative self self total
time seconds seconds calls us/call us/call name
18.70 40.89 40.89 4754970 8.60 17.00 interpolate_sub32_fixed
18.26 80.82 39.93 76079520 0.52 0.52 idct_perform32_fixed
17.90 119.97 39.15 2918320721 0.01 0.01 bits_get_signed_rice
11.23 144.52 24.55 950994 25.82 25.82 xll_filter_band_data
9.68 165.70 21.18 _mcount_private
8.97 185.32 19.62 parse_frame_data
3.48 192.94 7.62 950994 8.01 8.01 dcadec_waveout_write
3.05 199.60 6.66 988609488 0.01 0.01 bits_get_signed
2.58 205.25 5.65 950994 5.94 5.94 interpolate_lfe_fixed_fir
1.99 209.60 4.35 770208131 0.01 0.01 bits_get
1.36 212.57 2.97 parse_frame
1.21 215.21 2.64 filter_hd_ma_frame
(only functions with more than 1% of the total time are shown)
EDIT: GCC 4.9.2, mingw-w64 3.3.0. dcadec was compiled with unmodified settings apart from -pg, of course.
EDIT2: To be honest, though, I see no point in this issue. I'm sure foo89 can use a profiler on his own.
DTS Core Audio: 5.1 ch, 48 kHz, 24 bit, 1536 kbps
% cumulative self self total
time seconds seconds calls us/call us/call name
40.78 28.16 28.16 3121245 9.02 11.30 interpolate_sub32_float
16.03 39.23 11.07 floor
10.28 46.33 7.10 49939920 0.14 0.14 idct_perform32_float
9.63 52.98 6.65 parse_frame_data
6.13 57.21 4.23 624249 6.78 6.78 dcadec_waveout_write
5.36 60.91 3.70 _mcount_private
3.20 63.12 2.21 624249 3.54 3.54 interpolate_lfe_float_iir
3.11 65.27 2.15 690138024 0.00 0.00 bits_get_signed
2.94 67.30 2.03 __fentry__
1.59 68.40 1.10 443798127 0.00 0.00 bits_get
Aren't the compiler vendor and settings extremely important here? The project is 100% C code, so maybe you should post the compiler version and optimization settings too.
It's also interesting to finally see a case where the restrict keyword actually does something.