
Profiling report

Open merbanan opened this issue 11 years ago • 2 comments

I profiled dcadec on a few samples; this is what I get.

Master audio (XLL):

  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 24.42      0.21     0.21    37680     5.57     8.49  interpolate_sub32_fixed
 17.44      0.36     0.15     7536    19.91    32.65  parse_frame_data
 13.95      0.48     0.12                             dcadec_waveout_write
 12.79      0.59     0.11   602880     0.18     0.18  inverse_dct32_fixed
 11.63      0.69     0.10 23145408     0.00     0.00  bits_get_signed_rice
  6.98      0.75     0.06  8790544     0.01     0.01  bits_get_signed
  3.49      0.78     0.03  5968286     0.01     0.01  bits_get
  2.33      0.80     0.02     7536     2.65     2.65  interpolate_lfe_fixed_fir
  2.33      0.82     0.02     7536     2.65    16.28  parse_frame
  2.33      0.84     0.02                             bits_get_unsigned_rice
  1.16      0.85     0.01  1055040     0.01     0.01  bits_get_unsigned_vlc
  1.16      0.86     0.01     7536     1.33    46.45  filter_hd_ma_frame
  0.00      0.86     0.00  1521769     0.00     0.00  bits_get1
  0.00      0.86     0.00   135648     0.00     0.00  xll_map_ch_to_spkr
  0.00      0.86     0.00   120576     0.00     0.00  bits_skip
  0.00      0.86     0.00    90434     0.00     0.00  ta_get_size
  0.00      0.86     0.00    60900     0.00     0.00  bits_seek
  0.00      0.86     0.00    45216     0.00     0.00  xll_get_lsb_width
  0.00      0.86     0.00    37681     0.00     0.00  bits_init
  0.00      0.86     0.00    30144     0.00     0.00  bits_check_crc
  0.00      0.86     0.00    22609     0.00     0.00  bits_skip1
  0.00      0.86     0.00    15073     0.00     0.01  read_frame
  0.00      0.86     0.00    10528     0.00     0.00  bits_get_signed_linear
  0.00      0.86     0.00     7536     0.00     0.00  bits_align1
  0.00      0.86     0.00     7536     0.00    45.12  core_filter
  0.00      0.86     0.00     7536     0.00    32.70  core_parse
  0.00      0.86     0.00     7536     0.00     0.11  exss_parse
  0.00      0.86     0.00     7536     0.00     0.00  reorder_samples
  0.00      0.86     0.00     7536     0.00     0.00  xll_assemble_msbs_lsbs
  0.00      0.86     0.00     7536     0.00     0.00  xll_filter_band_data
  0.00      0.86     0.00     7536     0.00    16.28  xll_parse
  0.00      0.86     0.00       22     0.00     0.00  ta_zalloc_size
  0.00      0.86     0.00       17     0.00     0.00  ta_free
  0.00      0.86     0.00        6     0.00     0.00  ta_alloc_size
  0.00      0.86     0.00        5     0.00     0.00  interpolator_create

Core 96kHz:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 84.91     10.91    10.91   138360    78.86    78.86  interpolate_sub64_float
  4.75     11.52     0.61    27672    22.05    28.27  parse_frame_data
  4.20     12.06     0.54                             dcadec_waveout_write
  2.33     12.36     0.30    27672    10.84    20.08  parse_x96_frame_data
  1.87     12.60     0.24 19098423     0.01     0.01  bits_get_signed_vlc
  0.78     12.70     0.10 26956937     0.00     0.00  bits_get
  0.31     12.74     0.04 19174096     0.00     0.00  bits_get_signed
  0.23     12.77     0.03  4427520     0.01     0.01  bits_get_unsigned_vlc
  0.16     12.79     0.02  8135577     0.00     0.00  bits_get1
  0.16     12.81     0.02    27672     0.72     0.72  reorder_samples
  0.08     12.82     0.01   166032     0.06     0.06  ta_get_size
  0.08     12.83     0.01    27672     0.36   394.72  core_filter
  0.08     12.84     0.01    27672     0.36    48.90  core_parse
  0.04     12.85     0.01                             dcadec_context_filter
  0.04     12.85     0.01                             dcadec_context_free_exss_info
  0.00     12.85     0.00   138360     0.00     0.00  bits_skip
  0.00     12.85     0.00    83016     0.00     0.00  bits_skip1
  0.00     12.85     0.00    55345     0.00     0.07  read_frame
  0.00     12.85     0.00    55344     0.00     0.00  bits_init
  0.00     12.85     0.00    55344     0.00     0.00  bits_seek
  0.00     12.85     0.00    27672     0.00     0.06  alloc_x96_sample_buffer
  0.00     12.85     0.00       16     0.00     0.00  ta_zalloc_size
  0.00     12.85     0.00       12     0.00     0.00  ta_free
  0.00     12.85     0.00        6     0.00     0.00  ta_alloc_size
  0.00     12.85     0.00        5     0.00     0.00  interpolate_sub64_float_init
  0.00     12.85     0.00        5     0.00     0.00  interpolator_create

Core 48kHz:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 69.77      0.60     0.60    28884    20.77    20.77  interpolate_sub32_float
 10.47      0.69     0.09     7221    12.46    21.97  parse_frame_data
  9.30      0.77     0.08                             dcadec_waveout_write
  4.65      0.81     0.04  9415936     0.00     0.00  bits_get_signed
  2.33      0.83     0.02  3399280     0.01     0.01  bits_get
  1.16      0.84     0.01  1010940     0.01     0.01  bits_get1
  1.16      0.85     0.01     7221     1.38     1.38  reorder_samples
  1.16      0.86     0.01                             dcadec_stream_read
  0.00      0.86     0.00   808752     0.00     0.00  bits_get_unsigned_vlc
  0.00      0.86     0.00    36105     0.00     0.00  bits_skip
  0.00      0.86     0.00    36105     0.00     0.00  ta_get_size
  0.00      0.86     0.00    21663     0.00     0.00  bits_skip1
  0.00      0.86     0.00    14443     0.00     0.01  read_frame
  0.00      0.86     0.00    14442     0.00     0.00  bits_init
  0.00      0.86     0.00     7221     0.00     0.00  bits_seek
  0.00      0.86     0.00     7221     0.00    83.10  core_filter
  0.00      0.86     0.00     7221     0.00    22.13  core_parse
  0.00      0.86     0.00       12     0.00     0.00  ta_zalloc_size
  0.00      0.86     0.00        5     0.00     0.00  ta_alloc_size
  0.00      0.86     0.00        4     0.00     0.00  interpolate_sub32_float_init
  0.00      0.86     0.00        4     0.00     0.00  interpolator_create
  0.00      0.86     0.00        4     0.00     0.00  ta_free

Not surprisingly, the transform takes most of the time, and frame parsing comes next.

merbanan commented on Mar 28 '15 18:03

0.00 0.86 0.00 7536 0.00 0.00 xll_filter_band_data

Since we are already sharing results: on my XLL sample, xll_filter_band_data takes a much larger share of the time.

64-bit build:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 19.01     27.24    27.24 2918320721     0.01     0.01  bits_get_signed_rice
 16.28     50.57    23.33  4754970     4.91     8.09  interpolate_sub32_fixed
 13.94     70.54    19.97   950994    21.00    21.00  xll_filter_band_data
 10.58     85.70    15.16 76079520     0.20     0.20  idct_perform32_fixed
 10.03    100.07    14.37                             _mcount_private
  8.45    112.18    12.11                             parse_frame_data
  6.30    121.21     9.03                             __fentry__
  4.54    127.72     6.51   950994     6.85     6.85  dcadec_waveout_write
  2.21    130.88     3.16 988609488     0.00     0.00  bits_get_signed
  2.20    134.03     3.15   950994     3.31     3.31  interpolate_lfe_fixed_fir
  1.95    136.82     2.79                             filter_hd_ma_frame
  1.59    139.10     2.28                             parse_frame
  1.37    141.06     1.96 770208131     0.00     0.00  bits_get

32-bit build:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 18.70     40.89    40.89  4754970     8.60    17.00  interpolate_sub32_fixed
 18.26     80.82    39.93 76079520     0.52     0.52  idct_perform32_fixed
 17.90    119.97    39.15 2918320721     0.01     0.01  bits_get_signed_rice
 11.23    144.52    24.55   950994    25.82    25.82  xll_filter_band_data
  9.68    165.70    21.18                             _mcount_private
  8.97    185.32    19.62                             parse_frame_data
  3.48    192.94     7.62   950994     8.01     8.01  dcadec_waveout_write
  3.05    199.60     6.66 988609488     0.01     0.01  bits_get_signed
  2.58    205.25     5.65   950994     5.94     5.94  interpolate_lfe_fixed_fir
  1.99    209.60     4.35 770208131     0.01     0.01  bits_get
  1.36    212.57     2.97                             parse_frame
  1.21    215.21     2.64                             filter_hd_ma_frame

(only functions with > 1% of time)

EDIT: Built with GCC 4.9.2 and mingw-w64 3.3.0. dcadec was compiled with unmodified settings, apart from adding -pg of course. EDIT2: To be honest, I see no point in this issue. I'm sure foo89 can run a profiler on his own.

DTS Core Audio: 5.1 ch, 48 kHz, 24 bit, 1536 kbps

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 40.78     28.16    28.16  3121245     9.02    11.30  interpolate_sub32_float
 16.03     39.23    11.07                             floor
 10.28     46.33     7.10 49939920     0.14     0.14  idct_perform32_float
  9.63     52.98     6.65                             parse_frame_data
  6.13     57.21     4.23   624249     6.78     6.78  dcadec_waveout_write
  5.36     60.91     3.70                             _mcount_private
  3.20     63.12     2.21   624249     3.54     3.54  interpolate_lfe_float_iir
  3.11     65.27     2.15 690138024     0.00     0.00  bits_get_signed
  2.94     67.30     2.03                             __fentry__
  1.59     68.40     1.10 443798127     0.00     0.00  bits_get

kasper93 commented on Mar 28 '15 22:03

Aren't the compiler vendor and settings extremely important here? The project is 100% C code, so maybe you should post the compiler version and optimization settings too.

Also interesting to finally see a case where the restrict keyword actually does something.

ghost commented on Mar 28 '15 22:03