Output scaling question
Hello, thank you for providing this model open source!
I have a question about the output range of the NAP / IHC values. I was expecting the output values to be in the [0, 1] range as mentioned in section 18.3 of Human and Machine Hearing. I understand the values to represent the "probability of firing" for a neuron.
Testing out the numpy and JAX implementations with the default design parameters and some sample input (noise and sines at varying volumes), I am getting different results though. The output seems to have a floor at a specific value of -0.81157 (consistent across different types of input), and have peaks anywhere between 2 and 10 depending on the input signal.
Here is some example code:
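import jax.numpy as jnp
import matplotlib.pyplot as plt
# `carfac` is the JAX CARFAC module (the import path depends on your install).
sr = 22050  # sample rate in Hz
f0 = 300  # sine frequency in Hz
# 0.1 s of a 300 Hz sine at amplitude 0.1
x = jnp.sin(2 * jnp.pi * f0 * jnp.arange(sr // 10) / sr) * 0.1
params_jax = carfac.CarfacDesignParameters(fs=sr)
hypers_jax, weights_jax, state_jax = carfac.design_and_init_carfac(
    params_jax
)
naps, naps_fibers, state, bm, seg_ohc, seg_agc = carfac.run_segment(
    x, hypers_jax, weights_jax, state_jax
)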
fig, ax = plt.subplots(3, 1, figsize=(8, 6), sharex=True)
ax[0].plot(x)
ax[0].set_title('input')
ax[1].imshow(naps.squeeze().T, aspect='auto')
ax[1].set_title('NAP')
ax[2].plot(naps[:, 50])
ax[2].set_title('NAP channel 50')
ax[2].axhline(-0.81157, color='r')
ax[2].grid()
plt.show()
And the output:
Could you help me understand how to interpret these values? Is there some sensible design parameter setting or some post-scaling that I can apply to constrain the output to [0, 1]?
Thanks again and sorry if I'm missing something obvious, I'm very new to working with these models!
David, great question, thanks for asking, and apologies for my slow response.
Let's see if the book answers it: https://dicklyon.com/hmh/Lyon_Hearing_book_01jan2018.pdf ... I'm not finding it, though it's sort of implicit in the way I use "smoothing filters", that is, lowpass filters with unity gain at DC, in the feedback from NAP output to damping factor.
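To make "unity gain at DC" concrete, here is a minimal sketch of a one-pole smoothing filter. The function name, the time constant, and the single-stage structure are assumptions for illustration only; they are not taken from the CARFAC code, which smooths across both time and channels.

```python
import math


def one_pole_smooth(x, fs, tau=0.01):
    """One-pole lowpass: y[n] = y[n-1] + c * (x[n] - y[n-1]).

    A constant input is passed through unchanged once the filter has
    settled, which is what "unity gain at DC" means.
    tau is a hypothetical time constant in seconds.
    """
    c = 1.0 - math.exp(-1.0 / (tau * fs))
    y = 0.0
    out = []
    for sample in x:
        y += c * (sample - y)
        out.append(y)
    return out


# One second of a constant input of 1.0 at the sample rate used above.
smoothed = one_pole_smooth([1.0] * 22050, fs=22050, tau=0.01)
print(smoothed[0], smoothed[-1])
```

The first output sample is small because the filter starts from zero, and the last one has settled to the input level, so a NAP that averages 1 feeds back a value of 1.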
The result is that when the NAP's "average" (across time and a few channels) output is 1, the "relative undamping" reduces (from 1 in silence) to 0, moving the filters' damping factors from minimum to maximum. This happens at the high end of the compressive region; if the input gets louder still, the NAP's average can increase above 1 without further reducing the filter gains.
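A toy sketch of that mapping, assuming a simple linear interpolation that is clipped at both ends. The function names and the damping values 0.1 and 0.35 are placeholders for illustration, not copied from the CARFAC code, which applies the undamping through its filter coefficients rather than in this form.

```python
def relative_undamping(nap_average):
    """Illustrative mapping: 1 in silence, 0 once the smoothed NAP reaches 1."""
    return min(1.0, max(0.0, 1.0 - nap_average))


def damping_factor(nap_average, min_damping, max_damping):
    """Interpolate from min_damping (silence) to max_damping (NAP average >= 1)."""
    u = relative_undamping(nap_average)
    return min_damping + (1.0 - u) * (max_damping - min_damping)


# Hypothetical damping limits; a NAP average above 1 changes nothing further.
for avg in (0.0, 0.5, 1.0, 2.0):
    print(avg, relative_undamping(avg), damping_factor(avg, 0.1, 0.35))
```

The last two rows come out identical, which is the point made above: past an average of 1 the gain control has nothing left to give, so the output is free to grow.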
Ultimately, the max output will be limited by what the inner hair cells do when driven hard after a silent interval (with unbounded input, the IHC's conductance g approaches 1; that's what you point out in Figure 18.3. From Figure 18.7 it looks like that would give a NAP value of 1, but I don't think that's right). Figure 2 in the v2 paper https://arxiv.org/pdf/2404.17490 has an explicit a2 factor that looks like it would be the max NAP output (run it and see what its value is).
In your channel 50 waveform example, you can see that the NAP average converges to about 1, while the onset transient is quite a bit higher. I think the absolute max can be about 10; you can try some signals and see, or bound it by further analyzing the IHC. Let me know if you'd like help with that; if Figure 18.7 is buggy, we should fix it. Ah, yes, it omits the "output_gain", calculated in CARFAC_DesignIHC (in the Matlab version); it looks like it's trying to get an output of about 2 when the input is an amplitude 10 square wave. This still doesn't clearly explain the peak numbers we're seeing, though. I wrote most of this nearly 15 years ago, and sometimes I feel like kicking the guy who wrote it, as it's not all very clear.
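For "try some signals and see", a small helper like the following might be handy. It is only a sketch: the function name, the onset_samples parameter, and the toy array are made up for illustration, and the toy data is not CARFAC output.

```python
import numpy as np


def nap_range_summary(nap, onset_samples):
    """Summarize a (time, channels) array: overall range vs. settled average.

    onset_samples is a hypothetical cutoff; samples before it are treated
    as the onset transient and left out of the steady-state mean.
    """
    nap = np.asarray(nap, dtype=float)
    steady_mean = float(nap[onset_samples:].mean())
    peak = float(nap.max())
    ratio = peak / steady_mean if steady_mean != 0.0 else float("nan")
    return {
        "min": float(nap.min()),
        "max": peak,
        "steady_mean": steady_mean,
        "peak_to_steady": ratio,
    }


# Toy stand-in: a brief "onset" of 4.0 followed by a settled level of 1.0.
toy = np.ones((100, 3))
toy[:5, :] = 4.0
summary = nap_range_summary(toy, onset_samples=10)
print(summary)
```

With the code from the question, something like `nap_range_summary(np.asarray(naps).squeeze(), onset_samples=...)` should work, assuming the squeezed array is (time, channels) as the plotting code suggests.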
Now in terms of "probability of firing", these NAP outputs are only loosely related; we are fixing that in v3, in which we'll still have a NAP that's 0 in silence, like v1 and v2, but also outputs proportional to firing rate or probability for each of 3 different spontaneous-rate classes (these are nonzero, positive, in silence). In general, these will not be in a range 0 to 1, since they'll be like parameters of Poisson processes: firing probability per time, or firing rate, either for a single neuron or for the whole set of neurons associated with the spont-rate class and the channel; those neuron counts are a set of numbers that can be used to model synaptopathy. We're still playing around with the best way to parameterize and tune v3. Please ask here or contact me personally if you'd like to be an early adopter of v3 and help us get this sorted and documented better (I can send you our start on the v3 doc).
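As a generic illustration of why a Poisson rate is not confined to [0, 1] while a per-sample probability is, here is a short sketch. It uses the standard Poisson relation only; the function name and the example numbers (100 spikes/s, 20 fibers) are hypothetical and say nothing about how v3 will actually scale its outputs.

```python
import math


def firing_probability_per_sample(rate_hz, fs):
    """P(at least one spike in one sample period) for a Poisson process."""
    return 1.0 - math.exp(-rate_hz / fs)


fs = 22050
rate_hz = 100.0   # hypothetical firing rate of one fiber, in spikes per second
n_fibers = 20     # hypothetical number of fibers pooled in one channel

p_single = firing_probability_per_sample(rate_hz, fs)
pooled_rate_hz = rate_hz * n_fibers  # rates of independent Poisson fibers add
p_pooled = firing_probability_per_sample(pooled_rate_hz, fs)
print(rate_hz, p_single, pooled_rate_hz, p_pooled)
```

The rates (100 and 2000 here) are far outside [0, 1], while the per-sample probabilities stay inside it and, for small rates, are close to rate / fs.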
I'm writing this while on disability leave, on opioids while recovering from total knee replacement. So excuse any goofs.
Dick