Vectorize CDF queries of custom models
In my use case, I want to compress a large amount of data with a custom entropy model. Unfortunately, this takes quite some time because the CDF callback is invoked separately for each compressed symbol. I can't simply use the SciPy model adapter because I'm using a mixture distribution, which SciPy does not implement.
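For context, the CDF of a mixture is itself easy to vectorize even though SciPy has no ready-made distribution for it. The sketch below assumes a Gaussian mixture with per-symbol weights, means, and standard deviations stored as `(n, k)` arrays; `mixture_cdf` and `mixture_ppf` are hypothetical helpers, not part of SciPy or constriction. Since a mixture has no closed-form inverse CDF, the inverse is computed by a bisection that runs for all points in lockstep, assuming every quantile lies in `[lo, hi]`.

```python
import numpy as np
from scipy import stats

def mixture_cdf(x, weights, mus, sigmas):
    """Vectorized CDF of a Gaussian mixture.

    x: shape (n,); weights, mus, sigmas: shape (n, k), one mixture per point.
    """
    x = np.asarray(x, dtype=float)[:, None]  # (n, 1) broadcasts against (n, k)
    return np.sum(weights * stats.norm.cdf(x, loc=mus, scale=sigmas), axis=1)

def mixture_ppf(q, weights, mus, sigmas, lo=-50.0, hi=50.0, iters=60):
    """Vectorized inverse CDF by bisection (mixtures have no closed form).

    Assumes each quantile lies in [lo, hi]; all n searches run in lockstep.
    """
    q = np.asarray(q, dtype=float)
    lower = np.full_like(q, lo)
    upper = np.full_like(q, hi)
    for _ in range(iters):
        mid = 0.5 * (lower + upper)
        below = mixture_cdf(mid, weights, mus, sigmas) < q
        lower = np.where(below, mid, lower)  # root lies to the right of mid
        upper = np.where(below, upper, mid)  # root lies at or left of mid
    return 0.5 * (lower + upper)

# Example: 5 positions, each with the same two-component mixture.
n = 5
weights = np.tile([0.3, 0.7], (n, 1))
mus = np.tile([-2.0, 1.0], (n, 1))
sigmas = np.ones((n, 2))
x = np.linspace(-3.0, 3.0, n)
p = mixture_cdf(x, weights, mus, sigmas)       # one vectorized call for all points
x_back = mixture_ppf(p, weights, mus, sigmas)  # approximately recovers x
```

Whether such array-valued callbacks could be plugged into the custom model adapter is exactly the open question; the sketch only shows that the callbacks themselves vectorize.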
Here's my dummy code:
```python
from scipy import stats
import constriction
import numpy as np

c = 0  # counts how often the CDF callback is invoked

def cdf_likelihood_normal(x, mu, sigma):
    global c
    c += 1
    print(c, end="\r")
    p = stats.norm.cdf(x, loc=mu, scale=sigma)
    return p

def inverse_cdf_likelihood_normal(q, mu, sigma):
    x = stats.norm.ppf(q, loc=mu, scale=sigma)
    return x

coder = constriction.stream.stack.AnsCoder()
# Custom model over the integer symbols -10..10; mu and sigma are supplied per symbol.
entropy_model = constriction.stream.model.CustomModel(
    cdf_likelihood_normal, inverse_cdf_likelihood_normal, -10, 10
)

sigma = np.ones(int(1e4))
mu = np.zeros(int(1e4))
message = np.random.randint(-1, 1, int(1e4), dtype=np.int32)  # symbols in {-1, 0}

p = stats.norm.cdf(message, loc=mu, scale=sigma)  # very fast: one vectorized call
coder.encode_reverse(message, entropy_model, mu, sigma)  # very slow: one Python callback per CDF evaluation
print(coder.num_bits())

reconstruction = coder.decode(entropy_model, mu, sigma)
assert (message == reconstruction).all()
```
Is it possible to support vectorizable CDFs in the custom model adapter to allow for a speedup?
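To make the request concrete: when encoding, every symbol is known up front, so the two CDF evaluations each symbol needs (at the edges of its quantization bin) can be done in two array calls, much like the fast `stats.norm.cdf` line in the dummy code. This sketch assumes that integer `x` owns the bin `[x - 0.5, x + 0.5)` and ignores the fixed-point rounding a real entropy coder applies, so its numbers are illustrative only; `symbol_probabilities` is a hypothetical helper, not part of constriction.

```python
import numpy as np
from scipy import stats

def symbol_probabilities(message, mu, sigma):
    """Probability mass of each integer symbol under a quantized Gaussian.

    Assumes (hypothetically) that symbol x owns the bin [x - 0.5, x + 0.5).
    """
    message = np.asarray(message, dtype=float)
    upper = stats.norm.cdf(message + 0.5, loc=mu, scale=sigma)  # one call for all symbols
    lower = stats.norm.cdf(message - 0.5, loc=mu, scale=sigma)  # one call for all symbols
    return upper - lower

rng = np.random.default_rng(0)
n = 10_000
mu = np.zeros(n)
sigma = np.ones(n)
message = rng.integers(-1, 1, n, dtype=np.int32)  # symbols in {-1, 0}

probs = symbol_probabilities(message, mu, sigma)
# Information content of the message under this model, in bits.
ideal_bits = -np.sum(np.log2(probs))
```

`ideal_bits` should be roughly comparable to what `coder.num_bits()` reports, though not identical, since a real coder adds fixed-point approximations and rounds up to whole words.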
I can see how vectorizing would reduce the overhead from Python callbacks. Unfortunately, vectorizing is only possible for encoding: when decoding a symbol, the decoder cannot know where to evaluate the ppf before it has decoded all preceding symbols (except in the case of the ChainCoder). I'll have to think a bit about what the best API would be to reflect this asymmetry (and, ideally, to still support vectorization for decoding with a ChainCoder).
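Assuming, as the reply suggests, that with a ChainCoder the quantile for every position could be obtained before any symbol is decoded, the inverse-CDF lookup vectorizes as well: tabulate each position's bin edges once, then locate every quantile against its row in a single comparison. `cdf_table` and `decode_all` are hypothetical helpers, not constriction API. The quantiles here are floats in `[0, 1)` rather than the fixed-point integers an actual coder works with, and the support is truncated to `-10..10` as in the dummy code.

```python
import numpy as np
from scipy import stats

def cdf_table(mu, sigma, min_symbol=-10, max_symbol=10):
    """CDF at the bin edges of a quantized Gaussian, for every position.

    Returns the symbols, shape (k,), and an (n, k + 1) array whose row i holds
    the CDF at the k + 1 bin edges of position i, renormalized to run from 0
    to 1 over the truncated support [min_symbol, max_symbol].
    """
    symbols = np.arange(min_symbol, max_symbol + 1)
    edges = np.append(symbols - 0.5, max_symbol + 0.5)
    cdf = stats.norm.cdf(edges[None, :], loc=mu[:, None], scale=sigma[:, None])
    return symbols, (cdf - cdf[:, :1]) / (cdf[:, -1:] - cdf[:, :1])

def decode_all(quantiles, symbols, cdf):
    """Vectorized inverse CDF: for each row i, pick the bin j with
    cdf[i, j] <= quantiles[i] < cdf[i, j + 1]."""
    # Counting the interior edges at or below each quantile gives the bin index.
    idx = np.sum(cdf[:, 1:-1] <= quantiles[:, None], axis=1)
    return symbols[idx]

# Round trip: use the midpoint of each symbol's bin as its quantile.
n = 10_000
mu, sigma = np.zeros(n), np.ones(n)
message = np.random.default_rng(0).integers(-1, 1, n)
symbols, cdf = cdf_table(mu, sigma)
j = message - symbols[0]  # bin index of each symbol
rows = np.arange(n)
quantiles = 0.5 * (cdf[rows, j] + cdf[rows, j + 1])
decoded = decode_all(quantiles, symbols, cdf)
```

The per-position table costs O(n·k) memory, which is fine for 21 symbols; for large alphabets, a vectorized bisection over the quantiles would avoid materializing it. Unlike a production quantizer, this sketch also does not guarantee every symbol a nonzero probability.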