Vectorize cdf query of custom model

Open wildug opened this issue 2 years ago • 1 comment

In my use case I want to compress a large amount of data with a custom entropy model. Unfortunately, this takes quite some time because the cdf is called once for every compressed symbol. I can't simply use the scipy model adapter because I'm using a mixture distribution, which is not implemented in scipy.

Here's my dummy code:

from scipy import stats
import constriction
import numpy as np

c = 0  # counts how often the cdf callback is invoked

def cdf_likelihood_normal(x, mu, sigma):
    global c
    c += 1
    print(c, end="\r")  # show the running call count
    return stats.norm.cdf(x, loc=mu, scale=sigma)

def inverse_cdf_likelihood_normal(q, mu, sigma):
    return stats.norm.ppf(q, loc=mu, scale=sigma)

coder = constriction.stream.stack.AnsCoder()
entropy_model = constriction.stream.model.CustomModel(cdf_likelihood_normal, inverse_cdf_likelihood_normal, -10, 10)


sigma = np.ones(int(1e4))
mu = np.zeros(int(1e4))
# randint's upper bound is exclusive, so the symbols are drawn from {-1, 0}
message = np.random.randint(-1, 1, int(1e4), dtype=np.int32)

p = stats.norm.cdf(message, loc=mu, scale=sigma)  # very fast

coder.encode_reverse(message, entropy_model, mu, sigma)  # very slow
print(coder.num_bits())

reconstruction = coder.decode(entropy_model, mu, sigma)

assert (message == reconstruction).all()

Would it be possible for the custom model adapter to support vectorizable cdfs, so that the cdf is called once for a whole batch of symbols rather than once per symbol?
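For illustration, here is a minimal sketch of the kind of vectorizable cdf meant here: a Gaussian mixture evaluated for all symbols in a single call. The function name `mixture_cdf`, the two-component setup, and the parameter shapes are assumptions made for this example; they are not part of constriction or scipy.

```python
import numpy as np
from scipy import stats

def mixture_cdf(x, weights, mus, sigmas):
    """CDF of a Gaussian mixture, evaluated for a whole batch at once.

    x has shape (n,); weights, mus and sigmas have shape (n, k) for k
    mixture components. Returns an array of shape (n,).
    """
    x = np.asarray(x, dtype=np.float64)[:, None]  # (n, 1), broadcasts over components
    component_cdfs = stats.norm.cdf(x, loc=mus, scale=sigmas)  # (n, k)
    return np.sum(weights * component_cdfs, axis=1)  # weighted sum over components

# Hypothetical parameters: an equal-weight mixture centered at -1 and +1.
n, k = 10_000, 2
rng = np.random.default_rng(0)
weights = np.full((n, k), 0.5)
mus = np.stack([np.full(n, -1.0), np.full(n, 1.0)], axis=1)
sigmas = np.ones((n, k))
message = rng.integers(-1, 1, size=n)  # symbols in {-1, 0}

p = mixture_cdf(message, weights, mus, sigmas)  # one call for all n symbols
```

A callback with this signature would need to be invoked only once per batch instead of once per symbol, which is where the speed-up would come from.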

Jul 24 '23 14:07 wildug

I can see how vectorizing would reduce the overhead from Python callbacks. Unfortunately, vectorizing is only possible for encoding: when decoding a symbol, the decoder cannot know where to evaluate the ppf before it has decoded all preceding symbols (except in the case of the ChainCoder). I'll have to think a bit about what the best API would be to reflect this asymmetry (and, ideally, to still support vectorization for decoding with a ChainCoder).
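One possible workaround for small supports, sketched under assumptions: instead of evaluating the cdf at the encoded symbols, tabulate the probability of every symbol in the support for every position in one vectorized call. The table depends only on the model parameters, not on the symbols, so it can be computed before either encoding or decoding. The helper `symbol_probability_table` is hypothetical, and the sketch assumes that each integer symbol x is assigned the mass of the bin [x - 0.5, x + 0.5]. How best to hand such a table to a coder (for example as one categorical model per row) is left open and has not been checked against constriction's API.

```python
import numpy as np
from scipy import stats

def symbol_probability_table(cdf, params, min_symbol, max_symbol):
    """Tabulate P(symbol) for every position and every symbol in the support.

    cdf: vectorized callable cdf(x, *params) that broadcasts x against params.
    params: tuple of arrays, each of shape (n,).
    Returns an array of shape (n, max_symbol - min_symbol + 1).
    """
    # Bin edges at the half-integers between neighbouring symbols.
    edges = np.arange(min_symbol, max_symbol) + 0.5  # (num_symbols - 1,)
    inner = cdf(edges[None, :], *(p[:, None] for p in params))  # (n, num_symbols - 1)
    n = inner.shape[0]
    # Pin the outermost edges to 0 and 1 so the tail mass goes to the extreme symbols.
    full = np.concatenate([np.zeros((n, 1)), inner, np.ones((n, 1))], axis=1)
    return np.diff(full, axis=1)

mu = np.zeros(int(1e4))
sigma = np.ones(int(1e4))
table = symbol_probability_table(
    lambda x, mu, sigma: stats.norm.cdf(x, loc=mu, scale=sigma),
    (mu, sigma), -10, 10,
)
# Column j of `table` holds the probability of symbol j + min_symbol (here j - 10).
```

The cost is one cdf evaluation per position and per symbol in the support, so this only pays off when the support is small (21 symbols in the example above). Entries far in the tails can underflow to exactly zero in floating point, which a real coder would have to handle.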

Jul 30 '23 15:07 robamler