StatsBase.jl

make normalization optional in `crosscov` and `crosscor`

Open babaq opened this issue 3 years ago • 3 comments

In the current implementation, the normalization terms are always applied, which I found inflexible for calculations that do not need them. It would be more convenient to add keyword arguments to turn them off.

For example:

function crosscov!(r::RealVector, x::RealVector, y::RealVector, lags::IntegerVector; demean::Bool=true)
    lx = length(x)
    m = length(lags)
    (length(y) == lx && length(r) == m) || throw(DimensionMismatch())
    check_lags(lx, lags)

    T = typeof(zero(eltype(x)) / 1)
    zx::Vector{T} = demean ? x .- mean(x) : x
    S = typeof(zero(eltype(y)) / 1)
    zy::Vector{S} = demean ? y .- mean(y) : y
    for k = 1 : m  # for each lag value
        r[k] = _crossdot(zx, zy, lx, lags[k]) / lx   # make the division by lx optional here
    end
    return r
end

This can also be applied to `crosscor`.
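One way the proposed keyword could look is sketched below. This is a self-contained illustration, not StatsBase's code: `crosscov_sketch`, `sliding_dot`, and the keyword name `normalize` are all assumptions made for the example, and `sliding_dot` only stands in for the internal `_crossdot` helper.

```julia
using Statistics: mean

# Hypothetical helper (a stand-in for the internal `_crossdot`, not the real one):
# sum of x[t] * y[t + lag] over every t for which both indices are valid.
function sliding_dot(x::AbstractVector, y::AbstractVector, lag::Integer)
    n = length(x)
    lo = max(1, 1 - lag)
    hi = min(n, n - lag)
    s = zero(eltype(x)) * zero(eltype(y))
    for t in lo:hi
        s += x[t] * y[t + lag]
    end
    return s
end

# Hypothetical variant of `crosscov`; `normalize` is an assumed keyword name.
function crosscov_sketch(x::AbstractVector{<:Real}, y::AbstractVector{<:Real},
                         lags::AbstractVector{<:Integer};
                         demean::Bool=true, normalize::Bool=true)
    n = length(x)
    length(y) == n || throw(DimensionMismatch("x and y must have the same length"))
    zx = demean ? x .- mean(x) : x
    zy = demean ? y .- mean(y) : y
    scale = normalize ? n : 1          # divide by the length only when requested
    return [sliding_dot(zx, zy, k) / scale for k in lags]
end

x = [1, 0, 1, 1]
y = [0, 1, 1, 0]
raw    = crosscov_sketch(x, y, [-1, 0, 1]; demean=false, normalize=false)
scaled = crosscov_sketch(x, y, [-1, 0, 1]; demean=false, normalize=true)
```

With `normalize=true` as the default, existing callers would see no change in behavior; the raw sums are just the scaled values multiplied by `length(x)`.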

babaq avatar Dec 21 '22 20:12 babaq

I'm not sure it makes sense not to divide by the length since otherwise it's not the same thing. Analogously, the cov function doesn't provide an option to avoid the division by n or n-1.

ararslan avatar Dec 24 '22 00:12 ararslan

I was computing the cross-correlation of binary vectors of 0s and 1s, which is essentially the sliding dot product `sum(v1(t) * v2(t+τ))`, so there is no need to divide by n. If the `_crossdot` function were exported, there would be no need to add this optional normalization.
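For binary vectors, the sliding dot product counts the positions where both vectors are 1 at a given offset. A minimal sketch using only `LinearAlgebra.dot` on views follows; the function name `coincidences` is made up for this example and is not part of StatsBase.

```julia
using LinearAlgebra: dot

# Hypothetical helper: number of t with v1[t] == 1 and v2[t + τ] == 1,
# i.e. the unnormalized sliding dot product at lag τ.
function coincidences(v1::AbstractVector{<:Integer}, v2::AbstractVector{<:Integer}, τ::Integer)
    n = length(v1)
    length(v2) == n || throw(DimensionMismatch("v1 and v2 must have the same length"))
    abs(τ) < n || return zero(eltype(v1)) * zero(eltype(v2))  # no overlap at this lag
    if τ >= 0
        return dot(view(v1, 1:(n - τ)), view(v2, (1 + τ):n))
    else
        return dot(view(v1, (1 - τ):n), view(v2, 1:(n + τ)))
    end
end

v1 = [1, 0, 1, 1, 0]
v2 = [0, 1, 0, 1, 1]
counts = [coincidences(v1, v2, τ) for τ in -1:1]
```

Dividing these counts by `length(v1)` gives the normalized values, so the two forms differ only by a constant factor.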

babaq avatar Dec 24 '22 02:12 babaq

A common example for `crosscor`: searching for the most likely lag (the maximum of `crosscor`) does not require normalization. The user may need to do some normalization beforehand anyway (so that the data matches the reference).

Also bumping #341 (same input size?).
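The lag search can be sketched as follows. Dividing every score by the same positive constant does not change which lag attains the maximum, so the result is the same with or without the division. `best_lag` and its `scale` keyword are hypothetical names for this illustration only.

```julia
# Hypothetical helper: return the lag k that maximizes sum(x[t] * y[t + k]).
# `scale` is an assumed keyword; any positive value leaves the argmax unchanged.
function best_lag(x::AbstractVector{<:Real}, y::AbstractVector{<:Real}, lags; scale::Real=1)
    n = length(x)
    length(y) == n || throw(DimensionMismatch("x and y must have the same length"))
    scores = map(lags) do k
        lo = max(1, 1 - k)
        hi = min(n, n - k)
        sum(x[t] * y[t + k] for t in lo:hi; init=0.0) / scale
    end
    return lags[argmax(scores)]
end

reference = [0, 0, 1, 2, 1, 0, 0, 0]
data      = [0, 0, 0, 0, 1, 2, 1, 0]   # the reference delayed by two samples

lag_raw    = best_lag(reference, data, -3:3)
lag_scaled = best_lag(reference, data, -3:3; scale=length(reference))
```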

i9e1 avatar Jun 05 '23 12:06 i9e1