
Numo::NArray#var

[Open] hatappi opened this issue 7 years ago • 2 comments

I found a difference between Numo::NArray#var and numpy's var when comparing the two.

numpy

```python
>>> np.var(np.array([[1, 2], [3, 4]], dtype="f"))
1.25
```

Numo::NArray

```ruby
> Numo::DFloat[[1, 2], [3, 4]].var()
=> 1.6666666666666667
```

hatappi, Mar 22 '18 15:03

Numo's variance is the "unbiased" sample variance, whose denominator is N-1, where N is the number of samples.

NumPy's denominator, by contrast, is N. You can get Numo's result from NumPy by passing ddof=1:

```python
In [3]: np.var(np.array([[1, 2], [3, 4]], dtype="f"), ddof=1)
Out[3]: 1.6666666
```
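As a cross-check that the two numbers differ only in the denominator, here is a small sketch using Python's standard `statistics` module instead of NumPy or Numo (an assumption made only to keep the example dependency-free). `pvariance` divides by N like numpy's default, and `variance` divides by N-1 like Numo. The matrix is flattened first because `np.var` without an `axis` argument also operates on the flattened array.

```python
import statistics

matrix = [[1.0, 2.0], [3.0, 4.0]]
# np.var with no axis works on the flattened array; do the same here.
flat = [x for row in matrix for x in row]

population = statistics.pvariance(flat)  # denominator N   (numpy default, ddof=0)
sample = statistics.variance(flat)       # denominator N-1 (Numo default, numpy ddof=1)

print(population)  # 1.25
print(sample)      # 1.6666666666666667
```

Both results come from the same sum of squared deviations (5.0), divided by 4 and by 3 respectively.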

However, Numo currently does not support a ddof argument, so (as far as I understand) there is no way to get NumPy's default result from Numo.

Numo should support a ddof argument, although its default should remain different from NumPy's for backward compatibility.
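The proposal above (add `ddof`, keep the N-1 default) can be sketched in plain Python as follows. The function name `var` and its signature are hypothetical and only illustrate the semantics; they are not Numo's actual API. The default `ddof=1` mirrors Numo's current behavior, while `ddof=0` reproduces numpy's default.

```python
from math import fsum


def var(values, ddof=1):
    """Hypothetical variance with a configurable delta degrees of freedom.

    The denominator is N - ddof: ddof=1 (default here, like Numo today)
    gives the sample variance; ddof=0 gives numpy's default (population).
    """
    xs = list(values)
    n = len(xs)
    if n - ddof <= 0:
        raise ValueError("need more than ddof samples")
    mean = fsum(xs) / n
    return fsum((x - mean) ** 2 for x in xs) / (n - ddof)


data = [1.0, 2.0, 3.0, 4.0]
print(var(data))          # 1.6666666666666667  (N-1, backward compatible)
print(var(data, ddof=0))  # 1.25                (N, numpy's default)
```

Keeping `ddof=1` as the default means existing callers see no change, while callers who need numpy parity can opt in explicitly.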

sonots, Mar 22 '18 15:03

In my opinion, I prefer Numo's choice of default.

The standard use of Numo is on data samples, for which the appropriate choice is the sample variance (denominator n-1), since it is an unbiased estimator of the population variance. Population variance (denominator n, numpy's default) is only correct if you have the whole population. If your sample approximates the population, i.e. n is extremely large, the difference between the two vanishes anyway, because they differ only by the factor n/(n-1), which tends to 1. References: [wiki], [SO].
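The relationship behind this argument can be written out explicitly. Here $\bar x$ is the sample mean, $\sigma^2$ the true population variance, and the expectations assume i.i.d. samples:

$$
\hat\sigma^2_{\text{pop}} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^2,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar x)^2 = \frac{n}{n-1}\,\hat\sigma^2_{\text{pop}}
$$

$$
\mathbb{E}\!\left[s^2\right] = \sigma^2,
\qquad
\mathbb{E}\!\left[\hat\sigma^2_{\text{pop}}\right] = \frac{n-1}{n}\,\sigma^2,
\qquad
\lim_{n\to\infty}\frac{n}{n-1} = 1
$$

For the 2×2 example in this issue, $n = 4$, so $s^2 = \tfrac{4}{3}\times 1.25 \approx 1.6667$, matching Numo's output.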

I would support adding an extra argument to get the population variance, but only for completeness, and with a more understandable name than ddof (for example: ary.var(type: :population)).
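To illustrate what a named option could look like compared with a numeric `ddof`, here is a Python sketch in the spirit of the suggested `ary.var(type: :population)`. The `kind` keyword, its allowed values, and the function itself are hypothetical and exist in neither Numo nor numpy; internally the name is simply mapped to a ddof of 1 or 0.

```python
from math import fsum

# Hypothetical mapping from a readable name to the delta degrees of freedom.
_DDOF_BY_KIND = {"sample": 1, "population": 0}


def var(values, kind="sample"):
    """Hypothetical variance selected by name instead of a numeric ddof."""
    try:
        ddof = _DDOF_BY_KIND[kind]
    except KeyError:
        raise ValueError(
            f"unknown kind {kind!r}; expected 'sample' or 'population'"
        ) from None
    xs = list(values)
    n = len(xs)
    if n - ddof <= 0:
        raise ValueError("not enough samples for the requested variance")
    mean = fsum(xs) / n
    return fsum((x - mean) ** 2 for x in xs) / (n - ddof)


data = [1.0, 2.0, 3.0, 4.0]
print(var(data))                     # 1.6666666666666667
print(var(data, kind="population"))  # 1.25
```

A closed set of names rejects typos with a clear error, whereas an arbitrary integer ddof would silently accept values like 2.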

I wish for Numo to become a better alternative to numpy, rather than to only emulate it.

[EDIT] Oh, it looks like Sonots-san already answered this; sorry for the double reply.

giuse, Mar 22 '18 16:03