
Support for broadcasting in N-dimensional arrays

Open jwvdm opened this issue 10 years ago • 4 comments

Hi there,

We are using core.matrix in the development of anglican, a probabilistic programming language written in Clojure. While core.matrix has been super useful in our development process, we find there is one piece of functionality missing that we typically rely on when implementing machine learning algorithms in other languages such as Matlab or Python + NumPy. This has to do with the way core.matrix handles broadcasting.

I'll give an example. Suppose we have two arrays:

  1. An array a which has shape [l 1 n]
  2. An array b which has shape [l m 1]

In numpy, the operation a + b would yield an array of shape [l m n] in which a is broadcast along the second dimension and b is broadcast along the third dimension. Similarly, in Matlab one could call bsxfun(@plus, a, b) to achieve the same effect.

In core.matrix, broadcast can be used to prepend dimensions (i.e. transform from [l n] to [m l n], or from [l m] to [n l m]), but it is not possible to broadcast in any other manner.
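
For concreteness, a minimal sketch of the current behavior at the REPL (using the default persistent-vector implementation):

(require '[clojure.core.matrix :as m])

(m/broadcast [1 2 3] [2 3])
;; => [[1 2 3] [1 2 3]] -- a new major dimension is prepended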

One could in principle work around this by doing something like

(transpose (broadcast a [m l n]) [1 0 2])

but unfortunately this operation, while defined in the API, is not implemented.

Is there a fundamental reason why these types of operations cannot be performed in the core.matrix framework? If this is a simple matter of implementing a couple of functions, then we could potentially provide a pull request if some direction can be given on what steps would be needed.

jwvdm avatar Feb 12 '16 16:02 jwvdm

There are at least three reasons why we do broadcasting the current way:

  • Avoiding hidden errors: if you are adding arrays with different shapes, it is often a logic bug. E.g. (add [1 2 3] [1]) should IMHO be considered an error because the argument shapes are inconsistent.
  • Performance: it's usually much more efficient to broadcast by adding a new major dimension. With Clojure vectors, for example, this can be done simply by repeating the same value in a new vector (see the sketch after this list). This form of broadcasting also does not require the whole shape to be analysed for each operation: if the dimensionalities are the same, you can assume that no broadcasting is required.
  • Implementation simplicity: for many implementations, broadcasting by duplicating along the major dimension is relatively simple. Supporting arbitrary broadcasting on any dimension would make it much more complex for core.matrix implementers to handle all the required cases.
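
To illustrate the performance point with a hypothetical sketch (this is not core.matrix's actual internals): for nested Clojure vectors, a prepend-style broadcast only needs to build one new vector of references, with no element copying:

(defn broadcast-prepend
  "Hypothetical illustration: prepend a dimension of size n by
  repeating references to the same value v."
  [v n]
  (vec (repeat n v)))

(broadcast-prepend [1 2 3] 2)
;; => [[1 2 3] [1 2 3]] -- both rows share the same underlying vector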

Having said that, I'm willing to consider an augmented broadcasting function that provides the functionality you need. It can probably be implemented using transpose, reshape and broadcast in a fully generic way that will support any shape.
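
To sketch what that might look like (expand-axis is a hypothetical helper, not core.matrix API, and it assumes the permuting arity of transpose is available in the implementation being used):

(require '[clojure.core.matrix :as m])

(defn expand-axis
  "Hypothetical sketch: expand size-1 dimension `axis` of `arr` to `size`
  by dropping the singleton axis with reshape, prepending the target size
  with broadcast, and permuting it back into place with transpose."
  [arr axis size]
  (let [shape   (vec (m/shape arr))
        r       (count shape)
        dropped (m/reshape arr (vec (concat (subvec shape 0 axis)
                                            (subvec shape (inc axis)))))
        fronted (m/broadcast dropped (vec (cons size (m/shape dropped))))
        order   (vec (concat (map inc (range axis)) [0] (range (inc axis) r)))]
    (m/transpose fronted order)))

;; e.g. taking a from shape [l 1 n] to [l m n]:
;; (expand-axis a 1 m)
;; is equivalent to (transpose (broadcast (reshape a [l n]) [m l n]) [1 0 2])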

mikera avatar Feb 13 '16 00:02 mikera

Thank you for the detailed response!

I think it might be helpful to give a concrete use case that occurs fairly commonly in a statistics/machine learning context:

Suppose we have a matrix x of shape [n d] that represents a collection of n data points in d dimensions. Two operations that one might want to perform are:

  • Calculate a matrix sigma of shape [d d] that contains an empirical estimate of covariances, which one would ideally do using something like the following (with the mean taken over the first dimension):
(mean (mul (reshape x [n 1 d]) 
           (reshape x [n d 1])))
  • Calculate a matrix delta of shape [n n] that contains distances between points, which one would like to do with something like:
(let [x-diff (sub (reshape x [1 n d])
                  (reshape x [n 1 d]))]
  (sqrt 
    (sum 
      (transpose 
        (mul x-diff x-diff)))))

We have of course found ways to implement these operations, but this currently requires creative use of transposes that I would argue is more likely to result in errors than accidental broadcasting across dimensions would be. The broadcasting behavior shown above is in fact fairly easy to reason about in practice, and it is not for nothing that both Matlab and Numpy implement these types of operations.
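
For reference, here is one way (a sketch only, with illustrative values, and not necessarily the formulation we use) to express both computations with the current API, substituting matrix multiplication for broadcasting:

(require '[clojure.core.matrix :as m])

;; uncentered second-moment estimate, (1/n) X^T X, in place of the
;; broadcast outer-product mean:
(let [x (m/matrix [[1.0 2.0] [3.0 4.0] [5.0 6.0]])
      n (first (m/shape x))]
  (m/div (m/mmul (m/transpose x) x) n))
;; => a [d d] matrix

;; pairwise distances via ||xi - xj||^2 = ||xi||^2 + ||xj||^2 - 2 xi.xj,
;; using the prepend-then-transpose broadcasting available today:
(let [x  (m/matrix [[0.0 0.0] [3.0 4.0]])
      n  (first (m/shape x))
      g  (m/mmul x (m/transpose x))
      sq (m/broadcast (m/main-diagonal g) [n n])]
  (m/sqrt (m/add sq (m/transpose sq) (m/mul -2.0 g))))
;; => [[0.0 5.0] [5.0 0.0]]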

In short, I would agree that (add [1 2 3] [1]) should preferably result in an error. That said, it would be extremely helpful to be able to do

(add (reshape [1 2 3] [3 1]) 
     (reshape [1] [1 1]))
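
Under numpy-style rules this would simply yield [[2] [3] [4]].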

If performance is a concern here, then I could imagine requiring that users explicitly call a wrapper function with documented performance characteristics:

(broadcast-operator
  add 
  (reshape [1 2 3] [3 1]) 
  (reshape [1] [1 1]))
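
A hypothetical sketch of such a wrapper (broadcast-operator is not existing API; it reuses the expand-axis helper sketched above and omits shape-compatibility checks):

(defn broadcast-operator
  "Apply f elementwise to same-rank arrays a and b, first expanding
  any size-1 axes to the numpy-style common shape."
  [f a b]
  (let [target (mapv max (vec (m/shape a)) (vec (m/shape b)))
        expand (fn [arr]
                 (reduce (fn [acc axis]
                           (if (and (= 1 (nth (vec (m/shape acc)) axis))
                                    (not= 1 (nth target axis)))
                             (expand-axis acc axis (nth target axis))
                             acc))
                         arr
                         (range (count target))))]
    (f (expand a) (expand b))))

;; (broadcast-operator m/add (m/reshape [1 2 3] [3 1]) (m/reshape [1] [1 1]))
;; => [[2] [3] [4]]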

As a final note, as far as I can tell generic broadcasting of this type could indeed be implemented, but I'm almost certain it will require an implementation of the (transpose m dim-ordering) arity. As long as that operation is fast, I would imagine that this type of broadcasting could be done without taking too much of a performance hit.

jwvdm avatar Feb 14 '16 14:02 jwvdm

I understand the use cases; it would be nice to support them, provided we can find a way to do so without harming performance in the standard cases.

I'm willing to consider an "enhanced" broadcasting operation that supports this. An observation from your use cases is that you seem to want to do both a reshape and a broadcast simultaneously; perhaps this can inform the design.

mikera avatar Feb 15 '16 09:02 mikera

Came across this issue when trying to do some of Andrew Ng's Deep Learning course problems in Clojure.

The linear part of a forward propagation is, by the course's convention, Z = WA + b, where b is a column vector to be broadcast across the columns of WA. It was a surprise to find that this can't be (easily) done in core.matrix.
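
For what it's worth, one workaround with the current API is to broadcast b by prepending the column count and then transposing back; a minimal sketch, assuming W is [k n], A is [n m], and b is a length-k vector:

(require '[clojure.core.matrix :as m])

(defn linear-forward
  "Sketch: Z = WA + b, with b broadcast across the columns of WA."
  [W A b]
  (let [za (m/mmul W A)
        [k m-cols] (m/shape za)]
    ;; if b is a [k 1] column vector, (m/reshape b [k]) it first
    (m/add za (m/transpose (m/broadcast b [m-cols k])))))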

xiongtx avatar Dec 26 '17 21:12 xiongtx