Differentiating a function that calls pmap
I have been using ReverseDiff for some time now and my experience with it has been very positive.
I want to use ReverseDiff to differentiate a function that is expensive to evaluate. I thought I could split the computation of this expensive function across several workers using `pmap`.
Unfortunately, this fails.
I wrote some minimal code that reproduces the problem. The code below implements a simple linear regression problem. In slightly more detail:
- I generate 100 5-dim inputs `x` and a 5-dim weight vector `w`.
- I then multiply them to get the 1-dimensional output targets `t`, to which I also add a bit of noise.
- The linear regression problem consists in recovering the weight vector `w` starting from some random weight vector `w0`.
- I then set up the objective function that computes the sum of squared errors. It is this objective function which contains the call to `pmap`.
- I then define the gradient of my objective using ReverseDiff.
- The defined objective and gradient are passed to `Optim.optimize` so that the objective is minimised, which yields the estimate `ŵ` of the true weight vector.
- Finally, the code reports the initial weight vector `w0` (where the optimisation started from), the best estimate `ŵ`, and the true weight vector `w`. If the code below works correctly, `ŵ` and `w` should be very similar.
Before running the code, I add two workers with `addprocs(2)`.
If you run the code below, you will notice that it doesn't work as expected: `ŵ` comes out identical to `w0`, i.e. it is as if the optimisation never took place.
However, if I change the `pmap` in the objective function to `map`, it all works, and `ŵ` is indeed very close to `w`.
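For reference, this is the only change that makes it work for me (the same `objective` inside `demo_pmap`, with `pmap` replaced by `map`; everything else stays exactly as in the code below):

```julia
# working variant: evaluate the per-datum errors serially with map
function objective(w)
    # error due to one input-output (x, t) pair
    function aux(t, x)
        return (t - dot(w, x))^2
    end
    reduce(+, map(aux, t, x))   # map instead of pmap -> ŵ ends up close to w
end
```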
Is this a (known) issue, or is there something wrong with the code? Many thanks.
```julia
@everywhere using ReverseDiff
using Optim

function demo_pmap()

    # fix random seed
    srand(101)

    # generate some data for the linear regression problem

    # number of data items
    N = 100

    # inputs
    x = [randn(5) for n = 1:N]

    # true weights
    w = randn(5)

    # output targets
    t = [dot(w, xn) + 0.01*randn() for xn in x]

    # set up objective function to minimise
    function objective(w)

        # error due to one input-output (x, t) pair
        function aux(t, x)
            return (t - dot(w, x))^2
        end

        # evaluate in parallel and return error to be minimised
        reduce(+, pmap(aux, t, x))
    end

    # define gradient using AD
    function helper!(storage, w)
        # One could pre-compute the tape for efficiency
        ReverseDiff.gradient!(storage, objective, w)
    end
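    # (For reference, a sketch of what that pre-computed tape could look like,
    #  using ReverseDiff's GradientTape / compile API; left as a comment only,
    #  it is not part of the problem I am asking about:
    #      tape = ReverseDiff.compile(ReverseDiff.GradientTape(objective, randn(5)))
    #      helper!(storage, w) = ReverseDiff.gradient!(storage, tape, w)
    #  )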
    # minimise using Optim
    w0 = randn(size(w))
    ŵ = optimize(objective, helper!, w0, LBFGS(), Optim.Options(iterations=100)).minimizer

    # report - last two should be close
    display(w0)
    display(ŵ)
    display(w)
end
```
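In case it matters, this is roughly how I run it ("demo_pmap.jl" is just a placeholder name for wherever the code above lives):

```julia
addprocs(2)               # add the two workers first, as mentioned above
include("demo_pmap.jl")   # placeholder file name for the code above
demo_pmap()               # with pmap: ŵ comes out identical to w0
                          # with map instead of pmap: ŵ is very close to w
```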