Differentiating a function that calls pmap
I have been using ReverseDiff for some time now and my experience with it has been very positive.
I want to use ReverseDiff to differentiate a function that is expensive to evaluate. I thought I could split the computation of this expensive function across several workers using `pmap`.
Unfortunately, this fails.
I wrote some minimal code that reproduces the problem. The code below implements a simple linear regression problem. In slightly more detail:
- I generate 100 5-dim inputs `x` and a 5-dim weight vector `w`.
- I then multiply them to get the 1-dimensional output targets `t`, to which I also add a bit of noise.
- The linear regression problem consists in recovering the weight vector `w` starting from some random weight vector `w0`.
- I then set up the objective function that computes the sum of squared errors. It is this objective function which contains the call to `pmap`.
- I then define the gradient of my objective using ReverseDiff.
- The defined objective and gradient are passed to `Optim.optimize` so that the objective is minimised, which yields the estimate `ŵ` of the true weight vector.
- Finally, the code reports the initial weight vector `w0` (where the optimisation started from), the best estimate `ŵ`, and the true weight vector `w`. If the code below works correctly, `ŵ` and `w` should be very similar.
Before running the code, I add two workers with `addprocs(2)`.
If you run the code below, you will notice that it doesn't work as expected: `ŵ` comes out identical to `w0`, i.e. it is as if the optimisation never took place.
However, if I change the `pmap` in the objective function to `map`, it all works, and `ŵ` is indeed very close to `w`.
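For reference, this is the only change that makes it work for me (the same `objective` inside `demo_pmap`, with `pmap` replaced by `map`; everything else stays exactly as in the code below):

```julia
# working variant: evaluate the per-datum errors serially with map
function objective(w)
    # error due to one input-output (x, t) pair
    function aux(t, x)
        return (t - dot(w, x))^2
    end
    reduce(+, map(aux, t, x))   # map instead of pmap -> ŵ ends up close to w
end
```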
Is this a (known) issue, or is there something wrong with the code? Many thanks.
```julia
@everywhere using ReverseDiff
using Optim

function demo_pmap()

    # fix random seed
    srand(101)

    # generate some data for the linear regression problem

    # number of data items
    N = 100

    # inputs
    x = [randn(5) for n = 1:N]

    # true weights
    w = randn(5)

    # output targets
    t = [dot(w, xn) + 0.01*randn() for xn in x]

    # set up objective function to minimise
    function objective(w)

        # error due to one input-output (x, t) pair
        function aux(t, x)
            return (t - dot(w, x))^2
        end

        # evaluate in parallel and return error to be minimised
        reduce(+, pmap(aux, t, x))
    end

    # define gradient using AD
    function helper!(storage, w)
        # One could pre-compute the tape for efficiency
        ReverseDiff.gradient!(storage, objective, w)
    end
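    # (For reference, a sketch of what that pre-computed tape could look like,
    #  using ReverseDiff's GradientTape / compile API; left as a comment only,
    #  it is not part of the problem I am asking about:
    #      tape = ReverseDiff.compile(ReverseDiff.GradientTape(objective, randn(5)))
    #      helper!(storage, w) = ReverseDiff.gradient!(storage, tape, w)
    #  )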
    # minimise using Optim
    w0 = randn(size(w))
    ŵ = optimize(objective, helper!, w0, LBFGS(), Optim.Options(iterations=100)).minimizer

    # report - last two should be close
    display(w0)
    display(ŵ)
    display(w)
end
```
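In case it matters, this is roughly how I run it ("demo_pmap.jl" is just a placeholder name for wherever the code above lives):

```julia
addprocs(2)               # add the two workers first, as mentioned above
include("demo_pmap.jl")   # placeholder file name for the code above
demo_pmap()               # with pmap: ŵ comes out identical to w0
                          # with map instead of pmap: ŵ is very close to w
```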