
Training loss gradients slow down with multiple calls in quick succession

Open a-mch opened this issue 5 years ago • 6 comments

Computing the gradients of a model's training loss repeatedly in a loop apparently leads to a significant loss in performance compared to when the gradient calls are spaced out in time. This seems to be specific to GPflow: a similar test on a pure TensorFlow loss function didn't show the same behaviour.

To reproduce

Minimal, reproducible example

import time
import numpy as np
import tensorflow as tf
import gpflow

# Setup model
inputs = np.random.randn(1000, 5)
targets = np.random.randn(1000, 1)
kernel = gpflow.kernels.SquaredExponential()
model = gpflow.models.GPR((inputs, targets), kernel=kernel)

# Evaluate loss once to initialise TensorFlow (avoids first-call overhead in the timings)
model.training_loss()

# Repeated calls with minimal wait between - SLOW
print('Calls with minimal wait between:')
for _ in range(10):
    t0 = time.time()
    with tf.GradientTape() as g:
        loss = model.training_loss()
    dldp = g.gradient(loss, model.trainable_variables)
    print('\tComputation took %.3f milliseconds' % ((time.time() - t0) * 1000))
    time.sleep(0.02)

time.sleep(1)

# Repeated calls with longer wait between - FAST
print('\nCalls with longer wait between:')
for _ in range(10):
    t0 = time.time()
    with tf.GradientTape() as g:
        loss = model.training_loss()
    dldp = g.gradient(loss, model.trainable_variables)
    print('\tComputation took %.3f milliseconds' % ((time.time() - t0) * 1000))
    time.sleep(0.2)

Output

Calls with minimal wait between:
    Computation took 28.922 milliseconds
    Computation took 146.606 milliseconds
    Computation took 135.637 milliseconds
    Computation took 128.655 milliseconds
    Computation took 129.651 milliseconds
    Computation took 128.657 milliseconds
    Computation took 127.656 milliseconds
    Computation took 129.652 milliseconds
    Computation took 130.649 milliseconds
    Computation took 128.654 milliseconds

Calls with longer wait between:
    Computation took 23.936 milliseconds
    Computation took 23.937 milliseconds
    Computation took 23.935 milliseconds
    Computation took 24.935 milliseconds
    Computation took 24.935 milliseconds
    Computation took 23.935 milliseconds
    Computation took 24.934 milliseconds
    Computation took 24.934 milliseconds
    Computation took 24.933 milliseconds
    Computation took 24.933 milliseconds

Expected behaviour

Subsequent gradient calls should take roughly the same length of time.

System information

  • GPflow version: 2.0.5
  • GPflow installed from: pip install gpflow
  • TensorFlow version: v2.2.0
  • Python version: 3.7.6
  • Operating system: Windows 10

a-mch avatar Jul 27 '20 12:07 a-mch

That is very odd. Can't think of anything off the top of my head that could cause it though, given that you're evaluating a sequence of tf operations...

For speed, it's recommended to use tf.function() to avoid doing everything in eager mode. Does the same thing happen then?

markvdw avatar Jul 28 '20 13:07 markvdw

Thanks for the suggestion. I have tried extracting,

@tf.function
def compute_gradient(model_):
    with tf.GradientTape() as g:
        loss_flow = model_.training_loss()
    return g.gradient(loss_flow, model_.trainable_variables)

and calling this function in the loop, but that doesn't make any difference (other than a minor speed-up on the quick evaluations).

I have tried the same code on three separate machines, each with a different GPU, and got the same results, so it does seem to be repeatable. Interestingly, if I increase the size of the training set then I have to increase the sleep time between loop iterations to see the unaffected times - perhaps the data is being moved/cleared in the background, which is affecting subsequent calls.
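
One more thing I could check (a sketch, same loop as above with one extra line; as far as I understand, GPU kernels are dispatched asynchronously, so the wall-clock timings may include waiting on work queued by a previous iteration): force a host sync before stopping the clock by pulling one gradient back to the host,

for _ in range(10):
    t0 = time.time()
    with tf.GradientTape() as g:
        loss = model.training_loss()
    dldp = g.gradient(loss, model.trainable_variables)
    dldp[0].numpy()  # .numpy() blocks until the device has actually finished computing
    print('\tComputation took %.3f milliseconds' % ((time.time() - t0) * 1000))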

a-mch avatar Jul 29 '20 13:07 a-mch

@a-mch I've just tried to reproduce this but couldn't (TensorFlow 2.2.0, but on Linux and on the CPU). I had to slightly adjust your minimal example (and have updated your issue accordingly). Can you check again by copying that into a fresh file? And can you confirm whether this is an issue only on GPU, or on CPU as well?

st-- avatar Oct 02 '20 14:10 st--

(NB: a function you wrap in tf.function() should not take complex objects such as the model as input!)
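
For example, a minimal sketch of that pattern, assuming the model from the original snippet is in scope so that the traced function only sees tensors and variables:

@tf.function
def compute_gradient():
    # the model is captured by the closure rather than passed as an argument,
    # so tf.function doesn't have to trace over a complex Python object
    with tf.GradientTape() as g:
        loss = model.training_loss()
    return g.gradient(loss, model.trainable_variables)

(GPflow models also provide model.training_loss_closure(), which builds such a loss closure for you.)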

st-- avatar Oct 02 '20 14:10 st--

So it looks like this is a GPU-only issue, and it still seems to be present with TF 2.3 and GPflow 2.1/develop. @a-mch what did you use as a "native TF" example for comparison, where you say this issue isn't present? I suspect it's got something to do with large matrices (GPR constructs an N x N matrix) getting shuffled to/from the GPU...
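
One way to probe that (a diagnostic sketch, not a fix) would be to enable device placement logging before running the loop, which makes any host/device copies show up in the log:

import tensorflow as tf

# must be called before any ops are executed; each op is then logged
# together with the device it was placed on
tf.debugging.set_log_device_placement(True)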

st-- avatar Oct 06 '20 14:10 st--

Hi, thanks for looking into it. I tried computing derivatives of a squared exponential covariance function written with plain TensorFlow ops, like so:

import tensorflow as tf
import time


def compute(aa, bb, ll):
    dist = ((aa[:, None, :] - bb[None, :, :]) / ll[None, None, :]) ** 2
    return tf.reduce_sum(tf.exp(-tf.reduce_sum(dist, 2)))


def compute_gradient(a, b, l):
    with tf.GradientTape() as g:
        g.watch(l)
        y = compute(a, b, l)
    return g.gradient(y, l)


def run_test(num_repeats):
    a = tf.random.normal((1000, 5))
    b = tf.random.normal((1000, 5))
    l = tf.random.normal((5, ))

    # Initial call to initialise TensorFlow
    dydl = compute_gradient(a, b, l)

    # Call gradient tape in loop
    for _ in range(num_repeats):
        t0 = time.time()
        dydl = compute_gradient(a, b, l)
        len(dydl)                   # sanity check that it has actually computed the derivative
        print('Derivative took %.3f milliseconds' % ((time.time() - t0) * 1000))


if __name__ == '__main__':
    run_test(10)

Each iteration of the loop takes roughly the same time, i.e. there is no slowdown after the first iteration:

Derivative took 2.991 milliseconds
Derivative took 2.991 milliseconds
Derivative took 2.993 milliseconds
Derivative took 2.991 milliseconds
Derivative took 3.991 milliseconds
Derivative took 2.988 milliseconds
Derivative took 2.032 milliseconds
Derivative took 2.991 milliseconds
Derivative took 2.000 milliseconds
Derivative took 2.951 milliseconds

You are also correct that the issue only appears on the GPU; if I run the code on the CPU then each loop iteration takes a similar amount of time.

a-mch avatar Oct 06 '20 15:10 a-mch