
AvalancheConcatDataset is slow for online buffer updates

HamedHemati opened this issue 3 years ago · 3 comments

As pointed out in PR #1024, in online strategies the buffers are updated after every (sub-)experience, so creating a new instance of AvalancheConcatDataset after every experience slows down the update process. This is not a problem for regular/offline strategies. Keeping the data in a list (possibly together with their transformations) and using only the first N samples (e.g., 1 < N <= 10) to create a small buffer dataset is always an option, but does anyone know a better way to create large datasets efficiently?
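For illustration, here is a minimal sketch of that list-based idea (plain PyTorch, not the Avalanche API; ListBufferDataset, raw_buffer, update_buffer, and n are hypothetical names): keep the raw samples in a Python list, which is cheap to extend, and only wrap the first n of them in a small Dataset when the buffer is actually consumed.

from torch.utils.data import Dataset

class ListBufferDataset(Dataset):
    """Wrap a plain Python list of (x, y) samples, applying an optional transform."""
    def __init__(self, samples, transform=None):
        self.samples = samples
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        x, y = self.samples[idx]
        if self.transform is not None:
            x = self.transform(x)
        return x, y

raw_buffer = []  # grows by a handful of samples per (sub-)experience; extending a list is cheap

def update_buffer(new_samples, n=10, transform=None):
    # No dataset object is rebuilt here; only the list grows.
    raw_buffer.extend(new_samples)
    # Materialize a small dataset from the first n samples (e.g., 1 < n <= 10) only when needed.
    return ListBufferDataset(raw_buffer[:n], transform=transform)

The trade-off is that the full concatenated view is never built, so anything that needs the whole buffer as a single dataset still has to construct it at that point.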

HamedHemati avatar May 13 '22 14:05 HamedHemati

Are you sure that the problem is AvalancheConcatDataset? I tried to measure its runtime some time ago and it was quite fast, but I may not remember correctly. Can you check the runtime of the buffer update without any other strategy code?

AntonioCarta avatar May 16 '22 07:05 AntonioCarta

So, to find the bottleneck in the update_from_dataset function of ReservoirSamplingBuffer, I added the following condition, which skips the dataset concatenation once the buffer is full, as a test to confirm that it is the concatenation that slows down the update:

        new_weights = torch.rand(len(new_data))
        cat_weights = torch.cat([new_weights, self._buffer_weights])

        # ADDED this: skip concatenation once the buffer is already full
        if len(self.buffer) == self.max_size:
            cat_data = self.buffer
        else:
            cat_data = AvalancheConcatDataset([new_data, self.buffer])

        sorted_weights, sorted_idxs = cat_weights.sort(descending=True)
        buffer_idxs = sorted_idxs[: self.max_size]
        self.buffer = AvalancheSubset(cat_data, buffer_idxs)
        self._buffer_weights = sorted_weights[: self.max_size]

This was still a bit slower than the naive online strategy (which was expected), but much faster than concatenating the dataset after each experience.

HamedHemati avatar May 17 '22 08:05 HamedHemati

These are the results that I get from a quick profiling script after 11552 iterations (batch size = 1):

  • reservoir sampling: 116.8 sec
  • random sampling: 71.3 sec

Not optimal, but not that bad as a starting point, considering that the training loop will often be much more expensive.

import time
from os.path import expanduser
from unittest.mock import Mock

from tqdm import tqdm

from avalanche.benchmarks import fixed_size_experience_split, SplitMNIST
from avalanche.training import ParametricBuffer, ReservoirSamplingBuffer

benchmark = SplitMNIST(
    n_experiences=5,
    dataset_root=expanduser("~") + "/.avalanche/data/mnist/",
)

experience = benchmark.train_stream[0]
print("len experience: ", len(experience.dataset))

# start = time.time()
# buffer = ReservoirSamplingBuffer(100)
# for exp in tqdm(fixed_size_experience_split(experience, 1)):
#     buffer.update_from_dataset(exp.dataset)
#
# end = time.time()
# duration = end - start
# print("ReservoirSampling Duration: ", duration)


start = time.time()
buffer = ParametricBuffer(100)
# Mock stands in for the strategy object normally passed to update(),
# exposing the current experience and its dataset.
for exp in tqdm(fixed_size_experience_split(experience, 1)):
    buffer.update(Mock(experience=exp, dataset=exp.dataset))

end = time.time()
duration = end - start
print("ParametricBuffer (random sampling) Duration: ", duration)

AntonioCarta avatar May 17 '22 11:05 AntonioCarta