AvalancheConcatDataset is slow for online buffer updates
As pointed out in PR #1024, in online strategies buffers are updated after every (sub-)experience, so creating an instance of AvalancheConcatDataset after every experience slows down the update process. This is not a problem for regular/offline strategies. Keeping the data in a plain list (possibly together with their transformations) and using only the first N (eg: 1<n<=10) samples to create a small buffer dataset is always an option, but does anyone know a better way to handle creating large datasets efficiently?
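For reference, the list-based idea could look roughly like this. A minimal sketch in plain Python (the `ListBuffer` class and its method names are hypothetical, not Avalanche API): raw samples stay in a Python list with their reservoir weights, and no dataset object is created during the update itself.

```python
import random

class ListBuffer:
    """Hypothetical sketch: keep raw samples in a plain list so that
    updates never build a dataset object; a dataset is materialized
    from self.samples only when the buffer is actually consumed."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.samples = []   # raw samples, parallel to self.weights
        self.weights = []   # reservoir weights, kept sorted descending

    def update(self, new_samples):
        # assign a random weight to each incoming sample, merge with
        # the current buffer, and keep the top-max_size by weight
        for s in new_samples:
            self.samples.append(s)
            self.weights.append(random.random())
        order = sorted(range(len(self.weights)),
                       key=lambda i: self.weights[i], reverse=True)
        keep = order[: self.max_size]
        self.samples = [self.samples[i] for i in keep]
        self.weights = [self.weights[i] for i in keep]

buffer = ListBuffer(3)
buffer.update([(i, i % 2) for i in range(10)])  # ten (x, y) pairs
```

The update cost here is O(buffer size) list work per step, with no dataset wrapping; the trade-off is that transformations and dataset semantics have to be reapplied when the small buffer dataset is finally built.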
Are you sure that the problem is AvalancheConcatDataset? I tried to measure its runtime some time ago and it was quite fast, but I may be misremembering. Can you check the runtime of the buffer update without any other strategy code?
So, to find the bottleneck in the update_from_dataset function of ReservoirSamplingBuffer, I added the following condition to skip dataset concatenation once the buffer is full, as a test to confirm that it's the dataset concatenation that is slowing down the update:
new_weights = torch.rand(len(new_data))
cat_weights = torch.cat([new_weights, self._buffer_weights])
# ADDED this:
if len(self.buffer) == self.max_size:
    cat_data = self.buffer
else:
    cat_data = AvalancheConcatDataset([new_data, self.buffer])
sorted_weights, sorted_idxs = cat_weights.sort(descending=True)
buffer_idxs = sorted_idxs[: self.max_size]
self.buffer = AvalancheSubset(cat_data, buffer_idxs)
self._buffer_weights = sorted_weights[: self.max_size]
This was still a bit slower than online naive (which was expected) but much faster than dataset concatenation after each experience.
These are the results that I get from a quick profiling script after 11552 iterations (batch size = 1):
- reservoir sampling: 116.8 sec
- random sampling: 71.3 sec
Not optimal but not that bad as a starting point, considering that the training loop will often be much more expensive.
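A plausible explanation for the slowdown is that each update wraps the previous buffer in yet another subset/concat layer, so every lookup has to walk a chain whose depth grows with the number of updates. A toy sketch of the effect (the `Subset` class and `flatten` helper here are hypothetical illustrations, not Avalanche's actual classes):

```python
class Subset:
    """Toy index-remapping wrapper (hypothetical, not AvalancheSubset)."""
    def __init__(self, data, idxs):
        self.data, self.idxs = data, idxs
    def __getitem__(self, i):
        return self.data[self.idxs[i]]
    def __len__(self):
        return len(self.idxs)

def flatten(subset):
    # compose the index maps eagerly so a lookup hits the base data directly
    idxs, node = subset.idxs, subset.data
    while isinstance(node, Subset):
        idxs = [node.idxs[i] for i in idxs]
        node = node.data
    return Subset(node, idxs)

base = list(range(100))
ds = base
for _ in range(500):                       # one wrap per "buffer update"
    ds = Subset(ds, list(range(len(ds))))

# each __getitem__ now traverses 500 layers of indirection
depth, node = 0, ds
while isinstance(node, Subset):
    depth += 1
    node = node.data

flat = flatten(ds)                         # one layer, same contents
```

If nesting is indeed the cost, eagerly flattening the index maps on each update (as `flatten` does) keeps lookups O(1) regardless of how many updates have happened.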
from unittest.mock import Mock
import time
from os.path import expanduser

from tqdm import tqdm

from avalanche.benchmarks import fixed_size_experience_split, SplitMNIST
from avalanche.training import ReservoirSamplingBuffer
from avalanche.training import ParametricBuffer

benchmark = SplitMNIST(
    n_experiences=5,
    dataset_root=expanduser("~") + "/.avalanche/data/mnist/",
)
experience = benchmark.train_stream[0]
print("len experience: ", len(experience.dataset))

# start = time.time()
# buffer = ReservoirSamplingBuffer(100)
# for exp in tqdm(fixed_size_experience_split(experience, 1)):
#     buffer.update_from_dataset(exp.dataset)
#
# end = time.time()
# duration = end - start
# print("ReservoirSampling Duration: ", duration)

start = time.time()
buffer = ParametricBuffer(100)
for exp in tqdm(fixed_size_experience_split(experience, 1)):
    buffer.update(Mock(experience=exp, dataset=exp.dataset))
end = time.time()
duration = end - start
print("ParametricBuffer (random sampling) Duration: ", duration)