Use `.item()` to collect results and losses-as-results only at an epoch's end
Calling `.item()` to store a result inside a routine call forces the GPU to synchronize, because the lazily evaluated tensor must be materialized as a Python number. This is suboptimal: kernel scheduling (CPU work) and kernel execution (GPU work) should run as parallel pipelines as much as possible, and each premature synchronization stalls that pipeline.
On the other hand, we do need _all_ epoch results at the end of an epoch for visualization purposes.
As @obilaniu has noted elsewhere, it's better to use `.detach()` to store results within a training step, and then convert the results (and losses-as-results) to Python/NumPy values at the moment they are actually needed, i.e. at the end of the epoch.
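To make the contrast concrete, here is a minimal sketch of the pattern (the `train_epoch` helper and its arguments are illustrative, not part of the codebase): per-step losses are kept as detached tensors, and the single CPU/GPU synchronization happens once, when the epoch's values are gathered.

```python
import torch

def train_epoch(model, optimizer, loader, loss_fn):
    step_losses = []
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        # .detach() drops the autograd graph but does NOT force a sync;
        # loss.item() here would block until the queued kernels finish.
        step_losses.append(loss.detach())
    # One synchronization point per epoch: stack, move to CPU, convert.
    return torch.stack(step_losses).cpu().numpy()
```

On CPU-only tensors the timing difference vanishes, but the structure is the same: defer the tensor-to-Python conversion to the point where the numbers are actually consumed.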
I see that fetching is already done properly at the end of the epoch via the `_lib.utils.convert_to_numpy` function.
So using `.detach()` instead of `.item()` in the model plugin implementation should be sufficient, I think.
Interesting, I wasn't aware of this distinction. Is there a backend solution that can manage this, or is it up to the user when they design routines?
I have implemented a solution with a function `nested_detach`, applied to the per-routine isolated results. I will open a PR soon. The user can now provide either a plain float, a NumPy ndarray, or a torch tensor (detached or not).
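Since the PR isn't linked here, the following is only a hypothetical sketch of what a `nested_detach` helper could look like: it recurses through common containers, detaches anything tensor-like, and passes plain floats and NumPy arrays through unchanged (matching the accepted input types listed above).

```python
def nested_detach(obj):
    """Recursively detach tensors inside nested containers.

    Hypothetical sketch of the helper mentioned above. Plain floats,
    ints, and NumPy ndarrays pass through unchanged; anything exposing
    a .detach() method (e.g. a torch.Tensor, attached to the autograd
    graph or not) is detached.
    """
    if isinstance(obj, dict):
        return {k: nested_detach(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(nested_detach(v) for v in obj)
    if hasattr(obj, "detach"):  # duck-typed tensor check
        return obj.detach()
    return obj  # float, int, np.ndarray, ...
```

Duck-typing on `.detach()` keeps the helper importable without a hard torch dependency; the actual implementation in the PR may of course differ.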