high-performance {tabnet} profvis on CPU

Open cregouby opened this issue 4 years ago • 0 comments

This issue aims to improve tabnet performance by profiling it with common tools and identifying where to focus effort.

proposed performance script

The goal here is to use the largest batch size that can run on a CPU, in order to favor time spent in compute over time spent in data movement.
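To make the batch-size reasoning concrete, here is a rough arithmetic sketch for the values used in the script (batch_size = 1024^2, virtual_batch_size = 262144). It rests on two assumptions that are not stated in this issue: that train-0.1m.csv holds about 100,000 rows, as its name suggests, and that ghost batch normalization splits each batch into ceiling(rows / virtual_batch_size) chunks.

```r
# Assumption: train-0.1m.csv has roughly 100,000 rows (inferred from its name).
n_train <- 1e5

batch_size <- 1024^2          # 1,048,576 rows requested per batch
virtual_batch_size <- 262144  # 2^18 rows per virtual (ghost) batch

# A batch cannot hold more rows than the dataset provides.
rows_per_step <- min(n_train, batch_size)

# Number of optimizer steps needed to see every row once.
steps_per_epoch <- ceiling(n_train / batch_size)

# Assumption: each batch is split into ceiling(rows / virtual_batch_size) chunks.
virtual_batches_per_step <- ceiling(rows_per_step / virtual_batch_size)

# The same split for a dataset large enough to fill the whole batch.
virtual_batches_full_batch <- ceiling(batch_size / virtual_batch_size)

cat("rows per step:            ", rows_per_step, "\n")
cat("steps per epoch:          ", steps_per_epoch, "\n")
cat("virtual batches per step: ", virtual_batches_per_step, "\n")
cat("virtual batches if full:  ", virtual_batches_full_batch, "\n")
```

Under these assumptions the whole training set fits in a single batch, so each epoch is one large step and data movement between steps is minimal. The virtual batch size would only start splitting batches on a dataset larger than 262,144 rows.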

library(tabnet)

# use local caching: pins downloads each file once and returns the cached path
d_train <- data.table::fread(pins::pin("https://s3.amazonaws.com/benchm-ml--main/train-0.1m.csv"), stringsAsFactors = TRUE)
d_test <- data.table::fread(pins::pin("https://s3.amazonaws.com/benchm-ml--main/test.csv"), stringsAsFactors = TRUE)

## align categorical values (factor levels) between train and test
d_train_test <- rbind(d_train, d_test)
n1 <- nrow(d_train)
n2 <- nrow(d_test)
d_train <- d_train_test[1:n1,]
d_test <- d_train_test[(n1+1):(n1+n2),]

# time 5 epochs on CPU with a very large batch
system.time({
  md <- tabnet_fit(dep_delayed_15min ~ ., d_train, device = "cpu",
                   epochs = 5, batch_size = 1024^2,
                   virtual_batch_size = 262144, verbose = TRUE)
})
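Since the result table asks for a profvis flame graph, the same kind of call could be wrapped in profvis::profvis() and saved as a standalone HTML file. The following is only a minimal sketch: d_small, its column names other than dep_delayed_15min, the reduced batch sizes, and the output file name are hypothetical stand-ins chosen so the example runs on its own. For real measurements, d_train and the settings from the script above would be used instead.

```r
library(tabnet)
library(profvis)

# Hypothetical stand-in data so the sketch is self-contained;
# swap in d_train from the script above for real measurements.
set.seed(42)
n <- 2048
d_small <- data.frame(
  x1 = rnorm(n),
  x2 = runif(n),
  carrier = factor(sample(c("AA", "UA", "DL"), n, replace = TRUE)),
  dep_delayed_15min = factor(sample(c("N", "Y"), n, replace = TRUE))
)

# Sample the call stack every 10 ms while the model trains on CPU.
p <- profvis({
  md <- tabnet_fit(dep_delayed_15min ~ ., d_small, device = "cpu",
                   epochs = 2, batch_size = 1024,
                   virtual_batch_size = 256, verbose = TRUE)
}, interval = 0.01)

# Save the interactive flame graph so it can be attached to the issue.
# The file name is an arbitrary choice.
htmlwidgets::saveWidget(p, "tabnet-cpu-profvis.html")
```

The saved HTML file can be opened in a browser. Running the same script on Linux, Windows, and macOS would give one file per platform to compare.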

proposed result table

CPU Linux

Actual CPU profile	Expected CPU profile	Actual profvis flame graph

Profvis data

CPU Windows

Actual CPU profile	Expected CPU profile	Actual profvis flame graph

Profvis data

CPU macOS

Actual CPU profile	Expected CPU profile	Actual profvis flame graph

Profvis data

Jan 31 '22 17:01 cregouby