datastructures icon indicating copy to clipboard operation
datastructures copied to clipboard

Is `insert` slow?

Open talgalili opened this issue 7 years ago • 2 comments

I came across your packages (which is cool!), I've played with adding an as.list.stack function, but it appears to be very slow. Below is the code, please let me know if I've missed anything:

# install.packages("datastructures")
# install.packages("microbenchmark")

library("datastructures")
library("microbenchmark")

as.list.stack <- function(x, ..., mode) {
  # x is a stack
  # mode is based on the class of the top element
  n <- size(x)
  top <- peek(x)
  if(missing(mode)) {
    if(is.list(top)) {
      mode <- "list"
    } else {
      mode <- class(top)
    }
  }
  out <- vector(n, mode = mode)
  for(i in n:1) {
    out[[i]] <- pop(x)
  }
  
  out
}


##
# Basic benchmarking:
f1 <- function(n = 100) {
  x <- c()
  for(i in 1:n) {
    x <- c(x, i)
  }
  x
}

f2 <- function(n = 100) {
  x <- stack()
  for(i in 1:n) {
    insert(x, i)
  }
  as.list(x)
}

# f2(10)

Benchmarking:

> library("microbenchmark")
> microbenchmark(f1(100),f2(100), times = 10)
Unit: microseconds
    expr       min        lq       mean    median        uq       max neval
 f1(100)    73.661    99.357   141.3709   124.345   197.047   210.013    10
 f2(100) 10904.531 11115.825 13590.8941 11988.055 17615.234 18755.162    10
> microbenchmark(f1(10000),f2(10000), times = 10)
Unit: milliseconds
      expr     min       lq     mean   median       uq       max neval
 f1(10000) 190.201 203.9211 230.4935 227.3870 246.1155  291.2356    10
 f2(10000) 581.717 642.4793 699.8962 655.3675 701.4607 1004.4701    10

When doing some profiling, it seems that the insert operation is very expansive:

library(profvis)
profvis({
  n <- 10000
  x <- c()
  for(i in 1:n) {
    x <- c(x, i)
  }
  x
})
profvis({
  n <- 10000
  x <- stack()
  for(i in 1:n) {
    insert(x, i)
  }
  as.list(x)
})

I'm suspecting this might be because insert is a generic method and it has to do a lookup everytime I call it (and since it is S4, it might make it a bit slow). I've tried to find the function directly but failed to get it:

> # showMethods("insert")
> getMethod("insert", "stack")
Error in getMethod("insert", "stack") : 
  no method found for function 'insert' and signature stack

Any thoughts?

talgalili avatar Nov 09 '18 20:11 talgalili

Hey, yes you are right. Iteratively inserting scalars is rather slow compared to inserting an entire vector. So I'd just do

insert(x, seq(n))

I guess that's due to the Rcpp/Boost call. I haven't found to time to benchmark the package, so your efforts are highly appreciated, thanks.

dirmeier avatar Nov 12 '18 10:11 dirmeier

Interesting. But then, would that be faster than R?! (worth checking)

Do you see a way in which it could become faster?!

talgalili avatar Nov 12 '18 10:11 talgalili