I would suggest using the microbenchmark package for the timing
comparison. It runs each expression many times, which gives a more
meaningful comparison than a single system.time() call.
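
As a rough sketch of what that could look like (the data frame 'df', its
columns, and the choice of times = 20 are made up for illustration; they
stand in for your downloaded file and are not taken from it):

```r
# Sketch only: 'df' is synthetic stand-in data, not the poster's file.
library(microbenchmark)

set.seed(1)
n <- 1e5
df <- data.frame(
  gender2 = sample(c("female", "male", "other", NA), n, replace = TRUE),
  a = rnorm(n),
  b = rnorm(n),
  stringsAsFactors = FALSE
)

# Each expression is evaluated 'times' times; printing the result
# summarises the distribution of timings for each expression.
res <- microbenchmark(
  with_which = df[which(df$gender2 == "other"), ],
  logical    = df[df$gender2 == "other", ],
  times = 20
)
print(res)
```

Looking at the median and the spread, rather than one run, should show
whether a 20-40ms gap is consistent or just noise.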
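One possible reason for the difference is the number of missing values
in your data (along with the number of columns). With logical indexing,
every NA in the condition produces an NA element in the result, while
which() drops the NAs, so the two calls may not be returning the same
thing. Consider the difference in the following results: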
> x <- c(1,2,NA)
> x[x==1]
[1]  1 NA
> x[which(x==1)]
[1] 1
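
The same thing happens with rows of a data frame. A small sketch (the
data frame 'd' and its columns 'g' and 'v' are invented for
illustration):

```r
# Toy data: one missing value in the column used for selection.
d <- data.frame(g = c("other", "male", NA, "other"), v = 1:4,
                stringsAsFactors = FALSE)

by_logical <- d[d$g == "other", ]         # the NA in the condition yields a row of NAs
by_which   <- d[which(d$g == "other"), ]  # which() keeps only the TRUE positions

nrow(by_logical)  # 3: two matches plus one all-NA row
nrow(by_which)    # 2: just the matches
```

So with many NAs, the logical version has to build (and fill with NA in
every column) extra rows that the which() version never creates.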
On Sat, Oct 10, 2020 at 5:25 PM 1/k^c <kchamberln at gmail.com> wrote:
>
> Hi R-helpers,
>
> Does anyone know why adding which() makes the select call more
> efficient than just using logical selection in a dataframe? Doesn't
> which() technically add another conversion/function call on top of the
> logical selection? Here is a reproducible example with a slight
> difference in timing.
>
> # Surrogate data - the timing here isn't interesting
> urltext <- paste("https://drive.google.com/",
> "uc?id=1AZ-s1EgZXs4M_XF3YYEaKjjMMvRQ7",
> "-h8&export=download", sep="")
> download.file(url=urltext, destfile="tempfile.csv") # download file first
> dat <- read.csv("tempfile.csv", stringsAsFactors = FALSE, header=TRUE,
> nrows=2.5e6) # read the file; 'nrows' is a slight
> # overestimate
> dat <- dat[,1:3] # select just the first 3 columns
> head(dat, 10) # print the first 10 rows
>
> # Select using which() as the final step ~ 90ms total time on my macbook air
> system.time(
> head(
> dat[which(dat$gender2=="other"), ]),
> gcFirst=TRUE)
>
> # Select skipping which() ~130ms total time
> system.time(
> head(
> dat[dat$gender2=="other", ]),
> gcFirst=TRUE)
>
> Now I would think that the second one without which() would be more
> efficient. However, every time I run these, the first version, with
> which(), is more efficient by about 20ms of system time and 20ms of
> user time. Does anyone know why this is?
>
> Cheers!
> Keith
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com