2018 May 25
2
how to make the code more efficient using lapply
...ite slow. How can I make it faster using the lapply function? Thanks in advance!
temp.df<-c() # create an empty list to store the extracted result from each excel file inside for-loop
for (i in list.files()) { # loop through each excel file in the directory
temp<-read_xlsx(i,sheet=1,range=cell_cols(c(1,30,38:42))) # from package "readxl" to read in excel file
temp<-temp[grep("^geneA$|^geneB$|^geneC$",temp$Id),] # extract rows based on temp$id
names(temp)<-gsub("^.*] ","",names(temp)) # clean up column names
temp.df<-append(temp.df,...
2018 May 25
0
how to make the code more efficient using lapply
...move that section outside of the loop. It will be executed when the loop finishes. As it is, you are calling list.files() each time through the loop, which could be slow.
In any case, here's a possible way to do it. Warning: untested!
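f <- function(fn) {
  # read the selected columns from the first sheet (read_xlsx is from package "readxl")
  temp <- read_xlsx(fn, sheet=1, range=cell_cols(c(1,30,38:42)))
  # keep only the rows for the genes of interest and return them
  temp[temp$Id %in% c("geneA","geneB","geneC"), ]
}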
myL <- lapply( X=list.files(), FUN=f )
temp.df.all<-do.call("rbind",myL)
names(temp.df.all)<-gsub("^.*] ","",names(temp.df.all))
write_xlsx(temp.df.all,...
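For anyone who wants to try the lapply / do.call("rbind") pattern without a directory of Excel files, here is a minimal sketch in base R only. It is not the code from the reply above: CSV files in a temporary directory stand in for the Excel files, read.csv stands in for read_xlsx, and the names demo_dir, make_demo_file and read_one, as well as the data, are made up for illustration.

```r
# Hypothetical stand-in data: three small CSV files in a temporary directory.
demo_dir <- file.path(tempdir(), "lapply_demo")
dir.create(demo_dir, showWarnings = FALSE)

make_demo_file <- function(k) {
  df <- data.frame(Id = c("geneA", "geneB", "geneX"),
                   value = k * c(1, 2, 3),
                   stringsAsFactors = FALSE)
  write.csv(df, file.path(demo_dir, paste0("file", k, ".csv")), row.names = FALSE)
}
invisible(lapply(1:3, make_demo_file))

# Call list.files() once and keep the result in a vector.
infiles <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read one file and return only the rows of interest.
read_one <- function(fn) {
  temp <- read.csv(fn, stringsAsFactors = FALSE)
  temp[temp$Id %in% c("geneA", "geneB", "geneC"), ]
}

# One data frame per file, then a single rbind at the end.
myL <- lapply(infiles, read_one)
result <- do.call("rbind", myL)
print(result)
```

The point of the pattern is that each file is filtered as it is read, and the pieces are combined once at the end instead of growing an object on every pass through a loop.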
2018 May 25
1
how to make the code more efficient using lapply
...down (though probably not by much). Call it outside the loop, save the results in a vector, and use the vector inside the loop.
Here's another way (also untested).
infiles <- list.files()
nfiles <- length(infiles)
## read the first file
dfall <- read_xlsx(infiles[1], sheet=1, range=cell_cols(c(1,30,38:42)))
dfall <- dfall[dfall$Id %in% c("geneA","geneB","geneC") , ]
## I'm going to assume the colnames are all the same on input
## if that's wrong, then they have to be fixed inside the loop
## read the remaining files, appending their contents...
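A rough sketch of what a read-first-then-append loop of this kind might look like, in base R only. This is not the continuation of the code above, which is cut off here. The list fake_files, the function read_fake and the data are hypothetical stand-ins for the Excel files on disk and for read_xlsx.

```r
# Hypothetical stand-in for the files on disk: a named list of data frames.
fake_files <- list(
  a.xlsx = data.frame(Id = c("geneA", "geneZ"), value = c(1, 2), stringsAsFactors = FALSE),
  b.xlsx = data.frame(Id = c("geneB", "geneC"), value = c(3, 4), stringsAsFactors = FALSE),
  c.xlsx = data.frame(Id = c("geneY", "geneA"), value = c(5, 6), stringsAsFactors = FALSE)
)
read_fake <- function(fn) fake_files[[fn]]  # stands in for read_xlsx()

keep <- c("geneA", "geneB", "geneC")
infiles <- names(fake_files)                # stands in for list.files(), called once
nfiles <- length(infiles)

# Read and filter the first file.
dfall <- read_fake(infiles[1])
dfall <- dfall[dfall$Id %in% keep, ]

# Read the remaining files, appending the filtered rows of each.
# seq_len(nfiles)[-1] is empty when there is only one file, unlike 2:nfiles.
for (i in seq_len(nfiles)[-1]) {
  nxt <- read_fake(infiles[i])
  dfall <- rbind(dfall, nxt[nxt$Id %in% keep, ])
}
print(dfall)
```

Calling rbind inside the loop copies the accumulated data frame on every pass, so with many files it is usually cheaper to collect the pieces in a list and bind them once at the end.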