Displaying 4 results from an estimated 4 matches for "jackarnestad".

2018 Apr 14
0
Efficient way to subset rows in R for dataset with 10^7 columns
...dy probably already using virtual memory which is saved to and from hard disk storage as needed. Working in Spark with a distributed file system like Hadoop might solve some of these problems... but I haven't done real work with such tools.
On April 13, 2018 6:31:32 PM PDT, Jack Arnestad <jackarnestad at gmail.com> wrote:
>Yes unfortunately. The goal of the "outer" is to do feature selection
>before fitting it to a model.
>
>Is there a way it could be parallelized?
>
>Thanks!
>
>On Fri, Apr 13, 2018 at 9:08 PM, Jeff Newmiller
><jdnewmil at dcn.davis.ca....
2018 Apr 14
2
Efficient way to subset rows in R for dataset with 10^7 columns
I have a data.table with dimensions 100 by 10^7. When I do
    trainIndex <- caret::createDataPartition(
      df$status,
      p = .9,
      list = FALSE,
      times = 1
    )
    outerTrain <- df[trainIndex]
    outerTest <- df[-trainIndex]
Subsetting the rows of df takes over 20 minutes. What is the best way to efficiently subset this? Thanks! [[alternative
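One option that may be worth benchmarking, assuming every feature column has the same type, is to hold the features in a plain matrix and index its rows directly. A data.table stores each column as a separate vector, so a row subset has to touch all 10^7 of them one by one. The sketch below uses a small stand-in matrix, and a base-R stratified split replaces caret::createDataPartition so the example runs on its own. The names features, status, train_index, outer_train and outer_test are illustrative and do not come from the post.

```r
# Hypothetical small stand-in for the 100 x 10^7 table described in the post.
set.seed(42)
n_row <- 100
n_col <- 1000
features <- matrix(rnorm(n_row * n_col), n_row, n_col)
status <- factor(rep(c("control", "patient"), length.out = n_row))

# Simple stratified 90/10 split in base R. This is a stand-in for
# caret::createDataPartition, not a reimplementation of it.
rows_by_class <- split(seq_len(n_row), status)
train_index <- sort(unlist(
  lapply(rows_by_class, function(rows) {
    sample(rows, size = round(0.9 * length(rows)))
  }),
  use.names = FALSE
))

# Row subsetting on a matrix is a single indexing operation.
outer_train <- features[train_index, , drop = FALSE]
outer_test  <- features[-train_index, , drop = FALSE]
```

Whether this is actually faster on the full-size data depends on available memory and on all columns sharing one type, so timing it on a slice of the real table first would be prudent.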
2018 Apr 21
2
Removing columns from big.matrix which have only one value
I have a very large binary matrix, stored as a big.matrix to conserve memory (it is over 2 GB otherwise - 5 million columns and 100 rows).
    library(bigmemory)  # provides as.big.matrix
    r <- 100
    c <- 10000
    m4 <- matrix(sample(0:1, r * c, replace = TRUE), r, c)
    m4 <- cbind(m4, 1)
    m4 <- as.big.matrix(m4)
I need to remove every column which has only one unique value (in this case, only 0s or only 1s). Because of the number of columns, I
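A minimal sketch of one way to approach this: scan the columns in blocks, so only a limited number of columns is held as an ordinary matrix at a time, and keep those whose minimum differs from their maximum. The function name varying_columns and the block_size parameter are hypothetical, not from the post. The demonstration runs on an ordinary matrix. Applying the same function to a big.matrix assumes that m[, idx, drop = FALSE] returns an ordinary matrix for that block, which has not been verified here.

```r
# Return the indices of columns that contain more than one distinct value.
# Columns are processed in blocks of `block_size` to bound memory use.
varying_columns <- function(m, block_size = 1000L) {
  n_col <- ncol(m)
  if (n_col == 0L) return(integer(0))
  keep <- logical(n_col)
  for (start in seq(1L, n_col, by = block_size)) {
    idx <- start:min(start + block_size - 1L, n_col)
    block <- m[, idx, drop = FALSE]  # ordinary matrix for this block
    # A column varies if its minimum differs from its maximum.
    keep[idx] <- apply(block, 2L, min) != apply(block, 2L, max)
  }
  which(keep)
}

# Demonstration on an ordinary matrix built like the one in the post.
set.seed(1)
n_row <- 100
n_col <- 10000
m4 <- matrix(sample(0:1, n_row * n_col, replace = TRUE), n_row, n_col)
m4 <- cbind(m4, 1)  # the appended last column is constant
keep <- varying_columns(m4)
m4_reduced <- m4[, keep, drop = FALSE]
```

The min/max comparison works for any numeric matrix, not only binary ones. For strictly 0/1 data, comparing column sums against 0 and the row count would be an alternative.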
2017 Dec 02
0
How can you find the optimal number of values to randomly sample to optimize random forest classification without trial and error?
I have data set up like the following:
    control1 <- sample(1:75, 3947398, replace=TRUE)
    control2 <- sample(1:75, 28793, replace=TRUE)
    control3 <- sample(1:100, 392733, replace=TRUE)
    control4 <- sample(1:75, 858383, replace=TRUE)
    patient1 <- sample(1:100, 28048, replace=TRUE)
    patient2 <- sample(1:50, 80400, replace=TRUE)
    patient3 <- sample(1:100, 48239, replace=TRUE)
    control