thr3ads.net - search: "arnestad"

Displaying 4 results from an estimated 4 matches for "arnestad".

Efficient way to subset rows in R for dataset with 10^7 columns

2018 Apr 14

...You are probably already using virtual memory, which is saved to and from hard disk storage as needed. Working in Spark with a distributed file system like Hadoop might solve some of these problems... but I haven't done real work with such tools.

On April 13, 2018 6:31:32 PM PDT, Jack Arnestad <jackarnestad at gmail.com> wrote:
>Yes unfortunately. The goal of the "outer" is to do feature selection
>before fitting it to a model.
>
>Is there a way it could be parallelized?
>
>Thanks!
>
>On Fri, Apr 13, 2018 at 9:08 PM, Jeff Newmiller
><jdnewmil...

Removing columns from big.matrix which have only one value

2018 Apr 21

I have a very large binary matrix, stored as a big.matrix to conserve memory (it is over 2 GB otherwise: 5 million columns and 100 rows).

    r <- 100
    c <- 10000
    m4 <- matrix(sample(0:1, r*c, replace=TRUE), r, c)
    m4 <- cbind(m4, 1)
    m4 <- as.big.matrix(m4)

I need to remove every column which has only one unique value (in this case, only 0s or only 1s). Because of the number of columns, I
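One possible approach to the question above, sketched on an ordinary in-memory matrix rather than a big.matrix: for a strictly 0/1 matrix, a column is constant exactly when its sum is 0 or equals the number of rows. The small dimensions and the names `n_row`, `n_col`, `keep` and `m_reduced` are illustrative assumptions, not taken from the thread.

```r
# Minimal sketch on a base R matrix (assumption: entries are only 0 or 1).
set.seed(1)
n_row <- 100
n_col <- 1000  # small stand-in for the millions of columns in the post
m <- matrix(sample(0:1, n_row * n_col, replace = TRUE), n_row, n_col)
m <- cbind(m, 1)  # append a constant column of 1s, as in the post

# A 0/1 column has a single unique value iff its sum is 0 or nrow(m).
col_totals <- colSums(m)
keep <- col_totals > 0 & col_totals < nrow(m)

# drop = FALSE keeps the result a matrix even if one column survives.
m_reduced <- m[, keep, drop = FALSE]
```

For a matrix too large to hold in memory, the same check could presumably be applied to blocks of columns in turn, collecting `keep` block by block.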

Efficient way to subset rows in R for dataset with 10^7 columns

2018 Apr 14

I have a data.table with dimensions 100 by 10^7. When I do

    trainIndex <- caret::createDataPartition(
      df$status, p = .9, list = FALSE, times = 1
    )
    outerTrain <- df[trainIndex]
    outerTest <- df[-trainIndex]

subsetting the rows of df takes over 20 minutes. What is the best way to efficiently subset this? Thanks!
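A data.table stores each column as a separate vector, so a row subset has to touch every one of the 10^7 columns. If all feature columns share one type, one option (an assumption here, not something the thread confirms) is to hold the features in a matrix, where a row subset is a single indexing operation. A minimal base R sketch with small illustrative dimensions and hypothetical names:

```r
# Minimal sketch (assumption: all features are numeric and fit in one matrix).
set.seed(1)
n_row <- 100
n_col <- 1000  # small stand-in for 10^7 columns
x <- matrix(rnorm(n_row * n_col), n_row, n_col)
status <- factor(sample(c("case", "control"), n_row, replace = TRUE))

# Plain random 90% split; caret::createDataPartition would additionally
# stratify the split by the levels of status.
train_index <- sort(sample(n_row, size = floor(0.9 * n_row)))

# Row subsetting a matrix is one indexing call, not one call per column.
outer_train <- x[train_index, , drop = FALSE]
outer_test  <- x[-train_index, , drop = FALSE]
status_train <- status[train_index]
status_test  <- status[-train_index]
```

Whether this helps in practice depends on the matrix fitting in memory, which the reply in this thread suggests may already be a problem at this size.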

How can you find the optimal number of values to randomly sample to optimize random forest classification without trial and error?

2017 Dec 02

I have data set up like the following:

    control1 <- sample(1:75, 3947398, replace=TRUE)
    control2 <- sample(1:75, 28793, replace=TRUE)
    control3 <- sample(1:100, 392733, replace=TRUE)
    control4 <- sample(1:75, 858383, replace=TRUE)
    patient1 <- sample(1:100, 28048, replace=TRUE)
    patient2 <- sample(1:50, 80400, replace=TRUE)
    patient3 <- sample(1:100, 48239, replace=TRUE)
    control
