Displaying 4 results from an estimated 4 matches for "jackarnestad".
2018 Apr 14
0
Efficient way to subset rows in R for dataset with 10^7 columns
...dy probably already using virtual memory which is saved to and from hard disk storage as needed.
Working in Spark with a distributed file system like Hadoop might solve some of these problems... but I haven't done real work with such tools.
On April 13, 2018 6:31:32 PM PDT, Jack Arnestad <jackarnestad at gmail.com> wrote:
>Yes unfortunately. The goal of the "outer" is to do feature selection
>before fitting it to a model.
>
>Is there a way it could be parallelized?
>
>Thanks!
>
>On Fri, Apr 13, 2018 at 9:08 PM, Jeff Newmiller
><jdnewmil at dcn.davis.ca....
2018 Apr 14
2
Efficient way to subset rows in R for dataset with 10^7 columns
I have a data.table with dimensions 100 by 10^7.
When I do
trainIndex <- caret::createDataPartition(
  df$status,
  p = .9,
  list = FALSE,
  times = 1
)
outerTrain <- df[trainIndex]
outerTest <- df[-trainIndex]
Subsetting the rows of df takes over 20 minutes.
What is the best way to efficiently subset this?
Thanks!
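One possible direction, sketched here under assumptions not stated in the post: a data.table stores each column as a separate vector, so a row subset has to touch every one of the 10^7 columns. If all the feature columns share one type, keeping them in a plain matrix and holding status in a separate vector makes the row subset a single vectorised operation. The dimensions below are scaled down, and the base-R stratified split is only a simple stand-in for caret::createDataPartition; the names x, status, n_row and n_col are hypothetical.

```r
# Scaled-down stand-in for the 100 x 10^7 table in the post.
set.seed(42)
n_row <- 100
n_col <- 10000  # the post has 10^7 columns; reduced so the sketch runs quickly
x <- matrix(rnorm(n_row * n_col), nrow = n_row, ncol = n_col)

# Assumption: the outcome is kept apart from the feature matrix.
status <- factor(rep(c("case", "control"), each = n_row / 2))

# Stratified 90/10 split in base R: sample 90% of the row indices per class.
trainIndex <- sort(unlist(
  lapply(
    split(seq_len(n_row), status),
    function(i) sample(i, size = round(0.9 * length(i)))
  ),
  use.names = FALSE
))

# Row subsetting on a matrix is one vectorised operation,
# rather than one subset per column.
outerTrain <- x[trainIndex, , drop = FALSE]
outerTest  <- x[-trainIndex, , drop = FALSE]
```

Whether this helps at the full 10^7 columns depends on available memory, since a dense numeric matrix of that size is roughly 8 GB on its own.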
2018 Apr 21
2
Removing columns from big.matrix which have only one value
I have a very large binary matrix, stored as a big.matrix to conserve
memory (it is over 2 GB otherwise: 5 million columns and 100 rows).
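library(bigmemory)  # provides as.big.matrix
r <- 100
c <- 10000
# Random 0/1 matrix, plus one constant column of 1s that should be removed
m4 <- matrix(sample(0:1, r*c, replace=TRUE), r, c)
m4 <- cbind(m4, 1)
m4 <- as.big.matrix(m4)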
I need to remove every column which has only one unique value (in this
case, only 0s or only 1s). Because of the number of columns, I
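A minimal sketch of one way to approach this, using a plain base-R matrix as a stand-in for the big.matrix: walk over the columns in chunks, flag a column as constant when its minimum equals its maximum, and keep the rest. The names m, n_col, chunk_size and keep are hypothetical, and the chunk size is an assumption to tune against available memory. The same chunked column indexing should carry over to a big.matrix, but that is not tested here.

```r
# Small stand-in for the matrix in the post (base-R matrix, not a big.matrix).
set.seed(1)
r <- 100
n_col <- 1000
m <- matrix(sample(0:1, r * n_col, replace = TRUE), r, n_col)
m <- cbind(m, 1)  # constant column of 1s, as in the post

# Examine the columns one chunk at a time so that only one block
# of columns needs to be pulled into memory at once.
chunk_size <- 250  # hypothetical value
keep <- logical(ncol(m))
for (start in seq(1, ncol(m), by = chunk_size)) {
  idx <- start:min(start + chunk_size - 1, ncol(m))
  block <- m[, idx, drop = FALSE]
  # A column has more than one unique value when min and max differ.
  keep[idx] <- apply(block, 2, min) != apply(block, 2, max)
}

# Drop the constant columns.
m_reduced <- m[, keep, drop = FALSE]
```

For strictly 0/1 data, comparing column sums against 0 and the row count would give the same flags; the min/max test is used here because it also works for non-binary values.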
2017 Dec 02
0
How can you find the optimal number of values to randomly sample to optimize random forest classification without trial and error?
I have data set up like the following:
control1 <- sample(1:75, 3947398, replace=TRUE)
control2 <- sample(1:75, 28793, replace=TRUE)
control3 <- sample(1:100, 392733, replace=TRUE)
control4 <- sample(1:75, 858383, replace=TRUE)
patient1 <- sample(1:100, 28048, replace=TRUE)
patient2 <- sample(1:50, 80400, replace=TRUE)
patient3 <- sample(1:100, 48239, replace=TRUE)
control