thr3ads.net - R help - [R] Data Extraction

Muhuri, Pradip (SAMHSA/CBHSQ)

2012-Nov-22 17:20 UTC

[R] Data Extraction - benchmark()

Hi Berend,

I see you are one of the contributors to the rbenchmark package.

I am sorry that I am bothering you again. I have tried to run your code
(slightly tweaked) involving the benchmark function, and I am getting the
following error message. What am I doing wrong?


Error in benchmark(d1 <- s1(df), d2 <- s2(df), d3 <- s3(df), d4 <- s4(df),  :
  could not find function "s1"
>
> identical (d1,d2), identical (d1,d3), identical (d1,d4), identical (d1,d5), identical (d1,d6)
Error: unexpected ',' in "identical (d1,d2),"
> sessionInfo ()
R version 2.15.1 (2012-06-22)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rbenchmark_1.0.0

loaded via a namespace (and not attached):
[1] tools_2.15.1



I would appreciate your help if your time permits.


Thanks and regards,

Pradip Muhuri

#####  Berend's code extended
N <- 100000
set.seed(13)
df<-data.frame(matrix(sample(c(1:10,NA),N, replace=TRUE),ncol=50))
s1 <- df[complete.cases(df),]
s2 <- na.omit(df)
s3 <- df[apply(df, 1, function(x)all(!is.na(x))), ]
s4 <- function(df) {df[apply(df, 1, function(x)all(!is.na(x))),][,1:ncol(df)]}
s5 <- function(df) {df[!is.na(rowSums(df)),][1:ncol(df)]}
s6 <- function(df) {df[complete.cases(df),][1:ncol(df)]}

require(rbenchmark)
 
benchmark( d1 <- s1(df), d2 <- s2(df), d3 <- s3(df), d4 <- s4(df),
           d5 <- s5(df), d6 <- s6(df),
           columns=c("test","elapsed", "relative", "replications") )

identical (d1,d2), identical (d1,d3), identical (d1,d4), identical (d1,d5), identical (d1,d6)




________________________________________
From: Berend Hasselman [bhh at xs4all.nl]
Sent: Thursday, November 22, 2012 11:03 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help at r-project.org
Subject: Re: [R] Data Extraction

On 22-11-2012, at 16:50, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
> Hi Berend,
>
> You have compared all 3 ways.  ... very nicely evaluated.
>
Bert's solution is indeed nice and simple. But Petr's solution is still the quickest:
> N <- 100000
> set.seed(13)
> df <- data.frame(matrix(sample(c(1:10,NA),N,replace=TRUE),ncol=50))
> library(rbenchmark)
>
> f1 <- function(df) {df[apply(df, 1, function(x)all(!is.na(x))),]}
> f2 <- function(df) {df[!is.na(rowSums(df)),]}
> f3 <- function(df) {df[complete.cases(df),]}
> f4 <- function(df) {data.frame(na.omit(df))}
> benchmark(d1 <- f1(df), d2 <- f2(df), d3 <- f3(df), d4 <- f4(df), columns=c("test","elapsed", "relative", "replications"))
          test elapsed relative replications
1 d1 <- f1(df)   3.588   14.888          100
2 d2 <- f2(df)   0.403    1.672          100
3 d3 <- f3(df)   0.241    1.000          100
4 d4 <- f4(df)   0.557    2.311          100
>
> identical(d1,d2)
[1] TRUE
> identical(d1,d3)
[1] TRUE
> identical(d1,d4)
[1] TRUE

Berend

Berend Hasselman

2012-Nov-22 17:42 UTC


[R] Data Extraction - benchmark()

On 22-11-2012, at 18:20, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
> Hi Berend,
> 
> I see you are one of the contributors to the rbenchmark package.
>
> I am sorry that I am bothering you again. I have tried to run your code
> (slightly tweaked) involving the benchmark function, and I am getting the
> following error message. What am I doing wrong?
> 
> 
> Error in benchmark(d1 <- s1(df), d2 <- s2(df), d3 <- s3(df), d4 <- s4(df),  :
>  could not find function "s1"
> 

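Because you haven't defined a function s1 (or s2 or s3, for that matter).
You did s1 <- df[complete.cases(df),], which assigns the resulting data frame
to s1 instead of defining a function, so s1(df) cannot be called.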

Berend
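
To make the point above concrete, here is a minimal sketch of how the extended code might look once s1, s2 and s3 are wrapped in function(df), following the pattern of f1 to f4 earlier in the thread. Using data.frame(na.omit(df)) for s2 is an assumption borrowed from f4. The sketch also sidesteps the second error in the question (unexpected ','): several identical() calls cannot be chained with commas on one line, so the comparisons are run one at a time via sapply(). The names results and same, and the choice of replications = 10, are illustrative only.

```r
# Same test data as in the thread.
N <- 100000
set.seed(13)
df <- data.frame(matrix(sample(c(1:10, NA), N, replace = TRUE), ncol = 50))

# Every candidate is now a function of df, so benchmark() can call it.
s1 <- function(df) df[complete.cases(df), ]
s2 <- function(df) data.frame(na.omit(df))   # assumption: same form as f4
s3 <- function(df) df[apply(df, 1, function(x) all(!is.na(x))), ]
s4 <- function(df) df[apply(df, 1, function(x) all(!is.na(x))), ][, 1:ncol(df)]
s5 <- function(df) df[!is.na(rowSums(df)), ][1:ncol(df)]
s6 <- function(df) df[complete.cases(df), ][1:ncol(df)]

# Time the candidates only if rbenchmark is installed.
if (requireNamespace("rbenchmark", quietly = TRUE)) {
  print(rbenchmark::benchmark(
    d1 <- s1(df), d2 <- s2(df), d3 <- s3(df),
    d4 <- s4(df), d5 <- s5(df), d6 <- s6(df),
    columns = c("test", "elapsed", "relative", "replications"),
    replications = 10
  ))
}

# Compare each result with the first one, one identical() call per result.
results <- list(d1 = s1(df), d2 = s2(df), d3 = s3(df),
                d4 = s4(df), d5 = s5(df), d6 = s6(df))
same <- sapply(results[-1], function(d) identical(results[[1]], d))
print(same)
```

Writing the checks as separate statements, for example identical(d1, d2) on one line and identical(d1, d3) on the next, would work just as well; sapply() only saves some typing.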