thr3ads.net - R help - [R] by (tapply) and for loop differences [Jul 2005]

Bashir Saghir (Aztek Global)

2005-Jul-05 10:04 UTC

[R] by (tapply) and for loop differences

I am getting a difference in results when running some analysis using by and
tapply compared to using a for loop. I've tried searching the web but had no
luck with the keywords I used.

I've attached a simple example below to illustrate my problem. I get a
difference in the mean of yvar, diff and the p-value using tapply & by
compared to a for loop. I cannot see what I am doing wrong. Can anyone help?
> # Simulate some data - I'll do 2 simulations...
> 
> xvar = rnorm(40, 20, 5)
> yvar = rnorm(40, 22, 2)
> num = factor(rep(1:2, each=20))
> sdat = data.frame(cbind(num, xvar, yvar))
> 
> # Define a function to do a simple t test and return some values...
> 
> kindtest = function(varx, vary){
+    res = t.test(varx, vary)
+    x.mn = res$estimate[1]
+    y.mn = res$estimate[2]
+    diff = y.mn-x.mn
+    pval = res$p.value
+    cat("Mean xvar =", x.mn, " Mean yvar =", y.mn)
+    cat(" diff =", diff, "  p-value=", pval, "\n\n")
+    list(x.mn=x.mn, y.mn=y.mn, diff=diff, pval=pval)
+ }

## Results from by and tapply
> attach(sdat)
>   bres = by(xvar, num, kindtest, yvar)
Mean xvar = 19.8904  Mean yvar = 21.97729 diff = 2.086891   p-value= 0.06222805 
Mean xvar = 19.88329  Mean yvar = 21.97729 diff = 2.093996   p-value= 0.05245329 
>   tres = tapply(xvar, num, kindtest, yvar)
Mean xvar = 19.8904  Mean yvar = 21.97729 diff = 2.086891   p-value= 0.06222805 
Mean xvar = 19.88329  Mean yvar = 21.97729 diff = 2.093996   p-value= 0.05245329 
> detach(sdat,1)
## Results from for
> for(i in 1:2) {
+   subdat= subset(sdat, num==i)
+   kindtest(subdat$xvar, subdat$yvar)
+ }
Mean xvar = 19.8904  Mean yvar = 21.98615 diff = 2.095746   p-value= 0.07319223 
Mean xvar = 19.88329  Mean yvar = 21.96843 diff = 2.085141   p-value= 0.05850057 


OKAY - I'm going to be brave and show you that I am still on version 1.9.0! I
asked the IT/IS department for an upgrade when version 2 was first released!
Last I heard my request was in the black hole of documented and undocumented
processes to approve software upgrades... So this error may not occur in the
latest version... If so, just let me know which of the above is correct (if
any) and I'll just live with it (or run it at home on version 2.1.1).
Thanks.
> version
         _              
platform i386-pc-mingw32
arch     i386           
os       mingw32        
system   i386, mingw32  
status                  
major    1              
minor    9.0            
year     2004           
month    04             
day      12             
language R              
 
 
Thanks,
Saghir



Peter Dalgaard

2005-Jul-05 10:37 UTC


"Bashir Saghir (Aztek Global)" <Saghir.Bashir at ucb-group.com>
writes:
> I am getting a difference in results when running some analysis using by and
> tapply compared to using a for loop. I've tried searching the web but had no
> luck with the keywords I used.
>  
> I've attached a simple example below to illustrate my problem. I get a
> difference in the mean of yvar, diff and the p-value using tapply & by
> compared to a for loop. I cannot see what I am doing wrong. Can anyone help?
> 
> > # Simulate some data - I'll do 2 simulations...
> > 
> > xvar = rnorm(40, 20, 5)
> > yvar = rnorm(40, 22, 2)
> > num = factor(rep(1:2, each=20))
> > sdat = data.frame(cbind(num, xvar, yvar))
> > 
> > # Define a function to do a simple t test and return some values...
> > 
> > kindtest = function(varx, vary){
> +    res = t.test(varx, vary)
> +    x.mn = res$estimate[1]
> +    y.mn = res$estimate[2]
> +    diff = y.mn-x.mn
> +    pval = res$p.value
> +    cat("Mean xvar =", x.mn, " Mean yvar =", y.mn)
> +    cat(" diff =", diff, "  p-value=", pval, "\n\n")
> +    list(x.mn=x.mn, y.mn=y.mn, diff=diff, pval=pval)
> + }
> 
> ## Results from by and tapply
> 
> > attach(sdat)
> >   bres = by(xvar, num, kindtest, yvar)  
> Mean xvar = 19.8904  Mean yvar = 21.97729 diff = 2.086891   p-value= 0.06222805
> Mean xvar = 19.88329  Mean yvar = 21.97729 diff = 2.093996   p-value= 0.05245329
> 
> >   tres = tapply(xvar, num, kindtest, yvar)
> Mean xvar = 19.8904  Mean yvar = 21.97729 diff = 2.086891   p-value= 0.06222805
> Mean xvar = 19.88329  Mean yvar = 21.97729 diff = 2.093996   p-value= 0.05245329
> 
> > detach(sdat,1)
> 
> ## Results from for
> 
> > for(i in 1:2) {
> +   subdat= subset(sdat, num==i)
> +   kindtest(subdat$xvar, subdat$yvar)
> + }
> Mean xvar = 19.8904  Mean yvar = 21.98615 diff = 2.095746   p-value= 0.07319223
> Mean xvar = 19.88329  Mean yvar = 21.96843 diff = 2.085141   p-value= 0.05850057
> 
The fact that the by/tapply approach is giving you the same Mean yvar
for both groups should be a dead giveaway....

Stick print(varx) and print(vary) into kindtest, and you'll see the
point. You are passing yvar *without* subsetting (and since the t.test
isn't paired, it can hardly be expected to complain that x and y
differ in length...).
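
A minimal sketch of that check, reusing the variable names from the original post. The fixed seed and the helper show_lengths are assumptions added for illustration; they are not part of the posted code. It should show that tapply splits only its first argument by group and hands yvar on in full:

```r
set.seed(1)  # assumption: fixed seed so the run is reproducible
xvar <- rnorm(40, 20, 5)
yvar <- rnorm(40, 22, 2)
num  <- factor(rep(1:2, each = 20))

# Hypothetical helper: report how many values arrive in each argument.
show_lengths <- function(varx, vary) {
  c(n.x = length(varx), n.y = length(vary))
}

# tapply() subsets xvar by num, but yvar travels through "..." unchanged,
# so every group is compared against all 40 values of yvar.
lens <- tapply(xvar, num, show_lengths, yvar)
print(lens)
```

Each group gets 20 values in varx and all 40 in vary, which is why Mean yvar is identical across groups in the by/tapply output.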

This is probably closer to the mark:

  by(sdat, num, with, kindtest(xvar, yvar))
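
As a rough sketch of the same idea, the version below splits the whole data frame with by and spells the per-group call out as an anonymous function instead of using with. The seed and the quiet helper ttest_summary (kindtest without the cat calls) are assumptions made for illustration. It then checks the result against the for loop from the original post:

```r
set.seed(1)  # assumption: fixed seed so the run is reproducible
xvar <- rnorm(40, 20, 5)
yvar <- rnorm(40, 22, 2)
num  <- rep(1:2, each = 20)
sdat <- data.frame(num, xvar, yvar)

# Hypothetical quiet variant of kindtest: same t test, no printing.
ttest_summary <- function(varx, vary) {
  res <- t.test(varx, vary)
  c(x.mn = unname(res$estimate[1]),
    y.mn = unname(res$estimate[2]),
    pval = res$p.value)
}

# by() on the data frame hands each group's rows to the function,
# so xvar and yvar are subset together.
by_res <- by(sdat, sdat$num, function(d) ttest_summary(d$xvar, d$yvar))

# The for loop from the original post, collecting results instead of printing.
loop_res <- vector("list", 2)
for (i in 1:2) {
  subdat <- subset(sdat, num == i)
  loop_res[[i]] <- ttest_summary(subdat$xvar, subdat$yvar)
}

print(all.equal(by_res[[1]], loop_res[[1]]))
print(all.equal(by_res[[2]], loop_res[[2]]))
```

With both columns subset by group, the two approaches should agree, and Mean yvar should differ between the groups.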

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
