thr3ads.net - R help - [R] Unexpected behaviour of plyr::ddply [Sep 2014]

If this information is useful, please help other people find it:
Share via:

walmes .

2014-Sep-17 03:36 UTC

[R] Unexpected behaviour of plyr::ddply

Hello R users,

I'm writing a brief tutorial of getting statistical measures by splitting
according strata and over columns. When I used plyr::ddply I got and
unexpected result, with NA/NaN for non existing cells. Below is a minimal
reproducible code with the result that I got. For comparison, the result of
aggregate is showed. Why this behaviour? What I can do to avoid it?
> require(plyr)
>
> hab <-+    
read.table("http://www.leg.ufpr.br/~walmes/data/ipea_habitacao.csv",
+                header=TRUE, sep=",", stringsAsFactors=FALSE,
quote="",
+                encoding="utf-8")>
> hab <- hab[,-ncol(hab)]
> names(hab) <- c("sig", "cod", "mun",
"agua", "ener", "tel", "carro",
+                 "comp", "tot")> hab <- transform(hab, sig=factor(sig))
> hab$siz <- cut(hab$tot, breaks=c(-Inf, 5000, Inf),
+                labels=c("P","G"))> str(hab, ve.len=1)'data.frame':    5596 obs. of  10 variables:
 $ sig  : Factor w/ 27 levels "AC","AL","AM",..: 1
1 1 1 1 1 1 1 1 1 ...
 $ cod  : int  1200013 1200054 1200104 1200138 1200179 1200203 1200252
1200302 1200328 1200336 ...
 $ mun  : chr  "Acrel?ndia" "Assis Brasil"
"Brasil?ia" "Bujari" ...
 $ agua : num  21.5 27.4 26.9 17.3 13.1 ...
 $ ener : num  56.2 65.3 55.9 43.9 35.9 ...
 $ tel  : num  8.85 26.71 22.73 12.28 9.19 ...
 $ carro: num  9.3 6.03 7.47 6.49 5.73 ...
 $ comp : num  0.947 1.637 1.857 0.127 0.088 ...
 $ tot  : int  1878 807 4114 1365 1267 14368 2807 5268 740 2308 ...
 $ siz  : Factor w/ 2 levels "P","G": 1 1 1 1 1 2 1 2 1 1
...>
> xtabs(~siz+sig, hab)   sig
siz  AC  AL  AM  AP  BA  CE  DF  ES  GO  MA  MG  MS  MT  PA  PB  PE  PI
PR  RJ
  P  17  72  47  13 266 103   0  43 187 152 671  52 100  80 195  97 199
310  27
  G   5  29  15   3 149  81   1  34  55  65 182  25  26  63  28  88  22
89  64
   sig
siz  RN  RO  RR  RS  SC  SE  SP  TO
  P 147  34  14 355 236  56 396 129
  G  19  18   1 112  57  19 249  10>
> x <- ddply(hab, ~sig+siz,+            colwise(.fun=mean,
+                    .cols=~agua+ener+tel+carro+comp,
na.rm=TRUE))> head(x)  sig  siz     agua     ener       tel    carro      comp
1  AC    P 19.30229 51.30535 12.857118 5.395824 0.7028235
2  AC    G 28.39300 65.95740 26.322800 8.942000 2.3806000
3  AL    P 42.14337 81.20935  4.801500 7.069958 0.5332639
4  AL    G 54.03966 87.47834  9.771724 9.428586 1.3583793
5  *AL <NA>      NaN      NaN       NaN      NaN       NaN*
6  AM    P 25.66202 61.12709  5.749596 1.980362
0.7629362>
> x <- ddply(hab, ~sig+siz,+            colwise(.fun=sum,
+                    .cols=~agua+ener+tel+carro+comp,
na.rm=TRUE))> head(x)  sig  siz     agua     ener     tel   carro   comp
1  AC    P  328.139  872.191 218.571  91.729 11.948
2  AC    G  141.965  329.787 131.614  44.710 11.903
3  AL    P 3034.323 5847.073 345.708 509.037 38.395
4  AL    G 1567.150 2536.872 283.380 273.429 39.393
5  *AL <NA>    0.000    0.000   0.000   0.000  0.000*
6  AM    P 1206.115 2872.973 270.231  93.077 35.858>
> y <- aggregate(as.matrix(hab[,4:8])~sig+siz, data=hab, FUN=mean,
+                na.rm=TRUE)> head(y)  sig siz     agua     ener       tel     carro      comp
1  AC   P 19.30229 51.30535 12.857118  5.395824 0.7028235
2  AL   P 42.14337 81.20935  4.801500  7.069958 0.5332639
3  AM   P 25.66202 61.12709  5.749596  1.980362 0.7629362
4  AP   P 37.20362 82.04515 14.929154  5.775923 0.7922308
5  BA   P 41.23200 66.45516  4.524045 10.387203 0.8404624
6  CE   P 36.78599 78.20176  6.339990  7.768981
0.6446990>
> sessionInfo()R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=pt_BR.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=pt_BR.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=pt_BR.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=pt_BR.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] plyr_1.8.1

loaded via a namespace (and not attached):
[1] compiler_3.1.1 Rcpp_0.11.2    tools_3.1.1>
Thanks in advance.
Walmes.

=========================================================================Walmes
Marques Zeviani
LEG (Laborat?rio de Estat?stica e Geoinforma??o, 25.450418 S, 49.231759 W)
Departamento de Estat?stica - Universidade Federal do Paran?
fone: (+55) 41 3361 3573
skype: walmeszeviani
homepage: http://www.leg.ufpr.br/~walmes
linux user number: 531218
=========================================================================
	[[alternative HTML version deleted]]

Patrick Connolly

2014-Sep-29 07:24 UTC

head link

[R] Unexpected behaviour of plyr::ddply

On Wed, 17-Sep-2014 at 12:36AM -0300, walmes . wrote:

|> Hello R users,
|> 
|> I'm writing a brief tutorial of getting statistical measures by
splitting
|> according strata and over columns. When I used plyr::ddply I got and
|> unexpected result, with NA/NaN for non existing cells. Below is a minimal
|> reproducible code with the result that I got. For comparison, the result
of
|> aggregate is showed. Why this behaviour? What I can do to avoid it?
|> 
|> > require(plyr)
|> >
|> > hab <-
|> +    
read.table("http://www.leg.ufpr.br/~walmes/data/ipea_habitacao.csv",
|> +                header=TRUE, sep=",", stringsAsFactors=FALSE,
quote="",
|> +                encoding="utf-8")
|> >
|> > hab <- hab[,-ncol(hab)]
|> > names(hab) <- c("sig", "cod", "mun",
"agua", "ener", "tel", "carro",
|> +                 "comp", "tot")
|> > hab <- transform(hab, sig=factor(sig))
|> > hab$siz <- cut(hab$tot, breaks=c(-Inf, 5000, Inf),
|> +                labels=c("P","G"))


However:> summary(hab$tot)   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
    227    1328    2640    8264    5440 3039000      89 

Those NAs interfere with the cut() statement.

The simplest work around is
> hab <- na.omit(hab)
> Then ddply will play nicely.

HTH

-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.   
   ___    Patrick Connolly   
 {~._.~}                   Great minds discuss ideas    
 _( Y )_  	         Average minds discuss events 
(:_~*~_:)                  Small minds discuss people  
 (_)-(_)  	                      ..... Eleanor Roosevelt
	  
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.

R help - Sep 2014 - Unexpected behaviour of plyr::ddply

[R] Unexpected behaviour of plyr::ddply

[R] Unexpected behaviour of plyr::ddply