thr3ads.net - R help - [R] 'mean' and 'sd' calculations do not match [Dec 2005]


Ulrich Leopold

2005-Dec-08 10:50 UTC

[R] 'mean' and 'sd' calculations do not match

Dear list,

I am using R 2.1.1 on a Fedora 3 Linux, 32 bit PC.

If I compute the aggregated mean and the standard deviation I get
standard deviation values for factors where the mean was not computed.
It seems to me that this is somehow related to the NA values. But I
don't quite understand what is going wrong?

Could it be related to the data import already? Some of the imported
data got the character strings NA and others <NA>. But they are defined
from the same values, -9999.  

I used the code below. Below the code are parts of the results.

Cheers, Ulrich

Data import:

chemicS <- read.table("ChemieUlli_4_Quellen.csv", header = TRUE,
                      sep = ",", na.strings = "-9999")

Count EC        NO3    NO2    NH4
3504  630.0000  33.00  0.001  0.01 
3505        NA  26.66   <NA>  <NA> 
3506        NA   0.72   <NA>  <NA> 
3507        NA     NA   <NA>  <NA> 
3508        NA     NA   <NA>  <NA> 
3509        NA     NA   <NA>  <NA> 
3510 1210.0000  14.00  0.001  0.01 
3511 1265.0000  12.00  0.001  0.01 
3512 1400.0000  14.00  0.001  0.01 
3513 1427.0000  12.00  0.001  0.01 
3514 1410.0000   7.00      0     0 
3515 1520.0000   8.00  0.001  0.01 
3516 1470.0000   7.60      0     0 
3517 1170.0000  10.00  0.001  0.01 
3518 4570.0000  20.00  0.001  0.45 
3519 8560.0000   0.50   0.14  0.31 
3520  708.0000  39.00  0.001  0.01 
3521  833.0000  40.00   0.01  0.01 
3522        NA     NA   <NA>  <NA> 

Computing the mean:

aggregate(chemicS$EC, by = list(east=chemicS$EST, north=chemicS$NORD),
FUN = mean)

Count   east    north   Mean
350    89885   103160  318.50000
351    55870   103510  400.00000
352    82570   104845  637.33333
353    79119   107433         NA
354    79160   107462  362.77778
355    83010   108990         NA
356    82810   109010         NA
357    69135   112992         NA
358    55490   120140  142.25000
359    56580   120600         NA
360    56582   120607         NA
361    58050   125350         NA
362    58059   125360         NA
363    60360   128191         NA
364    65448   128293  252.50000
365  65472.5 128308.1         NA
366    61412   131141         NA

Computing the standard deviation:

aggregate(chemicS$EC, by = list(east=chemicS$EST, north=chemicS$NORD),
FUN = sd, na.rm = TRUE)

Count  east    north     Stdev.
350    89885   103160    4.9497475
351    55870   103510           NA
352    82570   104845   19.6553640
353    79119   107433           NA
354    79160   107462   73.6745848
355    83010   108990           NA
356    82810   109010   15.6950098
357    69135   112992           NA
358    55490   120140    5.3150729
359    56580   120600           NA
360    56582   120607   22.4435801
361    58050   125350           NA
362    58059   125360   23.3108523
363    60360   128191   20.9789577
364    65448   128293   10.6066017
365  65472.5 128308.1           NA
366    61412   131141    8.6184556

Petr Pikal

2005-Dec-08 13:18 UTC

Hi

You are seeing the difference between factors and numbers.

Columns showing <NA> are factors;
columns showing NA are numeric.

You can check this with

str(chemicS)

which reveals the structure of your data.

So either convert the factors with
as.numeric(as.character())

or force the columns to numeric when reading the data; see

?read.table
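
Both suggestions can be sketched as follows. This is only an illustration under assumptions: the vector no2 and the inline CSV text are made-up stand-ins for the poster's file, and text= needs a much newer R than the 2.1.1 used in the original post.

```r
# Hypothetical stand-in for a column such as NO2 that was read in as a factor
no2 <- factor(c("0.001", NA, "0.14", "0"))

# as.numeric() on a factor returns the internal level codes, not the values
codes <- as.numeric(no2)

# Going through character first recovers the numbers as printed
values <- as.numeric(as.character(no2))

# Alternative: declare the column types at import time.
# The inline CSV text is made up; a real call would pass the file name.
csv <- "EC,NO2\n630,0.001\n-9999,-9999\n1210,0.14"
d <- read.table(text = csv, header = TRUE, sep = ",",
                na.strings = "-9999", colClasses = "numeric")
str(d)
```

With colClasses = "numeric", read.table stops with an error when a field cannot be read as a number, which is often a useful way to find out that the file contains a malformed entry.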

HTH
Petr





Petr Pikal
petr.pikal at precheza.cz

Peter Dalgaard

2005-Dec-08 13:43 UTC

Ulrich Leopold <uleopold at science.uva.nl> writes:
> Dear list,
> 
> I am using R 2.1.1 on a Fedora 3 Linux, 32 bit PC.
> 
> If I compute the aggregated mean and the standard deviation I get
> standard deviation values for factors where the mean was not computed.
> It seems to me that this is somehow related to the NA values. But I
> don't quite understand what is going wrong?
You're using na.rm=TRUE on the sd calculation, but not on the means!
(The NA's generated for sd are likely groups with only one observation).
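
A small sketch of that point, using made-up data rather than the poster's file (the data frame d and its values are assumptions for illustration): passing na.rm = TRUE to both calls makes the two tables comparable, and a group with a single observation still has no standard deviation.

```r
# Made-up data: site 1 has two values and one NA, site 2 has a single
# observation, site 3 has only missing values
d <- data.frame(
  east  = c(1, 1, 1, 2, 3, 3),
  north = c(10, 10, 10, 20, 30, 30),
  EC    = c(300, 320, NA, 400, NA, NA)
)
by_site <- list(east = d$east, north = d$north)

# Without na.rm, any NA in a group makes the group mean NA
m_default <- aggregate(d$EC, by = by_site, FUN = mean)

# With na.rm = TRUE in both calls, mean and sd use the same observations
m <- aggregate(d$EC, by = by_site, FUN = mean, na.rm = TRUE)
s <- aggregate(d$EC, by = by_site, FUN = sd, na.rm = TRUE)

# Site 2 has a mean but no sd, because one observation has no spread
cbind(m, sd = s$x)
```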
> Could it be related to the data import already? Some of the imported
> data got the character strings NA and others <NA>. But they are defined
> from the same values, -9999.
No. It signifies a problem, but not this one. The <NA> is used for
factor and character columns. Most likely (can't think of any other
reason) some of your data are not numeric - "," instead of "." and
similar typos will do that to you.
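
One way to track down such entries is sketched below. The vector raw is a made-up example of what a column might contain when read as character; it is not taken from the poster's file.

```r
# Hypothetical raw column, as it might look when read as character
raw <- c("0.001", "0,14", "0", NA, "<0.01", "0.31")

# Try the conversion; anything that is not a valid number becomes NA
parsed <- suppressWarnings(as.numeric(raw))

# Entries that were present in the file but failed to convert
bad <- !is.na(raw) & is.na(parsed)
raw[bad]
which(bad)
```

The positions returned by which(bad) point at the rows to correct in the source file before importing again.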
-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
