Dear list,

I am using R 2.1.1 on Fedora 3 Linux, on a 32-bit PC.

If I compute the aggregated mean and the standard deviation, I get
standard deviation values for factors where the mean was not computed.
It seems to me that this is somehow related to the NA values, but I
don't quite understand what is going wrong.

Could it be related to the data import already? Some of the imported
data got the character strings NA and others <NA>. But they are
defined from the same values, -9999.

I used the code below. Below the code are parts of the results.

Cheers, Ulrich

Data import:

chemicS <- read.table("ChemieUlli_4_Quellen.csv", header = TRUE,
                      sep = ",", na.strings = "-9999")

Count         EC    NO3    NO2   NH4
 3504   630.0000  33.00  0.001  0.01
 3505         NA  26.66   <NA>  <NA>
 3506         NA   0.72   <NA>  <NA>
 3507         NA     NA   <NA>  <NA>
 3508         NA     NA   <NA>  <NA>
 3509         NA     NA   <NA>  <NA>
 3510  1210.0000  14.00  0.001  0.01
 3511  1265.0000  12.00  0.001  0.01
 3512  1400.0000  14.00  0.001  0.01
 3513  1427.0000  12.00  0.001  0.01
 3514  1410.0000   7.00      0     0
 3515  1520.0000   8.00  0.001  0.01
 3516  1470.0000   7.60      0     0
 3517  1170.0000  10.00  0.001  0.01
 3518  4570.0000  20.00  0.001  0.45
 3519  8560.0000   0.50   0.14  0.31
 3520   708.0000  39.00  0.001  0.01
 3521   833.0000  40.00   0.01  0.01
 3522         NA     NA   <NA>  <NA>

Computing the mean:

aggregate(chemicS$EC, by = list(east=chemicS$EST, north=chemicS$NORD),
          FUN = mean)

Count     east     north       Mean
  350    89885    103160  318.50000
  351    55870    103510  400.00000
  352    82570    104845  637.33333
  353    79119    107433         NA
  354    79160    107462  362.77778
  355    83010    108990         NA
  356    82810    109010         NA
  357    69135    112992         NA
  358    55490    120140  142.25000
  359    56580    120600         NA
  360    56582    120607         NA
  361    58050    125350         NA
  362    58059    125360         NA
  363    60360    128191         NA
  364    65448    128293  252.50000
  365  65472.5  128308.1         NA
  366    61412    131141         NA

Computing the standard deviation:

aggregate(chemicS$EC, by = list(east=chemicS$EST, north=chemicS$NORD),
          FUN = sd, na.rm = TRUE)

Count     east     north      Stdev.
  350    89885    103160   4.9497475
  351    55870    103510          NA
  352    82570    104845  19.6553640
  353    79119    107433          NA
  354    79160    107462  73.6745848
  355    83010    108990          NA
  356    82810    109010  15.6950098
  357    69135    112992          NA
  358    55490    120140   5.3150729
  359    56580    120600          NA
  360    56582    120607  22.4435801
  361    58050    125350          NA
  362    58059    125360  23.3108523
  363    60360    128191  20.9789577
  364    65448    128293  10.6066017
  365  65472.5  128308.1          NA
  366    61412    131141   8.6184556

Hi

What you see is the difference between factors and numbers. Columns
showing <NA> are factors; columns showing NA are numeric. You can
check it with

str(chemicS)

which will reveal the structure of your data.

So either convert the factors with as.numeric(as.character()) or read
the file while forcing the columns to numeric; see

?read.table

HTH
Petr

On 8 Dec 2005 at 11:50, Ulrich Leopold wrote:

From:          Ulrich Leopold <uleopold at science.uva.nl>
To:            R-help <R-help at stat.math.ethz.ch>
Organization:  University of Amsterdam
Date sent:     Thu, 08 Dec 2005 11:50:25 +0100
Subject:       [R] 'mean' and 'sd' calculations do not match

> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

Petr Pikal
petr.pikal at precheza.cz

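To illustrate Petr's two suggestions, here is a minimal sketch on made-up data. The inline CSV, the names toy and no2, and the stray "0,14" entry are all assumptions for illustration; nothing here is taken from the poster's actual file, and the bad entry in that file may look different.

```r
# Hypothetical toy input: one NO2 value uses a decimal comma ("0,14"),
# which is enough to stop the whole column from being read as numeric.
csv <- c("EC,NO2",
         "630.0,0.001",
         "-9999,-9999",
         "8560.0,\"0,14\"")

toy <- read.table(text = csv, header = TRUE, sep = ",",
                  na.strings = "-9999", stringsAsFactors = TRUE)

# EC is reported as num, NO2 as a Factor; printing toy would show
# NA in EC but <NA> in NO2.
str(toy)

# Convert through character first. Calling as.numeric() directly on a
# factor would return the internal level codes, not the values.
# The unparseable "0,14" becomes NA (the warning is suppressed here).
no2 <- suppressWarnings(as.numeric(as.character(toy$NO2)))
no2
```

Looking at levels(toy$NO2) before converting is one way to spot the offending entries. The alternative Petr mentions, forcing the type at import, would go through the colClasses argument of read.table.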
Ulrich Leopold <uleopold at science.uva.nl> writes:

> Dear list,
>
> I am using R 2.1.1 on a Fedora 3 Linux, 32 bit PC.
>
> If I compute the aggregated mean and the standard deviation I get
> standard deviation values for factors where the mean was not computed.
> It seems to me that this is somehow related to the NA values. But I
> don't quite understand what is going wrong?

You're using na.rm=TRUE on the sd calculation, but not on the means!
(The NA's generated for sd are likely groups with only one
observation).

> Could it be related to the data import already? Some of the imported
> data got the character strings NA and others <NA>. But they are defined
> from the same values, -9999.

No. It signifies a problem, but not this one. The <NA> is used for
factor and character columns. Most likely (can't think of any other
reason) some of your data are not numeric - "," instead of "." and
similar typos will do that to you.

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
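Peter's explanation can be reproduced on a small made-up data set. This is only a sketch: the data frame toy, the grouping column site and the values are assumptions chosen so that each case from the thread shows up once.

```r
# Hypothetical toy data:
#   group "a" has two values and one NA,
#   group "b" has a single observation,
#   group "c" has only NA.
toy <- data.frame(site = c("a", "a", "a", "b", "c"),
                  EC   = c(10, 14, NA, 7, NA))

# As in the original post: mean without na.rm, sd with na.rm = TRUE.
m <- aggregate(toy$EC, by = list(site = toy$site), FUN = mean)
s <- aggregate(toy$EC, by = list(site = toy$site), FUN = sd, na.rm = TRUE)

# Group "a": mean is NA (one NA poisons it) but sd is computed.
# Group "b": mean is 7 but sd is NA (one observation has no spread).
cbind(m, sd = s$x)

# Passing na.rm = TRUE to mean as well makes the two calls consistent.
m2 <- aggregate(toy$EC, by = list(site = toy$site), FUN = mean, na.rm = TRUE)
m2
```

With na.rm = TRUE in both calls, a missing mean can only come from a group with no usable values at all (group "c" here, where mean() returns NaN), and a missing sd from a group with fewer than two.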