Hvidberg, Martin
2008-Jun-10 13:05 UTC
[R] How to join data.frames and vectors of different length, in an inteligent way?
I have a data set something like this:

"YYYY", "Value"
1972 , 117
1984 , 73
1969 , 92
1976 , 113
1999 , 80
1996 , 78
1976 , 98
1984 , 106
1976 , 99

It could be created with:

> dafSamp <- data.frame(cbind(c(1972,1984,1969,1976,1999,1996,1976,1984,1976),c(117,73,92,113,80,78,98,106,99)))

The real dataset is of course much larger, approx. 100,000 samples.

I need to adjust each value to remove any tendency of some years generally having higher values and others lower, since this is an unwanted artifact of different measuring traditions.

My plan is to generate an average for each year, Ay, as well as a global average, Ag. Then each value should be multiplied by Ay/Ag.

I can make the averages like this:

> Ag <- mean(dafSamp[,2])
> Ag
[1] 95.11111

> Ay <- aggregate(x=dafSamp[,2], by=list(dafSamp[,1]), FUN='mean')
> Ay
  Group.1        x
1    1969  92.0000
2    1972 117.0000
3    1976 103.3333
4    1984  89.5000
5    1996  78.0000
6    1999  80.0000

To see how many samples there are from each year I could write:

> Cy <- aggregate(x=dafSamp[,2], by=list(dafSamp[,1]), FUN='length')
> Cy
  Group.1 x
1    1969 1
2    1972 1
3    1976 3
4    1984 2
5    1996 1
6    1999 1

I would like to create a new vector with the adjusted values (dafSamp[,2] * Ay (for the relevant year) / Ag).

I tried to write:

vecAA <- dafSamp[,2] * Ay[which(Ay[,1]==dafSamp[,1]),2] / Ag

but the result is all NAs :-( Might have seen that coming, not the same length...

Question: How do I go about making such a calculation?

:-) Martin Hvidberg

Here is the code in full, if you want to try it...
dafSamp <- data.frame(cbind(c(1972,1984,1969,1976,1999,1996,1976,1984,1976),c(117,73,92,113,80,78,98,106,99)))
Ag <- mean(dafSamp[,2])
Ag
Ay <- aggregate(x=dafSamp[,2], by=list(dafSamp[,1]), FUN='mean')
Ay
Cy <- aggregate(x=dafSamp[,2], by=list(dafSamp[,1]), FUN='length')
Cy
vecAA <- dafSamp[,2] * Ay[which(Ay[,1]==dafSamp[,1]),2] / Ag

Hvidberg, Martin <http://www2.dmu.dk/1_Om_DMU/2_medarbejdere/cv/employee2_NH.asp?PersonID=MHV>
Senior Geographer (Climatology, Spatial modeling) <http://www.geogr.ku.dk/>
N 55°41m43.48s E 12°06m05.13s ETRS89
University of Aarhus <http://www.au.dk/en>
Danmarks Miljøundersøgelser <http://www.dmu.dk/>
National Environmental Research Inst. <http://www.dmu.dk/International/>
P.O. Box 358, Frederiksborgvej 399, DK-4000 Roskilde
Martin.Hvidberg@dmu.dk
www.dmu.dk/AtmosphericEnvironment/
tel: +45 46 30 11 55
fax: +45 46 30 12 14
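The all-NA result can be traced step by step. The sketch below is one way to look at it, assuming the sample data from the post: `==` compares the 6 years in Ay with the 9 rows of dafSamp position by position (recycling the shorter vector), so `which()` returns positions in the 9-row data rather than rows of Ay. A lookup with base R's `match()` gives one Ay row per sample instead. The names `hits`, `idx` and `AyPerRow` are illustrative and not from the original post.

```r
# Sample data from the post; data.frame(cbind(...)) names the columns X1 (year) and X2 (value).
dafSamp <- data.frame(cbind(c(1972,1984,1969,1976,1999,1996,1976,1984,1976),
                            c(117,73,92,113,80,78,98,106,99)))
Ag <- mean(dafSamp[,2])
Ay <- aggregate(x=dafSamp[,2], by=list(dafSamp[,1]), FUN='mean')

# '==' works element by element and recycles the 6 years of Ay against the
# 9 rows of dafSamp (R warns that the lengths do not fit).
hits <- suppressWarnings(which(Ay[,1] == dafSamp[,1]))
hits          # positions where the recycled years happen to coincide

# Those positions are then used as row numbers of Ay; a row number larger
# than nrow(Ay) does not exist, so the lookup yields NA.
Ay[hits, 2]

# match() returns, for every row of dafSamp, the row of Ay that holds the
# same year, so the lookup has exactly one entry per sample.
idx <- match(dafSamp[,1], Ay[,1])
AyPerRow <- Ay[idx, 2]                  # yearly mean repeated per sample
vecAA <- dafSamp[,2] * AyPerRow / Ag    # the calculation as stated in the post
vecAA
```

With `match()` the result has the same length and order as dafSamp, so it can be stored directly as a new column.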
Chuck Cleland
2008-Jun-10 14:24 UTC
[R] How to join data.frames and vectors of different length, in an inteligent way?
You could put the group averages back into dafSamp using ave():

dafSamp <- data.frame(cbind(c(1972,1984,1969,1976,1999,1996,1976,1984,1976),
                            c(117,73,92,113,80,78,98,106,99)))

dafSamp$Ay <- ave(dafSamp$X2, dafSamp$X1, FUN=mean)

dafSamp$vecAA <- dafSamp$X2 * (dafSamp$Ay / mean(dafSamp$X2))

dafSamp
    X1  X2       Ay     vecAA
1 1972 117 117.0000 143.92640
2 1984  73  89.5000  68.69334
3 1969  92  92.0000  88.99065
4 1976 113 103.3333 122.76869
5 1999  80  80.0000  67.28972
6 1996  78  78.0000  63.96729
7 1976  98 103.3333 106.47196
8 1984 106  89.5000  99.74650
9 1976  99 103.3333 107.55841

?ave

-- 
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
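Since the subject line asks about joining, here is a sketch of the same calculation done as an explicit join with base R's `merge()`, as an alternative to `ave()` and not something proposed in the thread. It assumes the sample data above; the helper column `row` and the object name `joined` are made up for illustration. By default `merge()` sorts the result by the key, so the original row order is restored afterwards.

```r
dafSamp <- data.frame(cbind(c(1972,1984,1969,1976,1999,1996,1976,1984,1976),
                            c(117,73,92,113,80,78,98,106,99)))

# Yearly means as a small lookup table with columns X1 (year) and Ay (mean).
Ay <- aggregate(x=dafSamp$X2, by=list(X1=dafSamp$X1), FUN=mean)
names(Ay)[2] <- "Ay"

# Remember the original row order, because merge() sorts by the key column.
dafSamp$row <- seq_len(nrow(dafSamp))

# Join every sample with the mean of its year.
joined <- merge(dafSamp, Ay, by="X1")
joined <- joined[order(joined$row), ]

# Same formula as in the ave() version above.
joined$vecAA <- joined$X2 * joined$Ay / mean(joined$X2)
joined[, c("X1", "X2", "Ay", "vecAA")]
```

For a single grouping variable `ave()` is shorter; a join of this kind is mainly useful when the lookup table carries several columns or comes from another source.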
Hvidberg, Martin
2008-Jun-11 06:52 UTC
[R] How to join data.frames and vectors of different length, in an inteligent way?
Thanks Chuck

With your help I managed to write the code as I wanted it. The result looks like this:

dafSamp <- data.frame(cbind(c(1972,1984,1969,1976,1999,1996,1976,1984,1976),c(117,73,92,113,80,78,98,106,99)))
dafSamp$Ay <- ave(dafSamp$X2, dafSamp$X1, FUN=mean)
dafSamp$AA <- dafSamp$X2 * (mean(dafSamp$X2)/dafSamp$Ay)
dafSamp$My <- ave(dafSamp$X2, dafSamp$X1, FUN=median)
dafSamp$MA <- dafSamp$X2 * (median(dafSamp$X2)/dafSamp$My)
par(mfrow=c(1,2))
boxplot(AA~X1, data=dafSamp, main="Mean mode")
boxplot(MA~X1, data=dafSamp, main="Median mode")

It works like a dream. Thanks for your time.

Martin
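Note that this final code multiplies by the global mean divided by the yearly mean (Ag/Ay), which is the inverse of the Ay/Ag factor described in the first post. A small check, sketched here under the assumption of the same sample data, suggests why this direction removes the year effect: after the adjustment every year has the same mean, namely the global one. The name `yearMeans` is illustrative only.

```r
dafSamp <- data.frame(cbind(c(1972,1984,1969,1976,1999,1996,1976,1984,1976),
                            c(117,73,92,113,80,78,98,106,99)))
Ag <- mean(dafSamp$X2)                                  # global mean
dafSamp$Ay <- ave(dafSamp$X2, dafSamp$X1, FUN=mean)     # yearly mean per row
dafSamp$AA <- dafSamp$X2 * (Ag / dafSamp$Ay)            # adjustment as in the final post

# Mean of the adjusted values within each year.
yearMeans <- tapply(dafSamp$AA, dafSamp$X1, mean)
yearMeans

# Each yearly mean equals the global mean Ag (up to floating-point error).
all.equal(as.numeric(yearMeans), rep(Ag, length(yearMeans)))
```

The median version behaves the same way only approximately, since scaling by the yearly median aligns the medians of the years rather than their means.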