Hi, folks,

test=matrix(rep(letters[1:3],6),nrow=6,byrow=T)
test[2,]=letters[5:7]
test[4,]=c('e','f','i')
test[1,3]='i'
colnames(test)=c('leave','arrive','line')

The code will generate a sample dataset. I have thousands of rows of data.

For the same 'leave' and 'arrive', how can I get the sum of squares of the percentages of 'line' for each combination of 'leave' and 'arrive'?

For the sample dataset, we have 5 combinations of 'leave' and 'arrive', namely, 'ab', 'ef'

For 'ab', the line can be 'c' or 'i'. In the 1st row it is line 'i' (1/3), and in the 5th and 6th rows (2/3) it is line 'c'. Then the sum of squares of the percentages for 'ab' is (1/3)^2+(2/3)^2=0.56

For 'ef', it is 0.5^2+0.5^2=0.5

I would like the final dataset to look as follows:

ab 0.56
ef 0.5
....

How to achieve it?

Thanks
Bill.Venables at csiro.au
2010-Jun-17 01:45 UTC
[R] how to 'average' one col wrt to another one
Is this the kind of thing you want?

> test <- matrix(rep(letters[1:3],6),nrow=6,byrow=T)
> test[2,] <- letters[5:7]
> test[4,] <- c('e','f','i')
> test[1,3] <- 'i'
> colnames(test) <- c('leave','arrive','line')
> test <- data.frame(test)
> tab <- with(test, table(paste(leave, arrive, sep=""), line))
> tab <- tab/rowSums(tab)
> rowSums(tab^2)
   ab    ef 
0.625 0.500 

Your 'ab' calculation I think may not be quite correct.

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of YI LIU
Sent: Thursday, 17 June 2010 11:28 AM
To: r-help at r-project.org
Subject: [R] how to 'average' one col wrt to another one

______________________________________________
R-help at r-project.org mailing list
stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
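To see where 0.625 comes from, it may help to count the 'ab' rows by hand. In the sample data 'ab' appears in four rows (1, 3, 5 and 6), not three, so the shares are 1/4 for line 'i' and 3/4 for line 'c'. The following is only a sketch of that check; the names ab_lines, ab_counts and ab_share are made up for illustration, and stringsAsFactors = FALSE is an assumption added so the result does not depend on the R version.

```r
# Rebuild the sample data from the original post
test <- matrix(rep(letters[1:3], 6), nrow = 6, byrow = TRUE)
test[2, ] <- letters[5:7]
test[4, ] <- c("e", "f", "i")
test[1, 3] <- "i"
colnames(test) <- c("leave", "arrive", "line")
test <- data.frame(test, stringsAsFactors = FALSE)

# Lines observed for leave = 'a', arrive = 'b' (rows 1, 3, 5 and 6)
ab_lines <- test$line[test$leave == "a" & test$arrive == "b"]

# Counts per line, then shares within the 'ab' combination
ab_counts <- table(ab_lines)
ab_share <- ab_counts / sum(ab_counts)

# Sum of squared shares: (3/4)^2 + (1/4)^2
sum(ab_share^2)   # 0.625
```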
Hi Yi,

Try this:

res <- with(data.frame(test), prop.table(table(paste(leave, arrive, sep = ""), line), 1))
rowSums(res^2)

HTH,
Jorge
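If the result is wanted as a two-column dataset, as in the original question, the named vector from rowSums() can be wrapped in a data frame. This is a minimal sketch that reuses the prop.table() approach above; the column names route and sum_sq are hypothetical, chosen only for this example.

```r
# Rebuild the sample data from the original post
test <- matrix(rep(letters[1:3], 6), nrow = 6, byrow = TRUE)
test[2, ] <- letters[5:7]
test[4, ] <- c("e", "f", "i")
test[1, 3] <- "i"
colnames(test) <- c("leave", "arrive", "line")
test <- data.frame(test, stringsAsFactors = FALSE)

# Row-wise proportions of each line within every leave/arrive combination
res <- with(test, prop.table(table(paste(leave, arrive, sep = ""), line), 1))

# One row per combination with its sum of squared proportions
out <- data.frame(route = rownames(res),
                  sum_sq = as.vector(rowSums(res^2)),
                  stringsAsFactors = FALSE)
print(out)
```

With thousands of rows the same code should apply unchanged, since table() builds the full cross-tabulation in one pass.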