emorway
2010-Jun-25 23:08 UTC
[R] Average 2 Columns when possible, or return available value
Forum, Using the following data: DF<-read.table(textConnection("A B 22.60 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 102.00 NA 19.20 NA 19.20 NA NA NA NA NA NA NA 11.80 NA 7.62 NA NA NA NA NA NA NA NA NA NA NA 75.00 NA NA NA 18.30 18.2 NA NA NA NA 8.44 NA 18.00 NA NA NA 12.90 NA"),header=T) closeAllConnections() The second column is a duplicate reading of the first column, and when two values are available, I would like to average column 1 and 2 (example code below). But if there is only one reading, I would like to retain it, but I haven't found a good way to exclude NA's using the following code: t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean)[,-1])) Currently, row 24 is the only row with a returned value. I'd like the result to return column "A" if it is the only available value, and average where possible. Of course, if both columns are NA, NA is the only possible result. The result I'm after would look like this (row 24 is an avg): 22.60 NA NA NA NA NA NA NA 102.00 19.20 19.20 NA NA NA 11.80 7.62 NA NA NA NA NA 75.00 NA 18.25 NA NA 8.44 18.00 NA 12.90 This is a small example from a much larger data frame, so if you're wondering what the deal is with list(), that will come into play for the larger problem I'm trying to solve. Respectfully, Eric -- View this message in context: r.789695.n4.nabble.com/Average-2-Columns-when-possible-or-return-available-value-tp2269049p2269049.html Sent from the R help mailing list archive at Nabble.com.
Phil Spector
2010-Jun-25 23:15 UTC
[R] Average 2 Columns when possible, or return available value
Eric - What you're describing is taking the mean of each row while ignoring missing values:> apply(DF,1,mean,na.rm=TRUE)[1] 22.60 NaN NaN NaN NaN NaN NaN NaN 102.00 19.20 [11] 19.20 NaN NaN NaN 11.80 7.62 NaN NaN NaN NaN [21] NaN 75.00 NaN 18.00 NaN 12.90 If this isn't suitable for your larger problem, please describe that problem in greater detail. - Phil Spector Statistical Computing Facility Department of Statistics UC Berkeley spector at stat.berkeley.edu On Fri, 25 Jun 2010, emorway wrote:> > Forum, > > Using the following data: > > DF<-read.table(textConnection("A B > 22.60 NA > NA NA > NA NA > NA NA > NA NA > NA NA > NA NA > NA NA > 102.00 NA > 19.20 NA > 19.20 NA > NA NA > NA NA > NA NA > 11.80 NA > 7.62 NA > NA NA > NA NA > NA NA > NA NA > NA NA > 75.00 NA > NA NA > 18.30 18.2 > NA NA > NA NA > 8.44 NA > 18.00 NA > NA NA > 12.90 NA"),header=T) > closeAllConnections() > > The second column is a duplicate reading of the first column, and when two > values are available, I would like to average column 1 and 2 (example code > below). But if there is only one reading, I would like to retain it, but I > haven't found a good way to exclude NA's using the following code: > > t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean)[,-1])) > > Currently, row 24 is the only row with a returned value. I'd like the > result to return column "A" if it is the only available value, and average > where possible. Of course, if both columns are NA, NA is the only possible > result. > > The result I'm after would look like this (row 24 is an avg): > > 22.60 > NA > NA > NA > NA > NA > NA > NA > 102.00 > 19.20 > 19.20 > NA > NA > NA > 11.80 > 7.62 > NA > NA > NA > NA > NA > 75.00 > NA > 18.25 > NA > NA > 8.44 > 18.00 > NA > 12.90 > > This is a small example from a much larger data frame, so if you're > wondering what the deal is with list(), that will come into play for the > larger problem I'm trying to solve. > > Respectfully, > Eric > -- > View this message in context: r.789695.n4.nabble.com/Average-2-Columns-when-possible-or-return-available-value-tp2269049p2269049.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
Joshua Wiley
2010-Jun-25 23:17 UTC
[R] Average 2 Columns when possible, or return available value
Hello Eric, I am not sure how your need to use list() will fit in with this, but for your sample data, this will do the trick. matrix(rowMeans(DF, na.rm=TRUE), ncol=1) HTH, Josh On Fri, Jun 25, 2010 at 4:08 PM, emorway <emorway at engr.colostate.edu> wrote:> > Forum, > > Using the following data: > > DF<-read.table(textConnection("A B > 22.60 NA > ?NA NA > ?NA NA > ?NA NA > ?NA NA > ?NA NA > ?NA NA > ?NA NA > 102.00 NA > ?19.20 NA > ?19.20 NA > ?NA NA > ?NA NA > ?NA NA > ?11.80 NA > ?7.62 NA > ?NA NA > ?NA NA > ?NA NA > ?NA NA > ?NA NA > ?75.00 NA > ?NA NA > ?18.30 18.2 > ?NA NA > ?NA NA > ?8.44 NA > ?18.00 NA > ?NA NA > ?12.90 NA"),header=T) > closeAllConnections() > > The second column is a duplicate reading of the first column, and when two > values are available, I would like to average column 1 and 2 (example code > below). ?But if there is only one reading, I would like to retain it, but I > haven't found a good way to exclude NA's using the following code: > > t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean)[,-1])) > > Currently, row 24 is the only row with a returned value. ?I'd like the > result to return column "A" if it is the only available value, and average > where possible. ?Of course, if both columns are NA, NA is the only possible > result. > > The result I'm after would look like this (row 24 is an avg): > > ?22.60 > ? ?NA > ? ?NA > ? ?NA > ? ?NA > ? ?NA > ? ?NA > ? ?NA > 102.00 > ?19.20 > ?19.20 > ? ?NA > ? ?NA > ? ?NA > ?11.80 > ?7.62 > ? ?NA > ? ?NA > ? ?NA > ? ?NA > ? ?NA > ?75.00 > ? ?NA > ?18.25 > ? ?NA > ? ?NA > ?8.44 > ?18.00 > ? ?NA > ?12.90 > > This is a small example from a much larger data frame, so if you're > wondering what the deal is with list(), that will come into play for the > larger problem I'm trying to solve. > > Respectfully, > Eric > -- > View this message in context: r.789695.n4.nabble.com/Average-2-Columns-when-possible-or-return-available-value-tp2269049p2269049.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles joshuawiley.com
Joshua Wiley
2010-Jun-25 23:21 UTC
[R] Average 2 Columns when possible, or return available value
btw, if you just wanted your exact code to work: t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean, na.rm=TRUE)[,-1])) You will get NaNs rather than NAs where you are missing from both rows, but that should not be a real issue. <snip> -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles joshuawiley.com
Joris Meys
2010-Jun-26 00:24 UTC
[R] Average 2 Columns when possible, or return available value
Just want to add that if you want to clean out the NA rows in a matrix or data frame, take a look at ?complete.cases. Can be handy to use with big datasets. I got curious, so I just ran the codes given here on a big dataset, before and after removing NA rows. I have to be honest, this is surely an illustration of the power of rowMeans. I'm amazed myself. DF <- data.frame( A=rep(DF$A,10000), B=rep(DF$B,10000) )> system.time(apply(DF,1,mean,na.rm=TRUE))user system elapsed 13.26 0.06 13.46> system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))user system elapsed 0.03 0.00 0.03> system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,+ na.rm=TRUE)[,-1])) + ) Timing stopped at: 227.84 1.03 249.31 -- I got impatient and pressed the escape> DF <- DF[complete.cases(DF),]> system.time(apply(DF,1,mean,na.rm=TRUE))user system elapsed 0.39 0.00 0.39> system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))user system elapsed 0.01 0.00 0.02> system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,+ na.rm=TRUE)[,-1])) + ) user system elapsed 10.01 0.07 13.40 Cheers Joris On Sat, Jun 26, 2010 at 1:08 AM, emorway <emorway at engr.colostate.edu> wrote:> > Forum, > > Using the following data: > > DF<-read.table(textConnection("A B > 22.60 NA > ?NA NA > ?NA NA > ?NA NA > ?NA NA > ?NA NA > ?NA NA > ?NA NA > 102.00 NA > ?19.20 NA > ?19.20 NA > ?NA NA > ?NA NA > ?NA NA > ?11.80 NA > ?7.62 NA > ?NA NA > ?NA NA > ?NA NA > ?NA NA > ?NA NA > ?75.00 NA > ?NA NA > ?18.30 18.2 > ?NA NA > ?NA NA > ?8.44 NA > ?18.00 NA > ?NA NA > ?12.90 NA"),header=T) > closeAllConnections() > > The second column is a duplicate reading of the first column, and when two > values are available, I would like to average column 1 and 2 (example code > below). ?But if there is only one reading, I would like to retain it, but I > haven't found a good way to exclude NA's using the following code: > > t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean)[,-1])) > > Currently, row 24 is the only row with a returned value. ?I'd like the > result to return column "A" if it is the only available value, and average > where possible. ?Of course, if both columns are NA, NA is the only possible > result. > > The result I'm after would look like this (row 24 is an avg): > > ?22.60 > ? ?NA > ? ?NA > ? ?NA > ? ?NA > ? ?NA > ? ?NA > ? ?NA > 102.00 > ?19.20 > ?19.20 > ? ?NA > ? ?NA > ? ?NA > ?11.80 > ?7.62 > ? ?NA > ? ?NA > ? ?NA > ? ?NA > ? ?NA > ?75.00 > ? ?NA > ?18.25 > ? ?NA > ? ?NA > ?8.44 > ?18.00 > ? ?NA > ?12.90 > > This is a small example from a much larger data frame, so if you're > wondering what the deal is with list(), that will come into play for the > larger problem I'm trying to solve. > > Respectfully, > Eric > -- > View this message in context: r.789695.n4.nabble.com/Average-2-Columns-when-possible-or-return-available-value-tp2269049p2269049.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 Joris.Meys at Ugent.be ------------------------------- Disclaimer : helpdesk.ugent.be/e-maildisclaimer.php