Sorkin, John
2024-Jun-08 17:37 UTC
[R] Can't compute row means of two columns of a dataframe.
I have a data frame with three columns: TotalInches, Low20, and High20. For each row of the dataset, I am trying to compute the mean of Low20 and High20.

xxxz <- structure(list(
  TotalInches = c(58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76),
  Low20 = c(84, 87, 90, 93, 96, 99, 102, 106, 109, 112, 116, 119, 122, 126, 129, 133, 137, 141, 144),
  High20 = c(111, 115, 119, 123, 127, 131, 135, 140, 144, 148, 153, 157, 162, 167, 171, 176, 181, 186, 191)),
  class = "data.frame", row.names = c(NA, -19L))
xxxz
str(xxxz)
xxxz$Average20 <- by(xxxz[,c("Low20","High20")],xxxz[,"TotalInches"],mean)
warnings()

When I run the code above, I don't get the means by row. I get the following warning message, once for each row of the data frame.

Warning messages:
1: In mean.default(data[x, , drop = FALSE], ...) :
  argument is not numeric or logical: returning NA
2: In mean.default(data[x, , drop = FALSE], ...) :
  argument is not numeric or logical: returning NA

Can someone tell me what I am doing wrong, and how I can compute the row means?

Thank you,
John

John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center;
PI, Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician, University of Maryland Center for Vascular Research;
Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382
Bert Gunter
2024-Jun-08 17:47 UTC
[R] Can't compute row means of two columns of a dataframe.
Use apply(), not by().

xxxz$av20 <- apply(xxxz[,c("Low20","High20")],1, mean)

-- Bert

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
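The apply() suggestion can be tried on a small stand-in before running it on the full data. The sketch below assumes a three-row subset of the poster's data; the name `d` is hypothetical and used only for illustration.

```r
# Hypothetical three-row subset of the poster's data; `d` is an illustrative name.
d <- data.frame(TotalInches = c(58, 59, 60),
                Low20       = c(84, 87, 90),
                High20      = c(111, 115, 119))

# MARGIN = 1 tells apply() to call mean() once per row.
# apply() coerces the selected columns to a matrix first, which is safe
# here because both columns are numeric.
d$av20 <- apply(d[, c("Low20", "High20")], 1, mean)

print(d)
```

Because apply() works on a matrix, mixing numeric and character columns in the selection would turn every value into a character string, so it is worth selecting only the numeric columns, as above.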
@vi@e@gross m@iii@g oii gm@ii@com
2024-Jun-08 18:15 UTC
[R] Can't compute row means of two columns of a dataframe.
John,

Maybe you can clarify what you want the output to look like. It took me a while to realize what you may want, as it is NOT properly described as wanting rowsums.

There is a standard function called rowMeans() that probably does what you want if you want the mean of each row across all columns, as in:

> rowMeans(xxxz)
 [1]  84.33333  87.00000  89.66667  92.33333  95.00000  97.66667 100.33333 103.66667 106.33333 109.00000 112.33333 115.00000
[13] 118.00000 121.33333 124.00000 127.33333 130.66667 134.00000 137.00000

It does not add the means to the original data.frame, if you wanted them there, but that is easy enough to do.

> xxxz$Average20 <- rowMeans(xxxz)
> head(xxxz)
  TotalInches Low20 High20 Average20
1          58    84    111  84.33333
2          59    87    115  87.00000
3          60    90    119  89.66667
4          61    93    123  92.33333
5          62    96    127  95.00000
6          63    99    131  97.66667

Your construct is more complex, and it looks like you want to do this to a subset of two columns. Again, straightforward:

xxxz$Average20 <- rowMeans(xxxz[, c("Low20", "High20")])

I would probably do this using a dplyr mutate, but that is outside the scope here.

This does not help explain your error, so let me look at what you are trying to do. What did you expect the second argument of by() to do? You seem to be giving it INDICES taken from the entries of the first column. What is that for?

by(xxxz[,c("Low20","High20")], xxxz[,"TotalInches"], mean)

The documentation suggests this is for splitting by factors. I do not see multiple instances of any TotalInches value, so why is this kind of grouping needed?

My guess is that you are using the wrong function, or using it the wrong way for your needs. The warnings may relate to that.
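The warning itself can be reproduced in isolation. by() splits the data frame by the grouping values and passes each piece to FUN as a data frame, and mean() has no method for data frames, so mean.default() warns and returns NA. The sketch below assumes a hypothetical three-row subset named `d`; it is an illustration of the mechanism, not the poster's session.

```r
# Hypothetical three-row subset of the poster's data; `d` is an illustrative name.
d <- data.frame(TotalInches = c(58, 59, 60),
                Low20       = c(84, 87, 90),
                High20      = c(111, 115, 119))

# This mimics what by() hands to FUN for the group TotalInches == 58:
# a one-row data frame, not a numeric vector.
piece <- d[1, c("Low20", "High20"), drop = FALSE]
class(piece)   # "data.frame"

# mean.default() rejects a data frame: it warns and returns NA.
res <- suppressWarnings(mean(piece))
is.na(res)

# Flattening the piece to a numeric vector gives the intended value.
flat_mean <- mean(unlist(piece))

# rowMeans() on the two columns skips the grouping step entirely.
d$Average20 <- rowMeans(d[, c("Low20", "High20")])
print(d)
```

Since every TotalInches value occurs once, each group is a single row, which is why the original call produced one warning per row.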
Ebert,Timothy Aaron
2024-Jun-08 20:05 UTC
[R] Can't compute row means of two columns of a dataframe.
Can this problem be made more direct?

xxxz$Average.20 <- (xxxz$Low20 + xxxz$High20)/2

That is literally the mean of two columns. Functions can be useful if there will be more columns, but with just two this seems easier.

I will point out that the average daily temperature based on the midpoint between minimum and maximum contains a fair bit of error because that is only roughly how heating and cooling respond. I admit that sometimes there are no other choices and we work with available data.

Tim
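The direct arithmetic and rowMeans() agree whenever both columns are complete; they can differ only in how missing values are treated. The sketch below assumes a hypothetical data frame `d` with one NA added for illustration (the poster's data contains no missing values).

```r
# Hypothetical data with one missing value; the poster's data has none.
d <- data.frame(Low20  = c(84, 87, NA),
                High20 = c(111, 115, 119))

# Direct arithmetic: any NA in a row makes that row's result NA.
direct <- (d$Low20 + d$High20) / 2

# rowMeans() with default settings behaves the same way.
via_rowmeans <- rowMeans(d[, c("Low20", "High20")])
same <- isTRUE(all.equal(direct, unname(via_rowmeans)))

# With na.rm = TRUE, rowMeans() averages whatever values are present,
# so the third row becomes the lone High20 value.
via_rowmeans_narm <- rowMeans(d[, c("Low20", "High20")], na.rm = TRUE)

print(data.frame(direct, via_rowmeans, via_rowmeans_narm))
```

Whether a row with one missing value should yield NA or the remaining value is a judgment about the data, so the choice of na.rm is best made explicitly.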