I am trying to use the aggregate function to run a function, catsbyday2, that produces the mean, minimum, maximum, and number of observations of the values in a dataframe, inJan2Test, by levels of the dataframe variable MyDay. The output should be in the form of a dataframe.

#my code:
# This function should process a data frame and return a data frame
# containing the mean, minimum, maximum, and number of observations
# in the data frame for each level of MyDay.
catsbyday2 <- function(df){
  # Create a matrix to hold the calculated values.
  xx <- matrix(nrow=1,ncol=4)
  # Give names to the columns.
  colnames(xx) <- c("Mean","min","max","Nobs")
  cat("This is the matrix that will hold the results\n",xx,"\n")

  # For each level of the indexing variable, MyDay, compute the
  # mean, minimum, maximum, and number of observations in the
  # dataframe passed to the function.
  xx[,1] <- mean(df)
  xx[,2] <- min(df)
  xx[,3] <- max(df)
  xx[,4] <- length(df)
  cat("These are the dimensions of the matrix in the function",dim(xx),"\n")
  print(xx)
  return(xx)
}

# Create data frame
inJan2Test <- data.frame(MyDay=rep(c(1,2,3),4),AveragePM2_5=c(10,20,30,
                                                              11,21,31,
                                                              12,22,32,
                                                              15,25,35))
str(inJan2Test)
cat("This is the data frame","\n")
inJan2Test

xx <- aggregate(inJan2Test[,"AveragePM2_5"],list(inJan2Test[,"MyDay"]),catsbyday2,simplify=FALSE)
xx
class(xx)
str(xx)
names(xx)

# Create a data frame in the format that I expect aggregate would return
examplar <- data.frame(mean=c(12,22,32),min=c(10,20,30),max=c(15,25,35),length=c(4,4,4))
examplar
str(examplar)

While the output is correct (the mean, min, etc. are correctly calculated), the format of the output is not what I want.

(1) Although the returned object appears to be a data frame, it does not appear to be a "normal" data frame. (see the output of
(2) The column names I define in the function are not part of the data frame that is created.
(3) The returned values on each row are separated by commas. I would expect them to be separated by spaces.
(4) When I run str() on the output, it appears that the output dataframe contains a list.

> str(xx)
'data.frame':   3 obs. of  2 variables:
 $ Group.1: num  1 2 3
 $ x      :List of 3
  ..$ : num [1, 1:4] 12 10 15 4
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:4] "Mean" "min" "max" "Nobs"
  ..$ : num [1, 1:4] 22 20 25 4
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:4] "Mean" "min" "max" "Nobs"
  ..$ : num [1, 1:4] 32 30 35 4
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:4] "Mean" "min" "max" "Nobs"

I want it to simply be a numeric dataframe:

  mean min max length
    12  10  15      4
    22  20  25      4
    32  30  35      4

which should return the following str:

examplar <- data.frame(mean=c(12,22,32),min=c(10,20,30),max=c(15,25,35),length=c(4,4,4))
examplar
str(examplar)

'data.frame':   3 obs. of  4 variables:
 $ mean  : num  12 22 32
 $ min   : num  10 20 30
 $ max   : num  15 25 35
 $ length: num  4 4 4

John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center;
PI Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;

Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382
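For the object the question already produces, one possible repair is sketched below. It assumes the list-column structure shown by str(xx) above: each element of xx$x is a 1 x 4 matrix, so the elements can be stacked with do.call(rbind, ...) and bound to the grouping column. The names stats and result, and the trimmed copy of catsbyday2 without its diagnostic cat()/print() calls, are illustrative only.

```r
# Example data from the question
inJan2Test <- data.frame(MyDay = rep(c(1, 2, 3), 4),
                         AveragePM2_5 = c(10, 20, 30,
                                          11, 21, 31,
                                          12, 22, 32,
                                          15, 25, 35))

# Trimmed copy of catsbyday2 (illustrative): returns the same 1 x 4 named
# matrix as the original, minus the cat()/print() diagnostics
catsbyday2 <- function(df) {
  matrix(c(mean(df), min(df), max(df), length(df)), nrow = 1,
         dimnames = list(NULL, c("Mean", "min", "max", "Nobs")))
}

# Same call as in the question: with simplify = FALSE, column x is a list
xx <- aggregate(inJan2Test[, "AveragePM2_5"], list(inJan2Test[, "MyDay"]),
                catsbyday2, simplify = FALSE)

# Stack the three 1 x 4 matrices into one 3 x 4 numeric matrix;
# rbind() keeps the column names of the first matrix
stats <- do.call(rbind, xx$x)

# Combine with the grouping column into an ordinary numeric data frame
result <- data.frame(MyDay = xx$Group.1, stats)
str(result)
```

Because the column names travel with the stacked matrix, the names set inside the function ("Mean", "min", "max", "Nobs") survive into the data frame.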
On Wed, 11 Dec 2024, Sorkin, John writes:

You'll no doubt get answers that use 'aggregate', but for such calculations I find 'tapply' much easier/clearer:

res <- tapply(inJan2Test$AveragePM2_5,          ## what to compute on
              inJan2Test$MyDay,                 ## what to group by
              function(x) c(mean = mean(x),     ## what to do for each group
                            min = min(x),
                            max = max(x),
                            length = length(x)))

The result will be a list of vectors, which you can bind together:

do.call(rbind, res)
##   mean min max length
## 1   12  10  15      4
## 2   22  20  25      4
## 3   32  30  35      4

(The result is a numeric matrix, though. But that is only one 'as.data.frame' away from a data.frame, if it has to be one.)

kind regards
Enrico

-- 
Enrico Schumann
Lucerne, Switzerland
https://enricoschumann.net
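Taking the 'as.data.frame' step mentioned above a little further, here is a sketch that assumes the grouping variable should appear as an ordinary numeric column rather than only as row names (the names m and out are illustrative). The group labels come back from tapply() as character row names, so they are converted back to numbers here, which assumes MyDay was numeric to begin with.

```r
# Example data from the question
inJan2Test <- data.frame(MyDay = rep(c(1, 2, 3), 4),
                         AveragePM2_5 = c(10, 20, 30,
                                          11, 21, 31,
                                          12, 22, 32,
                                          15, 25, 35))

# One named vector of statistics per level of MyDay
res <- tapply(inJan2Test$AveragePM2_5, inJan2Test$MyDay,
              function(x) c(mean = mean(x), min = min(x),
                            max = max(x), length = length(x)))

# Stack into a numeric matrix; the row names are the MyDay levels "1", "2", "3"
m <- do.call(rbind, res)

# Turn the row names into an explicit numeric column and drop them as row names
out <- data.frame(MyDay = as.numeric(rownames(m)), m, row.names = NULL)
str(out)
```

Dropping the MyDay column (out[-1]) leaves the same four columns, with the same names, as the examplar data frame in the question.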
At 20:31 on 11/12/2024, Sorkin, John wrote:
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Hello,

The code can be made much simpler. The summary statistics function is a one-liner: it just computes and returns a named vector.
But the statistics now come back in a matrix: the last column of the result is a matrix column, and if you print the result, agg, you will see the name AveragePM2_5 with the suffixes "mean", "min", "max" and "nobs" appended.
You can solve this by removing that column from the result and cbind-ing it with the rest of the agg data.frame.


catsbyday2 <- function(x) {
  c(mean = mean(x), min = min(x), max = max(x), nobs = length(x))
}
agg <- aggregate(AveragePM2_5 ~ MyDay, inJan2Test, FUN = catsbyday2)

# The 2nd column is a 3x4 matrix
str(agg)
#> 'data.frame':  3 obs. of  2 variables:
#>  $ MyDay       : num  1 2 3
#>  $ AveragePM2_5: num [1:3, 1:4] 12 22 32 10 20 30 15 25 35 4 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:4] "mean" "min" "max" "nobs"

# This solves it; the method cbind.data.frame is
# called since the 1st argument is a df
cbind(agg[-ncol(agg)], agg[[ncol(agg)]])
#>   MyDay mean min max nobs
#> 1     1   12  10  15    4
#> 2     2   22  20  25    4
#> 3     3   32  30  35    4

# a data.frame
agg[-ncol(agg)]
#>   MyDay
#> 1     1
#> 2     2
#> 3     3

# the matrix column
agg[[ncol(agg)]]
#>      mean min max nobs
#> [1,]   12  10  15    4
#> [2,]   22  20  25    4
#> [3,]   32  30  35    4


Hope this helps,

Rui Barradas
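An alternative to the cbind() step, offered as one option rather than as the approach used above: do.call(data.frame, agg) passes each column of agg as a separate named argument to data.frame(), which splits the matrix column into one column per statistic, named AveragePM2_5.mean, AveragePM2_5.min, and so on. A sub() call then strips that prefix. The name flat is illustrative.

```r
# Example data from the question
inJan2Test <- data.frame(MyDay = rep(c(1, 2, 3), 4),
                         AveragePM2_5 = c(10, 20, 30,
                                          11, 21, 31,
                                          12, 22, 32,
                                          15, 25, 35))

# Summary function returning a named vector, as in the reply above
catsbyday2 <- function(x) {
  c(mean = mean(x), min = min(x), max = max(x), nobs = length(x))
}
agg <- aggregate(AveragePM2_5 ~ MyDay, inJan2Test, FUN = catsbyday2)

# data.frame(MyDay = ..., AveragePM2_5 = <3 x 4 matrix>) expands the matrix
# into separate columns named "AveragePM2_5.mean", "AveragePM2_5.min", ...
flat <- do.call(data.frame, agg)

# Drop the "AveragePM2_5." prefix so only the statistic names remain
names(flat) <- sub("^AveragePM2_5\\.", "", names(flat))
str(flat)
```

The prefix-stripping pattern is tied to this particular response variable; with a different column it would need to be changed accordingly.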