Sorkin, John
2025-Jan-24 19:03 UTC
[R] Interpreting the output of str on a data frame created using aggregate function
I ran the following code:

marginalcats <- aggregate(meanbyCensusIDAndDay3$cats,
                          list(meanbyCensusIDAndDay3$CensusID), table)

followed by

str(marginalcats)

I received the following output:

'data.frame': 844 obs. of 2 variables:
 $ Group.1: num 6e+09 6e+09 6e+09 6e+09 6e+09 ...
 $ x : int [1:844, 1:7] 14 14 14 14 14 14 14 14 14 14 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:7] "Good" "Moderate" "Unhealthy For Some" "Unhealthy" ...

I am trying to understand the output. I believe it says that marginalcats
(1) is a data frame
(2) the df has two elements: (I) Group.1 and (II) x
(3) Group.1 is a "list" of numbers
(4) x is an 844x7 matrix having values "Good", "Moderate", etc.

A few questions:
(A) Is the interpretation given above correct?
(B) Does the ".. ..$ : NULL" mean that the matrix has no row names?
(C) What does "attr(*, "dimnames")=List of 2" mean?
(D) Does it mean that the dimensions of the matrix are stored as two separate lists?
(E) If so, how do I access the lists?
When I enter
dimnames(marginalcats$x)
I receive:

[[1]]
NULL

[[2]]
[1] "Good" "Moderate" "Unhealthy For Some" "Unhealthy" "Very Unhealthy" "Hazardous1"
[7] "Hazardous2"

Thank you,
John

John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center;
PI - Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;

Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382
Duncan Murdoch
2025-Jan-24 19:13 UTC
[R] Interpreting the output of str on a data frame created using aggregate function
I'll answer your question inline.

On 2025-01-24 2:03 p.m., Sorkin, John wrote:
> I ran the following code:
> marginalcats <- aggregate(meanbyCensusIDAndDay3$cats,
> list(meanbyCensusIDAndDay3$CensusID),table)
> followed by
> str(marginalcats)
>
> I received the following output:
> 'data.frame': 844 obs. of 2 variables:
> $ Group.1: num 6e+09 6e+09 6e+09 6e+09 6e+09 ...
> $ x : int [1:844, 1:7] 14 14 14 14 14 14 14 14 14 14 ...
> ..- attr(*, "dimnames")=List of 2
> .. ..$ : NULL
> .. ..$ : chr [1:7] "Good" "Moderate" "Unhealthy For Some" "Unhealthy" ...
>
> I am trying to understand the output. I believe it says that marginalcats
> (1) is a data frame
> (2) the df has two elements: (I) Group.1 and (II) x

Correct so far.

> (3) Group.1 is a "list" of numbers

No, it's a numeric vector. Its length isn't printed because it's a column of a data frame, so its length is 844, the number of observations in the data frame.

> (4) x is an 844x7 matrix having values "Good", "Moderate", etc.

Correct.

>
> A few questions:
> (A) Is the interpretation given above correct?
> (B) Does the ".. ..$ : NULL" mean that the matrix has no row names?

Yes.

> (C) What does "attr(*, "dimnames")=List of 2" mean?

That says x has an attribute called "dimnames", which is a list with two elements. They are the row names (which are NULL; you don't have any) and the column names.

> (D) Does it mean that the dimensions of the matrix are stored as two separate lists?

No. The dim is an attribute which is shown implicitly as "[1:844, 1:7]", i.e. c(844, 7).

Duncan Murdoch

> (E) If so, how do I access the lists?
> When I enter
> dimnames(marginalcats$x)
> I receive:
>
> [[1]]
> NULL
>
> [[2]]
> [1] "Good" "Moderate" "Unhealthy For Some" "Unhealthy" "Very Unhealthy" "Hazardous1"
> [7] "Hazardous2"
>
> Thank you,
> John
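To make the answers to (C), (D) and (E) concrete, here is a minimal sketch on a small, made-up data frame. The names `toy` and `counts`, the group IDs and the counts are hypothetical stand-ins, since marginalcats itself isn't available. It also shows that the entries of x are integer counts, while "Good", "Moderate", etc. are only its column names.

```r
# Hypothetical stand-in for marginalcats: 3 groups and a 3 x 2 integer
# matrix of counts whose columns are named after categories.
counts <- matrix(c(14L, 2L, 9L, 5L, 0L, 3L), nrow = 3,
                 dimnames = list(NULL, c("Good", "Moderate")))
toy <- data.frame(Group.1 = c(101, 102, 103))
toy$x <- counts          # stores the whole matrix as a single column

str(toy)                 # same layout as the str() output in the question

dim(toy$x)               # the "[1:3, 1:2]" part of str(), i.e. c(3, 2)
dimnames(toy$x)          # list of 2: NULL row names, then column names
rownames(toy$x)          # NULL: the matrix has no row names
colnames(toy$x)          # same as dimnames(toy$x)[[2]]
attributes(toy$x)        # $dim and $dimnames are two separate attributes
toy$x[, "Good"]          # the counts stored under the "Good" column
```

So the dimensions live in the `dim` attribute (an integer vector), and only the names live in the `dimnames` list; `dimnames(x)[[2]]` or `colnames(x)` retrieves the category labels.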
Rui Barradas
2025-Jan-24 20:22 UTC
[R] Interpreting the output of str on a data frame created using aggregate function
Hello,

What str is telling you is that the 2nd column is a matrix column: it has a dim attribute with two dimensions. Those dimensions have colnames but no rownames assigned.
The example below tries to produce a result similar to yours; the numbers will vary.


df1 <- data.frame(x = rep(letters[1:3], 8), y = rep(1:12, each = 2))
agg <- aggregate(df1$x, by = list(df1$y), table)
str(agg)
#> 'data.frame':	12 obs. of  2 variables:
#>  $ Group.1: int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ x      : int [1:12, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "a" "b"

n <- ncol(agg)
cbind(agg[-n], agg[[n]])
#>    Group.1 a b
#> 1        1 1 1
#> 2        2 1 1
#> 3        3 1 1
#> 4        4 1 1
#> 5        5 1 1
#> 6        6 1 1
#> 7        7 1 1
#> 8        8 1 1
#> 9        9 1 1
#> 10      10 1 1
#> 11      11 1 1
#> 12      12 1 1


The 2nd column is a matrix because table() returned a vector of the same length (2) for every value of y.
Confusing? I think it is. Another example makes it clearer:


agg2 <- aggregate(cyl ~ gear, mtcars, table)
str(agg2)
#> 'data.frame':	3 obs. of  2 variables:
#>  $ gear: num  3 4 5
#>  $ cyl :List of 3
#>   ..$ : 'table' int [1:3(1d)] 1 2 12
#>   .. ..- attr(*, "dimnames")=List of 1
#>   .. .. ..$ : chr [1:3] "4" "6" "8"
#>   ..$ : 'table' int [1:2(1d)] 8 4
#>   .. ..- attr(*, "dimnames")=List of 1
#>   .. .. ..$ : chr [1:2] "4" "6"
#>   ..$ : 'table' int [1:3(1d)] 2 1 2
#>   .. ..- attr(*, "dimnames")=List of 1
#>   .. .. ..$ : chr [1:3] "4" "6" "8"

m <- ncol(agg2)
cbind(agg2[-m], agg2[[m]])
#> Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 3, 2

agg2[[m]]
#> [[1]]
#> 
#>  4  6  8 
#>  1  2 12 
#> 
#> [[2]]
#> 
#> 4 6 
#> 8 4 
#> 
#> [[3]]
#> 
#> 4 6 8 
#> 2 1 2 


Now the second vector doesn't have cyl == 8, so the lengths of table()'s results are unbalanced (3, 2 and 3). The vectors cannot be cbind'ed and there is an error.
What the question asks for, to interpret str's output, is visible above. Since the outputs of table are 3 vectors that are not all of the same length, aggregate cannot cbind those vectors and cannot output a matrix, as both of these can:

$ x : int [1:844, 1:7]   # OP
$ x : int [1:12, 1:2]    # my 1st example

This is no different from the *apply functions, which by default simplify when possible and otherwise output a list. In fact I believe it's exactly the same behavior.
Here is a 3rd example with commented code. I hope it is simple to follow.


need_some_stats <- function(x) {
  c(Count = length(x), Mean = mean(x), Var = var(x))
}
agg3 <- aggregate(mpg ~ gear, mtcars, need_some_stats)

# 2nd column is a 3x3 matrix
str(agg3)
#> 'data.frame':	3 obs. of  2 variables:
#>  $ gear: num  3 4 5
#>  $ mpg : num [1:3, 1:3] 15 12 5 16.1 24.5 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:3] "Count" "Mean" "Var"

# ugly output, the matrix column has its colnames
# prefixed with the data.frame's 2nd column name (mpg).
agg3
#>   gear mpg.Count mpg.Mean  mpg.Var
#> 1    3  15.00000 16.10667 11.36781
#> 2    4  12.00000 24.53333 27.84424
#> 3    5   5.00000 21.38000 44.34200

# make it a data.frame with all columns atomic vectors.
# the single `[` is meant to extract a sub-data.frame
# the double `[[` is meant to extract the last vector (a matrix)
p <- ncol(agg3)
cbind(agg3[-p], agg3[[p]])
#>   gear Count     Mean      Var
#> 1    3    15 16.10667 11.36781
#> 2    4    12 24.53333 27.84424
#> 3    5     5 21.38000 44.34200


Hope this helps,

Rui Barradas
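The first example above also hides a pitfall worth checking. aggregate() (like sapply()) builds the matrix only when every FUN result has the same length, and it takes the column names from the first group's result alone. The minimal sketch below reuses df1 (the names `x_f`, `agg_chr`, `agg_fac` and `s` are introduced here for illustration) and assumes R >= 4.0, where character columns are not turned into factors. Group 2 actually holds "a" and "c", yet its counts are reported under "a" and "b". Converting x to a factor with fixed levels makes table() return every level, in the same order, for every group. Whether the cats column in the original question is a factor isn't stated in the thread; is.factor() would tell.

```r
# df1 as in the first example: each y group holds two letters,
# but not always the same two.
df1 <- data.frame(x = rep(letters[1:3], 8), y = rep(1:12, each = 2))

# Character input: every table() result has length 2, so aggregate()
# returns a 12 x 2 matrix whose column names come from group 1 only.
agg_chr <- aggregate(df1$x, by = list(df1$y), table)
colnames(agg_chr$x)            # "a" "b"
table(df1$x[df1$y == 2])       # group 2 really holds "a" and "c"
agg_chr$x[2, ]                 # yet it is reported as a = 1, b = 1

# Factor input: table() on a factor counts every level (zeros too),
# so all rows have the same length and the same column order.
df1$x_f <- factor(df1$x, levels = c("a", "b", "c"))
agg_fac <- aggregate(df1$x_f, by = list(df1$y), table)
dim(agg_fac$x)                 # 12 3
agg_fac$x[2, ]                 # a = 1, b = 0, c = 1

# sapply() simplifies the same way: equal lengths give a matrix whose
# row names are taken from the first element's names.
s <- sapply(list(c(a = 1, b = 2), c(a = 3, c = 4)), identity)
rownames(s)                    # "a" "b"
```

When the grouped variable is a factor, every table() result has one entry per level, so the simplified matrix is both rectangular and correctly labelled.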
avi.e.gross at gmail.com
2025-Jan-25 01:49 UTC
[R] Interpreting the output of str on a data frame created using aggregate function
John,

Others have helped educate you on your initial guesses, and I just want to add that str() is not the only function people use to see their data. People often also want to look at what class(es) an object has, the names inside it, what dimensions it has, and so on. There are various commands that can be helpful.

One to consider is a function called glimpse(), as in the dplyr package. It shows a sort of rotated version of the data that may be of use at times.

And, if you were using a GUI like RStudio, you have one pane (typically upper right, sharing the space with other possible tabs) in which you can click on a variable and have it open up to show its parts as needed, and even view the data in a separate View tab, typically in the upper-left area where you normally edit. In that environment you can even ask to view or edit a variable such as a data frame.

One very useful technique I use is to NOT study something complex all at once. Copy the part you want and look at it. Your second component, called $x, can be copied out and examined. In one sense it looks like a matrix with the same 844 rows as the whole data frame and seven columns. Or is it a sub-data.frame of some kind? Modern R lets you embed all kinds of objects, including other lists, within each cell if you do it carefully. Extracting it as a whole may let you examine it using whatever tools apply.

And, believe it or not, sometimes it pays to read the damn documentation. When I typed:

?aggregate

into my session, I looked a bit further down and found this section:

---
Value

For the time series method, a time series of class "ts" or class c("mts", "ts").

For the data frame method, a data frame with columns corresponding to the grouping variables in by followed by aggregated columns from x. If the by has names, the non-empty times are used to label the columns in the results, with unnamed grouping variables being named Group.i for by[[i]].
---

You seem to have invoked the data.frame method.
Perhaps the above makes sense to you. It also suggests, perhaps, a way to get a time series out that you could investigate further, if that would be helpful.

OR, and I hesitate to say this since you do want to master base R methods, consider whether using aggregate() is a good route to get what you want. I am sure it is fine, but some may prefer to combine several dplyr verbs from the tidyverse packages, or use methods from yet other packages, where the output may be easier to understand. Another possibility, once you figure out what you have and compare it to what you want, is to use base R primitives or tidyverse functions (for example tidyr's unnest()) to transform an embedded data structure into something a tad different and perhaps suitable for your purposes. I am not someone who believes the old ways are best, especially when many other aspects of the world have moved forward.
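Following the advice to copy the awkward component out and look at it on its own, here is a minimal base-R sketch on the built-in mtcars data (`stats3`, `agg`, `m` and `flat` are illustrative names, not anything from the thread). It inspects the matrix column and then flattens it with do.call(data.frame, ...), a common base-R alternative to the cbind() idiom in Rui Barradas's reply. Unlike that idiom, it keeps the "mpg." prefix on the new column names.

```r
# stats3() is an illustrative helper returning three named values.
stats3 <- function(v) c(Count = length(v), Mean = mean(v), Var = var(v))
agg <- aggregate(mpg ~ gear, data = mtcars, FUN = stats3)

# Copy the embedded component out and examine it on its own.
m <- agg$mpg
class(m)                              # "matrix" "array" in R >= 4.0
dim(m)                                # 3 3
colnames(m)                           # "Count" "Mean" "Var"
vapply(agg, is.matrix, logical(1))    # gear FALSE, mpg TRUE

# Flatten: data.frame() splits a matrix argument into ordinary columns,
# naming them "<argument>.<colname>", e.g. "mpg.Count".
flat <- do.call(data.frame, agg)
names(flat)
str(flat)
```

After flattening, every column is an ordinary atomic vector, so str(), summary() and the usual indexing behave as they would on any data frame.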