Kelly Thompson
2022-Oct-21 22:32 UTC
[R] getting data from a "vertical" table into a "2-dimensional" grid
As my end result, I want a matrix or data frame, with one row for each year, and one column for each category. On Fri, Oct 21, 2022 at 6:23 PM Kelly Thompson <kt1572757 at gmail.com> wrote:> > # I think this might be a better example. > > # I have data presented in a "vertical" dataframe as shown below in > data_original. > # I want this data in a matrix or "grid", as shown below. > # What I show below seems like one way this can be done. > > # My question: Are there easier or better ways to do this, especially > in Base R, and also in R packages? > > #create data > set.seed(1) > data_original <- data.frame(year = rep(1990:1999, length = 50), > category = sample(1:5, size = 50, replace = TRUE), sales > sample(0:99999, size = 50 , replace = TRUE) ) > dim(data_original) > > #remove rows where data_original$year == 1990 & data_original$category > == 5, to ensure there is at least one NA in the "grid" > data_original <- data_original[ (data_original$year == 1990 & > data_original$category == 5) == FALSE, ] > dim(data_original) > > #aggregate data > data_aggregate_sum_by_year_and_category <- aggregate(x > data_original$sales, by = list(year = data_original$year, category > data_original$category), FUN = sum) > colnames(data_aggregate_sum_by_year_and_category) <- c('year', > 'category', 'sum_of_sales') > dim(data_aggregate_sum_by_year_and_category) > > data_expanded <- expand.grid(year > unique(data_aggregate_sum_by_year_and_category$year), category > unique(data_aggregate_sum_by_year_and_category$category)) > dim(data_expanded) > data_expanded <- merge(data_expanded, > data_aggregate_sum_by_year_and_category, all = TRUE) > dim(data_expanded) > > mat <- matrix(data = data_expanded$sum_of_sales, nrow > length(unique(data_expanded$year)), ncol > length(unique(data_expanded$category)) , byrow = TRUE, dimnames > list( unique(data_expanded$year), unique(data_expanded$category) ) ) > > > data_original > data_expanded > mat > > On Fri, Oct 21, 2022 at 5:03 PM Kelly Thompson <kt1572757 at gmail.com> wrote: > > > > ### > > #I have data presented in a "vertical" data frame as shown below in > > data_original. > > #I want this data in a matrix or "grid", as shown below. > > #What I show below seems like one way this can be done. > > > > #My question: Are there easier or better ways to do this, especially > > in Base R, and also in R packages? > > > > #reproducible example > > > > data_original <- data.frame(year = c('1990', '1999', '1990', '1989'), > > size = c('s', 'l', 'xl', 'xs'), n = c(99, 33, 3, 4) ) > > > > data_expanded <- expand.grid(unique(data_original$year), > > unique(data_original$size), stringsAsFactors = FALSE ) > > colnames(data_expanded) <- c('year', 'size') > > data_expanded <- merge(data_expanded, data_original, all = TRUE) > > > > mat <- matrix(data = data_expanded $n, nrow > > length(unique(data_expanded $year)), ncol > > length(unique(data_expanded $size)) , byrow = TRUE, dimnames = list( > > unique(data_expanded$year), unique(data_expanded$size) ) ) > > > > data_original > > data_expanded > > mat
Bert Gunter
2022-Oct-21 23:20 UTC
[R] getting data from a "vertical" table into a "2-dimensional" grid
"As my end result, I want a matrix or data frame, with one row for each year, and one column for each category." If I understand you correctly, no reshaping gymnastics are needed -- just use ?tapply: set.seed(1) do <- data.frame(year = rep(1990:1999, length = 50), category = sample(1:5, size = 50, replace = TRUE), sales = sample(0:99999, size = 50 , replace = TRUE) ) with(do, tapply(sales, list(year, category),sum)) ## which gives the matrix: 1 2 3 4 5 1990 13283 NA 55083 87522 64877 1991 NA 80963 NA 30100 28277 1992 9391 202916 NA 55090 NA 1993 29696 167344 NA NA 17625 1994 98015 99521 NA 70536 52252 1995 157003 NA 26875 NA 11366 1996 32986 88683 6562 79475 95282 1997 13601 NA 134757 12398 NA 1998 30537 51117 31333 20204 NA 1999 39240 87845 62479 NA 98804 If this is not what you wanted, you may need to explain further or await a response from someone more insightful than I. Cheers, Bert On Fri, Oct 21, 2022 at 3:34 PM Kelly Thompson <kt1572757 at gmail.com> wrote:> > As my end result, I want a matrix or data frame, with one row for each > year, and one column for each category. > > On Fri, Oct 21, 2022 at 6:23 PM Kelly Thompson <kt1572757 at gmail.com> wrote: > > > > # I think this might be a better example. > > > > # I have data presented in a "vertical" dataframe as shown below in > > data_original. > > # I want this data in a matrix or "grid", as shown below. > > # What I show below seems like one way this can be done. > > > > # My question: Are there easier or better ways to do this, especially > > in Base R, and also in R packages? > > > > #create data > > set.seed(1) > > data_original <- data.frame(year = rep(1990:1999, length = 50), > > category = sample(1:5, size = 50, replace = TRUE), sales > > sample(0:99999, size = 50 , replace = TRUE) ) > > dim(data_original) > > > > #remove rows where data_original$year == 1990 & data_original$category > > == 5, to ensure there is at least one NA in the "grid" > > data_original <- data_original[ (data_original$year == 1990 & > > data_original$category == 5) == FALSE, ] > > dim(data_original) > > > > #aggregate data > > data_aggregate_sum_by_year_and_category <- aggregate(x > > data_original$sales, by = list(year = data_original$year, category > > data_original$category), FUN = sum) > > colnames(data_aggregate_sum_by_year_and_category) <- c('year', > > 'category', 'sum_of_sales') > > dim(data_aggregate_sum_by_year_and_category) > > > > data_expanded <- expand.grid(year > > unique(data_aggregate_sum_by_year_and_category$year), category > > unique(data_aggregate_sum_by_year_and_category$category)) > > dim(data_expanded) > > data_expanded <- merge(data_expanded, > > data_aggregate_sum_by_year_and_category, all = TRUE) > > dim(data_expanded) > > > > mat <- matrix(data = data_expanded$sum_of_sales, nrow > > length(unique(data_expanded$year)), ncol > > length(unique(data_expanded$category)) , byrow = TRUE, dimnames > > list( unique(data_expanded$year), unique(data_expanded$category) ) ) > > > > > > data_original > > data_expanded > > mat > > > > On Fri, Oct 21, 2022 at 5:03 PM Kelly Thompson <kt1572757 at gmail.com> wrote: > > > > > > ### > > > #I have data presented in a "vertical" data frame as shown below in > > > data_original. > > > #I want this data in a matrix or "grid", as shown below. > > > #What I show below seems like one way this can be done. > > > > > > #My question: Are there easier or better ways to do this, especially > > > in Base R, and also in R packages? > > > > > > #reproducible example > > > > > > data_original <- data.frame(year = c('1990', '1999', '1990', '1989'), > > > size = c('s', 'l', 'xl', 'xs'), n = c(99, 33, 3, 4) ) > > > > > > data_expanded <- expand.grid(unique(data_original$year), > > > unique(data_original$size), stringsAsFactors = FALSE ) > > > colnames(data_expanded) <- c('year', 'size') > > > data_expanded <- merge(data_expanded, data_original, all = TRUE) > > > > > > mat <- matrix(data = data_expanded $n, nrow > > > length(unique(data_expanded $year)), ncol > > > length(unique(data_expanded $size)) , byrow = TRUE, dimnames = list( > > > unique(data_expanded$year), unique(data_expanded$size) ) ) > > > > > > data_original > > > data_expanded > > > mat > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
CALUM POLWART
2022-Oct-22 06:27 UTC
[R] getting data from a "vertical" table into a "2-dimensional" grid
You asked for base R but also said or using other methods. So for completeness here is a solution using Tidyverse library(tidyverse) data_original <- data.frame( year = c('1990', '1999', '1990', '1989'), size = c('s', 'l', 'xl', 'xs'), n = c(99, 33, 3, 4) ) data_original |> pivot_wider( id_cols = year, names_from = size, values_from = n ) # A tibble: 3 x 5 year s l xl xs <chr> <dbl> <dbl> <dbl> <dbl> 1 1990 99 NA 3 NA 2 1999 NA 33 NA NA 3 1989 NA NA NA 4 If you really want a matrix add |> as.matrix() To the end. I saw you asked if you can use column position and the answer is the same here you can replace size with 2. BUT beware this is prone to errors if data gets changes in order etc. This obviously needs to use tidyverse which some people hate. But it does make the code much more readable to understand what you are doing... On Fri, 21 Oct 2022, 23:34 Kelly Thompson, <kt1572757 at gmail.com> wrote:> As my end result, I want a matrix or data frame, with one row for each > year, and one column for each category. > > On Fri, Oct 21, 2022 at 6:23 PM Kelly Thompson <kt1572757 at gmail.com> > wrote: > > > > # I think this might be a better example. > > > > # I have data presented in a "vertical" dataframe as shown below in > > data_original. > > # I want this data in a matrix or "grid", as shown below. > > # What I show below seems like one way this can be done. > > > > # My question: Are there easier or better ways to do this, especially > > in Base R, and also in R packages? > > > > #create data > > set.seed(1) > > data_original <- data.frame(year = rep(1990:1999, length = 50), > > category = sample(1:5, size = 50, replace = TRUE), sales > > sample(0:99999, size = 50 , replace = TRUE) ) > > dim(data_original) > > > > #remove rows where data_original$year == 1990 & data_original$category > > == 5, to ensure there is at least one NA in the "grid" > > data_original <- data_original[ (data_original$year == 1990 & > > data_original$category == 5) == FALSE, ] > > dim(data_original) > > > > #aggregate data > > data_aggregate_sum_by_year_and_category <- aggregate(x > > data_original$sales, by = list(year = data_original$year, category > > data_original$category), FUN = sum) > > colnames(data_aggregate_sum_by_year_and_category) <- c('year', > > 'category', 'sum_of_sales') > > dim(data_aggregate_sum_by_year_and_category) > > > > data_expanded <- expand.grid(year > > unique(data_aggregate_sum_by_year_and_category$year), category > > unique(data_aggregate_sum_by_year_and_category$category)) > > dim(data_expanded) > > data_expanded <- merge(data_expanded, > > data_aggregate_sum_by_year_and_category, all = TRUE) > > dim(data_expanded) > > > > mat <- matrix(data = data_expanded$sum_of_sales, nrow > > length(unique(data_expanded$year)), ncol > > length(unique(data_expanded$category)) , byrow = TRUE, dimnames > > list( unique(data_expanded$year), unique(data_expanded$category) ) ) > > > > > > data_original > > data_expanded > > mat > > > > On Fri, Oct 21, 2022 at 5:03 PM Kelly Thompson <kt1572757 at gmail.com> > wrote: > > > > > > ### > > > #I have data presented in a "vertical" data frame as shown below in > > > data_original. > > > #I want this data in a matrix or "grid", as shown below. > > > #What I show below seems like one way this can be done. > > > > > > #My question: Are there easier or better ways to do this, especially > > > in Base R, and also in R packages? > > > > > > #reproducible example > > > > > > data_original <- data.frame(year = c('1990', '1999', '1990', '1989'), > > > size = c('s', 'l', 'xl', 'xs'), n = c(99, 33, 3, 4) ) > > > > > > data_expanded <- expand.grid(unique(data_original$year), > > > unique(data_original$size), stringsAsFactors = FALSE ) > > > colnames(data_expanded) <- c('year', 'size') > > > data_expanded <- merge(data_expanded, data_original, all = TRUE) > > > > > > mat <- matrix(data = data_expanded $n, nrow > > > length(unique(data_expanded $year)), ncol > > > length(unique(data_expanded $size)) , byrow = TRUE, dimnames = list( > > > unique(data_expanded$year), unique(data_expanded$size) ) ) > > > > > > data_original > > > data_expanded > > > mat > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]