Kelly Thompson
2022-Oct-21 22:32 UTC
[R] getting data from a "vertical" table into a "2-dimensional" grid
As my end result, I want a matrix or data frame, with one row for each year, and one column for each category. On Fri, Oct 21, 2022 at 6:23 PM Kelly Thompson <kt1572757 at gmail.com> wrote:> > # I think this might be a better example. > > # I have data presented in a "vertical" dataframe as shown below in > data_original. > # I want this data in a matrix or "grid", as shown below. > # What I show below seems like one way this can be done. > > # My question: Are there easier or better ways to do this, especially > in Base R, and also in R packages? > > #create data > set.seed(1) > data_original <- data.frame(year = rep(1990:1999, length = 50), > category = sample(1:5, size = 50, replace = TRUE), sales > sample(0:99999, size = 50 , replace = TRUE) ) > dim(data_original) > > #remove rows where data_original$year == 1990 & data_original$category > == 5, to ensure there is at least one NA in the "grid" > data_original <- data_original[ (data_original$year == 1990 & > data_original$category == 5) == FALSE, ] > dim(data_original) > > #aggregate data > data_aggregate_sum_by_year_and_category <- aggregate(x > data_original$sales, by = list(year = data_original$year, category > data_original$category), FUN = sum) > colnames(data_aggregate_sum_by_year_and_category) <- c('year', > 'category', 'sum_of_sales') > dim(data_aggregate_sum_by_year_and_category) > > data_expanded <- expand.grid(year > unique(data_aggregate_sum_by_year_and_category$year), category > unique(data_aggregate_sum_by_year_and_category$category)) > dim(data_expanded) > data_expanded <- merge(data_expanded, > data_aggregate_sum_by_year_and_category, all = TRUE) > dim(data_expanded) > > mat <- matrix(data = data_expanded$sum_of_sales, nrow > length(unique(data_expanded$year)), ncol > length(unique(data_expanded$category)) , byrow = TRUE, dimnames > list( unique(data_expanded$year), unique(data_expanded$category) ) ) > > > data_original > data_expanded > mat > > On Fri, Oct 21, 2022 at 5:03 PM Kelly Thompson <kt1572757 at gmail.com> wrote: > > > > ### > > #I have data presented in a "vertical" data frame as shown below in > > data_original. > > #I want this data in a matrix or "grid", as shown below. > > #What I show below seems like one way this can be done. > > > > #My question: Are there easier or better ways to do this, especially > > in Base R, and also in R packages? > > > > #reproducible example > > > > data_original <- data.frame(year = c('1990', '1999', '1990', '1989'), > > size = c('s', 'l', 'xl', 'xs'), n = c(99, 33, 3, 4) ) > > > > data_expanded <- expand.grid(unique(data_original$year), > > unique(data_original$size), stringsAsFactors = FALSE ) > > colnames(data_expanded) <- c('year', 'size') > > data_expanded <- merge(data_expanded, data_original, all = TRUE) > > > > mat <- matrix(data = data_expanded $n, nrow > > length(unique(data_expanded $year)), ncol > > length(unique(data_expanded $size)) , byrow = TRUE, dimnames = list( > > unique(data_expanded$year), unique(data_expanded$size) ) ) > > > > data_original > > data_expanded > > mat
Bert Gunter
2022-Oct-21 23:20 UTC
[R] getting data from a "vertical" table into a "2-dimensional" grid
"As my end result, I want a matrix or data frame, with one row for each
year, and one column for each category."
If I understand you correctly, no reshaping gymnastics are needed --
just use ?tapply:
set.seed(1)
do <- data.frame(year = rep(1990:1999, length = 50),
category = sample(1:5, size = 50, replace = TRUE),
sales = sample(0:99999, size = 50 , replace = TRUE) )
with(do, tapply(sales, list(year, category),sum))
## which gives the matrix:
1 2 3 4 5
1990 13283 NA 55083 87522 64877
1991 NA 80963 NA 30100 28277
1992 9391 202916 NA 55090 NA
1993 29696 167344 NA NA 17625
1994 98015 99521 NA 70536 52252
1995 157003 NA 26875 NA 11366
1996 32986 88683 6562 79475 95282
1997 13601 NA 134757 12398 NA
1998 30537 51117 31333 20204 NA
1999 39240 87845 62479 NA 98804
If this is not what you wanted, you may need to explain further or
await a response from someone more insightful than I.
Cheers,
Bert
On Fri, Oct 21, 2022 at 3:34 PM Kelly Thompson <kt1572757 at gmail.com>
wrote:>
> As my end result, I want a matrix or data frame, with one row for each
> year, and one column for each category.
>
> On Fri, Oct 21, 2022 at 6:23 PM Kelly Thompson <kt1572757 at
gmail.com> wrote:
> >
> > # I think this might be a better example.
> >
> > # I have data presented in a "vertical" dataframe as shown
below in
> > data_original.
> > # I want this data in a matrix or "grid", as shown below.
> > # What I show below seems like one way this can be done.
> >
> > # My question: Are there easier or better ways to do this, especially
> > in Base R, and also in R packages?
> >
> > #create data
> > set.seed(1)
> > data_original <- data.frame(year = rep(1990:1999, length = 50),
> > category = sample(1:5, size = 50, replace = TRUE), sales > >
sample(0:99999, size = 50 , replace = TRUE) )
> > dim(data_original)
> >
> > #remove rows where data_original$year == 1990 &
data_original$category
> > == 5, to ensure there is at least one NA in the "grid"
> > data_original <- data_original[ (data_original$year == 1990 &
> > data_original$category == 5) == FALSE, ]
> > dim(data_original)
> >
> > #aggregate data
> > data_aggregate_sum_by_year_and_category <- aggregate(x > >
data_original$sales, by = list(year = data_original$year, category > >
data_original$category), FUN = sum)
> > colnames(data_aggregate_sum_by_year_and_category) <-
c('year',
> > 'category', 'sum_of_sales')
> > dim(data_aggregate_sum_by_year_and_category)
> >
> > data_expanded <- expand.grid(year > >
unique(data_aggregate_sum_by_year_and_category$year), category > >
unique(data_aggregate_sum_by_year_and_category$category))
> > dim(data_expanded)
> > data_expanded <- merge(data_expanded,
> > data_aggregate_sum_by_year_and_category, all = TRUE)
> > dim(data_expanded)
> >
> > mat <- matrix(data = data_expanded$sum_of_sales, nrow > >
length(unique(data_expanded$year)), ncol > >
length(unique(data_expanded$category)) , byrow = TRUE, dimnames > > list(
unique(data_expanded$year), unique(data_expanded$category) ) )
> >
> >
> > data_original
> > data_expanded
> > mat
> >
> > On Fri, Oct 21, 2022 at 5:03 PM Kelly Thompson <kt1572757 at
gmail.com> wrote:
> > >
> > > ###
> > > #I have data presented in a "vertical" data frame as
shown below in
> > > data_original.
> > > #I want this data in a matrix or "grid", as shown
below.
> > > #What I show below seems like one way this can be done.
> > >
> > > #My question: Are there easier or better ways to do this,
especially
> > > in Base R, and also in R packages?
> > >
> > > #reproducible example
> > >
> > > data_original <- data.frame(year = c('1990',
'1999', '1990', '1989'),
> > > size = c('s', 'l', 'xl', 'xs'),
n = c(99, 33, 3, 4) )
> > >
> > > data_expanded <- expand.grid(unique(data_original$year),
> > > unique(data_original$size), stringsAsFactors = FALSE )
> > > colnames(data_expanded) <- c('year', 'size')
> > > data_expanded <- merge(data_expanded, data_original, all =
TRUE)
> > >
> > > mat <- matrix(data = data_expanded $n, nrow > > >
length(unique(data_expanded $year)), ncol > > >
length(unique(data_expanded $size)) , byrow = TRUE, dimnames = list(
> > > unique(data_expanded$year), unique(data_expanded$size) ) )
> > >
> > > data_original
> > > data_expanded
> > > mat
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
CALUM POLWART
2022-Oct-22 06:27 UTC
[R] getting data from a "vertical" table into a "2-dimensional" grid
You asked for base R but also said or using other methods. So for
completeness here is a solution using Tidyverse
library(tidyverse)
data_original <- data.frame(
year = c('1990', '1999', '1990', '1989'),
size = c('s', 'l', 'xl', 'xs'), n = c(99, 33,
3, 4) )
data_original |>
pivot_wider(
id_cols = year,
names_from = size,
values_from = n
)
# A tibble: 3 x 5
year s l xl xs
<chr> <dbl> <dbl> <dbl> <dbl>
1 1990 99 NA 3 NA
2 1999 NA 33 NA NA
3 1989 NA NA NA 4
If you really want a matrix add
|>
as.matrix()
To the end.
I saw you asked if you can use column position and the answer is the same
here you can replace size with 2. BUT beware this is prone to errors if
data gets changes in order etc.
This obviously needs to use tidyverse which some people hate. But it does
make the code much more readable to understand what you are doing...
On Fri, 21 Oct 2022, 23:34 Kelly Thompson, <kt1572757 at gmail.com> wrote:
> As my end result, I want a matrix or data frame, with one row for each
> year, and one column for each category.
>
> On Fri, Oct 21, 2022 at 6:23 PM Kelly Thompson <kt1572757 at
gmail.com>
> wrote:
> >
> > # I think this might be a better example.
> >
> > # I have data presented in a "vertical" dataframe as shown
below in
> > data_original.
> > # I want this data in a matrix or "grid", as shown below.
> > # What I show below seems like one way this can be done.
> >
> > # My question: Are there easier or better ways to do this, especially
> > in Base R, and also in R packages?
> >
> > #create data
> > set.seed(1)
> > data_original <- data.frame(year = rep(1990:1999, length = 50),
> > category = sample(1:5, size = 50, replace = TRUE), sales > >
sample(0:99999, size = 50 , replace = TRUE) )
> > dim(data_original)
> >
> > #remove rows where data_original$year == 1990 &
data_original$category
> > == 5, to ensure there is at least one NA in the "grid"
> > data_original <- data_original[ (data_original$year == 1990 &
> > data_original$category == 5) == FALSE, ]
> > dim(data_original)
> >
> > #aggregate data
> > data_aggregate_sum_by_year_and_category <- aggregate(x > >
data_original$sales, by = list(year = data_original$year, category > >
data_original$category), FUN = sum)
> > colnames(data_aggregate_sum_by_year_and_category) <-
c('year',
> > 'category', 'sum_of_sales')
> > dim(data_aggregate_sum_by_year_and_category)
> >
> > data_expanded <- expand.grid(year > >
unique(data_aggregate_sum_by_year_and_category$year), category > >
unique(data_aggregate_sum_by_year_and_category$category))
> > dim(data_expanded)
> > data_expanded <- merge(data_expanded,
> > data_aggregate_sum_by_year_and_category, all = TRUE)
> > dim(data_expanded)
> >
> > mat <- matrix(data = data_expanded$sum_of_sales, nrow > >
length(unique(data_expanded$year)), ncol > >
length(unique(data_expanded$category)) , byrow = TRUE, dimnames > > list(
unique(data_expanded$year), unique(data_expanded$category) ) )
> >
> >
> > data_original
> > data_expanded
> > mat
> >
> > On Fri, Oct 21, 2022 at 5:03 PM Kelly Thompson <kt1572757 at
gmail.com>
> wrote:
> > >
> > > ###
> > > #I have data presented in a "vertical" data frame as
shown below in
> > > data_original.
> > > #I want this data in a matrix or "grid", as shown
below.
> > > #What I show below seems like one way this can be done.
> > >
> > > #My question: Are there easier or better ways to do this,
especially
> > > in Base R, and also in R packages?
> > >
> > > #reproducible example
> > >
> > > data_original <- data.frame(year = c('1990',
'1999', '1990', '1989'),
> > > size = c('s', 'l', 'xl', 'xs'),
n = c(99, 33, 3, 4) )
> > >
> > > data_expanded <- expand.grid(unique(data_original$year),
> > > unique(data_original$size), stringsAsFactors = FALSE )
> > > colnames(data_expanded) <- c('year', 'size')
> > > data_expanded <- merge(data_expanded, data_original, all =
TRUE)
> > >
> > > mat <- matrix(data = data_expanded $n, nrow > > >
length(unique(data_expanded $year)), ncol > > >
length(unique(data_expanded $size)) , byrow = TRUE, dimnames = list(
> > > unique(data_expanded$year), unique(data_expanded$size) ) )
> > >
> > > data_original
> > > data_expanded
> > > mat
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]