thr3ads.net - R help - [R] Regroup and create new dataframe [Jun 2018]


nguy2952 University of Minnesota

2018-Jun-01 15:54 UTC

[R] Regroup and create new dataframe

Hello folks,

I have a big project to work on and the dataset is classified so I am just
going to use my own example so everyone can understand what I am targeting.

Let's take Target as an example: We consider three brands of tape: Target
brand, 3M and Avery. The original data frame has 4 columns: Year of Record,
Product_Name(which contains three brands of tape), Sales, and Region. I
want to create a new data frame that looks like this:

                      Year of Record       Sales     Region
  Target Brand
  3M
  Avery

Here is what I did.

   1. I split the original data frame, which I called data1:

      X = split(data1, Product_name)

   2. Unlist X:

      X1 = unlist(X)

   3. Create a new data frame:

      new_df = as.data.frame(X1)


But when I used the command View(new_df), I had only two columns: the left
one is similar to TargetBrand.Sales, etc., and the right one is just "X1".

I did not achieve what I wanted.

*A potentially big question from readers:*

Why am I doing this?

*Answer:*

I want to run a multiple regression model later to see among different
regions, what the sales look like for these three brands of tape:

*Does Mid-west buy more house brand than East Coast?*

or

*Does region really affect the sales? Are Mid-West's purchases similar to
those of East Coast and West Coast?*

I need help. Please give me guidance.

Sincerely,
Hugh N


David L Carlson

2018-Jun-01 18:55 UTC


Your question raises several issues. First, we do not do homework here, so if
this is an assignment, you will not get much help. Second, you need to send your
emails as plain text, not html. Third, you need to provide a reproducible
example and send your data using dput() so that we can follow what you have
tried so far. For example, here's a data set that resembles what you have
described:

set.seed(42)
Tape <- data.frame(Year=2011:2015, Product=rep(c("Target", "3M", "Avery"),
     each=5), Sales=sample(1000:2000, 15), Region=rep(c("North", "South",
     "West"), each=5), stringsAsFactors=FALSE)
dput(Tape)
structure(list(Year = c(2011L, 2012L, 2013L, 2014L, 2015L, 2011L,
2012L, 2013L, 2014L, 2015L, 2011L, 2012L, 2013L, 2014L, 2015L
), Product = c("Target", "Target", "Target", "Target", "Target",
"3M", "3M", "3M", "3M", "3M", "Avery", "Avery", "Avery", "Avery",
"Avery"), Sales = c(1915L, 1937L, 1285L, 1828L, 1639L, 1517L,
1732L, 1133L, 1652L, 1699L, 1453L, 1711L, 1924L, 1252L, 1456L
), Region = c("North", "North", "North", "North", "North", "South",
"South", "South", "South", "South", "West", "West", "West", "West",
"West")), .Names = c("Year", "Product", "Sales", "Region"), row.names = c(NA,
-15L), class = "data.frame")

It is not clear what you want in your new data frame. This one has 5 years of
data for each tape brand and you seem to want one row for each tape brand?
Tables created in html and then sent to a plain text mailing list can be
dramatically different from the original format. It is not clear that you cannot
answer your questions from the data as presented here. Look at the results of
unlist(split(Tape, Tape$Product)). You should see that this is nowhere near what
you described.
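
To make the point about unlist() concrete, here is a minimal sketch using the Tape values from the dput() output. It shows that unlist() flattens the split list into a single named vector, which is why as.data.frame() yields only one column. The last step assumes that "one row per brand" means total sales per brand; that is a guess at the intent, not something stated in the question.

```r
# Rebuild the example data (values copied from the dput() output).
Tape <- data.frame(
  Year    = rep(2011:2015, times = 3),
  Product = rep(c("Target", "3M", "Avery"), each = 5),
  Sales   = c(1915L, 1937L, 1285L, 1828L, 1639L,
              1517L, 1732L, 1133L, 1652L, 1699L,
              1453L, 1711L, 1924L, 1252L, 1456L),
  Region  = rep(c("North", "South", "West"), each = 5),
  stringsAsFactors = FALSE
)

# split() returns a list of three data frames, one per brand.
pieces <- split(Tape, Tape$Product)

# unlist() flattens that list into one named vector; the mixed column
# types are all coerced to character along the way.
flat <- unlist(pieces)
head(flat, 3)

# A single vector can only become a single-column data frame.
ncol(as.data.frame(flat))

# One row per brand, ASSUMING total sales is the summary wanted.
by_brand <- aggregate(Sales ~ Product, data = Tape, FUN = sum)
by_brand
```

For a regression, the original long layout (one row per year and brand) is usually what lm() needs, so no regrouping may be required at all.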

----------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77843-4352

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Rui Barradas

2018-Jun-01 18:58 UTC


Hello,

I don't understand why you are splitting data1 and then unlisting the
result.

If you want to apply a modeling function to each of the subdf's, split
by product name, you can follow more or less these steps:

0. Create a dataset

set.seed(9376)    # Make the results reproducible

n <- 100
PN <- c("Target Brand", "3M", "Avery")
data1 <- data.frame(Product_name = sample(PN, n, TRUE),
                    Year_of_Record = sample(2011:2018, n, TRUE),
                    Sales = runif(n, 10, 1000),
                    Region = sample(letters[1:5], n, TRUE)
                    )

head(data1)


1. Split the dataset by product name. This gives a list of subdf's.


X <- split(data1, data1$Product_name)


2. Now lapply a modeling function to each subdf.


modelFun <- function(DF){

    lm(Sales ~ Region, data = DF)

}

model_list <- lapply(X, modelFun )
model_smry <- lapply(model_list, summary)
model_smry[[1]]
#
#Call:
#lm(formula = Sales ~ Region, data = DF)
#
#Residuals:
#    Min      1Q  Median      3Q     Max
#-487.41 -196.17    1.76  195.96  498.48
#
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)  437.300    108.147   4.044 0.000355 ***
#Regionb      437.019    167.540   2.608 0.014229 *
#Regionc      102.989    179.341   0.574 0.570217
#Regiond      105.520    152.942   0.690 0.495721
#Regione       -5.638    138.342  -0.041 0.967773
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
#Residual standard error: 286.1 on 29 degrees of freedom
#Multiple R-squared:  0.2426,    Adjusted R-squared:  0.1381
#F-statistic: 2.322 on 4 and 29 DF,  p-value: 0.08039
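
The per-brand fits look at the region effect within each brand separately. If the goal is to compare regions and brands in one analysis, as the original question suggests, one possible sketch is a single lm() on the unsplit data with both factors. The choice of an additive versus an interaction model here is an assumption for illustration, and the simulated values will differ between R versions because the default sampling algorithm changed in R 3.6.0.

```r
# Simulated data with the same structure as in step 0.
set.seed(9376)
n <- 100
PN <- c("Target Brand", "3M", "Avery")
data1 <- data.frame(Product_name = sample(PN, n, TRUE),
                    Year_of_Record = sample(2011:2018, n, TRUE),
                    Sales = runif(n, 10, 1000),
                    Region = sample(letters[1:5], n, TRUE))

# Additive model: one shift per region and one shift per brand.
fit_add <- lm(Sales ~ Region + Product_name, data = data1)

# Interaction model: lets the region effect differ from brand to brand.
fit_int <- lm(Sales ~ Region * Product_name, data = data1)

# F test of whether the interaction terms improve on the additive fit.
cmp <- anova(fit_add, fit_int)
cmp
```

A small p-value in the second row of the anova table would suggest that regional differences depend on the brand; with purely random data like this, no real effect is expected.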

Hope this helps,


Rui Barradas



David L Carlson

2018-Jun-01 19:09 UTC


Responses should be copied to r-help using Reply-All. You are still sending
HTML-formatted emails. If you are using Microsoft Outlook, click the Format Text
tab and select "Aa Plain Text". No one has asked you to reveal the data set, only
to create one with a similar structure. Is the data I sent reasonably close? What
should it look like after it is transformed?

David C

From: nguy2952 University of Minnesota <nguy2952 at umn.edu> 
Sent: Friday, June 1, 2018 1:57 PM
To: David L Carlson <dcarlson at tamu.edu>
Subject: Re: [R] Regroup and create new dataframe

Hi,
This is not an assignment for school.
This is a project at WORK.
I am not allowed to reveal the dataset.

Thanks!


David L Carlson

2018-Jun-01 19:35 UTC


No HTML! Copy the list using Reply-All.

The data frame group_PrivateLabel does not contain variables called Product_Name
or Region.
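
Because those names are not columns of the data frame, lm() looks them up in the workspace instead. The "contrasts" error itself typically appears when a factor predictor has only one value left in the rows being modeled. The str() output quoted below shows PRODUCT_SUB_LINE_DESCR with the same code in the first ten rows, which hints at this but does not prove it. The sketch below uses a small invented data frame (toy, with made-up values and a made-up third level) to show one way to check for such columns.

```r
# Hypothetical stand-in for group_PrivateLabel; all values are invented.
toy <- data.frame(
  PRODUCT_SUB_LINE_DESCR = factor(rep("PRIVATE LABEL", 6),
    levels = c("Handpieces", "PRIVATE LABEL", "Other")),  # "Other" is made up
  CUST_REGION_DESCR = factor(c("EAST", "EAST", "WEST", "WEST", "SOUTH", "SOUTH")),
  MarginDollars = c(11.72, 6.43, 14.28, 14.28, 21.45, 9.10)
)

# Count the levels actually used in each factor column.
used_levels <- sapply(Filter(is.factor, toy),
                      function(f) nlevels(droplevels(f)))
used_levels

# A factor with a single used level triggers the contrasts error.
res <- try(lm(MarginDollars ~ CUST_REGION_DESCR + PRODUCT_SUB_LINE_DESCR,
              data = toy), silent = TRUE)
inherits(res, "try-error")

# Keep only the factor predictors that vary, then fit.
ok <- names(used_levels)[used_levels >= 2]
fit <- lm(reformulate(ok, response = "MarginDollars"), data = toy)
```

On the real data, the same check would use the actual column names, for example CUST_REGION_DESCR in place of Region.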

David C

From: nguy2952 University of Minnesota <nguy2952 at umn.edu> 
Sent: Friday, June 1, 2018 2:13 PM
To: David L Carlson <dcarlson at tamu.edu>
Subject: Re: [R] Regroup and create new dataframe

Hi David,
your example is perfect!
I am still learning so please stay with me.
So, I am running a regression model:
model1 = lm(MarginDollars ~ Region + Product_Name, group_PrivateLabel)
I have an error message:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
> str(group_PrivateLabel)
'data.frame': 14802 obs. of  12 variables:
 $ ACCTG_YEAR_KEY        : int  2019 2019 2019 2019 2019 2019 2019 2019 2019 2019 ...
 $ ITEM_CATEGORY_DESCR   : Factor w/ 462 levels "ABRASIVE AND POLISHING MATERIAL",..: 145 145 145 145 145 145 145 145 145 145 ...
 $ ITEM_DESCR            : Factor w/ 12319 levels "@EASE X-RAY BADGE QTRLY SRVC",..: 8263 8263 8263 8263 8263 8263 8263 8263 8263 8264 ...
 $ PRODUCT_SUB_LINE_DESCR: Factor w/ 3 levels "Handpieces","PRIVATE LABEL",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ MAJOR_CATEGORY_DESCR  : Factor w/ 25 levels "AIR ABRASION",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ CUST_BRANCH_DESCR     : Factor w/ 60 levels "ALBUQUERQUE",..: 58 35 24 55 8 22 22 46 46 35 ...
 $ CUST_STATE_KEY        : Factor w/ 52 levels "AK","AL","AR",..: 15 49 44 16 6 6 28 6 6 49 ...
 $ CUST_REGION_DESCR     : Factor w/ 7 levels "MOUNTAIN WEST REGION",..: 2 2 5 4 6 6 6 6 6 2 ...
 $ Sales                 : num  25.9 13.5 28.5 28.5 57 ...
 $ QtySold               : int  2 1 2 2 5 2 1 3 3 1 ...
 $ MFGCOST               : num  13.2 6.6 13.2 13.2 33 13.2 6.6 19.8 19.8 6.6 ...
 $ MarginDollars         : num  11.72 6.43 14.28 14.28 21.45 ...

What can I do?
Everything seems to fit perfectly to what I learned at school.
I am just working on a real-life huge data set. The regression model should
work.

Please help.

