Hi All I have a datafram which looks like this : CustomerID DietType 1 a 1 c 1 b 2 f 2 a 3 j 4 c 4 c 4 f And I would like to reshape this so I can see the list of DietTypes per customer in rows instead of columns like this :> MyDfCustomerID DietType DietType DietType 1 a c b 2 f a 3 j 4 c c f I tried many times using melt(),spread (),and dcast () functions but was not able to produce the desired table. The best attempt was by typing : # 1) Adding new column with unique values: MyDf $newcol <- c (1:9) #2) then : NewDf <- dcast (MyDf,CustomerID~newcol,value.var=DietType) This produces the desired table but with many NA values like this : CustomerID 1 2 3 4 5 6 7 8 9 1 a c b NA NA NA NA NA NA 2 NA NA NA f a NA NA NA NA 3 NA NA NA NA NA j NA NA NA 4 NA NA NA NA NA NA c c f As you see, the lette/s indicating DietType move to the right side each time we move down leaving many NA values and as my original files is very large, I expect that the final output would contain around 800,000 columns and 70,000 rows. This is why my code works with small data but does not work with my large file because of memory issue even though I'm using large PC. What changes I need to do with my code to produce the desired table where the list of DietTypes are grouped in rows exactly like the second table shown abover? Regards Allaisnoe [[alternative HTML version deleted]]
I believe you need to spend time with an R tutorial or two: a data frame (presumably the "table" data structure you describe) can *not* contain "blanks" -- all columns must be the same length, which means NA's are filled in as needed. Also, 8e^5 * 7e^4 = 5.6e^10, which almost certainly will not fit into any local version of R (maybe it would in some server version -- others more knowledgeable should comment on this). Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Sun, Feb 25, 2018 at 4:59 AM, Allaisone 1 <allaisone1 at hotmail.com> wrote:> Hi All > > I have a datafram which looks like this : > > CustomerID DietType > 1 a > 1 c > 1 b > 2 f > 2 a > 3 j > 4 c > 4 c > 4 f > > And I would like to reshape this so I can see the list of DietTypes per > customer in rows instead of columns like this : > > > MyDf > CustomerID DietType DietType DietType > 1 a c b > 2 f a > 3 j > 4 c c f > > I tried many times using melt(),spread (),and dcast () functions but was > not able to produce the desired table. The best attempt was by typing : > > # 1) Adding new column with unique values: > MyDf $newcol <- c (1:9) > #2) then : > NewDf <- dcast (MyDf,CustomerID~newcol,value.var=DietType) > > This produces the desired table but with many NA values like this : > > CustomerID 1 2 3 4 5 6 7 8 9 > 1 a c b NA NA NA NA NA NA > 2 NA NA NA f a NA NA NA NA > 3 NA NA NA NA NA j NA NA NA > 4 NA NA NA NA NA NA c c f > > As you see, the lette/s indicating DietType move to the right side each > time we move down leaving many NA values and as my original files is very > large, I expect that the final output would contain around 800,000 columns > and 70,000 rows. This is why my code works with small data but does not > work with my large file because of memory issue even though I'm using large > PC. > > What changes I need to do with my code to produce the desired table where > the list of DietTypes are grouped in rows exactly like the second table > shown abover? > > Regards > Allaisnoe > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/ > posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Hi Allaisone, I took a slightly different approach but you might find this either as or more useful than your approach, or at least a start on the path to a solution you need. df1 <- data.frame(CustId=c(1,1,1,2,3,3,4,4,4),DietType=c("a","c","b","f","a","j","c","c","f"), stringsAsFactors=FALSE) custs <- unique(df1$CustId) dtype <- unique(df1$DietType) nc <- length(custs) nd <- length(dtype) df2 <- as.data.frame( matrix(rep(0,nc*(nd+1)),nrow=nc), stringsAsFactors=FALSE) colnames(df2) <- c("CustId",dtype[order(dtype)]) df2$CustId <- custs[ order(custs) ] for ( i in 1:nrow(df1) ) { iRow <- match(df1$CustId[i],df2$CustId) iCol <- match(df1$DietType[i],colnames(df2)) df2[ iRow, iCol ] <- df2[ iRow, iCol] + 1 }> df2# CustId a b c f j # 1 1 1 1 1 0 0 # 2 2 0 0 0 0 0 # 3 3 1 0 0 0 1 # 4 4 0 0 2 1 0 The dataframe df2 will have a column for the CustId and one column for each unique diet type. Each row is a unique customerId, and each entry contains the number of times the given diet type occurred for that customer. I hope that helps, Eric On Sun, Feb 25, 2018 at 7:08 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:> I believe you need to spend time with an R tutorial or two: a data frame > (presumably the "table" data structure you describe) can *not* contain > "blanks" -- all columns must be the same length, which means NA's are > filled in as needed. > > Also, 8e^5 * 7e^4 = 5.6e^10, which almost certainly will not fit into any > local version of R (maybe it would in some server version -- others more > knowledgeable should comment on this). > > Cheers, > Bert > > > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along and > sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > On Sun, Feb 25, 2018 at 4:59 AM, Allaisone 1 <allaisone1 at hotmail.com> > wrote: > > > Hi All > > > > I have a datafram which looks like this : > > > > CustomerID DietType > > 1 a > > 1 c > > 1 b > > 2 f > > 2 a > > 3 j > > 4 c > > 4 c > > 4 f > > > > And I would like to reshape this so I can see the list of DietTypes per > > customer in rows instead of columns like this : > > > > > MyDf > > CustomerID DietType DietType DietType > > 1 a c b > > 2 f a > > 3 j > > 4 c c f > > > > I tried many times using melt(),spread (),and dcast () functions but was > > not able to produce the desired table. The best attempt was by typing : > > > > # 1) Adding new column with unique values: > > MyDf $newcol <- c (1:9) > > #2) then : > > NewDf <- dcast (MyDf,CustomerID~newcol,value.var=DietType) > > > > This produces the desired table but with many NA values like this : > > > > CustomerID 1 2 3 4 5 6 7 8 9 > > 1 a c b NA NA NA NA NA NA > > 2 NA NA NA f a NA NA NA NA > > 3 NA NA NA NA NA j NA NA NA > > 4 NA NA NA NA NA NA c c f > > > > As you see, the lette/s indicating DietType move to the right side each > > time we move down leaving many NA values and as my original files is very > > large, I expect that the final output would contain around 800,000 > columns > > and 70,000 rows. This is why my code works with small data but does not > > work with my large file because of memory issue even though I'm using > large > > PC. > > > > What changes I need to do with my code to produce the desired table where > > the list of DietTypes are grouped in rows exactly like the second table > > shown abover? > > > > Regards > > Allaisnoe > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/ > > posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/ > posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Hi Allaisone, If you want a data frame as the output you will have to put up with a few NA values unless each Customer has the same number of diet types: a1df<-read.table(text="CustomerID DietType 1 a 1 c 1 b 2 f 2 a 3 j 4 c 4 c 4 f", header=TRUE,stringsAsFactors=FALSE) library(prettyR) stretch_df(a1df,"CustomerID","DietType") CustomerID DietType_1 DietType_2 DietType_3 1 1 a c b 2 2 f a <NA> 3 3 j <NA> <NA> 4 4 c c f Jim On Sun, Feb 25, 2018 at 11:59 PM, Allaisone 1 <allaisone1 at hotmail.com> wrote:> Hi All > > I have a datafram which looks like this : > > CustomerID DietType > 1 a > 1 c > 1 b > 2 f > 2 a > 3 j > 4 c > 4 c > 4 f > > And I would like to reshape this so I can see the list of DietTypes per customer in rows instead of columns like this : > >> MyDf > CustomerID DietType DietType DietType > 1 a c b > 2 f a > 3 j > 4 c c f > > I tried many times using melt(),spread (),and dcast () functions but was not able to produce the desired table. The best attempt was by typing : > > # 1) Adding new column with unique values: > MyDf $newcol <- c (1:9) > #2) then : > NewDf <- dcast (MyDf,CustomerID~newcol,value.var=DietType) > > This produces the desired table but with many NA values like this : > > CustomerID 1 2 3 4 5 6 7 8 9 > 1 a c b NA NA NA NA NA NA > 2 NA NA NA f a NA NA NA NA > 3 NA NA NA NA NA j NA NA NA > 4 NA NA NA NA NA NA c c f > > As you see, the lette/s indicating DietType move to the right side each time we move down leaving many NA values and as my original files is very large, I expect that the final output would contain around 800,000 columns and 70,000 rows. This is why my code works with small data but does not work with my large file because of memory issue even though I'm using large PC. > > What changes I need to do with my code to produce the desired table where the list of DietTypes are grouped in rows exactly like the second table shown abover? > > Regards > Allaisnoe > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Many thanks all for your very helpful replays. I wasn't aware by stretch_df function in PrettyR package!.. It has produced exactly what I want in a very simple and efficient way!. Thank you so much Jim. Kind Regards Allaisone ________________________________________ From: Jim Lemon <drjimlemon at gmail.com> Sent: 25 February 2018 21:48:51 To: Allaisone 1; r-help mailing list Subject: Re: [R] reshaping column items into rows per unique ID Hi Allaisone, If you want a data frame as the output you will have to put up with a few NA values unless each Customer has the same number of diet types: a1df<-read.table(text="CustomerID DietType 1 a 1 c 1 b 2 f 2 a 3 j 4 c 4 c 4 f", header=TRUE,stringsAsFactors=FALSE) library(prettyR) stretch_df(a1df,"CustomerID","DietType") CustomerID DietType_1 DietType_2 DietType_3 1 1 a c b 2 2 f a <NA> 3 3 j <NA> <NA> 4 4 c c f Jim On Sun, Feb 25, 2018 at 11:59 PM, Allaisone 1 <allaisone1 at hotmail.com> wrote:> Hi All > > I have a datafram which looks like this : > > CustomerID DietType > 1 a > 1 c > 1 b > 2 f > 2 a > 3 j > 4 c > 4 c > 4 f > > And I would like to reshape this so I can see the list of DietTypes per customer in rows instead of columns like this : > >> MyDf > CustomerID DietType DietType DietType > 1 a c b > 2 f a > 3 j > 4 c c f > > I tried many times using melt(),spread (),and dcast () functions but was not able to produce the desired table. The best attempt was by typing : > > # 1) Adding new column with unique values: > MyDf $newcol <- c (1:9) > #2) then : > NewDf <- dcast (MyDf,CustomerID~newcol,value.var=DietType) > > This produces the desired table but with many NA values like this : > > CustomerID 1 2 3 4 5 6 7 8 9 > 1 a c b NA NA NA NA NA NA > 2 NA NA NA f a NA NA NA NA > 3 NA NA NA NA NA j NA NA NA > 4 NA NA NA NA NA NA c c f > > As you see, the lette/s indicating DietType move to the right side each time we move down leaving many NA values and as my original files is very large, I expect that the final output would contain around 800,000 columns and 70,000 rows. This is why my code works with small data but does not work with my large file because of memory issue even though I'm using large PC. > > What changes I need to do with my code to produce the desired table where the list of DietTypes are grouped in rows exactly like the second table shown abover? > > Regards > Allaisnoe > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.