thr3ads.net - R help - [R] reshaping column items into rows per unique ID [Feb 2018]

If this information is useful, please help other people find it:
Share via:

Allaisone 1

2018-Feb-25 12:59 UTC

[R] reshaping column items into rows per unique ID

Hi All

I have a datafram which looks like this :

CustomerID    DietType
1                           a
1                            c
1                            b
2                            f
2                             a
3                             j
4                             c
4                             c
4                              f

And I would like to reshape this so I can see the list of DietTypes per customer
in rows instead of columns like this :
> MyDfCustomerID      DietType   DietType  DietType
1                                a            c               b
2                                 f             a
3                                 j
4                                 c              c             f

I tried many times using melt(),spread (),and dcast () functions but was not
able to produce the desired table. The best attempt was by typing :

# 1) Adding new column with unique values:
MyDf $newcol <- c (1:9)
#2) then :
NewDf <- dcast (MyDf,CustomerID~newcol,value.var=DietType)

This produces the desired table but with many NA values like this :

CustomerID    1   2   3    4     5    6     7   8   9
1                    a  c    b   NA NA NA NA NA NA
2                  NA NA NA  f     a  NA NA NA NA
3                  NA NA NA NA NA  j   NA NA NA
4                  NA NA NA NA NA NA c     c     f

  As you see, the lette/s indicating DietType move to the right side each time
we move down leaving many NA values and as my original files is very large, I
expect that the final output would contain around 800,000 columns and 70,000
rows. This is why my code works with small data but does not work with my large
file because of memory issue even though I'm using large PC.

What changes I need to do with my code to produce the desired table where the
list of DietTypes are grouped in rows exactly like the second table shown
abover?

Regards
Allaisnoe

	[[alternative HTML version deleted]]

Bert Gunter

2018-Feb-25 17:08 UTC

head link

[R] reshaping column items into rows per unique ID

I believe you need to spend time with an R tutorial or two: a data frame
(presumably the "table" data structure you describe) can *not* contain
"blanks" -- all columns must be the same length, which means NA's
are
filled in as needed.

Also, 8e^5 * 7e^4 = 5.6e^10, which almost certainly will not fit into any
local version of R (maybe it would in some server version -- others more
knowledgeable should comment on this).

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Sun, Feb 25, 2018 at 4:59 AM, Allaisone 1 <allaisone1 at hotmail.com>
wrote:
> Hi All
>
> I have a datafram which looks like this :
>
> CustomerID    DietType
> 1                           a
> 1                            c
> 1                            b
> 2                            f
> 2                             a
> 3                             j
> 4                             c
> 4                             c
> 4                              f
>
> And I would like to reshape this so I can see the list of DietTypes per
> customer in rows instead of columns like this :
>
> > MyDf
> CustomerID      DietType   DietType  DietType
> 1                                a            c               b
> 2                                 f             a
> 3                                 j
> 4                                 c              c             f
>
> I tried many times using melt(),spread (),and dcast () functions but was
> not able to produce the desired table. The best attempt was by typing :
>
> # 1) Adding new column with unique values:
> MyDf $newcol <- c (1:9)
> #2) then :
> NewDf <- dcast (MyDf,CustomerID~newcol,value.var=DietType)
>
> This produces the desired table but with many NA values like this :
>
> CustomerID    1   2   3    4     5    6     7   8   9
> 1                    a  c    b   NA NA NA NA NA NA
> 2                  NA NA NA  f     a  NA NA NA NA
> 3                  NA NA NA NA NA  j   NA NA NA
> 4                  NA NA NA NA NA NA c     c     f
>
>   As you see, the lette/s indicating DietType move to the right side each
> time we move down leaving many NA values and as my original files is very
> large, I expect that the final output would contain around 800,000 columns
> and 70,000 rows. This is why my code works with small data but does not
> work with my large file because of memory issue even though I'm using
large
> PC.
>
> What changes I need to do with my code to produce the desired table where
> the list of DietTypes are grouped in rows exactly like the second table
> shown abover?
>
> Regards
> Allaisnoe
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
	[[alternative HTML version deleted]]

Eric Berger

2018-Feb-25 17:56 UTC

head link

[R] reshaping column items into rows per unique ID

Hi Allaisone,
I took a slightly different approach but you might find this either as or
more useful than your approach, or at least a start on the path to a
solution you need.

df1   <-
data.frame(CustId=c(1,1,1,2,3,3,4,4,4),DietType=c("a","c","b","f","a","j","c","c","f"),
                    stringsAsFactors=FALSE)
custs <- unique(df1$CustId)
dtype <- unique(df1$DietType)
nc    <- length(custs)
nd    <- length(dtype)
df2   <- as.data.frame( matrix(rep(0,nc*(nd+1)),nrow=nc),
stringsAsFactors=FALSE)
colnames(df2) <- c("CustId",dtype[order(dtype)])
df2$CustId <- custs[ order(custs) ]

for ( i in 1:nrow(df1) ) {
  iRow <- match(df1$CustId[i],df2$CustId)
  iCol <- match(df1$DietType[i],colnames(df2))
  df2[ iRow, iCol ] <- df2[ iRow, iCol] + 1
}
> df2#       CustId   a  b  c  f   j
# 1             1  1  1  1  0  0
# 2              2  0 0  0  0  0
# 3              3  1 0  0  0  1
# 4              4  0 0  2  1  0

The dataframe df2 will have a column for the CustId and one column for each
unique diet type.
Each row is a unique customerId, and each entry contains the number of
times the given diet type occurred for that customer.

I hope that helps,
Eric



On Sun, Feb 25, 2018 at 7:08 PM, Bert Gunter <bgunter.4567 at gmail.com>
wrote:
> I believe you need to spend time with an R tutorial or two: a data frame
> (presumably the "table" data structure you describe) can *not*
contain
> "blanks" -- all columns must be the same length, which means
NA's are
> filled in as needed.
>
> Also, 8e^5 * 7e^4 = 5.6e^10, which almost certainly will not fit into any
> local version of R (maybe it would in some server version -- others more
> knowledgeable should comment on this).
>
> Cheers,
> Bert
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip
)
>
> On Sun, Feb 25, 2018 at 4:59 AM, Allaisone 1 <allaisone1 at
hotmail.com>
> wrote:
>
> > Hi All
> >
> > I have a datafram which looks like this :
> >
> > CustomerID    DietType
> > 1                           a
> > 1                            c
> > 1                            b
> > 2                            f
> > 2                             a
> > 3                             j
> > 4                             c
> > 4                             c
> > 4                              f
> >
> > And I would like to reshape this so I can see the list of DietTypes
per
> > customer in rows instead of columns like this :
> >
> > > MyDf
> > CustomerID      DietType   DietType  DietType
> > 1                                a            c               b
> > 2                                 f             a
> > 3                                 j
> > 4                                 c              c             f
> >
> > I tried many times using melt(),spread (),and dcast () functions but
was
> > not able to produce the desired table. The best attempt was by typing
:
> >
> > # 1) Adding new column with unique values:
> > MyDf $newcol <- c (1:9)
> > #2) then :
> > NewDf <- dcast (MyDf,CustomerID~newcol,value.var=DietType)
> >
> > This produces the desired table but with many NA values like this :
> >
> > CustomerID    1   2   3    4     5    6     7   8   9
> > 1                    a  c    b   NA NA NA NA NA NA
> > 2                  NA NA NA  f     a  NA NA NA NA
> > 3                  NA NA NA NA NA  j   NA NA NA
> > 4                  NA NA NA NA NA NA c     c     f
> >
> >   As you see, the lette/s indicating DietType move to the right side
each
> > time we move down leaving many NA values and as my original files is
very
> > large, I expect that the final output would contain around 800,000
> columns
> > and 70,000 rows. This is why my code works with small data but does
not
> > work with my large file because of memory issue even though I'm
using
> large
> > PC.
> >
> > What changes I need to do with my code to produce the desired table
where
> > the list of DietTypes are grouped in rows exactly like the second
table
> > shown abover?
> >
> > Regards
> > Allaisnoe
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/
> > posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
	[[alternative HTML version deleted]]

Jim Lemon

2018-Feb-25 21:48 UTC

head link

[R] reshaping column items into rows per unique ID

Hi Allaisone,
If you want a data frame as the output you will have to put up with a
few NA values unless each Customer has the same number of diet types:

a1df<-read.table(text="CustomerID    DietType
1                           a
1                            c
1                            b
2                            f
2                             a
3                             j
4                             c
4                             c
4                              f",
header=TRUE,stringsAsFactors=FALSE)
library(prettyR)
stretch_df(a1df,"CustomerID","DietType")
  CustomerID DietType_1 DietType_2 DietType_3
1          1          a          c          b
2          2          f          a       <NA>
3          3          j       <NA>       <NA>
4          4          c          c          f

Jim


On Sun, Feb 25, 2018 at 11:59 PM, Allaisone 1 <allaisone1 at hotmail.com>
wrote:> Hi All
>
> I have a datafram which looks like this :
>
> CustomerID    DietType
> 1                           a
> 1                            c
> 1                            b
> 2                            f
> 2                             a
> 3                             j
> 4                             c
> 4                             c
> 4                              f
>
> And I would like to reshape this so I can see the list of DietTypes per
customer in rows instead of columns like this :
>
>> MyDf
> CustomerID      DietType   DietType  DietType
> 1                                a            c               b
> 2                                 f             a
> 3                                 j
> 4                                 c              c             f
>
> I tried many times using melt(),spread (),and dcast () functions but was
not able to produce the desired table. The best attempt was by typing :
>
> # 1) Adding new column with unique values:
> MyDf $newcol <- c (1:9)
> #2) then :
> NewDf <- dcast (MyDf,CustomerID~newcol,value.var=DietType)
>
> This produces the desired table but with many NA values like this :
>
> CustomerID    1   2   3    4     5    6     7   8   9
> 1                    a  c    b   NA NA NA NA NA NA
> 2                  NA NA NA  f     a  NA NA NA NA
> 3                  NA NA NA NA NA  j   NA NA NA
> 4                  NA NA NA NA NA NA c     c     f
>
>   As you see, the lette/s indicating DietType move to the right side each
time we move down leaving many NA values and as my original files is very large,
I expect that the final output would contain around 800,000 columns and 70,000
rows. This is why my code works with small data but does not work with my large
file because of memory issue even though I'm using large PC.
>
> What changes I need to do with my code to produce the desired table where
the list of DietTypes are grouped in rows exactly like the second table shown
abover?
>
> Regards
> Allaisnoe
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Allaisone 1

2018-Mar-03 00:14 UTC

head link

[R] reshaping column items into rows per unique ID

Many thanks all for your very helpful replays.

I wasn't aware by stretch_df function in PrettyR package!.. It has produced
exactly what I want in a very simple and efficient way!. Thank you so much Jim.

Kind Regards
Allaisone

________________________________________
From: Jim Lemon <drjimlemon at gmail.com>
Sent: 25 February 2018 21:48:51
To: Allaisone 1; r-help mailing list
Subject: Re: [R] reshaping column items into rows per unique ID

Hi Allaisone,
If you want a data frame as the output you will have to put up with a
few NA values unless each Customer has the same number of diet types:

a1df<-read.table(text="CustomerID    DietType
1                           a
1                            c
1                            b
2                            f
2                             a
3                             j
4                             c
4                             c
4                              f",
header=TRUE,stringsAsFactors=FALSE)
library(prettyR)
stretch_df(a1df,"CustomerID","DietType")
  CustomerID DietType_1 DietType_2 DietType_3
1          1          a          c          b
2          2          f          a       <NA>
3          3          j       <NA>       <NA>
4          4          c          c          f

Jim


On Sun, Feb 25, 2018 at 11:59 PM, Allaisone 1 <allaisone1 at hotmail.com>
wrote:> Hi All
>
> I have a datafram which looks like this :
>
> CustomerID    DietType
> 1                           a
> 1                            c
> 1                            b
> 2                            f
> 2                             a
> 3                             j
> 4                             c
> 4                             c
> 4                              f
>
> And I would like to reshape this so I can see the list of DietTypes per
customer in rows instead of columns like this :
>
>> MyDf
> CustomerID      DietType   DietType  DietType
> 1                                a            c               b
> 2                                 f             a
> 3                                 j
> 4                                 c              c             f
>
> I tried many times using melt(),spread (),and dcast () functions but was
not able to produce the desired table. The best attempt was by typing :
>
> # 1) Adding new column with unique values:
> MyDf $newcol <- c (1:9)
> #2) then :
> NewDf <- dcast (MyDf,CustomerID~newcol,value.var=DietType)
>
> This produces the desired table but with many NA values like this :
>
> CustomerID    1   2   3    4     5    6     7   8   9
> 1                    a  c    b   NA NA NA NA NA NA
> 2                  NA NA NA  f     a  NA NA NA NA
> 3                  NA NA NA NA NA  j   NA NA NA
> 4                  NA NA NA NA NA NA c     c     f
>
>   As you see, the lette/s indicating DietType move to the right side each
time we move down leaving many NA values and as my original files is very large,
I expect that the final output would contain around 800,000 columns and 70,000
rows. This is why my code works with small data but does not work with my large
file because of memory issue even though I'm using large PC.
>
> What changes I need to do with my code to produce the desired table where
the list of DietTypes are grouped in rows exactly like the second table shown
abover?
>
> Regards
> Allaisnoe
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Apparently Analagous Threads

Search for more maybe matching threads

R help - Feb 2018 - reshaping column items into rows per unique ID

[R] reshaping column items into rows per unique ID

[R] reshaping column items into rows per unique ID

[R] reshaping column items into rows per unique ID

[R] reshaping column items into rows per unique ID

[R] reshaping column items into rows per unique ID

Apparently Analagous Threads