thr3ads.net - R help - [R] Help with data management [Feb 2017]


David L Carlson

2017-Feb-24 15:40 UTC

[R] Help with data management

You can also combine the data frames into a single one and use xtabs:

ID <- names(mylist)
mylist <- Map(data.frame, mylist, dfn=ID)
mydf <- do.call(rbind, mylist)
mydf$Family <- factor(mydf$Family, levels=sort(unique(as.character(mydf$Family)))) # as.character(): Family may be character (R >= 4.0)
xtabs(Hits~Family+dfn, mydf)
#       dfn
# Family  A  B  C
#      a  0  3  0
#      c  1  1  0
#      d  2  0  0
#      e  3  0  0
#      f  0  4  5
#      o  0  0  4
#      q  0  0 10
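A self-contained sketch of this approach, rebuilding `mylist` from the original question so it can be run as-is. One caveat, assuming a current R: since R 4.0.0, `data.frame()` defaults to `stringsAsFactors = FALSE`, so `Family` is a character column, `levels(mydf$Family)` returns `NULL`, and a `factor()` call with those levels turns every Family into `NA`. Taking the levels from `sort(unique(as.character(...)))` works whether `Family` is a factor or a character vector.

```r
# Rebuild the example list from the original question
A <- data.frame(Family = c("c", "d", "e"), NormalizedCount = 4.4:6.8, Hits = c(1, 2, 3))
B <- data.frame(Family = c("c", "f", "a"), NormalizedCount = c(3.2, 6.4, 4.4), Hits = c(1, 4, 3))
C <- data.frame(Family = c("q", "o", "f"), NormalizedCount = c(7.2, 9.4, 41.4), Hits = c(10, 4, 5))
mylist <- list(A = A, B = B, C = C)

# Add a dfn column holding each data frame's list name, then stack them
ID <- names(mylist)
mylist <- Map(data.frame, mylist, dfn = ID)
mydf <- do.call(rbind, mylist)

# Works whether Family is a factor (R < 4.0) or character (R >= 4.0)
mydf$Family <- factor(mydf$Family,
                      levels = sort(unique(as.character(mydf$Family))))

# Sum Hits for each Family/dfn pair; pairs that never occur become 0
z <- xtabs(Hits ~ Family + dfn, mydf)
print(z)
```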


-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352




-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Jim Lemon
Sent: Thursday, February 23, 2017 6:00 PM
To: André Luis Neves <andrluis at ualberta.ca>; r-help mailing list
<r-help at r-project.org>
Subject: Re: [R] Help with data management

Hi Andre,
As far as I am aware, merges can only be accomplished between two data
frames, so I think you would have to do it one by one. It is probably
possible to program this to operate on your list of data frames, but I
suspect that it would take as much time as a bit of copying and
pasting. If your data is being extracted from an external database, it
may be possible to perform the operation in SQL; I don't have the time
to work that out at the moment.
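For the 48-data-frame case, one way to program the repeated two-way merge is base R's `Reduce()`, which folds `merge()` over the list so that A is merged with B, the result with C, and so on. A minimal sketch, assuming the list is named and every data frame in it has `Family` and `Hits` columns; renaming each `Hits` column after its list element is an illustrative choice, not part of the original code.

```r
# Example data from the original question
A <- data.frame(Family = c("c", "d", "e"), NormalizedCount = 4.4:6.8, Hits = c(1, 2, 3))
B <- data.frame(Family = c("c", "f", "a"), NormalizedCount = c(3.2, 6.4, 4.4), Hits = c(1, 4, 3))
C <- data.frame(Family = c("q", "o", "f"), NormalizedCount = c(7.2, 9.4, 41.4), Hits = c(10, 4, 5))
mylist <- list(A = A, B = B, C = C)  # the same code handles 48 elements

keepcols <- c("Family", "Hits")

# Keep the key and value columns, renaming Hits after the list element
pieces <- Map(function(df, nm) {
  out <- df[, keepcols]
  names(out)[2] <- nm
  out
}, mylist, names(mylist))

# Reduce() folds the two-way merge() over the list: merge(merge(A, B), C), ...
D <- Reduce(function(x, y) merge(x, y, by = "Family", all = TRUE), pieces)

# Families absent from a data frame come back as NA; set those counts to 0
D[-1][is.na(D[-1])] <- 0
print(D)
```

Because the merge is expressed once as a function, nothing in the code changes as the number of data frames grows.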

Jim


On Fri, Feb 24, 2017 at 10:53 AM, André Luis Neves <andrluis at ualberta.ca> wrote:
> Hi, Jim:
>
> Your code worked great, but I have 48 data frames. After merging A and B into
> D, you merged C into D. In this case, do I need to add them one by one until
> all 48 data frames are merged into one?
>
> Thank you for your great help.
>
> Andre
>
> On Thu, Feb 23, 2017 at 4:24 PM, Jim Lemon <drjimlemon at gmail.com> wrote:
>>
>> Hi Andre,
>> This might do it:
>>
>> A<-data.frame(c("c", "d", "e"),4.4:6.8,c(1,2,3))
>> colnames(A) <- c ("Family", "NormalizedCount", "Hits")
>> B<-data.frame(c("c", "f", "a"),c(3.2,6.4, 4.4), c(1,4,3))
>> colnames(B) <- c ("Family", "NormalizedCount", "Hits")
>> C<-data.frame(c("q", "o", "f"),c(7.2,9.4, 41.4), c(10,4,5))
>> colnames(C) <- c ("Family", "NormalizedCount", "Hits")
>> keepcols<-c("Family","Hits")
>> D<-merge(A[,keepcols],B[,keepcols],by="Family",all=TRUE)
>> D<-merge(D,C[,keepcols],by="Family",all=TRUE)
>> D[,2:4]<-sapply(D[,-1],function(x) { x[is.na(x)]<-0; x })
>> names(D)<-c("Family","A","B","C")
>>
>> Jim
>>
>>
>> On Fri, Feb 24, 2017 at 9:37 AM, André Luis Neves <andrluis at ualberta.ca> wrote:
>> > Dear R users,
>> >
>> > I have the following dataframes (A, B, and C) stored in a list:
>> >
>> > A= data.frame(c("c", "d", "e"),4.4:6.8,c(1,2,3))
>> > colnames(A) <- c ("Family", "NormalizedCount", "Hits")
>> > A
>> >
>> > B= data.frame(c("c", "f", "a"),c(3.2,6.4, 4.4), c(1,4,3))
>> > colnames(B) <- c ("Family", "NormalizedCount", "Hits")
>> > B
>> >
>> > C= data.frame(c("q", "o", "f"),c(7.2,9.4, 41.4), c(10,4,5))
>> > colnames(C) <- c ("Family", "NormalizedCount", "Hits")
>> > C
>> >
>> > mylist <- list(A=A,B=B,C=C)
>> > mylist
>> >
>> >
>> > My idea is to merge the three data frames into another data frame (let's
>> > name it 'D') in which the rows are the Families and the columns are the
>> > "Hits" of each family detected in data frames A, B, and C. If a given
>> > 'Family' does NOT have a 'Hit' in a data frame, it should be assigned 0.
>> >
>> > The dataframe 'D' would need to be populated as follows:
>> >
>> >
>> > Family  A  B  C
>> > c       1  1  0
>> > d       2  0  0
>> > e       3  0  0
>> > f       0  4  5
>> > a       0  3  0
>> > q       0  0 10
>> > o       0  0  4
>> >
>> >
>> > Thank you very much for your great help,
>> >
>> >
>> >
>> > --
>> > Andre
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
> --
> Andre

André Luis Neves

2017-Feb-24 16:13 UTC


[R] Help with data management

Hi, David:

Thank you so much for your answer.

I just added some commands and got what I wanted.

The final command would be something like this:


A= data.frame(c("c", "d", "e"),4.4:6.8,c(1,2,3))
colnames(A) <- c ("Family", "NormalizedCount", "Hits")
A
B= data.frame(c("c", "f", "a"),c(3.2,6.4, 4.4), c(1,4,3))
colnames(B) <- c ("Family", "NormalizedCount", "Hits")
B
C= data.frame(c("q", "o", "f"),c(7.2,9.4, 41.4), c(10,4,5))
colnames(C) <- c ("Family", "NormalizedCount", "Hits")
C
mylist <- list(A=A,B=B,C=C)
mylist
ID <- names(mylist)
mylist <- Map(data.frame, mylist, dfn=ID)
mydf <- do.call(rbind, mylist)
mydf$Family <- factor(mydf$Family, levels=sort(unique(as.character(mydf$Family)))) # as.character(): Family may be character (R >= 4.0)
z <- xtabs(Hits~Family+dfn, mydf)
x <- as.data.frame(z)
x
library(reshape2)
y <- dcast(x, Family ~ dfn, value.var = "Freq")
y
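Why the `dcast()` step is needed: `as.data.frame()` applied to a table returns long format, one row per Family/dfn cell with the count in a column named `Freq`, while `as.data.frame.matrix()` keeps the wide layout of the table. A small sketch on a made-up toy data set (the `toy` values are for illustration only):

```r
# A made-up toy data set in the same Hits ~ Family + dfn shape
toy <- data.frame(Family = c("c", "d", "c"),
                  dfn    = c("A", "A", "B"),
                  Hits   = c(1, 2, 1))
z <- xtabs(Hits ~ Family + dfn, toy)

# Long format: one row per Family/dfn cell, counts in a column named Freq
long <- as.data.frame(z)
print(long)

# Wide format: rows are Families, columns are dfn values, as in the table
wide <- as.data.frame.matrix(z)
print(wide)
```

The long form is what `dcast(x, Family ~ dfn, value.var = "Freq")` spreads back out; converting with `as.data.frame.matrix()` skips that round trip.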


Thank you very much.

Andre




-- 
Andre


David L Carlson

2017-Feb-24 18:00 UTC


[R] Help with data management

You can also get there without reshape2:

z <- xtabs(Hits~Family+dfn, mydf)
x <- as.data.frame.matrix(z) # Convert the table without changing the format
y <- data.frame(Family=dimnames(z)$Family, x) # Add Family column
rownames(y) <- NULL # Optional: replaces the family-name row names with 1, 2, ...
str(y)
# 'data.frame':   7 obs. of  4 variables:
#  $ Family: Factor w/ 7 levels "a","c","d","e",..: 1 2 3 4 5 6 7
#  $ A     : num  0 1 2 3 0 0 0
#  $ B     : num  3 1 0 0 4 0 0
#  $ C     : num  0 0 0 0 5 4 10
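The same steps can be wrapped in a function so that any number of data frames (for example all 48) are handled in one call. A sketch under stated assumptions: `hits_table()` is a hypothetical name, the input must be a named list whose data frames each have `Family` and `Hits` columns, and the output columns follow the order of the list names.

```r
# Hypothetical helper: named list of data frames (each with Family and Hits)
# -> one wide data frame with a row per Family and a column per list element
hits_table <- function(dfs) {
  # Keep only the needed columns and tag each row with its list name
  pieces <- Map(function(df, nm) {
    data.frame(Family = as.character(df$Family), Hits = df$Hits, dfn = nm,
               stringsAsFactors = FALSE)
  }, dfs, names(dfs))
  stacked <- do.call(rbind, pieces)

  # Alphabetical Family rows; dfn columns in the order of the input list
  stacked$Family <- factor(stacked$Family, levels = sort(unique(stacked$Family)))
  stacked$dfn <- factor(stacked$dfn, levels = names(dfs))

  # Sum Hits per cell; Family/dfn combinations that never occur become 0
  z <- xtabs(Hits ~ Family + dfn, stacked)

  # Keep the wide layout and add Family as an ordinary column
  out <- data.frame(Family = rownames(z), as.data.frame.matrix(z),
                    check.names = FALSE)
  rownames(out) <- NULL
  out
}

# Example data from the original question
A <- data.frame(Family = c("c", "d", "e"), NormalizedCount = 4.4:6.8, Hits = c(1, 2, 3))
B <- data.frame(Family = c("c", "f", "a"), NormalizedCount = c(3.2, 6.4, 4.4), Hits = c(1, 4, 3))
C <- data.frame(Family = c("q", "o", "f"), NormalizedCount = c(7.2, 9.4, 41.4), Hits = c(10, 4, 5))

y <- hits_table(list(A = A, B = B, C = C))
print(y)
```

Passing the full list of data frames instead of `list(A = A, B = B, C = C)` should work the same way; `check.names = FALSE` keeps list names that are not syntactic R names (for example ones containing spaces) unchanged as column names.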

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



