thr3ads.net - R help - [R] how to merge 5 data frames by one column [Dec 2019]


Ana Marija

2019-Dec-03 20:09 UTC

[R] how to merge 5 data frames by one column

I can perhaps do this:

m=Reduce(function(x, y) merge(x, y, all=TRUE), list(s11, s22, s33,s44,s55))

but then, in the output, I get this for one SNP (just as an example):
> head(m)
         rs            V1.1        V3.1     V4.1 V1.2 V3.2 V4.2            V1.3
6 rs1029829 ENSG00000154803 1.02519e-11 0.469402 <NA>   NA   NA ENSG00000141030
         V3.3     V4.3 V1.4 V3.4 V4.4 V1.5 V3.5 V4.5
6 3.06126e-28 0.726948 <NA>   NA   NA <NA>   NA   NA
...

How can I filter this output (m) to remove all rows that have NA in any
of these columns: V1.1, V1.2, V1.3, V1.4, V1.5?
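A minimal sketch of why the NA rows appear, using three toy data frames (the gene labels and values below are invented for illustration and are not from the thread). With all=TRUE, merge() performs an outer join and keeps every rs found in any input; with the default all=FALSE it keeps only the rs present in every input, so there is nothing to filter afterwards:

```r
# Toy stand-ins for s11, s22, s33 (hypothetical values, same column layout)
s11 <- data.frame(V1.1 = "geneA", rs = c("rs1", "rs2", "rs3"),
                  V3.1 = c(1e-5, 2e-5, 3e-5), V4.1 = c(-0.5, 0.3, 0.2),
                  stringsAsFactors = FALSE)
s22 <- data.frame(V1.2 = "geneB", rs = c("rs2", "rs3", "rs4"),
                  V3.2 = c(1e-7, 2e-7, 3e-7), V4.2 = c(-0.6, 0.1, 0.4),
                  stringsAsFactors = FALSE)
s33 <- data.frame(V1.3 = "geneC", rs = c("rs3", "rs5"),
                  V3.3 = c(1e-9, 2e-9), V4.3 = c(0.7, -0.2),
                  stringsAsFactors = FALSE)
frames <- list(s11, s22, s33)

# Outer join: every rs from any frame, padded with NA where a frame lacks it
outer <- Reduce(function(x, y) merge(x, y, by = "rs", all = TRUE), frames)

# Inner join: only the rs shared by all frames, so no NA padding is created
inner <- Reduce(function(x, y) merge(x, y, by = "rs"), frames)

print(outer)
print(inner)
```

Naming by = "rs" explicitly is a precaution; merge() would otherwise join on whatever column names the inputs happen to share.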





On Tue, Dec 3, 2019 at 1:48 PM Ana Marija <sokovic.anamarija at gmail.com>
wrote:
> the desired output would look like this (example given for just two genes;
> it should include all 5 genes from all 5 data frames):
>
> where the example assumes that only 5 rs are shared between those two genes;
> the values after each rs# are from the V4 column for each gene
>
> GENES ENSG00000001629 ENSG00000127914
> rs1208998 -0.0337989326337439  -0.00106024397995199
> rs4729008 0.0630831868839983  0.00890783698397027
> rs11772754 0.181375539335959  0.0012636115921931
> rs10257459 0.0369962603988132  0.00509887844657462
> rs17164876 0.0307882763321834  -0.00188979524322732
>
> On Tue, Dec 3, 2019 at 1:40 PM Ana Marija <sokovic.anamarija at gmail.com>
> wrote:
>
>> Hello,
>>
>> I have 5 dataframes (s11,s22,s33,s44,s55) that look like this:
>>
>> > head(s11)
>>              V1.1          rs        V3.1      V4.1
>> 1 ENSG00000154803  rs12940868 3.80175e-05 -0.519565
>> 2 ENSG00000154803   rs4383187 8.92772e-05 -0.367303
>> 3 ENSG00000154803   rs4404112 9.32402e-05 -0.366634
>> 4 ENSG00000154803   rs7214091 8.38003e-05  0.337576
>> 5 ENSG00000154803  rs35871790 9.67028e-05 -0.305755
>> 6 ENSG00000154803 rs112532541 1.08341e-04 -0.305493
>>
>> > head(s22)
>>                V1.2          rs        V3.2      V4.2
>> 602 ENSG00000264589  rs62065452 1.34475e-17 -0.695948
>> 603 ENSG00000264589 rs377004743 1.26272e-17 -0.695627
>> 630 ENSG00000264589   rs1724390 1.01129e-17 -0.693518
>> 643 ENSG00000264589 rs367637729 4.05726e-17 -0.682833
>> 653 ENSG00000264589 rs376183404 1.13177e-17 -0.697646
>> 673 ENSG00000264589 rs112327620 1.59840e-17 -0.707904
>>
>> Each one has a single unique value in its respective V1 column.
>>
>> I am trying to merge all 5 data frames at once by the "rs" column.
>>
>> Can you please help with this,
>> Ana
>>
>>
>>
>>
>>
	[[alternative HTML version deleted]]

Ana Marija

2019-Dec-03 20:16 UTC

Would this make sense for the previous question?
mt=na.omit(m, cols = c("V1.1","V1.2","V1.3","V1.4","V1.5"))
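A caveat worth checking before relying on that call: for a plain data.frame, base R's na.omit() has no cols argument. The extra argument is silently swallowed and every row with an NA in any column is dropped (a cols argument does exist in the data.table method, which applies only if m is a data.table). A small sketch with invented data, including a hypothetical "note" column to make the difference visible:

```r
# Toy data frame (hypothetical values); "note" is an extra column with an NA
m <- data.frame(rs   = c("rs1", "rs2", "rs3"),
                V1.1 = c("geneA", "geneA", NA),
                V1.2 = c("geneB", "geneB", "geneB"),
                note = c("x", NA, "y"),
                stringsAsFactors = FALSE)
key_cols <- c("V1.1", "V1.2")

# Base na.omit ignores cols: rs2 is dropped too, because of the NA in "note"
dropped_any <- na.omit(m, cols = key_cols)

# Restricting the NA check to the key columns keeps rs2
dropped_key <- m[rowSums(is.na(m[, key_cols])) == 0, ]

print(dropped_any)
print(dropped_key)
```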


Ana Marija

2019-Dec-03 20:27 UTC

I apologize, I need to reformulate this problem, because there are many
more unique genes I have to look up: 381.

So all genes are in one data frame:
> head(r)
               V1         V2          V3        V4
1 ENSG00000273172  rs7215271 4.33932e-17 -0.602316
2 ENSG00000273172 rs34889101 4.99518e-17 -0.596089
3 ENSG00000273172  rs4890177 4.23229e-17 -0.590085
4 ENSG00000273172  rs4890178 7.14216e-17 -0.581467
5 ENSG00000273172  rs7503363 3.16802e-17 -0.582836
6 ENSG00000273172 rs35611892 2.24399e-17 -0.583710
> tail(r)
                   V1          V2          V3        V4
18946 ENSG00000141560    rs7215271 8.53890e-17  0.572286
18947 ENSG00000141560    rs606532 9.00740e-17  0.572151
18963 ENSG00000175711 rs111566282 5.71871e-17 -0.609586
18964 ENSG00000175711  rs76319775 4.58843e-17 -0.610164
18965 ENSG00000175711  rs62074661 4.17490e-17 -0.603199
18966 ENSG00000176845  rs11433639 1.45496e-17 -0.761955

So for the above example, merging would give just this one row, because
the two genes have the same rs: rs7215271.
The output would contain all columns related to those two genes that share
rs7215271.

It is also possible that more than 2 genes share the same rs.

Can you please advise on this?
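One possible approach to the reformulated problem, sketched on invented data with the same V1..V4 layout as r (the gene labels, rs ids and values are hypothetical). It assumes each (gene, rs) pair occurs at most once: count the distinct genes per rs, keep the rs seen with more than one gene, and spread the V4 values into one column per gene, as in the desired output earlier in the thread:

```r
# Toy long-format data frame standing in for r (hypothetical values)
r <- data.frame(V1 = c("geneA", "geneA", "geneA", "geneB", "geneB",
                       "geneC", "geneC"),
                V2 = c("rs1", "rs2", "rs3", "rs1", "rs4", "rs1", "rs3"),
                V3 = c(1e-17, 2e-17, 3e-17, 4e-17, 5e-17, 6e-17, 7e-17),
                V4 = c(-0.60, -0.59, 0.57, 0.55, -0.61, -0.76, 0.40),
                stringsAsFactors = FALSE)

# Number of distinct genes observed for each rs
n_genes <- tapply(r$V1, r$V2, function(g) length(unique(g)))

# rs ids that occur with more than one gene
shared_rs <- names(n_genes)[n_genes > 1]
shared <- r[r$V2 %in% shared_rs, ]

# Wide layout: one row per shared rs, one column per gene, cells hold V4;
# a gene that lacks a given rs gets NA in that cell
wide <- tapply(shared$V4, list(rs = shared$V2, gene = shared$V1),
               function(v) v[1])

print(shared_rs)
print(wide)
```

The same steps should carry over to any number of genes, since nothing in the sketch depends on there being exactly two or five.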





David Winsemius

2019-Dec-03 20:42 UTC

On 12/3/19 12:16 PM, Ana Marija wrote:
> would this make sense for the previous:
> mt=na.omit(m, cols = c("V1.1","V1.2","V1.3","V1.4","V1.5"))
>
> On Tue, Dec 3, 2019 at 2:09 PM Ana Marija <sokovic.anamarija at gmail.com>
> wrote:
>
>> I can perhaps do this:
>>
>> m=Reduce(function(x, y) merge(x, y, all=TRUE), list(s11, s22, s33,s44,s55))
>>
>> but than in the output of this one SNP (just for example)
>>
>> > head(m)
>>          rs            V1.1        V3.1     V4.1 V1.2 V3.2 V4.2            V1.3
>> 6 rs1029829 ENSG00000154803 1.02519e-11 0.469402 <NA>   NA   NA ENSG00000141030
>>          V3.3     V4.3 V1.4 V3.4 V4.4 V1.5 V3.5 V4.5
>> 6 3.06126e-28 0.726948 <NA>   NA   NA <NA>   NA   NA

It's a very simple matter when using gmail to adhere to the Posting
Guide policy of plaintext submission to rhelp. Failing to adhere to that
rule is making your successive postings less and less readable.
>> ...
>>
>> but how to filter out this output (m) in order to remove all rows where I
>> have NA in any of these columns: V1.1,V1.2,V1.3,V1.4,V1.5
The complete.cases function returns a logical vector suitable for 
selecting a subset.
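A short sketch of that suggestion on invented data (the values are hypothetical, and only two of the five V1.x columns are shown to keep it small): complete.cases() is applied to just the columns of interest, and the resulting logical vector indexes the rows of m.

```r
# Toy merged data frame (hypothetical values) with NA padding from an outer join
m <- data.frame(rs   = c("rs1", "rs2", "rs3"),
                V1.1 = c("geneA", NA, "geneA"),
                V4.1 = c(0.1, NA, 0.3),
                V1.2 = c("geneB", "geneB", NA),
                V4.2 = c(0.2, 0.5, NA),
                stringsAsFactors = FALSE)
key_cols <- c("V1.1", "V1.2")

# TRUE for rows with no NA in any of the key columns
keep <- complete.cases(m[, key_cols])

# Logical subsetting keeps only those rows
mt <- m[keep, ]
print(mt)
```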


-- 

David.
