thr3ads.net - R help - [R] is there is a way to extract lines in between 3 files that are in common based on one column? [Jun 2020]

If this information is useful, please help other people find it:
Share via:

Ana Marija

2020-Jun-02 01:50 UTC

[R] is there is a way to extract lines in between 3 files that are in common based on one column?

Hi David,

that is a great point!
Yes indeed some are non unique:
> dim(neu1)
[1] 3742845       9> length(unique(neu1$Marker))
[1] 3741858> length(unique(nep1$Marker))
[1] 3745560> dim(nep1)
[1] 3746550       9> length(unique(ret1$Marker))
[1] 3743494> dim(ret1)[1] 3743494       9

How would I rewrite this code so that is merging by Chr and Marker
column? It seems that a Marker can be under a few Chr.





On Mon, Jun 1, 2020 at 8:41 PM David Winsemius <dwinsemius at comcast.net>
wrote:>
>
> On 6/1/20 5:40 PM, Ana Marija wrote:
> > Hi Jim,
> >
> > thank you so much for getting back to me. I tried your code and this
is
> > what I get:
> >> dim(neu2)
> > [1] 3740988       9
> >> dim(nep2)
> > [1] 3740988       9
> >> dim(ret2)
> > [1] 3740001       9
> >
> > I think I would need to have the same number of lines in all 3 data
frames.
> >
> > Can you please advise.
>
>
> You should check for duplicated Marker values.
>
>
> --
>
> David
>
> >
> > Cheers
> > Ana
> >
> > On Mon, Jun 1, 2020 at 7:31 PM Jim Lemon <drjimlemon at
gmail.com> wrote:
> >
> >> Hi Ana,
> >> Not too hard, but your example has all the "marker"
fields in common.
> >> So using a sample that will show the expected result:
> >>
> >> neu1<-read.table(text="Chr BP Marker  MAF A1 A2 Direction 
pValue N
> >>   1 100000012 1:100000012:G:T 0.229925  T  G  + 0.650403 1594
> >>   1 100000827 1:100000827:C:T 0.287014  T  C  + 0.955449 1594
> >>   1 100002713 1:100002713:C:T 0.097867  T  C  - 0.290455 1594
> >>   1 100002882 1:100002882:T:G 0.287014  G  T  + 0.955449 1594
> >>   1 100002991 1:100002991:G:A 0.097867  A  G  - 0.290455 1594
> >>   1 100004726 1:100004726:G:A 0.132058  A  G  + 0.115005
1594",
> >>   header=TRUE,stringsAsFactors=FALSE)
> >>
> >> nep1<-read.table(text="Chr BP Marker MAF A1 A2 Direction  
pValue N
> >>   1 100000012 1:100000012:G:T 0.2300430 T  G - 0.1420030 1641
> >>   1 100000827 1:100000827:C:T 0.2867150 T  C - 0.2045580 1641
> >>   1 100002713 1:100002713:C:T 0.0975015 T  C - 0.0555507 1641
> >>   1 100002882 1:100002882:T:G 0.2867150 G  T - 0.2045580 1641
> >>   1 100002991 1:100002991:G:A 0.0975015 A  G - 0.0555507 1641
> >>   1 100004726 1:100004727:G:A 0.1325410 A  G - 0.8725660
1641",
> >>   header=TRUE,stringsAsFactors=FALSE)
> >>
> >> ret1<-read.table(text="Chr BP Marker MAF A1 A2 Direction  
pValue N
> >>   1 100000012 1:100000012:G:T 0.2322760 T  G - 0.230383 1608
> >>   1 100000827 1:100000827:C:T 0.2882460 T  C - 0.120356 1608
> >>   1 100002713 1:100002713:C:T 0.0982587 T  C - 0.272936 1608
> >>   1 100002882 1:100002882:T:G 0.2882460 G  T - 0.120356 1608
> >>   1 100002991 1:100002992:G:A 0.0982587 A  G - 0.272936 1608
> >>   1 100004726 1:100004727:G:A 0.1340170 A  G - 0.594538
1608",
> >> header=TRUE,stringsAsFactors=FALSE)
> >>
> >> # merge the three data frames on "Marker"
> >> nn1<-merge(neu1,nep1,by="Marker")
> >> nn2<-merge(nn1,ret1,by="Marker")
> >> # get the common "Marker" strings
> >> Marker3<-nn2$Marker
> >> # subset all three data frames on Marker3
> >> neu2<-neu1[neu1$Marker %in% Marker3,]
> >> nep2<-nep1[nep1$Marker %in% Marker3,]
> >> ret2<-ret1[ret1$Marker %in% Marker3,]
> >>
> >> Jim
> >>
> >> On Tue, Jun 2, 2020 at 7:50 AM Ana Marija <sokovic.anamarija at
gmail.com>
> >> wrote:
> >>> Hello,
> >>>
> >>> I have 3 data frames which have about 3.4 mill lines (but they
don't have
> >>> exactly the same number of lines)...they look like this:
> >>> ...
> >>> Is there is a way to create another 3 data frames, say neu2,
nep2, ret2
> >>> which would only contain lines that have the same entries in
Marker
> >> column
> >>> for all 3 data frames?
> >>>
> >>> Thanks
> >>> Ana
> >>>
> >>>          [[alternative HTML version deleted]]
> >>>
> >>> ______________________________________________
> >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible
code.
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

Ana Marija

2020-Jun-02 01:54 UTC

head link

[R] is there is a way to extract lines in between 3 files that are in common based on one column?

Hi Jim
> neu3<-neu1[!(neu1$Marker %in% Marker3),]
> dim(neu3)
[1] 1857    9> nep3<-nep1[!(nep1$Marker %in% Marker3),]
> dim(nep3)
[1] 5562    9> ret3<-ret1[!(ret1$Marker %in% Marker3),]
> dim(ret3)[1] 3493    9


If I do:

 nn1<-merge(neu1,nep1,by=c("Marker","Chr"))
nn2<-merge(nn1,ret1,by=c("Marker","Chr"))> Marker3<-nn2$Marker
> length(Marker3)
[1] 3742962> Marker4<-nn1$Marker
> length(Marker4)[1] 3744443

On Mon, Jun 1, 2020 at 8:50 PM Ana Marija <sokovic.anamarija at gmail.com>
wrote:>
> Hi David,
>
> that is a great point!
> Yes indeed some are non unique:
>
> > dim(neu1)
> [1] 3742845       9
> > length(unique(neu1$Marker))
> [1] 3741858
> > length(unique(nep1$Marker))
> [1] 3745560
> > dim(nep1)
> [1] 3746550       9
> > length(unique(ret1$Marker))
> [1] 3743494
> > dim(ret1)
> [1] 3743494       9
>
> How would I rewrite this code so that is merging by Chr and Marker
> column? It seems that a Marker can be under a few Chr.
>
>
>
>
>
> On Mon, Jun 1, 2020 at 8:41 PM David Winsemius <dwinsemius at
comcast.net> wrote:
> >
> >
> > On 6/1/20 5:40 PM, Ana Marija wrote:
> > > Hi Jim,
> > >
> > > thank you so much for getting back to me. I tried your code and
this is
> > > what I get:
> > >> dim(neu2)
> > > [1] 3740988       9
> > >> dim(nep2)
> > > [1] 3740988       9
> > >> dim(ret2)
> > > [1] 3740001       9
> > >
> > > I think I would need to have the same number of lines in all 3
data frames.
> > >
> > > Can you please advise.
> >
> >
> > You should check for duplicated Marker values.
> >
> >
> > --
> >
> > David
> >
> > >
> > > Cheers
> > > Ana
> > >
> > > On Mon, Jun 1, 2020 at 7:31 PM Jim Lemon <drjimlemon at
gmail.com> wrote:
> > >
> > >> Hi Ana,
> > >> Not too hard, but your example has all the "marker"
fields in common.
> > >> So using a sample that will show the expected result:
> > >>
> > >> neu1<-read.table(text="Chr BP Marker  MAF A1 A2
Direction  pValue N
> > >>   1 100000012 1:100000012:G:T 0.229925  T  G  + 0.650403 1594
> > >>   1 100000827 1:100000827:C:T 0.287014  T  C  + 0.955449 1594
> > >>   1 100002713 1:100002713:C:T 0.097867  T  C  - 0.290455 1594
> > >>   1 100002882 1:100002882:T:G 0.287014  G  T  + 0.955449 1594
> > >>   1 100002991 1:100002991:G:A 0.097867  A  G  - 0.290455 1594
> > >>   1 100004726 1:100004726:G:A 0.132058  A  G  + 0.115005
1594",
> > >>   header=TRUE,stringsAsFactors=FALSE)
> > >>
> > >> nep1<-read.table(text="Chr BP Marker MAF A1 A2
Direction    pValue N
> > >>   1 100000012 1:100000012:G:T 0.2300430 T  G - 0.1420030 1641
> > >>   1 100000827 1:100000827:C:T 0.2867150 T  C - 0.2045580 1641
> > >>   1 100002713 1:100002713:C:T 0.0975015 T  C - 0.0555507 1641
> > >>   1 100002882 1:100002882:T:G 0.2867150 G  T - 0.2045580 1641
> > >>   1 100002991 1:100002991:G:A 0.0975015 A  G - 0.0555507 1641
> > >>   1 100004726 1:100004727:G:A 0.1325410 A  G - 0.8725660
1641",
> > >>   header=TRUE,stringsAsFactors=FALSE)
> > >>
> > >> ret1<-read.table(text="Chr BP Marker MAF A1 A2
Direction   pValue N
> > >>   1 100000012 1:100000012:G:T 0.2322760 T  G - 0.230383 1608
> > >>   1 100000827 1:100000827:C:T 0.2882460 T  C - 0.120356 1608
> > >>   1 100002713 1:100002713:C:T 0.0982587 T  C - 0.272936 1608
> > >>   1 100002882 1:100002882:T:G 0.2882460 G  T - 0.120356 1608
> > >>   1 100002991 1:100002992:G:A 0.0982587 A  G - 0.272936 1608
> > >>   1 100004726 1:100004727:G:A 0.1340170 A  G - 0.594538
1608",
> > >> header=TRUE,stringsAsFactors=FALSE)
> > >>
> > >> # merge the three data frames on "Marker"
> > >> nn1<-merge(neu1,nep1,by="Marker")
> > >> nn2<-merge(nn1,ret1,by="Marker")
> > >> # get the common "Marker" strings
> > >> Marker3<-nn2$Marker
> > >> # subset all three data frames on Marker3
> > >> neu2<-neu1[neu1$Marker %in% Marker3,]
> > >> nep2<-nep1[nep1$Marker %in% Marker3,]
> > >> ret2<-ret1[ret1$Marker %in% Marker3,]
> > >>
> > >> Jim
> > >>
> > >> On Tue, Jun 2, 2020 at 7:50 AM Ana Marija
<sokovic.anamarija at gmail.com>
> > >> wrote:
> > >>> Hello,
> > >>>
> > >>> I have 3 data frames which have about 3.4 mill lines (but
they don't have
> > >>> exactly the same number of lines)...they look like this:
> > >>> ...
> > >>> Is there is a way to create another 3 data frames, say
neu2, nep2, ret2
> > >>> which would only contain lines that have the same entries
in Marker
> > >> column
> > >>> for all 3 data frames?
> > >>>
> > >>> Thanks
> > >>> Ana
> > >>>
> > >>>          [[alternative HTML version deleted]]
> > >>>
> > >>> ______________________________________________
> > >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE
and more, see
> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>> PLEASE do read the posting guide
> > >> http://www.R-project.org/posting-guide.html
> > >>> and provide commented, minimal, self-contained,
reproducible code.
> > >       [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible
code.

Jim Lemon

2020-Jun-02 03:04 UTC

head link

[R] is there is a way to extract lines in between 3 files that are in common based on one column?

So recombination sticks out its foot before us. Do you want to account
for gene linkage?

JIm

On Tue, Jun 2, 2020 at 11:55 AM Ana Marija <sokovic.anamarija at
gmail.com> wrote:>
> Hi Jim
>
> > neu3<-neu1[!(neu1$Marker %in% Marker3),]
> > dim(neu3)
> [1] 1857    9
> > nep3<-nep1[!(nep1$Marker %in% Marker3),]
> > dim(nep3)
> [1] 5562    9
> > ret3<-ret1[!(ret1$Marker %in% Marker3),]
> > dim(ret3)
> [1] 3493    9
>
>
> If I do:
>
>  nn1<-merge(neu1,nep1,by=c("Marker","Chr"))
> nn2<-merge(nn1,ret1,by=c("Marker","Chr"))
> > Marker3<-nn2$Marker
> > length(Marker3)
> [1] 3742962
> > Marker4<-nn1$Marker
> > length(Marker4)
> [1] 3744443
>
> On Mon, Jun 1, 2020 at 8:50 PM Ana Marija <sokovic.anamarija at
gmail.com> wrote:
> >
> > Hi David,
> >
> > that is a great point!
> > Yes indeed some are non unique:
> >
> > > dim(neu1)
> > [1] 3742845       9
> > > length(unique(neu1$Marker))
> > [1] 3741858
> > > length(unique(nep1$Marker))
> > [1] 3745560
> > > dim(nep1)
> > [1] 3746550       9
> > > length(unique(ret1$Marker))
> > [1] 3743494
> > > dim(ret1)
> > [1] 3743494       9
> >
> > How would I rewrite this code so that is merging by Chr and Marker
> > column? It seems that a Marker can be under a few Chr.
> >
> >
> >
> >
> >
> > On Mon, Jun 1, 2020 at 8:41 PM David Winsemius <dwinsemius at
comcast.net> wrote:
> > >
> > >
> > > On 6/1/20 5:40 PM, Ana Marija wrote:
> > > > Hi Jim,
> > > >
> > > > thank you so much for getting back to me. I tried your code
and this is
> > > > what I get:
> > > >> dim(neu2)
> > > > [1] 3740988       9
> > > >> dim(nep2)
> > > > [1] 3740988       9
> > > >> dim(ret2)
> > > > [1] 3740001       9
> > > >
> > > > I think I would need to have the same number of lines in all
3 data frames.
> > > >
> > > > Can you please advise.
> > >
> > >
> > > You should check for duplicated Marker values.
> > >
> > >
> > > --
> > >
> > > David
> > >
> > > >
> > > > Cheers
> > > > Ana
> > > >
> > > > On Mon, Jun 1, 2020 at 7:31 PM Jim Lemon <drjimlemon at
gmail.com> wrote:
> > > >
> > > >> Hi Ana,
> > > >> Not too hard, but your example has all the
"marker" fields in common.
> > > >> So using a sample that will show the expected result:
> > > >>
> > > >> neu1<-read.table(text="Chr BP Marker  MAF A1 A2
Direction  pValue N
> > > >>   1 100000012 1:100000012:G:T 0.229925  T  G  + 0.650403
1594
> > > >>   1 100000827 1:100000827:C:T 0.287014  T  C  + 0.955449
1594
> > > >>   1 100002713 1:100002713:C:T 0.097867  T  C  - 0.290455
1594
> > > >>   1 100002882 1:100002882:T:G 0.287014  G  T  + 0.955449
1594
> > > >>   1 100002991 1:100002991:G:A 0.097867  A  G  - 0.290455
1594
> > > >>   1 100004726 1:100004726:G:A 0.132058  A  G  + 0.115005
1594",
> > > >>   header=TRUE,stringsAsFactors=FALSE)
> > > >>
> > > >> nep1<-read.table(text="Chr BP Marker MAF A1 A2
Direction    pValue N
> > > >>   1 100000012 1:100000012:G:T 0.2300430 T  G - 0.1420030
1641
> > > >>   1 100000827 1:100000827:C:T 0.2867150 T  C - 0.2045580
1641
> > > >>   1 100002713 1:100002713:C:T 0.0975015 T  C - 0.0555507
1641
> > > >>   1 100002882 1:100002882:T:G 0.2867150 G  T - 0.2045580
1641
> > > >>   1 100002991 1:100002991:G:A 0.0975015 A  G - 0.0555507
1641
> > > >>   1 100004726 1:100004727:G:A 0.1325410 A  G - 0.8725660
1641",
> > > >>   header=TRUE,stringsAsFactors=FALSE)
> > > >>
> > > >> ret1<-read.table(text="Chr BP Marker MAF A1 A2
Direction   pValue N
> > > >>   1 100000012 1:100000012:G:T 0.2322760 T  G - 0.230383
1608
> > > >>   1 100000827 1:100000827:C:T 0.2882460 T  C - 0.120356
1608
> > > >>   1 100002713 1:100002713:C:T 0.0982587 T  C - 0.272936
1608
> > > >>   1 100002882 1:100002882:T:G 0.2882460 G  T - 0.120356
1608
> > > >>   1 100002991 1:100002992:G:A 0.0982587 A  G - 0.272936
1608
> > > >>   1 100004726 1:100004727:G:A 0.1340170 A  G - 0.594538
1608",
> > > >> header=TRUE,stringsAsFactors=FALSE)
> > > >>
> > > >> # merge the three data frames on "Marker"
> > > >> nn1<-merge(neu1,nep1,by="Marker")
> > > >> nn2<-merge(nn1,ret1,by="Marker")
> > > >> # get the common "Marker" strings
> > > >> Marker3<-nn2$Marker
> > > >> # subset all three data frames on Marker3
> > > >> neu2<-neu1[neu1$Marker %in% Marker3,]
> > > >> nep2<-nep1[nep1$Marker %in% Marker3,]
> > > >> ret2<-ret1[ret1$Marker %in% Marker3,]
> > > >>
> > > >> Jim
> > > >>
> > > >> On Tue, Jun 2, 2020 at 7:50 AM Ana Marija
<sokovic.anamarija at gmail.com>
> > > >> wrote:
> > > >>> Hello,
> > > >>>
> > > >>> I have 3 data frames which have about 3.4 mill lines
(but they don't have
> > > >>> exactly the same number of lines)...they look like
this:
> > > >>> ...
> > > >>> Is there is a way to create another 3 data frames,
say neu2, nep2, ret2
> > > >>> which would only contain lines that have the same
entries in Marker
> > > >> column
> > > >>> for all 3 data frames?
> > > >>>
> > > >>> Thanks
> > > >>> Ana
> > > >>>
> > > >>>          [[alternative HTML version deleted]]
> > > >>>
> > > >>> ______________________________________________
> > > >>> R-help at r-project.org mailing list -- To
UNSUBSCRIBE and more, see
> > > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > > >>> PLEASE do read the posting guide
> > > >> http://www.R-project.org/posting-guide.html
> > > >>> and provide commented, minimal, self-contained,
reproducible code.
> > > >       [[alternative HTML version deleted]]
> > > >
> > > > ______________________________________________
> > > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible
code.

R help - Jun 2020 - is there is a way to extract lines in between 3 files that are in common based on one column?

[R] is there is a way to extract lines in between 3 files that are in common based on one column?

[R] is there is a way to extract lines in between 3 files that are in common based on one column?

[R] is there is a way to extract lines in between 3 files that are in common based on one column?