thr3ads.net - R help - [R] negative vector length when merging data frames [Oct 2019]

Ana Marija

2019-Oct-23 23:08 UTC

[R] negative vector length when merging data frames

Hi Jim,

I think one of the issues is that the data frames are so big:

> dim(l4)
[1] 166941635         8
> dim(asign)
[1] 107371528         5

so my example would not reproduce the error.

On Wed, Oct 23, 2019 at 6:05 PM Jim Lemon <drjimlemon at gmail.com> wrote:
>
> Hi Ana,
> When I run this example taken from your email:
>
> l4<-read.table(text="X1 X2 X3 X4  X5 variant_id pval_nominal gene_id.LCL
> chr1 13550  G  A b38 1:13550:G:A     0.375614 ENSG00000227232
> chr1 14671  G  C b38 1:14671:G:C     0.474708 ENSG00000227232
> chr1 14677  G  A b38 1:14677:G:A     0.699887 ENSG00000227232
> chr1 16841  G  T b38 1:16841:G:T     0.127895 ENSG00000227232
> chr1 16856  A  G b38 1:16856:A:G     0.627822 ENSG00000227232
> chr1 17005  A  G b38 1:17005:A:G     0.802803 ENSG00000227232",
> header=TRUE,stringsAsFactors=FALSE)
> asign<-read.table(text="gene  chr  chr_pos   pos p.val.Retina
> ENSG00000227232 chr1           1:10177:A:AC 10177     0.381708
> ENSG00000227232 chr1 rs145072688:10352:T:TA 10352     0.959523
> ENSG00000227232 chr1            1:11008:C:G 11008     0.218132
> ENSG00000227232 chr1            1:11012:C:G 11012     0.218132
> ENSG00000227232 chr1            1:13110:G:A 13110     0.998262
> ENSG00000227232 chr1  rs201725126:13116:T:G 13116     0.438572",
> header=TRUE,stringsAsFactors=FALSE)
> merge(l4, asign, by.x=c("X1", "X2"), by.y=c("chr", "pos"))
>  [1] X1           X2           X3           X4           X5
>  [6] variant_id   pval_nominal gene_id.LCL  gene         chr_pos
> [11] p.val.Retina
> <0 rows> (or 0-length row.names)
>
> It works okay, but there are no matches in the join. So I can't even
> guess what the problem is.
>
> Jim
>
> On Thu, Oct 24, 2019 at 9:33 AM Ana Marija <sokovic.anamarija at gmail.com> wrote:
> >
> > Hello,
> >
> > I have two data frames like this:
> >
> > > head(l4)
> >     X1    X2 X3 X4  X5  variant_id pval_nominal     gene_id.LCL
> > 1 chr1 13550  G  A b38 1:13550:G:A     0.375614 ENSG00000227232
> > 2 chr1 14671  G  C b38 1:14671:G:C     0.474708 ENSG00000227232
> > 3 chr1 14677  G  A b38 1:14677:G:A     0.699887 ENSG00000227232
> > 4 chr1 16841  G  T b38 1:16841:G:T     0.127895 ENSG00000227232
> > 5 chr1 16856  A  G b38 1:16856:A:G     0.627822 ENSG00000227232
> > 6 chr1 17005  A  G b38 1:17005:A:G     0.802803 ENSG00000227232
> > > head(asign)
> >               gene  chr                chr_pos   pos p.val.Retina
> > 1: ENSG00000227232 chr1           1:10177:A:AC 10177     0.381708
> > 2: ENSG00000227232 chr1 rs145072688:10352:T:TA 10352     0.959523
> > 3: ENSG00000227232 chr1            1:11008:C:G 11008     0.218132
> > 4: ENSG00000227232 chr1            1:11012:C:G 11012     0.218132
> > 5: ENSG00000227232 chr1            1:13110:G:A 13110     0.998262
> > 6: ENSG00000227232 chr1  rs201725126:13116:T:G 13116     0.438572
> > > m = merge(l4, asign, by.x=c("X1", "X2"), by.y=c("chr", "pos"))
> > Error in merge.data.frame(l4, asign, by.x = c("X1", "X2"), by.y = c("chr",  :
> >   negative length vectors are not allowed
> > > sapply(l4,class)
> >           X1           X2           X3           X4           X5   variant_id
> >  "character"  "character"  "character"  "character"  "character"  "character"
> > pval_nominal  gene_id.LCL
> >    "numeric"  "character"
> > > sapply(asign,class)
> >         gene          chr      chr_pos          pos p.val.Retina
> >  "character"  "character"  "character"  "character"  "character"
> >
> > Please advise as to why I am getting this error when merging?
> >
> > Thanks
> > Ana
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

Jim Lemon

2019-Oct-23 23:15 UTC

[R] negative vector length when merging data frames

Yes. Have you tried the bigmemory package?

Jim
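
A note on the likely cause, offered as a hedged sketch rather than a confirmed diagnosis: with tables of roughly 167 million and 107 million rows, a join on keys that repeat on both sides can produce more matched pairs than R's integer limit (`.Machine$integer.max`, 2^31 - 1), and an overflowed count is one commonly reported source of "negative length vectors are not allowed". The base R function below, `estimate_join_rows` (a hypothetical helper, not part of any package), counts the rows an inner join would produce without building it, so the figure can be checked before calling `merge`. The toy tables `x` and `y` are made-up stand-ins for `l4` and `asign`.

```r
# Count the rows an inner join would produce without materialising it.
# Counts are multiplied and summed as doubles, so the total cannot overflow.
estimate_join_rows <- function(x, y, by.x, by.y) {
  # One composite key per row; "\r" is an arbitrary separator that is
  # assumed not to occur in the key columns.
  kx <- do.call(paste, c(unname(as.list(x)[by.x]), sep = "\r"))
  ky <- do.call(paste, c(unname(as.list(y)[by.y]), sep = "\r"))
  tx <- table(kx)   # how often each key occurs in x
  ty <- table(ky)   # how often each key occurs in y
  common <- intersect(names(tx), names(ty))
  # Each shared key contributes (count in x) * (count in y) rows.
  sum(as.numeric(tx[common]) * as.numeric(ty[common]))
}

# Made-up toy data with repeated keys on both sides.
x <- data.frame(X1 = c("chr1", "chr1", "chr2"),
                X2 = c("10", "10", "20"), stringsAsFactors = FALSE)
y <- data.frame(chr = c("chr1", "chr1", "chr1", "chr2"),
                pos = c("10", "10", "11", "20"), stringsAsFactors = FALSE)

n <- estimate_join_rows(x, y, c("X1", "X2"), c("chr", "pos"))

# TRUE here would suggest the join result exceeds R's integer row limit.
too_big <- n > .Machine$integer.max
```

On the full tables the `paste` and `table` steps are themselves memory-hungry, so it may be worth running the check on one chromosome at a time.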


Ana Marija

2019-Oct-23 23:17 UTC

[R] negative vector length when merging data frames

No. Can you please send me an example of what the command would look like in my case?


Rui Barradas

2019-Oct-24 19:17 UTC

[R] negative vector length when merging data frames

Hello,

Sometimes sqldf::sqldf tends to save memory. Maybe if you try

library(sqldf)

sqldf('select l4.*, asign.gene, asign.chr_pos, asign.`p.val.Retina`
       from l4
       inner join asign
       on X1 = asign.chr and X2 = asign.pos')

Or you can filter the rows that match first, then merge the results.
Something along the lines of

# read in only the columns needed with fread, it's fast
l4join <- data.table::fread(l4_file, select = c("X1", "X2"))
ajoin <- data.table::fread(asign_file, select = c("chr", "pos"))

# create indices with the matches on both sides
i1 <- (l4join$X1 %in% ajoin$chr) & (l4join$X2 %in% ajoin$pos)
i2 <- (ajoin$chr %in% l4join$X1) & (ajoin$pos %in% l4join$X2)

rm(l4join, ajoin)   # don't need this any more, remove them

# now the real fread's
l4 <- data.table::fread(l4_file)
asign <- data.table::fread(asign_file)

# extract the relevant rows and merge
res <- l4[i1, ]
res2 <- asign[i2, setdiff(names(asign), names(l4))]
merge(res, res2, by.x = c("X1", "X2"), by.y = c("chr", "pos"))


Hope this helps,

Rui Barradas
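
The filter above tests each key column separately, so a row survives if its chromosome and its position each occur somewhere in the other table, not necessarily together. A stricter variant, sketched here in base R on made-up toy tables, pastes the two columns into one composite key and keeps only rows whose exact pair appears on both sides. The `":"` separator and the names `key_l4` and `key_as` are assumptions for illustration.

```r
# Made-up toy stand-ins for the real tables.
l4 <- data.frame(X1 = c("chr1", "chr1", "chr2"),
                 X2 = c("13550", "14671", "13550"),
                 pval_nominal = c(0.375614, 0.474708, 0.699887),
                 stringsAsFactors = FALSE)
asign <- data.frame(chr = c("chr1", "chr2"),
                    pos = c("13550", "14671"),
                    p.val.Retina = c(0.381708, 0.959523),
                    stringsAsFactors = FALSE)

# Column-by-column filter: every l4 row passes, although only one
# (chr, pos) pair really exists in both tables.
i1_loose <- (l4$X1 %in% asign$chr) & (l4$X2 %in% asign$pos)

# Composite-key filter: a row passes only if its exact pair matches.
key_l4 <- paste(l4$X1, l4$X2, sep = ":")
key_as <- paste(asign$chr, asign$pos, sep = ":")
i1 <- key_l4 %in% key_as
i2 <- key_as %in% key_l4

# Merge only the rows that can actually match.
m <- merge(l4[i1, ], asign[i2, ],
           by.x = c("X1", "X2"), by.y = c("chr", "pos"))
```

The pre-filter shrinks the inputs but does not change how many rows the join itself produces, so it helps most when many rows have no partner at all.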







Ana Marija

2019-Oct-29 02:28 UTC

[R] negative vector length when merging data frames

Hi Rui,

Thank you so much for this. I tried sqldf, but it didn't help.
Next I tried your second method and followed your steps until:

> res2 <- asign[i2, setdiff(names(asign), names(l4))]
> m=merge(res, res2, by.x = c("chr", "pos"), by.y = c("chr", "pos"))
Error in merge.data.table(res, res2, by.x = c("chr", "pos"), by.y = c("chr",  :
  Elements listed in `by.y` must be valid column names in y.
> head(res)
    chr   pos a1 a2  a3         variant_id pval_nominal           gene_id
1: chr1 54490  G  A b38 chr1_54490_G_A_b38     0.608495 ENSG00000227232.5
2: chr1 58814  G  A b38 chr1_58814_G_A_b38     0.295211 ENSG00000227232.5
3: chr1 60351  A  G b38 chr1_60351_A_G_b38     0.439788 ENSG00000227232.5
4: chr1 61920  G  A b38 chr1_61920_G_A_b38     0.319528 ENSG00000227232.5
5: chr1 63671  G  A b38 chr1_63671_G_A_b38     0.237739 ENSG00000227232.5
6: chr1 64931  G  A b38 chr1_64931_G_A_b38     0.276679 ENSG00000227232.5
> head(res2)
[1] "gene"         "chr_pos"      "p.val.Retina"
> dim(res)
[1] 111478253         8
> head(l4)
    chr   pos a1 a2  a3         variant_id pval_nominal           gene_id
1: chr1 13550  G  A b38 chr1_13550_G_A_b38     0.375614 ENSG00000227232.5
2: chr1 14671  G  C b38 chr1_14671_G_C_b38     0.474708 ENSG00000227232.5
3: chr1 14677  G  A b38 chr1_14677_G_A_b38     0.699887 ENSG00000227232.5
4: chr1 16841  G  T b38 chr1_16841_G_T_b38     0.127895 ENSG00000227232.5
5: chr1 16856  A  G b38 chr1_16856_A_G_b38     0.627822 ENSG00000227232.5
6: chr1 17005  A  G b38 chr1_17005_A_G_b38     0.802803 ENSG00000227232.5
> head(asign)
              gene  chr                chr_pos   pos p.val.Retina
1: ENSG00000227232 chr1           1:10177:A:AC 10177     0.381708
2: ENSG00000227232 chr1 rs145072688:10352:T:TA 10352     0.959523
3: ENSG00000227232 chr1            1:11008:C:G 11008     0.218132
4: ENSG00000227232 chr1            1:11012:C:G 11012     0.218132
5: ENSG00000227232 chr1            1:13110:G:A 13110     0.998262
6: ENSG00000227232 chr1  rs201725126:13116:T:G 13116     0.438572
> length(i2)
[1] 107371528

Everything is the same as I stated initially in the problem, except
that, as you can see, I renamed the columns in l4, so instead of
X1 and X2 I now have "chr" and "pos".

Do you know why this command didn't return anything?
res2 <- asign[i2, setdiff(names(asign), names(l4))]
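
A possible explanation, assuming `asign` is a data.table (its `1:` row labels suggest so): inside `[.data.table` the `j` argument is evaluated as an expression, so `setdiff(names(asign), names(l4))` returns the character vector of names itself rather than selecting columns, which matches the `head(res2)` output above. In addition, once both tables use `chr` and `pos`, `setdiff` drops the join keys from `res2`. The sketch below, on made-up toy tables, selects by name with `with = FALSE` and keeps the keys explicitly; `keys` and `keep` are illustrative names.

```r
library(data.table)

# Made-up toy stand-ins for the real tables.
l4 <- data.table(chr = c("chr1", "chr1"),
                 pos = c("13550", "14671"),
                 pval_nominal = c(0.375614, 0.474708))
asign <- data.table(gene = "ENSG00000227232",
                    chr = "chr1",
                    chr_pos = c("1:13550:G:A", "1:10177:A:AC"),
                    pos = c("13550", "10177"),
                    p.val.Retina = c(0.5, 0.381708))
i2 <- asign$pos %in% l4$pos

# j is evaluated as an expression, so this yields a character vector
# of column names, not a table.
wrong <- asign[i2, setdiff(names(asign), names(l4))]

# Keep the join keys plus the columns that l4 does not already have,
# and tell data.table to treat the vector as column names.
keys <- c("chr", "pos")
keep <- union(keys, setdiff(names(asign), names(l4)))
res2 <- asign[i2, keep, with = FALSE]

m <- merge(l4, res2, by = keys)
```

With identical key names on both sides, `by = keys` is enough and `by.x`/`by.y` are not needed.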

