Cecilia Carmo
2010-Aug-21 15:45 UTC
[R] problems with merge() - the output has many repeated lines
Hi everyone,

I have been merging many big dataframes (about 80000 rows each) and never had this problem before, but now it has happened and I would like to know if someone knows what could be causing it. The final dataframe has an impossibly large number of rows! I ran edit(dataframe) and saw that there are many repeated rows (all equal).

Thanks for any help,

Cecília Carmo
Universidade de Aveiro
Portugal
Hadley Wickham
2010-Aug-21 15:58 UTC
[R] problems with merge() - the output has many repeated lines
You may find a close reading of ?merge helpful, particularly this sentence: "If there is more than one match, all possible matches contribute one row each" (so check that you don't have multiple matches).

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
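The sentence quoted from ?merge can be illustrated with a small sketch. The data frames below are hypothetical toy data, not the poster's; the column names firm and year are borrowed from later in the thread, while sales and assets are invented for illustration. It only shows how a key that is repeated in both inputs multiplies rows in the result.

```r
# Hypothetical toy data: the key (firm, year) = ("A", 2009) occurs twice
# in each input, and the repeated rows are identical.
df1 <- data.frame(firm  = c("A", "A", "B"),
                  year  = c(2009, 2009, 2009),
                  sales = c(10, 10, 20))
df2 <- data.frame(firm   = c("A", "A", "B"),
                  year   = c(2009, 2009, 2009),
                  assets = c(100, 100, 200))

m <- merge(df1, df2, by = c("firm", "year"))

# ("A", 2009): 2 rows in df1 x 2 rows in df2 = 4 identical rows in m.
# ("B", 2009): 1 row in each input = 1 row in m.
nrow(m)             # 5
sum(duplicated(m))  # 3 rows are exact copies of an earlier row
```

With 80000-row inputs, even a modest number of repeated keys can inflate the result in the same multiplicative way.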
Brad Patrick Schneid
2010-Aug-22 16:41 UTC
[R] problems with merge() - the output has many repeated lines
?unique
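A minimal sketch of what the ?unique pointer amounts to, assuming a hypothetical merged data frame m whose repeated rows are equal in every column (the values are invented for illustration):

```r
# Hypothetical merge result in which the first four rows are identical.
m <- data.frame(firm  = c("A", "A", "A", "A", "B"),
                year  = 2009,
                sales = c(10, 10, 10, 10, 20))

# unique() on a data frame keeps the first occurrence of each distinct row.
m_unique <- unique(m)
nrow(m_unique)  # 2
```

This only removes the repeated rows from the output; it does not explain why the key is repeated in the inputs, and rows that share a key but differ in any other column would all be kept.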
Cecilia Carmo
2010-Aug-23 13:51 UTC
[R] problems with merge() - the output has many repeated lines
Thank you all for your help and patience.

I have done

    table(duplicated(df1[, c("firm","year")]))

as William Dunlap suggested, and I found repeated rows in df1. R is always right! I really believed that my data could not contain repeated lines. I now have another problem, which is to discover why this happened with my data, but that has nothing to do with R!

Thank you again and again,

Cecília Carmo
Universidade de Aveiro
Portugal

On Sun, 22 Aug 2010 13:15:36 -0700, "William Dunlap" <wdunlap at tibco.com> wrote:
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Cecilia Carmo
>> Sent: Sunday, August 22, 2010 10:24 AM
>> To: Erik Iverson
>> Cc: r-help at r-project.org; Hadley Wickham
>> Subject: Re: [R] problems with merge() - the output has many repeated lines
>>
>> I have done
>>     intersect(names(df1), names(df2))
>>     [1] "firm" "year"
>>
>> This is the key I used to merge
>>     merge(df1,df2,by=c("firm","year"))
>>
>> And there is just one firm/year row in df1 that matches another firm/year row in df2. df1 has more firm/year rows than df2, and they don't match any in df2.
>
> To get to the bottom of this you may have to show us some of the relevant rows of data (80000 rows per dataset would be a lot to mail out). For starters it would be nice to see the output of
>     str(df1)
>     str(df2)
>     str(m) # where m is merge(df1,df2)
> Then it would be nice to see the output of
>     table(duplicated(df1[, c("firm","year")]))
> and the same for df2 and m.
>
> You said you saw many repeated rows in the output of merge(df1,df2,...), which I am calling 'm'. Say the i'th row is one of the repeated rows. What are the outputs of
>     df1[ df1$firm==m$firm[i] & df1$year==m$year[i], ,drop=FALSE]
>     df2[ df2$firm==m$firm[i] & df2$year==m$year[i], ,drop=FALSE]
>     m[ m$firm==m$firm[i] & m$year==m$year[i], ,drop=FALSE]
> ?
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>> Cecília
>>
>> On Sun, 22 Aug 2010 12:09:57 -0500, Erik Iverson <eriki at ccbr.umn.edu> wrote:
>> > Cecilia -
>> >
>> > Find what columns you're matching on,
>> >
>> >     intersect(names(df1), names(df2)),
>> >
>> > Maybe that will shed some light on the issue.
>> >
>> > On 08/22/2010 12:02 PM, Cecilia Carmo wrote:
>> >> Thanks, but I don't have multiple matches and the lines repeated in the final dataframe are exactly equal in all columns.
>> >>
>> >> Cecília
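The check that resolved the thread, table(duplicated(...)) on the key columns, can be sketched on hypothetical toy data as follows. The values and the sales column are invented; only the key names firm and year come from the thread. The fromLast step is an addition not mentioned in the thread, used here to list every row that shares a key rather than only the later occurrences.

```r
# Hypothetical toy data with one repeated (firm, year) key.
df1 <- data.frame(firm  = c("A", "A", "B", "C"),
                  year  = c(2009, 2009, 2009, 2010),
                  sales = c(10, 10, 20, 30))
key <- c("firm", "year")

# Count rows whose key repeats an earlier row; any TRUE means the key
# is not unique and merge() can multiply rows.
table(duplicated(df1[, key]))

# Flag every row involved in a repeated key, first occurrences included.
dup <- duplicated(df1[, key]) | duplicated(df1[, key], fromLast = TRUE)
df1[dup, , drop = FALSE]
```

Running the same check on both inputs before calling merge() would show whether the key is unique in each of them.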