Also, it will be easier to provide helpful information if you describe what in your data you want to compare and what you hope to get out of the comparison.

Best wishes,
Ulrik

Eric Berger <ericjberger at gmail.com> wrote on Sat, 27 Jan 2018, 08:18:

> Hi Marsh,
> An RDS is not a data structure such as a data.frame. It can be anything.
> For example, if I want to save my objects a, b, c I could do:
>
> > saveRDS(list(a, b, c), file = "tmp.RDS")
>
> Then read them back later with
>
> > myList <- readRDS("tmp.RDS")
>
> Do you have additional information about your "RDSs"?
>
> Eric
>
> On Sat, Jan 27, 2018 at 6:54 AM, Marsh Hardy ARA/RISK <mhardy at ara.com> wrote:
>
> > Each RDS is 40 MBs. What's a slick code to compare them row by row, IDing
> > row numbers with mismatches?
> >
> > Thanks in advance.
> >
> > //
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
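Eric's point that an RDS file can hold any R object can be seen with a small round trip. The sketch below is only an illustration: the objects a, b, d and the temporary file names are made up and are not Marsh's actual files.

```r
# Hypothetical objects; saveRDS() can serialize any single R object.
a <- 1:3
b <- letters[1:2]
d <- data.frame(X = 1:2, Y = c("A", "B"))

list_file <- tempfile(fileext = ".RDS")
df_file   <- tempfile(fileext = ".RDS")

saveRDS(list(a, b, d), file = list_file)  # a plain list holding three objects
saveRDS(d, file = df_file)                # a single data frame

my_list <- readRDS(list_file)
my_df   <- readRDS(df_file)

# class() shows what kind of object each file actually contained.
class(my_list)
class(my_df)
```

Running class() on the result of readRDS() for each of the two real files is a quick way to find out whether a row-by-row comparison even makes sense.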
Marsh Hardy ARA/RISK
2018-Jan-27 21:18 UTC
[R] Newbie wants to compare 2 huge RDSs row by row.
Hi Guys, I apologize for my rank & utter newness at R.

I used summary() and found about 95 variables, both character and numeric, all with "Length:368842", which I assume is the # of records.

I'd like to know the record number (row #?) of any record where the data doesn't match in the 2 files of what should be the same output.

Thanks in advance, M.

//
> On Jan 27, 2018, at 1:18 PM, Marsh Hardy ARA/RISK <mhardy at ara.com> wrote:
>
> Hi Guys, I apologize for my rank & utter newness at R.
>
> I used summary() and found about 95 variables, both character and numeric, all with "Length:368842" I assume is the # of records.
>
> I'd like to know the record number (row #?) of any record where the data doesn't match in the 2 files of what should be the same output.

The 'length' function returns the number of items at the top level of a list. As such, it returns the number of columns of a data frame, which is a type of list. The quotes around "Length:368842" make it appear that you are quoting from some sort of output, although its nature is unclear. Rather than abstracting from information that you don't know how to interpret, it would have been better to include the entire output. The `summary` function is generic, so what it prints to the console can vary widely. You should instead post the output from str() applied to each of your "files". It does a better job of displaying object structure.

--
David.

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law
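The difference between length() and the number of rows that David describes can be checked on a small data frame. This is a minimal sketch; x1 is a made-up stand-in for one of the objects returned by readRDS(), not the real data.

```r
# Hypothetical stand-in for one of the objects read from the two files.
x1 <- data.frame(id = 1:4,
                 name = c("a", "b", "c", "d"),
                 stringsAsFactors = FALSE)

length(x1)  # number of columns, because a data frame is a list of columns
nrow(x1)    # number of rows (records)
dim(x1)     # rows and columns together

# str() shows the class, the dimensions, and the type of every column.
str(x1)
```

For a character column of a data frame, summary() prints a "Length" entry that is the length of that column vector, so it equals nrow(x1) rather than length(x1).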
If your two objects have class "data.frame" (look at class(objectName)), they both have the same number of columns in the same order, and the column types match closely enough (use all.equal(x1, x2) for that), then you can try

    which( rowSums( x1 != x2 ) > 0)

E.g.,

> x1 <- data.frame(X=1:5, Y=rep(c("A","B"),c(3,2)))
> x2 <- data.frame(X=c(1,2,-3,-4,5), Y=rep(c("A","B"),c(2,3)))
> x1
  X Y
1 1 A
2 2 A
3 3 A
4 4 B
5 5 B
> x2
   X Y
1  1 A
2  2 A
3 -3 B
4 -4 B
5  5 B
> which( rowSums( x1 != x2 ) > 0)
[1] 3 4

If you want to allow small numeric differences but exact character matches, you will have to get a bit fancier. Splitting the data.frames into character and numeric parts and comparing each works well.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
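One possible way to do the "bit fancier" comparison Bill mentions is sketched below. It is not Bill's code: the function name mismatched_rows, the tol argument, and its default value are assumptions made for illustration. It assumes both data frames have the same dimensions and column names, compares numeric columns within a tolerance and all other columns exactly as character strings, and treats an NA on only one side as a mismatch.

```r
# Sketch: row numbers where two data frames differ, with a numeric tolerance.
# mismatched_rows() and tol are hypothetical names, not from the thread.
mismatched_rows <- function(x1, x2, tol = 1e-8) {
  stopifnot(identical(dim(x1), dim(x2)), identical(names(x1), names(x2)))
  bad <- rep(FALSE, nrow(x1))
  for (j in seq_along(x1)) {
    a <- x1[[j]]
    b <- x2[[j]]
    if (is.numeric(a) && is.numeric(b)) {
      differs <- abs(a - b) > tol                     # allow small differences
    } else {
      differs <- as.character(a) != as.character(b)   # exact match required
    }
    # Where the comparison is NA, flag the row only if exactly one side is NA.
    na_cmp <- is.na(differs)
    differs[na_cmp] <- xor(is.na(a), is.na(b))[na_cmp]
    bad <- bad | differs
  }
  which(bad)
}

x1 <- data.frame(X = 1:5, Y = rep(c("A", "B"), c(3, 2)))
x2 <- data.frame(X = c(1, 2 + 1e-12, -3, -4, 5), Y = rep(c("A", "B"), c(2, 3)))
mismatched_rows(x1, x2)
```

With this example data, row 2 differs only by 1e-12 in X and so is not reported, whereas the exact comparison x1 != x2 would flag it. The tolerance is absolute; a relative tolerance may suit data with widely varying magnitudes better.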