I don't see the issue here. It would be helpful if people would report their sessionInfo() when reporting whether or not they see this issue. Mine is:

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblas_haswellp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.3 rmsfact_0.0.3  cowsay_0.5.0   fortunes_1.5-4

On Wed, Mar 14, 2018 at 12:02 PM, Gregory Michaelson <greg at datarobot.com> wrote:
> I ran this code in RStudio Server on a linux machine, but I don't know the version offhand. I will try to get it tomorrow. Thanks.
>
> Thanks,
> Greg Michaelson
> www.datarobot.com
> 704-981-1118
>
>> On Mar 14, 2018, at 4:47 PM, Joris Meys <jorismeys at gmail.com> wrote:
>>
>> To my surprise, I can confirm on Windows 10 using R 3.4.3. As tail is not recognized by the Windows cmd shell, I replaced it with:
>>
>> system('powershell -nologo "& "Get-Content -Path temp.csv -Tail 1')
>>
>> The last line shows only 7 digits after the decimal point, whereas the first line has 15 digits after the decimal point. I agree with Dirk though, 1.6 GB csv files are not the best way to work with datasets.
>>
>> Cheers
>> Joris
>>
>> On Wed, Mar 14, 2018 at 1:53 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
>>
>> What OS are you on? On Ubuntu 17.10 with R 3.4.3 all seems well (see
>> below for your example, I just added a setwd()).
>>
>> [ That said, I long held a (apparently minority) view that csv is for all
>> intents and purposes a less-than-ideal format.
>> If you have that much data,
>> you generally do not want to serialize it back and forth as that is slow, and
>> may drop precision. The rds format is great for R alone; we now have C code
>> to read it from other apps (in the librdata repo by Evan Miller). Different
>> portable serializations work too (protocol buffer, msgpack, ...), there are
>> databases and on and on... ]
>>
>> Dirk
>>
>> R> df <- data.frame(replicate(100, runif(1000000, 0,1)))
>> R> setwd("/tmp")
>> R> write.csv(df, "temp.csv")
>> R> system('tail -n1 temp.csv')
>> "1000000",0.11496100993827,0.740764639340341,0.519190795486793,0.736045523779467,0.537115448853001,0.769496953347698,0.102257401449606,0.437617724528536,0.173321532085538,0.351960731903091,0.397348914295435,0.496789071243256,0.463006566744298,0.573105450021103,0.575196429155767,0.821617329493165,0.112913676071912,0.187580146361142,0.121353451395407,0.576333721866831,0.00763232703320682,0.468676633667201,0.451408475637436,0.0172415724955499,0.946199159137905,0.439950440311804,0.109224532730877,0.657066411571577,0.0524766123853624,0.54859598656185,0.94473168021068,0.500153199071065,0.636756601976231,0.221365773351863,0.620196332456544,0.559639401268214,0.198483835440129,0.397874651942402,0.710652963491157,0.317212327616289,0.239299293374643,0.0606942125596106,0.165786643279716,0.667431530542672,0.436631754040718,0.812185280025005,0.374252707697451,0.421187321422622,0.730321826180443,0.904493971262127,0.399387824581936,0.650714065413922,0.594219180056825,0.147960299625993,0.941945064114407,0.357223904458806,0.275038427906111,0.191008436959237,0.957893384154886,0.211530723143369,0.680650093592703,0.503884038887918,0.754094189498574,0.74776051659137,0.673691919771954,0.236221367260441,0.825558929471299,0.21071959589608,0.246618688805029,0.686810691142455,0.0247942050918937,0.572868114337325,0.494058627169579,0.684360746992752,0.0139967589639127,0.626861660508439,0.417218193877488,0.410173830809072,0.390906651504338,0.477168896235526,0.382211019750684,0.597674581920728,0.198329919017851,0.0684413285925984,0.450342149706557,0.133007253985852,0.755873151356354,0.372862737858668,0.762442974606529,0.582133987685665,0.692048883531243,0.259269661735743,0.147847984684631,0.635266482364386,0.320955650880933,0.00151186063885689,0.446474697208032,0.0673662247136235,0.791947861900553,0.0973296447191387
>> R> system('head -n2 temp.csv')
>> "","X1","X2","X3","X4","X5","X6","X7","X8","X9","X10","X11","X12","X13","X14","X15","X16","X17","X18","X19","X20","X21","X22","X23","X24","X25","X26","X27","X28","X29","X30","X31","X32","X33","X34","X35","X36","X37","X38","X39","X40","X41","X42","X43","X44","X45","X46","X47","X48","X49","X50","X51","X52","X53","X54","X55","X56","X57","X58","X59","X60","X61","X62","X63","X64","X65","X66","X67","X68","X69","X70","X71","X72","X73","X74","X75","X76","X77","X78","X79","X80","X81","X82","X83","X84","X85","X86","X87","X88","X89","X90","X91","X92","X93","X94","X95","X96","X97","X98","X99","X100"
>> "1",0.995067856274545,0.0237177284434438,0.839840568602085,0.99880409357138,0.455015312181786,0.967688028467819,0.191194181796163,0.903533136472106,0.570170691236854,0.86230118968524,0.23530788696371,0.30707904486917,0.256274404237047,0.369592409580946,0.989929250674322,0.50812312704511,0.806819133926183,0.536566868191585,0.0863138805143535,0.294523851014674,0.676951135974377,0.195627561537549,0.261776751372963,0.383222601376474,0.578275503357872,0.79082652577199,0.19860127940774,0.0204593606758863,0.659964868798852,0.42379029514268,0.69516694964841,0.0594558380544186,0.124592808773741,0.289328144863248,0.524508266709745,0.84306427766569,0.317027662880719,0.273440480465069,0.111866136547178,0.217484838794917,0.354757327819243,0.973936082562432,0.673076402861625,0.300948366522789,0.219195493729785,0.912278874544427,0.276768424082547,0.959344451315701,0.500720858341083,0.431024399353191,0.814444699790329,0.0738761406391859,0.600137831410393,0.639816240407526,0.405302967177704,0.941259450744838,0.190415472723544,0.0382565588224679,0.486769351176918,0.127647049957886,0.558708024444059,0.686994878342375,0.176803215174004,0.794697789475322,0.59406904829666,0.0897431457415223,0.196549082174897,0.0750515828840435,0.736311340238899,0.00494878669269383,0.383522965712473,0.960385771468282,0.101023471681401,0.209177070530131,0.798869548132643,0.147874428424984,0.187238642480224,0.148522146046162,0.32379064662382,0.620601811446249,0.201180462958291,0.179565666476265,0.466121524339542,0.245493365218863,0.980698639061302,0.342919659335166,0.387780519668013,0.393966492731124,0.148554262006655,0.521724705817178,0.722740866011009,0.105151653522626,0.461909410310909,0.905382365221158,0.0736293855588883,0.636923864483833,0.540197744267061,0.425208077067509,0.666353516280651,0.584139186656103
>> R>
>>
>> --
>> http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>> --
>> Joris Meys
>> Statistical consultant
>>
>> Department of Data Analysis and Mathematical Modelling
>> Ghent University
>> Coupure Links 653, B-9000 Gent (Belgium)
>>
>> -----------
>> Biowiskundedagen 2017-2018
>> http://www.biowiskundedagen.ugent.be/
>>
>> -------------------------------
>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
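Dirk's point that csv "may drop precision" while rds does not can be checked directly. The following is a minimal sketch, assuming a small simulated data frame in place of the 1.6 GB one discussed here: it writes the same data with `write.csv()` and `saveRDS()` and reads both back. The `.rds` copy stores the doubles in binary form and should come back identical; the csv copy goes through decimal text (the outputs above show at most 15 significant digits per value), so it is only expected to agree within a small tolerance.

```r
# Compare a CSV round trip with an RDS round trip for double-precision data.
# Assumption: a small simulated data frame stands in for the large one above.
set.seed(1)
df <- data.frame(replicate(5, runif(1000, 0, 1)))

csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")

write.csv(df, csv_file, row.names = FALSE)  # decimal text
saveRDS(df, rds_file)                       # binary serialization

from_csv <- read.csv(csv_file)
from_rds <- readRDS(rds_file)

# RDS keeps every bit of each double.
identical(from_rds, df)

# CSV agrees only up to the digits that were written as text.
isTRUE(all.equal(from_csv, df))
max(abs(as.matrix(from_csv) - as.matrix(df)))
```

The last line gives the largest absolute error introduced by the text round trip; with values in (0, 1) written to 15 significant digits it should be far below 1e-12.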
My apologies for not including sessionInfo(), and I'm a bit angry at myself for that. Retrying in a fresh session of R, I get different results. More specifically, I get the expected result, where the precision is the same in the first and the last line. As I didn't include my sessionInfo() in my previous mail, I can't figure out why I now get a different result. So I'm positive I've seen the behaviour described by Gregory, but I can't reproduce it consistently.

Results and sessionInfo() below.

Cheers
Joris

df = data.frame(replicate(100, runif(1000000, 0,1)))
write.csv(df, "temp.csv")
> system('head -n2 temp.csv')
"","X1","X2","X3","X4","X5","X6","X7","X8","X9","X10","X11","X12","X13","X14","X15","X16","X17","X18","X19","X20","X21","X22","X23","X24","X25","X26","X27","X28","X29","X30","X31","X32","X33","X34","X35","X36","X37","X38","X39","X40","X41","X42","X43","X44","X45","X46","X47","X48","X49","X50","X51","X52","X53","X54","X55","X56","X57","X58","X59","X60","X61","X62","X63","X64","X65","X66","X67","X68","X69","X70","X71","X72","X73","X74","X75","X76","X77","X78","X79","X80","X81","X82","X83","X84","X85","X86","X87","X88","X89","X90","X91","X92","X93","X94","X95","X96","X97","X98","X99","X100"
"1",0.278388975420967,0.370451691094786,0.717217007186264,0.116161955753341,0.144262576242909,0.937281515449286,0.373484081588686,0.955863541224971,0.826917823404074,0.821003203978762,0.592950115678832,0.0627794633619487,0.815737818833441,0.0805139308795333,0.238502083579078,0.509200588334352,0.73775092815049,0.868772336747497,0.0352788285817951,0.96509046619758,0.403636189643294,0.435718205757439,0.0162769011221826,0.597037401981652,0.504837732296437,0.206882111029699,0.883217994589359,0.548339378088713,0.294472687412053,0.996299823047593,0.84715538774617,0.206719091162086,0.936834576772526,0.439650829415768,0.48171737533994,0.847850588615984,0.168411831371486,0.74452265072614,0.148969533387572,0.410039864480495,0.778313281945884,0.432499173562974,0.512454774230719,0.16644035698846,0.82063413807191,0.978053349768743,0.99700310616754,0.874686364317313,0.796479270327836,0.816980117466301,0.274035695008934,0.00785374757833779,0.678476774599403,0.660274159396067,0.184961069142446,0.681200950173661,0.611048432299867,0.73395977425389,0.209964233217761,0.310086127603427,0.975754244253039,0.125808657845482,0.015794032253325,0.526331929024309,0.531722096726298,0.59097072808072,0.815139955608174,0.529103851644322,0.183188699418679,0.910278890514746,0.237709420500323,0.752752122003585,0.14534721034579,0.00572531204670668,0.222574554383755,0.895228188252077,0.899962505558506,0.987743409816176,0.592631630599499,0.948386731324717,0.86595072131604,0.0715177122037858,0.0426598901394755,0.336731978459284,0.641609625890851,0.949697833275422,0.26424896903336,0.528028564760461,0.562290757661685,0.653207891387865,0.513830083655193,0.818740799557418,0.86044091056101,0.790382120991126,0.227793522411957,0.580261130817235,0.181467723799869,0.295633365400136,0.548259064555168,0.833231552969664
> system('powershell -nologo & Get-Content -Path temp.csv -Tail 1')
"1000000",0.946863592602313,0.656343327835202,0.627083137864247,0.482342466711998,0.337082419078797,0.424337374512106,0.626660786569118,0.870844106189907,0.78627574048005,0.0107703430112451,0.50574235082604,0.182688802946359,0.29385484661907,0.0441680049989372,0.375604564556852,0.895043386844918,0.510951161850244,0.865806604968384,0.0833957826253027,0.100834607845172,0.139034334337339,0.854574690107256,0.121182460337877,0.86904955166392,0.616418665507808,0.616997531382367,0.325345175806433,0.487117795739323,0.00973135000094771,0.304118999978527,0.0132197963539511,0.654607841046527,0.896146323531866,0.358923224499449,0.968490360304713,0.757937406655401,0.926832290366292,0.863271801266819,0.325824091676623,0.140821835258976,0.550571520347148,0.645497811725363,0.545551799703389,0.440615838393569,0.296690225601196,0.838868388207629,0.488215223187581,0.512655091006309,0.764586469857022,0.156665422255173,0.109298826660961,0.660329486243427,0.220234925625846,0.192423258908093,0.672684306278825,0.239764124620706,0.754978574579582,0.636799369007349,0.240582759492099,0.458807958755642,0.196174292825162,0.477994701592252,0.725636600283906,0.473409370519221,0.741089153569192,0.906417449470609,0.540478575974703,0.360421892022714,0.933905930491164,0.631188633851707,0.416520888684317,0.485372453462332,0.700725849252194,0.186034456361085,0.903570784721524,0.0693298415280879,0.261779377236962,0.128776200115681,0.0801852298900485,0.665786169003695,0.144309232477099,0.485807131510228,0.0646850543562323,0.909404250094667,0.848976222565398,0.862456669798121,0.949187902035192,0.240288577275351,0.177118748193607,0.0833796421065927,0.0747064722236246,0.107194342184812,0.774909492349252,0.424547733273357,0.848057812545449,0.913047505775467,0.134580536745489,0.904593974584714,0.90503191947937,0.386907825712115
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.3 tools_3.4.3    yaml_2.1.18

On Wed, Mar 14, 2018 at 7:32 PM, Ista Zahn <istazahn at gmail.com> wrote:
> I don't see the issue here. It would be helpful if people would report
> their sessionInfo() when reporting whether or not they see this issue.
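Since `tail` is not available in the Windows cmd shell, one portable option is to do the inspection from R itself instead of shelling out. This is a sketch under stated assumptions: `last_line()` and `max_decimals()` are hypothetical helper names, the last line is found by streaming the file in blocks with `readLines()` (so the whole file is read, but only one block is held in memory at a time), and the digit count measures the longest run of digits after a decimal point in each comma-separated field, so a value written in scientific notation is only counted up to its `e`.

```r
# Hypothetical helper: return the last line of a text file by streaming it
# in blocks, so only one block is held in memory at a time.
last_line <- function(path, block = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  last <- NA_character_
  repeat {
    chunk <- readLines(con, n = block)
    if (length(chunk) == 0L) break
    last <- chunk[length(chunk)]
  }
  last
}

# Hypothetical helper: largest number of digits after a decimal point
# among the comma-separated fields of one CSV line.
max_decimals <- function(line) {
  fields <- strsplit(line, ",", fixed = TRUE)[[1]]
  frac <- regmatches(fields, regexpr("(?<=\\.)[0-9]+", fields, perl = TRUE))
  if (length(frac) == 0L) 0L else max(nchar(frac))
}

# Usage on a small file written the same way as in the thread.
set.seed(42)
df <- data.frame(replicate(3, runif(5)))
path <- tempfile(fileext = ".csv")
write.csv(df, path)

first <- readLines(path, n = 2L)[2L]  # first data row (line 1 is the header)
last <- last_line(path)
c(first = max_decimals(first), last = max_decimals(last))
```

On the thread's 1,000,000-row file, similar counts for the first and the last row would indicate no truncation, whereas a count of about 7 for the last row would match what Joris first reported.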
So, I come in this morning, and I also find that the behavior is not Happening any longer as well. Perhaps it has to do with Memory utilization and some built-in safeguards to avoid Memory Problems by truncating the numerics? It's extermely frustrating that it I can no longer make this happen. On Wed, Mar 14, 2018 at 8:05 PM, Joris Meys <jorismeys at gmail.com> wrote:> My apologies for not including sessionInfo(), and I'm a bit angry at > myself for that. Retrying in a fresh session of R, I get different results. > More specifically, I get the expected result where accuracy is the same in > the first and the last line. As I didn't include my sessionInfo() in my > previous mail, I can't figure out why I now have a different result. So I'm > positive I've seen the behaviour described by Gregory, but I can't > reproduce consistently. > > Results and session Info below. > > Cheers > Joris > > df = data.frame(replicate(100, runif(1000000, 0,1))) > write.csv(df, "temp.csv") > > > system('head -n2 temp.csv') > "","X1","X2","X3","X4","X5","X6","X7","X8","X9","X10","X11" > ,"X12","X13","X14","X15","X16","X17","X18","X19","X20","X21" > ,"X22","X23","X24","X25","X26","X27","X28","X29","X30","X31" > ,"X32","X33","X34","X35","X36","X37","X38","X39","X40","X41" > ,"X42","X43","X44","X45","X46","X47","X48","X49","X50","X51" > ,"X52","X53","X54","X55","X56","X57","X58","X59","X60","X61" > ,"X62","X63","X64","X65","X66","X67","X68","X69","X70","X71" > ,"X72","X73","X74","X75","X76","X77","X78","X79","X80","X81" > ,"X82","X83","X84","X85","X86","X87","X88","X89","X90","X91" > ,"X92","X93","X94","X95","X96","X97","X98","X99","X100" > "1",0.278388975420967,0.370451691094786,0.717217007186264,0. > 116161955753341,0.144262576242909,0.937281515449286,0.373484081588686,0. > 955863541224971,0.826917823404074,0.821003203978762,0.592950115678832,0. > 0627794633619487,0.815737818833441,0.0805139308795333,0.238502083579078,0. 
> 509200588334352,0.73775092815049,0.868772336747497,0.0352788285817951,0. > 96509046619758,0.403636189643294,0.435718205757439,0.0162769011221826,0. > 597037401981652,0.504837732296437,0.206882111029699,0.883217994589359,0. > 548339378088713,0.294472687412053,0.996299823047593,0.84715538774617,0. > 206719091162086,0.936834576772526,0.439650829415768,0.48171737533994,0. > 847850588615984,0.168411831371486,0.74452265072614,0.148969533387572,0. > 410039864480495,0.778313281945884,0.432499173562974,0.512454774230719,0. > 16644035698846,0.82063413807191,0.978053349768743,0.99700310616754,0. > 874686364317313,0.796479270327836,0.816980117466301,0.274035695008934,0. > 00785374757833779,0.678476774599403,0.660274159396067,0.184961069142446,0. > 681200950173661,0.611048432299867,0.73395977425389,0.209964233217761,0. > 310086127603427,0.975754244253039,0.125808657845482,0.015794032253325,0. > 526331929024309,0.531722096726298,0.59097072808072,0.815139955608174,0. > 529103851644322,0.183188699418679,0.910278890514746,0.237709420500323,0. > 752752122003585,0.14534721034579,0.00572531204670668,0.222574554383755,0. > 895228188252077,0.899962505558506,0.987743409816176,0.592631630599499,0. > 948386731324717,0.86595072131604,0.0715177122037858,0.0426598901394755,0. > 336731978459284,0.641609625890851,0.949697833275422,0.26424896903336,0. > 528028564760461,0.562290757661685,0.653207891387865,0.513830083655193,0. > 818740799557418,0.86044091056101,0.790382120991126,0.227793522411957,0. > 580261130817235,0.181467723799869,0.295633365400136,0.548259064555168,0. > 833231552969664 > > system('powershell -nologo & Get-Content -Path temp.csv -Tail 1') > "1000000",0.946863592602313,0.656343327835202,0.627083137864247,0. > 482342466711998,0.337082419078797,0.424337374512106,0.626660786569118,0. > 870844106189907,0.78627574048005,0.0107703430112451,0.50574235082604,0. > 182688802946359,0.29385484661907,0.0441680049989372,0.375604564556852,0. 
> 895043386844918,0.510951161850244,0.865806604968384,0.0833957826253027,0. > 100834607845172,0.139034334337339,0.854574690107256,0.121182460337877,0. > 86904955166392,0.616418665507808,0.616997531382367,0.325345175806433,0. > 487117795739323,0.00973135000094771,0.304118999978527,0. > 0132197963539511,0.654607841046527,0.896146323531866,0.358923224499449,0. > 968490360304713,0.757937406655401,0.926832290366292,0.863271801266819,0. > 325824091676623,0.140821835258976,0.550571520347148,0.645497811725363,0. > 545551799703389,0.440615838393569,0.296690225601196,0.838868388207629,0. > 488215223187581,0.512655091006309,0.764586469857022,0.156665422255173,0. > 109298826660961,0.660329486243427,0.220234925625846,0.192423258908093,0. > 672684306278825,0.239764124620706,0.754978574579582,0.636799369007349,0. > 240582759492099,0.458807958755642,0.196174292825162,0.477994701592252,0. > 725636600283906,0.473409370519221,0.741089153569192,0.906417449470609,0. > 540478575974703,0.360421892022714,0.933905930491164,0.631188633851707,0. > 416520888684317,0.485372453462332,0.700725849252194,0.186034456361085,0. > 903570784721524,0.0693298415280879,0.261779377236962,0.128776200115681,0. > 0801852298900485,0.665786169003695,0.144309232477099,0.485807131510228,0. > 0646850543562323,0.909404250094667,0.848976222565398,0.862456669798121,0. > 949187902035192,0.240288577275351,0.177118748193607,0.0833796421065927,0. > 0747064722236246,0.107194342184812,0.774909492349252,0.424547733273357,0. > 848057812545449,0.913047505775467,0.134580536745489,0.904593974584714,0. 
> 90503191947937,0.386907825712115 > > > sessionInfo() > R version 3.4.3 (2017-11-30) > Platform: x86_64-w64-mingw32/x64 (64-bit) > Running under: Windows >= 8 x64 (build 9200) > > Matrix products: default > > locale: > [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United > Kingdom.1252 > [3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C > > [5] LC_TIME=English_United Kingdom.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > loaded via a namespace (and not attached): > [1] compiler_3.4.3 tools_3.4.3 yaml_2.1.18 > > On Wed, Mar 14, 2018 at 7:32 PM, Ista Zahn <istazahn at gmail.com> wrote: > >> I don't see the issue here. It would be helpful if people would report >> their sessionInfo() when reporting whether or not they see this issue. >> Mine is >> >> > sessionInfo() >> R version 3.4.3 (2017-11-30) >> Platform: x86_64-pc-linux-gnu (64-bit) >> Running under: Arch Linux >> >> Matrix products: default >> BLAS/LAPACK: /usr/lib/libopenblas_haswellp-r0.2.20.so >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> loaded via a namespace (and not attached): >> [1] compiler_3.4.3 rmsfact_0.0.3 cowsay_0.5.0 fortunes_1.5-4 >> >> On Wed, Mar 14, 2018 at 12:02 PM, Gregory Michaelson <greg at datarobot.com> >> wrote: >> > I ran this code in RStudio Server on a linux machine, but I don?t know >> the version offhand. I will try to get it tomorrow. Thanks. 
>> > >> > Thanks, >> > Greg Michaelson >> > www.datarobot.com >> > 704-981-1118 >> > >> > >> > >> > >> >> On Mar 14, 2018, at 4:47 PM, Joris Meys <jorismeys at gmail.com> wrote: >> >> >> >> To my surprise, I can confirm on Windows 10 using R 3.4.3 . As tail is >> not recognized by Windows cmd, I replaced with: >> >> >> >> system('powershell -nologo "& "Get-Content -Path temp.csv -Tail 1') >> >> >> >> The last line shows only 7 digits after the decimal, whereas the first >> have 15 digits after the decimal. I agree with Dirk though, 1.6Gb csv files >> are not the best way to work with datasets. >> >> >> >> Cheers >> >> Joris >> >> >> >> >> >> >> >> On Wed, Mar 14, 2018 at 1:53 PM, Dirk Eddelbuettel <edd at debian.org >> <mailto:edd at debian.org>> wrote: >> >> >> >> What OS are you on? On Ubuntu 17.10 with R 3.4.3 all seems well (see >> >> below for your example, I just added a setwd()). >> >> >> >> [ That said, I long held a (apparently minority) view that csv is for >> all >> >> intends and purposes a less-than-ideal format. If you have that much >> data, >> >> you do generally not want to serialize it back and forth as that is >> slow, and >> >> may drop precision. The rds format is great for R alone; we now have >> C code >> >> to read it from other apps (in the librdata repo by Evan Miller). >> Different >> >> portable serializations work too (protocol buffer, msgpack, ...), >> there are >> >> databases and on and on... 
] >> >> >> >> Dirk >> >> >> >> R> df <- data.frame(replicate(100, runif(1000000, 0,1))) >> >> R> setwd("/tmp") >> >> R> write.csv(df, "temp.csv") >> >> R> system('tail -n1 temp.csv') >> >> "1000000",0.11496100993827,0.740764639340341,0.519190795486793,0.736045523779467,0.537115448853001,0.769496953347698,0.102257401449606,0.437617724528536,0.173321532085538,0.351960731903091,0.397348914295435,0.496789071243256,0.463006566744298,0.573105450021103,0.575196429155767,0.821617329493165,0.112913676071912,0.187580146361142,0.121353451395407,0.576333721866831,0.00763232703320682,0.468676633667201,0.451408475637436,0.0172415724955499,0.946199159137905,0.439950440311804,0.109224532730877,0.657066411571577,0.0524766123853624,0.54859598656185,0.94473168021068,0.500153199071065,0.636756601976231,0.221365773351863,0.620196332456544,0.559639401268214,0.198483835440129,0.397874651942402,0.710652963491157,0.317212327616289,0.239299293374643,0.0606942125596106,0.165786643279716,0.667431530542672,0.436631754040718,0.812185280025005,0.374252707697451,0.421187321422622,0.730321826180443,0.904493971262127,0.399387824581936,0.650714065413922,0.594219180056825,0.147960299625993,0.941945064114407,0.357223904458806,0.275038427906111,0.191008436959237,0.957893384154886,0.211530723143369,0.680650093592703,0.503884038887918,0.754094189498574,0.74776051659137,0.673691919771954,0.236221367260441,0.825558929471299,0.21071959589608,0.246618688805029,0.686810691142455,0.0247942050918937,0.572868114337325,0.494058627169579,0.684360746992752,0.0139967589639127,0.626861660508439,0.417218193877488,0.410173830809072,0.390906651504338,0.477168896235526,0.382211019750684,0.597674581920728,0.198329919017851,0.0684413285925984,0.450342149706557,0.133007253985852,0.755873151356354,0.372862737858668,0.762442974606529,0.582133987685665,0.692048883531243,0.259269661735743,0.147847984684631,0.635266482364386,0.320955650880933,0.00151186063885689,0.446474697208032,0.0673662247136235,0.791947861900553,0.0973296447191387 >> >> R> system('head -n2 temp.csv') >> >> 
"","X1","X2","X3","X4","X5","X6","X7","X8","X9","X10","X11","X12","X13","X14","X15","X16","X17","X18","X19","X20","X21","X22","X23","X24","X25","X26","X27","X28","X29","X30","X31","X32","X33","X34","X35","X36","X37","X38","X39","X40","X41","X42","X43","X44","X45","X46","X47","X48","X49","X50","X51","X52","X53","X54","X55","X56","X57","X58","X59","X60","X61","X62","X63","X64","X65","X66","X67","X68","X69","X70","X71","X72","X73","X74","X75","X76","X77","X78","X79","X80","X81","X82","X83","X84","X85","X86","X87","X88","X89","X90","X91","X92","X93","X94","X95","X96","X97","X98","X99","X100" >> >> "1",0.995067856274545,0.0237177284434438,0.839840568602085,0.99880409357138,0.455015312181786,0.967688028467819,0.191194181796163,0.903533136472106,0.570170691236854,0.86230118968524,0.23530788696371,0.30707904486917,0.256274404237047,0.369592409580946,0.989929250674322,0.50812312704511,0.806819133926183,0.536566868191585,0.0863138805143535,0.294523851014674,0.676951135974377,0.195627561537549,0.261776751372963,0.383222601376474,0.578275503357872,0.79082652577199,0.19860127940774,0.0204593606758863,0.659964868798852,0.42379029514268,0.69516694964841,0.0594558380544186,0.124592808773741,0.289328144863248,0.524508266709745,0.84306427766569,0.317027662880719,0.273440480465069,0.111866136547178,0.217484838794917,0.354757327819243,0.973936082562432,0.673076402861625,0.300948366522789,0.219195493729785,0.912278874544427,0.276768424082547,0.959344451315701,0.500720858341083,0.431024399353191,0.814444699790329,0.0738761406391859,0.600137831410393,0.639816240407526,0.405302967177704,0.941259450744838,0.190415472723544,0.0382565588224679,0.486769351176918,0.127647049957886,0.558708024444059,0.686994878342375,0.176803215174004,0.794697789475322,0.59406904829666,0.0897431457415223,0.196549082174897,0.0750515828840435,0.736311340238899,0.00494878669269383,0.383522965712473,0.960385771468282,0.101023471681401,0.209177070530131,0.798869548132643,0.147874428424984,0.187238642480224,0.148522146046162,0.32379064662382,0.620601811446249,0.201180462958291,0.179565666476265,0.466121524339542,0.245493365218863,0.980698639061302,0.342919659335166,0.387780519668013,0.393966492731124,0.148554262006655,0.521724705817178,0.722740866011009,0.105151653522626,0.461909410310909,0.905382365221158,0.0736293855588883,0.636923864483833,0.540197744267061,0.425208077067509,0.666353516280651,0.584139186656103 >> >> R> >> >> >> >> -- >> >> http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org >> >> ______________________________________________ >> >> R-devel at r-project.org mailing list >> >> https://stat.ethz.ch/mailman/listinfo/r-devel >> >> >> >> -- >> >> Joris Meys >> >> Statistical consultant >> >> >> >> Department of Data Analysis and Mathematical Modelling >> >> Ghent University >> >> Coupure Links 653, B-9000 Gent (Belgium) >> >> >> >> ----------- >> >> Biowiskundedagen 2017-2018 >> >> http://www.biowiskundedagen.ugent.be/ >> >> >> >> ------------------------------- >> >> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php >> >
>> > ______________________________________________ >> > R-devel at r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-devel >> > > > > -- > Joris Meys > Statistical consultant > > Department of Data Analysis and Mathematical Modelling > Ghent University > Coupure Links 653, B-9000 Gent (Belgium) > > ----------- > Biowiskundedagen 2017-2018 > http://www.biowiskundedagen.ugent.be/ > > ------------------------------- > Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php > -- Greg