Not convinced Jeff is completely right about this not concerning R, since I've found that the application language (R, perl, etc.) makes a difference in how files are accessed via the OS. He is certainly correct that the OS (and its versions) is where the actual reading and writing happens, but sometimes the calls into it can be inefficient. (Sorry, I've not got examples specifically for file reads, but I had a case in computation where there was an 80,000-fold difference in timing with R, which rather took my breath away. That has probably been sorted out by now.)

The difficulty in making general statements is that a rather full set of comparisons over different commands, datasets, and OS and version variants is needed before the general picture can emerge. Using microbenchmark when you need to find the bottlenecks is how I'd proceed, which is what the OP is doing.

About 30 years ago, I wrote up some preliminary work, never published, on estimating the two halves of a copy, that is, reading from a file and storing to "memory" or a different storage location. This was via regression with a singular design matrix, but one can get a minimal-length least-squares solution via the SVD. Possibly relevant today for getting at slow links on a network.

JN

On 2017-08-22 09:07 AM, Jeff Newmiller wrote:
> You need to study how reading files works in your operating system. This question is not about R.
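For concreteness, a minimal sketch of the kind of microbenchmark comparison under discussion; the test object and file names here are hypothetical, with the same data saved both ways:

    library(microbenchmark)

    ## Hypothetical test object, saved in both formats
    dat <- data.frame(x = rnorm(1e6), y = rnorm(1e6))
    saveRDS(dat, "dat.rds")
    save(dat, file = "dat.RData")

    ## readRDS() returns the object; load() restores it into the environment.
    ## Repeated runs are served largely from the OS file cache, so this mostly
    ## measures deserialization speed rather than raw disk speed.
    microbenchmark(
        rds   = readRDS("dat.rds"),
        rdata = load("dat.RData"),
        times = 20
    )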
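And a toy sketch of the minimum-length least-squares idea mentioned above, using svd() on a deliberately singular design matrix (the data are made up purely for illustration):

    X <- cbind(1, c(0, 1, 2), c(0, 2, 4))   # third column = 2 * second, so X is singular
    y <- c(1, 2, 3)
    s <- svd(X)
    tol  <- max(dim(X)) * max(s$d) * .Machine$double.eps
    dinv <- ifelse(s$d > tol, 1 / s$d, 0)   # invert only the nonzero singular values
    b <- s$v %*% (dinv * crossprod(s$u, y)) # minimum-norm least-squares solution
    b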
Caching happens, both within the operating system and within the C standard library. Ostensibly the intent of those caches is to help performance, but you are right that different low-level caching algorithms can be a poor match for specific application-level use cases such as copying files or parsing text syntax. However, the OS and even the specific file system drivers (e.g. ext4 on flash disk or FAT32 on magnetic media) can behave quite differently for the same application-level use case, so a generic discussion at the R language level (this mailing list) can be almost impossible to sort out intelligently.
--
Sent from my phone. Please excuse my brevity.

On August 22, 2017 7:11:39 AM PDT, J C Nash <profjcnash at gmail.com> wrote:
> [J C Nash's message, quoted in full above; snipped]
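A quick way to see the caching effect Jeff describes, assuming the hypothetical dat.rds from the sketch above exists:

    f <- "dat.rds"
    system.time(readRDS(f))   # first read: may actually hit the disk
    system.time(readRDS(f))   # second read: typically served from the OS page cache
    ## On Linux the page cache can be dropped between runs (needs root):
    ##   sync; echo 3 > /proc/sys/vm/drop_caches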
Dear R Fellows,

I have a dataset (data1) with 2 columns of dates showing a class of factor. How do I convert them to Date, then compare them and keep only the greater date in a new column? Using as.Date to change the class to Date turns the data into NA.

Much thanks

COL1    COL2
Apr-16  1-Nov-16
May-16  1-Nov-16
Jun-16  1-Nov-16
Jul-16  1-Nov-16
Aug-16  1-Nov-16
Sep-16  1-Nov-16
Oct-16  1-Nov-16
Nov-16  1-Nov-16
Dec-16  1-Nov-16
Jan-17  1-Nov-16
Feb-17  1-Nov-16
Mar-17  1-Nov-16
Apr-17  1-Nov-16
May-17  1-Nov-16
Jun-17  1-Nov-16
Jul-17  1-Nov-16
Aug-17  1-Nov-16
Sep-17  1-Nov-16
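One possible approach, sketched here under the assumption of an English locale for the %b month abbreviations: month-only values such as "Apr-16" need a day prepended before as.Date() can parse them (which is why the conversion was yielding NA), and pmax() then keeps the later of the two dates.

    ## A few rows of the data, as factors, to mimic the question
    data1 <- data.frame(COL1 = c("Apr-16", "May-16", "Jun-16"),
                        COL2 = c("1-Nov-16", "1-Nov-16", "1-Nov-16"),
                        stringsAsFactors = TRUE)

    d1 <- as.Date(paste0("1-", as.character(data1$COL1)), format = "%d-%b-%y")
    d2 <- as.Date(as.character(data1$COL2), format = "%d-%b-%y")
    data1$latest <- pmax(d1, d2)   # element-wise later of the two dates
    data1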
raphael.felber at agroscope.admin.ch
2017-Aug-23 12:40 UTC
[R] How to benchmark speed of load/readRDS correctly
Hi there

Thanks for your answers. I didn't expect that this would be so complex. Honestly, I don't understand everything you wrote, since I'm not an IT specialist. But I had read that reading *.rds files is faster than loading *.RData files, and I wanted to verify that for my system and R version. But thanks anyway for your time.

Cheers
Raphael

> -----Original Message-----
> From: Jeff Newmiller [mailto:jdnewmil at dcn.davis.ca.us]
> Sent: Tuesday, 22 August 2017 18:33
> To: J C Nash <profjcnash at gmail.com>; r-help at r-project.org; Felber Raphael Agroscope <raphael.felber at agroscope.admin.ch>
> Subject: Re: [R] How to benchmark speed of load/readRDS correctly
>
> [earlier messages quoted in full; snipped, see above]
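If the goal is a fair rds-vs-RData comparison on one particular system, compression settings are also worth holding constant, since they affect read time; a small extension of the earlier sketch (reusing the hypothetical dat object, file names again made up):

    ## Both saveRDS() and save() compress by default (gzip);
    ## uncompressed files are larger but can be faster to read.
    saveRDS(dat, "dat_nocomp.rds", compress = FALSE)
    microbenchmark(
        gz   = readRDS("dat.rds"),
        none = readRDS("dat_nocomp.rds"),
        times = 20
    )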