I'm encountering excruciatingly slow load times for character vectors in R 2.6.0-- up to 30sec for a 15K file that contains a no-attributes character vector of length ~1e4 and object size ~0.5MB. In R 2.5.1, repeated loads of the same set of files are near-instantaneous.

The problem is proving tricky to reproduce consistently from scratch, so I have attached the 3 files used in the examples below. If I create a similar-looking object from scratch, then save it and re-load it a few times, the problem doesn't always occur... at least not in that session.

FWIW I have noticed that the time taken to load seems to be roughly a power of 2 of the "base slow load time"-- could be a red herring.

The problem seems specific to character vectors-- I noticed it with entire workspaces and have whittled it down to char vecs only.

The example below is from a brand-new session with only the basic packages loaded; delays in my real sessions are much longer.

Mark Bravington
CSIRO Mathematical & Information Sciences
Marine Laboratory
Castray Esplanade
Hobart 7001
TAS

ph (+61) 3 6232 5118
fax (+61) 3 6232 5012
mob (+61) 438 315 623

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> system.time( load( 'd:/r2.0/t1.rda'))
   user  system elapsed
   0.5     0.0     0.5
> system.time( load( 'd:/r2.0/t1.rda')) # same file; slower
   user  system elapsed
   3.5     0.0     3.5
> system.time( load( 'd:/r2.0/t1.rda'))
   user  system elapsed
   4.13    0.00    4.13
> system.time( load( 'd:/r2.0/t1.rda'))
   user  system elapsed
   3.51    0.00    3.52

> system.time( load( 'd:/r2.0/t2.rda')) # different bigger file
   user  system elapsed
   4.42    0.00    4.42
> system.time( load( 'd:/r2.0/t2.rda')) # same file; slower
   user  system elapsed
  10.44    0.00   10.44
> system.time( load( 'd:/r2.0/t2.rda'))
   user  system elapsed
  10.79    0.00   10.80
> system.time( load( 'd:/r2.0/t2.rda'))
   user  system elapsed
  10.39    0.00   10.41
> system.time( load( 'd:/r2.0/t1.rda')) # the smaller file again; slower
   user  system elapsed
  10.67    0.00   10.69
> system.time( load( 'd:/r2.0/t3.rda')) # different smaller file
   user  system elapsed
  10.51    0.00   10.52
> system.time( load( 'd:/r2.0/t2.rda')) # now bigger file again: slower
   user  system elapsed
  14.61    0.00   14.61

--please do not edit the information below--

Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status =
 major = 2
 minor = 6.0
 year = 2007
 month = 10
 day = 03
 svn rev = 43063
 language = R
 version.string = R version 2.6.0 (2007-10-03)

Windows XP (build 2600) Service Pack 2.0

Locale:
LC_COLLATE=English_Australia.1252;LC_CTYPE=English_Australia.1252;LC_MONETARY=English_Australia.1252;LC_NUMERIC=C;LC_TIME=English_Australia.1252

Search Path:
 .GlobalEnv, package:stats, package:graphics, package:grDevices,
 package:utils, package:datasets, package:methods, Autoloads,
 package:base
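For readers without the attached files, the repeated-load timing in the session above could be approximated along the following lines. This is only a sketch: the vector `x`, its contents, and the helper `time_loads` are hypothetical stand-ins, not the objects from the report, and nothing here is guaranteed to trigger the slowdown.

```r
# Assumption: a stand-in character vector of length 1e4 with no attributes.
# The real files from the report are not available, so the contents are made up.
set.seed(1)
x <- sample(c("", "a"), 1e4, replace = TRUE)

# Save it to a temporary .rda-style file.
f <- tempfile()
save(x, file = f)

# Hypothetical helper: load the same file n times into a scratch environment
# and record the elapsed time of each load.
time_loads <- function(path, n = 5) {
  sapply(seq_len(n), function(i) {
    system.time(load(path, envir = new.env()))[["elapsed"]]
  })
}

elapsed <- time_loads(f)
print(elapsed)
```

If the times grow from one load to the next, as in the transcript, that would point at the same behaviour; flat times near zero are what the report describes for R 2.5.1.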
On Thu, 11 Oct 2007, Mark.Bravington at csiro.au wrote:

> I'm encountering excruciatingly slow load times for character vectors in
> R 2.6.0-- up to 30sec for a 15K file that contains a no-attributes
> character vector of length ~1e4 and object size ~0.5MB. In R 2.5.1,
> repeated loads of the same set of files are near-instantaneous.
>
> The problem is proving tricky to reproduce consistently from scratch, so
> I have attached the 3 files used in the examples below.

There was no attachment: since these are (I presume) binary files, can you not put them on a website (as suggested by the posting guide)?

> If I create a similar-looking object from scratch, then save it and
> re-load it a few times, the problem doesn't always occur... at least not
> in that session.
>
> FWIW I have noticed that the time taken to load seems to be roughly a
> power of 2 of the "base slow load time"-- could be a red herring.
>
> The problem seems specific to character vectors-- I noticed it with
> entire workspaces and have whittled it down to char vecs only.
>
> The example below is from a brand-new session with only the basic
> packages loaded; delays in my real sessions are much longer.

Can you please try R-patched or R-devel. We've found and solved a couple of performance issues with creating STRSXPs, but with character vectors of the millions of elements.

I tried several examples of around 10000 elements and got times of at most 0.05 secs in 2.6.0. These included parts of those examples on which we had seen performance issues.

A few clues:

- even your base time is much slower than I would expect.

- you say 'a 15K file ... object size ~0.5MB'. That's pretty phenomenal
  compression, and I am seeing file sizes more like 100Kb for objects that
  size. Since object.size does take into account duplication, one way to
  get that would be to have all unique elements. At ca 50bytes per
  element you would need an average string length of about 15 chars.
  Such an object takes about 200Kb as a .rda file.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
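The size argument in the reply above (about 0.5MB spread over about 1e4 elements, so roughly 50 bytes each) can be checked with a rough sketch like the one below. The vector `u` of random 15-letter strings is a hypothetical example, not the data from the report, and the sizes printed will depend on the R version, the platform, and how compressible the strings are.

```r
n <- 1e4

# Arithmetic from the message: ~0.5MB of object size over ~1e4 elements.
bytes_per_element <- 0.5e6 / n

# Hypothetical data: n random strings of 15 lower-case letters each,
# which are unique for all practical purposes.
set.seed(42)
u <- replicate(n, paste(sample(letters, 15, replace = TRUE), collapse = ""))

# Compare the in-memory size with the size of the saved file.
f <- tempfile()
save(u, file = f)
mem_bytes  <- as.numeric(object.size(u))
file_bytes <- file.info(f)$size

cat("bytes per element implied by the post:", bytes_per_element, "\n")
cat("object.size:", mem_bytes, "bytes; saved file:", file_bytes, "bytes\n")
```

Comparing `mem_bytes` with `file_bytes` gives the compression ratio for unique strings, which can then be set against the 0.5MB to 15K ratio that looked suspicious in the report.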
Problem fixed by R-patched, thanks; see comments below.

> On Thu, 11 Oct 2007, Mark.Bravington at csiro.au wrote:
>
>> The problem is proving tricky to reproduce consistently from scratch,
>> so I have attached the 3 files used in the examples below.
>
> There was no attachment: since these are (I presume) binary files, can you
> not put them on a website (as suggested by the posting guide)?

Sorry, I would have if I could, but can't at present. The attachments got through OK to me at least, though. If anyone does have an interest in the files, let me know off-list and I'll re-send as a zip or somesuch.

> Can you please try R-patched or R-devel. We've found and solved a couple
> of performance issues with creating STRSXPs, but with character vectors of
> the millions of elements.

Thanks; R-patched fixed it. I did look in R-devel NEWS before posting, but that doesn't mention the bug fix on CHARSXP which is in the R-patched NEWS, so I didn't persist.

FWIW in case work is still being done on new CHARSXP: my problems were with much shorter vectors (~1e4) than the millions mentioned in patched-NEWS, and the strings were short too: 90% were '' and the other 10% were 'a'.

Also, when the previously offending objects are loaded into 2.6.0patched, they are 3-10X smaller (according to object.size) than in unpatched-- I was also amazed by the compression! Looks like unpatched R was allocating at least a 32-byte memory entry per individual zero-character string. It is down to about 4 bytes per (zero-character) string in R-patched.

Mark Bravington
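The per-string memory figures in the follow-up (at least 32 bytes per empty string unpatched, about 4 bytes in R-patched) could be estimated with a sketch along these lines. The vector `v` is built only from the composition described in the message (about 90% '' and 10% 'a'); it is an assumption that this resembles the original objects, and the figure reported will differ between R versions and between 32-bit and 64-bit builds.

```r
n <- 10000L

# Composition described in the message: about 90% '' and 10% 'a'.
# The ordering of elements is an assumption; the original objects are not available.
v <- rep(c("", "a"), times = c(9000L, 1000L))

# Average in-memory cost per element, as reported by object.size.
per_element <- as.numeric(object.size(v)) / length(v)
cat("object.size per element:", per_element, "bytes\n")
```

Running the same lines under two R builds and comparing `per_element` is one way to see the kind of 3-10X difference described above.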