Henrik Bengtsson (max 7Mb)
2006-May-05 18:58 UTC
[Rd] str() with attr(*, "names") is extremely slow for long vectors
Hi,

I noticed some time ago that, for instance, named vectors that are really long make str() really slow when displaying the names attribute. I don't know exactly when this started, but it wasn't the case, say, 1-2 years ago. Example (on a WinXP 1.8GHz):

> s <- 1:1000; names(s) <- s
> system.time(str(s))
 Named int [1:1000] 1 2 3 4 5 6 7 8 9 10 ...
 - attr(*, "names")= chr [1:1000] "1" "2" "3" "4" ...
[1] 0.08 0.00 0.09   NA   NA

> s <- 1:100000; names(s) <- s
> system.time(str(s))
 Named int [1:100000] 1 2 3 4 5 6 7 8 9 10 ...
 - attr(*, "names")= chr [1:100000] "1" "2" "3" "4" ...
[1] 8.82 0.00 9.11   NA   NA

It looks like all string elements are processed although only the first few are displayed.

Cheers

Henrik
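For readers who want to repeat the measurement on their own machine, the comparison above can be sketched as a small script. This is not part of the original post: the helper name time_str and the chosen sizes are assumptions made for illustration, and capture.output() is used only to keep the str() output off the console.

```r
# Hypothetical helper (not from the thread): time str() on a named
# integer vector of length n and return the elapsed seconds.
time_str <- function(n) {
  s <- seq_len(n)
  names(s) <- s  # names<- coerces the integers to character
  # capture.output() swallows the printed structure so only timing remains
  system.time(capture.output(str(s)))[["elapsed"]]
}

sizes <- c(1e3, 1e4, 1e5)  # illustrative sizes, covering the two in the post
elapsed <- vapply(sizes, time_str, numeric(1))
print(data.frame(n = sizes, elapsed = elapsed))
```

If str() only touched the few elements it displays, the elapsed time would stay roughly flat as n grows; a time that grows with n is what the report describes.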
Martin Maechler
2006-May-08 08:28 UTC
[Rd] str() with attr(*, "names") is extremely slow for long vectors
>>>>> "HenrikB" == Henrik Bengtsson (max 7Mb) <hb at stat.berkeley.edu>
>>>>>     on Fri, 5 May 2006 11:58:19 -0700 writes:

    HenrikB> Hi,
    HenrikB> I noticed some time ago that, for instance, named vectors that are
    HenrikB> really long make str() really slow when displaying the names attribute.
    HenrikB> I don't know exactly when this started, but it
    HenrikB> wasn't the case, say, 1-2 years ago. Example (on a WinXP 1.8GHz):

Thank you, Henrik, for the note.

Indeed, str() is unnecessarily slow for long character vectors in general, not just when they are names(); and Rprof() + summaryRprof() quickly reveal where the culprits lie.

This shouldn't be too hard to improve; I'm having a look.

Martin Maechler, ETH Zurich

    >> s <- 1:1000; names(s) <- s
    >> system.time(str(s))
    HenrikB> Named int [1:1000] 1 2 3 4 5 6 7 8 9 10 ...
    HenrikB> - attr(*, "names")= chr [1:1000] "1" "2" "3" "4" ...
    HenrikB> [1] 0.08 0.00 0.09   NA   NA

    >> s <- 1:100000; names(s) <- s
    >> system.time(str(s))
    HenrikB> Named int [1:100000] 1 2 3 4 5 6 7 8 9 10 ...
    HenrikB> - attr(*, "names")= chr [1:100000] "1" "2" "3" "4" ...
    HenrikB> [1] 8.82 0.00 9.11   NA   NA

    HenrikB> It looks like all string elements are processed although only the
    HenrikB> first few are displayed.

    HenrikB> Cheers

    HenrikB> Henrik
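The profiling step mentioned in the reply can be sketched roughly as follows. This is an illustration rather than the session actually used: the vector size, the one-second sampling loop, and the variable names are all assumptions, and it presumes an R build with profiling support. The loop is there because a single str() call may finish too quickly to be sampled.

```r
# A long named vector, as in the original report.
s <- seq_len(1e5)
names(s) <- s

prof_file <- tempfile(fileext = ".out")  # assumed scratch location
Rprof(prof_file)                         # start the sampling profiler
deadline <- Sys.time() + 1               # keep calling str() for about 1 second
while (Sys.time() < deadline) {
  invisible(capture.output(str(s)))
}
Rprof(NULL)                              # stop profiling

# summaryRprof() signals an error if no samples were recorded, so guard it.
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.self))
unlink(prof_file)
```

The by.self table ranks functions by time spent in their own code, which is usually the quickest way to see which helper inside str() dominates.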
Gerhard Thallinger
2006-May-13 10:54 UTC
[Rd] str() with attr(*, "names") is extremely slow for long vectors
>>>>> "HenrikB" == Henrik Bengtsson (max 7Mb) <hb at stat.berkeley.edu>
>>>>>     on Fri, 5 May 2006 11:58:19 -0700 writes:

    HenrikB> Hi,
    HenrikB> I noticed some time ago that, for instance, named vectors
    HenrikB> that are really long make str() really slow when displaying the
    HenrikB> names attribute.
    HenrikB> I don't know exactly when this started, but it wasn't the
    HenrikB> case, say, 1-2 years ago. Example (on a WinXP 1.8GHz):

It got slower with R 2.3.0. Comparing str() for a big exprSet object from the "Biobase" package, I got the following numbers (system.time(str(anaexp)) on WinXP 1.8 GHz):

R 2.2.0
  1.  14.64  0.13  14.90  NA  NA
  2.   4.33  0.09   4.43  NA  NA
  3.   4.20  0.15   4.38  NA  NA

R 2.3.0
  1.  65.36  0.18  66.12  NA  NA
  2.  51.75  0.21  52.55  NA  NA
  3.  51.79  0.17  52.45  NA  NA

One can notice a considerable speed-up in the 2nd and 3rd calls to str() in R 2.2.0, which is much less pronounced in R 2.3.0.

Hth

Gerhard

------------------------------------------------------------------------
DI Gerhard Thallinger                       E-mail: Gerhard.Thallinger at tugraz.at
Institute for Genomics and Bioinformatics   Web:    http://genome.tugraz.at
Graz University of Technology               Tel:    +43 316 873 5343
Petersgasse 14/V                            Fax:    +43 316 873 5340
8010 Graz, Austria                          Map:    http://genome.tugraz.at/Loc.html
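The three-call comparison above can be repeated on any R version with a short script. The exprSet object anaexp is not available here, so this sketch substitutes a long named character vector as a stand-in; that substitution and the vector size are assumptions, and the absolute numbers will not match the ones reported.

```r
# Stand-in for the exprSet object: a long named character vector.
x <- as.character(seq_len(1e5))
names(x) <- x

# Time three consecutive str() calls, keeping user, system and elapsed time.
timings <- t(sapply(1:3, function(i) {
  unclass(system.time(capture.output(str(x))))[1:3]
}))
rownames(timings) <- paste("call", 1:3)

print(R.version.string)  # record which R version produced the numbers
print(timings)
```

Running the same script under two R installations gives one row per call for each version, which makes a first-call versus later-call difference easy to spot.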
Gerhard Thallinger
2006-May-14 13:19 UTC
[Rd] str() with attr(*, "names") is extremely slow for long vectors
>>>>> "MartinM" == Martin Maechler maechler at stat.math.ethz.ch
>>>>>     Sat, May 13 2006 15:16:19 +0200 writes:

    MartinM> But have you looked at R 2.3.0-patched at all?
    MartinM>
    MartinM> I did acknowledge that str(<long character>) had become
    MartinM> unacceptably slow, and had implemented a simple patch
    MartinM> almost "immediately".

Yes, I did. Here are the timings (WinXP 1.8 GHz):

R 2.2.0
  1.  14.64  0.13  14.90  NA  NA
  2.   4.33  0.09   4.43  NA  NA
  3.   4.20  0.15   4.38  NA  NA

R 2.3.0
  1.  65.36  0.18  66.12  NA  NA
  2.  51.75  0.21  52.55  NA  NA
  3.  51.79  0.17  52.45  NA  NA

R 2.3.0 Patched (2006-05-11 r38037)
  1.  44.09  0.09  44.45  NA  NA
  2.  34.96  0.08  35.66  NA  NA
  3.  34.52  0.07  34.81  NA  NA

Hth

Gerhard

------------------------------------------------------------------------
DI Gerhard Thallinger                       E-mail: Gerhard.Thallinger at tugraz.at
Institute for Genomics and Bioinformatics   Web:    http://genome.tugraz.at
Graz University of Technology               Tel:    +43 316 873 5343
Petersgasse 14/V                            Fax:    +43 316 873 5340
8010 Graz, Austria                          Map:    http://genome.tugraz.at/Loc.html
Gerhard Thallinger
2006-May-17 11:27 UTC
[Rd] str() with attr(*, "names") is extremely slow for long vectors
>>>>> "MartinM" == Martin Maechler maechler at stat.math.ethz.ch
>>>>>     Sat, May 13 2006 15:16:19 +0200 writes:

    MartinM> But have you looked at R 2.3.0-patched at all?
    MartinM>
    MartinM> I did acknowledge that str(<long character>) had become
    MartinM> unacceptably slow, and had implemented a simple patch
    MartinM> almost "immediately".

> Yes, I did. Here are the timings (WinXP 1.8 GHz):
>
> R 2.3.0 Patched (2006-05-11 r38037)
>
>   1.  44.09  0.09  44.45  NA  NA
>   2.  34.96  0.08  35.66  NA  NA
>   3.  34.52  0.07  34.81  NA  NA

When I made the test I used an incomplete version of R patched (the new version of the utils package was missing). With the complete version of R patched, the timings are now the same as with R 2.2.0.

Gerhard

------------------------------------------------------------------------
DI Gerhard Thallinger                       E-mail: Gerhard.Thallinger at tugraz.at
Institute for Genomics and Bioinformatics   Web:    http://genome.tugraz.at
Graz University of Technology               Tel:    +43 316 873 5343
Petersgasse 14/V                            Fax:    +43 316 873 5340
8010 Graz, Austria                          Map:    http://genome.tugraz.at/Loc.html