Hi,

I would like to ask how the paste(S1, S2, sep="") function works internally. Are the two strings copied to a new string?

I have a program in which strings are built up successively. First the program calls an external function and, depending on the result, it builds up strings to visualize that result. The external function is really fast, even for huge input data, but building the strings takes much too long for huge input sizes. So I'm wondering whether the concatenation could be the problem, like using String in Java instead of StringBuffer. Is there something like a StringBuffer in R?

Thanks,
T. Steijger
How often are you doing it? How large are your strings? What exactly are you doing with them? Have you considered keeping them in a list and then using 'do.call' to do the concatenation all at once? Have you used Rprof on your program to see where the time is being spent? How much memory do you have, and which OS? How 'fast' is 'fast'? What is your current performance and what are your goals?

You have not supplied sufficient information; you need to do some more homework and provide some actual data.

--
Jim Holtman
Cincinnati, OH

What is the problem you are trying to solve?

On 10/31/07, Tamara Steijger <smara1 at gmx.de> wrote:
> I would like to ask how the paste(S1, S2, sep="") function internally
> works. Are the two strings copied to a new String?
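The two suggestions in the reply above (collect the pieces in a list and concatenate once with do.call, and profile with Rprof) could look roughly like the following. This is only a sketch: the `pieces` list and the `build_slowly` helper are made up for illustration, since the original program was not posted.

```r
# Hypothetical pieces; in the real program these would come from the
# formatting step that follows the external call.
n <- 5000
pieces <- vector("list", n)
for (i in seq_len(n)) {
  pieces[[i]] <- letters[(i - 1) %% 26 + 1]
}

# Suggestion 1: a single concatenation at the end, instead of one per piece.
result <- do.call(paste, c(pieces, sep = ""))

# The pattern suspected of being slow: every paste() call builds a new,
# longer string from the previous one. (Illustrative helper, not the
# poster's code.)
build_slowly <- function(pieces) {
  s <- ""
  for (p in pieces) s <- paste(s, p, sep = "")
  s
}

# Suggestion 2: profile the code to see where the time really goes.
prof_file <- tempfile()
Rprof(prof_file, interval = 0.01)
slow <- build_slowly(pieces)
Rprof(NULL)

# summaryRprof() signals an error if the run was too short to collect any
# samples, so guard against that here.
prof <- tryCatch(summaryRprof(prof_file)$by.self, error = function(e) NULL)
print(prof)
```

Both approaches produce the same string; only the number of intermediate copies differs. On a real workload the profile table shows whether paste is actually the dominant cost.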
On 10/31/07, Tamara Steijger <smara1 at gmx.de> wrote:
> Hi,
>
> I would like to ask how the paste(S1, S2, sep="") function internally
> works. Are the two strings copied to a new String?

I'm not 100% sure, but I'd suspect so, as this is the default behaviour in pretty much every programming language.

> I have a program where successively strings are built up. First the
> program calls an external function and depending on the result it
> builds up strings to visualize the result. The external function is
> really fast, also for huge input data. But the building of the
> strings takes much too long for huge input sizes. So I'm wondering if
> the concatenating could be the problem, like using String in Java
> instead of StringBuffer. Is there something like StringBuffers in R
> also?

If you know how many components there will be in the string, it's probably best to create a character vector (str <- vector("character", 100)), fill it up as you go, and then paste it together at the very end. If you don't know how many components there will be, you will need to do something a bit more sophisticated.

You might also try profiling your function to make sure the slowness really is being caused by what you think it is.

Hadley

--
http://had.co.nz/
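A minimal sketch of the preallocation idea from the reply above. The component values are invented for illustration. The second half shows one common way to handle an unknown number of components, doubling the buffer when it fills up; that is an assumption about what "more sophisticated" might mean, not something the reply specifies.

```r
# Known number of components: preallocate, fill, paste once at the end.
n <- 100
parts <- vector("character", n)          # n empty strings
for (i in seq_len(n)) {
  parts[i] <- if (i %% 2 == 0) "b" else "a"   # placeholder content
}
display <- paste(parts, collapse = "")

# Unknown number of components: grow the buffer by doubling, so the
# vector is copied only a logarithmic number of times.
buf <- character(4)
used <- 0
for (piece in c("x", "y", "z", "x", "y", "z", "x")) {  # hypothetical stream
  if (used == length(buf)) {
    buf <- c(buf, character(length(buf)))   # double the capacity
  }
  used <- used + 1
  buf[used] <- piece
}
grown <- paste(buf[seq_len(used)], collapse = "")
```

In both cases paste is called exactly once with collapse = "", so the final string is assembled in a single pass rather than rebuilt for every component.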
Hi,

thanks for the fast answers. I'm sorry if I was not clear enough in my question.

The problem we are trying to solve is LetterDisplay. There is already a heuristic implemented in the multcompView package of Hans-Peter Piepho. We implemented an exact, fixed-parameter tractable algorithm for that problem (implemented in OCaml). The new R function works similarly to the original multcompLetters function, but instead of running the heuristic, it forwards the input to the OCaml program. Because the new multcompView function should have the same return format as the original one (so that users don't have to bother about the new implementation and can still use their old code), the output then has to be formatted in the same way as is currently done in multcompLetters.

The output of the OCaml program is a file containing a matrix. Names that occur in the same row of the matrix are not significantly different, so they should have a common letter in the LetterDisplay, i.e. each row in the matrix corresponds to one letter in the LetterDisplay. Right now those LetterDisplays are built up character by character, as in the original function. But as you already mentioned, storing the characters for each name as a list (i.e. a matrix with one row for each name and as many columns as there are lines in the output file) and then concatenating them all at once will probably improve the performance significantly. And in this case runtime is probably more critical than memory.

I haven't used Rprof so far, but I will definitely try it. Just to give an idea of the scale of the problem: the input for a LetterDisplay can also be given as a graph. One of our test instances is a graph with 121 nodes and 5108 edges. The OCaml program needs ca. 1 second to compute the result, but then computing the LetterDisplays takes more than two hours ...

Thanks a lot,
T. Steijger
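The matrix-based approach described above might be sketched as follows. The format of the OCaml output file is not given in the thread, so this assumes it has already been read into a logical matrix `m` with one row per letter and one column per name. The example data, the variable names, and the use of `letters` (which limits the sketch to 26 rows) are all assumptions for illustration.

```r
# Hypothetical result with 3 letter rows and 4 names;
# TRUE means "this name carries the letter of this row".
m <- matrix(c(TRUE,  TRUE,  FALSE, FALSE,
              FALSE, TRUE,  TRUE,  FALSE,
              FALSE, FALSE, TRUE,  TRUE),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("A", "B", "C", "D")))

row_letters <- letters[seq_len(nrow(m))]   # assumes at most 26 rows

# Character matrix with one row per name and one column per letter row:
# the letter where the name is a member of that row, "" elsewhere.
cells <- matrix("", nrow = ncol(m), ncol = nrow(m),
                dimnames = list(colnames(m), NULL))
idx <- which(t(m), arr.ind = TRUE)         # (name, letter-row) index pairs
cells[idx] <- row_letters[idx[, 2]]

# One paste per name instead of one paste per character.
display <- apply(cells, 1, paste, collapse = "")
print(display)
```

The fill step is a single vectorised assignment, and each name's string is assembled once, so the number of paste calls grows with the number of names rather than with the number of characters.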