thr3ads.net - R help - [R] Performance of concatenating strings [Oct 2007]

Tamara Steijger

2007-Oct-31 11:50 UTC

[R] Performance of concatenating strings

Hi,

I would like to ask how the paste(S1, S2, sep="") function works
internally. Are the two strings copied to a new string?

I have a program in which strings are built up successively. First the
program calls an external function, and depending on the result it
builds up strings to visualize the result. The external function is
really fast, even for huge input data, but building the strings takes
much too long for huge inputs. So I'm wondering whether the
concatenation could be the problem, like using String in Java
instead of StringBuffer. Is there something like StringBuffer in R
as well?

Thanks,
T. Steijger

jim holtman

2007-Oct-31 13:24 UTC

How often are you doing it?  How large are your strings?  What exactly
are you doing with them?  Have you considered keeping them in a list
and then using 'do.call' to do the concatenation all at once?  Have
you used Rprof on your program to see where the time is being spent?
How much memory do you have, and which OS?  How 'fast' is 'fast'?
What is your current performance and what are your goals?

You have not supplied sufficient information; you need to do some more
homework and provide some actual data.
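
As a rough illustration of the 'do.call' suggestion above, here is a
minimal sketch. The `pieces` list is made up for the example; it
stands in for whatever fragments the real program produces.

```r
# Hypothetical fragments, produced one at a time by some earlier step.
pieces <- list("a", "b", "c", "d")

# Incremental pattern: one paste() call (and one new string) per fragment.
grown <- ""
for (p in pieces) {
  grown <- paste(grown, p, sep = "")
}

# Alternative: keep the fragments in a list and concatenate them in a
# single paste() call via do.call().
at_once <- do.call(paste, c(pieces, sep = ""))
```

Both variants give the same string; the second makes one call to
paste() no matter how many fragments there are.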


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

hadley wickham

2007-Oct-31 13:59 UTC

On 10/31/07, Tamara Steijger <smara1 at gmx.de> wrote:
> Hi,
>
> I would like to ask how the paste(S1, S2, sep="") function works
> internally. Are the two strings copied to a new string?

I'm not 100% sure, but I'd suspect so, as this is the default
behaviour in pretty much every programming language.

> I have a program in which strings are built up successively. First the
> program calls an external function, and depending on the result it
> builds up strings to visualize the result. The external function is
> really fast, even for huge input data, but building the strings takes
> much too long for huge inputs. So I'm wondering whether the
> concatenation could be the problem, like using String in Java
> instead of StringBuffer. Is there something like StringBuffer in R
> as well?

If you know how many components there will be in the string, it's
probably best to create a character vector
(str <- vector("character", 100)), fill it up as you go, and then
paste it together at the very end.  If you don't know how many
components there will be, you will need to do something a bit more
sophisticated.
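
A minimal sketch of the preallocate-then-paste idea, assuming the
number of components `n` is known up front. The loop body is only a
placeholder for however the real components are computed.

```r
n <- 100                           # assumed known number of components
parts <- vector("character", n)    # preallocated; every element starts as ""

for (i in seq_len(n)) {
  # Placeholder: cycle through the alphabet instead of real output.
  parts[i] <- letters[(i - 1) %% 26 + 1]
}

# A single concatenation at the very end.
result <- paste(parts, collapse = "")
```

Here paste() with `collapse = ""` joins the elements of one vector,
whereas `sep = ""` would join separate arguments.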

You might also try profiling your function to make sure the
slowness really is being caused by what you think it is.
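
One possible way to do that with Rprof, as a sketch only:
`build_strings` is a hypothetical stand-in for the slow function, and
the workload size is chosen just so that the profiler has something to
sample.

```r
# Hypothetical stand-in for the function under suspicion: it grows a
# string one character at a time.
build_strings <- function(n) {
  s <- ""
  for (i in seq_len(n)) {
    s <- paste(s, "x", sep = "")
  }
  s
}

prof_file <- tempfile(fileext = ".out")

Rprof(prof_file)             # start sampling the call stack
res <- build_strings(30000)
Rprof(NULL)                  # stop profiling

# Summarise where the samples landed; by.self lists time spent in each
# function itself, excluding the functions it calls.
prof <- summaryRprof(prof_file)
print(head(prof$by.self))
```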

Hadley

-- 
http://had.co.nz/

Tamara Steijger

2007-Oct-31 14:09 UTC

Hi,
thanks for the fast answers. I'm sorry if I was not clear enough in
my question. The problem we are trying to solve is LetterDisplay.
There is already a heuristic implemented in the multcompView package
of Hans-Peter Piepho. We implemented an exact fixed-parameter
tractable algorithm for that problem (implemented in OCaml).

The new R function works similarly to the original multcompLetters
function, but instead of running the heuristic it forwards the input
to the OCaml program. Because the new multcompView function should
have the same return format as the original one (so that users don't
have to bother about the new implementation and can keep using their
old code), the output then has to be formatted in the same way as it
is currently done in multcompLetters.

The output of the OCaml program is a file containing a matrix. Names
that occur in the same row of the matrix are not significantly
different, so they should have a common letter in the LetterDisplay,
i.e. each row in the matrix corresponds to one letter in the
LetterDisplay. Right now those LetterDisplays are built up character
by character, as in the original function. But as you already
mentioned, storing the characters for each name as a list (i.e. a
matrix with a row for each name and as many columns as there are lines
in the output file) and then concatenating everything at once will
probably improve the performance significantly already. And runtime is
probably more critical than memory in this case.
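
A sketch of what that could look like. The parsed form of the output
file is assumed here (a list `rows` with one character vector of names
per line), as are the example names, and using `letters` limits the
sketch to at most 26 rows.

```r
# Hypothetical parsed output: one character vector of names per row
# of the file.
rows <- list(c("A", "B"), c("B", "C"), c("C", "D", "A"))
treatments <- sort(unique(unlist(rows)))

# One row per name, one column per line of the output file;
# "" means the letter is not assigned to that name.
display <- matrix("", nrow = length(treatments), ncol = length(rows),
                  dimnames = list(treatments, NULL))

for (j in seq_along(rows)) {
  display[rows[[j]], j] <- letters[j]   # mark all names in row j at once
}

# Concatenate each name's letters in one pass at the end.
letter_display <- apply(display, 1, paste, collapse = "")
```

The result is a named character vector with one LetterDisplay per
name, and paste() is called only once per name.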

I haven't used Rprof so far, but I will definitely try it. Just to
give an idea of the scale of the problem: the input for a
LetterDisplay can also be given as a graph. One of our test instances
is a graph with 121 nodes and 5108 edges. The OCaml program needs
ca. 1 second to compute the result. But then computing the
LetterDisplays needs more than two hours ...

Thanks a lot,
T. Steijger
