Jonathan
2011-Apr-11 19:48 UTC
[R] Getting many substrings but only loading the original string one time.
Hi All,

I'm looking for a way to get many substrings from a longer string and then stitch them together. But since the longer string is really, really long (about 250 MB), I don't want to do this in a loop and load and re-load the longer string many times. Does anybody have an idea?

Maybe I could pass in two vectors (the first with the starting coordinates, the second with the stopping coordinates), so it would be like a vectorized version of substr, where start and stop are vectors instead of single integers.

Example (I'm reducing the size of the string for the example) of how this might work:

> longerString <- "HelloThisIsMyLongerString"
> startVector <- c(2,6,4)
> stopVector <- c(4,10,5)
> substrings <- vectorized_substr(longerString, startVector, stopVector)
> substrings
[1] "ell" "ThisI" "lo"

Then I'd like to concatenate them (there will be many of them):

> result <- paste(substrings, collapse='')
> result
[1] "ellThisIlo"

(Perhaps the paste command as I've done it is the best way, but depending on how the substrings are reported there may be different ways.)

Thanks!
Jonathan
Duncan Murdoch
2011-Apr-11 20:14 UTC
[R] Getting many substrings but only loading the original string one time.
On 11/04/2011 3:48 PM, Jonathan wrote:
> Hi All,
> I'm looking for a way to get many substrings from a longer string and
> then stitch them together. But, since the longer string is really, really
> long (like 250 MB long), I don't want to do this in a loop and load and
> re-load the longer string many times. Does anybody have an idea?
>
> Maybe I could pass in two vectors (the first would have the starting
> coordinates, and the second would have the stopping coordinates), so it
> would be like a vectorized version of substr, where start and stop would be
> vector instead of single integers.
>
> Example (I'm reducing the size of the string for the example) of how this
> might work:
>
> > longerString <- "HelloThisIsMyLongerString"
> > startVector <- c(2,6,4)
> > stopVector <- c(4,10,5)
> > substrings <- vectorized_substr(longerString, startVector, stopVector)
> > substrings
> [1] "ell" "ThisI" "lo"

Use substring(), not substr(). It is vectorized over start and stop:

> substring(longerString, startVector, stopVector)
[1] "ell" "ThisI" "lo"

It does this by replicating longerString to match the length of the start and stop vectors, but that doesn't mean actual copies are made: the replicated vector just holds multiple pointers to the same big string.

Duncan Murdoch
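For reference, here is a minimal sketch that puts the substring() suggestion together with the paste() step from the original question. It uses the short example string from the thread as a stand-in for the real 250 MB one; memory use on a string of that size is not measured here.

```r
# Short stand-in for the real (very long) string from the question.
longerString <- "HelloThisIsMyLongerString"
startVector <- c(2, 6, 4)
stopVector  <- c(4, 10, 5)

# substring() recycles its text argument to the length of the longest of
# the start/stop vectors, so a single call extracts every piece.
substrings <- substring(longerString, startVector, stopVector)
print(substrings)   # [1] "ell"   "ThisI" "lo"

# Stitch the pieces together in the order they were requested.
result <- paste(substrings, collapse = "")
print(result)       # [1] "ellThisIlo"
```

The pieces come back in the order of the start and stop vectors, so reordering those vectors is enough to change the order in which the substrings are concatenated.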