tradenet
2009-Jul-10 12:18 UTC
[R] strange strsplit gsub problem 0 is this a bug or a string length limitation?
I was working with the rmetrics portfolioBacktesting function and dug into the code to try to find why my formula with 113 items, i.e. A1 thru A113, was being truncated and I only get 85 items, not 113.

Is it due to a string length limitation in R or is it a bug in the strsplit or gsub functions, or in my string?

I'd very much appreciate any suggestions

============Input script:

backtestFormula<-SPX~A1+A2+A3+A4+A5+A6+A7+A8+A9+A10+A11+A12+A13+A14+A15+A16+A17+A18+A19+A20+A21+A22+A23+A24+A25+A26+A27+A28+A29+A30+A31+A32+A33+A34+A35+A36+A37+A38+A39+A40+A41+A42+A43+A44+A45+A46+A47+A48+A49+A50+A51+A52+A53+A54+A55+A56+A57+A58+A59+A60+A61+A62+A63+A64+A65+A66+A67+A68+A69+A70+A71+A72+A73+A74+A75+A76+A77+A78+A79+A80+A81+A82+A83+A84+A85+A86+A87+A88+A89+A90+A91+A92+A93+A94+A95+A96+A97+A98+A99+A100+A101+A102+A103+A104+A105+A106+A107+A108+A109+A110+A111+A112+A113
benchmarkName = as.character(backtestFormula)[2]
print(as.character(backtestFormula)[3])
print(benchmarkName)
assetsNames <- strsplit(gsub(" ", "", as.character(backtestFormula)[3]), "\\+")[[1]]
nAssets = length(assetsNames)
print(nAssets)
list(assetsNames)

===============output:

> backtestFormula<-SPX~A1+A2+A3+A4+A5+A6+A7+A8+A9+A10+A11+A12+A13+A14+A15+A16+A17+A18+A19+A20+A21+A22+A23+A24+A25+A26+A27+A28+A29+A30+A31+A32+A33+A34+A35+A36+A37+A38+A39+A40+A41+A42+A43+A44+A45+A46+A47+A48+A49+A50+A51+A52+A53+A54+A55+A56+A57+A58+A59+A60+A61+A62+A63+A64+A65+A66+A67+A68+A69+A70+A71+A72+A73+A74+A75+A76+A77+A78+A79+A80+A81+A82+A83+A84+A85+A86+A87+A88+A89+A90+A91+A92+A93+A94+A95+A96+A97+A98+A99+A100+A101+A102+A103+A104+A105+A106+A107+A108+A109+A110+A111+A112+A113
> benchmarkName = as.character(backtestFormula)[2]
> print(benchmarkName)
[1] "SPX"
> print(as.character(backtestFormula)[3])
[1] "A1 + A2 + A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 + A11 + A12 + A13 + A14 + A15 + A16 + A17 + A18 + A19 + A20 + A21 + A22 + A23 + A24 + A25 + A26 + A27 + A28 + A29 + A30 + A31 + A32 + A33 + A34 + A35 + A36 + A37 + A38 + A39 + A40 + A41 + A42 + A43 + A44 + A45 + A46 + A47 + A48 + A49 + A50 + A51 + A52 + A53 + A54 + A55 + A56 + A57 + A58 + A59 + A60 + A61 + A62 + A63 + A64 + A65 + A66 + A67 + A68 + A69 + A70 + A71 + A72 + A73 + A74 + A75 + A76 + A77 + A78 + A79 + A80 + A81 + A82 + A83 + A84 + A85 + "
> assetsNames <- strsplit(gsub(" ", "", as.character(backtestFormula)[3]), "\\+")[[1]]
> print(nAssets)
[1] 85
> nAssets = length(assetsNames)
> print(nAssets)
[1] 85
> list(assetsNames)
[[1]]
 [1] "A1"  "A2"  "A3"  "A4"  "A5"  "A6"  "A7"  "A8"  "A9"  "A10" "A11" "A12" "A13" "A14" "A15" "A16" "A17" "A18" "A19" "A20" "A21" "A22" "A23" "A24" "A25" "A26" "A27" "A28" "A29" "A30" "A31" "A32" "A33"
[34] "A34" "A35" "A36" "A37" "A38" "A39" "A40" "A41" "A42" "A43" "A44" "A45" "A46" "A47" "A48" "A49" "A50" "A51" "A52" "A53" "A54" "A55" "A56" "A57" "A58" "A59" "A60" "A61" "A62" "A63" "A64" "A65" "A66"
[67] "A67" "A68" "A69" "A70" "A71" "A72" "A73" "A74" "A75" "A76" "A77" "A78" "A79" "A80" "A81" "A82" "A83" "A84" "A85"

--
View this message in context: nabble.com/strange-strsplit-gsub-problem-0-is-this-a-bug-or-a-string-length-limitation--tp24426457p24426457.html
Sent from the R help mailing list archive at Nabble.com.
Marc Schwartz
2009-Jul-10 12:58 UTC
[R] strange strsplit gsub problem 0 is this a bug or a string length limitation?
On Jul 10, 2009, at 7:18 AM, tradenet wrote:

> I was working with the rmetrics portfolioBacktesting function and dug into
> the code to try to find why my formula with 113 items, i.e. A1 thru A113,
> was being truncated and I only get 85 items, not 113.
>
> Is it due to a string length limitation in R or is it a bug in the strsplit
> or gsub functions, or in my string?
>
> I'd very much appreciate any suggestions

You appear to be bumping up against the 500 character length limit of as.character() when used with R language objects. Review the Note in ?as.character:

"as.character truncates components of language objects to 500 characters (was about 70 before 1.3.1)."

It is not a string length limitation or a bug in strsplit():

> paste("A", 1:113, sep = "", collapse = " + ")
[1] "A1 + A2 + A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 + A11 + A12 + A13 + A14 + A15 + A16 + A17 + A18 + A19 + A20 + A21 + A22 + A23 + A24 + A25 + A26 + A27 + A28 + A29 + A30 + A31 + A32 + A33 + A34 + A35 + A36 + A37 + A38 + A39 + A40 + A41 + A42 + A43 + A44 + A45 + A46 + A47 + A48 + A49 + A50 + A51 + A52 + A53 + A54 + A55 + A56 + A57 + A58 + A59 + A60 + A61 + A62 + A63 + A64 + A65 + A66 + A67 + A68 + A69 + A70 + A71 + A72 + A73 + A74 + A75 + A76 + A77 + A78 + A79 + A80 + A81 + A82 + A83 + A84 + A85 + A86 + A87 + A88 + A89 + A90 + A91 + A92 + A93 + A94 + A95 + A96 + A97 + A98 + A99 + A100 + A101 + A102 + A103 + A104 + A105 + A106 + A107 + A108 + A109 + A110 + A111 + A112 + A113"

> nchar(paste("A", 1:113, sep = "", collapse = " + "))
[1] 680

> strsplit(paste("A", 1:113, sep = "", collapse = " + "), " \\+ ")[[1]]
  [1] "A1"   "A2"   "A3"   "A4"   "A5"   "A6"   "A7"   "A8"   "A9"
 [10] "A10"  "A11"  "A12"  "A13"  "A14"  "A15"  "A16"  "A17"  "A18"
 [19] "A19"  "A20"  "A21"  "A22"  "A23"  "A24"  "A25"  "A26"  "A27"
 [28] "A28"  "A29"  "A30"  "A31"  "A32"  "A33"  "A34"  "A35"  "A36"
 [37] "A37"  "A38"  "A39"  "A40"  "A41"  "A42"  "A43"  "A44"  "A45"
 [46] "A46"  "A47"  "A48"  "A49"  "A50"  "A51"  "A52"  "A53"  "A54"
 [55] "A55"  "A56"  "A57"  "A58"  "A59"  "A60"  "A61"  "A62"  "A63"
 [64] "A64"  "A65"  "A66"  "A67"  "A68"  "A69"  "A70"  "A71"  "A72"
 [73] "A73"  "A74"  "A75"  "A76"  "A77"  "A78"  "A79"  "A80"  "A81"
 [82] "A82"  "A83"  "A84"  "A85"  "A86"  "A87"  "A88"  "A89"  "A90"
 [91] "A91"  "A92"  "A93"  "A94"  "A95"  "A96"  "A97"  "A98"  "A99"
[100] "A100" "A101" "A102" "A103" "A104" "A105" "A106" "A107" "A108"
[109] "A109" "A110" "A111" "A112" "A113"

HTH,

Marc Schwartz
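As a side note to the diagnosis above, the asset names can also be pulled out of the formula without converting it to one long string. What follows is only a minimal sketch under stated assumptions, not what portfolioBacktesting does internally. The variable names come from the original script; rebuilding the formula with as.formula() is an assumption made here to keep the example short. On recent R versions the truncation itself may not reproduce, since the documented behaviour of as.character() on language objects appears to have changed since 2009.

```r
# Rebuild the 113-term formula from the thread without typing it out
# (as.formula() is used here for brevity; it is not in the original script).
rhs <- paste("A", 1:113, sep = "", collapse = " + ")
backtestFormula <- as.formula(paste("SPX ~", rhs))

# A two-sided formula is a call: [[2]] is the left side, [[3]] the right side.
# all.vars() walks the expression tree, so no string conversion is involved.
benchmarkName <- all.vars(backtestFormula[[2]])
assetsNames   <- all.vars(backtestFormula[[3]])

nAssets <- length(assetsNames)
print(benchmarkName)         # "SPX"
print(nAssets)               # 113
print(tail(assetsNames, 3))  # "A111" "A112" "A113"
```

Note that all.vars() returns each variable name only once, so this fits formulas like the one in the thread, where every asset appears as its own term.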
Andrew
2009-Jul-10 14:50 UTC
[R] strange strsplit gsub problem 0 is this a bug or a string length limitation?
Thanks Marc. I really appreciate your help. I'm going to try my function hack. I forwarded your suggestion to Yohan at rmetrics.

Warm regards,
Andrew

--- On Fri, 7/10/09, Marc Schwartz <marc_schwartz@me.com> wrote:

From: Marc Schwartz <marc_schwartz@me.com>
Subject: Re: [R] strange strsplit gsub problem 0 is this a bug or a string length limitation?
To: "tradenet" <nodecorum@yahoo.com>
Cc: r-help@r-project.org
Date: Friday, July 10, 2009, 7:34 AM

On Jul 10, 2009, at 9:07 AM, tradenet wrote:

> Thanks Marc!
>
> I just found the ~500 char limitation via an online search for the
> specs for the formula class.
>
> The rmetrics library I'm using gets its character array of assets by
> parsing a formula passed as an input parameter to the portfolioBacktest
> function. Can I copy the portfolioBacktest function from the source, call
> it portfolioBacktest_hack, add an additional parameter, an array of asset
> names, and have my version use this argument instead of parsing the formula?
>
> I'm fairly new to R so I don't know if R will find my function and if my
> function will find the other fPortfolio functions that may be referenced by
> the original, non "_hack" version of the function.
>
> Warm regards,
>
> Andrew

Hi Andrew,

Happy to help.

In terms of your proposal as a short term fix, it may be possible to do that. If you do create a new local function and call it directly, it will be seen instead of the package default version of the same function. However, without reviewing the code and package in detail, you have to be careful about other function dependencies and namespace issues that may be present. I would go ahead and try it to see if it works.

A better and longer term approach would be to have the function author(s) modify the way in which they manipulate the formula object. They may wish to review this thread from 2001:

stat.ethz.ch/pipermail/r-help/2001-August/014628.html

Back then, the as.character() limit was 60 and was increased by Prof. Ripley to 500 in response to that discussion. However, in that thread, Prof. Ripley also proposes a better way of manipulating the formula object passed to the function. That approach uses deparse() rather than using as.character().

HTH,

Marc Schwartz
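The deparse() route mentioned above could look roughly like the sketch below. It is not the code from the 2001 thread or from fPortfolio; it only illustrates the idea. deparse() may return several strings for a long expression, so the pieces are pasted back together before splitting. As before, the formula is rebuilt with as.formula() purely for illustration, and rhsPieces and rhsString are names made up for this example.

```r
# Rebuild the 113-term formula from the thread (illustrative shortcut).
rhs <- paste("A", 1:113, sep = "", collapse = " + ")
backtestFormula <- as.formula(paste("SPX ~", rhs))

# deparse() returns a character vector and breaks long expressions into
# several elements, so collapse the pieces into a single string first.
rhsPieces <- deparse(backtestFormula[[3]], width.cutoff = 500L)
rhsString <- paste(rhsPieces, collapse = "")

# Drop all whitespace (including the indentation deparse() adds to
# continuation lines), then split on the literal "+".
assetsNames <- strsplit(gsub("[[:space:]]+", "", rhsString), "+", fixed = TRUE)[[1]]
nAssets <- length(assetsNames)
print(nAssets)
```

Collapsing before splitting is the important step here: splitting only the first element of the deparse() result would silently drop terms again.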
Gabor Grothendieck
2009-Jul-10 15:51 UTC
[R] strange strsplit gsub problem 0 is this a bug or a string length limitation?
Marc has already answered your question. It may also be possible to avoid the long formula in the first place in the context of lm and certain similar functions, as we can write this

lm(y1 ~ x1 + x2 + x3 + x4, anscombe)

as

lm(y1 ~ ., anscombe[1:5])

where anscombe is a data set that is built into R.

On Fri, Jul 10, 2009 at 8:18 AM, tradenet<nodecorum at yahoo.com> wrote:

> I was working with the rmetrics portfolioBacktesting function and dug into
> the code to try to find why my formula with 113 items, i.e. A1 thru A113,
> was being truncated and I only get 85 items, not 113.
>
> Is it due to a string length limitation in R or is it a bug in the strsplit
> or gsub functions, or in my string?
>
> I'd very much appreciate any suggestions
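The dot shorthand in the lm() example above can be checked with a short sketch like the following. It only uses the built-in anscombe data; the object names fitLong and fitDot are made up for this illustration. Whether portfolioBacktesting accepts a dot formula is not established anywhere in this thread.

```r
# anscombe ships with R; its first five columns are x1, x2, x3, x4 and y1.
fitLong <- lm(y1 ~ x1 + x2 + x3 + x4, data = anscombe)
fitDot  <- lm(y1 ~ ., data = anscombe[1:5])

# The "." expands to every column of the data other than the response.
print(attr(terms(fitDot), "term.labels"))

# Both calls describe the same model, so the coefficient names agree.
print(identical(names(coef(fitLong)), names(coef(fitDot))))
```

The dot form depends on the data argument containing exactly the wanted columns, which is why the example subsets anscombe to its first five columns.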