Tony Breyal
2009-Sep-07 15:40 UTC
[R] using an array of strings with strsplit, issue when including a space in split criteria
Dear all,

I'm having a problem understanding why a split does not occur in the 2nd use of the function strsplit below:

# text strings
> txt <- c("sales to 23 August 2008 published 29 August",
+ "sales to 6 September 2008 published?11 September")

# first use
> strsplit(txt, 'published', fixed=TRUE)
[[1]]
[1] "sales to 23 August 2008 " " 29 August"

[[2]]
[1] "sales to 6 September 2008 " "?11 September"

# second use, but with a space ' ' in the split
> strsplit(txt, 'published ', fixed=TRUE)
[[1]]
[1] "sales to 23 August 2008 " "29 August"

[[2]]
[1] "sales to 6 September 2008 published?11 September"

Thank you kindly for any help in advance.
Tony

O/S: Win Vista Ultimate

> sessionInfo()
R version 2.9.2 (2009-08-24)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] RODBC_1.3-0
Juliet Hannah
2009-Sep-08 01:17 UTC
[R] using an array of strings with strsplit, issue when including a space in split criteria
I get a different result:

txt <- c("sales to 23 August 2008 published 29 August","sales to 6 September 2008 published 11 September")
strsplit(txt, 'published ', fixed=TRUE)
[[1]]
[1] "sales to 23 August 2008 " "29 August"

[[2]]
[1] "sales to 6 September 2008 " "11 September"

> sessionInfo()
R version 2.9.0 (2009-04-17)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base
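One thing worth checking is whether the input in this reply is really the same as the input in the original post. The sketch below compares the second string character by character. It assumes the strings are exactly as they are printed in the two messages, including the "?" in the original; the variable names are made up for illustration, and the thread itself does not say what character was actually in the poster's session.

```r
# Second element of txt as printed in the original post and in this reply.
original <- "sales to 6 September 2008 published?11 September"
retyped  <- "sales to 6 September 2008 published 11 September"

# Break both strings into single characters and find where they differ.
a <- strsplit(original, "")[[1]]
b <- strsplit(retyped, "")[[1]]
diff_pos <- which(a != b)

# Position of the difference, the differing characters, and their code points
# (32 is an ordinary space).
print(diff_pos)
print(c(original = a[diff_pos], retyped = b[diff_pos]))
print(c(original = utf8ToInt(a[diff_pos]), retyped = utf8ToInt(b[diff_pos])))
```

If the two inputs differ at the character right after "published", a different result from strsplit(txt, 'published ', fixed=TRUE) is to be expected, whatever the R version.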
Gabor Grothendieck
2009-Sep-08 02:30 UTC
[R] using an array of strings with strsplit, issue when including a space in split criteria
I am using the exact same version of R as you, also on Vista, but can't reproduce your result. For me it splits properly. Try starting R like this (modify path if needed) from the Windows cmd line:

\Program Files\R\R-2.9.2\bin\Rgui --vanilla

and then try it.
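The thread does not settle the cause. One possibility, which is only an assumption here, is that the character after "published" in the second string is not an ordinary space (it is shown as "?" in the post). A minimal sketch for testing that idea and for splitting regardless of which separator follows the word, using the strings exactly as printed in the original post:

```r
# Strings as printed in the original post; the "?" is kept as shown there.
txt <- c("sales to 23 August 2008 published 29 August",
         "sales to 6 September 2008 published?11 September")

# Does each element contain the literal pattern "published " (with a space)?
has_pattern <- grepl("published ", txt, fixed = TRUE)
print(has_pattern)

# Tolerant split: "published" followed by any one character that is not a
# letter or digit. This is a regular expression, so fixed = TRUE is dropped.
parts <- strsplit(txt, "published[^[:alnum:]]")
print(parts)
```

An element for which grepl() returns FALSE cannot be split by the fixed pattern, which would match the output in the original post. The character class is one design choice among several; a narrower class may be preferable if other punctuation can follow "published" in real data.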