Is there a simple way in R to remove all characters from a string other than those in a specified set? For example, I want to keep only the digits 0-9 in a string.

In general, I have found the string handling abilities of R a bit limited. (Of course it's great for stats in general.) Is there a good reference on this? Or should R programmers dump their output to a text file and use something like Perl or Python for sophisticated text processing?

I am familiar with the basic functions such as nchar, substring, as.integer, print, cat, sprintf etc.
Just gsub() non-numerics with ""; e.g.:

> gsub("[a-zA-Z]", "", "aB9c81")
[1] "981"

[I'm really bad at regular expressions, and don't know how to construct "non-numerics".]

Andy

> From: Vivek Rao
>
> Is there a simple way in R to remove all characters
> from a string other than those in a specified set? For
> example, I want to keep only the digits 0-9 in a
> string.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
On Tue, 2005-04-12 at 05:54 -0700, Vivek Rao wrote:
> Is there a simple way in R to remove all characters
> from a string other than those in a specified set? For
> example, I want to keep only the digits 0-9 in a
> string.

Something like the following should work:

> x <- paste(sample(c(letters, LETTERS, 0:9), 50, replace = TRUE),
+            collapse = "")
> x
[1] "QvuuAlSJYUFpUpwJomtCir8TfvNQyV6O7W7TlXSXlLHocCdtnV"
> gsub("[^0-9]", "", x)
[1] "8677"

The use of gsub() here replaces every character that is NOT one of the digits 0-9 with "", therefore leaving only the digits.

See ?gsub for more information.

HTH,

Marc Schwartz
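The negated bracket expression above generalizes from digits to any "specified set", which is what the original question asked for. A minimal sketch, assuming the caller supplies text that is already valid inside [...]; the helper name keep_only and its argument class_body are hypothetical and not from the thread, and no escaping of special characters is attempted:

```r
# Hypothetical helper, not from the thread: keep only the characters that
# match a bracket-expression body such as "0-9" or "[:alpha:]".
# Assumption: `class_body` is already valid inside [...]; nothing is escaped.
keep_only <- function(x, class_body) {
  pattern <- paste0("[^", class_body, "]")  # e.g. "[^0-9]"
  gsub(pattern, "", x)                      # delete everything outside the set
}

keep_only("aB9c81", "0-9")               # "981"
keep_only("XKa0&*1jk2", "0-9")           # "012"
keep_only("Tel: +32/16/336899", "0-9+")  # "+3216336899"
keep_only("aB9c81", "[:alpha:]")         # "aBc"
```

Note that deleting letters with "[a-zA-Z]" is not equivalent to keeping digits: gsub("[a-zA-Z]", "", "aB9-c81") leaves the hyphen in place, whereas the negated class removes it.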
Dear Vivek,

Actually, I think R has reasonably good facilities for manipulating strings. See ?gsub etc.; for example:

> gsub("[^0-9]", "", "XKa0&*1jk2")
[1] "012"

I hope this helps,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Vivek Rao
> Sent: Tuesday, April 12, 2005 7:55 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] removing characters from a string
>
> Is there a simple way in R to remove all characters from a
> string other than those in a specified set? For example, I
> want to keep only the digits 0-9 in a string.
>>>>> "Vivek" == Vivek Rao <rvivekrao at yahoo.com>
>>>>>     on Tue, 12 Apr 2005 05:54:55 -0700 (PDT) writes:

    Vivek> Is there a simple way in R to remove all characters
    Vivek> from a string other than those in a specified set? For
    Vivek> example, I want to keep only the digits 0-9 in a
    Vivek> string.

    Vivek> In general, I have found the string handling abilities
    Vivek> of R a bit limited. (Of course it's great for stats in
    Vivek> general). Is there a good reference on this? Or should
    Vivek> R programmers dump their output to a text file and use
    Vivek> something like Perl or Python for sophisticated text
    Vivek> processing?

    Vivek> I am familiar with the basic functions such as nchar,
    Vivek> substring, as.integer, print, cat, sprintf etc.

It depends on your "etc":
The above is pretty trivial using gsub(), but since you sound sophisticated enough to proclaim missing R abilities, I leave the exercise to you.

Martin
look at "?gsub()", e.g.,

string <- "ab03def10-523rtf"
string
gsub("[^0-9]", "", string)
gsub("[0-9]", "", string)

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm
Hi

Try

> gsub("[^0-9]","","1111af-456utaDFasswe34534%^&%*$h567890ersdfg")
[1] "111145634534567890"

HTH

rksh

--
Robin Hankin
Uncertainty Analyst
Southampton Oceanography Centre
European Way, Southampton SO14 3ZH, UK
tel 023-8059-7743
Using help.start() and searching on keyword "character" or using help.search(keyword="character") will show you what you have missed. As others have pointed out, you have missed the power of regular expressions (despite that being how these things are done in Perl). Also, strsplit() can be very powerful.

On Tue, 12 Apr 2005, Vivek Rao wrote:

> Is there a simple way in R to remove all characters
> from a string other than those in a specified set? For
> example, I want to keep only the digits 0-9 in a
> string.
>
> In general, I have found the string handling abilities
> of R a bit limited.

Your exploration of them seems more than a bit limited.

> (Of course it's great for stats in general). Is there a good reference
> on this? Or should R programmers dump their output to a text file and
> use something like Perl or Python for sophisticated text processing?
>
> I am familiar with the basic functions such as nchar,
> substring, as.integer, print, cat, sprintf etc.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
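As one illustration of the strsplit() remark, the same filtering can be done without any regular expression by splitting the string into single characters and keeping those found in an allowed set. This is only a sketch: the function name keep_set and its arguments are hypothetical, and it assumes a single input string rather than a character vector.

```r
# Hypothetical helper, not from the thread: regex-free filtering with strsplit().
# Assumption: `x` is one string; `allowed` is a vector of single characters.
keep_set <- function(x, allowed) {
  chars <- strsplit(x, "")[[1]]                     # split into single characters
  paste(chars[chars %in% allowed], collapse = "")   # keep allowed ones, rejoin
}

keep_set("ab03def10-523rtf", as.character(0:9))   # "0310523"
keep_set("ab03def10-523rtf", c(letters, "-"))     # "abdef-rtf"
```

Because the allowed set is an ordinary character vector, characters that are special inside a bracket expression (such as "]", "^" or "-") need no escaping here.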
Thanks for all the helpful replies to my question about string handling. I will try to be more careful in making general comments about R, about which I still have much to learn. The string handling I need to do is not that sophisticated (somewhat contradicting my previous message) and can be done in Fortran 95 with "convenience" intrinsic functions such as INDEX, TRIM, ADJUSTL, SCAN and VERIFY. Figuring out the R equivalents using the functions cited in the replies will be a good exercise for me.
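For readers attempting the same exercise, here is one possible set of rough R analogues for the Fortran 95 intrinsics named above, built from regexpr() and sub(). These are approximations under stated assumptions rather than exact equivalents: R returns -1 where Fortran returns 0 for "not found", and the ADJUSTL line simply drops leading blanks instead of moving them to the end of a fixed-length string. The variable s is an illustrative example value.

```r
# Rough R analogues of some Fortran 95 string intrinsics (illustrative only).
s <- "  ab03def10  "

# INDEX(s, "def"): position of a substring (R gives -1, Fortran 0, if absent)
as.integer(regexpr("def", s, fixed = TRUE))   # 7

# TRIM(s): drop trailing blanks
sub(" +$", "", s)                             # "  ab03def10"

# ADJUSTL(s): Fortran shifts leading blanks to the end; here they are dropped
sub("^ +", "", s)                             # "ab03def10  "

# SCAN(s, "0123456789"): position of the first character that IS in the set
as.integer(regexpr("[0-9]", s))               # 5

# VERIFY("123a5", "0123456789"): first character that is NOT in the set
as.integer(regexpr("[^0-9]", "123a5"))        # 4
```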