Dennis Fisher
2011-May-26 22:05 UTC
[R] Applying "toupper" to only portions of text strings
Colleagues,

Assume that I have a vector containing some text strings, some of which contain a particular character. I would like to apply "toupper" to the text before the character. For example (in this case, "|" is the particular character):

ORIGINAL:
    TEXT <- c("aaaa", "bbb|cc", "|ddd")

AFTER APPLICATION OF toupper:
    TEXT <- c("AAAA", "BBB|cc", "|dddd")

I could loop through each element, strsplit at the character, apply toupper to the first component, then paste each element together. But I hope that there is a simpler means to accomplish this.

Thanks in advance.

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com
Cute!

I don't think your proposed strategy is all that complicated, but see the gsubfn package for a one-liner. In particular, check out http://code.google.com/p/gsubfn/, where there is an example for almost exactly your task.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
peter dalgaard
2011-May-26 22:23 UTC
[R] Applying "toupper" to only portions of text strings
On May 27, 2011, at 00:05 , Dennis Fisher wrote:

> ORIGINAL:
> TEXT <- c("aaaa", "bbb|cc", "|ddd")
>
> AFTER APPLICATION OF toupper:
> TEXT <- c("AAAA", "BBB|cc", "|dddd")

How are you going to get that extra d in there? ;-)

> I could loop through each element, strsplit at the character, apply toupper to the first component, then paste each element together. But, I hope that there is a simpler means to accomplish this.

No, I think that is pretty much the plan. It's a one-liner, though:

    > sapply(strsplit(TEXT, "|", fixed=TRUE), function(x) paste(c(toupper(x[1]), x[-1]), collapse="|"))
    [1] "AAAA" "BBB|cc" "|ddd"

OK, a _long_ one-liner...

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
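One thing to watch with the split-and-paste plan: strsplit drops an empty final piece, so a string that ends in the separator (say "abc|") would come back without its trailing "|". A position-based variant avoids that. The following is only a sketch; the helper name upper_before and the extra test string "abc|" are made up for illustration and are not from the thread.

```r
# Hypothetical helper (not from the thread): uppercase everything before
# the first occurrence of `sep`, leaving the rest of each string untouched.
upper_before <- function(x, sep = "|") {
  # Position of the first separator in each string; -1 means "not found".
  pos <- as.integer(regexpr(sep, x, fixed = TRUE))
  # Number of leading characters to uppercase: the whole string when there
  # is no separator, otherwise everything before the separator.
  n_head <- ifelse(pos == -1L, nchar(x), pos - 1L)
  paste0(toupper(substr(x, 1L, n_head)),
         substr(x, n_head + 1L, nchar(x)))
}

# "abc|" is an extra, assumed test case with a trailing separator.
TEXT <- c("aaaa", "bbb|cc", "|ddd", "abc|")
result_substr <- upper_before(TEXT)
print(result_substr)
# [1] "AAAA"   "BBB|cc" "|ddd"   "ABC|"
```

Because regexpr, substr and toupper are all vectorized, no explicit loop or sapply is needed here.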
Phil Spector
2011-May-26 22:24 UTC
[R] Applying "toupper" to only portions of text strings
Dennis -

Here's one way, using a somewhat obscure feature of perl regular expressions, i.e. the \U and \L escape sequences, which change the case of the replacement text that follows them:

    > TEXT <- c("aaaa", "bbb|cc", "|ddd")
    > sub('([a-z]*)(\\|?)([a-z]*)', '\\U\\1\\2\\L\\3', TEXT, perl=TRUE)
    [1] "AAAA" "BBB|cc" "|ddd"

- Phil Spector
  Statistical Computing Facility
  Department of Statistics
  UC Berkeley
  spector at stat.berkeley.edu
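The pattern above only matches runs of lowercase letters, so text containing digits, spaces or capitals would need a different character class. A possible variant, offered as a sketch rather than a drop-in replacement, captures everything before the first "|" with a negated class and applies \U to that one group. The fourth test string is an assumption added for illustration.

```r
# "a b2|Cd" is an extra, assumed test case with a space, a digit and a capital.
TEXT <- c("aaaa", "bbb|cc", "|ddd", "a b2|Cd")

# ^([^|]*) captures everything from the start up to (not including) the
# first "|"; inside a character class "|" needs no escaping.
# \U uppercases the rest of the replacement, which here is just group 1.
# Text after the match is not part of the replacement, so it is unchanged.
res_perl <- sub("^([^|]*)", "\\U\\1", TEXT, perl = TRUE)
print(res_perl)
# [1] "AAAA"    "BBB|cc"  "|ddd"    "A B2|Cd"
```

The \U escape is only recognised in the replacement when perl = TRUE is set.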
Gabor Grothendieck
2011-May-26 23:30 UTC
[R] Applying "toupper" to only portions of text strings
On Thu, May 26, 2011 at 6:05 PM, Dennis Fisher <fisher at plessthan.com> wrote:

> I could loop through each element, strsplit at the character, apply toupper to the first component, then paste each element together. But, I hope that there is a simpler means to accomplish this.

Try this:

    > library(gsubfn)
    > gsubfn("^[^|]+", toupper, TEXT)
    [1] "AAAA" "BBB|cc" "|dddd"

The regular expression starts from the beginning of the string (i.e. ^ ) and includes any immediately following characters that are not | (i.e. [^|]+ ).

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
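For anyone who prefers not to load an extra package, the same regular expression can be combined with base R's regexpr and the replacement form of regmatches. This is a rough base-R analogue of the gsubfn call, not the gsubfn implementation; the variable names are chosen for illustration.

```r
TEXT <- c("aaaa", "bbb|cc", "|ddd")

# Locate the leading run of non-"|" characters in each string.
# Strings that start with "|" have no match (regexpr returns -1 for them).
m <- regexpr("^[^|]+", TEXT)

# Work on a copy so the original vector is left alone.
res_regm <- TEXT

# regmatches() extracts only the matched pieces; the replacement form
# writes the uppercased pieces back and skips the unmatched strings.
regmatches(res_regm, m) <- toupper(regmatches(res_regm, m))
print(res_regm)
# [1] "AAAA"   "BBB|cc" "|ddd"
```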