Hi,

Here's my problem... I have a data frame with three columns containing strings. The first column is a single character. I want to get the index of that character in the second column and use it to extract the item from the third column. I can do this using a scalar method, but I'm not finding a vector method. An example is below.

col1   col2     col3
'L'    'MAIL '  'PLOY'

What I want to do with the above is find the index of col1 in col2 (4) and then use it to extract the character from col3 ('Y'). I could do the last part if I could get the index in a vector fashion.

So, the shorter question is, how do I get the index of the letter in col1 as it is found in col2?
Hi:

I think you want regexpr, so the code below does what you want, but it doesn't handle the case when L isn't in the second column. I'm still trying to figure that out, but don't count on it. Hopefully someone else will reply with that piece.

DF <- data.frame(col1="L",col2="MAIL",col3="PLOY")
print(DF)

# position of col1 within col2 (-1 if it is not found)
index <- regexpr(DF$col1,DF$col2)
# character of col3 at that position
result <- substr(DF$col3,index,index)

On Wed, Aug 20, 2008 at 3:26 PM, John Christie wrote:

> So, the shorter question is, how do I get the index of the letter in
> col1 as it is found in col2?
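The missing piece mentioned above (the character not occurring in col2) could be handled along the following lines. This is only a sketch for the one-row case: the helper name `lookup_one` is hypothetical, returning NA for "not found" is an assumption about what is wanted, and `fixed = TRUE` is added so that col1 is matched as a literal character rather than as a regular expression.

```r
# Hypothetical helper (not from the thread): one-row lookup that yields NA
# when the character in col1 does not occur in col2.
lookup_one <- function(DF) {
  # fixed = TRUE matches col1 literally instead of as a regular expression;
  # as.integer() drops the extra attributes that regexpr() attaches.
  index <- as.integer(regexpr(DF$col1, DF$col2, fixed = TRUE))
  if (index == -1) NA_character_ else substr(DF$col3, index, index)
}

found <- lookup_one(data.frame(col1 = "L", col2 = "MAIL", col3 = "PLOY",
                               stringsAsFactors = FALSE))
not_found <- lookup_one(data.frame(col1 = "Y", col2 = "MAIL", col3 = "PLOY",
                                   stringsAsFactors = FALSE))
print(found)      # [1] "Y"
print(not_found)  # [1] NA
```

Like the code it follows, this works only when the data frame has a single row, because regexpr uses just one pattern per call.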
On Wed, 20 Aug 2008, John Christie wrote:

> So, the shorter question is, how do I get the index of the letter in col1 as
> it is found in col2?

Let me count the ways...

On second thought, let someone else count the ways. But here is one:

## suppose 'df' is your data.frame
## split every string into a vector of single characters
a.list <- lapply( df, function(x) strsplit(as.character(x), "") )
## for each row, pick the characters of col3 where col2 equals col1
with(a.list, mapply( function(x,y,z) z[x==y], col1, col2, col3 ) )

This will return all matches in each row. You can use 'match(x,y,0)' in place of 'x==y' to get just the first one.

And if you KNOW a match in each row exists and is unique, this will work:

with(a.list, do.call(rbind,col3)[ do.call(rbind,col2) == col1 ] )

but I would not trust it.

HTH,

Chuck

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
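To make the difference between 'x==y' and 'match(x,y,0)' concrete, here is a small worked example of the approach above. The data frame is made up for illustration (it is not from the thread), and `SIMPLIFY = FALSE` and `USE.NAMES = FALSE` are added only so that the result is always a plain list, whatever the number of matches per row.

```r
# Made-up data: row 2 has two matches, row 3 has none.
df <- data.frame(col1 = c("L", "T", "Q"),
                 col2 = c("MAIL", "TOTE", "KITE"),
                 col3 = c("PLOY", "SIXA", "ABCD"),
                 stringsAsFactors = FALSE)

# Split every string into a vector of single characters.
a.list <- lapply(df, function(x) strsplit(as.character(x), ""))

# All characters of col3 at positions where col2 equals col1.
all_hits <- with(a.list, mapply(function(x, y, z) z[x == y],
                                col1, col2, col3,
                                SIMPLIFY = FALSE, USE.NAMES = FALSE))

# Only the first such character; match() returns 0 when nothing matches,
# and indexing with 0 gives an empty character vector.
first_hit <- with(a.list, mapply(function(x, y, z) z[match(x, y, 0)],
                                 col1, col2, col3,
                                 SIMPLIFY = FALSE, USE.NAMES = FALSE))

print(all_hits)
print(first_hit)
```

Here row 2 yields "S" and "X" with 'x==y' but only "S" with 'match', and row 3 yields an empty character vector either way.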
Hi John:

I didn't realize that that was your problem. You can make it work for any number of rows by putting it in lapply, as below. I'm sorry for the misunderstanding. I'll send this to the list also, since I guess my last solution was kind of bad now that I understand what you want.

DF <- data.frame(col1=c("L","T"),col2=c("MAIL","KITE"),col3=c("PLOY","SIX"))
print(DF)

newcol <- lapply(1:nrow(DF), function(.row) {
  result <- NULL
  # position of this row's col1 within its col2 (-1 if not found)
  idx <- regexpr(DF[.row,1],DF[.row,2])
  if ( idx != -1 ) result <- substr(DF[.row,3],idx,idx)
  result
})
print(newcol)

# BELOW IS FOR IF YOU ONLY WANT TO KEEP THE ONES THAT WERE FOUND
# AND NOT THE NULLS
newcol <- newcol[!sapply(newcol,is.null)]
print(newcol)

On Thu, Aug 21, 2008 at 12:25 AM, John Christie wrote:

> The problem with the grep family of commands is that they either test
> a string against a list of strings or test a list of strings against a
> string. But they cannot do both simultaneously. Your example only
> works if there is only one row.
>
> On Aug 21, 2008, at 12:30 AM, markleeds at verizon.net wrote:
>
>> John: Below takes care of when L is not there but it's too ugly so
>> I'm not even going to send this to the list. There should be a
>> better way of doing it but I'm still learning ( I guess one can
>> consider me a senior newbie !!! ) also so I don't know it. Good luck.
>>
>> DF <- data.frame(col1="Y",col2="MAIL",col3="PLOY")
>> result <- NULL
>> if ( regexpr(DF$col1,DF$col2) != -1 ) result <- substr(DF$col3,regexpr(DF$col1,DF$col2),regexpr(DF$col1,DF$col2))
>> print(result)
>>
>> On Wed, Aug 20, 2008 at 11:21 PM, markleeds at verizon.net wrote:
>>
>>> Hi: I think you want regexpr so below does what you want but it
>>> doesn't handle the case when L isn't in the second column. I'm
>>> still trying to figure that out but don't count on it. Hopefully
>>> someone else will reply with that piece.
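The row-by-row idea in this message can also be written so that it returns a plain character vector, which is easier to store as a new column than a list with NULLs. The following is a minimal sketch under some assumptions not stated in the thread: the function name `extract_at_match` and its argument names are hypothetical, NA is chosen to mark rows where the character is not found, and `fixed = TRUE` is used so that col1 is matched literally.

```r
# Hypothetical helper: for each row, find the position of `key` in `haystack`
# and return the character of `target` at that position (NA if not found).
extract_at_match <- function(key, haystack, target) {
  # regexpr() takes one pattern at a time, so pair the rows up with mapply();
  # as.integer() drops the attributes that regexpr() attaches.
  pos <- mapply(function(k, h) as.integer(regexpr(k, h, fixed = TRUE)),
                key, haystack, USE.NAMES = FALSE)
  # substr() is vectorized over start and stop, so no further loop is needed.
  out <- substr(target, pos, pos)
  out[pos < 1] <- NA_character_
  out
}

DF <- data.frame(col1 = c("L", "T", "Q"),
                 col2 = c("MAIL", "KITE", "MAIL"),
                 col3 = c("PLOY", "SIX", "ABCD"),
                 stringsAsFactors = FALSE)
DF$newcol <- extract_at_match(DF$col1, DF$col2, DF$col3)
print(DF)
```

One caveat with this sketch: if the match position is beyond the end of the col3 string, substr returns an empty string rather than NA.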