thr3ads.net - R help - [R] vector operation using regexpr? [Aug 2008]


John Christie

2008-Aug-20 19:26 UTC

[R] vector operation using regexpr?

Hi,

Here's my problem... I have a data frame with three columns containing
strings.  The first column is a single character.  I want to get the
index of that character in the second column and use it to extract the
character at that position from the third column.  I can do this using
a scalar method, but I'm not finding a vector method.  An example is below.

col1      col2      col3
'L'       'MAIL'    'PLOY'

What I want to do with the above is find the index of col1 in col2 (4)  
and then use it to extract the character from col3 ('Y').  I could do  
the last part if I could get the index in a vector fashion.

So, the shorter question is, how do I get the index of the letter in  
col1 as it is found in col2?

markleeds at verizon.net

2008-Aug-21 03:21 UTC

[R] vector operation using regexpr?

Hi: I think you want regexpr. The code below does what you want, but it
doesn't handle the case when L isn't in the second column. I'm still trying
to figure that out, but don't count on it. Hopefully someone else will reply
with that piece.

DF <- data.frame(col1="L",col2="MAIL",col3="PLOY")
print(DF)
# position of col1 within col2 (-1 if it does not occur)
index <- regexpr(DF$col1,DF$col2)
# character at that position in col3
result <- substr(DF$col3,index,index)
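
One caveat with the snippet above: regexpr() is vectorised over its text argument but uses only the first element of its pattern argument, so it gives the right answer for a single row only. Here is a minimal sketch of a row-wise wrapper. It assumes the columns are character vectors (hence stringsAsFactors = FALSE) and that col1 should be matched literally (hence fixed = TRUE). The second row is borrowed from later in the thread.

```r
# Two-row example; columns kept as character vectors.
DF <- data.frame(col1 = c("L", "T"),
                 col2 = c("MAIL", "KITE"),
                 col3 = c("PLOY", "SIX"),
                 stringsAsFactors = FALSE)

# Pair each pattern with its own text: one regexpr() call per row.
# fixed = TRUE treats the col1 character literally, not as a regex.
index <- mapply(function(p, s) as.integer(regexpr(p, s, fixed = TRUE)),
                DF$col1, DF$col2, USE.NAMES = FALSE)

# substr() is vectorised over start and stop, so it takes whole columns.
result <- substr(DF$col3, index, index)

print(index)
print(result)
```

Rows without a match get an index of -1, for which substr() returns an empty string, so those rows may need separate handling.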



> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Charles C. Berry

2008-Aug-21 04:09 UTC

[R] vector operation using regexpr?

On Wed, 20 Aug 2008, John Christie wrote:
> So, the shorter question is, how do I get the index of the letter in col1 as
> it is found in col2?

Let me count the ways... On second thought, let someone else count the 
ways. But here is one

    ## suppose 'df' is your data.frame
    a.list <- lapply( df, function(x) strsplit(as.character(x), "") )
    with(a.list, mapply( function(x,y,z) z[x==y], col1, col2, col3 ) )


This will return all matches in each row. You can use 'match(x,y,0)' in 
place of 'x==y' to get just the first one.
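
To make the difference between the two variants concrete, here is a small sketch run on made-up data (the BANANA/ORANGE and KITE/SIXES rows are illustrative, not from the original post). SIMPLIFY = FALSE is an addition so that both calls always return a list, whatever the number of matches per row.

```r
# Made-up rows: one unique match, one repeated match, one with no match.
DF <- data.frame(col1 = c("L", "A", "Z"),
                 col2 = c("MAIL", "BANANA", "KITE"),
                 col3 = c("PLOY", "ORANGE", "SIXES"),
                 stringsAsFactors = FALSE)

# Split every string into a vector of single characters.
a.list <- lapply(DF, function(x) strsplit(as.character(x), ""))

# x == y keeps every matching position in the row.
all.hits <- with(a.list, mapply(function(x, y, z) z[x == y],
                                col1, col2, col3, SIMPLIFY = FALSE))

# match(x, y, 0) keeps only the first position; it is 0 when there is
# no match, and indexing with 0 gives an empty result.
first.hits <- with(a.list, mapply(function(x, y, z) z[match(x, y, 0)],
                                  col1, col2, col3, SIMPLIFY = FALSE))

print(all.hits)
print(first.hits)
```

Rows with no match come back as character(0) in both versions, so the result stays aligned with the rows of the data frame.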


And if you KNOW a match in each row exists and is unique, this will work:

 	with(a.list, do.call(rbind,col3)[ do.call(rbind,col2) == col1 ] )

but I would not trust it.
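
One possible reason for the distrust, sketched on made-up data: logical indexing of a matrix walks it column by column, so the results need not come back in row order. The rbind step also assumes every string in a column has the same length. The row "K"/"KITE"/"SIXT" is invented for illustration, and unlist() is used on col1 here to make the comparison explicit.

```r
DF <- data.frame(col1 = c("L", "K"),
                 col2 = c("MAIL", "KITE"),
                 col3 = c("PLOY", "SIXT"),
                 stringsAsFactors = FALSE)
a.list <- lapply(DF, function(x) strsplit(as.character(x), ""))

col2.mat <- do.call(rbind, a.list$col2)   # 2 x 4 matrix of single characters
col3.mat <- do.call(rbind, a.list$col3)

# col1 is recycled down the rows, so row i is compared with col1[i].
hit <- col2.mat == unlist(a.list$col1)

# Extraction in column-major order: row 2's match (column 1) comes first.
by.column <- col3.mat[hit]

# Transposing both matrices gives row order instead.
by.row <- t(col3.mat)[t(hit)]

print(by.column)
print(by.row)
```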

HTH,

Chuck
>
Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

markleeds at verizon.net

2008-Aug-21 05:09 UTC

[R] vector operation using regexpr?

Hi John: I didn't realize that was your problem. You can make it
work for any number of rows by putting it in lapply as below.
I'm sorry for the misunderstanding. I'll send this to the list as well,
since my last solution was kind of bad now that I understand what you
want.

DF <- data.frame(col1=c("L","T"),col2=c("MAIL","KITE"),col3=c("PLOY","SIX"))
print(DF)

newcol <- lapply(1:nrow(DF), function(.row) {
   # position of the col1 character within col2, or -1 if it is absent
   index <- regexpr(DF[.row,1],DF[.row,2])
   result <- NULL
   if ( index != -1 ) result <- substr(DF[.row,3],index,index)
   result
})

print(newcol)

# BELOW IS FOR IF YOU ONLY WANT TO KEEP THE ONES THAT WERE FOUND
# AND NOT THE NULLS
newcol <- newcol[!sapply(newcol,is.null)]
print(newcol)
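
If the result needs to stay aligned with the rows of DF (for example, to store it as a new column), dropping the NULLs loses that alignment. A hedged alternative sketch returns NA for rows without a match instead. The third row ("Q" in "MAIL") is made up to show the no-match case, and fixed = TRUE and stringsAsFactors = FALSE are assumptions added here.

```r
DF <- data.frame(col1 = c("L", "T", "Q"),
                 col2 = c("MAIL", "KITE", "MAIL"),
                 col3 = c("PLOY", "SIX", "PLOY"),
                 stringsAsFactors = FALSE)

# One character (or NA) per row, so the result can be added as a column.
newcol <- vapply(seq_len(nrow(DF)), function(i) {
  index <- regexpr(DF$col1[i], DF$col2[i], fixed = TRUE)
  # regexpr() returns -1 when there is no match
  if (index == -1) NA_character_ else substr(DF$col3[i], index, index)
}, character(1))

DF$newcol <- newcol
print(DF)
```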





On Thu, Aug 21, 2008 at 12:25 AM, John Christie wrote:
> The problem with the grep family of commands is that they either test 
> a string against a list of strings or test a list of strings against a 
> string.  But they cannot do both simultaneously.  Your example only 
> works if there is only one row.
>
> On Aug 21, 2008, at 12:30 AM, markleeds at verizon.net wrote:
>
>> John: Below takes care of when L is not there but it's too ugly so
>> I'm not even going to send this to the list. There should be a
>> better way of doing it but I'm still learning ( I guess one can
>> consider me a senior newbie !!! ) also so I don't know it. Good luck.
>>
>> DF <- data.frame(col1="Y",col2="MAIL",col3="PLOY")
>> result <- NULL
>> if ( regexpr(DF$col1,DF$col2) != -1 ) result <-
>>   substr(DF$col3,regexpr(DF$col1,DF$col2),regexpr(DF$col1,DF$col2))
>> print(result)
