Jake,
You can use the plyr library or some form of apply. If you are on a 64-bit
system you can multithread and it goes much faster.
Something like this (for 32-bit):
library(plyr)

df1 <- data.frame(Taxa = c('blue', 'red', NA, 'blue',
                           'red', NA, 'blue', 'red', NA))
df2 <- data.frame(Taxa = c('blue', 'red', NA),
                  Class = c('Z', 'HI', 'A'))

# function to do the lookup; every Taxa value within a group is identical,
# so only the first one is passed to grep as the pattern
find.class <- function(x) df2[grep(as.character(x[1]), df2$Taxa), 'Class']

ddply(.data = df1,
      .variables = 'Taxa',
      .fun = transform,
      Class = find.class(Taxa))
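
If you would rather stay in base R, here is a rough sketch of the apply route. It is only an illustration on the toy data, not something tuned for your real data set. The helper name lookup.class is made up for this example, fixed = TRUE is my assumption that the names should be matched literally rather than as regular expressions, and only the first matching row of df2 is kept. The idea is to call grep once per unique taxon instead of once per row, then map the results back with match.

```r
# Toy data, same shape as in the example above (strings kept as character).
df1 <- data.frame(Taxa = c('blue', 'red', NA, 'blue', 'red', NA),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Taxa = c('blue', 'red', NA),
                  Class = c('Z', 'HI', 'A'),
                  stringsAsFactors = FALSE)

# Hypothetical helper: Class of the first df2 row whose Taxa contains the
# pattern; NA when the pattern is missing or nothing matches.
lookup.class <- function(pattern) {
  if (is.na(pattern)) return(NA_character_)
  hit <- grep(pattern, df2$Taxa, fixed = TRUE)  # assumption: literal match
  if (length(hit) == 0) NA_character_ else df2$Class[hit[1]]
}

# grep runs once per unique taxon, not once per row of df1.
taxa <- unique(df1$Taxa)
classes <- vapply(taxa, lookup.class, character(1), USE.NAMES = FALSE)

# Map the per-taxon result back onto every row of df1.
df1$Class <- classes[match(df1$Taxa, taxa)]
df1
```

With many repeated species names, most of the saving comes from the unique() step, since the number of grep calls drops to the number of distinct names.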
Joel
From: Beaulieu, Jake
Sent: Thursday, September 12, 2013 12:06 PM
To: r-help@r-project.org
Cc: Wahman, David; Farrar, David; Allen, Joel; Green, Hyatt; McManus, Michael
Subject: grep(pattern = each element of a vector) ?
Hi,
I have a large dataframe that contains species names. I have a second dataframe
that contains species names and some additional info, called 'Class',
about each species. I would like to match the species names in the first data frame
with the 'Class' information contained in the second. Since the species
names are often formatted differently between the data sets, merge doesn't
work well. grep does the trick, but the function needs to be called separately
for each observation in the first data frame. I put grep into a loop, but this
is too slow. Is there a way to run grep repeatedly without resorting to a loop?
Possibly something in the apply family?
df1 <- data.frame(Taxa = c('blue', 'red', NA))
df2 <- data.frame(Taxa = c('blue', 'red', NA),
                  Class = c('Z', 'HI', 'A'))
index <- NULL
for (i in seq_along(df1$Taxa)) {
  # look up the i-th name; [1] keeps a single value (NA if there is no match)
  index[i] <- grep(df1$Taxa[i], df2$Taxa)[1]
}
index
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)
=================================
Jake J. Beaulieu, PhD
US Environmental Protection Agency
National Risk Management Research Lab
26 W. Martin Luther King Drive
Cincinnati, OH 45268
USA
513-569-7842 (desk)
513-487-2511 (fax)
beaulieu.jake@epa.gov