R-help:
I have a variable ("ID_list") containing about 1800 unique numbers, and a
143066x29 data frame. One of the columns ("ID") in my data frame contains a
list of ids, many of which appear more than once. I'd like to find the
subset of my data frame for which "ID" matches one of the numbers in
"ID_list." I'm pretty sure I could write a function to do this--something
like:
dataSubset<-function(df, id_list){
  tmp <- data.frame()
  for(i in id_list){
    for(j in seq_len(nrow(df))){
      if(i==df$ID[j]){
        # append the matching row; plain assignment would keep only the last match
        tmp <- rbind(tmp, df[j,])
      }
    }
  }
  tmp
}
but this seems inefficient. As I understand it, the subset function won't
really solve my problem, but it seems like there must be something out there
that will that I must be forgetting. Does anyone know of a way to solve this
problem in an efficient way? Thanks!
Kyle H. Ambert
Graduate Student, Department of Medical Informatics & Clinical Epidemiology
Oregon Health & Science University
ambertk@ohsu.edu
I don't know if I understand (a small example with R commands would help), but, assuming your data frame is called 'df':

subset(df, ID %in% ID_list)

Question: is ID_list a "list" or a vector, and are they really "numbers" or "factors"?
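For illustration, here is a minimal sketch of the %in% approach on a toy data frame. The values, the "value" column, and the helper names (keep, by_index, by_subset, from_list, from_factor) are made up for the example and are not from the original post; the real data would be the 143066x29 data frame and the roughly 1800-element ID_list. It also touches on the list-versus-vector and number-versus-factor question, under the assumption that those are the cases of interest.

```r
# Toy stand-ins for the poster's data; the values are invented for illustration.
df <- data.frame(
  ID    = c(101, 102, 103, 101, 104, 102),
  value = c("a", "b", "c", "d", "e", "f")
)
ID_list <- c(101, 104)

# Vectorized membership test: one logical value per row of df.
keep <- df$ID %in% ID_list

# Both forms select the same rows, in their original order.
by_index  <- df[keep, ]
by_subset <- subset(df, ID %in% ID_list)

# If ID_list were a list rather than a vector, unlist() flattens it first.
from_list <- df[df$ID %in% unlist(list(101, 104)), ]

# If ID were stored as a factor, converting both sides to character
# makes the comparison on the labels explicit.
df_factor   <- transform(df, ID = factor(ID))
from_factor <- df_factor[as.character(df_factor$ID) %in% as.character(ID_list), ]
```

Because %in% is vectorized, this avoids the nested loop over every row and every id, and repeated ids in the data frame are all retained.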