R-help:

I have a variable ("ID_list") containing about 1800 unique numbers, and a 143066x29 data frame. One of the columns ("ID") in my data frame contains a list of ids, many of which appear more than once. I'd like to find the subset of my data frame for which "ID" matches one of the numbers in "ID_list." I'm pretty sure I could write a function to do this--something like:

    dataSubset <- function(df, id_list) {
      tmp <- data.frame()
      for (i in id_list) {
        for (j in 1:dim(df)[1]) {
          if (i == df$ID[j]) {
            # append the matching row to the result
            tmp <- rbind(tmp, df[j, ])
          }
        }
      }
      tmp
    }

but this seems inefficient. As I understand it, the subset function won't really solve my problem, but it seems like there must be something out there that will, and that I must be forgetting. Does anyone know of an efficient way to solve this problem? Thanks!

Kyle H. Ambert
Graduate Student, Department of Medical Informatics & Clinical Epidemiology
Oregon Health & Science University
ambertk@ohsu.edu
I don't know if I understand (a small example with R commands would help), but, assuming your data.frame is called 'df':

    subset(df, ID %in% ID_list)

Question: is ID_list a "list" or a vector, and are the IDs really "numbers" or "factors"?
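To make the `%in%` suggestion concrete, here is a minimal sketch on made-up toy data (the values, the `value` column, and helper names such as `keep`, `ID_vec`, and `df_f` are assumptions for illustration, not taken from the original post). It also shows one way to handle the two cases raised in the question above: ID_list being a true list, and ID being stored as a factor.

```r
# Toy data standing in for the real 143066 x 29 data frame (values are made up).
df <- data.frame(ID    = c(101, 102, 103, 102, 104, 101),
                 value = c("a", "b", "c", "d", "e", "f"))
ID_list <- c(102, 104, 999)   # 999 deliberately matches nothing

# Vectorised membership test: one TRUE/FALSE per row of df, no explicit loops.
keep <- df$ID %in% ID_list

# Two equivalent ways to pull out the matching rows.
by_index  <- df[keep, ]
by_subset <- subset(df, ID %in% ID_list)

# If ID_list is a true list rather than a vector, flatten it first.
ID_vec <- unlist(list(102, 104, 999))

# If ID is stored as a factor, compare on the labels rather than relying
# on the underlying integer codes.
df_f <- transform(df, ID = factor(ID))
by_factor <- df_f[as.character(df_f$ID) %in% as.character(ID_vec), ]
```

Both `by_index` and `by_subset` keep every row whose ID appears in ID_list, including repeated IDs, which is what the nested loop was trying to collect one row at a time.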