Input: a dataframe with 300+ columns for a regression. It consists of sets of
factors whose names share the same structure. For example, aaa1, aaa2, aaa3
could be one set of factors.
After reading in the dataframe, I would like to compute the density (the
percentage of nonzero values) for certain groups of factors and delete the
factors that fall below a density threshold. I would like to use regular
expressions to specify the factor names.
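density.factor = c("^aaa", "^bbb")
density.faccol = c()
# collect the indices of all columns whose names match any of the patterns
for (fac in density.factor) {
  density.faccol = c(density.faccol, grep(fac, names(data.df)))
}
# the density check is omitted here; this drops every matched column
# (guarded, because indexing with -integer(0) would drop all columns)
if (length(density.faccol) > 0) data.df = data.df[, -density.faccol, drop = FALSE]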
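To make the goal concrete, here is a minimal sketch of the density step on a toy data frame. The column names, the values, and the 0.5 threshold are all made up for illustration, and "density" is assumed to mean the fraction of nonzero entries in a column.

```r
# Toy data frame standing in for data.df (hypothetical names and values)
data.df <- data.frame(
  aaa1 = c(0, 0, 0, 1),
  aaa2 = c(1, 2, 0, 3),
  bbb1 = c(0, 0, 0, 0),
  y    = c(1.5, 2.0, 0.5, 3.0)
)

density.factor <- c("^aaa", "^bbb")
threshold <- 0.5  # assumed cutoff: keep factors with at least 50% nonzero values

# Collect the indices of all columns matching any of the patterns
density.faccol <- c()
for (fac in density.factor) {
  density.faccol <- c(density.faccol, grep(fac, names(data.df)))
}

# Density = fraction of nonzero entries in each matched column
dens <- colMeans(data.df[, density.faccol, drop = FALSE] != 0)

# Matched columns whose density falls below the threshold
sparse.col <- density.faccol[dens < threshold]

# Guard against an empty index: data.df[, -integer(0)] would drop every column
if (length(sparse.col) > 0) {
  data.df <- data.df[, -sparse.col, drop = FALSE]
}
```

With this toy input, aaa1 (density 0.25) and bbb1 (density 0) are removed, while aaa2 and the unmatched column y are kept.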
Is there a way to avoid the for loop? The following seems to work:
lapply(density.factor,grep,names(data.df))
However, that produces a list of integer vectors (one per pattern) which need
to be merged. Note that in the above example, since we have 2 regular
expressions, there will be two vectors in the list, but in the general case
there will be many more.
Questions: (i) how do I merge these vectors into a single vector, and (ii) is
there a better way to achieve the "vectorized" grep?
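Two possibilities, sketched here on a hypothetical vector of column names rather than the real data.df: flattening the lapply result with unlist, or joining the patterns with "|" so that one grep call matches any of them. The second assumes the patterns are plain regular expressions that can safely be combined by alternation (no backreferences, for instance).

```r
# Hypothetical column names standing in for names(data.df)
col.names <- c("aaa1", "aaa2", "bbb1", "ccc1", "y")
density.factor <- c("^aaa", "^bbb")

# (i) flatten the list returned by lapply into one integer vector;
#     unique() removes indices matched by more than one pattern
idx.list <- unique(unlist(lapply(density.factor, grep, col.names)))

# (ii) join the patterns with "|" and call grep once
idx.alt <- grep(paste(density.factor, collapse = "|"), col.names)
```

Both give the indices of aaa1, aaa2 and bbb1. The single-pattern version returns them in column order, whereas the unlist version returns them grouped by pattern.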
Thanks.