I just need to confirm something with the pattern-matching folks. I have a factor with the following levels in a very large data set:

> levels(all$Classical.Statistic)
 [1] ""                        "AB;ABD"                  "CollapsedSteps"          "CR_P"                    "CR_Prop;CR_P;AB"
 [6] "NMK"                     "NMK;P"                   "NMK;P;ABD"               "P"                       "ABD"
[11] "CR_P;CollapsedSteps"     "NMK;AB;ABD"              "NMK;ABD"                 "NMK;P;AB"                "NMK;P;AB;ABD"
[16] "AB"                      "CRT;CollapsedSteps"      "NMK;AB"                  "CR_P;CRT;CollapsedSteps" "CR_Prop;CR_P"

I need to subset the rows in which the term "CollapsedSteps" appears. It may appear on its own as "CollapsedSteps" or as part of a longer level such as "CR_P;CRT;CollapsedSteps", as you can see above. I'm using grep as follows:

all[grep('CollapsedSteps', all$Classical.Statistic),]

to find any row in which the term "CollapsedSteps" appears. Is this certain to catch all cases, or is there an intricacy that I may have missed?

Thank you,
Harold

> sessionInfo()
R version 2.10.1 (2009-12-14)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] gdata_2.8.0

loaded via a namespace (and not attached):
[1] gtools_2.6.2 tools_2.10.1
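A minimal sketch of what the grep() call does on a factor, assuming a toy data frame rebuilt from the printed levels (one row per level) in place of the real `all`. grep() coerces its input with as.character(), so it matches the labels rather than the integer codes, and it returns the row indices of every substring match; `fixed = TRUE` treats the pattern literally. The one intricacy worth guarding against is a level that contains "CollapsedSteps" inside a longer token (e.g. a hypothetical "NotCollapsedSteps"); the helper `has_token` below is a hypothetical name for a stricter match on whole ";"-separated tokens.

```r
# Toy stand-in (assumption) for the real data set: one row per printed level.
lev <- c("", "AB;ABD", "CollapsedSteps", "CR_P", "CR_Prop;CR_P;AB",
         "NMK", "NMK;P", "NMK;P;ABD", "P", "ABD",
         "CR_P;CollapsedSteps", "NMK;AB;ABD", "NMK;ABD", "NMK;P;AB",
         "NMK;P;AB;ABD", "AB", "CRT;CollapsedSteps", "NMK;AB",
         "CR_P;CRT;CollapsedSteps", "CR_Prop;CR_P")
all <- data.frame(Classical.Statistic = factor(lev, levels = lev))

# grep() works on the factor's labels and returns matching row indices.
idx <- grep("CollapsedSteps", all$Classical.Statistic, fixed = TRUE)
sub1 <- all[idx, , drop = FALSE]

# Hypothetical stricter helper: match the token only when it is bounded by
# the start/end of the string or by ';' (token assumed to hold no regex
# metacharacters, which is true for "CollapsedSteps").
has_token <- function(x, token) {
  grepl(paste("(^|;)", token, "(;|$)", sep = ""), as.character(x))
}
sub2 <- all[has_token(all$Classical.Statistic, "CollapsedSteps"), , drop = FALSE]

print(as.character(sub1$Classical.Statistic))
```

With the levels shown above both approaches select the same four rows; they would differ only if some label embedded "CollapsedSteps" in a longer token.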
Doran, Harold wrote:
> I need to subset the rows in which the term "CollapsedSteps" appears.
> [...]
>
> all[grep('CollapsedSteps', all$Classical.Statistic),]
>
> to find any row in which the term "CollapsedSteps" appears. Is this
> certain to catch all cases, or is there an intricacy that I may have
> missed?

Well, just try it for yourself on a data.frame that's small enough to verify "manually". For instance, the data.frame that contains each level exactly once sounds like a good candidate:

test <- subset(all, !duplicated(Classical.Statistic))

and then try your line of code on test.

And do you really want "" as a level, or should those be NA?
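The suggested check can be sketched as follows, assuming a small hypothetical stand-in for `all` with repeated levels and some empty strings. Keeping one row per distinct level makes the grep() result easy to inspect by eye. The last part shows one way to recode "" to NA if empty strings really mean "missing": assign NA to those entries, then call factor() again so the unused "" level is dropped.

```r
# Hypothetical stand-in for the real data set: repeated levels plus "" entries.
all <- data.frame(
  Classical.Statistic = factor(c("CollapsedSteps", "", "NMK;P", "CollapsedSteps",
                                 "CR_P;CRT;CollapsedSteps", "", "AB"))
)

# One row per distinct level, small enough to verify manually.
test <- subset(all, !duplicated(Classical.Statistic))
hits <- test[grep("CollapsedSteps", test$Classical.Statistic), , drop = FALSE]
print(hits)

# Recode "" as NA, then rebuild the factor so the empty level disappears.
x <- all$Classical.Statistic
x[x == ""] <- NA
all$Classical.Statistic <- factor(x)
print(levels(all$Classical.Statistic))
```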