Hello R experts,

I need help creating a subset of a data file. I know that with the subset command it is very easy to select several columns or to apply a threshold. My problem is that the data file is big, and I don't want to identify the column numbers or names manually, so I am looking for a way to automate this.

For example, I have a file with about 1500 columns of TRFLP intensity data. The column names look like this:

 [1] "Sample.Name"  "Marker"       "RE"           "Dye"          "Allele.1"     "Size.1"       "Height.1"     "Peak.Area.1"  "Data.Point.1"
[10] "Allele.2"     "Size.2"       "Height.2"     "Peak.Area.2"  "Data.Point.2" "Allele.3"     "Size.3"       "Height.3"     "Peak.Area.3"
[19] "Data.Point.3" "Allele.4"     "Size.4"       "Height.4"     "Peak.Area.4"  "Data.Point.4" "Allele.5"     "Size.5"       "Height.5"
[28] "Peak.Area.5"  "Data.Point.5" "Allele.6"     "Size.6"       "Height.6"     "Peak.Area.6"  "Data.Point.6" "Allele.7"     "Size.7"
[37] "Height.7"     "Peak.Area.7"  "Data.Point.7" "Allele.8"     "Size.8"       "Height.8"     "Peak.Area.8"  "Data.Point.8" "Allele.9"
[46] "Size.9"       "Height.9"     "Peak.Area.9"  "Data.Point.9" "Allele.10"    "Size.10"      "Height.10"    "Peak.Area.10" "Data.Point.10"
.....

Suppose I want to create a subset containing all the columns whose names start with Peak.Area (as with Peak.Area.* in Unix). How can I do that in R?

Thanks a lot for the help.
Best wishes,
Mitra
Hello,

You could try something like the following. The example assumes your data.frame is named 'dat'.

    cnums <- grep("Peak\\.Area", colnames(dat))
    subdat <- dat[cnums]

See ?regexp for the regular expressions used by ?grep.

Hope this helps,
Rui Barradas

On 16-06-2013 08:20, Suparna Mitra wrote:
> Suppose I want to create a subset selecting all the columns with
> name Peak.Area (as in unix Peak.Area.*)
> How can I do that in R?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
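For anyone who wants to try the grep() approach before running it on the real file, here is a minimal sketch. The toy data frame and its values are made up for illustration; only the column-name pattern follows the original post. The pattern is also anchored with ^ and $, which is an optional tightening and not part of the reply above.

```r
# Toy data frame: the values are hypothetical, only the column names
# mirror the naming scheme described in the question.
dat <- data.frame(
  Sample.Name  = c("s1", "s2"),
  Allele.1     = c(1, 2),
  Peak.Area.1  = c(100, 200),
  Data.Point.1 = c(10, 20),
  Allele.2     = c(3, 4),
  Peak.Area.2  = c(300, 400),
  stringsAsFactors = FALSE
)

# "\\." matches a literal dot; "^" and "$" anchor the pattern so that only
# names of the form Peak.Area.<digits> are matched.
cnums  <- grep("^Peak\\.Area\\.[0-9]+$", colnames(dat))
subdat <- dat[cnums]
colnames(subdat)

# To keep the sample identifier next to the peak areas, select by name:
subdat_id <- dat[c("Sample.Name", colnames(dat)[cnums])]
```

Without the anchors, grep("Peak\\.Area", ...) matches any column name that contains "Peak.Area" anywhere, which is sufficient for the column names shown in the question.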
Suppose your data.frame is called x; then try

    x[ which( substr( colnames( x ), 1, 4 ) == "Peak" ) ]

On Sunday 16 June 2013 15:20:37 Suparna Mitra wrote:
> Suppose I want to create a subset selecting all the columns with
> name Peak.Area (as in unix Peak.Area.*)
> How can I do that in R?
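One point worth checking with the substr() approach: comparing only the first four characters selects every column whose name begins with "Peak", not just the Peak.Area ones. With the column names listed in the question that makes no difference, but the sketch below shows what happens if another such column exists. The data frame and the Peak.Height.1 column are hypothetical, added only to illustrate the difference.

```r
# Hypothetical data: Peak.Height.1 does not appear in the original post and
# is included only to show how a short prefix can over-select.
x <- data.frame(
  Sample.Name   = "s1",
  Peak.Area.1   = 100,
  Peak.Height.1 = 5,
  Peak.Area.2   = 300,
  stringsAsFactors = FALSE
)

# Compare against the full prefix; nchar() avoids hard-coding its length.
prefix    <- "Peak.Area."
keep      <- substr(colnames(x), 1, nchar(prefix)) == prefix
peak_area <- x[keep]
colnames(peak_area)

# The 4-character comparison from the reply also picks up Peak.Height.1.
loose <- x[which(substr(colnames(x), 1, 4) == "Peak")]
colnames(loose)
```

A logical vector can index the columns directly, so which() is optional here.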