Hello R experts,

I need help creating a subset of a data file. I know that with the subset command it is very easy to select several columns or apply a threshold. But my data file is big, and I don't want to identify the column numbers or names manually. I am trying to find a way to automate this.

For example, I have a file with about 1500 columns of TRFLP intensity data. The column names look like this:

 [1] "Sample.Name" "Marker" "RE" "Dye" "Allele.1" "Size.1" "Height.1" "Peak.Area.1" "Data.Point.1"
[10] "Allele.2" "Size.2" "Height.2" "Peak.Area.2" "Data.Point.2" "Allele.3" "Size.3" "Height.3" "Peak.Area.3"
[19] "Data.Point.3" "Allele.4" "Size.4" "Height.4" "Peak.Area.4" "Data.Point.4" "Allele.5" "Size.5" "Height.5"
[28] "Peak.Area.5" "Data.Point.5" "Allele.6" "Size.6" "Height.6" "Peak.Area.6" "Data.Point.6" "Allele.7" "Size.7"
[37] "Height.7" "Peak.Area.7" "Data.Point.7" "Allele.8" "Size.8" "Height.8" "Peak.Area.8" "Data.Point.8" "Allele.9"
[46] "Size.9" "Height.9" "Peak.Area.9" "Data.Point.9" "Allele.10" "Size.10" "Height.10" "Peak.Area.10" "Data.Point.10"
.....

Suppose I want to create a subset containing all the columns whose names start with Peak.Area (like Peak.Area.* in Unix). How can I do that in R?

Thanks a lot for the help.
Best wishes,
Mitra
Hello,
You could try something like the following.
The example below assumes your data.frame is named 'dat':
cnums <- grep("Peak\\.Area", colnames(dat))
subdat <- dat[cnums]
See ?regexp for the regular expressions used by ?grep.
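
For anyone who wants to try this without the original file, here is a minimal sketch on a toy data.frame. The values are invented and only a handful of the column names from the question are used. It also tightens the pattern slightly, which is an optional variation and not part of the suggestion above: "^" anchors the match at the start of the name and "[0-9]+$" requires a numeric suffix, so a name that merely contains "Peak.Area" somewhere would not be picked up.

```r
# Toy data with a few of the column names from the question (values are made up)
dat <- data.frame(
  Sample.Name  = c("s1", "s2"),
  Allele.1     = c(1, 2),
  Peak.Area.1  = c(10.5, 20.1),
  Data.Point.1 = c(100, 200),
  Peak.Area.2  = c(11.2, 21.7),
  stringsAsFactors = FALSE
)

# "\\." matches a literal dot; "^" and "$" anchor the whole column name
cnums <- grep("^Peak\\.Area\\.[0-9]+$", colnames(dat))

# Keep only the matching columns
subdat <- dat[cnums]

# Optionally keep the sample identifier alongside the peak areas
subdat_id <- dat[c(1, cnums)]

print(colnames(subdat))
```

Whether the stricter pattern is worth it depends on the real column names; with the names shown in the question, the unanchored "Peak\\.Area" should select the same columns.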
Hope this helps,
Rui Barradas
On 16-06-2013 08:20, Suparna Mitra wrote:
> Hello R experts,
Supposing your data.frame is called x, try

x[ which( substr( colnames( x ), 1, 4 ) == "Peak" ) ]

On Sunday 16 June 2013 15:20:37 Suparna Mitra wrote:
> Hello R experts,
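
A small sketch of the substr() approach on made-up data, for comparison. One thing to be aware of: comparing only the first four characters selects every column whose name starts with "Peak", not just the Peak.Area ones. The column Peak.Height.1 below is hypothetical (it is not in the names listed in the question) and is there only to show the difference. The startsWith() alternative is an assumption about your setup: it is in base R from version 3.3.0 onwards.

```r
# Toy data; Peak.Height.1 is a hypothetical column added for illustration
x <- data.frame(
  Sample.Name   = c("s1", "s2"),
  Peak.Area.1   = c(10.5, 20.1),
  Peak.Height.1 = c(3.2, 4.8),
  Peak.Area.2   = c(11.2, 21.7),
  stringsAsFactors = FALSE
)

# First four characters equal to "Peak": also picks up Peak.Height.1
by_substr <- x[which(substr(colnames(x), 1, 4) == "Peak")]

# Longer literal prefix: only the Peak.Area columns (needs R >= 3.3.0)
by_prefix <- x[startsWith(colnames(x), "Peak.Area.")]

print(colnames(by_substr))
print(colnames(by_prefix))
```

With the column names shown in the question, both versions should give the same result, since no other names begin with "Peak".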