This is better suited for R-help than R-devel, so I'm copying to the R-help
list:
> -----Original Message-----
> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of Martin Kerr
> Sent: October-08-10 3:09 AM
> To: r-devel at r-project.org
> Subject: [Rd] Selecting multiple columns with same name
>
>
> Hello all,
> I've been working on a project involving clustering algorithms and I've hit a
> bit of a snag.
> I have my main data frame, which is 31 x 1000. I have fed this into dist and
> hclust in order to produce a 31-item vector stating the perceived grouping of
> the columns, e.g.
> 1 1 1 1 2 2 2 2 1 1 1 1 2 2 2 2 3 3 3 3
> etc.
> What I want to do is use this information to separate each group's worth of
> data into a separate frame so I can perform additional calculations on them.
> I've been attempting to use subset by setting the colnames to the grouping
> results, thus:
> colnames(dataFrame) <- groups
> subset(dataFrame,select=c(colname="1"))
Here's one way to do it:
> fdf <- as.data.frame(matrix(rnorm(100), ncol = 10))
> fdf
            V1         V2         V3           V4          V5          V6          V7         V8
1   0.35264797 -0.4280407  0.4706150 -0.772936086  0.59984719  0.97885696  0.13569457  0.5005072
2  -0.09800830 -0.3946618 -0.6816040 -0.173057585 -0.95377116  1.32702531  0.51894946  1.8779715
3   0.00585569  0.5240508  0.6334294  0.775787713  1.13537433 -0.75363920  0.09240357  1.7652420
4  -1.28667042 -0.3808195 -1.3735447  0.601288920  0.37448709  1.20875897  1.26392905  0.3573046
5   1.05127892 -0.1717773  0.4795011  0.408584918 -1.57947076 -1.76699298 -2.15778156 -0.6202422
6   0.49935805 -0.5858645  0.1466443  1.094320479 -0.01534562  0.03349714 -0.86508986  0.3335337
7   0.64649298 -0.8044967  1.7273739  0.005654138  0.88092416 -0.43467177  0.33123616 -1.0062133
8   0.67393707 -0.8927181  1.9050954  0.824576116 -1.49872072  0.13610000 -0.98904113 -1.1763053
9  -0.06217531 -0.6020426 -0.5198348  0.475774170  0.72492806 -1.93507347 -0.26827918 -0.7902781
10 -4.05961249 -1.1839906 -2.1285662  0.992767748 -1.45187700 -0.32688422  0.92335149  0.2405690
            V9        V10
1  -1.10422899  0.7343708
2  -0.21511926 -0.3472193
3  -1.56249900  0.6228027
4  -1.64679524  0.9548577
5   0.31530976  0.7420800
6   0.02644282 -1.0393438
7  -0.70669500 -0.8335578
8  -0.29898269  1.8679939
9  -0.08449491 -0.7413130
10  0.66960457 -0.4666664
> colGroups <- c(1,1,1,2,1,3,3,2,3,1)
> fdf[, colGroups == 1]
            V1         V2         V3          V5        V10
1   0.35264797 -0.4280407  0.4706150  0.59984719  0.7343708
2  -0.09800830 -0.3946618 -0.6816040 -0.95377116 -0.3472193
3   0.00585569  0.5240508  0.6334294  1.13537433  0.6228027
4  -1.28667042 -0.3808195 -1.3735447  0.37448709  0.9548577
5   1.05127892 -0.1717773  0.4795011 -1.57947076  0.7420800
6   0.49935805 -0.5858645  0.1466443 -0.01534562 -1.0393438
7   0.64649298 -0.8044967  1.7273739  0.88092416 -0.8335578
8   0.67393707 -0.8927181  1.9050954 -1.49872072  1.8679939
9  -0.06217531 -0.6020426 -0.5198348  0.72492806 -0.7413130
10 -4.05961249 -1.1839906 -2.1285662 -1.45187700 -0.4666664
> fdf[, colGroups == 2]
             V4         V8
1  -0.772936086  0.5005072
2  -0.173057585  1.8779715
3   0.775787713  1.7652420
4   0.601288920  0.3573046
5   0.408584918 -0.6202422
6   1.094320479  0.3335337
7   0.005654138 -1.0062133
8   0.824576116 -1.1763053
9   0.475774170 -0.7902781
10  0.992767748  0.2405690
> fdf[, colGroups == 3]
            V6          V7          V9
1   0.97885696  0.13569457 -1.10422899
2   1.32702531  0.51894946 -0.21511926
3  -0.75363920  0.09240357 -1.56249900
4   1.20875897  1.26392905 -1.64679524
5  -1.76699298 -2.15778156  0.31530976
6   0.03349714 -0.86508986  0.02644282
7  -0.43467177  0.33123616 -0.70669500
8   0.13610000 -0.98904113 -0.29898269
9  -1.93507347 -0.26827918 -0.08449491
10 -0.32688422  0.92335149  0.66960457
and this can be automated as a loop or with lapply() and the like.
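For instance, a rough sketch of the lapply() route might look like the following. It rebuilds the same kind of toy data as above; the seed is arbitrary, `groupFrames` is just an illustrative name, and in practice `colGroups` would be whatever grouping vector came out of your clustering step. It uses split() on the column indices so that non-contiguous columns in the same group end up together.

```r
# Toy data in the same shape as the example above: 10 columns, 3 groups.
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
fdf <- as.data.frame(matrix(rnorm(100), ncol = 10))
colGroups <- c(1, 1, 1, 2, 1, 3, 3, 2, 3, 1)

# split() collects the column indices belonging to each group label;
# lapply() then extracts those columns as one data frame per group.
# drop = FALSE keeps a data frame even if a group has a single column.
groupFrames <- lapply(split(seq_along(colGroups), colGroups),
                      function(idx) fdf[, idx, drop = FALSE])

names(groupFrames)         # group labels, as character strings
names(groupFrames[["2"]])  # columns that fell into group 2
```

Each group's data frame is then available as groupFrames[["1"]], groupFrames[["2"]] and so on, and further calculations can be applied to all of them with another lapply() or sapply() call.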
HTH
Steve McKinney
> This however only returns the first column rather than all instances of a
> column with that name. Note that these columns may not necessarily be
> contiguous.
> Is this the correct way to go about this?
> Thank You
> Martin Kerr