thr3ads.net - R help - [R] combining collumns for data.frames [Sep 2010]

Martin Hughes

2010-Sep-06 17:49 UTC

[R] combining collumns for data.frames

Hi

This question is less simple than the title suggests, so please read carefully.
Thanks.

I have two sets of data, both read into R:
> data1 <- read.table("1.txt", header=TRUE, sep="\t")
> data2 <- read.table("2.txt", header=TRUE, sep="\t")
>data1
Taxon   stage1   stage2   stage3   stage4
T1          0          0          1          1
T2          0          1          1          0
T3          0          0          0          1
T4          1          0          0          0

> data2
# This is a library file. Col_1 contains all possible stage values that may
# appear in the data1 file (as the header of each column), and Col_2 gives the
# group each stage corresponds to, i.e. stages 1:2 == Group1.

Col_1        Col_2
Stage1      Group1
Stage2      Group1
Stage3      Group2
Stage4      Group2

I want R to combine the columns in data1 based on the information in data2
(Col_2), e.g. in this instance reduce the columns in data1 from 4 to 2, summing
the values across the columns of data1 that belong to each group, to get the
result below:

Taxon   group1   group2
T1          0          1
T2          1          1
T3          0          1
T4          1          0

I have many datasets with different numbers of stages, e.g. one dataset will
have stage1-10 and another will have stage15-35. (data2 lists all possible
stage values, so Col_2 says which group each one corresponds to.)

So far I can isolate the rows of data2 that contain the stages in data1 with
this:
> # take the header names from data1 minus the 1st column (Taxon is not
> # found in the data2 library file)
> data1.names <- names(data1[,-1])
> # match the data1 column header names to those in the library file data2
> row.numbers <- match(data1.names, data2[,1])
> # reduce data2 to only the stages found in the data1 file
> data2.small <- data2[row.numbers, ]
From here on I don't know what to do. Really, I wanted to change the header
names of data1 to their corresponding names in Col_2, and then use some
statement that merges the columns in data1 that share a name, summing the
values in each row and dividing any sum greater than 1 by itself (so I only
have 0 or 1 again), but I don't know how to do that.

Can someone help me get the desired result (as in the example above) without
having to merge columns manually? That is, I need an automated way that takes
any version of the data1 file (i.e. with different stage values) and, using the
data2 file (the library file, the same in each instance), produces output like
the example above.
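
The lookup step attempted above can be sketched as follows. This is only an illustration built from the example tables in this post: the data frames are typed in by hand rather than read from 1.txt and 2.txt, and stage.groups and exact are hypothetical names. It assumes that "Stage1" in the library and "stage1" in the data1 header are meant to be the same stage, so names are compared in lower case.

```r
# Example inputs copied from the post; note "Stage1" vs "stage1".
data1 <- data.frame(Taxon  = c("T1", "T2", "T3", "T4"),
                    stage1 = c(0, 0, 0, 1),
                    stage2 = c(0, 1, 0, 0),
                    stage3 = c(1, 1, 0, 0),
                    stage4 = c(1, 0, 1, 0))
data2 <- data.frame(Col_1 = c("Stage1", "Stage2", "Stage3", "Stage4"),
                    Col_2 = c("Group1", "Group1", "Group2", "Group2"),
                    stringsAsFactors = FALSE)

# Header names of data1 without the Taxon column.
data1.names <- names(data1)[-1]

# match() is case-sensitive, so an exact match finds nothing here.
exact <- match(data1.names, data2$Col_1)

# Assumption: names that differ only in case refer to the same stage.
row.numbers <- match(tolower(data1.names), tolower(data2$Col_1))
stopifnot(!anyNA(row.numbers))

# Keep only the library rows for stages present in data1.
# The comma matters: rows are selected, not columns.
data2.small <- data2[row.numbers, ]

# Group label for each stage column of data1, in column order.
stage.groups <- setNames(data2.small$Col_2, data1.names)
```

With the labels in stage.groups, each stage column of data1 has a group name attached, which is the information needed to merge the columns.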


Thanks

Martin

jim holtman

2010-Sep-06 22:16 UTC

[R] combining collumns for data.frames

Try this (after making sure that Col_1 in data2 matches your column
names in data1):

> data1 <- read.table(textConnection("Taxon   stage1   stage2   stage3   stage4
+ T1          0          0          1          1
+ T2          0          1          1          0
+ T3          0          0          0          1
+ T4          1          0          0          0"), header=TRUE)
> data2 <- read.table(textConnection("Col_1        Col_2
+ stage1      Group1
+ stage2      Group1
+ stage3      Group2
+ stage4      Group2"), header=TRUE, as.is=TRUE)
> closeAllConnections()
> # get the columns to summarize by
> colSumz <- split(data2$Col_1, data2$Col_2)
> # create the output matrix
> result <- matrix(0, nrow=nrow(data1), ncol=length(colSumz))
> colnames(result) <- names(colSumz)
> rownames(result) <- data1$Taxon
> for (i in names(colSumz)){
+     result[, i] <- rowSums(data1[, colSumz[[i]]])
+ }
> result
   Group1 Group2
T1      0      2
T2      1      1
T3      0      1
T4      1      0
>
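
The result above holds counts (T1 has 2 in Group2), whereas the question asked for 0/1 values and for a library file that may list more stages than a given data1 contains. A minimal sketch of one way to cover both points follows. It is not part of the code above: combine_stages is a hypothetical helper name, it swaps the loop for base R's rowsum(), and it assumes stage names should be compared without regard to case.

```r
# Hypothetical helper: collapse the stage columns of data1 into groups
# defined by the library data2 (Col_1 = stage, Col_2 = group).
combine_stages <- function(data1, data2) {
  stage.cols <- names(data1)[-1]            # everything except Taxon

  # Look up each stage column in the library (case-insensitive by assumption).
  idx <- match(tolower(stage.cols), tolower(data2$Col_1))
  if (anyNA(idx)) {
    stop("stage(s) not found in library: ",
         paste(stage.cols[is.na(idx)], collapse = ", "))
  }
  groups <- as.character(data2$Col_2[idx])

  # rowsum() adds up rows that share a group label, so transpose to put
  # stages in rows, sum by group, then transpose back to taxa in rows.
  stage.mat <- t(as.matrix(data1[, stage.cols, drop = FALSE]))
  counts <- t(rowsum(stage.mat, group = groups))

  # Convert counts to presence/absence (0 or 1).
  data.frame(Taxon = data1$Taxon, (counts > 0) + 0L)
}

# Example inputs copied from the question.
data1 <- data.frame(Taxon  = c("T1", "T2", "T3", "T4"),
                    stage1 = c(0, 0, 0, 1),
                    stage2 = c(0, 1, 0, 0),
                    stage3 = c(1, 1, 0, 0),
                    stage4 = c(1, 0, 1, 0))
data2 <- data.frame(Col_1 = c("Stage1", "Stage2", "Stage3", "Stage4"),
                    Col_2 = c("Group1", "Group1", "Group2", "Group2"),
                    stringsAsFactors = FALSE)

out <- combine_stages(data1, data2)
print(out)
```

Because only the stages present in data1 are looked up, library rows for other stages are ignored, and a data1 file covering a different range of stages can be passed to the same function.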


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
