thr3ads.net - R help - [R] help with rowsum/aggregate type functions [Mar 2008]

Charles Murtaugh

2008-Mar-25 03:22 UTC

[R] help with rowsum/aggregate type functions

Hi--

  This is a question with a trivial and obvious answer, I'm sure, but I
can't seem to find it in the help files and books that I have handy.  I have
a dataframe consisting of two columns, "Gene_Name," a list of gene
symbols, and "Number," a numeric measure of how frequently a tag
representing that gene showed up in a SAGE library.  Several of the genes are
represented by multiple tags, and therefore are present more than once in the
list, e.g.:

1167     Zcchc8      6
1168     Zcwpw1      5
1169     Zdhhc18     6
1170     Zdhhc20     5
1171     Zdhhc3      6
1172     Zdhhc3      5
1173     Zeb2        9
1174     Zeb2        6

  What I want is to collapse the list by gene name, such that duplicates are
summed up and appear only once in the final version:

Zcchc8      6
Zcwpw1      5
Zdhhc18     6
Zdhhc20     5
Zdhhc3     11
Zeb2       15

  The only way I can figure out to do this is via rowsum:


> rowsum (Number,Gene_Name)


gives me exactly what I want, *except* that in the end, I am left with a matrix
containing the Number values and with the Gene_Names used as row names (the
output therefore looks exactly as printed above) -- what I want is a dataframe
equivalent to the starting table, with numbered rows and separate, accessible
columns containing the Gene_Name and Number values.



  I was able to put such a dataframe together manually, by cobbling together the
row names of the above list with the values:


> genes.unique <- data.frame (rownames (rowsum(Number,Gene_Name)),
rowsum(Number,Gene_Name))


but then I have to manually replace the row names of the dataframe with numbers,
to get back to what I wanted in the first place.
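
For reference, the manual route described above can be shortened so that rowsum() is called only once and the row names come out numbered. This is only a sketch: the data frame name `genes` and the toy values (copied from the rows shown in the post) are assumptions, not the poster's actual object.

```r
# Toy data copied from the rows printed above; `genes` is a hypothetical name.
genes <- data.frame(
  Gene_Name = c("Zcchc8", "Zcwpw1", "Zdhhc18", "Zdhhc20",
                "Zdhhc3", "Zdhhc3", "Zeb2", "Zeb2"),
  Number    = c(6, 5, 6, 5, 6, 5, 9, 6),
  stringsAsFactors = FALSE
)

# rowsum() returns a one-column matrix whose row names are the group labels.
sums <- rowsum(genes$Number, genes$Gene_Name)

# as.vector() drops the matrix dimensions and names, so data.frame() has no
# names to reuse as row names and falls back to numbering the rows 1..n.
genes.unique <- data.frame(
  Gene_Name = rownames(sums),
  Number    = as.vector(sums),
  stringsAsFactors = FALSE
)

print(genes.unique)
```

The result has ordinary Gene_Name and Number columns, so genes.unique$Number and genes.unique$Gene_Name can be used directly.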



  I hope this makes some sort of sense.  Is there an easier way to do this? 
Thanks in advance!



  Charlie Murtaugh







====
L. Charles Murtaugh
Assistant Professor

University of Utah
Dept. of Human Genetics
15 N. 2030 E. Rm. 2100
Salt Lake City, UT 84112

tel 801-581-5958
fax 801-581-6463
email murtaugh@genetics.utah.edu

Henrique Dallazuanna

2008-Mar-25 11:38 UTC

Try this, where x is your data frame:

aggregate(list(Number=x$Number), by=list(Gene_Name=x$Gene_Name), sum)
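
A self-contained version of the call above may help to check the result. It is a sketch under assumptions: `x` is rebuilt here from the eight rows shown in the question, and the formula form of aggregate() is an alternative spelling available in current R versions, not something from the original reply.

```r
# Toy data copied from the question; the real `x` would be the full SAGE table.
x <- data.frame(
  Gene_Name = c("Zcchc8", "Zcwpw1", "Zdhhc18", "Zdhhc20",
                "Zdhhc3", "Zdhhc3", "Zeb2", "Zeb2"),
  Number    = c(6, 5, 6, 5, 6, 5, 9, 6),
  stringsAsFactors = FALSE
)

# The call from the reply: naming the list elements sets the column names
# of the resulting data frame.
by_list <- aggregate(list(Number = x$Number),
                     by = list(Gene_Name = x$Gene_Name),
                     FUN = sum)

# Equivalent formula interface: sum Number within each Gene_Name.
by_formula <- aggregate(Number ~ Gene_Name, data = x, FUN = sum)

print(by_list)
```

Both calls return a data frame with one row per gene and separate Gene_Name and Number columns, which is the shape asked for in the question.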

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

John Kane

2008-Mar-25 13:56 UTC

See the reshape package. Here xx is your data frame:

library(reshape)
yy <- melt(xx, id=c("Gene_Name"))
cast(yy, Gene_Name~variable, sum)

