thr3ads.net - R help - [R] Mean of matched data [Jul 2012]

robgriffin247

2012-Jul-18 09:54 UTC

[R] Mean of matched data

Hi,
I think/hope there will be a simple solution to this, but googling has
provided no answers (I am probably not using the right words).

I have a long data frame of >2 000 000 rows and 6 columns. Across this
there are 24 000 combinations of gene (one column, n=12000) and gender
(another column, n=2... obviously). I want to create 2 new columns in the
data frame that give, on each row, the mean value (of gene expression, in
the column called "value") for that row's gene & gender combination in one
column, and the standard deviation for that gene & gender combination in
the other.

Any suggestions?

Rob

Example of the top of the data frame:

gene	variable	value	gender	line	rep
1	CG10000	X208.F1.30456	4.758010	Female	208	1
2	CG10000	X365.F2.30478	4.915395	Female	365	2
3	CG10000	X799.F2.30509	4.641636	Female	799	2
4	CG10000	X306.M2.32650	4.550676	Male	306	2
5	CG10000	X712.M2.30830	4.633811	Male	712	2
6	CG10000	X732.M2.30504	4.857564	Male	732	2
7	CG10000	X707.F1.31120	5.104165	Female	707	1
8	CG10000	X514.F2.30493	4.730814	Female	514	2

--
View this message in context:
http://r.789695.n4.nabble.com/Mean-of-matched-data-tp4636856.html
Sent from the R help mailing list archive at Nabble.com.

Rui Barradas

2012-Jul-18 11:27 UTC

Hello,

All problems should be easy.


d <- read.table(text="
gene variable value gender line rep
1 CG10000 X208.F1.30456 4.758010 Female 208 1
2 CG10000 X365.F2.30478 4.915395 Female 365 2
3 CG10000 X799.F2.30509 4.641636 Female 799 2
4 CG10000 X306.M2.32650 4.550676 Male 306 2
5 CG10000 X712.M2.30830 4.633811 Male 712 2
6 CG10000 X732.M2.30504 4.857564 Male 732 2
7 CG10000 X707.F1.31120 5.104165 Female 707 1
8 CG10000 X514.F2.30493 4.730814 Female 514 2
", header=TRUE)

# See what we have
str(d)

# or put function(x) ...etc... in the aggregate
f <- function(x) c(mean=mean(x), sd=sd(x))
aggregate(value ~ gene + gender, data = d, f)


Hope this helps,

Rui Barradas
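
A note on the aggregate() call above: because f returns two values, the "value" column of the result is a two-column matrix rather than two separate columns. The sketch below, which rebuilds the example data from the post by hand and uses the illustrative names agg and flat, shows one common way to flatten it. It is only a sketch on the toy data, not a tested recipe for the full 2 000 000-row data frame.

```r
# Toy data in the shape of the thread's example (values copied from the post).
d <- data.frame(
  gene   = "CG10000",
  value  = c(4.758010, 4.915395, 4.641636, 4.550676,
             4.633811, 4.857564, 5.104165, 4.730814),
  gender = c("Female", "Female", "Female", "Male",
             "Male", "Male", "Female", "Female")
)

# Same summary function as in the reply above.
f <- function(x) c(mean = mean(x), sd = sd(x))
agg <- aggregate(value ~ gene + gender, data = d, FUN = f)

# 'value' is a single matrix column holding both statistics,
# so the result has 3 columns (gene, gender, value), not 4.
ncol(agg)

# Flatten the matrix column into ordinary columns
# named value.mean and value.sd.
flat <- do.call(data.frame, agg)
names(flat)
```

After flattening, the result has one row per gene & gender combination and can be merged or renamed like any ordinary data frame.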

robgriffin247

2012-Jul-18 11:54 UTC

Thanks,
in a way this has worked... with a slight modification:

    narrow3 <- aggregate(narrow2$value ~ narrow2$gene + narrow2$gender, data = narrow2, mean)
    narrow4 <- aggregate(narrow2$value ~ narrow2$gene + narrow2$gender, data = narrow2, sd)

which gives a table of the 24000 gene & gender means (narrow3) and a table
of the standard deviations (narrow4),

which I then merge into one df using

    narrow5 <- merge(narrow3, narrow4, by = c("narrow2$gene", "narrow2$gender"))
    colnames(narrow5) <- c("gene", "gender", "mean", "sd")


Is there a way I can lift the mean and std. dev. values from data frame
narrow5 and paste them into the original narrow2 df? In effect, R would read
which gene and gender each row of narrow2 has and then paste the corresponding
mean value into a new column, then do the same for a new sd column. Each
mean/sd value would occur in the new column 80 times (there are 80
occurrences of each gene & gender combination).

rob

robgriffin247

2012-Jul-18 12:34 UTC

Got it... another merge did the trick:

    narrow6 <- merge(narrow2, narrow5, by = c("gene", "gender"))

Thanks for the help Rui
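
For anyone finding this thread later: base R's ave() may be a shorter route to the same result, since it returns a vector as long as its input with each element replaced by the group statistic. The sketch below uses a hand-built copy of the example rows from the first post (not the real narrow2), and the new column names "mean" and "sd" are just illustrative choices. Unlike merge(), which by default sorts the result on the key columns, this keeps the original row order.

```r
# Toy data in the shape of the thread's example (values copied from the post).
d <- data.frame(
  gene   = "CG10000",
  value  = c(4.758010, 4.915395, 4.641636, 4.550676,
             4.633811, 4.857564, 5.104165, 4.730814),
  gender = c("Female", "Female", "Female", "Male",
             "Male", "Male", "Female", "Female")
)

# ave() applies FUN within each gene & gender group and repeats the
# result on every row of that group. FUN must be passed by name,
# because unnamed arguments are treated as grouping variables.
d$mean <- ave(d$value, d$gene, d$gender, FUN = mean)
d$sd   <- ave(d$value, d$gene, d$gender, FUN = sd)

d
```

Each row now carries the mean and standard deviation of its own gene & gender combination, with no intermediate summary table and no merge step.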
