Hello,

My R skills are somewhere between novice and intermediate, and I am hoping that some of you very helpful forum members, whom I've seen work your magic on other people's problems/questions, can help me here.

I have a matrix with the following format:

(i) Individual plants comprising many different genotype groups (i.e., a plant is genotype 1 or genotype 2 or genotype 3, etc.). The column for genotypes is called "gen", and the plants are members of genotype classes 1 - 309, with no overlaps (i.e., you're either a genotype 1 or a genotype something else, but not both) and no missing values.
(ii) Various trait measurements taken on the plants, with multiple replicates per genotype group.

I want to create a covariance matrix, separately for plants from each genotype group. I know how to use the command "cov"; my problem is that I have 309 different genotype groups and so I need to set up some sort of an automated loop to go through each genotype group and create a separate covariance matrix based on it.

My question is, how do I make a loop to automatically go through and create these covariance matrices, i.e., a separate covariance matrix for plants from each genotype group?

I am familiar with the "for" command, but I cannot get it to work. Here is my code:

christina= read.table("christina.txt", sep= ",", na= "NA", header= TRUE)
{for (i in 1:309)
christina.i= subset(christina, gen == i)
christina.i.clean= christina.i[,-1]
christina.matrix.i= as.matrix(christina.i.clean)
christina.cov.i= cov(christina.matrix.i, y= NULL, use= "complete.obs", method= c("pearson"))
write.table(christina.cov.i, sep= ",", file= "covariances.csv", row.names= FALSE, col.names= FALSE, append= TRUE)}

The problem occurs at my code snippet "gen == i". I want R to insert a number in place of "i", depending on what round of the loop it is on, but R insists that I am literally referring to a genotype class named "i". I have made sure that the column "gen" is numeric, but the same problem persists if I make the column a factor instead.

Any help would be much appreciated, but help that includes sample code would be most useful. Thank you in advance!

Sincerely,
Josh
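A minimal sketch of one way the loop above could be restructured, assuming the trouble comes from where the braces sit: in R, `for (...)` takes only the next expression as its body, so the opening brace has to come after `for (i in ...)` for every statement to run on each pass. Note also that the `i` in a name such as `christina.i` is simply part of the name and is never replaced by the counter. The toy data frame, its trait columns `height` and `leaves`, and the list `cov.list` are made up for illustration.

```r
# Toy stand-in for the real data: first column 'gen', then two made-up traits.
set.seed(1)
christina <- data.frame(
  gen    = rep(1:3, each = 5),
  height = rnorm(15),
  leaves = rnorm(15)
)

cov.list <- vector("list", 3)   # one slot per genotype group (3 here, 309 in the post)

for (i in 1:3) {                # braces follow for (...), enclosing the whole body
  christina.i <- subset(christina, gen == i)   # 'i' is the loop counter here
  christina.i.clean <- christina.i[, -1]       # drop the 'gen' column
  # store each result in a list instead of a variable with 'i' in its name
  cov.list[[i]] <- cov(as.matrix(christina.i.clean), use = "complete.obs")
}
```

Collecting the results in a list keeps all the matrices available after the loop, so writing them to a file can be handled as a separate step.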
On Wed, Aug 20, 2008 at 7:48 AM, Josh B <joshb41 at yahoo.com> wrote:
> My question is, how do I make a loop to automatically go through and create these covariance matrices, i.e., a separate covariance matrix for plants from each genotype group?

If you can make your data into a data frame, you can use something like this:

# might work
christina.df <- as.data.frame(christina)

# this should work, if your ID var ('gen') is the first column in the data frame
by(christina.df, christina.df$gen, function(d) {
  d.clean <- d[,-1]
  cov(d.clean, y= NULL, use= "complete.obs", method= c("pearson"))
} )

For more ideas, see ?by

Cheers,
Dylan
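To make the by() approach easy to try without the real file, here is a small self-contained sketch on made-up data. The data frame `plants` and its trait columns `height`, `leaves` and `seeds` are assumptions for illustration only; the one thing carried over from the thread is that `gen` is the first column.

```r
# Made-up data: 3 genotype groups, 6 replicate plants each, 3 traits.
set.seed(42)
plants <- data.frame(
  gen    = rep(1:3, each = 6),
  height = rnorm(18),
  leaves = rnorm(18),
  seeds  = rnorm(18)
)

# by() splits the data frame on 'gen' and applies the function to each piece.
cov.by.gen <- by(plants, plants$gen, function(d) {
  cov(d[, -1], use = "complete.obs", method = "pearson")  # drop 'gen', then cov
})

names(cov.by.gen)    # group labels: "1" "2" "3"
cov.by.gen[[2]]      # covariance matrix for the second group
```

Each element is a trait-by-trait covariance matrix (3 x 3 here), so the result has one matrix per level of `gen`.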
When I try your suggestion, I get output such as:

christina.df$gen: 309
NULL

returned for each vector. That doesn't seem good (I associate "NULL" with something being empty), but perhaps I am misinterpreting it?

----- Original Message ----
From: Dylan Beaudette <dylan.beaudette at gmail.com>
To: Josh B <joshb41 at yahoo.com>
Cc: r-help at r-project.org
Sent: Wednesday, August 20, 2008 10:58:26 AM
Subject: Re: [R] Looping over groups
2008/8/20 Josh B <joshb41 at yahoo.com>:
> Here is my underlying data file. Of course, please don't feel obliged to spend any more time on this!

Works for me:

x <- read.csv('christina.txt')

x.list <- by(x, x$gen, function(d) {
  d.clean <- d[,-1]
  cov(d.clean, y= NULL, use= "complete.obs", method="pearson")
} )

Note that the output is a list, where each element corresponds to one level of 'gen'. If you need to write each element out to a file, see ?sapply or ?lapply.

Cheers,
Dylan
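As a rough illustration of what "the output is a list" means in practice, the sketch below builds the same kind of object from made-up data and applies lapply() and sapply() to it. The toy data frame and the trait names `height` and `leaves` are assumptions, not taken from the real file.

```r
# Made-up data in the same shape as the thread describes ('gen' first).
set.seed(7)
x <- data.frame(
  gen    = rep(1:3, each = 6),
  height = rnorm(18),
  leaves = rnorm(18)
)

x.list <- by(x, x$gen, function(d) cov(d[, -1], use = "complete.obs"))

# lapply() applies a function to every element and always returns a list:
# here, the dimensions of each covariance matrix.
dims <- lapply(x.list, dim)

# sapply() does the same but tries to simplify the result:
# here, one number per group (the variance of 'height'), giving a vector.
height.var <- sapply(x.list, function(m) m["height", "height"])
```

The only difference between the two calls is the shape of what comes back: lapply() keeps a list, while sapply() collapses it to a vector or matrix when every result has the same length.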
> note that the output is a list, where each element corresponds to one
> level of 'gen'. if you need to write each element out to a file, see
> ?sapply or ?lapply .

Yes, how would I do that? The usage of sapply is pretty hard to understand, at least at first glance, and I have never used it before. I will need to output all of the covariance matrices to one CSV or text file (there's probably some sort of "append = TRUE" argument involved). Does anyone know how to do this easily?
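One possible way to do this, sketched under assumptions rather than offered as the definitive answer: loop over the elements of the list and call write.table() with append = TRUE, as in the original post. The toy data, the temporary output path `out.file`, and the "gen=" label line written before each matrix are all illustrative choices; the label is only there so each block in the file can be traced back to its genotype.

```r
# Made-up data and per-group covariance matrices, as in the thread.
set.seed(3)
x <- data.frame(
  gen    = rep(1:3, each = 6),
  height = rnorm(18),
  leaves = rnorm(18)
)
x.list <- by(x, x$gen, function(d) cov(d[, -1], use = "complete.obs"))

# Hypothetical output location; swap in "covariances.csv" for real use.
out.file <- file.path(tempdir(), "covariances.csv")
if (file.exists(out.file)) file.remove(out.file)   # start clean, since we append

for (k in seq_along(x.list)) {
  g <- names(x.list)[k]                            # genotype label for this element
  # label line (an illustrative addition), then the matrix itself
  cat("gen=", g, "\n", sep = "", file = out.file, append = TRUE)
  write.table(x.list[[k]], file = out.file, sep = ",",
              row.names = FALSE, col.names = FALSE, append = TRUE)
}
```

A plain for loop is used here instead of sapply() because the goal is the side effect of writing to the file, not a returned value. Removing the file first matters because append = TRUE would otherwise add to the output of an earlier run.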