Hi All,

I'm trying to run a simulation of host-pathogen evolution based around individuals.
What I need is a data frame or table of some description, describing all the individuals of a pathogen population (so far I've implemented this as a matrix):

        ID      No_of_Effectors    Effectors (Sequences)
 [1,]   0001    3                  ## 3 Random Numbers ##

There will be many such rows for many individuals. They have something called effectors, the number of which is randomly generated, so say you get 3 in the No_of_Effectors column. Then I make R generate 3 numbers between 1 and 10,000, which gives me three numerical representations of genes. These numbers will be compared to a similar data structure of the host individuals, who have their immune genes represented by similar numbers.

My problem is that obviously I can't stick 3 numbers in one "cell" of the matrix (I've tried):

Pathogen_Individuals[1,3] <- c(2,3,4)
Error in Pathogen_Individuals[1, 3] <- c(345, 567, 678) :
  number of items to replace is not a multiple of replacement length

In future I'm also going to have more variables, such as whether a gene is expressed. Such information may require a matrix in itself, something like:

        Effector ID    Sequence       Expressed?
 [1,]   0001           345,567,678    1 (or 0)

Is there a way I can put more than one value in a cell, like a list of values, or a way to put objects in a cell of a data frame, matrix or table etc.? Almost an inception deal: data structures nested in a data structure? If I search for things like "insert list into matrix" I get results on how to turn one into another, which is not what I think I need to be doing.

I have been considering having several data structures not nested in each other: for every individual, create a new matrix object with the name Effectors_[Individual_ID], and somehow get my simulation loops operating on those objects. But I find it hard to see how to tell R that all of those matrices are to be included in an operation, as you can with all the rows of a data frame, for example with for loops.
This is strange for me because this model was written in a macro code for another program, which handles data in a different format and layout to R.

My problem, I think, is that each individual in the model has many variables, in this case representations of genes. So I'm having trouble getting my head around this.

Hopefully someone more experienced will be able to offer advice or a solution; it will be very much appreciated.

Many Thanks,
Ben Ward (ENV, UEA & The Sainsbury Lab, JIC).

P.S. I have searched previous queries to the list, and I'm not sure, but this may be useful or relevant:

Have you thought of using a list?

> a <- matrix(1:10, nrow=2)
> b <- 1:5
> x <- list(a=a, b=b)
> x
$a
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10

$b
[1] 1 2 3 4 5

> x$a
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
> x$b
[1] 1 2 3 4 5

oliveoil and yarn datasets have been mentioned.
On 28.10.2012 10:32, Benjamin Ward (ENV) wrote:
> My problem is that obviously I can't stick 3 numbers in one "cell" of the matrix (I've tried) :
>
> Pathogen_Individuals[1,3] <- c(2,3,4)

Consider using a data.frame in which the third column (Effectors) is of list type. Then you can do:

Pathogen_Individuals$Effectors[1] <- list(c(2,3,4))

And what you get is:

> Pathogen_Individuals
    ID No_of_Effectors Effectors
1 0001               3   2, 3, 4

Uwe Ligges

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
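A minimal sketch of the list-column idea above, in case a fuller example helps. The column names follow the original post; the number of individuals, the random draws and the host_genes vector are made up for illustration and are not part of anyone's actual model.

```r
# Hypothetical pathogen table; column names follow the original post.
set.seed(1)  # only so the sketch is repeatable
n <- 5
Pathogen_Individuals <- data.frame(
  ID = formatC(seq_len(n), width = 4, flag = "0"),
  No_of_Effectors = sample(1:4, n, replace = TRUE),
  stringsAsFactors = FALSE
)

# One list element per row; each element holds that individual's effectors.
Pathogen_Individuals$Effectors <- lapply(
  Pathogen_Individuals$No_of_Effectors,
  function(k) sample(1:10000, k)
)

# Overwrite a single "cell": wrapping the vector in list() keeps it as one element.
Pathogen_Individuals$Effectors[1] <- list(c(345, 567, 678))
Pathogen_Individuals$No_of_Effectors[1] <- 3

# [[ ]] pulls the numeric vector back out of the cell.
first_effectors <- Pathogen_Individuals$Effectors[[1]]

# Hypothetical host immune genes, to show a per-individual comparison.
host_genes <- c(567, 9001)
recognised <- sapply(Pathogen_Individuals$Effectors,
                     function(e) any(e %in% host_genes))
```

Here recognised is a logical vector with one entry per individual, so it can be used directly to subset the rows of Pathogen_Individuals.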
Search on "ragged array".

My preferred approach is to use a data frame with one row per effector that repeats the per-ID information. If that occupies too much memory, you can set up another data frame with one row per ID and look that information up as needed, using lapply and subset on the effectors data. The plyr package is also useful for such processing.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
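A rough sketch of the "one row per effector" layout described above, with the per-ID information kept in a second table. All names (ids, n_eff, effectors, individuals) and the random values are assumptions made for the example; base R's split() is used here rather than plyr.

```r
set.seed(42)  # only so the sketch is repeatable
ids <- formatC(1:4, width = 4, flag = "0")
n_eff <- sample(1:3, length(ids), replace = TRUE)  # effectors per individual

# One row per effector; rep() repeats each ID once for every effector it has.
effectors <- data.frame(
  ID = rep(ids, times = n_eff),
  Sequence = sample(1:10000, sum(n_eff)),
  Expressed = sample(0:1, sum(n_eff), replace = TRUE),
  stringsAsFactors = FALSE
)

# Per-ID information lives in its own table, one row per individual.
individuals <- data.frame(ID = ids, No_of_Effectors = n_eff,
                          stringsAsFactors = FALSE)

# split() regroups the long table by ID whenever per-individual work is needed.
by_id <- split(effectors$Sequence, effectors$ID)
n_expressed <- sapply(split(effectors$Expressed, effectors$ID), sum)
```

Because every effector is an ordinary row, extra per-gene variables such as Expressed become extra columns rather than nested structures.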
Hi,

Maybe this helps.

dat1 <- data.frame(ID = formatC(1:10, width = 4, flag = "0"), No_of_Effectors = rep(3, 10))
dat1 <- within(dat1, {ID <- as.character(ID)})
# Not required, and it fails as written: sep = "," sits outside paste(),
# so lapply() hands it to the anonymous function as an unused argument.
# list1 <- lapply(1:nrow(dat1), function(x) paste(sample(1:10000, 3, replace = TRUE)), sep = ",")
# Draw 3 effectors per individual and collapse each draw into one comma-separated string
effector_draws <- lapply(1:nrow(dat1), function(x) sample(1:10000, 3, replace = TRUE))
dat2 <- data.frame(dat1, do.call(rbind, lapply(effector_draws, function(x) paste(x, collapse = ","))))
colnames(dat2)[3] <- "Effectors"
dat2
#     ID No_of_Effectors      Effectors
#1  0001               3 4759,8109,7997
#2  0002               3 2649,9496,9167
#3  0003               3 4229,3282,6235
#4  0004               3 5388,3088,6420
#5  0005               3 5602,5981,4749
#6  0006               3 4971,6956,5913
#7  0007               3  4999,9465,799
#8  0008               3  8419,4346,266
#9  0009               3 9329,8819,4011
#10 0010               3 5817,8729,6499
dat3 <- within(dat2, {Effectors <- as.character(Effectors)})
# converting the Effectors column back to 3 numeric columns
res <- do.call(rbind, lapply(strsplit(dat3[, 3], ","), function(x) as.numeric(x)))
head(res)
#     [,1] [,2] [,3]
#[1,] 4759 8109 7997
#[2,] 2649 9496 9167
#[3,] 4229 3282 6235
#[4,] 5388 3088 6420
#[5,] 5602 5981 4749
#[6,] 4971 6956 5913

A.K.
Hi,

In addition to using paste(), you can also try this:

dat1 <- data.frame(ID = formatC(1:10, width = 4, flag = "0"), No_of_Effectors = rep(3, 10))
dat1 <- within(dat1, {ID <- as.character(ID)})
list1 <- lapply(1:nrow(dat1), function(x) sample(1:10000, 3, replace = TRUE))
dat2 <- data.frame(dat1, Effectors = I(list1))
str(dat2)
#'data.frame':    10 obs. of  3 variables:
# $ ID             : chr  "0001" "0002" "0003" "0004" ...
# $ No_of_Effectors: num  3 3 3 3 3 3 3 3 3 3
# $ Effectors      :List of 10
#  ..$ : int  6155 979 3079
#  ..$ : int  690 5515 9469
#  ..$ : int  903 7439 7582
#  ..$ : int  9788 5930 7456
#  ..$ : int  8106 8319 2396
#  ..$ : int  8050 5299 264
#  ..$ : int  5558 7401 8865
#  ..$ : int  7178 7273 4065
#  ..$ : int  2135 75 7571
#  ..$ : int  6652 9900 2313
#  ..- attr(*, "class")= chr "AsIs"
head(dat2)
#    ID No_of_Effectors    Effectors
#1 0001               3 6155, 97....
#2 0002               3 690, 551....
#3 0003               3 903, 743....
#4 0004               3 9788, 59....
#5 0005               3 8106, 83....
#6 0006               3 8050, 52....

BTW, I had a line of code in my previous reply which was not required, and it will not work:
#list1<-lapply(1:nrow(dat1),function(x) paste(sample(1:10000,3,replace=TRUE)),sep=",")

A.K.
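For what it's worth, a list column like the one above can be flattened into one row per effector when a ragged layout is easier to work with. This is only a sketch: the small dat table and its values are invented for illustration, and the cells are deliberately of different lengths.

```r
# Hypothetical table with a ragged list column (values made up for illustration).
dat <- data.frame(ID = formatC(1:3, width = 4, flag = "0"),
                  stringsAsFactors = FALSE)
dat$Effectors <- list(c(11, 22, 33), 44, c(55, 66))

# lengths() gives the number of effectors held in each cell.
dat$No_of_Effectors <- lengths(dat$Effectors)

# Flatten to one row per effector: repeat each ID by the length of its cell.
long <- data.frame(
  ID = rep(dat$ID, times = dat$No_of_Effectors),
  Effector = unlist(dat$Effectors),
  stringsAsFactors = FALSE
)
```

Going the other way, split(long$Effector, long$ID) rebuilds a list with one element per ID.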
Please keep mail threads on the mailing list.

Please follow the posting guidelines and provide a sample of data and desired outcome.

Jeff Newmiller
From: Benjamin Ward (ENV)
Sent: 03 November 2012 13:29
To: Jeff Newmiller; r-help at r-project.org
Subject: RE: [R] Having some Trouble Data Structures

Hi,

Thank you very much for your reply. Your preferred layout is how my supervisor implemented it in Minitab; however, I was unsure how to get R to do this repeating-ID behaviour, and how to tell it that, in a for loop going through individuals 1 to, say, 10, I want it to:

Randomly sample a number from a distribution for the number of effectors (I can do this, but with runif),

Then put one value in each cell of the Effector column and repeat the ID for each effector row.

I'm also left wondering, when I write for loops that use ID, whether operations will be applied row by row or ID by ID. For example, in the immunology part I would need a loop to check, individual by individual, whether any of the effectors it has means death in the host, in which case all instances of, say, ID "1" would need to be deleted.

Would you be able to provide an example chunk showing how you accomplish this with your preferred approach, if you have the time?

Thanks,
Ben W.
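One possible way to do what is asked above without an explicit for loop, offered as a sketch only. The sample() call stands in for whatever distribution the model really uses, and host_genes is a made-up vector, taken from the table itself so that something is guaranteed to match.

```r
set.seed(123)  # only so the sketch is repeatable
n_ind <- 10
ids <- seq_len(n_ind)

# Number of effectors per individual; sample() is a stand-in for the
# model's real distribution (an assumption of this sketch).
n_eff <- sample(1:5, n_ind, replace = TRUE)

pathogens <- data.frame(
  ID = rep(ids, times = n_eff),             # ID repeated once per effector row
  Effector = sample(1:10000, sum(n_eff))    # one effector value per row
)

# Hypothetical host immune genes: the first and last effector in the table.
host_genes <- pathogens$Effector[c(1, nrow(pathogens))]

# IDs with at least one recognised effector...
dead_ids <- unique(pathogens$ID[pathogens$Effector %in% host_genes])

# ...lose every one of their rows, not just the matching ones.
survivors <- pathogens[!(pathogens$ID %in% dead_ids), ]
```

The comparison with %in% works row by row, while the final subset works ID by ID, so both views are available from the same long table.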