Hi all,

I have a dataset consisting of 5 columns and over 5000 rows. Each row gives information about an individual animal, including longevity, i.e. the age at which the animal died.

For the model I use, I need to create n rows for each animal, n being its longevity, and a new column 'survival' with a binary 0/1 outcome. For example, when an animal died at age 5, there have to be 5 rows of identical data: 4 rows with 0 (= alive) for 'survival' and 1 row with 1 for 'survival'.

I thought of creating a matrix for each individual: first add a column 'survival' containing zeros to the original dataset, then create matrices with data = 'the vector containing all elements of an individual/row' ([1,]) and nrow = [a,b] (extracting the element for longevity), and with byrow = TRUE let the data be filled in by row. At the end I would have to set the last element in 'survival' to 1, and then combine all matrices into a single one.

So far I've used Excel to create these datasets manually, but with more than 1000 individuals this gets really tedious. I haven't used R before for this sort of slightly more advanced data manipulation, and I would really appreciate any input/primer on how people would go about doing this.

Thanks,
Felix

______________________________________________________________
::Felix Zajitschek
Evolution & Ecology Research Centre
School of Biological, Earth and Environmental Sciences
University of New South Wales - Sydney NSW 2052 - Australia
Tel +61 (0)2 9385 8068
Fax +61 (0)2 9385 1558
eMail felix.zajitschek@unsw.edu.au
www.bees.unsw.edu.au/school/researchstudents/zajitschekfelix.html
Since you did not provide a sample of your data, here is an example of how to take a vector and create a matrix with 5 entries for each value, with the extra ones having a zero in the second column:

> x <- sample(1:7, 20, replace = TRUE)
> table(x)
x
1 2 3 4 5 6 7
2 4 3 2 4 4 1
> # create a matrix with 5 rows of each value in the vector 'x' with the extra rows
> # having 0 in the second column
> x.l <- lapply(split(x, x), function(.val){
+     # pad with at least 5 extra rows to make sure matrix is filled out
+     z <- cbind(c(.val, rep(.val[1],5)), c(rep(1, length(.val)), rep(0,5)))
+     z[1:5,]  # only return the first 5
+ })
> # output the new matrix
> do.call(rbind, x.l)
      [,1] [,2]
 [1,]    1    1
 [2,]    1    1
 [3,]    1    0
 [4,]    1    0
 [5,]    1    0
 [6,]    2    1
 [7,]    2    1
 [8,]    2    1
 [9,]    2    1
[10,]    2    0
[11,]    3    1
[12,]    3    1
[13,]    3    1
[14,]    3    0
[15,]    3    0
[16,]    4    1
[17,]    4    1
[18,]    4    0
[19,]    4    0
[20,]    4    0
[21,]    5    1
[22,]    5    1
[23,]    5    1
[24,]    5    1
[25,]    5    0
[26,]    6    1
[27,]    6    1
[28,]    6    1
[29,]    6    1
[30,]    6    0
[31,]    7    1
[32,]    7    0
[33,]    7    0
[34,]    7    0
[35,]    7    0
>

On Thu, Mar 20, 2008 at 1:51 AM, Felix Zajitschek - UNSW <felix.zajitschek at unsw.edu.au> wrote:
> [...]
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
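Jim Holtman's example pads every value to a fixed five rows, whereas the original question asks for as many rows as each animal's longevity. A minimal sketch of that variant follows. The `longevity` vector is made up for illustration (it is not data from the thread), and the code assumes every longevity is at least 1.

```r
# Hypothetical longevities for three animals (illustrative only)
longevity <- c(4, 2, 5)

# One small matrix per animal: column 1 repeats the longevity,
# column 2 is 0 for every year survived and 1 for the year of death
m.l <- lapply(longevity, function(n) {
    cbind(longevity = rep(n, n), survival = c(rep(0, n - 1), 1))
})

# Stack the per-animal matrices into a single matrix
m <- do.call(rbind, m.l)
m
```

The result has one row per animal-year, so its row count equals sum(longevity) and the 'survival' column contains exactly one 1 per animal.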
There may be a less baroque way of doing it, but does this do what you want?

Say you have a data.frame called dat:

> dat
          x1 x2 Longevity
1 -1.9582519  a         4
2  0.8724081  b         2
3 -0.9150847  c         5

# now create a new long data.frame:

> dat.long <- as.data.frame(mapply(function (x) rep(x, dat$Longevity), dat[,1:2]))

# Add in the survival column:

> dat.long$Survival <- unlist(sapply(dat$Longevity, function (x) c(rep(0, x-1),1)))
> dat.long
                  x1 x2 Survival
1  -1.95825191986208  a        0
2  -1.95825191986208  a        0
3  -1.95825191986208  a        0
4  -1.95825191986208  a        1
5  0.872408144284977  b        0
6  0.872408144284977  b        1
7  -0.91508470125413  c        0
8  -0.91508470125413  c        0
9  -0.91508470125413  c        0
10 -0.91508470125413  c        0
11 -0.91508470125413  c        1

HTH,

Simon.

Simon Blomberg, BSc (Hons), PhD, MAppStat.
Lecturer and Consultant Statistician
Faculty of Biological and Chemical Sciences
The University of Queensland
St. Lucia Queensland 4072
Australia
T: +61 7 3365 2506
email: S.Blomberg1_at_uq.edu.au

Policies:
1. I will NOT analyse your data for you.
2. Your deadline is your problem.

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. - John Tukey.

-----Original Message-----
From: r-help-bounces@r-project.org on behalf of Felix Zajitschek - UNSW
Sent: Thu 20/03/2008 4:51 PM
To: r-help@r-project.org
Subject: [R] create matrix

[...]
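In Simon's printed output, x1 appears with full precision, as if it had been converted to text, which suggests that mapply() collapsed the two columns into a single character matrix. A sketch that repeats row indices instead, which should leave each column's type untouched, is given below. The numbers are rounded stand-ins for the values in the reply, and the code assumes every Longevity is at least 1.

```r
# Mock data in the shape of the example in the reply (values are illustrative)
dat <- data.frame(x1 = c(-1.96, 0.87, -0.92),
                  x2 = c("a", "b", "c"),
                  Longevity = c(4, 2, 5))

# Repeat each row index Longevity times, then index the data frame by row
idx <- rep(seq_len(nrow(dat)), times = dat$Longevity)
dat.long <- dat[idx, c("x1", "x2")]
rownames(dat.long) <- NULL

# 0 for each year survived, 1 in the final row of each animal
dat.long$Survival <- unlist(lapply(dat$Longevity,
                                   function(n) c(rep(0, n - 1), 1)))

# x1 should still be numeric here
str(dat.long)
```

Indexing with repeated row numbers works on the whole data frame at once, so it extends to any number of covariate columns without listing them in a function call.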
It depends on how you have your data laid out. This is clumsy but I think it will work. There should be an easier way than a loop but I don't see it. People with more experience in R will likely have much better solutions. In any case it sure beats Excel or any other spreadsheet :)

=====================================================
# Create mock data set (data.frame) with four cows.
i <- 1:4
cows <- paste("cow", i, sep="")
food <- sample(c("hay", "grain"), 4, replace=TRUE)
cattle <- data.frame(cows, food)

# Vector of longevity data
age <- sample(3:9, 4, replace = TRUE)

# Create empty list
mylist <- list()
for (i in seq_len(nrow(cattle))) {
  count <- seq(1, age[i])
  dead <- c(rep(0, length(count) - 1), 1)
  # merge() with no shared columns repeats the cow's row once per count value
  newcow <- data.frame(merge(data.frame(cattle[i,]), count), dead)
  mylist[[i]] <- newcow
}

# Turn mylist into a data.frame
mydata <- do.call(rbind, mylist)
# Get rid of unneeded count variable.
mydata <- mydata[,-3]
mydata
=====================================================

--- Felix Zajitschek - UNSW <felix.zajitschek at unsw.edu.au> wrote:
> [...]
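The loop in the cattle example can probably be avoided. Below is a sketch of a loop-free version. It assumes fixed rather than sampled ages so the result is reproducible, and `year` is a hypothetical column name that does not appear in the thread. Base R's sequence() numbers the years within each animal's life, which also keeps the per-year counter that the loop version discards.

```r
# Mock data in the same shape as the cattle example; food and age are
# fixed here (an assumption) instead of being drawn with sample()
cattle <- data.frame(cows = paste("cow", 1:4, sep = ""),
                     food = c("hay", "grain", "hay", "grain"))
age <- c(3, 5, 4, 6)

# Repeat row i of cattle age[i] times
mydata <- cattle[rep(seq_len(nrow(cattle)), times = age), ]
rownames(mydata) <- NULL

# sequence() returns 1..age[i] for each animal in turn
mydata$year <- sequence(age)

# The animal dies in the row where the year equals its age
mydata$dead <- as.numeric(mydata$year == rep(age, times = age))
mydata
```

Keeping `year` can be handy if the model needs age as a time-varying covariate; drop it with mydata$year <- NULL if only the 0/1 outcome is wanted.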