R-masters,

I have a problem that I have been working on for a while, and it seems there may be a simple solution that I have yet to figure out, so I thought I would venture to post to the help list.

Suppose there is a data.frame with three columns: two factors that identify the data, and one that holds the frequency of occurrence (the events are binary, yes or no). I would like to perform logistic regression on this data, and it seems that I need a vector of 0s and 1s as input to lrm. How might I convert a frequency table into a vector of binary data while still keeping all of the identifier information?

I have thought about using rep over and over again and basically building the data.frame "by hand", but that seems long and tedious. Is there a quick and dirty way of doing this?

Thanks in advance!
Kevin

--
Kevin J Emerson
Center for Ecology and Evolutionary Biology
1210 University of Oregon
University of Oregon
Eugene, OR 97403
kemerson at dakrwing.uoregon.edu
Kevin J Emerson wrote:
> R-masters,
>
> [...]

Hi Kevin,

I don't know lrm, so I can't answer on that point. However, you can fit logistic regression models with the "regular" glm, using family = binomial (see ?glm). There are three ways to fit the model:

1. Fit binomial data with the syntax cbind(y, n - y) ~ x1 + x2, where y is the count of events of interest and n is the sample size for each covariate pattern defined by x1 and x2.
2. Fit proportions, with the syntax y/n ~ x1 + x2, weights = n.
3. "Unfold" the data, as you suggest. I guess many people have written utility functions for this purpose.
One of them, called splitbin, is available in the package aod (on CRAN):

> data(orob2)
> head(orob2)
  seed     root  n  y
1  O75     BEAN 39 10
2  O75     BEAN 62 23
3  O75     BEAN 81 23
4  O75     BEAN 51 26
5  O75     BEAN 39 17
6  O75 CUCUMBER  6  5
> res <- splitbin(cbind(y, n - y) ~ root + seed, orob2)
> res[1:39, ]
   id y root seed
1   1 0 BEAN  O75
2   1 0 BEAN  O75
3   1 0 BEAN  O75
4   1 0 BEAN  O75
5   1 0 BEAN  O75
6   1 0 BEAN  O75
7   1 0 BEAN  O75
8   1 0 BEAN  O75
9   1 0 BEAN  O75
10  1 0 BEAN  O75
11  1 0 BEAN  O75
12  1 0 BEAN  O75
13  1 0 BEAN  O75
14  1 0 BEAN  O75
15  1 0 BEAN  O75
16  1 0 BEAN  O75
17  1 0 BEAN  O75
18  1 0 BEAN  O75
19  1 0 BEAN  O75
20  1 0 BEAN  O75
21  1 0 BEAN  O75
22  1 0 BEAN  O75
23  1 0 BEAN  O75
24  1 0 BEAN  O75
25  1 0 BEAN  O75
26  1 0 BEAN  O75
27  1 0 BEAN  O75
28  1 0 BEAN  O75
29  1 0 BEAN  O75
30  1 1 BEAN  O75
31  1 1 BEAN  O75
32  1 1 BEAN  O75
33  1 1 BEAN  O75
34  1 1 BEAN  O75
35  1 1 BEAN  O75
36  1 1 BEAN  O75
37  1 1 BEAN  O75
38  1 1 BEAN  O75
39  1 1 BEAN  O75

--
Dr Renaud Lancelot, veterinarian
FSP regional veterinary epidemiology project
c/o Ambassade de France - SCAC
BP 834 Antananarivo 101 - Madagascar

e-mail: renaud.lancelot at cirad.fr
tel.: +261 32 40 165 53 (cell)
      +261 20 22 665 36 ext. 225 (work)
      +261 20 22 494 37 (home)
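A minimal sketch of approaches 1 and 2 from the list above, on a small made-up data set (the data frame d and its columns f1, f2, y and n are hypothetical, not Kevin's data). For the binomial family, glm turns a two-column response into the proportion y/n with prior weights n, so the two calls below fit the same model and should return the same coefficients.

```r
# Hypothetical grouped data: two factors, y "yes" events out of n trials per row
d <- data.frame(
  f1 = factor(c("a", "a", "b", "b")),
  f2 = factor(c("x", "y", "x", "y")),
  y  = c(3, 5, 2, 7),
  n  = c(10, 12, 9, 11)
)

# Approach 1: two-column response of (events, non-events)
fit1 <- glm(cbind(y, n - y) ~ f1 + f2, family = binomial, data = d)

# Approach 2: observed proportion, weighted by the number of trials
fit2 <- glm(y / n ~ f1 + f2, family = binomial, weights = n, data = d)

# The two parameterizations should agree on the estimates
all.equal(coef(fit1), coef(fit2))
```

Neither call needs the data unfolded into one row per event.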
Here's an example of how to replicate rows according to a count that is provided in one of the variables:

> foo <- data.frame(id=letters[1:3],cl=LETTERS[1:3],n.yes=c(3,5,2))
> foo
  id cl n.yes
1  a  A     3
2  b  B     5
3  c  C     2
> cbind(foo[rep(1:nrow(foo),foo$n.yes),c('id','cl')],res=rep(1,sum(foo$n.yes)))
    id cl res
1    a  A   1
1.1  a  A   1
1.2  a  A   1
2    b  B   1
2.1  b  B   1
2.2  b  B   1
2.3  b  B   1
2.4  b  B   1
3    c  C   1
3.1  c  C   1

I assumed your third column is the frequency of "yes" events; I don't know where you meant the frequency of "no" events to come from.

-Don

At 11:38 AM -0700 6/20/05, Kevin J Emerson wrote:
> R-masters,
>
> [...]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
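Don's code replicates rows for the "yes" counts only. If the table also carries a count of "no" events, the same rep() idea can be applied twice and the two pieces stacked with rbind(). Below is a minimal sketch, assuming hypothetical count columns n.yes and n.no; expand_counts() is a hypothetical helper written for this illustration, not a function from any package. The last lines check that a glm fit on the unfolded 0/1 data reproduces the grouped fit.

```r
# expand_counts() is a hypothetical helper, not part of any package.
# It turns a frequency table into one row per event, with a 0/1 column `res`.
expand_counts <- function(df, yes, no, keep) {
  # Row indices repeated once per "yes" event and once per "no" event
  idx.yes <- rep(seq_len(nrow(df)), df[[yes]])
  idx.no  <- rep(seq_len(nrow(df)), df[[no]])

  # Copy the identifier columns and attach the binary response
  ones  <- df[idx.yes, keep, drop = FALSE]
  zeros <- df[idx.no,  keep, drop = FALSE]
  ones$res  <- rep(1, nrow(ones))
  zeros$res <- rep(0, nrow(zeros))

  out <- rbind(ones, zeros)
  rownames(out) <- NULL  # drop the 1.1, 1.2, ... names from repeated indexing
  out
}

# Hypothetical frequency table: two identifying factors and two counts
foo <- data.frame(
  id    = factor(c("a", "a", "b", "b")),
  cl    = factor(c("A", "B", "A", "B")),
  n.yes = c(3, 5, 2, 7),
  n.no  = c(7, 7, 7, 4)
)

long <- expand_counts(foo, yes = "n.yes", no = "n.no", keep = c("id", "cl"))
nrow(long)  # 42: one row per event

# The 0/1 fit and the grouped fit estimate the same model
fit.long    <- glm(res ~ id + cl, family = binomial, data = long)
fit.grouped <- glm(cbind(n.yes, n.no) ~ id + cl, family = binomial, data = foo)
all.equal(coef(fit.long), coef(fit.grouped), tolerance = 1e-6)
```

Since glm accepts the grouped form directly, unfolding mostly matters for modelling functions that insist on a 0/1 response vector, as lrm appeared to in Kevin's case.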