I'm new to R.

I am trying to create panel data with a slight alteration from a typical dataset. At present, I have data on a few hundred people with the dates of occurrences for several events (like marriage and employment). The dates are in year/quarter format, so 68.0 equals the 1st quarter of 1968 and 68.25 equals the 2nd quarter of 1968. If the event never occurred, 0 is recorded for the Year Of Occurrence. Somewhat redundantly, I also have separate dichotomous variables indicating whether the event ever occurred (0/1 format).

For example:

x <- data.frame(id = c(1,2),
                Event1Occur = c(1,0), YearOfOccurEvent1 = c(68.25,0),
                Event2Occur = c(0,1), YearOfOccurEvent2 = c(0,68.5))

I need to transform that dataframe so that I have a separate row for each time period (year/quarter) for each person, with variables for whether the event had already occurred by that time period. If the event occurred during an earlier time, it is presumed to still be occurring at later times. E.g., if the person got married in the first quarter of 1968, they are presumed to still be married at all later time periods. I need those time periods marked (0/1).

For example:

y <- data.frame(id = c(rep(1,5), rep(2,5)),
                Year = c(68.0,68.25,68.50,68.75,69.0))
y$Event1 <- c(0,1,1,1,1,0,0,0,0,0)
y$Event2 <- c(0,0,0,0,0,0,0,1,1,1)

Can someone get me started?

Thanks,
Jeff
Sounds like you would find it worthwhile to read a good Intro R tutorial -- like the one that comes shipped with R. Have you done so? If not, why not? If so, how about the data import/export manual?

I certainly wouldn't guarantee that these will answer all your questions. They're just places to start BEFORE posting here. Setting up proper data structures can be tricky (have you considered what form the functions/packages with which you are going to analyze the data want?).

You might also find it useful to use Hadley Wickham's plyr and/or reshape2 packages, whose aim is to standardize and simplify data manipulation tasks. Vignettes/tutorials are available for both.

Cheers,
Bert

On Mon, Jul 23, 2012 at 8:21 AM, Jeff <r at jp.pair.com> wrote:
> I'm new to R.
> I am trying to create panel data with a slight alteration from a typical
> dataset.

--
Bert Gunter
Genentech Nonclinical Biostatistics
This looks really ugly but it 'may' do what you want. I was too lazy to generate enough raw data to check. Note I changed the names in x as they were a bit clumsy.

library(reshape2)  # provides melt()

x <- data.frame(id = c(1,2),
                Event1 = c(1,0), YEvent1 = c(68.25,0),
                Event2 = c(0,1), YEvent2 = c(0,68.5))

# target layout from the original post, for comparison
y <- data.frame(id = c(rep(1,5), rep(2,5)),
                Year = c(68.0,68.25,68.50,68.75,69.0))
y$Event1 <- c(0,1,1,1,1,0,0,0,0,0)
y$Event2 <- c(0,0,0,0,0,0,0,1,1,1)

# stack the two year columns; id and the 0/1 indicators stay as identifiers
dd <- melt(x, id.vars = c("id", "Event1", "Event2"), value.name = "year.quarter")
# drop rows where the event never occurred (year recorded as 0)
dd1 <- subset(dd, dd[, 5] != 0)
# keep id, Event1, Event2 and year.quarter
dd1 <- dd1[, c(1,2,3,5)]

John Kane
Kingston ON Canada

> -----Original Message-----
> From: r at jp.pair.com
> Sent: Mon, 23 Jul 2012 11:33:37 -0500
> To: gunter.berton at gene.com
> Subject: Re: [R] Creating panel data
>
> At 10:38 AM 7/23/2012, you wrote:
> >You might also find it useful to use Hadley Wickham's plyr
> >and/or reshape2 packages, whose aim is to standardize and simplify
> >data manipulation tasks.
> >
> >Cheers,
> >Bert
>
> I have already used R enough to have correctly imported the actual
> data. After import, it is in the approximate format of the x
> dataframe I previously posted. I already found the plyr and reshape2
> packages and had assumed that the cast (or dcast) options might be
> the correct ones. Melt seemed to get me only what I already have. The
> examples I have seen thus far start with data in various formats
> and end up in the format that I am starting with. In other words,
> they seem to do the exact opposite of what I'm trying to do. So I'm
> still stuck with how to get started and whether the functions in
> reshape2 are actually the correct ones to consider.
>
> ...still looking for some help on this.
>
> Jeff
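
For what it's worth, one possible way to get from the wide x layout to the long y layout is sketched below in base R, without reshape2. This is not code anyone in the thread posted. It assumes the observation window is 68.0 through 69.0 in quarterly steps (taken from the y example), and the names periods, panel, idx and when are made up for illustration. It builds one row per person per quarter, then marks an event as 1 from its quarter of occurrence onward.

```r
# Wide input as in the original post (with the missing "=" restored).
x <- data.frame(
  id = c(1, 2),
  Event1Occur = c(1, 0), YearOfOccurEvent1 = c(68.25, 0),
  Event2Occur = c(0, 1), YearOfOccurEvent2 = c(0, 68.5)
)

# ASSUMPTION: the panel covers 68.0 to 69.0 by quarter, as in the y example.
periods <- seq(68, 69, by = 0.25)

# One row per person per quarter. expand.grid varies its first argument
# fastest, so each id's quarters end up in consecutive rows.
panel <- expand.grid(Year = periods, id = x$id)[, c("id", "Year")]

# Row of x that belongs to each row of the panel.
idx <- match(panel$id, x$id)

# An event counts as "on" from its quarter of occurrence onward;
# a recorded year of 0 means it never occurred.
for (k in 1:2) {
  when <- x[[paste0("YearOfOccurEvent", k)]][idx]
  panel[[paste0("Event", k)]] <- as.integer(when > 0 & panel$Year >= when)
}

print(panel)
```

With real data the window would probably come from the smallest and largest non-zero dates rather than being hard-coded, and the loop would run over however many events there are. Quarter values such as 68.25 are exact in floating point, so the >= comparison should be safe here; a different period coding (e.g. thirds of a year) would need more care.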