giacomo begnis
2015-Jun-24 18:26 UTC
[R] create a dummy variables for companies with complete history.
Hi,

I have a dataset (728 obs) containing three variables: company code, year and revenue. Some companies have a complete history of 5 years; others do not (for instance, they have observations for only three or four years). I would like to identify the companies with a complete history using a dummy variable. I have written the following program, but there is something wrong, because the dummy variable that I create is always equal to zero. Can somebody help me?

Thanks, gm

z<-read.table(file="c:/Rp/cddat.txt", sep="", header=T)
attach(z)
n<-length(z$cod)    // number of obs dataset
d1<-numeric(n)      // dummy variable
for (i in 5:n) {
  if (z$cod[i]==z$cod[i-4])    // cod is the code of a company
    { d1[i]<=1} else { d1[i]<=0}    // d1=1 for a company with complete history, d1=0 if the history is not complete
}
d1

When I run the program, d1 is always equal to zero. Why?

Once I have created the dummy variable, I use subset to obtain the codes of the companies with a complete history, and finally, with a merge, I build a panel of companies with a complete history. But how do I determine d1 correctly?

My best regards, gm

[[alternative HTML version deleted]]
Sarah Goslee
2015-Jun-24 18:49 UTC
[R] create a dummy variables for companies with complete history.
Please repost your question in plain text rather than HTML - you can see below that your code got rather mangled.

Please also include some sample data using dput() - made-up data of similar form is fine, but it's very hard to answer a question based on guessing what the data look like.

Sarah

On Wed, Jun 24, 2015 at 2:26 PM, giacomo begnis <gmbegnis at yahoo.it> wrote:
> Hi, I have a dataset (728 obs) containing three variables code of a company, year and revenue. Some companies have a complete history of 5 years, others have not a complete history (for instance observations for three or four years).I would like to determine the companies with a complete history using a dummy variables.I have written the following program but there is somehting wrong because the dummy variable that I have create is always equal to zero.Can somebody help me?Thanks, gm
>
> z<-read.table(file="c:/Rp/cddat.txt", sep="", header=T)
> attach(z)
> n<-length(z$cod) // number of obs dataset
>
> d1<-numeric(n) // dummy variable
>
> for (i in 5:n) {
> if (z$cod[i]==z$cod[i-4]) // cod is the code of a company { d1[i]<=1} else { d1[i]<=0} // d1=1 for a company with complete history, d1=0 if the history is not complete }d1
> When I run the program d1 is always equal to zero. Why?
> Once I have create the dummy variable with subset I obtains the code of the companies with a complete history and finally with a merge I determine a panel of companies with a complete history.But how to determine correctly d1?My best regards, gm
>
> [[alternative HTML version deleted]]

--
Sarah Goslee
http://www.functionaldiversity.org
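A minimal sketch of what such a dput() example could look like. The values are invented, the data frame name example_data is hypothetical, and the column names cod, year and revenue are only assumed from the description in the question:

```r
# Made-up data in the shape described in the question:
# company code, year and revenue (all values are invented).
example_data <- data.frame(
  cod     = c("A", "A", "A", "A", "A", "B", "B", "B"),
  year    = c(2010L, 2011L, 2012L, 2013L, 2014L, 2010L, 2012L, 2014L),
  revenue = c(10.5, 11.2, 12.0, 12.8, 13.1, 7.4, 8.0, 8.3),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object;
# that printed text is what goes into a plain-text message.
dput(example_data)

# Round-trip check: evaluating the deparsed text rebuilds the data frame.
rebuilt <- eval(parse(text = paste(deparse(example_data), collapse = "\n")))
all.equal(rebuilt, example_data)
```

For a large data set, something like dput(head(z, 20)) keeps the message short while still showing the structure.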
Michael Dewey
2015-Jun-24 19:11 UTC
[R] create a dummy variables for companies with complete history.
Comments below

On 24/06/2015 19:26, giacomo begnis wrote:
> Hi, I have a dataset (728 obs) containing three variables code of a company, year and revenue. Some companies have a complete history of 5 years, others have not a complete history (for instance observations for three or four years).I would like to determine the companies with a complete history using a dummy variables.I have written the following program but there is somehting wrong because the dummy variable that I have create is always equal to zero.Can somebody help me?Thanks, gm
>
> z<-read.table(file="c:/Rp/cddat.txt", sep="", header=T)
> attach(z)
> n<-length(z$cod) // number of obs dataset

Could also use nrow(z)

> d1<-numeric(n) // dummy variable
>
> for (i in 5:n) {
> if (z$cod[i]==z$cod[i-4]) // cod is the code of a company{ d1[i]<=1} else { d1[i]<=0} // d1=1 for a company with complete history, d1=0 if the history is not complete }d1

Did you really type <= which means less than or equal to? If so, try replacing it with <- and see what happens.

> When I run the program d1 is always equal to zero. Why?
> Once I have create the dummy variable with subset I obtains the code of the companies with a complete history and finally with a merge I determine a panel of companies with a complete history.But how to determine correctly d1?My best regards, gm
>
> [[alternative HTML version deleted]]

--
Michael
http://www.dewey.myzen.co.uk/home.html
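To illustrate the difference between the two operators, here is a small sketch on made-up company codes (the vectors cod and flag are hypothetical stand-ins for z$cod and d1, and the codes are assumed to be sorted by company with one row per year):

```r
d1 <- numeric(3)

# `<=` is a comparison: this evaluates 0 <= 1, the result is
# discarded, and d1 is left unchanged.
d1[2] <= 1
d1_after_comparison <- d1

# `<-` is assignment: this stores 1 in the second element.
d1[2] <- 1
d1_after_assignment <- d1

# The loop from the question with the operator fixed and the
# "//" comments replaced by "#", run on made-up codes.
cod <- c("A", "A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C")
n <- length(cod)
flag <- numeric(n)
for (i in 5:n) {
  if (cod[i] == cod[i - 4]) {
    flag[i] <- 1   # same company four rows earlier
  } else {
    flag[i] <- 0
  }
}
flag
```

With these codes, flag is 1 only on the fifth row of companies A and C, so the fixed loop marks a single row per complete company rather than every row of it.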
David L Carlson
2015-Jun-24 20:36 UTC
[R] create a dummy variables for companies with complete history.
You may want to consider another way of getting your answer that takes advantage of some of R's features:

> # Make some example data
> cods <- LETTERS[1:10] # Ten companies
> yrs <- 2010:2014 # 5 years
> set.seed(42) # Set random seed so we all get the same values
> # Chances of revenue for a given year are 95%
> rev <- round(rbinom(50, 1, .95)*runif(50, 25, 50), 2)
> z <- data.frame(expand.grid(year=yrs, cod=cods)[, 2:1], rev)
> # Remove years with missing (0) revenue
> z <- z[z$rev > 1, ]
> str(z)
'data.frame':   45 obs. of  3 variables:
 $ cod : Factor w/ 10 levels "A","B","C","D",..: 1 1 1 1 1 2 2 2 2 2 ...
 $ year: int  2010 2011 2012 2013 2014 2010 2011 2012 2013 2014 ...
 $ rev : num  33.3 33.7 35 44.6 26 ...
>
> # Construct the dummy variable
> tbl <- xtabs(~cod+year, z)
> tbl
   year
cod 2010 2011 2012 2013 2014
  A    1    1    1    1    1
  B    1    1    1    1    1
  C    1    1    1    1    1
  D    1    0    1    1    1
  E    1    1    0    1    1
  F    1    1    1    1    1
  G    1    1    1    1    1
  H    1    1    1    1    1
  I    1    1    1    0    1
  J    0    1    1    0    1
> dummy <- as.integer(apply(tbl, 1, all))
> dummy
 [1] 1 1 1 0 0 1 1 1 0 0

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
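The table-based approach yields one value per company, while the original question asks for a dummy on every row and then a panel of complete companies. A possible sketch of that last step, on made-up data (the column name d1 follows the question; the object name panel and all values are assumptions for illustration):

```r
# Made-up panel: company B is missing 2012, so it is incomplete.
z <- data.frame(
  cod  = c(rep("A", 5), rep("B", 4), rep("C", 5)),
  year = c(2010:2014, c(2010, 2011, 2013, 2014), 2010:2014),
  rev  = c(30, 31, 32, 33, 34, 20, 21, 23, 24, 40, 41, 42, 43, 44)
)

# Company-by-year table of counts.
tbl <- xtabs(~ cod + year, data = z)

# One 0/1 value per company; setNames() keeps the company codes
# attached so the vector can be used as a lookup table.
dummy <- setNames(as.integer(apply(tbl > 0, 1, all)), rownames(tbl))

# Look up each row's company to get a row-level dummy, then keep
# only the rows of companies with a full history.
z$d1 <- dummy[as.character(z$cod)]
panel <- z[z$d1 == 1, ]
```

Indexing the named vector by company code does the job of the subset-and-merge step described in the question without a separate merge.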
Mark Sharp
2015-Jun-24 21:00 UTC
[R] create a dummy variables for companies with complete history.
Giacomo,

Please include some representative data. It is not clear why your offset of 4 (z$cod[i - 4]) is going to be an accurate surrogate for complete data. Since I do not have your data set or its true structure I am having to guess.

# make 5 copies of 200 companies
companies <- paste0(rep(LETTERS[1:4], 5, each = 50), rep(1:50, 5))
companies <- companies[order(companies)]
years <- rep(1:5, 200)
z <- data.frame(cod = companies, year = years,
                revenue = round(rnorm(1000, mean = 100000, sd = 10000)))
# trim this down to the 728 rows you have by pulling out records at random
set.seed(1) # so that you can repeat these results
z <- z[sample.int(1000, 728), ]
z <- z[order(z$cod, z$year), ]
# No matter how you order these data, your offset approach will not tell you which companies have full records.
> head(z, 10)
   cod year revenue
1   A1    1  112192
2   A1    2  105840
4   A1    4  112357
5   A1    5   91772
7  A10    2  102601
8  A10    3  105183
11 A11    1  101269
12 A11    2  100719
14 A11    4   86138
15 A11    5  105044

# You can do something like the following.
counts <- table(z$cod)
complete <- names(counts[as.integer(counts) == 5])
# It is probably better to keep the dummy variable inside the dataframe.
z$complete <- ifelse(z$cod %in% complete, TRUE, FALSE)
> head(z, 20)
   cod year revenue complete
1   A1    1  112192    FALSE
2   A1    2  105840    FALSE
4   A1    4  112357    FALSE
5   A1    5   91772    FALSE
7  A10    2  102601    FALSE
8  A10    3  105183    FALSE
11 A11    1  101269    FALSE
12 A11    2  100719    FALSE
14 A11    4   86138    FALSE
15 A11    5  105044    FALSE
20 A12    5   95872    FALSE
21 A13    1   78513     TRUE
22 A13    2   90502     TRUE
23 A13    3  108683     TRUE
24 A13    4  110711     TRUE
25 A13    5   87842     TRUE
28 A14    3   99939    FALSE
30 A14    5  111289    FALSE
31 A15    1  100930    FALSE
32 A15    2   93765    FALSE

Do not use HTML. Use plain text.
The character string "//" is not a comment indicator in R.
Do not use attach(). It does not do anything in your example, but it is poor practice.
Always write out TRUE and FALSE.

R. Mark Sharp, Ph.D.
msharp at TxBiomed.org
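Counting rows per company treats five rows as five years, which holds only if no company-year appears twice. If duplicates are possible in the real data (an assumption, since the actual file was never posted), a variant that counts distinct years might look like this; all values and the name n_years are made up for illustration:

```r
# Made-up data: company B has five rows but only four distinct years
# (2011 appears twice), so a plain row count would call it complete.
z <- data.frame(
  cod  = c(rep("A", 5), rep("B", 5), rep("C", 3)),
  year = c(2010:2014, 2010, 2011, 2011, 2013, 2014, 2010, 2012, 2014)
)

# Number of distinct years observed for each company,
# returned as an integer vector named by company code.
n_years <- vapply(split(z$year, z$cod),
                  function(y) length(unique(y)),
                  integer(1))

# Row-level flag: TRUE when the company covers all five years.
z$complete <- unname(n_years[as.character(z$cod)] == 5L)
```

Here only company A is flagged as complete; B is excluded despite having five rows.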