giacomo begnis
2015-Jun-24 18:26 UTC
[R] create a dummy variables for companies with complete history.
Hi, I have a dataset (728 obs) containing three variables: the code of a
company, year and revenue. Some companies have a complete history of 5 years;
others do not (for instance, observations for only three or four years). I
would like to identify the companies with a complete history using a dummy
variable. I have written the following program, but there is something wrong
because the dummy variable that I have created is always equal to zero. Can
somebody help me? Thanks, gm
z<-read.table(file="c:/Rp/cddat.txt", sep="", header=T)
attach(z)
n<-length(z$cod)   // number of obs dataset
d1<-numeric(n)     // dummy variable
for (i in 5:n) {
  if (z$cod[i]==z$cod[i-4])   // cod is the code of a company
  { d1[i]<=1} else { d1[i]<=0}   // d1=1 for a company with complete history, d1=0 if the history is not complete
}
d1
When I run the program d1 is always equal to zero. Why?
Once I have created the dummy variable, I use subset to obtain the codes of the
companies with a complete history, and finally use merge to build a panel
of companies with a complete history. But how do I determine d1 correctly? My
best regards, gm
Sarah Goslee
2015-Jun-24 18:49 UTC
[R] create a dummy variables for companies with complete history.
Please repost your question in plain text rather than HTML; as you can see,
your code got rather mangled. Please also include some sample data using
dput(). Made-up data of similar form is fine, but it's very hard to answer a
question based on guessing what the data look like.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
Michael Dewey
2015-Jun-24 19:11 UTC
[R] create a dummy variables for companies with complete history.
Comments below

On 24/06/2015 19:26, giacomo begnis wrote:
> n<-length(z$cod) // number of obs dataset

Could also use nrow(z)

> d1<-numeric(n) // dummy variable
>
> for (i in 5:n) {
> if (z$cod[i]==z$cod[i-4]) // cod is the code of a company
> { d1[i]<=1} else { d1[i]<=0} // d1=1 for a company with complete history, d1=0 if the history is not complete
> }
> d1

Did you really type <=, which means less than or equal to? If so, try
replacing it with <- and see what happens.

--
Michael
http://www.dewey.myzen.co.uk/home.html
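The difference between <= and <- can be seen in a minimal sketch. The company codes below are made up for illustration and are assumed to be sorted by company, as the loop requires; d1_compare and d1_assign are hypothetical names.

```r
# Hypothetical company codes, sorted: A has 5 rows, B has 3, C has 5
cod <- c(rep("A", 5), rep("B", 3), rep("C", 5))
n <- length(cod)

# As posted: `<=` only compares d1[i] with 1 and discards the result
d1_compare <- numeric(n)
for (i in 5:n) {
  if (cod[i] == cod[i - 4]) { d1_compare[i] <= 1 } else { d1_compare[i] <= 0 }
}

# With `<-` the value is actually stored
d1_assign <- numeric(n)
for (i in 5:n) {
  if (cod[i] == cod[i - 4]) { d1_assign[i] <- 1 } else { d1_assign[i] <- 0 }
}

print(d1_compare)  # all zeros, nothing was ever assigned
print(d1_assign)   # 1 only at positions 5 and 13
```

Note that even with the assignment fixed, this loop flags only the fifth row of each complete company, not all of its rows, and it depends on the rows being sorted by company.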
David L Carlson
2015-Jun-24 20:36 UTC
[R] create a dummy variables for companies with complete history.
You may want to consider another way of getting your answer that takes
advantage of some of R's features:

> # Make some example data
> cods <- LETTERS[1:10] # Ten companies
> yrs <- 2010:2014 # 5 years
> set.seed(42) # Set random seed so we all get the same values
> # Chances of revenue for a given year are 95%
> rev <- round(rbinom(50, 1, .95)*runif(50, 25, 50), 2)
> z <- data.frame(expand.grid(year=yrs, cod=cods)[, 2:1], rev)
> # Remove years with missing (0) revenue
> z <- z[z$rev > 1, ]
> str(z)
'data.frame':   45 obs. of  3 variables:
 $ cod : Factor w/ 10 levels "A","B","C","D",..: 1 1 1 1 1 2 2 2 2 2 ...
 $ year: int  2010 2011 2012 2013 2014 2010 2011 2012 2013 2014 ...
 $ rev : num  33.3 33.7 35 44.6 26 ...
>
> # Construct the dummy variable
> tbl <- xtabs(~cod+year, z)
> tbl
   year
cod 2010 2011 2012 2013 2014
  A    1    1    1    1    1
  B    1    1    1    1    1
  C    1    1    1    1    1
  D    1    0    1    1    1
  E    1    1    0    1    1
  F    1    1    1    1    1
  G    1    1    1    1    1
  H    1    1    1    1    1
  I    1    1    1    0    1
  J    0    1    1    0    1
> dummy <- as.integer(apply(tbl, 1, all))
> dummy
 [1] 1 1 1 0 0 1 1 1 0 0

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
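The dummy above has one entry per company rather than one per observation. A minimal sketch of carrying it back to the rows and keeping only the complete companies might look like the following. The toy data (three companies, three years) and the names d1 and panel are assumptions for illustration, and `tbl > 0` is used in place of the raw counts so that the test is explicitly logical.

```r
# Hypothetical toy data: company B is missing 2012
z <- data.frame(
  cod  = c("A", "A", "A", "B", "B", "C", "C", "C"),
  year = c(2010, 2011, 2012, 2010, 2011, 2010, 2011, 2012),
  rev  = c(30.1, 28.4, 41.0, 35.2, 26.7, 44.9, 39.3, 27.5)
)

# One row per company, one column per year, counting observations
tbl <- xtabs(~ cod + year, z)

# Per-company dummy: 1 only if every year has at least one observation
dummy <- as.integer(apply(tbl > 0, 1, all))
names(dummy) <- rownames(tbl)

# Look up each row's company to get a row-level dummy
z$d1 <- unname(dummy[as.character(z$cod)])

# Balanced panel: keep only companies with a complete history
panel <- z[z$d1 == 1, ]
print(panel)
```

This avoids the separate subset and merge steps, since the flag already sits on every row of z.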
Mark Sharp
2015-Jun-24 21:00 UTC
[R] create a dummy variables for companies with complete history.
Giacomo,
Please include some representative data. It is not clear why your offset of 4
(z$cod[i - 4]) is going to be an accurate surrogate for complete data.
Since I do not have your data set or its true structure I am having to guess.
# make 5 copies of 200 companies
companies <- paste0(rep(LETTERS[1:4], 5, each = 50), rep(1:50, 5))
companies <- companies[order(companies)]
years <- rep(1:5, 200)
z <- data.frame(cod = companies, year = years,
revenue = round(rnorm(1000, mean = 100000, sd = 10000)))
# trim this down to the 728 rows you have by pulling out records at random
set.seed(1) # so that you can repeat these results
z <- z[sample.int(1000, 728), ]
z <- z[order(z$cod, z$year), ]
# No matter how you order these data, your offset approach will not tell you
# which companies have full records.
> head(z, 10)
   cod year revenue
1   A1    1  112192
2   A1    2  105840
4   A1    4  112357
5   A1    5   91772
7  A10    2  102601
8  A10    3  105183
11 A11    1  101269
12 A11    2  100719
14 A11    4   86138
15 A11    5  105044
#You can do something like the following.
counts <- table(z$cod)
complete <- names(counts[as.integer(counts) == 5])
# It is probably better to keep the dummy variable inside the dataframe.
z$complete <- z$cod %in% complete
> head(z, 20)
   cod year revenue complete
1   A1    1  112192    FALSE
2   A1    2  105840    FALSE
4   A1    4  112357    FALSE
5   A1    5   91772    FALSE
7  A10    2  102601    FALSE
8  A10    3  105183    FALSE
11 A11    1  101269    FALSE
12 A11    2  100719    FALSE
14 A11    4   86138    FALSE
15 A11    5  105044    FALSE
20 A12    5   95872    FALSE
21 A13    1   78513     TRUE
22 A13    2   90502     TRUE
23 A13    3  108683     TRUE
24 A13    4  110711     TRUE
25 A13    5   87842     TRUE
28 A14    3   99939    FALSE
30 A14    5  111289    FALSE
31 A15    1  100930    FALSE
32 A15    2   93765    FALSE
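Counting rows per company equals counting years only when no company-year pair is duplicated, which holds by construction in the example data above. If that cannot be assumed for the real file, a sketch that counts distinct years instead could look like this. The toy data and the names n_years, years_seen and panel are hypothetical.

```r
# Hypothetical toy data: A1 has all 5 years, A2 has 4 years,
# A3 has 5 rows but year 2 is duplicated and year 3 is missing
z <- data.frame(
  cod  = c(rep("A1", 5), rep("A2", 4), rep("A3", 5)),
  year = c(1:5, c(1, 2, 4, 5), c(1, 2, 2, 4, 5))
)
z$revenue <- seq(100, by = 10, length.out = nrow(z))  # placeholder values

n_years <- 5  # length of a complete history

# Number of distinct years observed per company, repeated on every row
years_seen <- ave(z$year, z$cod, FUN = function(y) length(unique(y)))
z$complete <- years_seen == n_years

# Panel of companies with a complete history
panel <- z[z$complete, ]
print(unique(as.character(panel$cod)))
```

Here A3 would pass a plain row count of 5 but is correctly excluded because it covers only four distinct years.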
Do not use HTML. Use plain text. The character string "//" is not a
comment indicator in R; use "#" instead. Do not use attach(). It does not do
anything in your example, but it is poor practice. Always write out TRUE and
FALSE rather than T and F.
R. Mark Sharp, Ph.D.
msharp at TxBiomed.org
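Two of the points above can be checked directly in a short sketch. The string passed to parse() is a made-up line in the style of the original post.

```r
# `//` is not a comment marker in R, so this line fails to parse
parsed <- tryCatch(
  parse(text = "n <- 5 // number of obs"),
  error = function(e) "syntax error"
)
print(parsed)

# T is an ordinary variable that can be overwritten; TRUE is a reserved word
T <- 0
print(isTRUE(T))     # FALSE after the reassignment
print(isTRUE(TRUE))  # TRUE cannot be reassigned
rm(T)                # remove the masking variable again
```

Because T and F can be silently redefined by any earlier line of a script, header=TRUE is the safer spelling of header=T.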