tom soyer
2007-Nov-28 18:27 UTC
[R] How to create data frame from data with unequal length
Hi,

I have two sets of data that I would like to put into a data frame. But since they have different lengths, I am not sure how to do this. Here is an example of my data:

data set one:
date        growth
1/1/2007    10
1/2/2007    10.2
1/3/2007    10.4
1/4/2007    10.6

data set two:
date        growth
1/1/2007    22
1/2/2007    22.5
1/4/2007    22.4

I would like to combine the two data sets and create a data frame like this:

date        growthA    growthB
1/1/2007    10         22
1/2/2007    10.2       22.5
1/3/2007    10.4       NA
1/4/2007    10.6       22.4

Or skipping the missing data point altogether, like this:

date        growthA    growthB
1/1/2007    10         22
1/2/2007    10.2       22.5
1/4/2007    10.6       22.4

Right now I am doing this by hand, and it is really time consuming. I am wondering if there is an easier way of creating data frames from data of unequal length using existing R functions. Is there a way to create data of equal length based on the date column? I would appreciate any help from the group.

Thanks,

--
Tom
Matthew Keller
2007-Nov-28 18:33 UTC
[R] How to create data frame from data with unequal length
Tom,

Check out ?merge. Does exactly what you need.

Matt

--
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com
Peter Dalgaard
2007-Nov-28 18:38 UTC
[R] How to create data frame from data with unequal length
I'd have a look at merge() if I were you.

--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B
PO Box 2099, 1014 Cph. K, Denmark
(p.dalgaard at biostat.ku.dk)
Ph: (+45) 35327918
FAX: (+45) 35327907
Henrique Dallazuanna
2007-Nov-28 18:40 UTC
[R] How to create data frame from data with unequal length
Try this:

    # keep every date found in either data frame (gaps are filled with NA)
    merge(df1, df2, by.x = 1, by.y = 1, all = TRUE)

    # keep only the dates present in both data frames
    merge(df1, df2, by.x = 1, by.y = 1)

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
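For readers who want to run the suggestion end to end, here is a minimal sketch. The data frames `df1` and `df2` are rebuilt by hand from the numbers in the original post, and the `suffixes = c("A", "B")` argument is an addition (not part of the reply above) used only to get the `growthA` / `growthB` column names Tom asked for.

```r
# Hypothetical reconstruction of the two data sets from the original post.
df1 <- data.frame(
  date   = c("1/1/2007", "1/2/2007", "1/3/2007", "1/4/2007"),
  growth = c(10, 10.2, 10.4, 10.6)
)
df2 <- data.frame(
  date   = c("1/1/2007", "1/2/2007", "1/4/2007"),
  growth = c(22, 22.5, 22.4)
)

# Outer join: every date from either data frame, NA where one side is missing.
both_all <- merge(df1, df2, by = "date", all = TRUE, suffixes = c("A", "B"))

# Inner join (the default): only dates that occur in both data frames.
both_common <- merge(df1, df2, by = "date", suffixes = c("A", "B"))

print(both_all)
print(both_common)
```

The dates are kept as character strings here, so `merge()` matches them by exact text. For real data it is probably safer to convert them with `as.Date()` first, so that differently formatted but equal dates still match and the result sorts chronologically.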
Malte Brockmann
2007-Nov-28 20:45 UTC
[R] Problem using Tobit models in R (Testing and controlling for distributional assumptions and endogeneity)
Dear R-Community,

I am currently using Tobit models (survreg in the survival package).

1a) Does R provide a straightforward way to test distributional assumptions for Tobit models?

1b) If not: I tried to apply the Hausman test proposed in Newey (1987), Journal of Econometrics, to the Tobit estimator and the symmetrically censored least squares (SCLS) estimator proposed by Powell (1986) (quantreg package). Unfortunately, quantreg only provides covariance matrices based on the bootstrap which are not positive semi-definite, therefore the Hausman test statistic based on the difference between both covariance matrices can be negative. Newey proposes 2 ways to calculate positive semi-definite covariance matrices: Is there a way to implement any of these without manually coding (or adapting) the Tobit and SCLS estimation procedures to extract the necessary information needed for the estimation (first derivative of loglik w.r.t. theta, etc.)?

2) I apply the test for endogeneity proposed by Smith and Blundell (1986), Econometrica, and one of my variables turns out to be endogenous. Does R have a package for simultaneous equations with censored dependent variables? As far as I know, the sem package does estimate these types of equations.

Thanks in advance
Malte
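For context, the usual way to fit a Tobit model with `survreg` is to declare the outcome as left-censored and request a Gaussian error distribution. The sketch below uses simulated data; the variable names, the censoring point of 0, and the true coefficients are all assumptions made for illustration and do not come from Malte's analysis.

```r
library(survival)

set.seed(42)
n <- 500
x <- rnorm(n)
y_star <- 1 + 2 * x + rnorm(n)  # latent (uncensored) outcome, simulated
y <- pmax(y_star, 0)            # observed outcome, left-censored at 0

# In Surv(), the second argument is TRUE for fully observed values and
# FALSE for censored ones; type = "left" marks the censoring direction.
fit <- survreg(Surv(y, y > 0, type = "left") ~ x, dist = "gaussian")

coef(fit)   # estimates of the latent-model intercept and slope
fit$scale   # estimate of the error standard deviation
```

The Gaussian `dist` is exactly the distributional assumption that question 1a asks how to test; refitting with another `dist` value changes the assumed error distribution but is not by itself a formal test.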
roger koenker
2007-Nov-28 21:14 UTC
[R] Problem using Tobit models in R (Testing and controlling for distributional assumptions and endogeneity)
url:    www.econ.uiuc.edu/~roger        Roger Koenker
email   rkoenker at uiuc.edu             Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820

On Nov 28, 2007, at 2:45 PM, Malte Brockmann wrote:

> 1b) If not: I tried to apply the Hausman-test proposed in Newey
> (1987), Journal of Econometrics, on the Tobit estimator and the
> symmetrically censored least squares estimator proposed by Powell
> (1986) (quantreg package).

This "symmetrically censored least squares estimator" is NOT what is computed by the quantreg package. What is computed is the Powell quantile regression estimator.

> Unfortunately, quantreg only provides covariance matrices based on
> the bootstrap which are not positive semi-definite,

The bootstrapped covariance provided by quantreg is the usual sample covariance matrix of the bootstrapped realizations and is therefore necessarily positive semi-definite. Perhaps what you meant to say was that the difference between the two covariance matrices that you have computed was not psd; this could easily happen. Nothing ensures that the Powell QR estimate is less efficient than the usual (normal theory) tobit estimator, indeed there are very plausible conditions under which this is not the case.
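The two points in this reply can be checked numerically. The sketch below is illustrative only: the "bootstrap realizations" are random draws standing in for real bootstrap output, and `V1`, `V2`, and `d` are made-up values chosen so the arithmetic is easy to follow.

```r
set.seed(1)

# Stand-in for bootstrap realizations: 200 draws of a 3-parameter estimate.
draws <- matrix(rnorm(200 * 3), ncol = 3)
V_boot <- cov(draws)

# A sample covariance matrix is positive semi-definite, so its smallest
# eigenvalue is non-negative (up to floating-point rounding).
min_eig_boot <- min(eigen(V_boot, symmetric = TRUE)$values)

# Two positive definite matrices whose difference is NOT psd.
V1 <- diag(c(2, 1))
V2 <- diag(c(1, 2))
eig_diff <- eigen(V1 - V2, symmetric = TRUE)$values

# A Hausman-type quadratic form built on that difference can be negative.
d <- c(0, 1)  # hypothetical difference between two coefficient vectors
h <- drop(t(d) %*% solve(V1 - V2) %*% d)

min_eig_boot
eig_diff
h
```

Each input matrix is well behaved on its own; the negative value arises only from the subtraction, which is the situation described above when neither estimator is guaranteed to be the more efficient one.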
Malte Brockmann
2007-Nov-29 08:02 UTC
[R] Problem using Tobit models in R (Testing and controlling for distributional assumptions and endogeneity)
Roger,

thanks for your reply, and especially for pointing out that quantreg does not calculate the SCLS estimator as I thought. You are also certainly right about the covariance matrices; my remark about positive semi-definiteness referred to their difference.

Nevertheless, my main questions remain open: How do I test distributional assumptions and endogeneity for Tobit models? In case I cannot reject endogeneity, how do I model structural equations with censored dependent variables?