james.vordtriede at att.net
2015-Sep-27 07:58 UTC
[R] Variable Class "numeric" instead recognized by dplyr as a 'factor'
Hi--I'm new to R. For a dissertation, my panel data is for 48 Sub-Saharan countries (cross-sectional index='i') over 55 years 1960-2014 (time-series index='t'). The variables read into R from a text file are levels data. The 2SLS regression due to reverse causality will be based on change in the levels data, so will need to difference the data grouped by cross-sectional index 'i'.

There are nearly 50 total variables, but the model essentially will regress the differenced Yit ~ X1it+X2it+X3it+X4it+X5it+X6it, with a dummy variable attached to each of the change-X(s).

Due to missing data, R originally classified each X and Y variable as a 'factor', subsequently changed to 'numeric' via 'as.numeric' command.

However, when I write the following command for dplr solely to difference Yit (=Yit-Yi[t-1]) mutated to new variable dYit, I receive error messages to the effect that Yit and each of the X variables are 'factors'.

    > library (dplr)
    > dt = CSUdata2 %>% group_by (i) %>% (dYit=Yit-lag(Yit))

'CSUdata2' is the object in which the tab-delimited text file dataset is stored.

Questions:

Any idea why dplyr reads the variables as 'factors'? A class(*) command per variable shows R to know each Y and X as 'numeric'.

Is the command to difference Yit done correctly? I plan to use the same command for each variable requiring change until I understand the commands better.

Thank you.

[[alternative HTML version deleted]]
Thierry Onkelinx
2015-Sep-27 19:55 UTC
[R] Variable Class "numeric" instead recognized by dplyr as a 'factor'
I doubt that dplyr is the problem. Have a look at the output of str(CSUdata2); the problem is probably in there.

Sending a reproducible example of the problem makes it easier for us to help you. Note that this list doesn't accept HTML mail. I suggest that you read the posting guide carefully.

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25, 1070 Anderlecht, Belgium

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
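A minimal sketch of the str() check suggested above. The real contents of CSUdata2 are not shown in the thread, so the toy file below, and the use of "." as a missing-value code, are assumptions made purely for illustration.

```r
# Hypothetical stand-in for the poster's tab-delimited file:
# "." marks a missing value in the Yit column.
raw <- "i\tt\tYit\nA\t1960\t1.5\nA\t1961\t.\nA\t1962\t2.5\n"

# stringsAsFactors = TRUE mimics the default of R versions before 4.0.0,
# which is what the poster would have been running in 2015.
CSUdata2 <- read.delim(text = raw, stringsAsFactors = TRUE)

# str() lists the storage type of every column; here Yit shows up as a
# Factor rather than num, which points at the import step, not at dplyr.
str(CSUdata2)

# The same information as a named character vector.
col_classes <- sapply(CSUdata2, class)
print(col_classes)
```

If a column that should be numeric is reported as a factor (or as chr in current R versions), the levels shown by str() usually reveal the offending entries.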
Bert Gunter
2015-Sep-27 20:12 UTC
[R] Variable Class "numeric" instead recognized by dplyr as a 'factor'
I believe you need to spend some time with an R tutorial, as I don't believe you understand what factors are and how they should be used. "Dummy variables" are also almost certainly unnecessary and usually undesirable. A few comments below may help.

Cheers,
Bert

On Sun, Sep 27, 2015 at 12:58 AM, <james.vordtriede at att.net> wrote:
> Due to missing data, R originally classified each X and Y variable as a 'factor', subsequently changed to 'numeric' via 'as.numeric' command.

No.
a) Missing data will not cause numeric data to become factor. There's something wrong in the data from the beginning (as Thierry said).
b) If f is numeric data that is a factor, as.numeric(f) is almost certainly **not** the correct way to change it to numeric. You will get garbage, viz.:

    > f <- runif(5)
    > f
    [1] 0.42568762 0.03105132 0.46606135 0.35251240 0.57303571
    > as.numeric(factor(f))
    [1] 3 1 4 2 5

> >dt = CSUdata2 %>% group_by (i) %>% (dYit=Yit-lag(Yit))
>
> Is the command to difference Yit done correctly? I plan to use the same command for each variable requiring change until I understand the commands better.

Almost certainly not. See ?diff
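A minimal sketch of within-country differencing along the lines of ?diff, using base R only. The toy panel and the column name dYit are assumptions for illustration, not the poster's data. Note also that the posted pipeline appears to be missing a mutate() call around dYit=Yit-lag(Yit); the optional block at the end shows what the dplyr form would presumably look like and runs only if dplyr is installed.

```r
# Hypothetical toy panel: two countries (i), three years (t) each.
panel <- data.frame(
  i   = rep(c("A", "B"), each = 3),
  t   = rep(1960:1962, times = 2),
  Yit = c(1, 3, 6, 10, 15, 21)
)

# Sort by country and year so that differences run in time order.
panel <- panel[order(panel$i, panel$t), ]

# ave() applies the function separately within each country. The first
# year of a country has no predecessor, so its difference is set to NA.
panel$dYit <- ave(panel$Yit, panel$i, FUN = function(y) c(NA, diff(y)))

print(panel)

# Optional dplyr equivalent (assumes the data are already sorted as above).
if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))
  panel2 <- panel %>%
    group_by(i) %>%
    mutate(dYit_dplyr = Yit - lag(Yit)) %>%
    ungroup()
  print(panel2)
}
```

Either form requires Yit to be genuinely numeric; on a factor column the subtraction fails or yields NA with a warning, which is consistent with the messages described in the original post.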
peter dalgaard
2015-Sep-27 20:29 UTC
[R] Variable Class "numeric" instead recognized by dplyr as a 'factor'
> On 27 Sep 2015, at 22:12, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>
>> Due to missing data, R originally classified each X and Y variable as a 'factor', subsequently changed to 'numeric' via 'as.numeric' command.
>
> No.
> a) missing data will not cause numeric data to become factor. There's
> something wrong in the data from the beginning (as Thierry said)

Well, if you forget to tell R what the input code for missing is (na.strings if you use read.table), then that is de facto what happens: the whole column gets interpreted as character and subsequently converted to a factor. The fix is to _remember_ to tell R what missing value codes are being used.

> b) If f is numeric data that is a factor, as.numeric(f) is almost
> certainly **not** the correct way to change it to numeric.

Amen... as.numeric(as.character(f)) if you must, but the proper fix is usually the above.

-pd

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
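A minimal sketch of both fixes mentioned above, assuming a toy tab-delimited file in which "." is the missing-value code. The actual code used in the poster's file is not given in the thread, so the data and the object names bad, good, codes and values are hypothetical.

```r
# Hypothetical tab-delimited data in which "." codes a missing value.
raw <- "i\tt\tYit\nA\t1960\t1.5\nA\t1961\t.\nA\t1962\t2.5\n"

# Without na.strings, the "." forces the whole Yit column to text, which
# becomes a factor when stringsAsFactors = TRUE (the default before R 4.0.0).
bad <- read.delim(text = raw, stringsAsFactors = TRUE)

# Proper fix: declare the missing-value code at import time, so the
# column is read as numeric with NA for the missing entry.
good <- read.delim(text = raw, na.strings = ".", stringsAsFactors = TRUE)

# as.numeric() on a factor returns the internal level codes, not the values.
codes <- as.numeric(bad$Yit)

# Fallback if re-importing is not an option: convert via character.
# The "." cannot be parsed and becomes NA (the warning is suppressed here).
values <- suppressWarnings(as.numeric(as.character(bad$Yit)))

print(class(bad$Yit))
print(class(good$Yit))
print(codes)
print(values)
```

Re-importing with na.strings is usually preferable, because it fixes every affected column at once and leaves no factor columns to convert by hand.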