I reread the data and used 'na.rm = T' when reading it. This time there is no such problem. It seems that the existence of NAs converts the integer column to a factor. Thanks for your help.

On Tue, Sep 20, 2016 at 4:09 PM, Jianling Fan <fanjianling at gmail.com> wrote:
> Add "stringsAsFactors = F" when you read the data, and then
> convert them to numeric.
>
> On 20 September 2016 at 16:00, lily li <chocold12 at gmail.com> wrote:
> > Yes, it is stored as a factor. I can't find any problem in the original
> > data, and rereading the data doesn't help either. I use read.csv to read
> > in the data; do you think it is better to use read.table? Thanks again.
> >
> > On Tue, Sep 20, 2016 at 3:55 PM, Greg Snow <538280 at gmail.com> wrote:
> >
> >> This indicates that your Discharge column has been stored/converted as
> >> a factor (run str(df) to verify and check other columns). This
> >> usually happens when functions like read.table are left to figure out
> >> what each column is and find something in that column that cannot be
> >> converted to a number (possibly an oh instead of a zero, an el instead
> >> of a one, or just a letter or punctuation mark accidentally in the
> >> file). You can either find the error in your original data, fix it,
> >> and reread the data, or specify that the column should be numeric
> >> using the colClasses argument to read.table or another function.
> >>
> >> On Tue, Sep 20, 2016 at 3:46 PM, lily li <chocold12 at gmail.com> wrote:
> >> > Hi R users,
> >> >
> >> > I have a problem reading data.
> >> > For example, part of my data frame looks like this:
> >> >
> >> > df
> >> > month day year Discharge
> >> >     3   1 2010 6.4
> >> >     3   2 2010 7.58
> >> >     3   3 2010 6.82
> >> >     3   4 2010 8.63
> >> >     3   5 2010 8.16
> >> >     3   6 2010 7.58
> >> >
> >> > Then if I type summary(df), why does it convert the Discharge data to
> >> > levels? I also ran into the same problem when reading some other csv
> >> > files. How can I solve this problem? Thanks.
> >> >
> >> > Discharge
> >> > 7.58 :2
> >> > 6.4  :1
> >> > 6.82 :1
> >> > 8.63 :1
> >> > 8.16 :1
> >> >
> >> > ______________________________________________
> >> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >>
> >> --
> >> Gregory (Greg) L. Snow Ph.D.
> >> 538280 at gmail.com
>
> --
> Jianling Fan
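Greg Snow's diagnosis can be illustrated with a minimal sketch. The CSV text below is hypothetical (it is not lily's file): it plants the letter O in place of a zero in one Discharge value. Reading with stringsAsFactors = FALSE keeps the raw strings, and the entries that are present but do not parse as numbers can then be listed directly. Note that since R 4.0.0, read.csv already defaults to stringsAsFactors = FALSE, so in current R such a column comes back as character rather than factor.

```r
# Hypothetical CSV text: "6.O2" contains the letter O instead of a zero
csv_text <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,7.58
3,3,2010,6.O2
3,4,2010,8.63"

# Keep the raw strings so nothing is silently turned into a factor
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(df)  # Discharge is reported as chr, not num

# Flag entries that are present but cannot be parsed as numbers
parsed <- suppressWarnings(as.numeric(df$Discharge))
bad <- !is.na(df$Discharge) & is.na(parsed)
df[bad, ]  # shows the row holding "6.O2"
```

Once the flagged values are corrected in the source file, rereading it should give a numeric Discharge column.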
I suppose you can do what works for your data, but I wouldn't recommend na.rm=TRUE because it hides problems rather than clarifying them.

If your data does in fact include true NA values (the letters NA, or simply nothing between the commas, are typical ways this information may be indicated), then read.csv will NOT change the column from integer to factor (particularly if you have specified which markers represent NA using the na.strings argument documented under read.table)... so you probably DO still have unexpected garbage in your data, which could be obscuring valuable information that could affect your conclusions.
-- 
Sent from my phone. Please excuse my brevity.

On September 20, 2016 3:11:42 PM PDT, lily li <chocold12 at gmail.com> wrote:
>I reread the data and used 'na.rm = T' when reading it. This time there
>is no such problem. It seems that the existence of NAs converts the
>integer column to a factor. Thanks for your help.
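Jeff Newmiller's point about genuine missing values can be checked with another hypothetical snippet. A blank field and the literal NA are both treated as missing by default, so Discharge stays numeric. A non-default marker (a dash is assumed here purely for illustration) keeps the column numeric only if it is declared through na.strings.

```r
# Hypothetical CSV: one blank field and one "NA", both default missing markers
csv_na <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,
3,3,2010,NA
3,4,2010,8.63"
df_na <- read.csv(text = csv_na)
str(df_na)  # Discharge is num, with two NAs

# A non-default marker such as "-" would otherwise make the column non-numeric;
# declaring it in na.strings keeps Discharge numeric
csv_dash <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,-
3,3,2010,8.63"
df_dash <- read.csv(text = csv_dash, na.strings = c("NA", "", "-"))
str(df_dash)  # Discharge is num, with one NA
```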
Thanks. Then what should I do to solve the problem?

On Tue, Sep 20, 2016 at 4:30 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> I suppose you can do what works for your data, but I wouldn't recommend
> na.rm=TRUE because it hides problems rather than clarifying them.
>
> If your data does in fact include true NA values (the letters NA, or simply
> nothing between the commas, are typical ways this information may be
> indicated), then read.csv will NOT change the column from integer to factor
> (particularly if you have specified which markers represent NA using the
> na.strings argument documented under read.table)... so you probably DO
> still have unexpected garbage in your data, which could be obscuring
> valuable information that could affect your conclusions.
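As for what to do next, the thread offers two routes, sketched below on the same hypothetical typo ("6.O2"). One is to declare the column types with colClasses, so that a stray value stops read.csv with an error instead of silently changing the column type. The other applies if the column has already become a factor: convert it through its labels, because as.numeric() on a factor returns the internal level codes rather than the printed values.

```r
# Hypothetical CSV text with the same typo as before ("6.O2")
csv_text <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,7.58
3,3,2010,6.O2
3,4,2010,8.63"

# Route 1: force the column types; the bad entry makes read.csv stop,
# and the error message is captured here instead of halting the script
res <- tryCatch(
  read.csv(text = csv_text,
           colClasses = c("integer", "integer", "integer", "numeric")),
  error = function(e) conditionMessage(e)
)
print(res)

# Route 2: convert an existing factor via its labels
f <- factor(c("6.4", "7.58", "6.82", "7.58"))
as.numeric(f)                # level codes: 1 3 2 3
as.numeric(as.character(f))  # actual values: 6.40 7.58 6.82 7.58
```

Route 1 follows Greg Snow's colClasses suggestion; Route 2 is one way to follow Jianling Fan's "convert them to numeric" advice without picking up the level codes.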