thr3ads.net - R help - [R] about data problem [Sep 2016]


lily li

2016-Sep-20 22:42 UTC

[R] about data problem

Thanks. Then what should I do to solve the problem?

On Tue, Sep 20, 2016 at 4:30 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> I suppose you can do what works for your data, but I wouldn't recommend
> na.rm=TRUE because it hides problems rather than clarifying them.
>
> If in fact your data includes true NA values (the letters NA or simply
> nothing between the commas are typical ways this information may be
> indicated), then read.csv will NOT change from integer to factor
> (particularly if you have specified which markers represent NA using the
> na.strings argument documented under read.table)... so you probably DO have
> unexpected garbage still in your data which could be obscuring valuable
> information that could affect your conclusions.
> --
> Sent from my phone. Please excuse my brevity.
>
> On September 20, 2016 3:11:42 PM PDT, lily li <chocold12 at gmail.com> wrote:
> >I reread the data, and use 'na.rm = T' when reading the data. This time it
> >has no such problem. It seems that the existence of NAs convert the integer
> >to factor. Thanks for your help.
> >
> >
> >On Tue, Sep 20, 2016 at 4:09 PM, Jianling Fan <fanjianling at gmail.com> wrote:
> >
> >> Add the "stringsAsFactors = F"  when you read the data, and then
> >> convert them to numeric.
> >>
> >> On 20 September 2016 at 16:00, lily li <chocold12 at gmail.com> wrote:
> >> > Yes, it is stored as factor. I can't check out any problem in the original
> >> > data. Reread data doesn't help either. I use read.csv to read in the data,
> >> > do you think it is better to use read.table? Thanks again.
> >> >
> >> > On Tue, Sep 20, 2016 at 3:55 PM, Greg Snow <538280 at gmail.com> wrote:
> >> >
> >> >> This indicates that your Discharge column has been stored/converted as
> >> >> a factor (run str(df) to verify and check other columns).  This
> >> >> usually happens when functions like read.table are left to try to
> >> >> figure out what each column is and it finds something in that column
> >> >> that cannot be converted to a number (possibly an oh instead of a
> >> >> zero, an el instead of a one, or just a letter or punctuation mark
> >> >> accidentally in the file).  You can either find the error in your
> >> >> original data, fix it, and reread the data, or specify that the column
> >> >> should be numeric using the colClasses argument to read.table or other
> >> >> function.
> >> >>
> >> >>
> >> >>
> >> >> On Tue, Sep 20, 2016 at 3:46 PM, lily li <chocold12 at gmail.com> wrote:
> >> >> > Hi R users,
> >> >> >
> >> >> > I have a problem in reading data.
> >> >> > For example, part of my dataframe is like this:
> >> >> >
> >> >> > df
> >> >> > month day year          Discharge
> >> >> >    3        1   2010                6.4
> >> >> >    3        2   2010               7.58
> >> >> >    3        3   2010               6.82
> >> >> >    3        4   2010               8.63
> >> >> >    3        5   2010               8.16
> >> >> >    3        6   2010               7.58
> >> >> >
> >> >> > Then if I type summary(df), why it converts the discharge data to levels? I
> >> >> > also met the same problem when reading some other csv files. How to solve
> >> >> > this problem? Thanks.
> >> >> >
> >> >> > Discharge
> >> >> > 7.58     :2
> >> >> > 6.4       :1
> >> >> > 6.82     :1
> >> >> > 8.63     :1
> >> >> > 8.16     :1
> >> >> >
> >> >> >         [[alternative HTML version deleted]]
> >> >> >
> >> >> > ______________________________________________
> >> >> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> >> > and provide commented, minimal, self-contained, reproducible code.
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Gregory (Greg) L. Snow Ph.D.
> >> >> 538280 at gmail.com
> >> >>
> >> --
> >> Jianling Fan
	[[alternative HTML version deleted]]
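
The two read-time suggestions quoted above (Greg Snow's colClasses and Jeff Newmiller's na.strings) can be sketched roughly as follows. The inline CSV text and the "-" missing-value marker are made up for illustration; they are not taken from the original poster's file. Note that in R 4.0.0 and later stringsAsFactors defaults to FALSE, so an unparseable column comes back as character rather than factor.

```r
# Hypothetical CSV contents; "-" stands in for a site-specific missing-value marker.
csv_text <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,7.58
3,3,2010,-
3,4,2010,8.63"

# Left to guess, read.csv cannot treat "-" as a number, so Discharge is not
# numeric (factor in R < 4.0, character in R >= 4.0).
raw <- read.csv(text = csv_text)
str(raw)

# Declaring the NA marker and the column classes up front keeps Discharge numeric.
dta <- read.csv(text = csv_text,
                na.strings = c("NA", "-"),
                colClasses = c("integer", "integer", "integer", "numeric"))
str(dta)
```

With colClasses alone (no matching na.strings entry), an unparseable value makes the read stop with an error instead of quietly changing the column type, which is one way to surface the problem early.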

lily li

2016-Sep-20 22:56 UTC

[R] about data problem

Is there an argument to read.csv that I can use to avoid converting numeric
to factor? Thanks a lot.




Joe Ceradini

2016-Sep-20 23:06 UTC

[R] about data problem

read.csv("your_data.csv", stringsAsFactors=FALSE)
(I'm just reiterating what Jianling said...)

Joe




-- 
Cooperative Fish and Wildlife Research Unit
Zoology and Physiology Dept.
University of Wyoming
JoeCeradini at gmail.com / 914.707.8506
wyocoopunit.org
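
A minimal sketch of why the "read as character, then convert" order in this reply matters, using made-up values (the "7.5B" typo is hypothetical). Calling as.numeric directly on a factor returns the internal level codes rather than the printed numbers, so the column needs to be character before it is converted.

```r
# Hypothetical one-column file with a single mistyped entry.
csv_text <- "Discharge
6.4
7.58
7.5B
8.63"

# Read strings as character rather than factor.
dta <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(dta$Discharge)   # "character", because "7.5B" is not a number

# Pitfall: converting a factor directly yields level codes, not the values.
f <- factor(dta$Discharge)
as.numeric(f)

# Going through character gives the numbers; unparseable entries become NA
# (as.numeric normally warns about this, suppressed here for brevity).
num <- suppressWarnings(as.numeric(as.character(f)))
num
```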


Jeff Newmiller

2016-Sep-20 23:08 UTC

[R] about data problem

Find the offending data. One approach is to look at the input data with your
image sensors and neural pattern processor (eyes and brain). One way to reduce
the load on those tools is to read in the data with the stringsAsFactors=FALSE
argument and try manually converting the resulting character strings into
numeric values. You can then use the is.na function to find which rows failed to
convert and use indexing to review the strings that had trouble.

# I recommend against using df as a variable name, since it is the name of a
# function in base R
dta$DischargeNum <- as.numeric( dta$Discharge )
dta[ is.na( dta$DischargeNum ), "Discharge" ]
-- 
Sent from my phone. Please excuse my brevity.
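
The approach above can be tried end to end on a small made-up example. The file contents and the typo "8.I6" (a capital letter I in place of the digit 1) are assumptions for illustration only; the real file may contain a different kind of stray character.

```r
# Hypothetical data resembling the poster's table, with one mistyped value.
csv_text <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,7.58
3,3,2010,6.82
3,4,2010,8.63
3,5,2010,8.I6
3,6,2010,7.58"

dta <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Entries that are not valid numbers become NA (as.numeric warns about this).
dta$DischargeNum <- suppressWarnings(as.numeric(dta$Discharge))

# Rows that failed to convert although the original string was not missing.
bad <- is.na(dta$DischargeNum) & !is.na(dta$Discharge)
which(bad)              # row number of the offending entry
dta[bad, "Discharge"]   # the string that could not be converted
```

Checking `!is.na(dta$Discharge)` as well separates genuine missing values from strings that merely failed to parse.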

