thr3ads.net - R help - [R] Need fresh eyes to see what I'm missing [Sep 2021]


Rich Shepard

2021-Sep-14 15:21 UTC

[R] Need fresh eyes to see what I'm missing

The data file begins this way:
year,month,day,hour,min,fps
2016,03,03,12,00,1.74
2016,03,03,12,10,1.75
2016,03,03,12,20,1.76
2016,03,03,12,30,1.81
2016,03,03,12,40,1.79
2016,03,03,12,50,1.75
2016,03,03,13,00,1.78
2016,03,03,13,10,1.81

The script to process it:
library('tidyverse')
vel <- read.csv('../data/water/vel.dat', header = TRUE, sep = ',',
                stringsAsFactors = FALSE)
vel$year <- as.integer(vel$year)
vel$month <- as.integer(vel$month)
vel$day <- as.integer(vel$day)
vel$hour <- as.integer(vel$hour)
vel$min <- as.integer(vel$min)
vel$fps <- as.double(vel$fps, length = 6)

# use dplyr to group_by() year and month; summarize() to get monthly
# means
vel_by_month = vel %>%
     group_by(year, month) %>%
     summarize(flow = mean(fps, na.rm = TRUE))

R's display after running the script:
> source('vel.R')
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
Warning messages:
1: In eval(ei, envir) : NAs introduced by coercion
2: In eval(ei, envir) : NAs introduced by coercion
3: In eval(ei, envir) : NAs introduced by coercion
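The three "NAs introduced by coercion" warnings mean that as.integer() or as.double() met text it could not parse as a number, which suggests that three of the six columns were read by read.csv() as character rather than numeric. A minimal sketch for locating such values, assuming a hypothetical stand-in for vel.dat with a typo (letter O instead of zero) in one day field; the real malformed line in the file is unknown, and find_bad_values() is a hypothetical helper, not part of any package:

```r
# Hypothetical stand-in for vel.dat; the "O3" (letter O) typo is an assumption
# used only to reproduce the "NAs introduced by coercion" warning.
txt <- "year,month,day,hour,min,fps
2016,03,03,12,00,1.74
2016,03,03,12,10,1.75
2016,03,O3,12,20,1.76"

vel <- read.csv(text = txt, stringsAsFactors = FALSE)
str(vel)  # 'day' is chr because of the typo; the other columns are numeric

# Hypothetical helper: for each column, return the rows holding a value that
# is present in the file but cannot be converted to a number.
find_bad_values <- function(df) {
  lapply(df, function(col) {
    num <- suppressWarnings(as.numeric(col))
    which(is.na(num) & !is.na(col))
  })
}

bad <- find_bad_values(vel)
bad$day         # row index of the unparseable day value
vel[bad$day, ]  # the offending row as read from the file
```

Running find_bad_values() on the real vel before the as.integer() calls would list, column by column, the rows behind each warning.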

The dataframe created by the read.csv() command:
> head(vel)
   year month day hour min  fps
1 2016     3   3   12   0 1.74
2 2016     3   3   12  10 1.75
3 2016     3   3   12  20 1.76
4 2016     3   3   12  30 1.81
5 2016     3   3   12  40 1.79
6 2016     3   3   12  50 1.75

and the resulting grouping:
> vel_by_month
# A tibble: 67 × 3
# Groups:   year [8]
     year month   flow
    <int> <int>  <dbl>
  1     0    NA NaN
  2  2016     3   2.40
  3  2016     4   3.00
  4  2016     5   2.86
  5  2016     6   2.51
  6  2016     7   2.18
  7  2016     8   1.89
  8  2016     9   1.38
  9  2016    10   1.73
10  2016    11   2.01
# … with 57 more rows

I cannot find why row 1 is there. Other data sets don't produce this
result.

TIA,

Rich

Eric Berger

2021-Sep-14 15:29 UTC


[R] Need fresh eyes to see what I'm missing

Before you create vel_by_month, you can check vel for NAs and NaNs with

sum(is.na(vel))
sum(unlist(lapply(vel,is.nan)))
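These two checks give whole-table totals. To see which columns and which rows hold the missing values, a short sketch using base R's colSums(), is.na() and complete.cases(); the three-row data frame below is made-up toy data standing in for vel after the conversions:

```r
# Made-up toy data standing in for vel; the NAs in row 3 are an assumption.
vel <- data.frame(
  year  = c(2016L, 2016L, NA),
  month = c(3L, 3L, NA),
  fps   = c(1.74, 1.75, NA)
)

na_per_column <- colSums(is.na(vel))      # NA count for each column
na_rows <- which(!complete.cases(vel))    # rows with at least one NA

na_per_column
vel[na_rows, ]                            # inspect the incomplete rows
```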

HTH,
Eric



Avi Gross

2021-Sep-14 16:16 UTC


[R] Need fresh eyes to see what I'm missing

Rich,

I reproduced your problem after re-arranging the code the mailer mangled. I
tried variations like not using pipes or changing what it is grouped by, and they
all show your results on the abbreviated data with the message:

`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.

I think I fixed summarise(), but it makes me wonder whether an inconsistency was
introduced along the way, as what you used is supposed to work and has worked for
me in the past.

I note the man page for summarise() marks the .groups="..." argument as
experimental, and it is a tad confusing.

I changed your code to this by telling it to keep the grouping in the output the
same:

vel_by_month = vel %>%
  group_by(year, month) %>%
  summarise(flow = mean(fps, na.rm = TRUE), .groups="keep")

The change from your code is the .groups="keep" argument added at the very
end.
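The summarise() message is informational, not an error: with two grouping variables and one row per group, the default (.groups = "drop_last") drops the last grouping variable, leaving the result grouped by year, which is what the message reports. A sketch with made-up toy data showing .groups = "drop", which returns an ungrouped result and also silences the message (it assumes dplyr 1.0 or later):

```r
library(dplyr)

# Made-up toy data standing in for vel
vel <- data.frame(
  year  = c(2016L, 2016L, 2016L),
  month = c(3L, 3L, 4L),
  fps   = c(1.74, 1.80, 2.00)
)

# .groups = "drop" removes all grouping from the summary, so no message is printed
vel_by_month <- vel %>%
  group_by(year, month) %>%
  summarise(flow = mean(fps, na.rm = TRUE), .groups = "drop")

vel_by_month
group_vars(vel_by_month)  # character(0): no grouping left
```

Either "keep" or "drop" only changes the grouping of the result; neither removes the stray year 0 / NA row, which comes from the data. The message can also be turned off globally with options(dplyr.summarise.inform = FALSE).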

Since I used your limited data, this is all I get:
> vel_by_month
# A tibble: 1 x 3
# Groups:   year, month [1]
   year month  flow
  <int> <int> <dbl>
1  2016     3  1.77

For now, all I did was shut summarise() up.

Not having the rest of your data, I cannot tell where your NA and NaN are
introduced. If the change I made above does not resolve it, then, as others
suggested, begin by looking at your data more carefully, perhaps starting
with the .CSV file and then the data structures in R, along the lines of what
you were shown. I find the table() function useful for categorical data with
limited choices, as it would show the anomalous value with a count of one.
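A sketch of the table() suggestion on a made-up year vector; useNA = "ifany" adds a count for missing values, so a stray 0 or NA shows up as a small count next to the real years:

```r
# Made-up year values standing in for vel$year; the 0 and NA entries are
# assumptions mirroring the odd first row of vel_by_month.
year <- c(2016L, 2016L, 2017L, 2017L, 0L, NA)

counts <- table(year, useNA = "ifany")
print(counts)

# Values that occur only once are often the anomalies worth inspecting
counts[counts == 1]
```

On the real data the same call would be table(vel$year, useNA = "ifany"), and likewise for month, day, hour and min.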

I see your point about needing fresh eyes. My eyes do not see what you did wrong,
but I am just following clues you may be ignoring.

