I ran into a puzzling minor behaviour I would like to understand.
Reading in a csv file, I find an extraneous "." after a column header,
"in" [short for "inches"] thus, "in.". Is this due to "in" being
reserved? I initially blamed this on RStudio or on processing the data
through LibreCalc. However, the same result occurs in a console R
session. Sending the file to the console via less reveals no strange
characters in the first line. The data is California statewide
rainfall which was screen captured from the Western Regional Climate
Center web site.

First 15 lines including header line:

"yr","mo","Data","in"
1895,1,8243,8.243
1895,2,2265,2.265
1895,3,2340,2.34
1895,4,1014,1.014
1895,5,1281,1.281
1895,6,58,0.058
1895,7,156,0.156
1895,8,140,0.14
1895,9,1087,1.087
1895,10,322,0.322
1895,11,1331,1.331
1895,12,2428,2.428
1896,1,7156,7.156
1896,2,712,0.712
1896,3,2982,2.982

File read in as follows:

x <- read.csv('DRI-mo-prp.csv', header = T)

Structure:

str(x)
'data.frame':   1469 obs. of  4 variables:
 $ yr  : int  1895 1895 1895 1895 1895 1895 1895 1895 1895 1895 ...
 $ mo  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Data: int  8243 2265 2340 1014 1281 58 156 140 1087 322 ...
 $ in. : num  8.24 2.27 2.34 1.01 1.28 ...

[note "in" is now "in."]

Try the 'read_csv' function in the 'readr' package:

  > x <- readr::read_csv('"yr","mo","Data","in"
  + 1895,1,8243,8.243
  + 1895,2,2265,2.265
  + 1895,3,2340,2.34
  + 1895,4,1014,1.014
  + 1895,5,1281,1.281
  + 1895,6,58,0.058
  + 1895,7,156,0.156
  + 1895,8,140,0.14
  + 1895,9,1087,1.087
  + 1895,10,322,0.322
  + 1895,11,1331,1.331
  + 1895,12,2428,2.428
  + 1896,1,7156,7.156
  + 1896,2,712,0.712
  + 1896,3,2982,2.982
  + ')
  > str(x)
  Classes 'tbl_df', 'tbl' and 'data.frame':   15 obs. of  4 variables:
   $ yr  : int  1895 1895 1895 1895 1895 1895 1895 1895 1895 1895 ...
   $ mo  : int  1 2 3 4 5 6 7 8 9 10 ...
   $ Data: int  8243 2265 2340 1014 1281 58 156 140 1087 322 ...
   $ in  : num  8.24 2.27 2.34 1.01 1.28 ...

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Wed, Jun 28, 2017 at 7:30 PM, John <jwd at surewest.net> wrote:
> I ran into a puzzling minor behaviour I would like to understand.
> Reading in a csv file, I find an extraneous "." after a column header,
> "in" [short for "inches"] thus, "in.". Is this due to "in" being
> reserved?
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Or use 'check.names = FALSE':

  > x <- read.csv(text = '"yr","mo","Data","in"
  + 1895,1,8243,8.243
  + 1895,2,2265,2.265
  + 1895,3,2340,2.34
  + 1895,4,1014,1.014
  + 1895,5,1281,1.281
  + 1895,6,58,0.058
  + 1895,7,156,0.156
  + 1895,8,140,0.14
  + 1895,9,1087,1.087
  + 1895,10,322,0.322
  + 1895,11,1331,1.331
  + 1895,12,2428,2.428
  + 1896,1,7156,7.156
  + 1896,2,712,0.712
  + 1896,3,2982,2.982
  + ', check.names = FALSE)
  > str(x)
  'data.frame':   15 obs. of  4 variables:
   $ yr  : int  1895 1895 1895 1895 1895 1895 1895 1895 1895 1895 ...
   $ mo  : int  1 2 3 4 5 6 7 8 9 10 ...
   $ Data: int  8243 2265 2340 1014 1281 58 156 140 1087 322 ...
   $ in  : num  8.24 2.27 2.34 1.01 1.28 ...

Jim Holtman
Data Munger Guru

> On Jun 28, 2017, at 4:30 PM, John <jwd at surewest.net> wrote:
>
> I ran into a puzzling minor behaviour I would like to understand.
> Reading in a csv file, I find an extraneous "." after a column header,
> "in" [short for "inches"] thus, "in.". Is this due to "in" being
> reserved?
>
> File read in as follows:
>
> x <- read.csv('DRI-mo-prp.csv', header = T)

If I change one of those other headers to "for", I also see the period
suffix appended, which supports your theory about reserved words being
protected. If for some reason this were important to you, then I'd
suggest first looking at the code for make.names, which in turn
indicates that it's done with a .Internal call, so you'll need to look
at the source code for the base package.

--
David.

David Winsemius
Alameda, CA, USA

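The renaming step can be seen in isolation with a minimal sketch, assuming base R only. read.csv checks header names with make.names when check.names = TRUE (the default). The first vector is copied from the posted header line; the second is made up purely for illustration.

```r
# Header names copied from the posted CSV file.
hdr <- c("yr", "mo", "Data", "in")

# make.names() appends "." to reserved words and leaves valid names alone.
checked <- make.names(hdr)
print(checked)   # "yr" "mo" "Data" "in."

# Illustrative vector (not from the thread's data): other reserved words
# get the same suffix, while an ordinary name is unchanged.
others <- make.names(c("for", "if", "while", "rain"))
print(others)    # "for." "if." "while." "rain"
```
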
On 28/06/2017 7:30 PM, John wrote:
> I ran into a puzzling minor behaviour I would like to understand.
> Reading in a csv file, I find an extraneous "." after a column header,
> "in" [short for "inches"] thus, "in.". Is this due to "in" being
> reserved?
>
> $ in. : num 8.24 2.27 2.34 1.01 1.28 ...
> [note "in" is now "in."]

Yes, "in" is not a valid variable name, because of its syntactic use.
You can stop this correction by setting check.names=FALSE in your call
to read.csv. This will make it a little tricky to deal with in some
situations, e.g.

  > x <- data.frame(4)
  > names(x) <- "in"
  > x
    in
  1  4
  > x$in
  Error: unexpected 'in' in "x$in"

but you can work around this problem: x[, "in"] and x$`in` are both
fine.

Duncan Murdoch
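
Putting the two suggestions together, here is a small sketch (base R only) that reads the first three data rows of the posted file with check.names = FALSE and then accesses the "in" column. The variable names csv_text, by_bracket and by_backtick, and the replacement name "inches", are assumptions chosen for illustration, not anything from the original file.

```r
# First three data rows of the posted file, inlined so the example is
# self-contained (the real file is 'DRI-mo-prp.csv').
csv_text <- '"yr","mo","Data","in"
1895,1,8243,8.243
1895,2,2265,2.265
1895,3,2340,2.34'

# check.names = FALSE keeps the header exactly as written in the file.
x <- read.csv(text = csv_text, check.names = FALSE)
print(names(x))

# x$in is a syntax error, so quote the name or use backticks.
by_bracket  <- x[["in"]]
by_backtick <- x$`in`

# Optional: rename the column once so that plain $ access works later.
# "inches" is a hypothetical name picked for this sketch.
names(x)[names(x) == "in"] <- "inches"
print(x$inches)
```

Renaming once after the read may be more convenient than quoting the name at every use, but either approach works.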