I ran into a puzzling minor behaviour I would like to understand. Reading in a csv file, I find an extraneous "." after a column header, "in" [short for "inches"], thus "in.". Is this due to "in" being reserved? I initially blamed this on RStudio or on processing the data through LibreCalc. However, the same result occurs in a console R session. Sending the file to the console via less reveals no strange characters in the first line. The data is California statewide rainfall, which was screen-captured from the Western Regional Climate Center web site.

Header line plus the first 15 data lines:

"yr","mo","Data","in"
1895,1,8243,8.243
1895,2,2265,2.265
1895,3,2340,2.34
1895,4,1014,1.014
1895,5,1281,1.281
1895,6,58,0.058
1895,7,156,0.156
1895,8,140,0.14
1895,9,1087,1.087
1895,10,322,0.322
1895,11,1331,1.331
1895,12,2428,2.428
1896,1,7156,7.156
1896,2,712,0.712
1896,3,2982,2.982

File read in as follows:

x <- read.csv('DRI-mo-prp.csv', header = T)

Structure:

str(x)
'data.frame': 1469 obs. of 4 variables:
 $ yr  : int 1895 1895 1895 1895 1895 1895 1895 1895 1895 1895 ...
 $ mo  : int 1 2 3 4 5 6 7 8 9 10 ...
 $ Data: int 8243 2265 2340 1014 1281 58 156 140 1087 322 ...
 $ in. : num 8.24 2.27 2.34 1.01 1.28 ...

[note "in" is now "in."]
Try the 'read_csv' function in the 'readr' package:

> x <- readr::read_csv('"yr","mo","Data","in"
+ 1895,1,8243,8.243
+ 1895,2,2265,2.265
+ 1895,3,2340,2.34
+ 1895,4,1014,1.014
+ 1895,5,1281,1.281
+ 1895,6,58,0.058
+ 1895,7,156,0.156
+ 1895,8,140,0.14
+ 1895,9,1087,1.087
+ 1895,10,322,0.322
+ 1895,11,1331,1.331
+ 1895,12,2428,2.428
+ 1896,1,7156,7.156
+ 1896,2,712,0.712
+ 1896,3,2982,2.982
+ ')
> str(x)
Classes 'tbl_df', 'tbl' and 'data.frame': 15 obs. of 4 variables:
 $ yr  : int 1895 1895 1895 1895 1895 1895 1895 1895 1895 1895 ...
 $ mo  : int 1 2 3 4 5 6 7 8 9 10 ...
 $ Data: int 8243 2265 2340 1014 1281 58 156 140 1087 322 ...
 $ in  : num 8.24 2.27 2.34 1.01 1.28 ...

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Wed, Jun 28, 2017 at 7:30 PM, John <jwd at surewest.net> wrote:
> I ran into a puzzling minor behaviour I would like to understand.
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Or use 'check.names = FALSE':

> x <- read.csv(text = '"yr","mo","Data","in"
+ 1895,1,8243,8.243
+ 1895,2,2265,2.265
+ 1895,3,2340,2.34
+ 1895,4,1014,1.014
+ 1895,5,1281,1.281
+ 1895,6,58,0.058
+ 1895,7,156,0.156
+ 1895,8,140,0.14
+ 1895,9,1087,1.087
+ 1895,10,322,0.322
+ 1895,11,1331,1.331
+ 1895,12,2428,2.428
+ 1896,1,7156,7.156
+ 1896,2,712,0.712
+ 1896,3,2982,2.982
+ ', check.names = FALSE)
> str(x)
'data.frame': 15 obs. of 4 variables:
 $ yr  : int 1895 1895 1895 1895 1895 1895 1895 1895 1895 1895 ...
 $ mo  : int 1 2 3 4 5 6 7 8 9 10 ...
 $ Data: int 8243 2265 2340 1014 1281 58 156 140 1087 322 ...
 $ in  : num 8.24 2.27 2.34 1.01 1.28 ...

Jim Holtman

On Wed, Jun 28, 2017 at 7:30 PM, John <jwd at surewest.net> wrote:
> I ran into a puzzling minor behaviour I would like to understand.
> [...]
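Here is a minimal sketch of what check.names changes, using only the first few rows from the original message. It assumes, based on the read.table documentation, that the default check.names = TRUE passes the header through make.names(..., unique = TRUE). The variable names csv_text, x_checked and x_raw are made up for illustration.

```r
# A few rows from the original message, used as inline text instead of a file
csv_text <- '"yr","mo","Data","in"
1895,1,8243,8.243
1895,2,2265,2.265
1895,3,2340,2.34'

# Default check.names = TRUE: the header is made syntactically valid
x_checked <- read.csv(text = csv_text)
print(names(x_checked))

# check.names = FALSE: the header is kept exactly as written in the file
x_raw <- read.csv(text = csv_text, check.names = FALSE)
print(names(x_raw))

# The default renaming matches make.names(..., unique = TRUE) on the raw header
stopifnot(identical(names(x_checked), make.names(names(x_raw), unique = TRUE)))
```

Only the fourth name differs between the two calls. The data values are read identically either way.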
> On Jun 28, 2017, at 4:30 PM, John <jwd at surewest.net> wrote:
>
> I ran into a puzzling minor behaviour I would like to understand.
> Reading in a csv file, I find an extraneous "." after a column header,
> "in" [short for "inches"] thus, "in.". Is this due to "in" being
> reserved?
> [...]
> File read in as follows:
>
> x <- read.csv('DRI-mo-prp.csv', header = T)

If I change one of those other headers to "for", I also see the period suffix appended, which supports your theory about reserved words being protected. If for some reason this were important to you, then I'd suggest first looking at the code for make.names, which in turn shows that the work is done with a .Internal call, so you'll need to look at the source code for the base package.

-- 
David.

David Winsemius
Alameda, CA, USA
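David's "for" experiment can be reproduced without a file by calling make.names() directly. Its documentation says names matching R keywords get a dot appended, and ?Reserved lists those keywords. In this small sketch, the extra headers "inches" and "In" are hypothetical additions included only for comparison.

```r
# Headers from the original file, plus "for" (David's test) and two near-misses
hdr <- c("yr", "mo", "Data", "in", "for", "inches", "In")

# make.names() appends "." to names that are R reserved words
fixed <- make.names(hdr)
print(fixed)

# Only the reserved words changed; R is case-sensitive, so "In" is left alone
print(hdr[fixed != hdr])
```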
On 28/06/2017 7:30 PM, John wrote:
> [...] Is this due to "in" being reserved?
> [...]
>  $ in. : num 8.24 2.27 2.34 1.01 1.28 ...
> [note "in" is now "in."]

Yes, "in" is not a valid variable name, because of its syntactic use in for (i in x) loops. You can stop this correction by setting check.names=FALSE in your call to read.csv.

This will make it a little tricky to deal with in some situations, e.g.

 > x <- data.frame(4)
 > names(x) <- "in"
 > x
   in
 1  4
 > x$in
 Error: unexpected 'in' in "x$in"

but you can work around this problem: x[, "in"] and x$`in` are both fine.

Duncan Murdoch
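Here is a short sketch of Duncan's workaround, reusing his one-column data frame. Besides x[, "in"] and x$`in`, it also uses x[["in"]], which extracts a column by name regardless of whether the name is syntactic. parse() is used here only to show the x$in error without stopping the script.

```r
# Duncan's example: a one-column data frame whose column is literally named "in"
x <- data.frame(4)
names(x) <- "in"

# Writing x$in is a parse error because "in" is a reserved word
msg <- tryCatch(parse(text = "x$in"), error = function(e) conditionMessage(e))
print(msg)

# Each of these forms reaches the column without a syntax error
v1 <- x[, "in"]   # matrix-style indexing with a character subscript
v2 <- x[["in"]]   # list-style extraction by name
v3 <- x$`in`      # backticks turn the reserved word into an ordinary name
stopifnot(identical(v1, 4), identical(v2, 4), identical(v3, 4))
```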