TAYLOR, Benjamin (BLACKPOOL TEACHING HOSPITALS NHS FOUNDATION TRUST)
2021-Feb-25 09:11 UTC
[Rd] read.csv, worrying behaviour?
Dear all,

I've been using R for around 16 years now, and I've only just become aware of a behaviour of read.csv that I find worrying, which is why I'm contacting this list. A simplified example of the behaviour is as follows.

I created a "test.csv" file containing the following lines:

a,b,c,d,e,f,g
1,2,3,4

And then read it into R using:

> d = read.csv("test.csv")
> d
  a b c d  e  f  g
1 1 2 3 4 NA NA NA

I was surprised that this did not issue a warning. I can understand why the following csv would not issue a warning:

a,b,c,d,e,f,g
1,2,3,4,,,

But the missing commas in the first example? Thoughts from others would be welcome.

Kind regards

Ben

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Benjamin M. Taylor, MSci, MSc, PhD
Lead Data Scientist
Blackpool Teaching Hospitals NHS Foundation Trust
Home 15
Whinney Heys Road
Blackpool
FY3 8NR

Scholar: https://scholar.google.co.uk/citations?user=6Hf0CJkAAAAJ&hl=en
Github: https://github.com/bentaylor1
Gitlab: https://gitlab.com/ben_taylor
ORCID: http://orcid.org/0000-0001-8667-4089
I believe this is documented behavior. The 'read.csv' function is a front-end to 'read.table' with different default values. In this particular case, read.csv sets fill = TRUE, which means that it is supposed to fill incomplete lines with NAs. It also sets header = TRUE, which is presumably what it is using to determine the expected length of a row.

-- Kevin
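For anyone who would rather have short rows flagged than silently padded, here is a minimal sketch of two possible checks. It recreates the two-line file from the original post in a temporary location; the variable names (path, padded, n_fields, ragged, strict) are made up for illustration, and this is only one way of approaching it, not an official recommendation.

```r
# Recreate the example file from the original post (temporary path, illustrative only).
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c,d,e,f,g", "1,2,3,4"), path)

# Default behaviour: read.csv() uses fill = TRUE, so the short row is padded with NA.
padded <- read.csv(path)

# Check 1: count the fields on each line before reading.
# quote = "\"" matches the read.csv default; count.fields() has a different default.
n_fields <- count.fields(path, sep = ",", quote = "\"")
ragged <- length(unique(n_fields)) > 1
if (ragged) {
  message("Lines have differing field counts: ",
          paste(n_fields, collapse = ", "))
}

# Check 2: turn filling off, so the short row becomes an error instead of NAs.
# The error is caught here and its message kept so the script can continue.
strict <- tryCatch(
  read.csv(path, fill = FALSE),
  error = function(e) conditionMessage(e)
)
```

With the file above, the first check reports differing field counts, and the second ends up holding an error message rather than a data frame. Note that fill = FALSE also rejects files where short rows are legitimate, so the count.fields() check may be the gentler option when a warning is all that is wanted.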