bgreen at dyson.brisnet.org.au
2011-Jan-14 02:44 UTC
[R] CSV value not being read as it appears
I have a frustrating issue which I am hoping someone may have a suggestion about.

I am running XP and R 2.12.0 and saved an EXCEL file that I was sent as a csv file.

The initial code I ran follows.

    dec <- read.csv("g://FMH/FO30122010.csv",header=T)
    dec.open <- subset (dec, Status == "Open")
    table(dec.open$AMHS)

While checking the output I noticed a difference between my manual count and the R output. Two subjects' rows were not being detected by the subset command.

For the AMHS where there was a discrepancy I then ran:

    wm <- subset (dec, AMHS == "WM")

The problem appears to be that there is a space before the "Open" value for two individuals, as per the example below.

    10/02/2010 Open
    22/08/2007 Open

Checking in EXCEL, there does not appear to be a space, and the cell format is the same (e.g. 'general'). I resolved the problem by copying over the values for the two individuals where I identified a problem.

Given that this problem was not detected by visual scanning, I would appreciate advice on how it can be detected in future without my having to manually check raw data against R output.

Any assistance is appreciated,

Bob
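One way such padding might be detected after import is sketched below. It is only an illustration: the small data frame is a hypothetical stand-in for the real file, and only the Status column name comes from the post. The pattern targets ordinary white space, so a different invisible character (a non-breaking space, for instance) may need its own check.

```r
# Hypothetical stand-in for the imported data; rows 2 and 4 carry stray spaces.
dec <- data.frame(Status = c("Open", " Open", "Closed", "Open "),
                  stringsAsFactors = FALSE)

# Print the distinct values in quotes so leading/trailing spaces become visible.
print(unique(dec$Status), quote = TRUE)

# Flag rows whose value starts or ends with white space.
padded <- grepl("^\\s|\\s$", dec$Status)
which(padded)

# Remove the padding and recount.
dec$Status <- gsub("^\\s+|\\s+$", "", dec$Status)
table(dec$Status)
```

If Status was read in as a factor, levels(dec$Status) gives the same kind of overview as the unique() call, since " Open" and "Open" show up as separate levels.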
Try strip.white=TRUE in the read.csv call to strip leading and trailing white space from the fields.
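A minimal sketch of the effect of strip.white, using made-up CSV text in place of the real file (the Date column name and the two rows are assumptions for illustration; note that strip.white applies to unquoted fields):

```r
# Hypothetical CSV content; the second Status value has a leading space.
csv_text <- "Date,Status\n10/02/2010,Open\n22/08/2007, Open\n"

# Default import keeps the leading space, so " Open" does not equal "Open".
raw <- read.csv(textConnection(csv_text), header = TRUE)

# strip.white = TRUE trims the padding while the file is read.
clean <- read.csv(textConnection(csv_text), header = TRUE, strip.white = TRUE)

# Compare how many rows the subset picks up in each case.
nrow(subset(raw, Status == "Open"))
nrow(subset(clean, Status == "Open"))
```

Comparing the two row counts shows whether any values were being missed because of padding.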