I am having trouble reading this CSV file in R. There are six attributes that I need to read: CVar1, CVar2, Location, Year, Nvar3, Nvar4. Can somebody help me read this file? Line 10 has a city and state separated by a comma. I have been a SAS user, and in SAS I can use a different input format for this line. Can I do this in R too?
What do you mean by "mixed"? If a field contains a comma, it is supposed to be enclosed in quotes. You could preprocess the file, looking for lines that have more fields than there are supposed to be. If the extra fields are always in the same place, you could enclose them in quotes and then reprocess the file. You would really have to show what the file looks like for the different "mixed" cases to get a good answer to your question. And of course R can do it, if we knew what it is we are supposed to do. So at least provide commented, minimal, self-contained, reproducible code and data.

On Fri, Mar 16, 2012 at 7:03 AM, Ashish Agarwal <ashish.agarwala at gmail.com> wrote:
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
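A minimal sketch of the preprocessing Jim describes. It assumes, as the sample data posted later in this thread suggests, that every record should have 6 fields and that any extra comma falls inside the third field, Location. The sample lines and the helper `quote_location()` are illustrative and are not part of any package.

```r
# Illustrative sample: two lines from the data posted later in this thread.
# For the real file you would use lines <- readLines("D:/data/temp.csv").
lines <- c(",,Dally,1968,7,0",
           ",,Raleigh, North Carol,1968,25,0")
expected <- 6  # CVar1, CVar2, Location, Year, Nvar3, Nvar4

# Count the fields on every line. comment.char = "" stops a "#" inside a
# field from being treated as the start of a comment.
con <- textConnection(lines)
n <- count.fields(con, sep = ",", comment.char = "")
close(con)

# Hypothetical helper: wrap field 3 plus any extra pieces in double quotes.
quote_location <- function(line, extra) {
  parts <- strsplit(line, ",", fixed = TRUE)[[1]]
  last_loc <- 3 + extra
  loc <- paste0('"', paste(parts[3:last_loc], collapse = ","), '"')
  paste(c(parts[1:2], loc, parts[-(1:last_loc)]), collapse = ",")
}

bad <- n > expected
if (any(bad)) {
  lines[bad] <- mapply(quote_location, lines[bad], n[bad] - expected,
                       USE.NAMES = FALSE)
}

# The quoted Location now survives read.csv() as a single value.
dat <- read.csv(text = lines, header = FALSE,
                col.names = c("CVar1", "CVar2", "Location",
                              "Year", "Nvar3", "Nvar4"))
```

If stray commas can also appear in CVar1 or CVar2, this positional rule no longer holds, and you would need to inspect those lines by hand.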
I want to import this CSV file into R. The CSV file is:

,,,1968,21,0
,,Boston,1968,13,0
,,Boston,1968,18,0
,,Chicago,1967,44,0
,,Providence,1968,17,0
,,Providence,1969,48,0
,,Binky,1968,24,0
,,Chicago,1968,23,0
,,Dally,1968,7,0
,,Raleigh, North Carol,1968,25,0
Addy ABC-Dogs Stars-W8.1,,Providence,1968,38,0
DEF_REQPRF/,,Dartmouth,1967,31,1
PL,,,1967,38,1
XY,PopatLal,,1967,5,1
XY,PopatLal,,1967,6,8
XY,PopatLal,,1967,7,7
XY,PopatLal,,1967,9,1
XY,PopatLal,,1967,10,1
XY,PopatLal,,1967,13,1
XY,PopatLal,Boston,1967,6,1
XY,PopatLal,Boston,1967,7,11
XY,PopatLal,Boston,1967,9,2
XY,PopatLal,Boston,1967,10,3
XY,PopatLal,Boston,1967,7,2

I tried using scan and read.table, but the values do not show up :(

> scan("D:/data/temp.csv",list("","","",0,0,0),sep=",") ->x
Read 51 records
> x
[[1]]
 [1] "??" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[16] "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[31] "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[46] "" "" "" "" "" ""
....

> read.table("D:/data/temp.csv",header=F,sep=",") ->x
> x
  V1 V2
1 ?? NA
2    NA
3    NA
4    NA

Can somebody please help me import this CSV file?
On Mar 16, 2012, at 1:11 PM, Ashish Agarwal wrote:

> Can somebody please help in importing this CSV file?

Looks like an encoding mismatch. You have not offered the requested information about your setup, so further comment would all be guesswork. But you can perhaps educate yourself by reading:

?Encoding

And line ten has 7 elements.
> count.fields(textConnection(",,,1968,21,0
+ ,,Boston,1968,13,0
+ ,,Boston,1968,18,0
+ ,,Chicago,1967,44,0
+ ,,Providence,1968,17,0
+ ,,Providence,1969,48,0
+ ,,Binky,1968,24,0
+ ,,Chicago,1968,23,0
+ ,,Dally,1968,7,0
+ ,,Raleigh, North Carol,1968,25,0
+ Addy ABC-Dogs Stars-W8.1,,Providence,1968,38,0
+ DEF_REQPRF/,,Dartmouth,1967,31,1
+ PL,,,1967,38,1
+ XY,PopatLal,,1967,5,1
+ XY,PopatLal,,1967,6,8
+ XY,PopatLal,,1967,7,7
+ XY,PopatLal,,1967,9,1
+ XY,PopatLal,,1967,10,1
+ XY,PopatLal,,1967,13,1
+ XY,PopatLal,Boston,1967,6,1
+ XY,PopatLal,Boston,1967,7,11
+ XY,PopatLal,Boston,1967,9,2
+ XY,PopatLal,Boston,1967,10,3
+ XY,PopatLal,Boston,1967,7,2"),sep=",")
 [1] 6 6 6 6 6 6 6 6 6 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6

David Winsemius, MD
West Hartford, CT
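If the problem is an encoding mismatch, one plausible cause is a file saved as UTF-16, for example through Excel's "Unicode Text" option. This is a guess, since the poster's setup is unknown. In that case the leading "??" could be a byte-order mark, and the empty strings could be the result of reading two-byte characters as one-byte text. read.csv() has a `fileEncoding` argument for this situation. The sketch below writes a small UTF-16LE file to a temporary path and reads it back. The encoding your file actually needs may be different.

```r
# Simulate the assumed situation: a small CSV written in UTF-16LE.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "w", encoding = "UTF-16LE")
writeLines(c(",,Boston,1968,13,0", ",,Chicago,1967,44,0"), con)
close(con)

# Declaring the file's encoding lets R convert it while reading.
x <- read.csv(tmp, header = FALSE, fileEncoding = "UTF-16LE",
              col.names = c("CVar1", "CVar2", "Location",
                            "Year", "Nvar3", "Nvar4"))
unlink(tmp)
```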
Line 10 has the city and state, and they are also separated by a comma. How can I read line 10 differently from the other lines?

On Fri, Mar 16, 2012 at 10:59 PM, David Winsemius <dwinsemius at comcast.net> wrote:
> And line ten has 7 elements.
On 2012-03-16 10:48, Ashish Agarwal wrote:
> Line 10 has City and State that too separated by comma. For line 10
> how can I read differently as compared to the other lines?

Edit the file and put quotes around the city-state combination:

  "Raleigh, North Carol"

Also: always run count.fields() on your files before importing.

Peter Ehlers
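Peter's count.fields() check can be turned into a quick report of the lines that will not import cleanly. The following is a minimal sketch that uses a few sample lines from this thread in place of the real file. For an actual file, pass its path as the first argument to count.fields() instead of a text connection.

```r
# Sample lines standing in for the real file.
lines <- c(",,Chicago,1968,23,0",
           ",,Dally,1968,7,0",
           ",,Raleigh, North Carol,1968,25,0",
           "PL,,,1967,38,1")

con <- textConnection(lines)
n <- count.fields(con, sep = ",", comment.char = "")
close(con)

print(table(n))        # how many lines have each field count
bad <- which(n != 6)   # line numbers without the expected 6 fields
print(lines[bad])
```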
My file has 5000 records, so editing it by hand is not easy. Is there any way to read line 10 differently, to account for the extra comma in the third field?

On Fri, Mar 16, 2012 at 11:35 PM, Peter Ehlers <ehlers at ucalgary.ca> wrote:
> Edit the file and put quotes around the city-state combination:
>   "Raleigh, North Carol"
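You can avoid editing the file by parsing each line by position. CVar1 and CVar2 are always the first two fields, and Year, Nvar3 and Nvar4 are always the last three, so everything in between can be treated as Location. This is a sketch under that assumption: it only works if stray commas never appear in CVar1 or CVar2. The helper `parse_record()` is hypothetical.

```r
# Hypothetical helper: keep the first two and last three pieces fixed and
# join everything in between back together as Location.
parse_record <- function(line) {
  parts <- strsplit(line, ",", fixed = TRUE)[[1]]
  k <- length(parts)
  # strsplit() drops a trailing empty piece. Every sample line ends in a
  # number, so k should be at least 6 here.
  stopifnot(k >= 6)
  c(parts[1:2], paste(parts[3:(k - 3)], collapse = ","), parts[(k - 2):k])
}

# Sample lines; for the real file use lines <- readLines("D:/data/temp.csv").
lines <- c(",,,1968,21,0",
           ",,Raleigh, North Carol,1968,25,0",
           "XY,PopatLal,Boston,1967,7,11")

fields <- do.call(rbind, lapply(lines, parse_record))
dat <- data.frame(CVar1 = fields[, 1], CVar2 = fields[, 2],
                  Location = fields[, 3],
                  Year = as.integer(fields[, 4]),
                  Nvar3 = as.integer(fields[, 5]),
                  Nvar4 = as.integer(fields[, 6]),
                  stringsAsFactors = FALSE)
```

Unlike the quote-and-reread approach, this handles any number of extra commas in Location and never calls read.csv(). The trade-off is that you must convert the column types yourself.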
On 26-03-2012, at 08:26, Berend Hasselman wrote:

> On 26-03-2012, at 08:16, Ashish Agarwal wrote:
>
>> Why does the output in the following say 2 and not 6?
>>
>>> count.fields(textConnection("LL1532Ap,ABC# Depot-A+,,1971,8,2
>> + LL1532Ap,ABC# Depot-A+,Bhutan,1971,6,1
>> + LL1532Ap,ABC# Depot-A+,China,1971,17,1
>> + LL1532Ap,ABC# Depot-A+,China,1971,33,1
>> + LL1532Ap,ABC# Depot-A+,HongKong,1971,16,2
>> + LL1532Ap,ABC# Depot-A+,HongKong,1971,17,1
>> + LL1532Ap,ABC# Depot-A+,HongKong,1971,22,1
>> + LL1532Ap,ABC# Depot-A+,HongKong,1971,49,1
>> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,20,1
>> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,27,1
>> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,33,1
>> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1973,15,1
>> + LL1532Ap,ABC# Depot-A+,Romania-Europe,1971,10,1
>> + LL1532Ap,ABC# Depot-A+,Romania-Europe,1973,4,1
>> + LL1532Ap,ABC# Depot-A+,Sanchez-America,1973,9,1
>> + LL1532An,ABC# Depot-A-,,1971,8,2"),sep=",")
>> [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>
> Have you done
>
> ?count.fields
>
> and read what it says about the argument "sep" and its default?
>
> So if a comma is the separator, what value would you give sep?

Sorry, I should have had a closer look at what you had done. But still, ?count.fields should have given you a pointer. Look at what it says in the entry for the argument "comment.char". You have a # character in your text. Set comment.char to something other than "#".

Berend
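Berend's fix is shown below on two of the lines from the question. count.fields() defaults to comment.char = "#", so each line is cut off at the "#" and only "LL1532Ap,ABC" is counted, which gives 2 fields. Setting comment.char = "" turns comment handling off.

```r
# Two lines from the question; "#" appears inside the second field.
txt <- c("LL1532Ap,ABC# Depot-A+,,1971,8,2",
         "LL1532Ap,ABC# Depot-A+,Bhutan,1971,6,1")

con <- textConnection(txt)
n <- count.fields(con, sep = ",", comment.char = "")
close(con)
print(n)   # 6 6

# read.csv() already defaults to comment.char = "", but read.table()
# defaults to "#" and would truncate these lines the same way.
dat <- read.csv(text = txt, header = FALSE)
```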