Donald Braman
2010-Oct-26 14:33 UTC
[R] stripping #s in a text file prior to reading into table or dataframe
I'm importing a lot of text tables of data (from Latent Gold) that include hashes in some of the column names ("Cluster#1", "Cluster#2", etc.). Is there an easy way to strip the offending hashes out before pushing the text into a table or data frame? I thought I'd use gsub, for example, but I can't figure out how to read in a text file without reading it into a table or data frame (which would be ill-structured, given the hashes). I could do it in another scripting language or a shell script, but I would like to try to do it in R.
Duncan Murdoch
2010-Oct-26 14:49 UTC
[R] stripping #s in a text file prior to reading into table or dataframe
readLines() will read it, but you may not need to do that. Set comment.char="" to turn off the special meaning of # in read.table() and related functions.

Duncan
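A minimal sketch of the two routes mentioned in this reply. The sample file is made up for illustration (it is not the Latent Gold export from the thread) and is assumed to be tab-delimited. The `text =` argument of read.table() is available in current versions of R.

```r
# Hypothetical sample standing in for the real export (an assumption).
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tCluster#1\tCluster#2",
             "1\t0.9\t0.1",
             "2\t0.2\t0.8"), tmp)

# Route 1: read the raw lines, strip every hash, then parse the cleaned text.
raw   <- readLines(tmp)
clean <- gsub("#", "", raw, fixed = TRUE)
d1    <- read.table(text = clean, header = TRUE, sep = "\t")

# Route 2: leave the file untouched and switch off comment handling.
d2 <- read.table(tmp, header = TRUE, sep = "\t", comment.char = "")

names(d1)  # hashes removed by gsub()
names(d2)  # hashes replaced by "." because check.names = TRUE by default
```

Note that Route 1 as written removes hashes from the data rows too, not only from the header. If that matters, apply gsub() to `raw[1]` only and keep comment.char = "" when parsing.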
Donald Braman
2010-Oct-27 13:51 UTC
[R] stripping #s in a text file prior to reading into table or dataframe
Thanks for your advice! I still get the same error, though -- not sure why.

> read.table('don.5.clusters.txt', header = TRUE, comment.char = '', quote = '')
Error in read.table("don.5.clusters.txt", header = TRUE, comment.char = "", :
  more columns than column names

Any other thoughts?

--
Donald Braman
http://ssrn.com/author=286206
http://www.culturalcognition.net/braman/
http://www.law.gwu.edu/Faculty/profile.aspx?id=10123

Henrique Dallazuanna, Tue, 26 Oct 2010 09:11:33 -0700:

Try this:

read.table('don.5.clusters.txt', header = TRUE, comment.char = '', quote = '')

On Tue, Oct 26, 2010 at 1:15 PM, Donald Braman <dbra...@law.gwu.edu> wrote:
> That's one of the things I tried, but which didn't work. I get the following
> error when I do that:
>
> Error in read.table(file = "don.5.clusters.txt", header = TRUE, comment.char = "", :
>   more columns than column names
>
> If I remove the hashes by other means, I don't get that error.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
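The thread never states why the error persists with comment.char = "". One possible cause, offered only as a guess, is that the file is tab-delimited and some fields contain spaces, so read.table()'s default whitespace separator splits data rows into more fields than the header. A sketch of how count.fields() can check this, using made-up sample data:

```r
# Hypothetical tab-delimited sample with spaces inside one field (an assumption).
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tlabel\tCluster#1",
             "1\tlow risk group\t0.9",
             "2\thigh risk group\t0.2"), tmp)

# Fields per line under read.table()'s default whitespace separator.
n_ws <- count.fields(tmp, sep = "", quote = "", comment.char = "")

# Fields per line when only tabs separate fields.
n_tab <- count.fields(tmp, sep = "\t", quote = "", comment.char = "")

n_ws   # header and data rows disagree
n_tab  # every line has the same number of fields

# The call from the thread, applied to this sample; capture the error text.
msg <- tryCatch(
  {
    read.table(tmp, header = TRUE, comment.char = "", quote = "")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg
```

If the counts differ between lines under one separator and agree under another, passing that separator explicitly (for example sep = "\t") is worth trying.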
Donald Braman
2010-Oct-27 14:32 UTC
[R] stripping #s in a text file prior to reading into table or dataframe
read.delim2 did the trick -- many thanks!!!

On Wed, Oct 27, 2010 at 10:01 AM, Jorge Ivan Velez <jorgeivanvelez@gmail.com> wrote:
> ?read.delim2
>
> HTH,
> Jorge
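For context on why read.delim2() may have worked: it is a wrapper around read.table() with different defaults, namely sep = "\t", dec = ",", quote = "\"", fill = TRUE and comment.char = "". The sketch below spells out the equivalent read.table() call on made-up sample data. That the real file is tab-delimited is an assumption; the thread does not say so.

```r
# Hypothetical tab-delimited sample with comma decimals (an assumption).
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tlabel\tCluster#1",
             "1\tlow risk group\t0,9",
             "2\thigh risk group\t0,2"), tmp)

# The call that resolved the thread, with all defaults.
d_delim2 <- read.delim2(tmp)

# The same thing written out as an explicit read.table() call.
d_table <- read.table(tmp, header = TRUE, sep = "\t", quote = "\"",
                      dec = ",", fill = TRUE, comment.char = "")

identical(d_delim2, d_table)
names(d_delim2)
```

Because read.delim2() sets dec = ",", numeric columns written with "." as the decimal mark may come in as text. In that case read.delim(), which uses dec = "." with otherwise the same defaults, is the closer fit.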