A colleague is sending me quite a few files that have been saved with MS SQL Server 2005. I am using R 2.15.1 on Windows 7.

I am trying to read in the files using standard techniques. Although the file has a csv extension, when I go to Excel or WordPad and do SAVE AS I see that it is Unicode Text. Notepad indicates that the encoding is Unicode. Right now I have to do a few things from within Excel (such as Text to Columns) and eventually save as a true csv file before I can read it into R and then use it.

Is there an easy way to solve this from within R? I am also open to easy SQL Server 2005 solutions.

I tried the following from within R.

> testDF = read.table("Info06.csv", header = TRUE, sep = ",")
> testDF2 = iconv(x = testDF, from = "Unicode", to = "")
Error in iconv(x = testDF, from = "Unicode", to = "") :
  unsupported conversion from 'Unicode' to '' in codepage 1252

# The next line did not produce an error message
> testDF3 = iconv(x = testDF, from = "UTF-8" , to = "")
> testDF3[1:6, 1:3]
Error in testDF3[1:6, 1:3] : incorrect number of dimensions

# The next line did not produce an error message
> testDF4 = iconv(x = testDF, from = "macroman" , to = "")
> testDF4[1:6, 1:3]
Error in testDF4[1:6, 1:3] : incorrect number of dimensions

> Encoding(testDF3)
[1] "unknown"
> Encoding(testDF4)
[1] "unknown"

These are the first few lines from WordPad:

Date,StockID,Price,MktCap,ADV,SectorID,Days,A1,std1,std2
2006-01-03 00:00:00.000,@Stock1,2.53,467108197.38,567381.144444444,4,133.14486997089,-0.0162107939626307,0.0346283580367959,0.0126471695454834
2006-01-03 00:00:00.000,@Stock2,1.3275,829803070.531114,6134778.93292,5,124.632223896458,0.071513138376339,0.0410694546850102,0.0172091268025929
'Unicode' is not an encoding. As the help says

fileEncoding: character string: if non-empty declares the encoding used on a file (not a connection) so the character data can be re-encoded. See the 'Encoding' section of the help for 'file', the 'R Data Import/Export Manual' and 'Note'.

The first of the cross references explains this.

On 09/10/2013 00:02, Ira Sharenow wrote:
> A colleague is sending me quite a few files that have been saved with MS
> SQL Server 2005. I am using R 2.15.1 on Windows 7.

See the posting guide: your R update is overdue as there have been 5 releases since then.

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
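For readers following the fileEncoding advice, here is a minimal sketch of what it might look like in practice. It assumes that the "Unicode" reported by Notepad means UTF-16LE with a byte order mark, which is the usual meaning on Windows but is not confirmed for these particular files. The temporary file is a stand-in for Info06.csv and holds only the first three columns of the sample rows.

```r
# Build a small stand-in file encoded as UTF-16LE with a BOM.
# ASSUMPTION: the real files use this encoding; check before relying on it.
lines <- c("Date,StockID,Price",
           "2006-01-03 00:00:00.000,@Stock1,2.53",
           "2006-01-03 00:00:00.000,@Stock2,1.3275")
txt <- paste0(paste(lines, collapse = "\n"), "\n")

# Convert the text to raw UTF-16LE bytes and prepend the BOM (FF FE).
body <- iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
tmp <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xff, 0xfe)), body), tmp)

# Declare the file's encoding when reading, instead of calling iconv()
# on the data frame afterwards.
testDF <- read.table(tmp, header = TRUE, sep = ",",
                     fileEncoding = "UTF-16LE",
                     stringsAsFactors = FALSE)
str(testDF)
```

With the real file, the same read.table() call would take "Info06.csv" in place of tmp. If the files turn out to use a different encoding, the fileEncoding value would need to change accordingly.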
On Tuesday, 8 October 2013 at 16:02 -0700, Ira Sharenow wrote:
> These are the first few lines from WordPad:
>
> Date,StockID,Price,MktCap,ADV,SectorID,Days,A1,std1,std2
> 2006-01-03 00:00:00.000,@Stock1,2.53,467108197.38,567381.144444444,4,133.14486997089,-0.0162107939626307,0.0346283580367959,0.0126471695454834
> 2006-01-03 00:00:00.000,@Stock2,1.3275,829803070.531114,6134778.93292,5,124.632223896458,0.071513138376339,0.0410694546850102,0.0172091268025929

What's the actual problem? You did not state any.

Do you get accented characters that are not printed correctly after importing the file? In the two lines above it does not look like there would be any non-ASCII characters in this file, so encoding would not matter.

Regards
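One way to check whether encoding matters for a given file is to look at its first bytes for a byte order mark. The sketch below is only an illustration: guess_bom is a hypothetical helper, not part of R, and it recognises just three common marks (a UTF-32LE mark also starts with FF FE and would be misreported). A file with no mark returns NA, which says nothing either way.

```r
# Hypothetical helper: report the encoding suggested by a leading BOM, if any.
guess_bom <- function(path) {
  b <- readBin(path, what = "raw", n = 3)
  if (length(b) >= 3 && identical(b[1:3], as.raw(c(0xef, 0xbb, 0xbf)))) {
    return("UTF-8")
  }
  if (length(b) >= 2 && identical(b[1:2], as.raw(c(0xff, 0xfe)))) {
    return("UTF-16LE")
  }
  if (length(b) >= 2 && identical(b[1:2], as.raw(c(0xfe, 0xff)))) {
    return("UTF-16BE")
  }
  NA_character_
}

# Stand-in file: BOM followed by "Da" encoded as UTF-16LE.
tmp_bom <- tempfile(fileext = ".csv")
writeBin(as.raw(c(0xff, 0xfe, 0x44, 0x00, 0x61, 0x00)), tmp_bom)
bom_result <- guess_bom(tmp_bom)

# Stand-in file with plain ASCII content and no BOM.
tmp_plain <- tempfile(fileext = ".csv")
writeBin(charToRaw("Date,StockID\n"), tmp_plain)
plain_result <- guess_bom(tmp_plain)
```

If the result for the real file is "UTF-16LE" or "UTF-16BE", every character occupies at least two bytes, so the encoding has to be declared when reading even if the text itself is pure ASCII.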