thr3ads.net - R help - [R] iconv question: SQL Server 2005 to R [Oct 2013]

Ira Sharenow

2013-Oct-08 23:02 UTC

[R] iconv question: SQL Server 2005 to R

A colleague is sending me quite a few files that have been saved with MS 
SQL Server 2005. I am using R 2.15.1 on Windows 7.

I am trying to read in the files using standard techniques. Although the 
file has a .csv extension, when I open it in Excel or WordPad and choose 
Save As, I see that it is Unicode Text. Notepad indicates that the encoding 
is Unicode. Right now I have to do a few things from within Excel (such as 
Text to Columns) and eventually save as a true csv file before I can 
read it into R and then use it.

Is there an easy way to solve this from within R? I am also open to easy 
SQL Server 2005 solutions.

I tried the following from within R.

> testDF = read.table("Info06.csv", header = TRUE, sep = ",")
> testDF2 =  iconv(x = testDF, from = "Unicode", to = "")
Error in iconv(x = testDF, from = "Unicode", to = "") :
  unsupported conversion from 'Unicode' to '' in codepage 1252

# The next line did not produce an error message
> testDF3 =  iconv(x = testDF, from = "UTF-8" , to = "")
> testDF3[1:6,  1:3]
Error in testDF3[1:6, 1:3] : incorrect number of dimensions

# The next line did not produce an error message
> testDF4 =  iconv(x = testDF, from = "macroman" , to = "")
> testDF4[1:6,  1:3]
Error in testDF4[1:6, 1:3] : incorrect number of dimensions
>  Encoding(testDF3)
[1] "unknown"
>  Encoding(testDF4)
[1] "unknown"
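
A note on the "incorrect number of dimensions" errors above: iconv() works on character vectors, and a data frame passed to it is first run through as.character(), which yields one string per column and no dimensions. The following is only a minimal sketch of that behaviour with a made-up two-column data frame (the names df, flat and chr are illustrative, and "latin1" is just a placeholder source encoding); re-encoding column by column keeps the data frame intact.

```r
# Made-up data frame standing in for testDF.
df <- data.frame(id = c("a", "b"), price = c(2.53, 1.3275),
                 stringsAsFactors = FALSE)

# iconv() coerces the data frame with as.character(): one string per column.
flat <- iconv(df, from = "latin1", to = "UTF-8")
is.null(dim(flat))   # the result is a plain character vector
length(flat)         # equal to ncol(df), not nrow(df)

# Re-encode only the character columns, one column at a time.
chr <- vapply(df, is.character, logical(1))
df[chr] <- lapply(df[chr], iconv, from = "latin1", to = "UTF-8")
```

This does not address the underlying problem of reading the file with the right encoding in the first place; it only shows why indexing testDF3 and testDF4 with two subscripts fails.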

These are the first few lines as shown in WordPad:

Date,StockID,Price,MktCap,ADV,SectorID,Days,A1,std1,std2

2006-01-03 00:00:00.000,@Stock1,2.53,467108197.38,567381.144444444,4,133.14486997089,-0.0162107939626307,0.0346283580367959,0.0126471695454834

2006-01-03 00:00:00.000,@Stock2,1.3275,829803070.531114,6134778.93292,5,124.632223896458,0.071513138376339,0.0410694546850102,0.0172091268025929



Prof Brian Ripley

2013-Oct-09 09:16 UTC


'Unicode' is not an encoding.  As the help says

fileEncoding: character string: if non-empty declares the encoding used
           on a file (not a connection) so the character data can be
           re-encoded.  See the 'Encoding' section of the help for
           'file', the 'R Data Import/Export Manual' and 'Note'.

The first of the cross references explains this.
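
As a rough illustration of the fileEncoding argument (a sketch, not part of the original reply): Windows programs that label a file "Unicode" usually mean UTF-16LE with a byte order mark, and the 'Encoding' section of ?file names "UCS-2LE" and "UTF-16LE" as appropriate values for such files. The example below writes a small made-up file in that encoding to a temporary path and reads it back; the file contents and the variable names are assumptions chosen for the example, and the real Info06.csv may differ.

```r
# Made-up sample data in the same layout as the first columns of the file.
csv_text <- paste0(
  "Date,StockID,Price\n",
  "2006-01-03 00:00:00.000,@Stock1,2.53\n",
  "2006-01-03 00:00:00.000,@Stock2,1.3275\n"
)

# Write it as UTF-16LE preceded by a byte order mark (bytes FF FE).
path <- tempfile(fileext = ".csv")
utf16 <- iconv(csv_text, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), utf16), path)

# Declare the file's encoding when reading, instead of calling iconv() afterwards.
testDF <- read.csv(path, fileEncoding = "UTF-16LE", stringsAsFactors = FALSE)
str(testDF)
```

For the real file, the equivalent call would presumably be read.csv("Info06.csv", fileEncoding = "UTF-16LE"), assuming the export really is UTF-16LE.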

On 09/10/2013 00:02, Ira Sharenow wrote:
> A colleague is sending me quite a few files that have been saved with MS
> SQL Server 2005. I am using R 2.15.1 on Windows 7.

See the posting guide: your R update is overdue as there have been 5 
releases since then.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Milan Bouchet-Valat

2013-Oct-09 09:37 UTC


On Tuesday, 8 October 2013 at 16:02 -0700, Ira Sharenow wrote:
> A colleague is sending me quite a few files that have been saved with MS 
> SQL Server 2005. I am using R 2.15.1 on Windows 7.
> 
> I am trying to read in the files using standard techniques. Although the 
> file has a csv extension when I go to Excel or WordPad and do SAVE AS I 
> see that it is Unicode Text. Notepad indicates that the encoding is 
> Unicode. Right now I have to do a few things from within Excel (such as 
> Text to Columns) and eventually save as a true csv file before I can 
> read it into R and then use it.
> 
> Is there an easy way to solve this from within R? I am also open to easy 
> SQL Server 2005 solutions.
>
> I tried the following from within R.
> 
> testDF = read.table("Info06.csv", header = TRUE, sep = ",")
> > testDF2 =  iconv(x = testDF, from = "Unicode", to = "")
> Error in iconv(x = testDF, from = "Unicode", to = "") :
> unsupported conversion from 'Unicode' to '' in codepage 1252
> 
> # The next line did not produce an error message
> > testDF3 =  iconv(x = testDF, from = "UTF-8" , to = "")
> > testDF3[1:6,  1:3]
> Error in testDF3[1:6, 1:3] : incorrect number of dimensions
> 
> # The next line did not produce an error message
> > testDF4 =  iconv(x = testDF, from = "macroman" , to = "")
> > testDF4[1:6,  1:3]
> Error in testDF4[1:6, 1:3] : incorrect number of dimensions
> >  Encoding(testDF3)
> [1] "unknown"
> >  Encoding(testDF4)
> [1] "unknown"
> 
> This is the first few lines from WordPad
> 
> Date,StockID,Price,MktCap,ADV,SectorID,Days,A1,std1,std2
> 
> 2006-01-03 00:00:00.000,@Stock1,2.53,467108197.38,567381.144444444,4,133.14486997089,-0.0162107939626307,0.0346283580367959,0.0126471695454834
> 
> 2006-01-03 00:00:00.000,@Stock2,1.3275,829803070.531114,6134778.93292,5,124.632223896458,0.071513138376339,0.0410694546850102,0.0172091268025929

What's the actual problem? You did not state one. Do you get accented
characters that are not printed correctly after importing the file?
Judging from the two data lines above, the file does not appear to contain
any non-ASCII characters, so encoding would not matter.


Regards
